# ASSOCIATION RULES

### Dataset:

Use the Online Retail dataset to mine association rules.

#### Pre-processing:

• Pre-process the dataset so it is suitable for association rule mining. This may include handling missing values, removing duplicates, and converting the data to an appropriate format.

#### Association Rule Mining:

• Implement the Apriori algorithm in Python using libraries such as pandas and mlxtend.

• Apply association rule mining to the pre-processed dataset to discover relationships between products that are purchased together.

• Set appropriate thresholds for support, confidence and lift to extract meaningful rules.
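For reference, these are the standard definitions behind the three thresholds, for a rule $X \Rightarrow Y$ over $N$ transactions. In mlxtend's rules table, the `support` column of a rule is $\operatorname{supp}(X \cup Y)$.

$$
\operatorname{supp}(X) = \frac{|\{\, t : X \subseteq t \,\}|}{N}, \qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}, \qquad
\operatorname{lift}(X \Rightarrow Y) = \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
$$

A lift above 1 means $X$ and $Y$ appear together more often than they would if they were independent. For example, the (avocado) → (mineral water) row of the rules table gives $\operatorname{conf} = 0.011467 / 0.0332 \approx 0.3454$ and $\operatorname{lift} = 0.3454 / 0.2383 \approx 1.45$, matching its confidence and lift columns.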


#### Analysis and Interpretation:

• Analyse the generated rules to identify interesting patterns and relationships between products.

• Interpret the results and provide insights into customer purchasing behaviour based on the discovered rules.

#### Insights into Customer Behaviour

### Pre-process the Data

```python
import pandas as pd

# Load the dataset
df = pd.read_excel('Online Retail.xlsx')

# Display basic information about the dataset
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7500 entries, 0 to 7499
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil  7500 non-null   object
dtypes: object(1)
memory usage: 58.7+ KB
```
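The column name in the `info()` output is itself a basket (shrimp, almonds, avocado, ...). `read_excel` treats the first row as a header by default, so that transaction is not among the 7,500 rows, and renaming the column to `'Transaction'` later discards it entirely. Passing `header=None` keeps it. The sketch below demonstrates the same `header` behaviour with `read_csv` on a hypothetical three-row string, so it runs without the Excel file; the analogous load would be `pd.read_excel('Online Retail.xlsx', header=None, names=['Transaction'])`.

```python
import io
import pandas as pd

# Hypothetical stand-in for the spreadsheet: one basket per row, no header row.
# Quotes keep each comma-separated basket in a single cell, as in the Excel file.
raw = '"shrimp,almonds,avocado"\n"burgers,meatballs,eggs"\nchutney\n'

# Default header=0: the first basket is consumed as the column name.
with_default = pd.read_csv(io.StringIO(raw))

# header=None: every row is data; names= supplies the column label directly.
with_none = pd.read_csv(io.StringIO(raw), header=None, names=["Transaction"])

print(len(with_default), list(with_default.columns))
print(len(with_none), with_none["Transaction"].tolist())
```

With the default, only two baskets survive; with `header=None` all three do.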


```python
# Show the DataFrame (pandas truncates the middle rows of a long frame)
df
```

| | shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil |
|---|---|
| 0 | burgers,meatballs,eggs |
| 1 | chutney |
| 2 | turkey,avocado |
| 3 | mineral water,milk,energy bar,whole wheat rice... |
| 4 | low fat yogurt |
| ... | ... |
| 7495 | butter,light mayo,fresh bread |
| 7496 | burgers,frozen vegetables,eggs,french fries,ma... |
| 7497 | chicken |
| 7498 | escalope,green tea |


```python
df.columns = ['Transaction']
```


```python
df.head()
```

| | Transaction |
|---|---|
| 0 | burgers,meatballs,eggs |
| 1 | chutney |
| 2 | turkey,avocado |
| 3 | mineral water,milk,energy bar,whole wheat rice... |
| 4 | low fat yogurt |


```python
# Split each comma-separated basket into a list of items (a Series of lists)
transactions = df['Transaction'].apply(lambda x: x.split(','))
```

```python
transactions.head()
```

```
0                           [burgers, meatballs, eggs]
1                                            [chutney]
2                                    [turkey, avocado]
3    [mineral water, milk, energy bar, whole wheat ...
4                                     [low fat yogurt]
Name: Transaction, dtype: object
```
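The plain `split(',')` keeps any spaces around item names and would fail if a cell were missing (NaN has no `.split`), and the task's pre-processing step also mentions duplicates. A minimal cleanup sketch follows; the function name `clean_transactions` and the messy sample baskets are hypothetical. "Duplicates" is interpreted here as the same item listed twice within one basket. Identical baskets from different rows are kept, since they are genuine repeat purchases and count towards support.

```python
import pandas as pd

def clean_transactions(col: pd.Series) -> pd.Series:
    """Split comma-separated baskets into lists of stripped, de-duplicated items."""
    col = col.dropna()                       # drop rows with no basket at all
    baskets = col.astype(str).str.split(",")
    # Strip spaces, drop empty names, and keep each item once per basket
    # (dict.fromkeys preserves the order of first appearance).
    baskets = baskets.apply(
        lambda items: list(dict.fromkeys(i.strip() for i in items if i.strip()))
    )
    return baskets[baskets.apply(len) > 0]   # drop baskets that ended up empty

# Hypothetical messy input: stray spaces, a repeated item and a missing cell.
raw = pd.Series([
    "burgers,meatballs,eggs",
    "mineral water, milk,mineral water",
    None,
    "chutney ",
])
cleaned = clean_transactions(raw)
print(cleaned.tolist())
```

In the notebook this would be applied as `transactions = clean_transactions(df['Transaction'])`.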

### Association Rule Mining

```python
# %pip installs into the environment of the running kernel
%pip install mlxtend
```
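`TransactionEncoder` in the next cell turns the lists of items into a one-hot table: one row per basket and one boolean column per distinct item. A rough pure-pandas equivalent, shown on hypothetical mini-baskets, uses `Series.str.join` and `Series.str.get_dummies`; the mean of each column is then that item's support, which is the first quantity Apriori computes.

```python
import pandas as pd

# Hypothetical mini-baskets in the same shape as `transactions` (a Series of lists).
baskets = pd.Series([
    ["burgers", "meatballs", "eggs"],
    ["chutney"],
    ["turkey", "avocado", "eggs"],
])

# Re-join each list with a delimiter that never occurs in item names, then let
# pandas create one 0/1 column per distinct item.
one_hot = baskets.str.join("|").str.get_dummies(sep="|").astype(bool)

# Column means are the single-item supports (fraction of baskets containing the item).
item_support = one_hot.mean().sort_values(ascending=False)
print(one_hot)
print(item_support)
```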



```python
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encode the baskets: one boolean column per distinct item
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
basket_df = pd.DataFrame(te_ary, columns=te.columns_)

# Find itemsets that appear in at least 1% of transactions
frequent_itemsets = apriori(basket_df, min_support=0.01, use_colnames=True)

# Generate rules, keeping only those with lift >= 1
# (newer mlxtend 0.23.x releases added a num_itemsets argument; if this call
# raises a TypeError, pass num_itemsets=len(basket_df))
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

# Display the first few rules
rules.head()
```


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (avocado) | (mineral water) | 0.0332 | 0.238267 | 0.011467 | 0.345382 | 1.449559 | 0.003556 | 1.163629 | 0.320785 |
| 1 | (mineral water) | (avocado) | 0.238267 | 0.0332 | 0.011467 | 0.048125 | 1.449559 | 0.003556 | 1.01568 | 0.407144 |
| 2 | (cake) | (burgers) | 0.081067 | 0.0872 | 0.011467 | 0.141447 | 1.622103 | 0.004398 | 1.063185 | 0.417349 |
| 3 | (burgers) | (cake) | 0.0872 | 0.081067 | 0.011467 | 0.131498 | 1.622103 | 0.004398 | 1.058068 | 0.420154 |
| 4 | (chocolate) | (burgers) | 0.163867 | 0.0872 | 0.017067 | 0.10415 | 1.194377 | 0.002777 | 1.01892 | 0.194639 |
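To show what `apriori` and `association_rules` compute, here is a minimal from-scratch sketch of Apriori: level-wise candidate generation (join), pruning by the downward-closure property (every subset of a frequent itemset is frequent), and rule generation with confidence and lift. The function names and toy baskets are hypothetical and not part of mlxtend. It rescans every basket for each candidate, so it is only for illustration; mlxtend remains the practical choice for the 7,500 real transactions.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    singles = {frozenset([i]) for b in baskets for i in b}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (downward closure): every (k-1)-subset must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def rules_from_itemsets(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                cons = itemset - ante
                conf = supp / frequent[ante]   # supp(X u Y) / supp(X)
                lift = conf / frequent[cons]   # conf / supp(Y)
                if conf >= min_confidence:
                    found.append((ante, cons, supp, conf, lift))
    return found


# Hypothetical toy baskets (not from the Online Retail data)
toy_baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
frequent = apriori_sketch(toy_baskets, min_support=0.6)
toy_rules = rules_from_itemsets(frequent, min_confidence=0.7)
for ante, cons, s, c, l in toy_rules:
    print(sorted(ante), "->", sorted(cons), f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

The dictionary lookups in `rules_from_itemsets` always succeed because, by downward closure, every antecedent and consequent of a frequent itemset is itself frequent.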


### Analysis and Interpretation

```python
# Keep rules with support >= 1%, confidence >= 50% and lift >= 1
rules = rules[(rules['support'] >= 0.01) & (rules['confidence'] >= 0.5) & (rules['lift'] >= 1)]

# Sort the rules by confidence in descending order
rules = rules.sort_values('confidence', ascending=False)

# Print each remaining rule with its key metrics
for _, rule in rules.iterrows():
    print(f"Rule: {rule['antecedents']} -> {rule['consequents']}")
    print(f"Support: {rule['support']}")
    print(f"Confidence: {rule['confidence']}")
    print(f"Lift: {rule['lift']}")
    print("====================================")
```


```
Rule: frozenset({'ground beef', 'eggs'}) -> frozenset({'mineral water'})
Support: 0.010133333333333333
Confidence: 0.5066666666666666
Lift: 2.1264689423614995
====================================
Rule: frozenset({'ground beef', 'milk'}) -> frozenset({'mineral water'})
Support: 0.011066666666666667
Confidence: 0.503030303030303
Lift: 2.1112072035407237
====================================
```
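Reading the two rules that pass the filters: among baskets containing ground beef and eggs, about 51% also contain mineral water, and a lift of about 2.13 means this is roughly twice the rate at which mineral water appears across all baskets. The ground beef and milk rule is almost identical (50% confidence, lift 2.11). Each rule is backed by about 1% of transactions (roughly 76 and 83 of the 7,500 baskets). As a quick consistency check, the sketch below recovers the consequent's support from the printed numbers using lift = confidence / support(consequent); both rules give about 0.238, which matches the consequent support of mineral water (0.238267) in the rules table.

```python
# Confidence and lift copied from the two rules printed above
printed_rules = {
    "{ground beef, eggs} -> {mineral water}": (0.5066666666666666, 2.1264689423614995),
    "{ground beef, milk} -> {mineral water}": (0.503030303030303, 2.1112072035407237),
}

implied = {}
for name, (confidence, lift) in printed_rules.items():
    # lift = confidence / support(consequent)  =>  support(consequent) = confidence / lift
    implied[name] = confidence / lift
    print(f"{name}: implied support of mineral water = {implied[name]:.4f}")
```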
