## ASSOCIATION RULES

### Objective
The objective of this assignment is to introduce students to association rule mining, focusing on market basket analysis, and to provide hands-on experience.

In [3]:
# Import libraries
import warnings

import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

warnings.filterwarnings('ignore')  # Suppress library warnings in the notebook output

In [13]:
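# The file has no header row, so pandas takes the first transaction as the column name
# (see the output below); passing header=None would keep that row as data.
df = pd.read_csv(r"https://raw.githubusercontent.com/rohitmaind/ExcelR_Assignments/main/Datasets/Online-retail.csv")
df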

Unnamed: 0,"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil"
0,"burgers,meatballs,eggs"
1,chutney
2,"turkey,avocado"
3,"mineral water,milk,energy bar,whole wheat rice..."
4,low fat yogurt
...,...
7495,"butter,light mayo,fresh bread"
7496,"burgers,frozen vegetables,eggs,french fries,ma..."
7497,chicken
7498,"escalope,green tea"


In [26]:
# Data preprocessing: inspect the column data types
df.dtypes

Unnamed: 0,0
"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil",object


In [28]:
# Display the first few rows of the dataset
print(df.head())

# Display the column names and data types
print(df.info())

  shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
0                             burgers,meatballs,eggs
1                                            chutney
2                                     turkey,avocado
3  mineral water,milk,energy bar,whole wheat rice...

In [29]:
# Data Preprocessing
# Split the items in each transaction into a list
df['Transaction'] = df.iloc[:, 0].apply(lambda x: x.split(','))

In [30]:

# Apply TransactionEncoder to transform the list of items into a one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit_transform(df['Transaction'])
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
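
To make the encoding step concrete, the following is a minimal pure-Python sketch of one-hot encoding on a toy dataset. The helper `one_hot_encode` and the three toy transactions are hypothetical and only illustrate the idea; `TransactionEncoder` performs the equivalent transformation and returns a boolean array whose columns are the distinct item names.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    # Collect every distinct item across all transactions, in sorted order.
    columns = sorted({item for transaction in transactions for item in transaction})
    # For each transaction, mark which of the known items it contains.
    rows = [[item in set(transaction) for item in columns] for transaction in transactions]
    return columns, rows


# Hypothetical toy transactions (lists of item names, as produced by str.split(',')).
toy_transactions = [
    ["burgers", "meatballs", "eggs"],
    ["chutney"],
    ["turkey", "avocado"],
]

columns, rows = one_hot_encode(toy_transactions)
print(columns)
for row in rows:
    print(row)
```

Each row has one True/False entry per distinct item, which is the input shape that `apriori` expects.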

In [31]:
# Run the Apriori algorithm, keeping itemsets that appear in at least 0.1% of transactions
frequent_itemsets = apriori(df_encoded, min_support=0.001, use_colnames=True)


In [32]:
# Generate association rules, keeping only those with lift >= 1
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)


In [33]:
# Sort rules by lift
rules = rules.sort_values(by='lift', ascending=False)


In [34]:
# Display top 10 rules
print(rules.head(10))

                                          antecedents  \
33927                           (pasta, french fries)   
33930                (mushroom cream sauce, escalope)   
33773                                  (eggs, shrimp)   
33776                          (pasta, mineral water)   
33929            (mushroom cream sauce, french fries)   
33928                               (pasta, escalope)   
33933                          (mushroom cream sauce)   
33924                 (pasta, french fries, escalope)   
33932                                         (pasta)   
33925  (mushroom cream sauce, french fries, escalope)   

                                          consequents  antecedent support  \
33927                (mushroom cream sauce, escalope)            0.003067   
33930                           (pasta, french fries)            0.005733   
33773                          (pasta, mineral water)            0.014133   
33776                                  (eggs, shrimp)           

### Interview Questions:
1. What is lift and why is it important in Association rules?
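Lift measures the strength of an association rule by comparing the observed co-occurrence of the antecedent and consequent with the co-occurrence expected if they were independent: Lift(X → Y) = Support(X, Y) / (Support(X) × Support(Y)). It is important because support and confidence alone can make a rule look strong simply because the consequent is popular. A lift greater than 1 indicates a positive association, a lift of 1 indicates independence, and a lift less than 1 indicates a negative association.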
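
As a rough illustration of how lift separates positive, neutral, and negative associations, the sketch below computes it from raw counts. The counts and the helper `lift` are hypothetical and are not taken from the Online-retail data.

```python
# Hypothetical toy counts, invented for illustration only.
n_transactions = 1000
n_x = 200   # transactions containing X
n_y = 100   # transactions containing Y


def lift(n_xy, n_x, n_y, n_total):
    """Observed co-occurrence rate divided by the rate expected under independence."""
    observed = n_xy / n_total
    expected = (n_x / n_total) * (n_y / n_total)
    return observed / expected


print(lift(60, n_x, n_y, n_transactions))  # about 3: X and Y co-occur more than expected
print(lift(20, n_x, n_y, n_transactions))  # about 1: consistent with independence
print(lift(5, n_x, n_y, n_transactions))   # about 0.25: X and Y co-occur less than expected
```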

2. What is support and confidence? How do you calculate them?
Support: This measure indicates how frequently an itemset appears across all transactions.
Support(X, Y) = (Transactions containing both X and Y) / (Total number of transactions)
Confidence: This measure indicates how likely the consequent Y is to be in the cart, given that the cart already contains the antecedent X.
Confidence(X → Y) = (Transactions containing both X and Y) / (Transactions containing X)
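
The two formulas above can be checked by hand on a small example. This is a minimal sketch; the five toy baskets and the helpers `support` and `confidence` are hypothetical and exist only to show the arithmetic.

```python
# Hypothetical toy baskets, invented for illustration only.
toy_baskets = [
    {"pasta", "escalope", "mineral water"},
    {"pasta", "escalope"},
    {"pasta", "shrimp"},
    {"eggs", "mineral water"},
    {"eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


print(support({"pasta", "escalope"}, toy_baskets))       # 2 of 5 baskets
print(confidence({"pasta"}, {"escalope"}, toy_baskets))  # 2 of the 3 pasta baskets
```

Note that confidence is directional: swapping the antecedent and consequent changes the denominator and therefore usually the result.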

3. What are some limitations or challenges of Association rule mining?
Association rule mining can generate an overwhelming number of rules, many of which are trivial or redundant. The number of candidate itemsets grows rapidly with the number of distinct items, so low support thresholds on large datasets lead to high computational costs. Additionally, the rules only describe co-occurrence: they do not capture more complex relationships and do not imply causation.
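
To give a sense of the computational cost, the sketch below counts how many distinct itemsets exist up to a given size. The figure of 100 distinct items and the helper `count_candidate_itemsets` are assumptions for illustration. Apriori avoids enumerating all of these by using the fact that every subset of a frequent itemset must itself be frequent, but a low support threshold weakens that pruning.

```python
from math import comb


def count_candidate_itemsets(n_items, max_size):
    """Number of distinct itemsets with 1..max_size items drawn from n_items."""
    return sum(comb(n_items, k) for k in range(1, max_size + 1))


n_items = 100  # hypothetical number of distinct products
for max_size in (1, 2, 3, 4):
    print(max_size, count_candidate_itemsets(n_items, max_size))
```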

## Conclusion
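With min_support=0.001, the Apriori algorithm identified frequent itemsets and generated more than 33,000 association rules with a lift of at least 1. Key findings include:

1. **Strong Associations**: The highest-lift rules link pasta, french fries, mushroom cream sauce, and escalope, for example (pasta, french fries) → (mushroom cream sauce, escalope). Their antecedent support is low (0.003067 for the top rule), so they rest on a small number of transactions.
2. **Useful Metrics**: Support measures how frequently an itemset occurs, while confidence measures the likelihood of the consequent given the antecedent.
3. **Challenges**: A low support threshold produces tens of thousands of rules, which is overwhelming to review and computationally intensive to generate.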

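Recommendations include focusing on actionable rules by filtering on support, confidence, and lift, and raising the minimum support threshold to keep the computation manageable on large datasets.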
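
One possible way to narrow a large rule set down to actionable rules is a simple threshold filter, sketched below in pure Python. The rule records, their numbers, the thresholds, and the helper `select_actionable` are all hypothetical; the field names mirror columns found in the mlxtend rules output, where the same filter could be written as a boolean mask on the DataFrame.

```python
# Hypothetical rule records with invented numbers, for illustration only.
toy_rules = [
    {"antecedents": {"item_a"}, "consequents": {"item_b"}, "support": 0.006, "confidence": 0.37, "lift": 4.7},
    {"antecedents": {"item_c"}, "consequents": {"item_d"}, "support": 0.051, "confidence": 0.28, "lift": 1.2},
    {"antecedents": {"item_e"}, "consequents": {"item_f"}, "support": 0.001, "confidence": 0.90, "lift": 9.0},
]


def select_actionable(rules, min_support, min_confidence, min_lift):
    """Keep rules that meet all three thresholds."""
    return [
        rule for rule in rules
        if rule["support"] >= min_support
        and rule["confidence"] >= min_confidence
        and rule["lift"] >= min_lift
    ]


# Threshold values are assumptions; suitable values depend on the business context.
selected = select_actionable(toy_rules, min_support=0.005, min_confidence=0.25, min_lift=1.1)
for rule in sorted(selected, key=lambda r: r["lift"], reverse=True):
    print(sorted(rule["antecedents"]), "->", sorted(rule["consequents"]), rule["lift"])
```

The third toy rule has the highest lift but is dropped because its support is too low, which is the kind of rare, fragile rule that a low min_support tends to surface.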