# **ASSOCIATION RULES** #

### Dataset:

Use the Online Retail dataset to apply association rule mining.

### Data Preprocessing:

Pre-process the dataset so that it is suitable for association rule mining. This may include handling missing values, removing duplicates, and converting the data to an appropriate format.

### Association Rule Mining:

•	Implement the Apriori algorithm in Python using libraries such as pandas and mlxtend.

•	Apply association rule mining techniques to the pre-processed dataset to discover interesting relationships between products purchased together.

•	Set appropriate thresholds for support, confidence, and lift to extract meaningful rules.

### Analysis and Interpretation:

•	Analyse the generated rules to identify interesting patterns and relationships between the products.

•	Interpret the results and provide insights into customer purchasing behaviour based on the discovered rules.


Importing the libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

Loading and reading the dataset

In [3]:
# Raw string so the backslashes in the Windows path are not read as escape sequences
df = pd.read_excel(r"F:\DATA SCIENCE_ExcelR\Assignments\Association Rules\Online retail.xlsx", header=None)
df

Unnamed: 0,0
0,"shrimp,almonds,avocado,vegetables mix,green gr..."
1,"burgers,meatballs,eggs"
2,chutney
3,"turkey,avocado"
4,"mineral water,milk,energy bar,whole wheat rice..."
...,...
7496,"butter,light mayo,fresh bread"
7497,"burgers,frozen vegetables,eggs,french fries,ma..."
7498,chicken
7499,"escalope,green tea"


In [8]:
df.duplicated().sum()

2325

In [18]:
df_new=df.drop_duplicates()

In [20]:
df_new.shape

(5176, 1)
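
Dropping duplicates deserves a second look for basket data. If each row is one checkout (an assumption about this file, which has no transaction ID column), two identical rows are two separate purchases, and removing them shifts every support value. A minimal sketch with invented baskets, not rows from the dataset:

```python
import pandas as pd

# Hypothetical baskets, invented for illustration:
# three separate customers bought only mineral water.
raw = pd.DataFrame({0: ["mineral water", "mineral water", "mineral water",
                        "eggs,mineral water", "eggs"]})

def item_support(frame, item):
    """Fraction of rows whose comma-separated basket contains `item`."""
    return frame[0].apply(lambda row: item in row.split(",")).mean()

deduped = raw.drop_duplicates()

support_raw = item_support(raw, "mineral water")          # 4 of 5 rows
support_deduped = item_support(deduped, "mineral water")  # 2 of 3 rows
print(support_raw, support_deduped)
```

If identical rows in the real file are known to be export errors rather than repeat purchases, removing them is the right call; otherwise it may be worth comparing the rules with and without this step.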

Checking for null values

In [22]:
df_new.isnull().sum()

0    0
dtype: int64

There are no null values.

In [25]:
df_new = df_new.dropna()

This drops rows with null values if any are present; since none were found, the data is unchanged.

In [28]:
df_new.head()

Unnamed: 0,0
0,"shrimp,almonds,avocado,vegetables mix,green gr..."
1,"burgers,meatballs,eggs"
2,chutney
3,"turkey,avocado"
4,"mineral water,milk,energy bar,whole wheat rice..."


In [32]:
df_new.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5176 entries, 0 to 7500
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       5176 non-null   object
dtypes: object(1)
memory usage: 80.9+ KB


In [34]:
len(df_new)

5176

In [38]:
transactions=df_new[0].apply(lambda x: x.split(','))
transactions=transactions.tolist()

In [40]:
transactions[1]

['burgers', 'meatballs', 'eggs']

In [42]:
from mlxtend.preprocessing import TransactionEncoder

In [44]:
te=TransactionEncoder()
te_array=te.fit(transactions).transform(transactions)
basket=pd.DataFrame(te_array,columns=te.columns_)

In [46]:
basket = basket[basket.sum(axis=1) > 0]

In [48]:
basket.head()

Unnamed: 0,asparagus,almonds,antioxydant juice,asparagus.1,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,...,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
0,False,True,True,False,True,False,False,False,False,False,...,False,True,False,False,True,False,False,True,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,True,False,False,False,False,False,...,True,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
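
The column list above shows asparagus twice (the second copy rendered as asparagus.1), which suggests that one spelling in the raw file carries stray whitespace, so a plain split(',') treats it as a different item. A hedged sketch of a cleaning step, where `clean_basket` is a hypothetical helper and the sample rows are invented:

```python
def clean_basket(row):
    """Split a comma-separated basket, trim each item, and drop empty entries."""
    return [item.strip() for item in row.split(",") if item.strip()]

# Invented rows: the second one has a leading space before "asparagus".
rows = ["asparagus,eggs", " asparagus,milk", "eggs,,milk "]
cleaned = [clean_basket(row) for row in rows]

# Without trimming, "asparagus" and " asparagus" would become two columns.
distinct_items = sorted({item for basket in cleaned for item in basket})
print(distinct_items)
```

In this notebook the helper would take the place of the lambda that builds `transactions`. Re-running the encoder afterwards would change the column count and could shift the reported supports slightly.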


In [50]:
basket_view = basket.astype(int)

In [52]:
basket_view.head()

Unnamed: 0,asparagus,almonds,antioxydant juice,asparagus.1,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,...,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
0,0,1,1,0,1,0,0,0,0,0,...,0,1,0,0,1,0,0,1,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,1,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0


In [54]:
from mlxtend.frequent_patterns import apriori,association_rules

In [56]:
frequent_items = apriori(basket, min_support=0.02, use_colnames=True)
frequent_items.sort_values(by='support', ascending=False).head()

Unnamed: 0,support,itemsets
37,0.299845,(mineral water)
52,0.229521,(spaghetti)
15,0.208076,(eggs)
11,0.205178,(chocolate)
19,0.19262,(french fries)
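
For scale, min_support=0.02 can be translated into a row count. Using the 5176 rows reported by len(df_new), the short calculation below gives the minimum number of baskets an itemset has to appear in to be kept:

```python
import math

n_transactions = 5176   # len(df_new) reported earlier in this notebook
min_support = 0.02      # threshold passed to apriori

# Smallest number of baskets an itemset must appear in to reach the threshold.
min_count = math.ceil(min_support * n_transactions)
print(min_count)
```

Lowering the threshold admits rarer itemsets but makes the candidate set grow quickly, so it is usually tuned together with the confidence and lift cut-offs.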


In [58]:
rules = association_rules(frequent_items, metric="lift", min_threshold=1)
rules.sort_values(by='lift', ascending=False).head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
122,(ground beef),(herb & pepper),0.135819,0.066461,0.022798,0.167852,2.525588,1.0,0.013771,1.121843,0.698989,0.127018,0.10861,0.255438
123,(herb & pepper),(ground beef),0.066461,0.135819,0.022798,0.343023,2.525588,1.0,0.013771,1.31539,0.647056,0.127018,0.239769,0.255438
206,"(spaghetti, mineral water)",(ground beef),0.085008,0.135819,0.02473,0.290909,2.141885,1.0,0.013184,1.218717,0.582651,0.126108,0.179465,0.236493
207,(ground beef),"(spaghetti, mineral water)",0.135819,0.085008,0.02473,0.182077,2.141885,1.0,0.013184,1.118678,0.61691,0.126108,0.106087,0.236493
114,(tomatoes),(frozen vegetables),0.091963,0.12983,0.022604,0.245798,1.893232,1.0,0.010665,1.153763,0.519585,0.113482,0.133271,0.209953


In [60]:
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head(10)

Unnamed: 0,antecedents,consequents,support,confidence,lift
0,(burgers),(chocolate),0.024536,0.21562,1.050892
1,(chocolate),(burgers),0.024536,0.119586,1.050892
2,(burgers),(eggs),0.036128,0.317487,1.525826
3,(eggs),(burgers),0.036128,0.17363,1.525826
4,(burgers),(french fries),0.029366,0.258065,1.339761
5,(french fries),(burgers),0.029366,0.152457,1.339761
6,(green tea),(burgers),0.024923,0.14726,1.29409
7,(burgers),(green tea),0.024923,0.219015,1.29409
8,(burgers),(milk),0.025502,0.224109,1.318166
9,(milk),(burgers),0.025502,0.15,1.318166
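
The association_rules call earlier only filters on lift >= 1, so no confidence threshold has been applied yet. One way to tighten the rule set is to filter the resulting table on both metrics. The sketch below rebuilds a few of the rows shown above as a small DataFrame so it runs on its own. The cut-offs 0.2 and 1.3 are illustrative assumptions, not tuned values, and plain strings stand in for the frozensets that mlxtend returns:

```python
import pandas as pd

# A few rows copied from the rules table above (items as plain strings for brevity).
sample_rules = pd.DataFrame(
    [
        ("burgers", "chocolate", 0.215620, 1.050892),
        ("burgers", "eggs", 0.317487, 1.525826),
        ("eggs", "burgers", 0.173630, 1.525826),
        ("burgers", "french fries", 0.258065, 1.339761),
        ("burgers", "green tea", 0.219015, 1.294090),
        ("burgers", "milk", 0.224109, 1.318166),
        ("milk", "burgers", 0.150000, 1.318166),
    ],
    columns=["antecedents", "consequents", "confidence", "lift"],
)

MIN_CONFIDENCE = 0.2  # assumed cut-off, for illustration only
MIN_LIFT = 1.3        # assumed cut-off, for illustration only

# Keep rules that pass both thresholds, strongest lift first.
strong_rules = sample_rules[
    (sample_rules["confidence"] >= MIN_CONFIDENCE) & (sample_rules["lift"] >= MIN_LIFT)
].sort_values("lift", ascending=False)

print(strong_rules)
```

The same boolean mask should work unchanged on the full `rules` DataFrame, since it has `confidence` and `lift` columns.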


## Interview Questions:

#### 1.	What is lift and why is it important in Association rules?

Lift measures how strongly two items are associated compared to being independent.

Lift(A->B) = Support(A U B) / (Support(A) * Support(B))

* Lift > 1: Positive association

* Lift = 1: Independent

* Lift < 1: Negative association


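Lift matters because confidence alone can be misleading: a rule can have high confidence simply because the consequent is popular. Lift corrects for this by comparing how often the items actually occur together with how often they would if they were independent.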
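
As a check on the definition, lift can be recomputed from the supports reported for the top rule in the rules table above, (ground beef) -> (herb & pepper). The function name `lift` is just a label for this sketch:

```python
def lift(support_ab, support_a, support_b):
    """Lift(A->B) = Support(A and B) / (Support(A) * Support(B))."""
    return support_ab / (support_a * support_b)

# Supports taken from the rules table in this notebook.
value = lift(support_ab=0.022798, support_a=0.135819, support_b=0.066461)
print(round(value, 2))
```

The result agrees with the lift of about 2.53 reported for that rule, up to rounding of the inputs.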

#### 2.	What are support and confidence? How do you calculate them?

Support: Fraction of all transactions that contain both items.

Support(A->B) = Transactions with A and B / Total transactions

Confidence: Probability of buying B when A is bought.

Confidence(A->B) = Support(A U B) / Support(A)
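
Both measures can be computed directly from a list of baskets. The sketch below uses five invented transactions (not rows from the dataset) and hypothetical helper names `support` and `confidence`:

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support(A and B) / Support(A): how often B appears given that A does."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Invented baskets for illustration.
baskets = [
    ["burgers", "eggs"],
    ["burgers", "eggs", "milk"],
    ["burgers", "milk"],
    ["eggs"],
    ["milk", "green tea"],
]

s = support(baskets, ["burgers", "eggs"])       # 2 of 5 baskets
c = confidence(baskets, ["burgers"], ["eggs"])  # 2 of the 3 burger baskets
print(s, c)
```

Note that confidence divides by the support of the antecedent, so confidence(A->B) and confidence(B->A) generally differ even though the support of the pair is the same.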

#### 3.	What are some limitations or challenges of Association rules mining?

* Can generate a very large number of rules, especially at low support thresholds.

* May find trivial or unimportant patterns.

* Hard to choose proper support/confidence thresholds.

* Doesn’t show causation or order of events.
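
One modest way to cut the rule count, sketched here as an assumption rather than a standard recipe: lift is symmetric, so A -> B and B -> A always appear as a pair, and keeping only the direction with the higher confidence halves such pairs. The tuples below are copied from the rule tables in this notebook:

```python
# (antecedent, consequent, confidence) taken from the rule tables above.
rule_rows = [
    ("ground beef", "herb & pepper", 0.167852),
    ("herb & pepper", "ground beef", 0.343023),
    ("burgers", "eggs", 0.317487),
    ("eggs", "burgers", 0.173630),
]

best_direction = {}
for antecedent, consequent, conf in rule_rows:
    pair = frozenset((antecedent, consequent))  # same key for A -> B and B -> A
    # Keep the rule with the higher confidence for each unordered pair.
    if pair not in best_direction or conf > best_direction[pair][2]:
        best_direction[pair] = (antecedent, consequent, conf)

for antecedent, consequent, conf in best_direction.values():
    print(f"{antecedent} -> {consequent} (confidence {conf:.3f})")
```

This only handles single-item rules; rules with multi-item antecedents would need a different grouping key.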
