<a href="https://colab.research.google.com/github/mns017/Association-Rule-Mining-for-Retail-Transactions/blob/main/Association_Rule_Mining_for_Retail_Transactions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)


# **Step 1: Import Required Libraries and Suppress Warnings**
**Explanation:**

In this step, we import the Python libraries needed for data manipulation and association rule mining.


*   pandas is used for data loading and preprocessing.
*   mlxtend.frequent_patterns provides the Apriori algorithm (`apriori`) and association rule generation (`association_rules`).
*   Warnings about deprecated functions (`DeprecationWarning`) are suppressed in the first cell to keep the notebook output clean and readable.

This step ensures that the environment is properly set up before data analysis begins.

In [None]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules


# **Step 2: Load the Dataset**
**Explanation:**

The Online Retail dataset is loaded into a pandas DataFrame.
It contains detailed transaction records from an online retail store: invoice numbers, stock codes, product descriptions, quantities, invoice dates, unit prices, customer IDs, and countries.

Loading the dataset allows us to inspect its structure and verify that the data has been imported correctly before applying any preprocessing steps.

In [None]:
df = pd.read_csv("/content/Online_Retail.csv")
df.head()


|   | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6.0 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6.0 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8.0 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6.0 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6.0 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
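
The notebook loads the file with default type inference. One possible refinement, offered as an assumption and not as the notebook's own loading code, is to read `InvoiceNo` and `StockCode` as strings. This keeps identifiers such as `85123A` intact and gives plain and `C`-prefixed invoice numbers a single dtype. On large CSVs where only a few invoice numbers carry a prefix, it may also avoid pandas' mixed-type `DtypeWarning`. Parsing `InvoiceDate` as a datetime makes later date filtering easier. The inline CSV is a hypothetical stand-in for `/content/Online_Retail.csv`. Its first row is copied from the output above; the cancelled second row is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical sample standing in for /content/Online_Retail.csv
# (second, cancelled row is invented for illustration)
sample_csv = io.StringIO(
    "InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country\n"
    "536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850,United Kingdom\n"
    "C536999,71053,WHITE METAL LANTERN,-6,2010-12-01 09:00:00,3.39,17850,United Kingdom\n"
)

# Read identifier columns as strings so plain and 'C'-prefixed invoice
# numbers share one dtype; parse the timestamp column as datetimes
df_sample = pd.read_csv(
    sample_csv,
    dtype={"InvoiceNo": str, "StockCode": str},
    parse_dates=["InvoiceDate"],
)

print(df_sample.dtypes)
```

With this loading step, the `astype(str)` call in the cleaning cell becomes a harmless no-op rather than a necessity.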


# **Step 3: Data Cleaning and Preprocessing**
**Explanation:**

Real‑world datasets often contain noisy or invalid records.
In this step, data cleaning is performed to ensure accurate analysis:


*   Canceled transactions (invoice numbers starting with ‘C’) are removed because they do not represent actual purchases.
*   Rows with missing product descriptions or invoice numbers are dropped.
*   Transactions with zero or negative quantities are removed, as they do not represent valid purchases.

This step ensures that only genuine and meaningful transactions are used for market basket analysis.

In [None]:
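# Drop canceled invoices (InvoiceNo prefixed with 'C')
df = df[~df['InvoiceNo'].astype(str).str.startswith('C')]

# Drop rows missing a product description or an invoice number
df = df.dropna(subset=['Description', 'InvoiceNo'])

# Keep only rows with a positive purchased quantity
df = df[df['Quantity'] > 0]

df.shape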


(333323, 8)

# **Step 4: Create the Transaction–Item (Basket) Matrix**
**Explanation:**

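Association rule mining requires data in a basket (one-hot) format, where:

*   Each row represents a transaction (InvoiceNo)
*   Each column represents a product (Description)
*   Each value indicates whether the product was purchased in that transaction (1) or not (0)

Quantities are first summed per invoice and product, and any positive total is then encoded as 1.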
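
To make the target layout concrete, the sketch below applies the same groupby/unstack pattern to a few hypothetical invoice lines. It then encodes presence as booleans. mlxtend's `apriori` accepts either 0/1 or True/False values, and recent mlxtend versions warn that non-boolean input is slower. The notebook itself keeps 0/1 integers. The invoice numbers and product names are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format invoice lines: one row per (invoice, product) line item
lines = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1002", "1002", "1002", "1003"],
    "Description": ["CUPS", "PLATES", "CUPS", "PLATES", "NAPKINS", "CUPS"],
    "Quantity": [6, 6, 12, 12, 20, 3],
})

# Sum quantities per invoice and product, pivot products into columns,
# and fill invoice/product combinations that never occur with zero
basket = (
    lines.groupby(["InvoiceNo", "Description"])["Quantity"]
         .sum()
         .unstack(fill_value=0)
)

# Presence/absence encoding: True if the product appears in the invoice
basket_bool = basket > 0
print(basket_bool)
```

Each invoice becomes one row, and only the product columns it actually contains are True.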

In [None]:
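# One row per invoice, one column per product, summing quantities
basket = (
    df.groupby(['InvoiceNo', 'Description'])['Quantity']
      .sum()
      .unstack()
      .fillna(0)
)

# Encode presence as 1 and absence as 0 with a vectorized comparison;
# this replaces DataFrame.applymap, which is deprecated in recent pandas
basket = (basket > 0).astype(int)
basket.head()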


| InvoiceNo | 4 PURPLE FLOCK DINNER CANDLES | 50'S CHRISTMAS GIFT BAG LARGE | DOLLY GIRL BEAKER | I LOVE LONDON MINI BACKPACK | OVAL WALL MIRROR DIAMANTE | RED SPOT GIFT BAG LARGE | SET 2 TEA TOWELS I LOVE LONDON | SPACEBOY BABY GIFT SET | TOADSTOOL BEDSIDE LIGHT | TRELLIS COAT RACK | ... | incorrectly credited C550456 see 47 | mailout | mailout | on cargo order | rcvd be air temp fix for dotcom sit | returned | taig adjust | test | to push order througha s stock was | wrongly sold (22719) barcode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 536365 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 536366 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 536367 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 536368 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 536369 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
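
Some basket columns above are not products, for example `mailout`, `test`, `returned` and `taig adjust`. `mailout` also appears twice. Unstacked column labels are unique, so the two labels most likely differ only in invisible characters such as whitespace. A possible extra cleaning pass, sketched below on hypothetical rows, strips surrounding whitespace from descriptions. It also keeps only descriptions written entirely in uppercase. That uppercase rule is a heuristic assumption based on the labels visible here, not a rule from the dataset's documentation. It would also drop any genuine product whose label contains lowercase letters, so check it against the real data before use.

```python
import pandas as pd

# Hypothetical rows mimicking the kinds of descriptions seen in the basket columns
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "536367", "536368", "C536369"],
    "Description": [
        "WHITE METAL LANTERN",
        "WHITE METAL LANTERN ",   # trailing space: same product, different label
        "mailout",
        "test",
        "WHITE METAL LANTERN",
    ],
    "Quantity": [6, 2, 1, 1, -6],
})


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's three filters plus two assumed extra rules."""
    df = df[~df["InvoiceNo"].astype(str).str.startswith("C")]
    df = df.dropna(subset=["Description", "InvoiceNo"])
    df = df[df["Quantity"] > 0].copy()
    # Assumed rule 1: merge labels that differ only in surrounding whitespace
    df["Description"] = df["Description"].str.strip()
    # Assumed rule 2 (heuristic): keep only all-uppercase labels, dropping
    # free-text notes such as "mailout" or "test"
    return df[df["Description"] == df["Description"].str.upper()]


cleaned = clean_transactions(raw)
print(cleaned["Description"].unique())
```

Running the real `df` through such a function before building `basket` would change the column set and possibly the support values, so the downstream results would need to be regenerated.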


# **Step 5: Generate Frequent Itemsets Using Apriori**
**Explanation:**

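The Apriori algorithm is applied to the basket matrix to identify frequent itemsets: groups of products that appear together in at least a minimum fraction of transactions. With `min_support=0.02`, an itemset is kept only if it occurs in at least 2% of invoices. Apriori relies on the fact that every subset of a frequent itemset must itself be frequent. It therefore grows itemsets one item at a time and extends only those that already meet the threshold. The output counts the frequent itemsets by size: 282 single products, 96 pairs, and 6 triples.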
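
The level-wise idea behind Apriori can be illustrated with a short pure-Python sketch. It is a simplified teaching version, not mlxtend's implementation. It assumes transactions are Python sets and measures support as a fraction of transactions. The sketch counts single items and keeps those meeting `min_support`. It then builds size-(k+1) candidates only from frequent size-k itemsets and discards any candidate with an infrequent subset. The function name `apriori_sketch` and the toy transactions are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item in the itemset
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 1
    while current:
        # Join step: unions of two frequent k-itemsets that have size k + 1
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


# Hypothetical toy transactions (one set of products per invoice)
transactions = [
    {"CUPS", "PLATES"},
    {"CUPS", "PLATES", "NAPKINS"},
    {"CUPS", "NAPKINS"},
    {"PLATES"},
]
result = apriori_sketch(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Here {PLATES, NAPKINS} occurs in only one of four transactions, so it fails the threshold. The three-item candidate is then pruned without ever being counted, because one of its subsets is infrequent.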

In [None]:
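# Keep itemsets that appear in at least 2% of invoices;
# use_colnames=True reports product names instead of column indices
frequent_itemsets = apriori(
    basket,
    min_support=0.02,
    use_colnames=True
)

# Count frequent itemsets by size (number of products per itemset)
frequent_itemsets['itemsets'].apply(len).value_counts()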


| itemsets (size) | count |
|---|---|
| 1 | 282 |
| 2 | 96 |
| 3 | 6 |


# **Step 6: Generate Association Rules**
**Explanation:**

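From the frequent itemsets, association rules of the form A → B are generated to identify relationships between products. Setting `metric="lift"` with `min_threshold=1` keeps only rules with a lift of at least 1. In those rules, B is at least as likely in baskets containing A as in baskets overall. Only the antecedent, consequent, support, confidence, and lift columns are retained.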
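
For reference, these are the standard definitions of the metrics reported for a rule $A \Rightarrow B$ over $N$ transactions (invoices) $t$. mlxtend's `association_rules` documents the same definitions.

$$
\begin{aligned}
\operatorname{support}(X) &= \frac{\lvert\{\, t : X \subseteq t \,\}\rvert}{N} \\
\operatorname{support}(A \Rightarrow B) &= \operatorname{support}(A \cup B) \\
\operatorname{confidence}(A \Rightarrow B) &= \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)} \\
\operatorname{lift}(A \Rightarrow B) &= \frac{\operatorname{confidence}(A \Rightarrow B)}{\operatorname{support}(B)} = \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)\,\operatorname{support}(B)}
\end{aligned}
$$

Lift equals 1 when A and B co-occur exactly as often as independence would predict. The last form also shows that lift is symmetric, so $\operatorname{lift}(A \Rightarrow B) = \operatorname{lift}(B \Rightarrow A)$.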

In [None]:
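# Generate rules from the frequent itemsets, keeping those with lift >= 1
rules = association_rules(
    frequent_itemsets,
    metric="lift",
    min_threshold=1
)

# Keep only the columns used in the analysis
rules = rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]
rules.head()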


|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (PACK OF 72 RETROSPOT CAKE CASES) | (60 TEATIME FAIRY CAKE CASES) | 0.025883 | 0.357884 | 7.669186 |
| 1 | (60 TEATIME FAIRY CAKE CASES) | (PACK OF 72 RETROSPOT CAKE CASES) | 0.025883 | 0.554662 | 7.669186 |
| 2 | (ALARM CLOCK BAKELIKE GREEN) | (ALARM CLOCK BAKELIKE RED ) | 0.026033 | 0.611993 | 12.397043 |
| 3 | (ALARM CLOCK BAKELIKE RED ) | (ALARM CLOCK BAKELIKE GREEN) | 0.026033 | 0.527356 | 12.397043 |
| 4 | (ALARM CLOCK BAKELIKE PINK) | (ALARM CLOCK BAKELIKE RED ) | 0.022057 | 0.609959 | 12.355831 |
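
As a worked check, the first two rows above describe the same pair of cake-case products in both directions. The supports of the individual items can be recovered from them because confidence(A ⇒ B) = support(A ∪ B) / support(A). The sketch uses only the rounded values printed in the table, so the recomputed lift matches 7.669186 only approximately.

```python
# Values copied from the first two rows of the rules table (rounded to 6 d.p.)
support_ab = 0.025883   # support of {PACK OF 72 RETROSPOT CAKE CASES, 60 TEATIME FAIRY CAKE CASES}
conf_a_to_b = 0.357884  # PACK OF 72 RETROSPOT CAKE CASES -> 60 TEATIME FAIRY CAKE CASES
conf_b_to_a = 0.554662  # 60 TEATIME FAIRY CAKE CASES -> PACK OF 72 RETROSPOT CAKE CASES

# Invert confidence = support(A ∪ B) / support(antecedent) to get single-item supports
support_a = support_ab / conf_a_to_b
support_b = support_ab / conf_b_to_a

# Lift computed two ways: confidence / support(consequent), and the symmetric form
lift_from_conf = conf_a_to_b / support_b
lift_symmetric = support_ab / (support_a * support_b)

print(round(support_a, 4), round(support_b, 4))
print(round(lift_from_conf, 3), round(lift_symmetric, 3))
```

This implies that roughly 7.2% of invoices contain PACK OF 72 RETROSPOT CAKE CASES and roughly 4.7% contain 60 TEATIME FAIRY CAKE CASES. Both lift formulas give about 7.67, matching the table.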


# **Step 7: Extract and Rank the Top 10 Association Rules**
**Explanation:**

In the final step, the association rules are sorted in descending order by:

1.   Lift
2.   Confidence (to break ties in lift)
3.   Support (to break any remaining ties)

The 10 highest-ranked rules are selected as the strongest product associations found at this support threshold.

In [None]:
top_10_rules = rules.sort_values(
    by=['lift', 'confidence', 'support'],
    ascending=False
).head(10)

top_10_rules


|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 185 | (SET/6 RED SPOTTY PAPER CUPS) | (SET/6 RED SPOTTY PAPER PLATES) | 0.021532 | 0.817664 | 25.704342 |
| 184 | (SET/6 RED SPOTTY PAPER PLATES) | (SET/6 RED SPOTTY PAPER CUPS) | 0.021532 | 0.676887 | 25.704342 |
| 197 | (PINK REGENCY TEACUP AND SAUCER) | (GREEN REGENCY TEACUP AND SAUCER, ROSES REGENC... | 0.02941 | 0.720588 | 16.617164 |
| 192 | (GREEN REGENCY TEACUP AND SAUCER, ROSES REGENC... | (PINK REGENCY TEACUP AND SAUCER) | 0.02941 | 0.678201 | 16.617164 |
| 195 | (GREEN REGENCY TEACUP AND SAUCER) | (PINK REGENCY TEACUP AND SAUCER, ROSES REGENCY... | 0.02941 | 0.511749 | 15.900016 |
| 194 | (PINK REGENCY TEACUP AND SAUCER, ROSES REGENCY... | (GREEN REGENCY TEACUP AND SAUCER) | 0.02941 | 0.913753 | 15.900016 |
| 193 | (GREEN REGENCY TEACUP AND SAUCER, PINK REGENCY... | (ROSES REGENCY TEACUP AND SAUCER ) | 0.02941 | 0.86918 | 14.536129 |
| 196 | (ROSES REGENCY TEACUP AND SAUCER ) | (GREEN REGENCY TEACUP AND SAUCER, PINK REGENCY... | 0.02941 | 0.491844 | 14.536129 |
| 31 | (PINK REGENCY TEACUP AND SAUCER) | (GREEN REGENCY TEACUP AND SAUCER) | 0.033836 | 0.829044 | 14.426017 |
| 30 | (GREEN REGENCY TEACUP AND SAUCER) | (PINK REGENCY TEACUP AND SAUCER) | 0.033836 | 0.588773 | 14.426017 |
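
In the ranking above, the same product group appears several times. Lift does not depend on which side of the rule an item sits on, so every split of a group into antecedent and consequent gets the same score. The ten rules cover only three groups: the red spotty cups and plates, the three Regency teacups, and the pink and green Regency teacups. One possible way to list each group once is to key every rule on the union of its antecedents and consequents, then keep the highest-confidence rule per key. The sketch assumes the frozenset-valued `antecedents` and `consequents` columns that `association_rules` returns. The helper name `one_rule_per_itemset` is hypothetical. The small `rules_demo` frame is also hypothetical, with values copied from the table above.

```python
import pandas as pd

# Hypothetical stand-in for the `rules` DataFrame (frozenset columns, as mlxtend returns)
rules_demo = pd.DataFrame({
    "antecedents": [frozenset({"SET/6 RED SPOTTY PAPER CUPS"}),
                    frozenset({"SET/6 RED SPOTTY PAPER PLATES"}),
                    frozenset({"PINK REGENCY TEACUP AND SAUCER"}),
                    frozenset({"GREEN REGENCY TEACUP AND SAUCER"})],
    "consequents": [frozenset({"SET/6 RED SPOTTY PAPER PLATES"}),
                    frozenset({"SET/6 RED SPOTTY PAPER CUPS"}),
                    frozenset({"GREEN REGENCY TEACUP AND SAUCER"}),
                    frozenset({"PINK REGENCY TEACUP AND SAUCER"})],
    "support": [0.021532, 0.021532, 0.033836, 0.033836],
    "confidence": [0.817664, 0.676887, 0.829044, 0.588773],
    "lift": [25.704342, 25.704342, 14.426017, 14.426017],
})


def one_rule_per_itemset(rules: pd.DataFrame) -> pd.DataFrame:
    """Keep the highest-confidence rule for each distinct set of products."""
    # Key each rule on the full set of products it mentions (order-independent)
    keyed = rules.assign(
        itemset=[a | c for a, c in zip(rules["antecedents"], rules["consequents"])]
    )
    return (
        keyed.sort_values("confidence", ascending=False)
             .drop_duplicates(subset="itemset")
             .sort_values(["lift", "confidence", "support"], ascending=False)
             .drop(columns="itemset")
    )


deduped = one_rule_per_itemset(rules_demo)
print(deduped[["antecedents", "consequents", "confidence", "lift"]])
```

Applied to the real `rules` frame, `one_rule_per_itemset(rules).head(10)` would list ten distinct product groups. For three-item groups, all six splits share one key, so only the single highest-confidence split is kept.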


**Conclusion:**

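Association rule mining with the Apriori algorithm revealed strong co-purchase patterns in the Online Retail dataset. The highest-lift rules link closely related products. SET/6 RED SPOTTY PAPER CUPS and SET/6 RED SPOTTY PAPER PLATES have a lift of about 25.7, and about 82% of invoices containing the cups also contain the plates. The GREEN, PINK, and ROSES REGENCY TEACUP AND SAUCER sets account for the other eight of the top 10 rules. Each of these rules covers only about 2% to 3.4% of invoices, so they describe strong but fairly niche patterns. These insights can be used for product bundling, cross-selling strategies, and recommendation systems to improve retail sales performance.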