# Market Basket Analysis of Grocery Transactions Using Apriori



---



<p align="justify">This notebook performs a market basket analysis of grocery transactions using the Apriori algorithm. The goal is to identify items that are frequently purchased together. After the dataset is loaded and grouped into transactions, frequent itemsets and association rules are generated, and the rules are exposed through a Gradio interface.</p>
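
The rules generated later are scored by support, confidence, and lift. The following is a minimal sketch of how these three metrics are commonly defined, computed on a handful of hypothetical baskets (the toy data and the helper names `support`, `confidence`, and `lift` are illustrative assumptions, not part of the notebook).

```python
# Toy baskets (hypothetical data, not taken from the grocery dataset).
toy_baskets = [
    {"whole milk", "yogurt"},
    {"whole milk", "sausage"},
    {"whole milk", "yogurt", "sausage"},
    {"soda"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(support({"whole milk", "yogurt"}, toy_baskets))
print(confidence({"yogurt"}, {"whole milk"}, toy_baskets))
print(lift({"yogurt"}, {"whole milk"}, toy_baskets))
```

A lift above 1 suggests the items appear together more often than independence would predict, which is why the notebook later keeps only rules with `lift > 1`.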



---



In [1]:
!pip install gradio



In [2]:
!pip install mlxtend



## Import Libraries

In [3]:
import gradio as gr
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

## Load the Dataset

In [4]:
df = pd.read_csv('/content/drive/MyDrive/datasets/groceries_dataset.csv')

In [5]:
df.head()

,Member_number,Date,itemDescription
0,1808,21-07-2015,tropical fruit
1,2552,05-01-2015,whole milk
2,2300,19-09-2015,pip fruit
3,1187,12-12-2015,other vegetables
4,3037,01-02-2015,whole milk


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38765 entries, 0 to 38764
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Member_number    38765 non-null  int64 
 1   Date             38765 non-null  object
 2   itemDescription  38765 non-null  object
dtypes: int64(1), object(2)
memory usage: 908.7+ KB


## Group Items by Transaction

In [7]:
transactions = df.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).tolist()
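
To illustrate what the grouping step produces, here is a rough plain-Python equivalent on a few hypothetical rows. It treats every `(Member_number, Date)` pair as one basket, which is the same assumption the `groupby` call makes. The row values and the names `rows`, `baskets`, and `toy_transactions` are made up for illustration.

```python
from collections import defaultdict

# Rows shaped like the dataset: (Member_number, Date, itemDescription).
# The values are hypothetical.
rows = [
    (1000, "15-03-2015", "whole milk"),
    (1000, "15-03-2015", "yogurt"),
    (1000, "24-06-2014", "sausage"),
    (2000, "15-03-2015", "soda"),
]

# Collect the items bought by the same member on the same date into one basket.
baskets = defaultdict(list)
for member, date, item in rows:
    baskets[(member, date)].append(item)

# One list of items per basket, mirroring `.apply(list).tolist()`.
toy_transactions = list(baskets.values())
print(toy_transactions)
```

One difference worth noting: pandas sorts the group keys by default, whereas this sketch keeps baskets in first-seen order. The order of baskets does not affect the itemset counts.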

## Encode Transactions

In [8]:
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)

In [9]:
df_encoded.head()

,Instant food products,UHT-milk,abrasive cleaner,artif. sweetener,baby cosmetics,bags,baking powder,bathroom cleaner,beef,berries,...,turkey,vinegar,waffles,whipped/sour cream,whisky,white bread,white wine,whole milk,yogurt,zwieback
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,True,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
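
The encoded frame above has one boolean column per distinct item and one row per basket. A minimal sketch of that one-hot layout in plain Python follows. It is not `TransactionEncoder` itself; `one_hot_encode` is a hypothetical helper, and the sorted column order is an assumption based on the column order visible in the output above.

```python
def one_hot_encode(transactions):
    """Return (columns, rows) where rows[i][j] is True if basket i contains columns[j]."""
    # One column per distinct item, in sorted order.
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)  # duplicates inside a basket collapse to a single True
        rows.append([item in present for item in columns])
    return columns, rows

# Hypothetical baskets for illustration.
columns, rows = one_hot_encode([
    ["whole milk", "yogurt"],
    ["sausage", "sausage"],
    ["UHT-milk", "whole milk"],
])
print(columns)
for row in rows:
    print(row)
```

Python's default string ordering places uppercase letters before lowercase ones, which is consistent with `UHT-milk` appearing before `abrasive cleaner` in the notebook output.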


## Generate Frequent Itemsets and Rules

In [16]:
frequent_itemsets = apriori(df_encoded, min_support=0.001, use_colnames=True)
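# num_itemsets is documented by mlxtend as the number of transactions in the original data,
# so the row count of the encoded frame is passed instead of an arbitrary constant.
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.1, num_itemsets=len(df_encoded))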
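
For readers unfamiliar with how Apriori finds frequent itemsets, the level-wise idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not mlxtend's implementation: `frequent_itemsets_sketch` and the toy baskets are hypothetical, and no attempt is made at efficiency.

```python
from itertools import combinations

def frequent_itemsets_sketch(baskets, min_support):
    """Level-wise search: grow itemsets by one item per pass, keeping only frequent ones."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result

# Hypothetical baskets for illustration.
toy_baskets = [
    {"whole milk", "yogurt"},
    {"whole milk", "yogurt", "sausage"},
    {"whole milk", "sausage"},
    {"yogurt"},
]
found = frequent_itemsets_sketch(toy_baskets, min_support=0.5)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are, so most candidates are discarded without counting them. With a threshold as low as `min_support=0.001`, many more itemsets survive each level than in this toy run.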

In [17]:
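# Convert the frozensets to plain sets of lowercase names so user input can be matched case-insensitively.
rules['antecedents'] = rules['antecedents'].apply(lambda x: set(map(str.lower, x)))
rules['consequents'] = rules['consequents'].apply(lambda x: set(map(str.lower, x)))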

In [18]:
rules

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,{uht-milk},{other vegetables},0.021386,0.122101,0.002139,0.100000,0.818993,1.0,-0.000473,0.975443,-0.184234,0.015130,-0.025175,0.058758
1,{uht-milk},{whole milk},0.021386,0.157923,0.002540,0.118750,0.751949,1.0,-0.000838,0.955549,-0.252105,0.014367,-0.046519,0.067416
2,{beef},{whole milk},0.033950,0.157923,0.004678,0.137795,0.872548,1.0,-0.000683,0.976656,-0.131343,0.024991,-0.023902,0.083709
3,{berries},{other vegetables},0.021787,0.122101,0.002673,0.122699,1.004899,1.0,0.000013,1.000682,0.004984,0.018930,0.000681,0.072297
4,{berries},{whole milk},0.021787,0.157923,0.002272,0.104294,0.660414,1.0,-0.001168,0.940127,-0.344543,0.012806,-0.063686,0.059341
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
125,"{whole milk, sausage}",{soda},0.008955,0.097106,0.001069,0.119403,1.229612,1.0,0.000200,1.025320,0.188423,0.010185,0.024695,0.065207
126,"{soda, sausage}",{whole milk},0.005948,0.157923,0.001069,0.179775,1.138374,1.0,0.000130,1.026642,0.122281,0.006568,0.025951,0.093273
127,"{whole milk, yogurt}",{sausage},0.011161,0.060349,0.001470,0.131737,2.182917,1.0,0.000797,1.082219,0.548014,0.020992,0.075973,0.078050
128,"{whole milk, sausage}",{yogurt},0.008955,0.085879,0.001470,0.164179,1.911760,1.0,0.000701,1.093681,0.481231,0.015748,0.085657,0.090650


In [21]:
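def generate_market_basket_rules(items):
    """Return up to 10 rules with lift > 1 whose antecedents contain every entered item."""
    # Normalize the comma-separated input: trim whitespace, lowercase, drop empty entries.
    item_list = [item.strip().lower() for item in items.split(",") if item.strip()] if items else []

    if item_list:
        # Every entered item must appear among the rule's antecedents.
        filter_rules = rules[rules['antecedents'].apply(lambda x: set(item_list).issubset(x)) & (rules['lift'] > 1)]
    else:
        # No input: show all positively associated rules.
        filter_rules = rules[rules['lift'] > 1]

    if filter_rules.empty:
        # Return a placeholder row so the Gradio dataframe output is never empty.
        return pd.DataFrame([{
            'antecedents': 'No matching rules',
            'consequents': 'No matching rules',
            'support': '',
            'confidence': '',
            'lift': ''
        }])

    result = filter_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].copy()
    # Render the item sets as comma-separated strings in a stable (alphabetical) order.
    result['antecedents'] = result['antecedents'].apply(lambda x: ', '.join(sorted(x)))
    result['consequents'] = result['consequents'].apply(lambda x: ', '.join(sorted(x)))

    return result.head(10)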
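
The matching logic can be tried without pandas or Gradio. The sketch below applies the same idea (entered items must be a subset of the antecedents, and lift must exceed 1) to five rules copied from the table above. `match_rules` and `sample_rules` are hypothetical names used only for this illustration, and the sketch skips empty entries in the input.

```python
# (antecedents, consequents, lift) for rows 3, 4, 125, 127 and 128 of the rules table.
sample_rules = [
    ({"berries"}, {"other vegetables"}, 1.004899),
    ({"berries"}, {"whole milk"}, 0.660414),
    ({"whole milk", "sausage"}, {"soda"}, 1.229612),
    ({"whole milk", "yogurt"}, {"sausage"}, 2.182917),
    ({"whole milk", "sausage"}, {"yogurt"}, 1.911760),
]

def match_rules(items, rules_list):
    """Keep rules with lift > 1 whose antecedents contain every entered item."""
    wanted = {item.strip().lower() for item in items.split(",") if item.strip()}
    return [rule for rule in rules_list if wanted <= rule[0] and rule[2] > 1]

print(match_rules(" Whole Milk ", sample_rules))         # case and spacing are normalized
print(match_rules("whole milk, sausage", sample_rules))  # both items must be antecedents
print(match_rules("berries", sample_rules))              # the lift < 1 rule is dropped
```

Note that matching is by subset, so entering a single item also returns rules whose antecedents contain additional items.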

In [23]:
app_apriori = gr.Interface(fn=generate_market_basket_rules,
                           inputs=gr.Textbox(lines=1, placeholder='Enter items separated by commas (e.g. "Tea, Sugar")'),
                           outputs='dataframe',
                           title='Market Basket Analysis',
                           description='Enter one or more item names to see the association rules whose antecedents contain them.'
                           )

app_apriori.launch()

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://ebf0779fea2fa28deb.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


