#### Problem Statement:
The objective of this notebook is to analyze transaction data to uncover associations between items using association rule mining techniques. Specifically, we aim to identify which products are frequently purchased together and derive insights that can inform inventory management, marketing strategies, and store layout decisions. By using metrics such as support, confidence, lift, and leverage, we seek to understand the strength and significance of these associations.

### Overall Summary:

1. Data Preparation and Cleaning:

The dataset was preprocessed to ensure it was suitable for analysis. This involved cleaning the data, converting transaction records into a format appropriate for association rule mining, and performing necessary transformations.

2. Frequent Itemsets Generation:

Using the Apriori algorithm, frequent itemsets were generated. These itemsets represent groups of items that frequently appear together in the transaction data.
Support values were calculated for these itemsets, indicating the proportion of transactions that contain the itemset.

3. Association Rule Mining:

Association rules were generated from the frequent itemsets, focusing on rules with a single antecedent to simplify the analysis.
Metrics such as support, confidence, lift, leverage, and conviction were calculated for each rule to measure the strength and significance of the associations.
The top association rules were identified based on confidence, showing which items are most likely to be purchased together.

4. Key Findings on Mineral Water:

Detailed analysis was performed on rules with mineral water as the antecedent to understand its relationship with other items.
It was found that mineral water has positive associations with several items, including cereals, red wine, avocado, honey, and salmon.
Lift values above 1 for these rules indicate that purchasing mineral water increases the likelihood of purchasing these items compared to random chance.
Confidence levels suggest that a certain percentage of transactions containing mineral water also include these items.

5. Implications for Retail Strategy:

The insights derived from the analysis can be applied to optimize product placement, enhance cross-promotion strategies, and target marketing efforts.
For example, placing mineral water near cereals and salmon, or creating promotional bundles including these items, can potentially increase sales.
Understanding these associations helps in making informed decisions on inventory management and store layout to enhance the customer shopping experience and drive sales.

#### Conclusion:
The notebook effectively demonstrates the use of association rule mining to uncover meaningful patterns in transaction data. By focusing on practical applications of these insights, such as product placement and promotional strategies, businesses can leverage these findings to optimize operations and increase profitability. The detailed analysis of mineral water serves as an example of how specific product associations can be used to inform strategic decisions in a retail context.

In [28]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as py
from mlxtend.frequent_patterns import apriori,association_rules

### Loading datasets


In [29]:
marketing_data = pd.read_csv("50_SupermarketBranches.csv")
customer_data = pd.read_csv("Supermarket_CustomerMembers.csv")
df_products = pd.read_csv("Market_Basket_Optimisation.csv",header=None)

### Displaying first few rows of each dataset for initial exploration


In [30]:
marketing_data.head()

Unnamed: 0,Advertisement Spend,Promotion Spend,Administration Spend,State,Profit
0,165349.2,136897.8,471784.1,New York,192261.83
1,162597.7,151377.59,443898.53,California,191792.06
2,153441.51,101145.55,407934.54,Florida,191050.39
3,144372.41,118671.85,383199.62,New York,182901.99
4,142107.34,91391.77,366168.42,Florida,166187.94


In [31]:
customer_data.head()

Unnamed: 0,CustomerID,Genre,Age,Annual Income (k$),Spending Score (1-100)
0,1,Male,19,15,39
1,2,Male,21,15,81
2,3,Female,20,16,6
3,4,Female,23,16,77
4,5,Female,31,17,40


In [32]:
df_products.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [33]:
# Getting information about the structure and data types of the market basket dataset

df_products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7501 entries, 0 to 7500
Data columns (total 20 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       7501 non-null   object
 1   1       5747 non-null   object
 2   2       4389 non-null   object
 3   3       3345 non-null   object
 4   4       2529 non-null   object
 5   5       1864 non-null   object
 6   6       1369 non-null   object
 7   7       981 non-null    object
 8   8       654 non-null    object
 9   9       395 non-null    object
 10  10      256 non-null    object
 11  11      154 non-null    object
 12  12      87 non-null     object
 13  13      47 non-null     object
 14  14      25 non-null     object
 15  15      8 non-null      object
 16  16      4 non-null      object
 17  17      4 non-null      object
 18  18      3 non-null      object
 19  19      1 non-null      object
dtypes: object(20)
memory usage: 1.1+ MB


In [34]:
# Visualizing profit and advertisement spend by state using a grouped bar chart
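# Averaging per state so that each bar has exactly one value per state
state_means = marketing_data.groupby('State', as_index=False)[['Profit', 'Advertisement Spend']].mean()

fig = go.Figure(data=[
    go.Bar(name='Profit', x=state_means['State'], y=state_means['Profit']),
    go.Bar(name='Advertisement', x=state_means['State'], y=state_means['Advertisement Spend'])
])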
fig.update_layout(barmode='group')
fig.update_layout(
    title='Profit and Advertisement in every state',
    xaxis_title='State',
    yaxis_title='Amount ($)'
)
fig.show()

![Plot](advertisement.png)

General Observation:

In all three states, higher advertisement spending goes together with higher profit.
The chart shows a co-occurrence rather than a causal effect, so the efficiency of advertisement spending would need a branch-level comparison to confirm.

In [35]:
# Getting information about the structure and data types of the customer dataset

customer_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   CustomerID              200 non-null    int64 
 1   Genre                   200 non-null    object
 2   Age                     200 non-null    int64 
 3   Annual Income (k$)      200 non-null    int64 
 4   Spending Score (1-100)  200 non-null    int64 
dtypes: int64(4), object(1)
memory usage: 7.9+ KB


In [36]:
# Visualizing gender ratio using a histogram of member counts

fig2 = go.Figure()
fig2.add_trace(go.Histogram(histfunc="count", x=customer_data['Genre']))
fig2.update_layout(
    title='Gender ratio',
    xaxis_title='Gender',
    yaxis_title='Count'
)
fig2.show()

![Plot](gender.png)

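The key observation from this graph is that the customer-member dataset contains more females than males. The chart counts members, so it says nothing by itself about how often each group shops.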

In [37]:
# Visualizing relationships between age, spending score using scatter plot

fig = px.scatter(customer_data, x="Age", y="Spending Score (1-100)", color= "Age")
fig.update_layout(
    title='Relationship between Age and Spending score',
    dragmode='select',
    width=1000,
    height=600,
    hovermode='closest',
    xaxis_title='Age')

fig.show()

![Plot](agevspending.png)

Younger Age Group (20-30 years):

- Spending scores vary widely, from 0 to 100.
- Indicates diverse spending behavior among younger individuals.

Middle Age Group (30-50 years):

- Spending scores start to stabilize, with many scores clustering around 40-60.
- Less variation compared to the younger age group.

Older Age Group (50-70 years):

- Spending scores generally range between 20 and 60.
- Indicates more conservative spending habits among older individuals.

Summary:

- Younger individuals (20-30) exhibit diverse spending behaviors.
- Middle-aged individuals (30-50) tend to have moderate spending scores.
- Older individuals (50-70) show more conservative spending habits.

These insights can help in targeting different age groups for marketing and financial planning.
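The age-group observations above come from reading the scatter plot by eye. A minimal sketch of how they could be quantified is shown below, using only the five rows of `customer_data` displayed earlier; the band edges and the names `sample`, `bins`, `labels`, and `age_band` are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

# First five rows of customer_data as shown in the head() output;
# the full analysis would use the whole frame instead
sample = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31],
    "Spending Score (1-100)": [39, 81, 6, 77, 40],
})

# Age bands are an assumption chosen to roughly mirror the groups discussed in the text
bins = [17, 30, 50, 70]            # intervals are (17, 30], (30, 50], (50, 70]
labels = ["18-30", "31-50", "51-70"]
sample["age_band"] = pd.cut(sample["Age"], bins=bins, labels=labels)

# Spread of spending scores within each band
summary = (
    sample.groupby("age_band", observed=False)["Spending Score (1-100)"]
    .agg(["count", "min", "max", "mean"])
)
print(summary)
```

A wide min-to-max range within a band corresponds to the "diverse spending behavior" described for the younger group.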

In [38]:
# Visualizing relationships between age, income using scatter plot

fig = px.scatter(customer_data, x="Age", y="Annual Income (k$)", color= "Age")
fig.update_layout(
    title='Relationship between Age and Income',
    dragmode='select',
    width=1000,
    height=600,
    hovermode='closest',
    xaxis_title='Age')

fig.show()

![Plot](AgevIncome.png)

From the above graph, we can extract the following key findings:

- There is a high variation in income in the age range of 20 to 30 years, indicating a mix of entry-level jobs, career starters, and possibly some advanced young professionals.

- High-income individuals (above 100k) are predominantly found in the age range of 30 to 60 years, with a few exceptions. This suggests that high earning potential is most commonly achieved during these years.

- We can see a slight reduction in income variation and maximum income in the age group of 60 to 70 years, possibly due to retirement or a transition into less demanding roles.

In [39]:
# Visualizing relationships between spending score and income using scatter plot

fig = px.scatter(customer_data, x="Spending Score (1-100)", y="Annual Income (k$)", color= "Age")
fig.update_layout(
    title='Relationship between Spending Score (1-100) and Income, color-coded with age',
    dragmode='select',
    width=1000,
    height=600,
    hovermode='closest',
    xaxis_title='Spending Score (1-100)')

fig.show()


![Plot](spendingvsincome.png)

Spending behaviour
- Low Income (0-40k): Individuals with lower incomes have a broad range of spending scores, suggesting varying spending habits. Some individuals with lower incomes have high spending scores, while others have low spending scores.
- Middle Income (40k-80k): This group has a dense cluster of spending scores between 40 and 60. It indicates a concentration of moderate spending behavior in this income range.
- High Income (80k and above): Higher-income individuals also exhibit a wide range of spending scores, although many of them have spending scores around 40-60.

Age and spending behaviour

- Younger Age Group (20-40 years): Younger individuals (indicated by darker blue to purple colors) have a broad distribution of spending scores across all income levels. There is no consistent pattern linking age directly to spending scores.
- Middle Age Group (40-60 years): Middle-aged individuals (yellow to orange colors) are densely clustered around the middle income levels (40k-80k) and show moderate spending scores (40-60). This group tends to have balanced spending habits.
- Older Age Group (60+ years): Older individuals (light yellow to white colors) also show varied spending scores. However, there is a noticeable concentration in the moderate spending score range (40-60) across various income levels.

There is also notable clustering of individuals with incomes between 60k and 80k and spending scores between 40 and 60, indicating a common spending behavior among middle-income earners.

### Transforming the market basket dataset into a format suitable for association rule mining

In [40]:
# Convert the market basket dataset into a list of lists

products = df_products.values.tolist()

In [41]:
# Define column names for the new DataFrame
name_col = ['ID_client', 'item_description']


In [42]:

# Initialize an empty DataFrame with specified column names
df_prod = pd.DataFrame(columns=name_col)

The code below processes each transaction in the market basket dataset:

- Creates a new DataFrame `df_prod` where each row pairs a transaction (identified by `ID_client`, the row index in `df_products`) with one item it contains (`item_description`).

- The loop records each unique item in a transaction once. Because short transactions are padded with NaN, the set built for such a transaction also contains one NaN entry, which shows up as a blank `item_description` in the output.

In [43]:
# Iterate over each transaction (list of items) in the market basket dataset
for i in range(len(products)):
    # Create a set of unique items bought in the current transaction
    buy_list = set(products[i])
    
    # Iterate over each unique item in the transaction
    for j in buy_list:
        # Get the current number of rows in the DataFrame
        n = len(df_prod.index)
        
        # Assign the index of the current transaction to the 'ID_client' column
        df_prod.loc[n, 'ID_client'] = i
        
        # Assign the current item description to the 'item_description' column
        df_prod.loc[n, 'item_description'] = j
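
As a hedged alternative to the row-by-row loop above, the same long format can be produced with `melt`, which also makes it easy to drop the NaN padding. This is a sketch, not the notebook's method: `toy_products` is a small stand-in for `df_products` built from three of the baskets shown earlier, and `long_df` is a hypothetical name.

```python
import pandas as pd

# Stand-in for df_products: rows are transactions, short rows are padded with missing values
toy_products = pd.DataFrame([
    ["burgers", "meatballs", "eggs"],
    ["chutney", None, None],
    ["turkey", "avocado", None],
])

long_df = (
    toy_products
    .rename_axis("ID_client")
    .reset_index()                                   # transaction index becomes a column
    .melt(id_vars="ID_client", value_name="item_description")
    .dropna(subset=["item_description"])             # remove the padding
    .drop(columns="variable")                        # original column position is not needed
    .drop_duplicates()                               # same effect as set() per transaction
    .sort_values("ID_client", kind="stable")
    .reset_index(drop=True)
)
print(long_df)
```

Unlike the loop, this version never emits a blank item, so the row count would differ from the `df_prod` output shown in this notebook.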

In [44]:
df_prod[df_prod['ID_client'] == 111] 

Unnamed: 0,ID_client,item_description
535,111,burgers
536,111,
537,111,oil
538,111,tomato juice
539,111,fresh bread


Below, I am creating a DataFrame to represent the transaction dataset: each row corresponds to a transaction, each column corresponds to a unique product, and each entry is 1 if the product appears in that transaction and 0 otherwise (items are deduplicated per transaction, so a count never exceeds 1). The last column, 'Count_products', holds the number of products in each transaction. The descriptive statistics are then displayed for this DataFrame.

In [45]:
# Creating a set to hold unique products
list_products = set()

In [46]:
# Iterating over each transaction (list of products) in the dataset
for i in range(len(products)):
    # Union operation to combine the unique products in the current transaction with the existing set of products
    list_products = set(products[i]) | list_products

In [47]:
print("Total number of unique products:", len(list_products))

Total number of unique products: 121


In [48]:
# Converting the set of unique products into a list and appending a new element 'Count_products' into the list
list_products = list(list_products)
list_products.append('Count_products')

In [49]:
# Creating a zero-filled numpy array with dimensions (number of transactions, number of unique products + 1)
aa = np.zeros((len(products), len(list_products)))
aa

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [50]:
# Creating a DataFrame from the numpy array with column names as unique products
df_prod2 = pd.DataFrame(aa, columns=list_products)

In [51]:
# Iterating over the transactions.
# NOTE: range(len(products) - 1) stops one short, so the final transaction's row stays all zeros
for i in range(len(products) - 1):
    # Creating a set of unique products in the current transaction
    buy_list = set(products[i])
    
    # Iterating over the product columns (the last column, 'Count_products', is skipped)
    for j in range(len(list_products) - 1):
        # Iterating over each product in the buy list
        for k in buy_list:
            # Checking if the current unique product matches the product in the buy list
            # (NaN padding never matches, because NaN != NaN)
            if list_products[j] == k:
                # Marking the product as present in the corresponding cell of the DataFrame
                df_prod2.iloc[i, j] = 1 + df_prod2.iloc[i, j]

    # Storing the number of products in the current transaction in the last column.
    # NOTE: the slice end is exclusive, so 0:len(list_products) - 2 leaves out the final product column
    df_prod2.iloc[i, len(list_products) - 1] = df_prod2.iloc[i, 0:len(list_products) - 2].sum()
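
The same table can also be built without nested loops. The sketch below is one possible alternative, not the notebook's method: it assumes a long-format frame shaped like `df_prod` with missing items already removed (`toy_long` is a hypothetical stand-in built from three of the baskets shown earlier) and pivots it with `pd.crosstab`. It covers every transaction and sums the basket size over every product column.

```python
import pandas as pd

# Stand-in for a cleaned df_prod: one row per (transaction, item) pair
toy_long = pd.DataFrame({
    "ID_client": [0, 0, 0, 1, 2, 2],
    "item_description": ["burgers", "meatballs", "eggs", "chutney", "turkey", "avocado"],
})

# One row per transaction, one column per product; True where the product was bought
basket = pd.crosstab(toy_long["ID_client"], toy_long["item_description"]) > 0

# Basket size, counted over all product columns
count_products = basket.sum(axis=1)

print(basket)
print(count_products)
```

Keeping the basket size in a separate Series, rather than as an extra column, means nothing has to be dropped again before mining. Results from this version would differ slightly from the outputs recorded in this notebook, which were produced by the loop above.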

In [52]:
df_prod2

Unnamed: 0,zucchini,burgers,olive oil,green grapes,sandwich,babies food,burger sauce,brownies,soda,grated cheese,...,escalope,water spray,tomato sauce,pasta,tea,chutney,strong cheese,butter,low fat yogurt,Count_products
0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,19.0
1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,3.0
7497,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0
7498,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
7499,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0


In [53]:
# Calculate the sum of occurrences for each item across all transactions
item_counts = df_prod2.drop(columns=['Count_products']).sum()

# Creating a DataFrame for visualization
df_item_counts = pd.DataFrame({'Item': item_counts.index, 'Count': item_counts.values})

df_item_counts

Unnamed: 0,Item,Count
0,zucchini,71.0
1,burgers,654.0
2,olive oil,494.0
3,green grapes,68.0
4,sandwich,34.0
...,...,...
116,tea,29.0
117,chutney,31.0
118,strong cheese,58.0
119,butter,226.0


In [54]:
df_prod

Unnamed: 0,ID_client,item_description
0,0,olive oil
1,0,green grapes
2,0,energy drink
3,0,salmon
4,0,cottage cheese
...,...,...
36853,7500,
36854,7500,eggs
36855,7500,yogurt cake
36856,7500,frozen smoothie


We are generating an interactive histogram that shows the distribution of items purchased in the supermarket based on their descriptions. Each bar in the histogram represents a unique item description, and the bars are ordered in descending order based on the total count of occurrences of each item description.

In [57]:
fig6 = px.histogram(df_prod, x="item_description", color="item_description").update_xaxes(categoryorder='total descending')
fig6.update_layout(
    title='Items purchased in the supermarket',
    dragmode='select',
    width=1300,
    height=600,
    hovermode='closest',
)
fig6.show()

![Plot](Itemsranking.png)

Insights: 
- Basic Necessities: Items like mineral water, eggs, and milk are essential for daily consumption and nutrition.
- Convenience and Versatility: Products like spaghetti and French fries are convenient and can be used in various dishes.
- Health and Indulgence: Items like green tea cater to health-conscious consumers, while chocolate serves as an indulgence.

Understanding these reasons can help supermarkets optimize their stock and marketing strategies to cater to consumer preferences effectively.

# Now, let's dive into Association rule mining

To understand association rules, it is necessary to understand four fundamental concepts:

- Support: Support indicates how frequently an itemset appears in the dataset. In other words, it measures how popular an itemset is.

- Confidence: Confidence indicates how often the rule has been found to be true. In other words, confidence says how likely item Y is to be purchased when item X is purchased.

- Lift: Lift is the ratio of how often X and Y actually occur together to how often they would occur together if they were statistically independent. In other words, lift shows how much more likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is.

A lift score close to 1 indicates that the antecedent and the consequent are independent: the occurrence of the antecedent has no impact on the occurrence of the consequent.

A lift score greater than 1 indicates that the antecedent and the consequent are dependent on each other, and the occurrence of the antecedent has a positive impact on the occurrence of the consequent.

A lift score smaller than 1 indicates that the antecedent and the consequent tend to substitute for each other: the presence of the antecedent has a negative impact on the occurrence of the consequent, and vice versa.

- Conviction: Conviction measures how strongly the rule departs from statistical independence. It is the ratio between the probability that X would occur without Y if the two were independent and the observed probability that X occurs without Y. A conviction above 1 means the rule fails less often than independence would predict.
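
For reference, the four measures can be written compactly as follows. The notation is an assumption introduced here: $T$ is the set of all transactions, $t$ a single transaction, and $X \cup Y$ the itemset containing the items of both $X$ and $Y$.

$$
\begin{aligned}
\text{support}(X) &= \frac{|\{t \in T : X \subseteq t\}|}{|T|} \\
\text{confidence}(X \rightarrow Y) &= \frac{\text{support}(X \cup Y)}{\text{support}(X)} \\
\text{lift}(X \rightarrow Y) &= \frac{\text{confidence}(X \rightarrow Y)}{\text{support}(Y)} = \frac{\text{support}(X \cup Y)}{\text{support}(X)\,\text{support}(Y)} \\
\text{conviction}(X \rightarrow Y) &= \frac{1 - \text{support}(Y)}{1 - \text{confidence}(X \rightarrow Y)}
\end{aligned}
$$

Under independence, $\text{support}(X \cup Y) = \text{support}(X)\,\text{support}(Y)$, which gives a lift of exactly 1 and a conviction of exactly 1.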

In [59]:
# Dropping the count column for association rule mining

df_prod2.drop(columns = ['Count_products'],inplace = True)

#### Assigning min_support=0.01: 

The minimum support threshold determines the minimum frequency required for an itemset to be considered frequent. Here it is set to 0.01, meaning an itemset must appear in at least 1% of transactions (about 75 of the 7,501 transactions) to be considered frequent.
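
To make the threshold concrete, the sketch below counts support by brute force on a handful of made-up baskets and keeps only the itemsets that reach `min_support`. It is an illustration of the filtering idea only: it enumerates every candidate instead of using Apriori's pruning, and the function `frequent_itemsets` and the toy `transactions` are hypothetical.

```python
from itertools import combinations

# Toy transactions (made up for illustration), each a set of items
transactions = [
    {"mineral water", "spaghetti"},
    {"mineral water", "milk"},
    {"mineral water", "spaghetti", "milk"},
    {"eggs"},
]

def frequent_itemsets(transactions, min_support, max_len=2):
    """Return {itemset: support} for itemsets of up to max_len items that meet min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_len + 1):
        for candidate in combinations(items, size):
            # Support = share of transactions that contain every item of the candidate
            hits = sum(1 for t in transactions if set(candidate) <= t)
            support = hits / n
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result

frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in frequent.items():
    print(sorted(itemset), support)
```

With `min_support=0.5`, "eggs" (1 of 4 baskets) is dropped, while "mineral water" (3 of 4) and the pair "mineral water" plus "spaghetti" (2 of 4) are kept.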

In [60]:
# Finding frequent itemsets using the Apriori algorithm

freq_item = apriori(df_prod2, min_support=0.01, use_colnames=True)
freq_item['length'] = freq_item['itemsets'].apply(lambda x: len(x))
freq_item


DataFrames with non-bool types result in worse computational performance and their support might be discontinued in the future. Please use a DataFrame with bool type



Unnamed: 0,support,itemsets,length
0,0.087188,(burgers),1
1,0.065858,(olive oil),1
2,0.033729,(brownies),1
3,0.052393,(grated cheese),1
4,0.063192,(frozen smoothie),1
...,...,...,...
252,0.013998,"(mineral water, milk, chocolate)",3
253,0.011065,"(mineral water, milk, frozen vegetables)",3
254,0.010132,"(mineral water, ground beef, eggs)",3
255,0.010932,"(mineral water, ground beef, chocolate)",3


These results show the frequent itemsets discovered by the Apriori algorithm. Each row represents an itemset, with the following columns:

- support: The support of the itemset, which is the proportion of transactions that contain the itemset.
- itemsets: The set of items that form the itemset.
- length: The number of items in the itemset.
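
The non-bool warning printed by the cell above can be avoided by casting the 0/1 table to boolean before mining. A minimal sketch, assuming a float table shaped like `df_prod2` (the frame `toy_basket` and its values are made up):

```python
import pandas as pd

# Toy 0/1 float frame standing in for df_prod2 (product columns are illustrative)
toy_basket = pd.DataFrame(
    {"eggs": [1.0, 0.0, 1.0], "milk": [0.0, 0.0, 1.0]}
)

# Cast to boolean: any non-zero entry becomes True
toy_basket_bool = toy_basket.astype(bool)
print(toy_basket_bool.dtypes)
```

A boolean frame like this can be passed to `apriori` in place of the float one; the supports are the same because each entry still only records presence or absence.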

In [61]:
# Displaying frequent itemsets of length 2 with support greater than or equal to 0.04

freq_item[ (freq_item['length'] == 2) & (freq_item['support'] >= 0.04) ]

Unnamed: 0,support,itemsets,length
151,0.059725,"(mineral water, spaghetti)",2
170,0.047994,"(mineral water, milk)",2
176,0.040928,"(mineral water, ground beef)",2
178,0.050927,"(mineral water, eggs)",2
180,0.05266,"(mineral water, chocolate)",2


In [62]:
# Displaying frequent itemsets of length 3

freq_item[ (freq_item['length'] == 3)]

Unnamed: 0,support,itemsets,length
240,0.010265,"(mineral water, olive oil, spaghetti)",3
241,0.010132,"(mineral water, french fries, spaghetti)",3
242,0.015731,"(mineral water, spaghetti, milk)",3
243,0.011465,"(mineral water, pancakes, spaghetti)",3
244,0.017064,"(mineral water, ground beef, spaghetti)",3
245,0.014265,"(mineral water, spaghetti, eggs)",3
246,0.015865,"(mineral water, spaghetti, chocolate)",3
247,0.011998,"(mineral water, spaghetti, frozen vegetables)",3
248,0.010932,"(milk, spaghetti, chocolate)",3
249,0.010532,"(eggs, spaghetti, chocolate)",3


In [63]:
# Generating association rules with lift metric

rules = association_rules(freq_item, metric="lift", min_threshold=1.3)
rules["antecedents_length"] = rules["antecedents"].apply(lambda x: len(x))
rules["consequents_length"] = rules["consequents"].apply(lambda x: len(x))



In the context of association rule mining, confidence measures the reliability of a rule. Specifically, it quantifies the likelihood that the consequent (right-hand side) of a rule will be found in a transaction given that the antecedent (left-hand side) is present.

Mathematically, confidence is defined as:

$$\text{confidence}(A \rightarrow B) = \frac{\text{support}(A \cup B)}{\text{support}(A)}$$

Where:


A→B represents the association rule with antecedent A and consequent B.

support(A∪B) is the support of the combined occurrence of A and B.

support(A) is the support of the occurrence of A.

In simpler terms, confidence tells us how often items in the consequent of a rule appear in transactions that contain the items in the antecedent.

For example, if we have the association rule milk→bread with a confidence of 0.7, it means that bread is also purchased in 70% of the transactions where milk is present.

In the context of the provided code, confidence is one of the metrics used to evaluate the strength of association rules generated from frequent itemsets. Sorting the rules by confidence in descending order allows us to identify the rules with the highest confidence, indicating the strongest relationships between items.


In [64]:

# Sorting rules by confidence in descending order and displaying the top 15 rules

rules.sort_values("confidence",ascending=False).head(15)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric,antecedents_length,consequents_length
290,"(ground beef, eggs)",(mineral water),0.019997,0.238368,0.010132,0.506667,2.125563,0.005365,1.543848,0.540342,2,1
266,"(ground beef, milk)",(mineral water),0.021997,0.238368,0.011065,0.50303,2.110308,0.005822,1.532552,0.537969,2,1
296,"(ground beef, chocolate)",(mineral water),0.023064,0.238368,0.010932,0.473988,1.988472,0.005434,1.447937,0.508837,2,1
284,"(milk, frozen vegetables)",(mineral water),0.023597,0.238368,0.011065,0.468927,1.967236,0.00544,1.434136,0.503555,2,1
119,(soup),(mineral water),0.050527,0.238368,0.023064,0.456464,1.914955,0.01102,1.401255,0.503221,1,1
224,"(pancakes, spaghetti)",(mineral water),0.025197,0.238368,0.011465,0.455026,1.908923,0.005459,1.397557,0.488452,2,1
208,"(olive oil, spaghetti)",(mineral water),0.02293,0.238368,0.010265,0.447674,1.878079,0.004799,1.378954,0.478514,2,1
218,"(spaghetti, milk)",(mineral water),0.035462,0.238368,0.015731,0.443609,1.861024,0.007278,1.368879,0.479672,2,1
278,"(chocolate, milk)",(mineral water),0.032129,0.238368,0.013998,0.435685,1.82778,0.00634,1.349656,0.467922,2,1
230,"(ground beef, spaghetti)",(mineral water),0.039195,0.238368,0.017064,0.435374,1.826477,0.007722,1.348914,0.470957,2,1


#### This result represents association rules along with various metrics associated with them. Let's analyze some of the key metrics:

- Antecedents: The items on the left-hand side of the rule.
- Consequents: The items on the right-hand side of the rule.
- Antecedent Support: The proportion of transactions that contain the antecedents.
- Consequent Support: The proportion of transactions that contain the consequents.
- Support: The proportion of transactions that contain both the antecedents and the consequents.
- Confidence: The likelihood that the consequents are purchased given that the antecedents are purchased. It's calculated as the support of the antecedents and consequents together (A ∪ B) divided by the support of the antecedents.
- Lift: The ratio of observed support to the expected support if the antecedents and consequents were independent. A lift greater than 1 indicates that the antecedents and consequents occur together more often than independence would predict.
- Leverage: The difference between the observed support and the expected support if the antecedents and consequents were independent. A value of 0 indicates independence.
- Conviction: Measures the dependency of the consequents on the antecedents by comparing how often the antecedents would appear without the consequents under independence with how often they actually do. A value above 1 indicates dependency, and higher values indicate stronger dependency.
- Zhang's Metric: A measure of association that ranges from -1 to 1. Positive values indicate that the antecedents and consequents tend to occur together; negative values indicate that they tend to exclude each other.
- Antecedents Length: The number of items in the antecedents.
- Consequents Length: The number of items in the consequents.

#### For example, let's take the first rule:

    Antecedents: (eggs, ground beef)
    Consequents: (mineral water)
    Support: 0.010132
    Confidence: 0.506667
    Lift: 2.125563
    Leverage: 0.005365
    Conviction: 1.543848
    Zhang's Metric: 0.540342
    Antecedents Length: 2
    Consequents Length: 1

- This rule indicates that if both eggs and ground beef are purchased together, there's a 50.67% chance that mineral water will also be purchased.
- The lift of 2.125563 means that baskets containing both eggs and ground beef include mineral water about 2.13 times as often as baskets overall (50.67% versus the 23.84% baseline given by the consequent support).
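
The figures for this rule can be re-derived from the three support values in the table. The sketch below is a hand-rolled check, not mlxtend's code: `rule_metrics` is a hypothetical helper, and the Zhang's metric expression is the commonly documented form (leverage divided by the larger of two normalising terms), assumed here to match what `association_rules` reports.

```python
def rule_metrics(supp_a, supp_c, supp_ac):
    """Derive rule metrics from antecedent, consequent, and joint support."""
    confidence = supp_ac / supp_a
    lift = confidence / supp_c
    leverage = supp_ac - supp_a * supp_c
    # Conviction is unbounded when the rule never fails (confidence == 1)
    conviction = float("inf") if confidence == 1 else (1 - supp_c) / (1 - confidence)
    zhang = leverage / max(supp_ac * (1 - supp_a), supp_a * (supp_c - supp_ac))
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhangs_metric": zhang,
    }

# (ground beef, eggs) -> (mineral water), supports copied from the rules table
m = rule_metrics(supp_a=0.019997, supp_c=0.238368, supp_ac=0.010132)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Because the inputs are the rounded supports from the table, the results agree with the reported metrics to about three decimal places rather than exactly.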

In [66]:
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric,antecedents_length,consequents_length
0,(french fries),(burgers),0.170911,0.087188,0.021997,0.128705,1.476173,0.007096,1.047650,0.389069,1,1
1,(burgers),(french fries),0.087188,0.170911,0.021997,0.252294,1.476173,0.007096,1.108844,0.353384,1,1
2,(cake),(burgers),0.081056,0.087188,0.011465,0.141447,1.622319,0.004398,1.063198,0.417434,1,1
3,(burgers),(cake),0.087188,0.081056,0.011465,0.131498,1.622319,0.004398,1.058080,0.420238,1,1
4,(burgers),(spaghetti),0.087188,0.174110,0.021464,0.246177,1.413918,0.006283,1.095602,0.320707,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...
301,"(mineral water, chocolate)",(eggs),0.052660,0.179576,0.013465,0.255696,1.423888,0.004008,1.102270,0.314246,2,1
302,"(chocolate, eggs)",(mineral water),0.033196,0.238368,0.013465,0.405622,1.701663,0.005552,1.281394,0.426498,2,1
303,(mineral water),"(chocolate, eggs)",0.238368,0.033196,0.013465,0.056488,1.701663,0.005552,1.024687,0.541390,1,2
304,(eggs),"(mineral water, chocolate)",0.179576,0.052660,0.013465,0.074981,1.423888,0.004008,1.024131,0.362858,1,2


In [65]:
# Visualizing association rules using a scatter plot

fig7 = px.scatter(rules, x ='support', y =  'lift', color = 'confidence')
fig7.update_layout(
    title='Lift vs Support',
    dragmode='select',
    width=1000,
    height=600,
    hovermode='closest',
)
fig7.show()

![Plot](liftvsSupport.png)

### Insights: 

#### Lift and Support Distribution:

The majority of the points are clustered in the lower left corner of the plot, indicating that most rules have low support and low to moderate lift.
There are a few rules with higher support values, but these are less common.

#### High Lift and Low Support:

There are several rules with high lift values (>2) and low support (<0.02). These rules indicate strong associations between items that are not frequently purchased together.

#### Moderate Lift and Support:

Many rules have a lift between 1.5 and 2 and support values ranging from 0.01 to 0.03. These rules suggest moderate associations and are more common.

#### Color Representation of Confidence:

The color gradient represents the confidence of the rules, with yellow indicating higher confidence and blue indicating lower confidence.
Rules with higher confidence tend to be more spread out, though there are some high-confidence rules even in the regions with lower support and lift.

#### Outliers:

There are a few outliers with very high lift (>3) and relatively low support. These might be interesting to investigate further as they indicate strong but rare associations.


Now, we are going to dig deep and focus on finding rules that have:

- support between 0.02 and 0.05, lift greater than 1.5, and confidence greater than 0.2
- outliers with very high lift (>3) and relatively low support.

In [71]:
# Focusing on the data points (rules) that fall in the range of support between 0.02 and 0.05 and have lift values greater than 1.5 with confidence greater than 0.2

support_threshold_min = 0.02
support_threshold_max = 0.05
lift_threshold = 1.5
confidence_threshold = 0.2

filtered_rules = rules[
    (rules['support'] >= support_threshold_min) & 
    (rules['support'] <= support_threshold_max) & 
    (rules['lift'] > lift_threshold) & 
    (rules['confidence'] > confidence_threshold)
]

# Sort by confidence for better prioritization
sorted_rules = filtered_rules.sort_values(by='confidence', ascending=False)

# Display the top rules
sorted_rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric,antecedents_length,consequents_length
119,(soup),(mineral water),0.050527,0.238368,0.023064,0.456464,1.914955,0.01102,1.401255,0.503221,1,1
19,(olive oil),(mineral water),0.065858,0.238368,0.027596,0.419028,1.757904,0.011898,1.310962,0.461536,1,1
125,(ground beef),(mineral water),0.098254,0.238368,0.040928,0.416554,1.747522,0.017507,1.305401,0.474369,1,1
100,(ground beef),(spaghetti),0.098254,0.17411,0.039195,0.398915,2.291162,0.022088,1.373997,0.624943,1,1
113,(cooking oil),(mineral water),0.05106,0.238368,0.020131,0.394256,1.653978,0.00796,1.257349,0.416672,1,1


## Key Insights

### Strong Association Between Items and Mineral Water:

- Soup and mineral water have a lift of 1.91 and a confidence of 0.46, meaning that nearly half of the transactions containing soup also include mineral water.
- Olive oil and mineral water have a lift of 1.76 and confidence of 0.42, showing a strong association.
- Ground beef and mineral water also show a strong relationship with a lift of 1.75 and confidence of 0.42.
- Cooking oil and mineral water have a lift of 1.65 and confidence of 0.39.

### Strong Association Between Ground Beef and Spaghetti:

Ground beef and spaghetti have a lift of 2.29 and confidence of 0.40: about 40% of transactions containing ground beef also include spaghetti, roughly 2.3 times the rate expected if the two items were independent.

## Recommendations

### Promotions

Bundling discounts for items like soup and mineral water, or ground beef and spaghetti. Targeting promotions for olive oil, mineral water, and cooking oil.

### Placement
Placing mineral water near soup, olive oil, and ground beef. Positioning spaghetti near ground beef.

### Marketing
Highlighting these pairs in campaigns to boost sales and attract customers. Using these associations in ads and promotions.

### Inventory Management

Ensuring that these items are well-stocked, especially when promoting them together, to avoid stockouts and maximize sales.

### Detailed Analysis of Each Rule

Soup -> Mineral Water:

- Support: 0.023 (2.3% of transactions)
- Confidence: 0.456 (45.6% of soup transactions also include mineral water)
- Lift: 1.91 (Strong positive association)

Olive Oil -> Mineral Water:

- Support: 0.027 (2.7% of transactions)
- Confidence: 0.419 (41.9% of olive oil transactions also include mineral water)
- Lift: 1.76 (Strong positive association)

Ground Beef -> Mineral Water:

- Support: 0.041 (4.1% of transactions)
- Confidence: 0.417 (41.7% of ground beef transactions also include mineral water)
- Lift: 1.75 (Strong positive association)

Ground Beef -> Spaghetti:

- Support: 0.039 (3.9% of transactions)
- Confidence: 0.399 (39.9% of ground beef transactions also include spaghetti)
- Lift: 2.29 (Very strong positive association)

Cooking Oil -> Mineral Water:

- Support: 0.020 (2.0% of transactions)
- Confidence: 0.394 (39.4% of cooking oil transactions also include mineral water)
- Lift: 1.65 (Strong positive association)

By acting on these insights and recommendations, we can leverage these associations to improve customer satisfaction.

In [73]:
# outliers with very high lift (>3) and relatively low support

lift_threshold = 3
support_threshold = 0.02

outliers_df = rules[(rules['lift'] > lift_threshold) & (rules['support'] < support_threshold)]

outliers_df

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric,antecedents_length,consequents_length
180,(ground beef),(herb & pepper),0.098254,0.04946,0.015998,0.162822,3.291994,0.011138,1.13541,0.772094,1,1
181,(herb & pepper),(ground beef),0.04946,0.098254,0.015998,0.32345,3.291994,0.011138,1.33286,0.73246,1,1


## Key Insights

### Ground Beef -> Herb & Pepper:

- Support: 0.015998 (1.6% of transactions)
- Confidence: 0.162822 (16.28% of ground beef transactions also include herb & pepper)
- Lift: 3.291994 (Ground beef and herb & pepper are 3.29 times more likely to be bought together than by random chance)

### Herb & Pepper -> Ground Beef:

- Support: 0.015998 (1.6% of transactions)
- Confidence: 0.323450 (32.34% of herb & pepper transactions also include ground beef)
- Lift: 3.291994 (Herb & pepper and ground beef are 3.29 times more likely to be bought together than by random chance)
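
The two rules above share the same support and lift but differ in confidence. A minimal sketch of why, using only the supports printed in the outlier table and the standard definitions of confidence and lift (the variable names are illustrative and not part of the notebook):

```python
# Supports copied from the two outlier rows above.
support_ground_beef = 0.098254   # share of transactions containing ground beef
support_herb_pepper = 0.04946    # share of transactions containing herb & pepper
support_both = 0.015998          # share of transactions containing both items

# Confidence depends on direction: each rule divides by its own antecedent's support.
conf_beef_to_herb = support_both / support_ground_beef
conf_herb_to_beef = support_both / support_herb_pepper

# Lift is symmetric: joint support over the product of the individual supports.
lift = support_both / (support_ground_beef * support_herb_pepper)

print(f"ground beef -> herb & pepper confidence: {conf_beef_to_herb:.4f}")
print(f"herb & pepper -> ground beef confidence: {conf_herb_to_beef:.4f}")
print(f"lift (either direction): {lift:.4f}")
```

Because herb & pepper is the rarer item, the rule that starts from it has the higher confidence, while the lift is identical in both directions. Small differences from the table in the last decimals come from the rounded inputs.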

## Implications

### Strong Niche Association

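Ground beef and herb & pepper show a strong but rare association: the pair appears in only 1.6% of transactions, yet about 3.3 times as often as independence would predict. The confidence is moderate, though: 32% of herb & pepper transactions include ground beef, and 16% of ground beef transactions include herb & pepper.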

### Actions

- Promotions: Bundle ground beef and herb & pepper with discounts and recipe suggestions.
- Store Layout: Place these items near each other.
- Stock Management: Keep both items well-stocked, especially during promotions.


These strategies can boost sales and improve customer experience by leveraging their strong association.

In [221]:
# Displaying association rules with a single-item antecedent, sorted by confidence in descending order

rules[ (rules['antecedents_length'] == 1) ].sort_values("confidence",ascending=False).head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric,antecedents_length,consequents_length
18,(soup),(mineral water),0.050527,0.238368,0.023064,0.456464,1.914955,0.01102,1.401255,0.503221,1,1
83,(olive oil),(mineral water),0.065858,0.238368,0.027596,0.419028,1.757904,0.011898,1.310962,0.461536,1,1
74,(ground beef),(mineral water),0.098254,0.238368,0.040928,0.416554,1.747522,0.017507,1.305401,0.474369,1,1
70,(salmon),(mineral water),0.042528,0.238368,0.017064,0.401254,1.683336,0.006927,1.272045,0.423972,1,1
64,(cereals),(mineral water),0.02573,0.238368,0.010265,0.398964,1.673729,0.004132,1.267198,0.413162,1,1
175,(ground beef),(spaghetti),0.098254,0.17411,0.039195,0.398915,2.291162,0.022088,1.373997,0.624943,1,1
68,(cooking oil),(mineral water),0.05106,0.238368,0.020131,0.394256,1.653978,0.00796,1.257349,0.416672,1,1
86,(red wine),(mineral water),0.02813,0.238368,0.010932,0.388626,1.630358,0.004227,1.24577,0.397829,1,1
56,(chicken),(mineral water),0.059992,0.238368,0.022797,0.38,1.594172,0.008497,1.228438,0.396502,1,1
50,(frozen vegetables),(mineral water),0.095321,0.238368,0.035729,0.374825,1.572463,0.013007,1.21827,0.402413,1,1


Interpretation of the Result above:
The results show the top 10 association rules with single-item antecedents, sorted by confidence in descending order. For instance, the first rule: <br>
`(soup) => (mineral water)`

- Confidence: 45.65% <br>
This means that 45.65% of the transactions that include soup also include mineral water.
- Lift: 1.91 <br>
This indicates that purchasing soup makes it 1.91 times more likely that mineral water will also be purchased compared to what we would expect if the items were independent.
- Support: 2.31% <br>
About 2.31% of all transactions contain both soup and mineral water.
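
The three figures above are tied together arithmetically. A minimal check, using only the values printed in the first table row and the standard definitions (confidence = support / antecedent support, lift = confidence / consequent support); the variable names are illustrative and not part of the notebook:

```python
# Values copied from the (soup) => (mineral water) row of the table above.
antecedent_support = 0.050527   # share of transactions containing soup
consequent_support = 0.238368   # share of transactions containing mineral water
support = 0.023064              # share of transactions containing both items

# Confidence: the share of soup transactions that also contain mineral water.
confidence = support / antecedent_support

# Lift: confidence relative to the baseline frequency of mineral water.
lift = confidence / consequent_support

print(f"confidence = {confidence:.4f}")
print(f"lift       = {lift:.4f}")
```

The results agree with the table up to the rounding of the printed inputs.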

In [222]:
# Displaying association rules of length 1 for both antecedents and consequents, sorted by confidence in ascending order

rules[ (rules['antecedents_length'] == 1) & (rules['consequents_length'] == 1) ].sort_values("confidence",ascending=True).head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric,antecedents_length,consequents_length
65,(mineral water),(cereals),0.238368,0.02573,0.010265,0.043065,1.673729,0.004132,1.018115,0.528512,1,1
87,(mineral water),(red wine),0.238368,0.02813,0.010932,0.045861,1.630358,0.004227,1.018584,0.507644,1,1
54,(mineral water),(avocado),0.238368,0.033329,0.011598,0.048658,1.459926,0.003654,1.016113,0.41363,1,1
184,(spaghetti),(red wine),0.17411,0.02813,0.010265,0.058959,2.095966,0.005368,1.032761,0.633127,1,1
53,(mineral water),(honey),0.238368,0.04746,0.015065,0.063199,1.331619,0.003752,1.016801,0.326975,1,1
173,(chocolate),(salmon),0.163845,0.042528,0.010665,0.065094,1.530617,0.003697,1.024137,0.414599,1,1
107,(spaghetti),(honey),0.17411,0.04746,0.011865,0.068147,1.435873,0.003602,1.0222,0.367554,1,1
128,(eggs),(herb & pepper),0.179576,0.04946,0.012532,0.069785,1.41093,0.00365,1.021849,0.354997,1,1
139,(chocolate),(champagne),0.163845,0.046794,0.011598,0.070789,1.512793,0.003932,1.025824,0.405392,1,1
71,(mineral water),(salmon),0.238368,0.042528,0.017064,0.071588,1.683336,0.006927,1.031302,0.532989,1,1


Interpretation of the Result above:
The results show the 10 lowest-confidence association rules with single-item antecedents and single-item consequents, sorted by confidence in ascending order. For instance, the first rule: <br>
`(mineral water) => (cereals)`

- Confidence: 4.31% <br>
This means that 4.31% of the transactions that include mineral water also include cereals.
- Lift: 1.67 <br>
This indicates that purchasing mineral water makes it 1.67 times more likely that cereals will also be purchased compared to what we would expect if the items were independent.
- Support: 1.03% <br>
About 1.03% of all transactions contain both mineral water and cereals.
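
The table also reports leverage and conviction for this rule. A short sketch of how those two values follow from the same row, assuming the usual definitions (leverage = support minus the support expected under independence, conviction = (1 - consequent support) / (1 - confidence)); the variable names are illustrative:

```python
# Values copied from the (mineral water) => (cereals) row of the table above.
antecedent_support = 0.238368   # share of transactions containing mineral water
consequent_support = 0.02573    # share of transactions containing cereals
support = 0.010265              # share of transactions containing both items

confidence = support / antecedent_support

# Leverage: observed joint support minus the joint support expected under independence.
leverage = support - antecedent_support * consequent_support

# Conviction: share of transactions without the consequent, divided by the
# share of antecedent transactions in which the rule fails (1 - confidence).
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence = {confidence:.6f}")
print(f"leverage   = {leverage:.6f}")
print(f"conviction = {conviction:.6f}")
```

A leverage near zero and a conviction near 1 both indicate that, despite a lift of 1.67, this rule changes the absolute purchase rate of cereals only slightly.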

In [223]:
mineral_water_rules = rules[rules['antecedents'].apply(lambda x: 'mineral water' in x)]
mineral_water_rules.head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric,antecedents_length,consequents_length
3,(mineral water),(pancakes),0.238368,0.095054,0.033729,0.141499,1.488616,0.011071,1.0541,0.430963,1,1
19,(mineral water),(soup),0.238368,0.050527,0.023064,0.096756,1.914955,0.01102,1.051182,0.62733,1,1
27,(mineral water),(cake),0.238368,0.081056,0.027463,0.115213,1.421397,0.008142,1.038604,0.389252,1,1
51,(mineral water),(frozen vegetables),0.238368,0.095321,0.035729,0.149888,1.572463,0.013007,1.064189,0.477993,1,1
53,(mineral water),(honey),0.238368,0.04746,0.015065,0.063199,1.331619,0.003752,1.016801,0.326975,1,1
54,(mineral water),(avocado),0.238368,0.033329,0.011598,0.048658,1.459926,0.003654,1.016113,0.41363,1,1
57,(mineral water),(chicken),0.238368,0.059992,0.022797,0.095638,1.594172,0.008497,1.039415,0.489364,1,1
58,(mineral water),(shrimp),0.238368,0.071457,0.023597,0.098993,1.385352,0.006564,1.030562,0.365218,1,1
60,(mineral water),(herb & pepper),0.238368,0.04946,0.017064,0.071588,1.447397,0.005275,1.023835,0.405845,1,1
63,(mineral water),(whole wheat rice),0.238368,0.058526,0.020131,0.084452,1.442993,0.00618,1.028318,0.403076,1,1


#### Summary of Relationships with Mineral Water:

- Co-Purchases: The lowest-confidence rules above pair mineral water with cereals, red wine, avocado, honey, and salmon. Each pair appears in only 1.0% to 1.7% of transactions, so these are positive but infrequent co-purchases.
- Lift Values: All lift values are greater than 1, indicating a positive association. Among these five items, the highest lifts are with salmon (1.68) and cereals (1.67); in the mineral water table above, soup has the highest lift (1.91).
- Confidence Levels: Confidence is the share of mineral water transactions that also contain the item. Among these five items, the highest confidence is with salmon (7.16%) and honey (6.32%); frozen vegetables (14.99%) and pancakes (14.15%) rank higher in the table above.

#### Implications for Retailers:
- Product Placement: Retailers should consider placing mineral water near these items (cereals, red wine, avocado, honey, and salmon) to encourage additional purchases.
- Cross-Promotions: Retailers can create promotional bundles that include mineral water and these associated items to increase sales.
- Targeted Marketing: Marketing campaigns can highlight these combinations, suggesting recipes or meal ideas that include these items with mineral water.

Overall, mineral water has positive associations (lift above 1) with several items, indicating that customers who buy mineral water are more likely to buy these other items as well. This information can be leveraged for strategic product placement, promotions, and marketing efforts to boost sales.
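
Since `head(10)` returns rows in index order rather than by strength, it can help to rank the mineral water rules explicitly. A minimal sketch in plain Python, using only the ten rows displayed in the table above (the list and variable names are illustrative and not part of the notebook):

```python
# (consequent, confidence, lift) copied from the ten mineral water rows shown above.
mineral_water_rows = [
    ("pancakes", 0.141499, 1.488616),
    ("soup", 0.096756, 1.914955),
    ("cake", 0.115213, 1.421397),
    ("frozen vegetables", 0.149888, 1.572463),
    ("honey", 0.063199, 1.331619),
    ("avocado", 0.048658, 1.459926),
    ("chicken", 0.095638, 1.594172),
    ("shrimp", 0.098993, 1.385352),
    ("herb & pepper", 0.071588, 1.447397),
    ("whole wheat rice", 0.084452, 1.442993),
]

# Rank the same rows by lift and by confidence, strongest first.
by_lift = sorted(mineral_water_rows, key=lambda row: row[2], reverse=True)
by_confidence = sorted(mineral_water_rows, key=lambda row: row[1], reverse=True)

print("Top 3 by lift:")
for consequent, confidence, lift in by_lift[:3]:
    print(f"  {consequent}: lift={lift:.2f}, confidence={confidence:.1%}")

print("Top 3 by confidence:")
for consequent, confidence, lift in by_confidence[:3]:
    print(f"  {consequent}: lift={lift:.2f}, confidence={confidence:.1%}")
```

With the notebook's own DataFrame, the equivalent would be `mineral_water_rules.sort_values("lift", ascending=False)`. The two rankings differ because confidence favors consequents that are common overall, while lift corrects for that baseline.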


# Conclusion:

### Strong Niche Association

- Ground Beef -> Herb & Pepper: 1.6% support, 16.28% confidence, 3.29 lift.
- Herb & Pepper -> Ground Beef: 1.6% support, 32.34% confidence, 3.29 lift.

Implications: This is a rare but strong association. Buying one item roughly triples the likelihood of buying the other relative to its baseline rate, although the absolute confidence stays moderate (16% to 32%).

### Strong Associations with Mineral Water

- Soup and Mineral Water: 2.3% support, 45.6% confidence, 1.91 lift.
- Olive Oil and Mineral Water: 2.7% support, 41.9% confidence, 1.76 lift.
- Ground Beef and Mineral Water: 4.1% support, 41.7% confidence, 1.75 lift.
- Cooking Oil and Mineral Water: 2.0% support, 39.4% confidence, 1.65 lift.

### Strong Association Between Ground Beef and Spaghetti

- Ground Beef and Spaghetti: 3.9% support, 39.9% confidence, 2.29 lift.

## Recommendations

- Promotions: Create bundle discounts for items like soup and mineral water, or ground beef and spaghetti, and include recipe suggestions for bundles like ground beef and herb & pepper.
- Placement: Place mineral water near soup, olive oil, and ground beef, and position spaghetti near ground beef to encourage joint purchases.
- Marketing: Highlight these pairs in campaigns, ads, and promotions to attract customers and drive sales of complementary items.
- Inventory Management: Keep these items well-stocked, especially during promotions, to avoid stockouts and maximize sales.
