# <font color='#2F4F4F'>1. Defining the Problem</font>

### a) Background and Problem Statement

Care Five is a German multinational retail corporation headquartered in Berlin, Germany.
It is the eighth-largest retailer in the world by revenue. It operates a chain of hypermarkets, grocery stores, and convenience stores, which as of January 2021 comprised 12,00 stores in over 30 countries.

As a data analyst working for one of the stores, the task is to perform market basket analysis that helps the store maximize revenue.

More specifically, the task is to analyze transactional data to identify the top 10 products likely to be purchased together.



### b) Understanding the Context 
The provided dataset contains transactional data for the products sold in the past week.

The transactional data will be analyzed to identify the top 10 products most likely to be purchased together.



### c) Defining the Metric for Success

Success means finding association rules between itemsets with a confidence greater than 0.3 and a lift greater than 1.

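For reference, these are the standard definitions behind the two thresholds. They cover a rule $A \Rightarrow C$ over $N$ transactions, where $t$ ranges over transactions. The symbols $A$, $C$, $N$, and $t$ are notation introduced here.

$$
\begin{aligned}
\text{support}(X) &= \frac{\lvert\{\, t : X \subseteq t \,\}\rvert}{N} \\
\text{confidence}(A \Rightarrow C) &= \frac{\text{support}(A \cup C)}{\text{support}(A)} \\
\text{lift}(A \Rightarrow C) &= \frac{\text{confidence}(A \Rightarrow C)}{\text{support}(C)}
\end{aligned}
$$

Confidence estimates how often $C$ is bought when $A$ is bought. A lift above 1 means $A$ and $C$ appear together more often than they would if they were bought independently.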
### d) Recording the Experimental Design
* Perform data importation and loading
* Perform data preprocessing
* Find frequent itemsets
* Generate association rules
* Perform metric interpretation and provide recommendations

# <font color='#2F4F4F'>2. Data Importation and Loading </font>

In [1]:
# Import the required libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [2]:
basket_df = pd.read_csv("https://bit.ly/30A2gHO")
basket_df.head()

|   | A | Quantity | Transaction | Store | Product |
|---|---|---|---|---|---|
| 0 | 30000 | 2 | 93194 | 6 | Magazine |
| 1 | 30001 | 2 | 93194 | 6 | Candy Bar |
| 2 | 30002 | 2 | 93194 | 6 | Candy Bar |
| 3 | 30003 | 2 | 93194 | 6 | Candy Bar |
| 4 | 30004 | 2 | 93194 | 6 | Candy Bar |


In [3]:
basket_df.shape

(15001, 5)

The transactional data has 15,001 rows and 5 columns.

# <font color='#2F4F4F'>3. Data Preprocessing</font>

In [4]:
# Group by Transaction and Product and count the rows in each group

grouped_basket = basket_df.groupby(['Transaction','Product']).size().reset_index(name='Count')
grouped_basket.head()

|   | Transaction | Product | Count |
|---|---|---|---|
| 0 | 93194 | Candy Bar | 4 |
| 1 | 93194 | Magazine | 1 |
| 2 | 93197 | Pencils | 1 |
| 3 | 93200 | Candy Bar | 3 |
| 4 | 93200 | Magazine | 1 |

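`groupby(...).size()` counts the rows in each (Transaction, Product) pair. It does not add up the `Quantity` column. The sketch below makes this explicit with `collections.Counter`, using the five rows shown by `basket_df.head()` above. It is a minimal pure-Python equivalent for illustration, not how pandas computes it.

```python
from collections import Counter

# The five rows shown by basket_df.head() above, as (Transaction, Product) pairs
rows = [
    (93194, "Magazine"),
    (93194, "Candy Bar"),
    (93194, "Candy Bar"),
    (93194, "Candy Bar"),
    (93194, "Candy Bar"),
]

# Same idea as basket_df.groupby(['Transaction', 'Product']).size():
# count how many rows share each (Transaction, Product) pair
counts = Counter(rows)
for (transaction, product), count in sorted(counts.items()):
    print(transaction, product, count)
```

These counts match the first two rows of `grouped_basket.head()`. If quantities were meant to be summed instead, the pandas counterpart would be `basket_df.groupby(['Transaction', 'Product'])['Quantity'].sum()`.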

In [7]:
# Pivot to one row per transaction and one column per product (cells hold item counts);
# the 0/1 encoding is done in the next cell
basket_df2 = (grouped_basket.groupby(['Transaction', 'Product'])['Count']
              .sum()
              .unstack()
              .reset_index()
              .fillna(0)
              .set_index('Transaction'))

basket_df2.head()

| Transaction | Bow | Candy Bar | Deodorant | Greeting Cards | Magazine | Markers | Pain Reliever | Pencils | Pens | Perfume | Photo Processing | Prescription Med | Shampoo | Soap | Toothbrush | Toothpaste | Wrapping Paper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 93194 | 0.0 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93197 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93200 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93206 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93212 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |

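The groupby/unstack/fillna chain reshapes long (Transaction, Product, Count) rows into a wide transaction-by-product matrix. The sketch below does the same reshaping in plain Python, using the rows shown by `grouped_basket.head()`. The helper name `pivot_counts` is hypothetical.

```python
# The rows shown by grouped_basket.head() above: (Transaction, Product, Count)
grouped = [
    (93194, "Candy Bar", 4),
    (93194, "Magazine", 1),
    (93197, "Pencils", 1),
    (93200, "Candy Bar", 3),
    (93200, "Magazine", 1),
]

def pivot_counts(triples):
    """Build a transaction x product matrix; products absent from a transaction get 0."""
    products = sorted({product for _, product, _ in triples})
    matrix = {}
    for transaction, product, count in triples:
        if transaction not in matrix:
            # Start every transaction with all products at 0 (the fillna(0) step)
            matrix[transaction] = {p: 0 for p in products}
        # Add the count (the .sum() step)
        matrix[transaction][product] += count
    return products, matrix

products, matrix = pivot_counts(grouped)
for transaction in sorted(matrix):
    print(transaction, [matrix[transaction][p] for p in products])
```

For these rows, the result reproduces the Candy Bar, Magazine, and Pencils cells of the first three rows of `basket_df2`. The other product columns do not appear in this small sample.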

In [8]:
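# Convert item counts to presence flags: 1 if the product is in the transaction, else 0
def encode_units(x):
    # Any positive count means the product was bought; this single test covers
    # every value, so the function never falls through and returns None
    return 1 if x > 0 else 0

# Apply the encoding to every cell
# (on pandas >= 2.1, DataFrame.applymap is deprecated in favor of DataFrame.map)
basket_df3 = basket_df2.applymap(encode_units)
basket_df3.head()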

| Transaction | Bow | Candy Bar | Deodorant | Greeting Cards | Magazine | Markers | Pain Reliever | Pencils | Pens | Perfume | Photo Processing | Prescription Med | Shampoo | Soap | Toothbrush | Toothpaste | Wrapping Paper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 93194 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93197 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93200 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93206 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93212 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |


# <font color='#2F4F4F'>4. Find the Frequent Itemsets</font>

In [9]:
# Generate the frequent itemsets that occur in at least 1% of transactions

shop_frequent_itemsets = apriori(basket_df3, min_support=0.01, use_colnames=True)

# Preview the output
shop_frequent_itemsets.head()

|   | support | itemsets |
|---|---|---|
| 0 | 0.051591 | (Bow) |
| 1 | 0.175736 | (Candy Bar) |
| 2 | 0.15284 | (Greeting Cards) |
| 3 | 0.231936 | (Magazine) |
| 4 | 0.020071 | (Pain Reliever) |

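mlxtend's `apriori` does the work in the cell above. The sketch below illustrates the idea only and is not mlxtend's implementation. It is a minimal pure-Python, level-wise Apriori with three steps:

* Join frequent k-itemsets into (k+1)-itemset candidates.
* Prune any candidate that has an infrequent subset.
* Count the surviving candidates and keep those that meet the threshold.

The toy transactions and the function name `apriori_sketch` are hypothetical. The toy threshold of 0.4 (2 of 5 transactions) stands in for the notebook's `min_support=0.01`.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(1 for basket in baskets if itemset <= basket) / n

    # Level 1: frequent single items
    current = {}
    for item in sorted({item for basket in baskets for item in basket}):
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep the candidates that meet the support threshold
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy transactions
transactions = [
    {"Candy Bar", "Magazine"},
    {"Candy Bar", "Magazine", "Greeting Cards"},
    {"Pencils"},
    {"Candy Bar", "Greeting Cards"},
    {"Magazine", "Greeting Cards", "Pencils"},
]

frequent = apriori_sketch(transactions, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The three-item candidate {Candy Bar, Greeting Cards, Magazine} passes the prune step because all of its pairs are frequent. It is then dropped because it appears in only one transaction.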

# <font color='#2F4F4F'>5. Generating the Association Rules</font>

In [13]:
# Generate association rules, filtering on lift with min_threshold=1
shop_rules = association_rules(shop_frequent_itemsets, metric="lift", min_threshold=1)

# Sort the rules by confidence, highest first
shop_rules.sort_values("confidence", ascending=False, inplace=True)

# Preview the association rules
shop_rules.head(18)

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 46 | (Toothpaste, Pencils) | (Candy Bar) | 0.022748 | 0.175736 | 0.011002 | 0.48366 | 2.752198 | 0.007005 | 1.596359 |
| 22 | (Greeting Cards, Magazine) | (Candy Bar) | 0.037467 | 0.175736 | 0.017247 | 0.460317 | 2.61937 | 0.010662 | 1.527313 |
| 40 | (Toothpaste, Magazine) | (Candy Bar) | 0.029884 | 0.175736 | 0.013232 | 0.442786 | 2.51961 | 0.007981 | 1.47926 |
| 28 | (Greeting Cards, Toothpaste) | (Candy Bar) | 0.033304 | 0.175736 | 0.01457 | 0.4375 | 2.48953 | 0.008718 | 1.465358 |
| 21 | (Candy Bar, Magazine) | (Greeting Cards) | 0.039994 | 0.15284 | 0.017247 | 0.431227 | 2.821431 | 0.011134 | 1.489452 |
| 52 | (Pencils, Magazine) | (Greeting Cards) | 0.028546 | 0.15284 | 0.012043 | 0.421875 | 2.760244 | 0.00768 | 1.465358 |
| 50 | (Greeting Cards, Pencils) | (Magazine) | 0.029884 | 0.231936 | 0.012043 | 0.402985 | 1.737486 | 0.005112 | 1.286508 |
| 20 | (Candy Bar, Greeting Cards) | (Magazine) | 0.04609 | 0.231936 | 0.017247 | 0.374194 | 1.61335 | 0.006557 | 1.227319 |
| 58 | (Toothpaste, Magazine) | (Greeting Cards) | 0.029884 | 0.15284 | 0.011151 | 0.373134 | 2.441344 | 0.006583 | 1.351422 |
| 34 | (Pencils, Magazine) | (Candy Bar) | 0.028546 | 0.175736 | 0.010407 | 0.364583 | 2.074609 | 0.005391 | 1.297202 |


**Observation**


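* The output above lists the rules in descending order of confidence. Every rule shown has **confidence > 0.3** and **lift > 1**. Note that the code filters only on lift (`min_threshold=1`). The rules shown happen to satisfy the confidence threshold, but the code does not enforce it.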

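To make both success criteria explicit, here is a minimal pure-Python sketch. It splits frequent itemsets into antecedent/consequent rules, keeps rules with lift of at least 1, sorts them by confidence, and then applies confidence > 0.3 and lift > 1 before taking the top 10. It is not mlxtend's implementation. The function names and the toy support values are hypothetical. In the notebook itself, the equivalent filter on the existing DataFrame would be `shop_rules[(shop_rules['confidence'] > 0.3) & (shop_rules['lift'] > 1)].head(10)`.

```python
from itertools import combinations

# Hypothetical supports for a few frequent itemsets (toy values, not from the dataset)
frequent = {
    frozenset({"Candy Bar"}): 0.5,
    frozenset({"Magazine"}): 0.6,
    frozenset({"Toothpaste"}): 0.4,
    frozenset({"Candy Bar", "Toothpaste"}): 0.3,
    frozenset({"Candy Bar", "Magazine"}): 0.25,
}

def generate_rules(frequent, min_lift=1.0):
    """Split each frequent itemset into antecedent -> consequent rules with lift >= min_lift."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset can serve as the antecedent
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so both lookups succeed
                confidence = support / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if lift >= min_lift:
                    rules.append({
                        "antecedents": antecedent,
                        "consequents": consequent,
                        "support": support,
                        "confidence": confidence,
                        "lift": lift,
                    })
    # Highest confidence first, as in the notebook
    rules.sort(key=lambda rule: rule["confidence"], reverse=True)
    return rules

def select_top_rules(rules, min_confidence=0.3, min_lift=1.0, top_n=10):
    """Keep rules with confidence > min_confidence and lift > min_lift, then take the first top_n."""
    return [r for r in rules if r["confidence"] > min_confidence and r["lift"] > min_lift][:top_n]

rules = generate_rules(frequent)
for rule in select_top_rules(rules):
    print(sorted(rule["antecedents"]), "->", sorted(rule["consequents"]),
          f"confidence={rule['confidence']:.2f}", f"lift={rule['lift']:.2f}")
```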



# <font color='#2F4F4F'>6. Metric Interpretation</font>

**Observations from the first rule**

* The first rule, {Toothpaste, Pencils} → {Candy Bar}, has a support of 0.011002: about 1.1% of all transactions contain Toothpaste, Pencils, and Candy Bar together.

* Its confidence of 0.48366 means that about 48% of the transactions containing Toothpaste and Pencils also contain a Candy Bar.

* Its lift of 2.75 (greater than 1) means that Candy Bar appears with Toothpaste and Pencils far more often than it would if these purchases were independent. In other words, a customer who buys Toothpaste and Pencils is about 2.75 times as likely to buy a Candy Bar as a typical customer (0.484 versus the baseline Candy Bar support of 0.176). Lift measures association, not causation.

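The figures quoted above can be recomputed from the supports in the first row of the rules table. The check below uses the standard formulas for confidence, lift, leverage, and conviction. Because the inputs are the six-decimal supports shown in the table, the results agree with the table's other columns only up to rounding.

```python
# Supports copied from the first row of the rules table
antecedent_support = 0.022748   # share of transactions with Toothpaste and Pencils
consequent_support = 0.175736   # share of transactions with Candy Bar
rule_support = 0.011002         # share with Toothpaste, Pencils and Candy Bar

confidence = rule_support / antecedent_support             # P(Candy Bar | Toothpaste, Pencils)
lift = confidence / consequent_support                     # confidence relative to the baseline
leverage = rule_support - antecedent_support * consequent_support  # observed minus expected co-occurrence
conviction = (1 - consequent_support) / (1 - confidence)   # >1 when the rule is rarely violated

print(f"confidence={confidence:.4f} lift={lift:.3f} "
      f"leverage={leverage:.6f} conviction={conviction:.3f}")
```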
**Other observations**
* There is a strong association between {Candy Bar, Magazine} and {Greeting Cards}. This rule has the highest lift among the rules shown (2.821431).


# <font color='#2F4F4F'>7. Recommendations</font>

Based on the observations above, customers who purchase {Toothpaste, Pencils} are considerably more likely than average to also purchase a Candy Bar.

* Care Five should consider bundling {Toothpaste, Pencils} with Candy Bar in all its hypermarkets, grocery stores, and convenience stores.

* Staff in all hypermarkets, grocery stores, and convenience stores should also be trained to cross-sell Candy Bars to customers who purchase {Toothpaste, Pencils}. These customers are more likely to buy the items together, so cross-selling could increase the store's revenue.