## <font color='#2F4F4F'>Step 1. Defining the Question</font>

### a) Specifying the Data Analysis Question

Perform market basket analysis to help the store maximize revenue by identifying which items should be bundled together.

### b) Defining the Metric for Success

The project will be a success if we can find association rules between itemsets with a confidence greater than 0.3 and a lift greater than 1.
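The short sketch below makes the two thresholds in this metric concrete. It uses the standard definitions. support(X) is the share of baskets that contain X. confidence(X → Y) = support(X ∪ Y) / support(X). lift(X → Y) = confidence(X → Y) / support(Y). A lift above 1 means X and Y appear together more often than they would if they were bought independently. The baskets are hypothetical and only illustrate the arithmetic.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def rule_metrics(baskets, antecedent, consequent):
    """Return (confidence, lift) for the rule antecedent -> consequent."""
    joint = support(baskets, set(antecedent) | set(consequent))
    confidence = joint / support(baskets, antecedent)
    lift = confidence / support(baskets, consequent)
    return confidence, lift


def meets_success_metric(confidence, lift):
    """Success metric used in this notebook: confidence > 0.3 and lift > 1."""
    return confidence > 0.3 and lift > 1


# Hypothetical toy baskets, for illustration only (not the store data)
baskets = [
    {"Magazine", "Candy Bar"},
    {"Magazine", "Greeting Cards", "Candy Bar"},
    {"Pencils"},
    {"Magazine"},
    {"Toothpaste", "Candy Bar"},
]

conf, lift = rule_metrics(baskets, {"Magazine"}, {"Candy Bar"})
print(f"confidence={conf:.2f}, lift={lift:.2f}, success={meets_success_metric(conf, lift)}")
```

In these toy baskets, {Magazine} → {Candy Bar} has a confidence of 2/3 and a lift of 10/9, so it passes both thresholds.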

### c) Understanding the Context 

<p>Care Five is a German multinational retail corporation headquartered in Berlin, Germany. It is the eighth-largest retailer in the world by revenue. It operates a chain of hypermarkets, grocery stores, and convenience stores, which, as of January 2021, comprises 1,200 stores in over 30 countries.
</p><p>As a data analyst working for one of the stores, you must perform market basket analysis to help the store maximize revenue. More specifically, your task is to analyze transactional data to identify the top 10 products likely to be purchased together.</p>

### d) Recording the Experimental Design

1. Load datasets and libraries
2. Perform data preprocessing
3. Find frequent itemsets
4. Generate association rules
5. Perform metric interpretation and outline findings
6. Provide Recommendations
7. Challenge the solution

### e) Data Relevance

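The dataset provided is appropriate and relevant to the research question. Each row records a purchased product together with its transaction ID, store, and quantity, which is the information market basket analysis needs.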

## <font color='#2F4F4F'>Step 2. Data Importation</font>

```python
# Import the required libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.preprocessing import TransactionEncoder
```

```python
# Importing our dataset
# ---
items_df = pd.read_csv('https://bit.ly/30A2gHO')
print(items_df.shape)
items_df.head()
```

(15001, 5)


|   | A | Quantity | Transaction | Store | Product |
|---|---|---|---|---|---|
| 0 | 30000 | 2 | 93194 | 6 | Magazine |
| 1 | 30001 | 2 | 93194 | 6 | Candy Bar |
| 2 | 30002 | 2 | 93194 | 6 | Candy Bar |
| 3 | 30003 | 2 | 93194 | 6 | Candy Bar |
| 4 | 30004 | 2 | 93194 | 6 | Candy Bar |


## <font color='#2F4F4F'>Step 3. Data preprocessing</font>

```python
# Step 1: Data processing
# ---
# We group the dataframe by Transaction and Product and display the count of items
# ---
items_count = items_df.groupby(["Transaction", "Product"]).size().reset_index(name="Count")
print(items_count.shape)
items_count.head()
```

(9629, 3)


|   | Transaction | Product | Count |
|---|---|---|---|
| 0 | 93194 | Candy Bar | 4 |
| 1 | 93194 | Magazine | 1 |
| 2 | 93197 | Pencils | 1 |
| 3 | 93200 | Candy Bar | 3 |
| 4 | 93200 | Magazine | 1 |


```python
# Step 1: Data processing
# ---
# Then we consolidate the items into one transaction per row,
# with one column per product holding its count
# (the counts are converted to 0/1 in the next step).
# ---
transactions_df = (items_count.groupby(["Transaction", "Product"])["Count"]
                   .sum()
                   .unstack()
                   .reset_index()
                   .fillna(0)
                   .set_index("Transaction"))
print(transactions_df.shape)
transactions_df.head()
```

(6726, 17)


| Transaction | Bow | Candy Bar | Deodorant | Greeting Cards | Magazine | Markers | Pain Reliever | Pencils | Pens | Perfume | Photo Processing | Prescription Med | Shampoo | Soap | Toothbrush | Toothpaste | Wrapping Paper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 93194 | 0.0 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93197 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93200 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93206 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93212 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
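The two reshaping steps above can be sketched in plain Python to make the logic explicit. The first counts the rows for each transaction and product. The second spreads the products into one column each. This is an illustrative stdlib version, not the notebook's pandas code. The input rows are chosen to reproduce the counts shown for transactions 93194, 93197 and 93200.

```python
from collections import Counter


def count_items(rows):
    """Count the rows for each (transaction, product) pair,
    like items_df.groupby(["Transaction", "Product"]).size()."""
    return Counter((transaction, product) for transaction, product in rows)


def to_wide(counts):
    """Spread the counts into one row per transaction with one column per
    product, filling absent products with 0 (like unstack().fillna(0))."""
    products = sorted({product for _, product in counts})
    transactions = sorted({transaction for transaction, _ in counts})
    wide = {t: [counts.get((t, p), 0) for p in products] for t in transactions}
    return products, wide


# (Transaction, Product) rows matching the first transactions in the outputs above
rows = [
    (93194, "Magazine"),
    (93194, "Candy Bar"), (93194, "Candy Bar"),
    (93194, "Candy Bar"), (93194, "Candy Bar"),
    (93197, "Pencils"),
    (93200, "Magazine"),
    (93200, "Candy Bar"), (93200, "Candy Bar"), (93200, "Candy Bar"),
]

products, wide = to_wide(count_items(rows))
print(products)  # ['Candy Bar', 'Magazine', 'Pencils']
print(wide)      # {93194: [4, 1, 0], 93197: [0, 0, 1], 93200: [3, 1, 0]}
```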


```python
# Step 1: Data processing
# ---
# We then use a custom encoding function to convert
# all the values to 0 or 1.
# mlxtend's apriori expects one-hot encoded values (0/1 or True/False).
# ---
def encode_units(x):
    # Any positive count means the product was in the basket
    return 1 if x > 0 else 0
```

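```python
# Apply encode_units to every cell of the transaction table.
# Note: DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map.
encoded_transactions = transactions_df.applymap(encode_units)
print(encoded_transactions.shape)
encoded_transactions.head()
```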

(6726, 17)


| Transaction | Bow | Candy Bar | Deodorant | Greeting Cards | Magazine | Markers | Pain Reliever | Pencils | Pens | Perfume | Photo Processing | Prescription Med | Shampoo | Soap | Toothbrush | Toothpaste | Wrapping Paper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 93194 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93197 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93200 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93206 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93212 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
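Binarizing discards quantities on purpose. Apriori measures how many baskets contain an item, so a basket with four Candy Bars counts the same as a basket with one. The plain-Python sketch below applies a presence/absence rule like `encode_units` to a few rows of the count table, treating any positive count as present. It is illustrative only and omits the other product columns. In pandas, the vectorized expression `(transactions_df > 0).astype(int)` gives the same 0/1 table. `(transactions_df > 0)` on its own gives a boolean table, which mlxtend's `apriori` also accepts.

```python
def encode_units(x):
    """Return 1 if the product was bought at all (count > 0), else 0."""
    return 1 if x > 0 else 0


def binarize(wide_counts):
    """Apply encode_units to every cell of a {transaction: [counts]} table."""
    return {t: [encode_units(c) for c in row] for t, row in wide_counts.items()}


# Counts for the first transactions shown above
# (columns: Candy Bar, Magazine, Pencils; other products omitted)
wide_counts = {93194: [4, 1, 0], 93197: [0, 0, 1], 93200: [3, 1, 0]}
print(binarize(wide_counts))
# {93194: [1, 1, 0], 93197: [0, 0, 1], 93200: [1, 1, 0]}
```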


## <font color='#2F4F4F'>Step 4. Frequent itemsets</font>

```python
# Step 2: Generating frequent itemsets
# ---
# We generate the frequent itemsets with the apriori() function,
# passing the parameters:
# ---
# encoded_transactions - our one-hot encoded transactional dataset
# min_support=0.01 - keep itemsets that appear in at least 1% of transactions
# use_colnames=True - show product names in the itemsets column
#   (with use_colnames=False the itemsets are shown as column indices)
# ---
shop_frequent_itemsets = apriori(encoded_transactions, min_support=0.01, use_colnames=True)
print(shop_frequent_itemsets.shape)
shop_frequent_itemsets.head()
```

(39, 2)


|   | support | itemsets |
|---|---|---|
| 0 | 0.051591 | (Bow) |
| 1 | 0.175736 | (Candy Bar) |
| 2 | 0.15284 | (Greeting Cards) |
| 3 | 0.231936 | (Magazine) |
| 4 | 0.020071 | (Pain Reliever) |
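mlxtend's `apriori` handles this step for us. The sketch below shows the level-wise idea the Apriori algorithm is built on. It generates candidates of size k from frequent itemsets of size k−1. It then prunes any candidate with an infrequent subset, using the downward-closure property: every subset of a frequent itemset must itself be frequent. This is a simplified stdlib illustration on hypothetical baskets, not mlxtend's implementation.

```python
from itertools import combinations


def apriori_sketch(baskets, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for every
    itemset whose support is at least min_support."""
    n = len(baskets)

    def support(itemset):
        # Share of baskets that contain every item in the itemset
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: keep the single items that are frequent enough
    items = {item for basket in baskets for item in basket}
    current = {frozenset([item]) for item in items
               if support(frozenset([item])) >= min_support}
    frequent = {itemset: support(itemset) for itemset in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count support and keep only the frequent k-itemsets
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({itemset: support(itemset) for itemset in current})
        k += 1
    return frequent


# Hypothetical baskets, for illustration only
baskets = [
    {"Candy Bar", "Magazine"},
    {"Candy Bar", "Magazine", "Greeting Cards"},
    {"Pencils"},
    {"Magazine", "Greeting Cards"},
    {"Candy Bar", "Magazine", "Greeting Cards"},
]

frequent = apriori_sketch(baskets, min_support=0.4)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(f"{sup:.2f}", sorted(itemset))
```

With `min_support=0.4` (two of five baskets), Pencils is discarded at the first level, so no itemset containing it is ever generated. This pruning is what keeps Apriori tractable as the number of products grows.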


## <font color='#2F4F4F'>Step 5. Association rules</font>

```python
# Step 3: Finding the association rules
# Keep only rules with a lift of at least 1
shop_rules = association_rules(shop_frequent_itemsets, metric="lift", min_threshold=1)

# Sort the rules by confidence, highest first
shop_rules.sort_values("confidence", ascending=False, inplace=True)

# Preview the top 10 association rules
print(shop_rules.shape)
shop_rules.head(10)
```

(62, 9)


|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 44 | (Pencils, Toothpaste) | (Candy Bar) | 0.022748 | 0.175736 | 0.011002 | 0.48366 | 2.752198 | 0.007005 | 1.596359 |
| 20 | (Greeting Cards, Magazine) | (Candy Bar) | 0.037467 | 0.175736 | 0.017247 | 0.460317 | 2.61937 | 0.010662 | 1.527313 |
| 38 | (Toothpaste, Magazine) | (Candy Bar) | 0.029884 | 0.175736 | 0.013232 | 0.442786 | 2.51961 | 0.007981 | 1.47926 |
| 26 | (Toothpaste, Greeting Cards) | (Candy Bar) | 0.033304 | 0.175736 | 0.01457 | 0.4375 | 2.48953 | 0.008718 | 1.465358 |
| 21 | (Candy Bar, Magazine) | (Greeting Cards) | 0.039994 | 0.15284 | 0.017247 | 0.431227 | 2.821431 | 0.011134 | 1.489452 |
| 50 | (Pencils, Magazine) | (Greeting Cards) | 0.028546 | 0.15284 | 0.012043 | 0.421875 | 2.760244 | 0.00768 | 1.465358 |
| 51 | (Pencils, Greeting Cards) | (Magazine) | 0.029884 | 0.231936 | 0.012043 | 0.402985 | 1.737486 | 0.005112 | 1.286508 |
| 22 | (Candy Bar, Greeting Cards) | (Magazine) | 0.04609 | 0.231936 | 0.017247 | 0.374194 | 1.61335 | 0.006557 | 1.227319 |
| 56 | (Toothpaste, Magazine) | (Greeting Cards) | 0.029884 | 0.15284 | 0.011151 | 0.373134 | 2.441344 | 0.006583 | 1.351422 |
| 32 | (Pencils, Magazine) | (Candy Bar) | 0.028546 | 0.175736 | 0.010407 | 0.364583 | 2.074609 | 0.005391 | 1.297202 |
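mlxtend's `association_rules` takes each frequent itemset and tries every way of splitting it into an antecedent and a consequent. The plain-Python sketch below does the same. Like `metric="lift", min_threshold=1` above, it keeps rules with a lift of at least 1 and sorts them by confidence. It then applies the confidence > 0.3 filter from the success metric, which the notebook's code does not apply explicitly. The supports are copied (rounded) from the frequent-itemset and rule outputs above. Only the seven itemsets built from Candy Bar, Greeting Cards and Magazine are included, so this covers just a subset of the notebook's rules, and the metrics may differ slightly from the table in the last decimal places. The sketch is illustrative and is not mlxtend's implementation.

```python
from itertools import combinations


def generate_rules(frequent, min_lift=1.0):
    """Turn {frozenset: support} into (antecedent, consequent, confidence, lift)
    rules, keep those with lift >= min_lift, and sort by confidence (highest first).
    Assumes every subset of each itemset is also in `frequent`."""
    rules = []
    for itemset, joint_support in frequent.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset can serve as the antecedent
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                confidence = joint_support / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if lift >= min_lift:
                    rules.append((antecedent, consequent, confidence, lift))
    rules.sort(key=lambda rule: rule[2], reverse=True)
    return rules


# Supports copied from the frequent-itemset and rule outputs above
frequent = {
    frozenset(["Candy Bar"]): 0.175736,
    frozenset(["Greeting Cards"]): 0.15284,
    frozenset(["Magazine"]): 0.231936,
    frozenset(["Candy Bar", "Greeting Cards"]): 0.04609,
    frozenset(["Candy Bar", "Magazine"]): 0.039994,
    frozenset(["Greeting Cards", "Magazine"]): 0.037467,
    frozenset(["Candy Bar", "Greeting Cards", "Magazine"]): 0.017247,
}

rules = generate_rules(frequent, min_lift=1.0)
# Keep only rules that meet the success metric (confidence > 0.3 and lift > 1)
success = [rule for rule in rules if rule[2] > 0.3 and rule[3] > 1]
for antecedent, consequent, confidence, lift in success:
    print(sorted(antecedent), "->", sorted(consequent),
          f"confidence={confidence:.3f}", f"lift={lift:.3f}")
```

Among these itemsets, Candy Bar → Magazine and its reverse are dropped because their lift is just below 1. The three rules derived from the triple itemset reappear with the confidence and lift values shown in the table.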


## <font color='#2F4F4F'>Step 6. Metric interpretation and Findings </font>

**Observation**
* A lift greater than 1 means two itemsets are bought together more often than would be expected if they were purchased independently. The rules with Candy Bar as the consequent all have a lift above 2, so Candy Bar is strongly associated with the following pairs of items:
> <li> Toothpaste and Pencils </li>
> <li> Magazine and Greeting Cards </li>
> <li> Magazine and Toothpaste </li>
> <li> Toothpaste and Greeting Cards </li>
> <li> Magazine and Pencils </li>
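* When a basket contains one of the above pairs, the probability that it also contains a Candy Bar (the rule's confidence, 0.36 to 0.48) is more than double the baseline probability of 0.176.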
* Greeting Cards is likewise strongly associated (lift above 2) with the following combinations of items:
> <li> Magazine and Candy Bar </li>
> <li> Magazine and Pencils </li>
> <li> Magazine and Toothpaste </li>
* When a basket contains one of these combinations, the probability that it also contains Greeting Cards is more than double its baseline of 0.153.

* Magazine is also not purchased independently: it is associated (lift above 1.6) with the following combinations of items:
> <li> Pencils and Greeting Cards </li>
> <li> Greeting Cards and Candy Bar </li>
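* These combinations raise the probability that a basket also contains a Magazine to more than 1.5 times its baseline of 0.232.
* All ten rules in the preview table have a confidence greater than 0.3 and a lift greater than 1, so they satisfy the success metric defined at the start of this notebook.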
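As a worked check, take the top rule {Pencils, Toothpaste} → {Candy Bar}. Its confidence and lift can be recomputed from the supports in the rule table (values rounded):

$$
\text{conf}(X \Rightarrow Y) = \frac{\text{supp}(X \cup Y)}{\text{supp}(X)} = \frac{0.011002}{0.022748} \approx 0.484,
\qquad
\text{lift}(X \Rightarrow Y) = \frac{\text{conf}(X \Rightarrow Y)}{\text{supp}(Y)} = \frac{0.48366}{0.175736} \approx 2.75
$$

A basket containing both Pencils and Toothpaste is therefore about 2.75 times as likely to contain a Candy Bar as an average basket (0.484 versus 0.176).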

## <font color='#2F4F4F'>Step 7. Recommendations </font>

<p> Therefore, we conclude that there is evidence that the following items tend to be purchased together:</p>
<ul>
<li> Toothpaste </li>
<li> Pencils </li>
<li> Magazine </li>
<li> Greeting Cards </li>
<li> Candy Bar </li>
</ul>
<p> Care Five Supermarket should consider stocking the above items next to one another and offering them as bundles. Store staff should also be trained to cross-sell these items, since customers are more likely to purchase them together, which should help increase the supermarket's revenue.
</p>

## <font color='#2F4F4F'>Step 8. Challenging our Solution</font>

#### a) Did we have the right question?
Yes.

#### b) Did we have the right data?
Yes.

#### c) What can be done to improve the solution?
Analyzing more transactional data, for example from a longer period or from more stores, could uncover additional associations that help increase revenue.