# Product Bundling and Recommendation

## Product Bundling

![Example Product Bundle. Credits: Burger King](https://cxl.com/wp-content/uploads/2018/10/burger-king-bundle.png)

<p style = 'text-align: center;'>Image Credits: Burger King</p>

Product bundling refers to grouping products or services together for sale as one package, as illustrated in the image above.

To identify the products that are ideal for bundling, we will perform **market basket analysis**, a data mining technique that uncovers relationships between products that are frequently purchased together. In technical terms, _market basket analysis_ is a form of **Association Rule Mining**, whose goal is to find rules that describe how likely a product is to be purchased together with other products.
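
To make the idea of a rule's likelihood concrete, the sketch below computes three measures commonly used in association rule mining (support, confidence, and lift) on a toy set of baskets. The baskets and the helper names `support`, `confidence`, and `lift` are assumptions made for illustration; they are not part of the dataset or of any library used in this notebook.

```python
# Toy baskets; the product names are made up for illustration.
transactions = [
    {"burger", "fries", "soda"},
    {"burger", "fries"},
    {"burger", "soda"},
    {"fries", "soda"},
    {"burger", "fries", "soda"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the baskets."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Measures for the rule {burger} -> {fries}
rule_support = support({"burger", "fries"}, transactions)
rule_confidence = confidence({"burger"}, {"fries"}, transactions)
rule_lift = lift({"burger"}, {"fries"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 suggests the two products are bought together more often than their individual popularity would predict, which is the kind of signal a bundle is meant to exploit.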

There are a number of algorithms that one can use to perform market basket analysis, including:
- Apriori Algorithm
- AIS Algorithm
- SETM Algorithm
- FP Growth Algorithm

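The **Apriori algorithm** is a _popular_ choice for association rule mining and generally _performs better_ than the AIS and SETM algorithms. However, it is _computationally expensive_ on large datasets because it generates candidate itemsets and rescans the transactions at every level. We shall therefore use the **FP Growth algorithm**, an improvement on Apriori that compresses the transactions into a frequent-pattern tree and mines that tree without candidate generation.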
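
To illustrate where Apriori's cost comes from, here is a minimal sketch of its level-wise search, assuming small in-memory baskets. The function name `apriori_sketch` and the toy data are hypothetical, and this is a simplified illustration rather than the implementation found in any library. Note the full pass over all baskets at every level, which is the work FP Growth is designed to avoid.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Simplified level-wise (Apriori-style) frequent itemset search.

    Returns a dict mapping each frequent itemset to its support, plus the
    number of full passes made over the transactions.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level 1 candidates: every distinct item on its own.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    scans = 0
    k = 1
    while candidates:
        # One full pass over the data per level: this is the costly part.
        scans += 1
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, keeping only those
        # whose every k-item subset is itself frequent (the pruning step).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans


# Toy baskets; the product names are made up for illustration.
toy = [
    ["burger", "fries", "soda"],
    ["burger", "fries"],
    ["burger", "soda"],
    ["fries", "soda"],
    ["burger", "fries", "soda"],
]
frequent, scans = apriori_sketch(toy, min_support=0.5)
print(len(frequent), "frequent itemsets found in", scans, "passes")
```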

### Import Packages

In [1]:
# Module containing all libraries used
import src.dependencies as dep

# Module containing custom functions
import src.functions as fn

### Load the Dataset

The dataset is a transformed version of the Online Retail II dataset from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/dataset/502/online+retail+ii).

In [2]:
# Load data
df = dep.pd.read_csv('dataset/Transformed.csv')

# Confirm successful loading
df.head()

Unnamed: 0,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085,United Kingdom
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085,United Kingdom
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085,United Kingdom
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085,United Kingdom
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085,United Kingdom


### Preprocess the Data

Since the data is already transformed, we will proceed to extract the data we need to implement the FP Growth algorithm. The features of interest are:
- `Invoice`: It acts as the transaction identifier
- `StockCode`: It acts as the product identifier

In [3]:
# Get features
features_df = df[['Invoice', 'StockCode']]

# Rename them
features_df = features_df.rename(columns = {'Invoice': 'Transaction', 'StockCode': 'Products'})
features_df.head()

Unnamed: 0,Transaction,Products
0,489434,85048
1,489434,79323P
2,489434,79323W
3,489434,22041
4,489434,21232


Next, we group the items in each transaction into a list. Figuratively, this places all the products from one invoice into a single 'cart'.

In [4]:
# Group the items
transactions_df = features_df.groupby('Transaction')['Products'].apply(list).reset_index()
transactions_df.head()

Unnamed: 0,Transaction,Products
0,489434,"[85048, 79323P, 79323W, 22041, 21232, 22064, 2..."
1,489435,"[22350, 22349, 22195, 22353]"
2,489436,"[48173C, 21755, 21754, 84879, 22119, 22142, 22..."
3,489437,"[22143, 22145, 22130, 21364, 21360, 21351, 213..."
4,489438,"[21329, 21252, 21100, 21033, 20711, 21410, 214..."
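
For readers who want to see what the grouping step does without pandas, the following sketch collects products per transaction using only the standard library. The `rows` list is a small hand-written sample in the same shape as `features_df` (the row order is made up), and the variable names are assumptions for illustration.

```python
from collections import defaultdict

# A few (Transaction, Products) rows in the same shape as `features_df`.
# The interleaved order is made up to show that grouping does not rely on it.
rows = [
    ("489434", "85048"),
    ("489434", "79323P"),
    ("489435", "22350"),
    ("489434", "79323W"),
    ("489435", "22349"),
]

# Append each product to the list belonging to its transaction.
grouped = defaultdict(list)
for transaction_id, product in rows:
    grouped[transaction_id].append(product)

baskets = dict(grouped)
print(baskets)
```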


We will also convert the `Products` column into a plain Python list, giving a list of lists with one inner list per transaction.

In [7]:
# Get the lists of products
transactions = transactions_df['Products'].tolist()

# Check the first 2 entries in the list
transactions[0:2]

[['85048', '79323P', '79323W', '22041', '21232', '22064', '21871', '21523'],
 ['22350', '22349', '22195', '22353']]

Finally, we one-hot encode the transactions into a boolean NumPy array, with one row per transaction and one column per product.

In [8]:
# Encode
encoder = dep.TransactionEncoder()
encoded_transactions = encoder.fit(transactions).transform(transactions)
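
As a rough picture of what the encoder produces, the sketch below builds the same kind of one-hot structure with plain Python lists. The helper name `one_hot_encode` is hypothetical and the two sample transactions are shortened for illustration; the real `TransactionEncoder` returns a NumPy array rather than nested lists.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted unique items and one boolean row per basket."""
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)
        # True where the basket contains the column's product, False otherwise.
        rows.append([item in present for item in columns])
    return columns, rows


# Two shortened sample transactions, for illustration only.
sample = [["22350", "22349", "22195"], ["22349", "85048"]]
columns, rows = one_hot_encode(sample)
print(columns)
for row in rows:
    print(row)
```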

### Modeling

We will generate the frequent itemsets using the `fpgrowth()` function. Since the function expects a one-hot encoded DataFrame, we first convert the encoded array into a DataFrame.

In [9]:
# Array to Dataframe
encoded_df = dep.pd.DataFrame(encoded_transactions, columns = encoder.columns_)
encoded_df.head()

Unnamed: 0,10002,10080,10109,10120,10123C,10123G,10124A,10124G,10125,10133,...,C2,CRUK,D,DOT,M,PADS,POST,SP1002,TEST001,TEST002
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,True,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


The syntax of the `fpgrowth()` function is:

`fpgrowth(df, min_support=0.5, use_colnames=False, max_len=None, verbose=0)`

Where:
- _df_ - One-hot encoded DataFrame, our `encoded_df` DataFrame.
- _min_support_ - A float between 0 and 1 giving the minimum support of the itemsets returned, where support is the fraction of transactions that contain the itemset. The default value is 0.5, but we will set it much lower, to 0.01, because with so many products any single itemset appears in only a small share of transactions.
- _use_colnames_ - It is `False` by default, which yields column indices in the itemsets. We are interested in the product identifiers, so we shall set it to `True`.
- _max_len_ - The maximum length of the itemsets generated. We shall leave it at the default, `None`, which places no limit on itemset length.
- _verbose_ - Shows the stages of conditional tree generation. We shall set it to 1 to view the stages.
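
The effect of these parameters can be seen on a toy example. The sketch below is a brute-force stand-in, not FP Growth and not the library's code: the function `frequent_itemsets_sketch` is hypothetical, it mirrors only the meaning of `min_support`, `use_colnames`, and `max_len`, and it enumerates every combination, so it is suitable for tiny inputs only.

```python
from itertools import combinations


def frequent_itemsets_sketch(transactions, min_support=0.5, use_colnames=False, max_len=None):
    """Brute-force illustration of the parameter semantics (not FP Growth)."""
    baskets = [frozenset(t) for t in transactions]
    columns = sorted(set().union(*baskets))  # column order of the one-hot table
    longest = len(columns) if max_len is None else max_len
    found = {}
    for size in range(1, longest + 1):
        for combo in combinations(columns, size):
            # Support: share of baskets containing every item in the combination.
            support = sum(1 for b in baskets if b.issuperset(combo)) / len(baskets)
            if support >= min_support:
                # Report product names or column positions, as requested.
                key = combo if use_colnames else tuple(columns.index(c) for c in combo)
                found[frozenset(key)] = support
    return found


# Toy baskets; the product names are made up for illustration.
toy = [
    ["burger", "fries", "soda"],
    ["burger", "fries"],
    ["burger", "soda"],
    ["fries", "soda"],
    ["burger", "fries", "soda"],
]
by_index = frequent_itemsets_sketch(toy, min_support=0.5)
by_name = frequent_itemsets_sketch(toy, min_support=0.5, use_colnames=True, max_len=1)
print(len(by_index), len(by_name))
```

Lowering `min_support` admits rarer itemsets, while `max_len` caps how many products an itemset may contain.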

In [19]:
# Generate Frequent itemsets
freq_itemsets = dep.fpgrowth(encoded_df, min_support = 0.01, use_colnames = True, verbose = 1)

465 itemset(s) from tree conditioned on items ()
1 itemset(s) from tree conditioned on items (21232)
0 itemset(s) from tree conditioned on items (21523)
0 itemset(s) from tree conditioned on items (21871)
0 itemset(s) from tree conditioned on items (22064)
0 itemset(s) from tree conditioned on items (85048)
0 itemset(s) from tree conditioned on items (22195)
1 itemset(s) from tree conditioned on items (84879)
1 itemset(s) from tree conditioned on items (21754)
1 itemset(s) from tree conditioned on items (21181)
2 itemset(s) from tree conditioned on items (21755)
0 itemset(s) from tree conditioned on items (21755, 21754)
0 itemset(s) from tree conditioned on items (21755, 85123A)
0 itemset(s) from tree conditioned on items (22111)
0 itemset(s) from tree conditioned on items (22296)
0 itemset(s) from tree conditioned on items (82582)
0 itemset(s) from tree conditioned on items (22295)
0 itemset(s) from tree conditioned on items (21756)
0 itemset(s) from tree conditioned on items (48173C)

3 itemset(s) from tree conditioned on items (21166)
0 itemset(s) from tree conditioned on items (21166, 21175)
0 itemset(s) from tree conditioned on items (21166, 21181)
0 itemset(s) from tree conditioned on items (21166, 85152)
0 itemset(s) from tree conditioned on items (48194)
0 itemset(s) from tree conditioned on items (22113)
0 itemset(s) from tree conditioned on items (21889)
0 itemset(s) from tree conditioned on items (21174)
0 itemset(s) from tree conditioned on items (84029G)
0 itemset(s) from tree conditioned on items (21868)
0 itemset(s) from tree conditioned on items (20676)
0 itemset(s) from tree conditioned on items (21238)
1 itemset(s) from tree conditioned on items (21136)
1 itemset(s) from tree conditioned on items (21094)
0 itemset(s) from tree conditioned on items (85049E)
1 itemset(s) from tree conditioned on items (21086)
0 itemset(s) from tree conditioned on items (48111)
0 itemset(s) from tree conditioned on items (21240)
...

0 itemset(s) from tree conditioned on items (21507)
0 itemset(s) from tree conditioned on items (21559)
0 itemset(s) from tree conditioned on items (47590A)
0 itemset(s) from tree conditioned on items (21937)
0 itemset(s) from tree conditioned on items (22084)
0 itemset(s) from tree conditioned on items (22170)
0 itemset(s) from tree conditioned on items (22061)
0 itemset(s) from tree conditioned on items (21390)
0 itemset(s) from tree conditioned on items (21704)
0 itemset(s) from tree conditioned on items (21874)
0 itemset(s) from tree conditioned on items (21380)
0 itemset(s) from tree conditioned on items (21914)
0 itemset(s) from tree conditioned on items (21936)
0 itemset(s) from tree conditioned on items (21509)
0 itemset(s) from tree conditioned on items (21506)
0 itemset(s) from tree conditioned on items (21899)
0 itemset(s) from tree conditioned on items (22158)
0 itemset(s) from tree conditioned on items (22045)
0 itemset(s) from tree conditioned on items (84945)
...

0 itemset(s) from tree conditioned on items (22909)
0 itemset(s) from tree conditioned on items (22952)
1 itemset(s) from tree conditioned on items (22910)
0 itemset(s) from tree conditioned on items (22951)
0 itemset(s) from tree conditioned on items (22632)
0 itemset(s) from tree conditioned on items (22804)
0 itemset(s) from tree conditioned on items (22727)
1 itemset(s) from tree conditioned on items (22726)
0 itemset(s) from tree conditioned on items (22729)
0 itemset(s) from tree conditioned on items (22730)
1 itemset(s) from tree conditioned on items (22728)
0 itemset(s) from tree conditioned on items (22748)
1 itemset(s) from tree conditioned on items (22745)
0 itemset(s) from tree conditioned on items (22746)
0 itemset(s) from tree conditioned on items (22900)
0 itemset(s) from tree conditioned on items (22633)
0 itemset(s) from tree conditioned on items (22766)
0 itemset(s) from tree conditioned on items (22847)
0 itemset(s) from tree conditioned on items (22776)
...

In [17]:
freq_itemsets

Unnamed: 0,support,itemsets
0,0.046261,(21232)
1,0.059475,(84879)
2,0.041225,(21754)
3,0.034874,(21181)
4,0.033381,(21755)
...,...,...
78,0.025203,(22699)
79,0.027855,(22910)
80,0.026206,(22727)
81,0.027141,(22720)
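
Since a bundle needs at least two products, one plausible next step is to keep only the itemsets with two or more items and rank them by support. The sketch below shows that filter on hand-written records; the product codes `A` to `D`, the support values, and the variable names are all made up for illustration and are not results from this dataset.

```python
# Hypothetical (support, itemset) records in the same shape as `freq_itemsets`.
# The codes and support values are invented for illustration.
records = [
    (0.046, frozenset({"A"})),
    (0.059, frozenset({"B"})),
    (0.012, frozenset({"A", "B"})),
    (0.015, frozenset({"C", "D"})),
]

# Keep itemsets with at least two products, most frequent first.
bundle_candidates = sorted(
    (record for record in records if len(record[1]) >= 2),
    key=lambda record: record[0],
    reverse=True,
)
for support, itemset in bundle_candidates:
    print(sorted(itemset), support)
```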


In [12]:
help(dep.fpgrowth)

Help on function fpgrowth in module mlxtend.frequent_patterns.fpgrowth:

fpgrowth(df, min_support=0.5, use_colnames=False, max_len=None, verbose=0)
    Get frequent itemsets from a one-hot DataFrame
    
    Parameters
    -----------
    df : pandas DataFrame
      pandas DataFrame the encoded format. Also supports
      DataFrames with sparse data; for more info, please
      see https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#sparse-data-structures.
    
      Please note that the old pandas SparseDataFrame format
      is no longer supported in mlxtend >= 0.17.2.
    
      The allowed values are either 0/1 or True/False.
      For example,
    
    ```
           Apple  Bananas   Beer  Chicken   Milk   Rice
        0   True    False   True     True  False   True
        1   True    False   True    False  False   True
        2   True    False   True    False  False  False
        3   True     True  False    False  False  False
        4  False    False   True  

## References

- [Apriori Algorithm in Machine Learning](https://www.javatpoint.com/apriori-algorithm-in-machine-learning)
- [FP Growth Algorithm Explained With Numerical Example](https://codinginfinite.com/fp-growth-algorithm-explained-with-numerical-example/)
- [Implement FP Growth Algorithm in Python](https://codinginfinite.com/implement-fp-growth-algorithm-in-python/)
- [Introduction to Apriori Algorithm in Python](https://intellipaat.com/blog/data-science-apriori-algorithm/)
- [Market Basket Analysis: A Comprehensive Guide for Businesses](https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-market-basket-analysis/)
- [Product Sales Analysis Using Python](https://medium.com/swlh/product-sales-analysis-using-python-863b29026957)