# Product Bundling and Recommendation

## Product Bundling

![Example Product Bundle. Credits: Burger King](https://cxl.com/wp-content/uploads/2018/10/burger-king-bundle.png)

<p style = 'text-align: center;'>Image Credits: Burger King</p>

Product bundling refers to grouping products or services together for sale as one package, as illustrated in the image above.

To identify products that are good candidates for bundling, we will perform **market basket analysis**, a data mining technique that identifies relationships between products that are frequently purchased together. In technical terms, _market basket analysis_ is a form of **Association Rule Mining**, whose goal is to find rules that describe how likely a product is to be purchased together with other products.

A number of algorithms can be used to perform market basket analysis, including:
- Apriori Algorithm
- AIS Algorithm
- SETM Algorithm
- FP Growth Algorithm

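The **Apriori algorithm** is a _popular_ algorithm in association rule mining, and it _performs better_ than the AIS and SETM algorithms. However, it is _computationally expensive_ on large datasets because it scans the data repeatedly and generates many candidate itemsets. We shall therefore use the **FP Growth algorithm**, an improvement on Apriori that compresses the transactions into a frequent-pattern tree and avoids explicit candidate generation.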
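To make the idea of a frequent itemset concrete, here is a minimal sketch that counts support by brute force on a handful of made-up transactions. It is neither Apriori nor FP Growth, only the naive baseline that both algorithms improve on. The item names, the `frequent_itemsets` helper, and the 0.6 threshold are illustrative assumptions, not part of the notebook.

```python
from itertools import combinations

# Toy transactions (hypothetical data, not taken from the dataset)
transactions = [
    {"burger", "fries", "soda"},
    {"burger", "fries"},
    {"burger", "soda"},
    {"fries", "soda"},
    {"burger", "fries", "soda"},
]


def frequent_itemsets(transactions, min_support, max_len=2):
    """Brute force: test every candidate itemset of up to max_len items."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_len + 1):
        for candidate in combinations(items, size):
            # Support = share of transactions that contain the whole candidate
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / n
            if support >= min_support:
                result[candidate] = support
    return result


toy_itemsets = frequent_itemsets(transactions, min_support=0.6)
for itemset, support in toy_itemsets.items():
    print(itemset, round(support, 2))
```

The number of candidates grows combinatorially with the number of products, which is why this naive approach does not scale to a catalogue with thousands of stock codes.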

### Import Packages

In [1]:
# Module containing all libraries used
import src.dependencies as dep

# Module containing custom functions
import src.functions as fn

### Load the Dataset

The dataset is a transformed version of the Online Retail II dataset obtained from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/dataset/502/online+retail+ii).

In [2]:
# Load data
df = dep.pd.read_csv('dataset/Transformed.csv')

# Confirm successful loading
df.head()

Unnamed: 0,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085,United Kingdom
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085,United Kingdom
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085,United Kingdom
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085,United Kingdom
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085,United Kingdom


### Preprocess the Data

Since the data is already transformed, we will proceed to extract the data we need to implement the FP Growth algorithm. The features of interest are:
- `Invoice`: It acts as the transaction identifier
- `StockCode`: It acts as the product identifier

In [3]:
# Get features
features_df = df[['Invoice', 'StockCode']]

# Rename them
features_df = features_df.rename(columns = {'Invoice': 'Transaction', 'StockCode': 'Products'})
features_df.head()

Unnamed: 0,Transaction,Products
0,489434,85048
1,489434,79323P
2,489434,79323W
3,489434,22041
4,489434,21232


Next, we group the items in each transaction into a list. Figuratively, this places the products of each invoice into a 'cart'.

In [4]:
# Group the items
transactions_df = features_df.groupby('Transaction')['Products'].apply(list).reset_index()
transactions_df.head()

Unnamed: 0,Transaction,Products
0,489434,"[85048, 79323P, 79323W, 22041, 21232, 22064, 2..."
1,489435,"[22350, 22349, 22195, 22353]"
2,489436,"[48173C, 21755, 21754, 84879, 22119, 22142, 22..."
3,489437,"[22143, 22145, 22130, 21364, 21360, 21351, 213..."
4,489438,"[21329, 21252, 21100, 21033, 20711, 21410, 214..."
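The grouping step can also be pictured without pandas. The sketch below is only an illustration of what `groupby(...).apply(list)` produces: it uses a few (transaction, product) pairs copied from the outputs above, and the `rows` and `carts` names are assumptions introduced for this example.

```python
from collections import defaultdict

# A few (Transaction, Products) pairs copied from the outputs above
rows = [
    ("489434", "85048"),
    ("489434", "79323P"),
    ("489434", "79323W"),
    ("489435", "22350"),
    ("489435", "22349"),
]

# Append each product to the cart of its transaction, preserving row order
carts = defaultdict(list)
for transaction, product in rows:
    carts[transaction].append(product)

carts = dict(carts)
print(carts)
```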


We will also convert the `Products` column into a Python list, giving a list of lists with one inner list per transaction.

In [5]:
# Get the lists of products
transactions = transactions_df['Products'].tolist()

# Check the first 2 entries in the list
transactions[0:2]

[['85048', '79323P', '79323W', '22041', '21232', '22064', '21871', '21523'],
 ['22350', '22349', '22195', '22353']]

Finally, encode the transactions into a boolean NumPy array with one row per transaction and one column per product.

In [6]:
# Encode
encoder = dep.TransactionEncoder()
encoded_transactions = encoder.fit(transactions).transform(transactions)
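As a rough picture of what the encoder produces, the following sketch one-hot encodes two shortened carts in plain Python. It assumes the columns are the sorted unique product codes, which is how `TransactionEncoder` is understood to order them; the `sample_transactions`, `columns`, and `encoded` names are illustrative and the first cart is truncated for brevity.

```python
# Two carts based on the output above (the first one is shortened)
sample_transactions = [
    ["85048", "79323P", "79323W"],
    ["22350", "22349", "22195", "22353"],
]

# One column per distinct product code, in sorted order
columns = sorted({item for cart in sample_transactions for item in cart})

# One row per cart: True where the cart contains the column's product
encoded = [
    [item in set(cart) for item in columns]
    for cart in sample_transactions
]

print(columns)
for row in encoded:
    print(row)
```

Each row marks which of the sorted product codes appear in the corresponding cart, which is the shape `fpgrowth()` expects once the array is wrapped in a DataFrame.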

### Modeling

We will generate the frequent itemsets using the `fpgrowth()` function. Since the function expects a one-hot encoded DataFrame, we first convert the encoded array into a DataFrame.

In [7]:
# Array to Dataframe
encoded_df = dep.pd.DataFrame(encoded_transactions, columns = encoder.columns_)
encoded_df.head()

Unnamed: 0,10002,10080,10109,10120,10123C,10123G,10124A,10124G,10125,10133,...,C2,CRUK,D,DOT,M,PADS,POST,SP1002,TEST001,TEST002
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,True,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


The syntax of the `fpgrowth()` function is:

**_fpgrowth(df, min_support=0.5, use_colnames=False, max_len=None, verbose=0)_**

Where:
- _df_ - One-hot encoded DataFrame, our `encoded_df` DataFrame.
- _min_support_ - A float between 0 and 1 giving the minimum support of the itemsets returned. The **_support_** of an itemset is the number of transactions that contain it divided by the total number of transactions. The default value is 0.5, but we will set it much lower, to 0.015, because the catalogue contains many products and any single itemset appears in only a small share of transactions.
- _use_colnames_ - It is False by default, in which case itemsets are reported as column indices. We are interested in the product identifiers, so we set it to True.
- _max_len_ - The maximum length of the itemsets generated. We leave the default setting.
- _verbose_ - Shows the stages of conditional tree generation. We leave the default setting.

In [8]:
# Generate Frequent itemsets
freq_itemsets = dep.fpgrowth(encoded_df, min_support = 0.015, use_colnames = True)

In [9]:
#Sample of frequent itemsets
freq_itemsets.sample(3)

Unnamed: 0,support,itemsets
246,0.018339,"(20728, 22383)"
125,0.015376,(20974)
82,0.041046,(22197)


Using the frequent itemsets generated, we will generate their **association rules**, which express the likelihood of products being purchased together. An association rule is an implication expression of the form X→Y, where X and Y are disjoint itemsets, with X being the antecedent and Y, the consequent.

The syntax for `association_rules()` is:

**association_rules(df, metric='confidence', min_threshold=0.8, support_only=False)**

Where:
- _df_ - DataFrame of frequent itemsets.
- _metric_ - Metric used to evaluate whether a rule is of interest. The default value is 'confidence'. The supported metrics are 'support', 'confidence', 'lift', 'leverage', and 'conviction'. These metrics are defined in the glossary section.
- _min_threshold_ - Minimal threshold for the evaluation metric, default is 0.8.
- _support_only_ - If True, only the rule support is computed. The default value is False.

In [10]:
# Generate association rules
rules = dep.association_rules(freq_itemsets, metric = 'confidence', min_threshold = 0.5)

# View a sample of the rules
rules.sample(5)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
17,(22727),(22726),0.026206,0.023687,0.015799,0.602891,25.451875,0.015178,2.458551
13,(22386),(85099B),0.03991,0.074494,0.023977,0.600782,8.064816,0.021004,2.318295
2,(20726),(20725),0.036166,0.059163,0.019275,0.532964,9.00839,0.017136,2.014483
18,(22726),(22727),0.023687,0.026206,0.015799,0.66698,25.451875,0.015178,2.924134
11,(85099F),(85099B),0.034317,0.074494,0.021459,0.625325,8.394278,0.018903,2.470154


In [11]:
# Check the number of rules generated with the set parameters
rules.shape

(19, 9)

A total of 19 rules have been generated, together with their respective metrics.
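To show how rules follow from frequent itemsets, here is a minimal sketch that splits each multi-item itemset into antecedent and consequent and keeps the splits whose confidence meets a threshold. It is not the mlxtend implementation; the `supports` values, the item labels `A` and `B`, and the `rules_from_itemsets` helper are hypothetical.

```python
from itertools import combinations

# Hypothetical supports for a toy catalogue (not taken from the dataset)
supports = {
    frozenset({"A"}): 0.40,
    frozenset({"B"}): 0.50,
    frozenset({"A", "B"}): 0.30,
}


def rules_from_itemsets(supports, min_confidence):
    """Return (antecedent, consequent, confidence) for every qualifying split."""
    rules = []
    for itemset, support in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is itself frequent,
                # so its support is expected to be present in the mapping
                confidence = support / supports[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules


toy_rules = rules_from_itemsets(supports, min_confidence=0.7)
for antecedent, consequent, confidence in toy_rules:
    print(set(antecedent), "->", set(consequent), round(confidence, 2))
```

With these numbers only the rule from `A` to `B` survives, because the reverse direction has a confidence of 0.3 / 0.5 = 0.6, which is below the threshold. This asymmetry is why the same pair of products can appear in one rule but not its mirror image.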

### Bundling

We can use the generated rules to propose product bundles. First, we sort the rules in descending order of **_lift_**, using **_confidence_** to break ties.

In [12]:
# Sort the rules
sorted_rules = rules.sort_values(by=['lift', 'confidence'], ascending=[False, False])
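The effect of the two-key sort can be checked on a few rows in plain Python. The sketch below copies four rules from the sample output shown earlier and ranks them the same way; the `sample_rules` and `ranked` names are assumptions made for this example.

```python
# Four rules copied from the sample output of the rules DataFrame
sample_rules = [
    {"antecedent": "22386", "consequent": "85099B", "confidence": 0.600782, "lift": 8.064816},
    {"antecedent": "22727", "consequent": "22726", "confidence": 0.602891, "lift": 25.451875},
    {"antecedent": "20726", "consequent": "20725", "confidence": 0.532964, "lift": 9.00839},
    {"antecedent": "85099F", "consequent": "85099B", "confidence": 0.625325, "lift": 8.394278},
]

# Highest lift first; confidence only matters when two lifts are equal
ranked = sorted(
    sample_rules,
    key=lambda rule: (rule["lift"], rule["confidence"]),
    reverse=True,
)

for rule in ranked:
    print(rule["antecedent"], "->", rule["consequent"], rule["lift"], rule["confidence"])
```

Note that the rule with the highest confidence in this sample (`85099F` to `85099B`) does not rank first, since lift is the primary key.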

For now, we limit product selection for bundling to the antecedents of our rules.

In [15]:
# Get list of antecedents
antecedents = fn.antecedents_list(sorted_rules)

# View sample
antecedents[:3]

['22697', '22699', '22727']

In [22]:
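# Scratch check: unpack every antecedent into a plain product code
test = list(rules['antecedents'])
ante = []
for item in test:
    # Single-element unpacking; raises ValueError if an antecedent holds more than one product
    x, = item
    print(x)
    ante.append(x)
print(ante)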

21755
21733
20726
22356
84991
21977
21231
21931
85099C
82494L
82482
85099F
22384
22386
22699
22697
22910
22727
22726
['21755', '21733', '20726', '22356', '84991', '21977', '21231', '21931', '85099C', '82494L', '82482', '85099F', '22384', '22386', '22699', '22697', '22910', '22727', '22726']


In [26]:
sorted_rules = rules.sort_values(by=['lift', 'confidence'], ascending=[False, False])

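# Recommendation function
def recommend_bundles(customer_cart, rules, top_n=3):
    """Return up to top_n products suggested by rules whose antecedents are all in the cart."""
    recommendations = []
    # Rules are expected to be pre-sorted, so earlier matches rank higher
    for _, row in rules.iterrows():
        antecedents = set(row['antecedents'])
        if antecedents.issubset(customer_cart):
            # Suggest only consequents that are not already in the cart
            for product in set(row['consequents']).difference(customer_cart):
                # Skip products already suggested by a higher-ranked rule
                if product not in recommendations:
                    recommendations.append(product)

    return recommendations[:top_n]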

# Example usage
customer_cart = set(['22697', '22910', '22727'])
recommended_products = recommend_bundles(customer_cart, sorted_rules, top_n=3)

print("Customer Cart:", customer_cart)
print("Recommended Products:", recommended_products)

Customer Cart: {'22727', '22910', '22697'}
Recommended Products: ['22699', '22726', '22086']


In [16]:
help(dep.association_rules)

Help on function association_rules in module mlxtend.frequent_patterns.association_rules:

association_rules(df, metric='confidence', min_threshold=0.8, support_only=False)
    Generates a DataFrame of association rules including the
    metrics 'score', 'confidence', and 'lift'
    
    Parameters
    -----------
    df : pandas DataFrame
      pandas DataFrame of frequent itemsets
      with columns ['support', 'itemsets']
    
    metric : string (default: 'confidence')
      Metric to evaluate if a rule is of interest.
      **Automatically set to 'support' if `support_only=True`.**
      Otherwise, supported metrics are 'support', 'confidence', 'lift',
      'leverage', and 'conviction'
      These metrics are computed as follows:
    
      - support(A->C) = support(A+C) [aka 'support'], range: [0, 1]
    
      - confidence(A->C) = support(A+C) / support(A), range: [0, 1]
    
      - lift(A->C) = confidence(A->C) / support(C), range: [0, inf]
    
      - leverage(A->C) = suppo

## Glossary

### Association Rules Metrics

| Metric | Definition | Formula | Range |
| :-- | :-- | :-- | :--: |
| support | The fraction of transactions that contain both the antecedent and the consequent. | support(A->C) = support(A+C) | [0, 1] |
| confidence | The likelihood of the consequent being purchased when the antecedent is purchased. | confidence(A->C) = support(A+C) / support(A) | [0, 1] |
| lift | How much more often the antecedent and the consequent are purchased together than would be expected if they were independent, which accounts for the popularity of the consequent. A value of 1 indicates independence. | lift(A->C) = confidence(A->C) / support(C) | [0, inf] |
| leverage | The difference between the observed support of the rule and the support expected if the antecedent and the consequent were independent. | leverage(A->C) = support(A->C) - support(A) * support(C) | [-1, 1] |
| conviction | The ratio of (1 - support of the consequent) to (1 - confidence of the rule). It grows without bound as confidence approaches 1. | conviction(A->C) = [1 - support(C)] / [1 - confidence(A->C)] | [0, inf] |
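As a worked check of the formulas in the table, the sketch below recomputes the metrics for the rule (22727) -> (22726) from the three support values printed in the rules sample earlier. The `rule_metrics` helper is a hypothetical name introduced here; the results should match the reported row only approximately, because the printed inputs are rounded.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Compute the glossary metrics from the supports of A, C, and A+C."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # Conviction is unbounded when the rule always holds
    conviction = float("inf") if confidence == 1 else (1 - support_c) / (1 - confidence)
    return {
        "support": support_ac,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Values for rule (22727) -> (22726), copied from the rules sample output
metrics = rule_metrics(support_a=0.026206, support_c=0.023687, support_ac=0.015799)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

A lift far above 1, as in this rule, indicates that the two products are bought together much more often than their individual popularity would suggest.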
