# This is a sample Jupyter Notebook

Below is an example of a code cell. 
Put your cursor into the cell and press Shift+Enter to execute it and select the next one, or click !here goes the icon of the corresponding button in the gutter! button.
To debug a cell, press Alt+Shift+Enter, or click !here goes the icon of the corresponding button in the gutter! button.

Press Double Shift to search everywhere for classes, files, tool windows, actions, and settings.

To learn more about Jupyter Notebooks in PyCharm, see [help](https://www.jetbrains.com/help/pycharm/jupyter-notebook-support.html).
For an overview of PyCharm, go to Help -> Learn IDE features or refer to [our documentation](https://www.jetbrains.com/help/pycharm/getting-started.html).

In [63]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
# Collect Data

df = pd.read_excel("data/online_retail_II.xlsx")
df.head()

Unnamed: 0,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085.0,United Kingdom
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom


In [64]:
# Clean Data
df.dropna(subset=["Customer ID", "Description"])
df = df[ df['Quantity'] > 0]

In [69]:
# Create Transaction Baskets
transactions = df.groupby('Invoice')['Description'].apply(list).tolist()
# Keep only rows where 'Description' is a string
df = df[df['Description'].apply(lambda x: isinstance(x, str))]
# Replace non-string descriptions with a placeholder
df['Description'] = df['Description'].apply(lambda x: str(x) if isinstance(x, str) else 'Unknown')

In [93]:
# Apriori Algorithm

# Convert transaction data into one-hot encoded format
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_transactions = pd.DataFrame(te_ary, columns=te.columns_)

# Find frequent item sets
min_support = 0.01
min_confidence = 0.3
min_lift = 1.0
frequent_item_sets = apriori(df_transactions, min_support=min_support , use_colnames=True)
print(frequent_item_sets)

       support                                           itemsets
0     0.010999                 ( SET 2 TEA TOWELS I LOVE LONDON )
1     0.011570                             ( WHITE CHERRY LIGHTS)
2     0.013856                           (10 COLOUR SPACEBOY PEN)
3     0.010142                  (12 MESSAGE CARDS WITH ENVELOPES)
4     0.015475                    (12 PENCIL SMALL TUBE WOODLAND)
...        ...                                                ...
1088  0.013523  (STRAWBERRY CERAMIC TRINKET BOX, WHITE HANGING...
1089  0.014903  (WOODEN PICTURE FRAME WHITE FINISH, WHITE HANG...
1090  0.010713  (WOOD 2 DRAWER CABINET WHITE FINISH, WOOD S/3 ...
1091  0.011189  (WOOD 2 DRAWER CABINET WHITE FINISH, WOODEN PI...
1092  0.010904  (WOODEN PICTURE FRAME WHITE FINISH, WOOD S/3 C...

[1093 rows x 2 columns]
                                           antecedents  \
0                             (6 RIBBONS RUSTIC CHARM)   
1                           (REGENCY CAKESTAND 3 TIER)   
2        

In [119]:
# Generate association rules
rules = association_rules(frequent_item_sets, metric="lift", min_threshold=1.0 , num_itemsets= len(frequent_item_sets))

rules_list = list(rules)
print("\nTotal Rules length" , len(rules_list))
print("\nAssociation Rules : ")
print(rules)


Total Rules length 14

Association Rules : 
                                           antecedents  \
0                             (6 RIBBONS RUSTIC CHARM)   
1                           (REGENCY CAKESTAND 3 TIER)   
2                             (6 RIBBONS RUSTIC CHARM)   
3                 (WHITE HANGING HEART T-LIGHT HOLDER)   
4                    (60 CAKE CASES VINTAGE CHRISTMAS)   
..                                                 ...   
865  (WOODEN PICTURE FRAME WHITE FINISH, WOODEN FRA...   
866  (WOODEN FRAME ANTIQUE WHITE , WOOD S/3 CABINET...   
867                (WOODEN PICTURE FRAME WHITE FINISH)   
868                (WOOD S/3 CABINET ANT WHITE FINISH)   
869                      (WOODEN FRAME ANTIQUE WHITE )   

                                           consequents  antecedent support  \
0                           (REGENCY CAKESTAND 3 TIER)            0.039806   
1                             (6 RIBBONS RUSTIC CHARM)            0.096181   
2                 (WHITE

In [123]:
# Create function Product Recommendation
def recommend_products(input_items, rules, top_n=3):
    recommendations = rules[rules['antecedents'].apply(lambda x: set(input_items).issubset(x))]
    return recommendations.sort_values(by='lift', ascending=False).head(top_n)['consequents']

# Test the recommendation function
print("Recommendation Products : ")
recommend_products(['HOT WATER BOTTLE TEA AND SYMPATHY'], rules)

Recommendation Products : 


113             (CHOCOLATE HOT WATER BOTTLE)
261    (KNITTED UNION FLAG HOT WATER BOTTLE)
263         (RED WOOLLY HOTTIE WHITE HEART.)
Name: consequents, dtype: object