# NextBuys - Recommendation engine built using FP-Growth and Cosine Similarity

## Importing the class files 

- `RuleCreation` is the class that creates the recommendation rules for the website.
    - It filters the orders, removing those that are not useful to the algorithm.
    - It then uses the FP-Growth algorithm to build a dataframe of rules, which are used to recommend different categories of products.
    - It also provides a function that builds a directory of all the categories, each with a unique id.
- `ProductDirectoryBuilder` is the class that finds similar products using cosine similarity.
    - It first compares each product title with all the other titles in its category and keeps only the top 20 products per category. This removes products that were misclassified into the category.
    - It then calculates the cosine similarity of each product with every other product and stores each product-product pair and its score in one row.

In [None]:
from RecommendationRuleCreation import RuleCreation
from ProductPairing import ProductDirectoryBuilder

## Creating the rules

We use the `mlxtend` library for FP-Growth and association rules.

Steps used to create the rules:
- Load the Amazon purchase data.
- Filter out orders that contain only a single item, have incomplete data, etc.
- Generate frequent itemsets using the FP-Growth algorithm; these are the input for creating the rules.
- Generate the association rules.
- Filter the rules by a chosen criterion (the example below uses lift).


This example uses a minimum support of 0.005.

In [2]:
# Load and filter data
data = RuleCreation.load_data('src/amazon-purchases.csv')
filtered_data = RuleCreation.filter_orders(data)

# Generate frequent itemsets and association rules
frequent_itemsets = RuleCreation.generate_frequent_itemsets(filtered_data, min_support=0.005)
rules = RuleCreation.generate_association_rules(frequent_itemsets, min_threshold=0.1)
rules = rules[rules['lift'] >= 1.0]
rules.head()

# Create category directory
category_dir = RuleCreation.create_category_directory(rules)
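
To make the lift filter concrete, here is a minimal sketch that computes support, confidence, and lift for a single rule by brute force. It does not use FP-Growth or `mlxtend`; it only illustrates the metrics on which the rules are filtered. The toy orders and the helper names `support` and `rule_metrics` are hypothetical and are not part of `RuleCreation`.

```python
# Hypothetical toy orders; each set holds the categories bought together.
orders = [
    {"PHONE", "PHONE_CASE", "SCREEN_PROTECTOR"},
    {"PHONE", "PHONE_CASE"},
    {"PHONE_CASE", "SCREEN_PROTECTOR"},
    {"PHONE", "CHARGER"},
]


def support(itemset, orders):
    """Fraction of orders that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= order for order in orders) / len(orders)


def rule_metrics(antecedent, consequent, orders):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    joint = support(set(antecedent) | set(consequent), orders)
    confidence = joint / support(antecedent, orders)
    lift = confidence / support(consequent, orders)
    return {"support": joint, "confidence": confidence, "lift": lift}


metrics = rule_metrics({"SCREEN_PROTECTOR"}, {"PHONE_CASE"}, orders)

# Same criterion as the notebook cell: keep rules whose lift is at least 1.0.
keep_rule = metrics["lift"] >= 1.0
```

A lift above 1.0 means the two sides of the rule appear together more often than they would if they were independent, which is why rules below that value are dropped.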



## Creating similar product pairs using cosine similarity

We start by creating the product directory, similar to the category directory. Its main purpose is to filter out misclassified products and to reduce the size of the product catalog for the website.

Steps to create the product directory using `build_product_directory`:
- Iterate through each unique category in `category_dir`.
- Preprocess the product titles to clean and standardize them.
- Create a vocabulary set containing all the titles in the category.
- Calculate a similarity score between each product in the category and the vocabulary set using TF-IDF and cosine similarity.
- Sort the products by similarity score, remove duplicates, and select the top N products. Products that do not belong in a category are expected to have a lower cosine similarity to the overall vocabulary set.
- Combine the results for all categories into a single DataFrame.
- Return the final product directory.
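
The scoring steps above can be sketched for a single category as follows. This is a minimal standard-library illustration under stated assumptions, not the code inside `ProductDirectoryBuilder`: the helper names (`preprocess`, `cosine`, `top_titles`), the smoothed IDF formula, and the sample titles are all hypothetical.

```python
import math
import re
from collections import Counter


def preprocess(title):
    """Lowercase the title and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_titles(titles, top_n):
    """Rank titles by TF-IDF cosine similarity to the pooled category vocabulary."""
    titles = list(dict.fromkeys(titles))  # remove duplicates, keep order
    docs = [Counter(preprocess(t)) for t in titles]
    n_docs = len(docs)

    # Document frequency: number of titles in which each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    # The "vocabulary set" is modelled as one document pooling every title.
    pooled = Counter()
    for doc in docs:
        pooled.update(doc)
    vocab_vec = {t: count * idf[t] for t, count in pooled.items()}

    scored = []
    for title, doc in zip(titles, docs):
        title_vec = {t: count * idf[t] for t, count in doc.items()}
        scored.append((cosine(title_vec, vocab_vec), title))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Hypothetical titles for one category, including a duplicate and an outlier.
sample_titles = [
    "iPhone 8 case slim fit",
    "Clear case cover for iPhone",
    "iPhone 12 pro case",
    "iPhone 8 case slim fit",
    "Stainless steel garden hose nozzle",
]
ranked = top_titles(sample_titles, top_n=3)
kept_titles = [title for _, title in ranked]
```

In this toy input the garden hose title shares no terms with the rest of the category, so it receives the lowest score and falls outside the top 3.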

In [3]:
prod_dir = ProductDirectoryBuilder()
new_df = prod_dir.build_product_directory(category_dir, filtered_data)
new_df.head()

| | Order Date | Purchase Price Per Unit | Quantity | Shipping Address State | Title | ASIN/ISBN (Product Code) | Category_x | Survey ResponseID | order_id | count | unique_products_count | Category_y | sim_index | cat_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-02-06 | 12.99 | 1.0 | WI | supbec iphone 8 case iphone 7 case slim fit ip... | B07RY28HPC | CELLULAR_PHONE_CASE | R_2xMobch64onIeYO | 408092 | 3 | 3 | 1.0 | 0.11963 | 1 |
| 1 | 2020-07-01 | 12.99 | 1.0 | KY | compatible with iphone case clear case cover | B075WMX4JS | CELLULAR_PHONE_CASE | R_3GiWheWDtygg0V2 | 467735 | 5 | 5 | 1.0 | 0.11888 | 1 |
| 2 | 2021-09-23 | 11.99 | 1.0 | TN | giika for iphone 12 case iphone 12 pro case wi... | B08N4H4NDD | CELLULAR_PHONE_CASE | R_PSgcRvjngEuUFC9 | 681251 | 4 | 4 | 1.0 | 0.115908 | 1 |
| 3 | 2021-05-25 | 14.99 | 1.0 | TX | giika iphone se 2020 case iphone 8 case iphone... | B08LKY1NWY | CELLULAR_PHONE_CASE | R_29sSdZhPXE2rY6T | 238179 | 3 | 3 | 1.0 | 0.115806 | 1 |
| 4 | 2020-08-31 | 16.99 | 1.0 | CA | youmaker designed for iphone 8 plus case ipho... | B07D3K632T | CELLULAR_PHONE_CASE | R_301qJ8vhV7FSEct | 422949 | 6 | 6 | 1.0 | 0.113227 | 1 |


To create the pairing directory, we iterate through every product-product pair in a category, calculate its cosine similarity score, and store the pair with its score in a new dataframe.
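
As a rough illustration of the pairing step, the sketch below scores every ordered pair of products in one category, self-pairs included. It assumes plain term counts instead of TF-IDF weights to stay short, and the product codes, titles, and helper names are hypothetical. Only the column names `target_prod`, `compare_prod`, and `sim_values` come from the output shown in this notebook.

```python
import math
import re
from collections import Counter
from itertools import product

# Hypothetical product codes and cleaned titles for one category.
titles = {
    "P1": "iphone 8 case slim fit",
    "P2": "iphone 8 case slim fit clear",
    "P3": "garden hose nozzle",
}


def vectorize(title):
    """Bag-of-words term counts for a title (an assumed simplification)."""
    return Counter(re.findall(r"[a-z0-9]+", title.lower()))


def cosine(u, v):
    """Cosine similarity between two Counter vectors."""
    dot = sum(count * v[term] for term, count in u.items())  # missing terms count as 0
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


vectors = {code: vectorize(title) for code, title in titles.items()}

# One row per ordered pair, mirroring the pairing directory's columns.
pairs = [
    {"target_prod": a, "compare_prod": b, "sim_values": cosine(vectors[a], vectors[b])}
    for a, b in product(vectors, repeat=2)
]
```

The resulting list of dictionaries can be passed to `pandas.DataFrame` if a dataframe is needed. Scoring all ordered pairs grows quadratically with the number of products, which is one reason to trim each category to its top N products first.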

In [4]:
pairing_dir = prod_dir.build_pairing_directory(category_dir, new_df)
pairing_dir.head()

Total =  7


| | target_prod | compare_prod | sim_values |
|---|---|---|---|
| 0 | B07RY28HPC | B07RY28HPC | 1.0 |
| 1 | B07RY28HPC | B07RXZ6LC7 | 0.859428 |
| 2 | B07RY28HPC | B083M7LR3D | 0.49225 |
| 3 | B07RY28HPC | B083M72PZ6 | 0.485893 |
| 4 | B07RY28HPC | B076NPPW58 | 0.4848 |
