# NextBuys - Recommendation engine built using FP-Growth and Cosine Similarity

## Importing the class files 

- `AmazonAnalysis` is the class that builds the association rules for the website.
    - It provides functions that filter the orders, removing those that are unlikely to be useful to the algorithm.
    - It then uses the FP-Growth algorithm to create a dataframe of rules, which we use to recommend different categories of products.
    - It also has a function that creates a directory listing every category alongside a unique id.
- `ProductDirectoryBuilder` is the class that finds similar products based on cosine similarity.
    - It first compares each product title with all the other titles in the category and keeps only the top 20 products in the category, so that misclassified products are removed.
    - It then calculates the cosine similarity of each product with every other product and stores the score of each product-product pair in its own row.
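
The two `ProductDirectoryBuilder` steps above can be sketched roughly as follows. This is a minimal illustration, not the class's actual code: it assumes titles are compared as plain bag-of-words count vectors and that "top" products are ranked by their mean similarity to the other titles in the category. The function names and sample titles are hypothetical.

```python
import math
from collections import Counter


def title_vector(title):
    """Turn a title into a bag-of-words count vector (lowercased, whitespace tokens)."""
    return Counter(title.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keep_top_k(titles, k=20):
    """Keep the k titles most similar, on average, to the rest of the category.

    Assumption: ranking by mean similarity; the real class may rank differently.
    """
    vectors = [title_vector(t) for t in titles]
    scores = []
    for i, v in enumerate(vectors):
        others = [cosine_similarity(v, w) for j, w in enumerate(vectors) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    ranked = sorted(range(len(titles)), key=lambda i: scores[i], reverse=True)
    return [titles[i] for i in ranked[:k]]


def pairwise_similarity(titles):
    """Return one (title_a, title_b, score) row per unordered product pair."""
    vectors = [title_vector(t) for t in titles]
    rows = []
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            rows.append((titles[i], titles[j], cosine_similarity(vectors[i], vectors[j])))
    return rows


# Made-up titles for one category, for illustration only
titles = [
    "usb c charging cable",
    "usb c fast charging cable",
    "stainless steel water bottle",  # likely misclassified in a cable category
]
kept = keep_top_k(titles, k=2)
pair_rows = pairwise_similarity(titles)
for row in pair_rows:
    print(row)
```

With real data, a TF-IDF weighting would likely separate titles better than raw counts, but the pairwise structure of the result stays the same.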

In [1]:
from sample_class import AmazonAnalysis
from prod_dir import ProductDirectoryBuilder

## Creating the rules

We use the `mlxtend` library for FP-Growth and association rules.

Steps used to create the rules:
- Load the Amazon purchase data
- Filter out orders that contain only a single item, have incomplete data, etc.
- Generate frequent itemsets using the FP-Growth algorithm; these are the basis for the rules
- Generate the association rules
- Filter the rules by a chosen criterion (the example below uses lift)
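
To make the rule metrics concrete, here is a small sketch that computes support, confidence, and lift by brute force over category pairs. It is not FP-Growth and not what `mlxtend` does internally; FP-Growth finds the same frequent itemsets far more efficiently. The orders and category names are made up, and the variable names are illustrative.

```python
from itertools import permutations

# Made-up orders; each order is the set of categories it contains
orders = [
    {"PHONE_CASE", "SCREEN_PROTECTOR"},
    {"PHONE_CASE", "SCREEN_PROTECTOR", "CHARGER"},
    {"PHONE_CASE", "CHARGER"},
    {"PHONE_CASE", "BOOK"},
    {"BOOK"},  # single-item order, dropped by the filter below
    {"BOOK", "NOTEBOOK"},
]

# Keep only orders with at least two distinct categories
baskets = [o for o in orders if len(o) >= 2]
n = len(baskets)


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / n


items = sorted(set().union(*baskets))
rules = []
for antecedent, consequent in permutations(items, 2):
    joint = support({antecedent, consequent})
    if joint == 0:
        continue  # the pair never occurs together, so there is no rule
    confidence = joint / support({antecedent})
    lift = confidence / support({consequent})
    rules.append(
        {
            "antecedent": antecedent,
            "consequent": consequent,
            "support": joint,
            "confidence": confidence,
            "lift": lift,
        }
    )

# Lift >= 1 means the pair co-occurs at least as often as independence would predict
kept = [r for r in rules if r["lift"] >= 1.0]
for r in kept:
    print(r["antecedent"], "->", r["consequent"], round(r["lift"], 3))
```

In this toy data the `PHONE_CASE`/`BOOK` rules have lift below 1 and are dropped, which is the effect the lift filter is meant to have.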

In [2]:
# Load and filter data
data = AmazonAnalysis.load_data('data/dataverse_files/amazon-purchases.csv')
filtered_data = AmazonAnalysis.filter_orders(data)

# Generate frequent itemsets and association rules
frequent_itemsets = AmazonAnalysis.generate_frequent_itemsets(filtered_data, min_support=0.001)
rules = AmazonAnalysis.generate_association_rules(frequent_itemsets, min_threshold=0.1)
rules = rules[rules['lift'] >= 1.0]
rules.head()




In [None]:
# Create category directory
category_dir = AmazonAnalysis.create_category_directory(rules)
category_dir.head()
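
As a rough idea of what a category directory could look like, the sketch below collects every category that appears in a rule and assigns it a sequential id. This is an assumption about the behaviour of `create_category_directory`, not its actual implementation: the real method returns a dataframe, while this sketch uses plain Python lists for brevity. The rule rows mimic the `antecedents`/`consequents` frozenset columns that `mlxtend` produces, and the function name and category names are hypothetical.

```python
# Hypothetical rules, shaped like mlxtend's `antecedents`/`consequents` columns
sample_rules = [
    {"antecedents": frozenset({"PHONE_CASE"}), "consequents": frozenset({"SCREEN_PROTECTOR"})},
    {"antecedents": frozenset({"CHARGER"}), "consequents": frozenset({"PHONE_CASE"})},
    {"antecedents": frozenset({"BOOK", "NOTEBOOK"}), "consequents": frozenset({"PEN"})},
]


def build_category_directory(rule_rows):
    """Collect every category mentioned in any rule and give it a unique id."""
    categories = set()
    for rule in rule_rows:
        categories |= rule["antecedents"] | rule["consequents"]
    # Sort so that ids are stable across runs
    return [
        {"category_id": i, "category": name}
        for i, name in enumerate(sorted(categories), start=1)
    ]


directory = build_category_directory(sample_rules)
for entry in directory:
    print(entry["category_id"], entry["category"])
```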

## Creating similar product pairs using cosine similarity

Here we use the `ProductDirectoryBuilder` class to build the dataframe of similar product pairs.

In [8]:
prod_dir = ProductDirectoryBuilder()
new_df = prod_dir.build_product_directory(category_dir, filtered_data)
new_df.head()

Total Categories: 81
Index(['Order Date', 'Purchase Price Per Unit', 'Quantity',
       'Shipping Address State', 'Title', 'ASIN/ISBN (Product Code)',
       'Category_x', 'Survey ResponseID', 'order_id', 'count',
       'unique_products_count', 'Category_y'],
      dtype='object')




KeyboardInterrupt: 