<a id='top'></a>

* [Load Data and Remove Duplicates](#load_data)

* [Pick a Target Product and Number of Recommendations](#init)

## Product content oriented:
* [Popular Items](#popular_item): 
    - Most popular products in same (l2) category

* [Product Association](#product_association): 
    - Find which products are most frequently purchased together
    - Filter to orders that contain the target product, then rank the other products in those orders by frequency

## User oriented:
* Content Filtering: 
    - Give recommendations based on a user's purchase history
    - The dataset doesn't include users or their purchase history, so this method is skipped

* [Collaborative Filtering](#collaborative_filtering):   
    - Give recommendations based on similar users' purchases
    - The dataset doesn't include user info, so instead compare products by the orders they appear in, using Jaccard similarity


In [1]:
import os
import numpy as np
import pandas as pd
from zipfile import ZipFile

<a id='load_data'></a>
## Load Data and Remove Duplicates

In [2]:
zipfile_fpath = 'e-corp-data.zip'
destination_path = "data"

with ZipFile(zipfile_fpath, 'r') as zipObj:
    # Extract all contents of the zip file into destination_path
    zipObj.extractall(destination_path)
    print("Unzipped dataset")
    
os.listdir(destination_path)

Unzipped dataset


['All Transations - 2 Weeks.txt', 'Transactions with A&S.txt']

In [3]:
# AnS = pd.read_csv(os.path.join(destination_path, 'Transactions with A&S.txt'), sep='\t')
All = pd.read_csv(os.path.join(destination_path, 'All Transations - 2 Weeks.txt'), sep='\t')
All

Unnamed: 0,order_number,l1,l2,l3,sku,brand
0,168266,Power Tools,Power Saws and Accessories,Reciprocating Saw Blades,265105,2768
1,123986,Safety,Spill Control Supplies,Temporary Leak Repair,215839,586
2,158978,Hardware,Door Hardware,Thresholds,284756,1793
3,449035,"Electronics, Appliances, and Batteries",Batteries,Standard Batteries,12579,1231
4,781232,Motors,General Purpose AC Motors,General Purpose AC Motors,194681,2603
...,...,...,...,...,...,...
2107532,373846,Hand Tools,Wrenches,Adjustable Wrench Sets,197463,3356
2107533,373846,Hand Tools,Wrenches,Combination Wrench Sets,104442,2351
2107534,373846,Test Instruments,Temperature and Humidity Measuring,Infrared Thermometers,61610,1596
2107535,373846,Hand Tools,Wrenches,Adjustable Wrenches,45956,4692


### Duplicates
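Drop exact duplicate rows, which presumably come from orders that purchase the same product in large quantity, so that each distinct order line is counted once.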

In [4]:
All.drop_duplicates(inplace=True)
All.shape

(2055467, 6)
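
To illustrate what `drop_duplicates` removes, here is a minimal sketch on a hypothetical three-row frame (the values are made up and only mimic the dataset's column names). It assumes a repeated row stands for a second unit of the same SKU in the same order.

```python
import pandas as pd

# Hypothetical toy data: order 1 lists the same SKU twice (a quantity of 2)
toy = pd.DataFrame({
    "order_number": [1, 1, 2],
    "l3": ["Utility Knives", "Utility Knives", "Sockets"],
    "sku": [111, 111, 222],
})

# Rows are duplicates only if every column matches
deduped = toy.drop_duplicates()
n_removed = len(toy) - len(deduped)

print(deduped)
print(f"Removed {n_removed} duplicate row(s)")
```

Because all columns must match, two different SKUs from the same `l3` group in one order are both kept.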

[Go back to top](#top)
<a id='init'></a>
### Pick a Target Product and Number of Recommendations

In [5]:
# randomly choose an l3 item
l3 = np.random.choice(list(set(All.l3)))
print(f"If user bought {l3}\n")

If user bought Cable and Wire Cutters
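
Since the target is drawn at random, the outputs in the following cells change from run to run. One possible way to make the pick repeatable is a seeded NumPy generator. The sketch below uses a hypothetical list in place of the dataset's `l3` values, and the seed `42` is arbitrary.

```python
import numpy as np

# Hypothetical list standing in for the unique l3 values in the dataset
l3_values = ["Cable and Wire Cutters", "Screwdrivers", "Sockets", "Utility Knives"]

# A seeded generator returns the same pick on every run
rng = np.random.default_rng(42)
target = rng.choice(l3_values)

# Re-creating the generator with the same seed reproduces the pick
repeat = np.random.default_rng(42).choice(l3_values)
print(target, target == repeat)
```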



In [6]:
# choose number of recommendations
num_recs = 3

[Go back to top](#top)
<a id='popular_item'></a>
## Popular Items

In [7]:
# find popular items in same l2 category
for l2 in set(All[All.l3==l3].l2):
    recs = list(All[All.l2==l2].groupby('l3').size().sort_values(ascending=False).index)
    print(f"{l3} belongs to category: {l2}")
    print(f"Top {num_recs} popular items in this category: {recs[:num_recs]}")

Cable and Wire Cutters belongs to category: Cutting Tools
Top 3 popular items in this category: ['Safety Utility Knives', 'Utility Knife Blades', 'Utility Knives']
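
The popularity ranking above can also be wrapped in a reusable function. The sketch below is one possible shape, not the notebook's own code: the function name `popular_in_category`, the `exclude_target` flag, and the toy data are all assumptions. The flag covers the case where the target product is itself among the most popular items in its category and would otherwise be recommended back to the user.

```python
import pandas as pd

def popular_in_category(df, target_l3, num_recs=3, exclude_target=True):
    """Rank l3 items by row count within each l2 category that contains target_l3."""
    results = {}
    for l2 in df.loc[df["l3"] == target_l3, "l2"].unique():
        # value_counts sorts by frequency, most common first
        counts = df.loc[df["l2"] == l2, "l3"].value_counts()
        if exclude_target:
            counts = counts.drop(target_l3, errors="ignore")
        results[l2] = counts.index[:num_recs].tolist()
    return results

# Hypothetical toy data using the dataset's column names
toy = pd.DataFrame({
    "l2": ["Cutting Tools"] * 6 + ["Wrenches"],
    "l3": ["Cable and Wire Cutters",
           "Utility Knives", "Utility Knives", "Utility Knives",
           "Utility Knife Blades", "Utility Knife Blades",
           "Adjustable Wrenches"],
})

popular = popular_in_category(toy, "Cable and Wire Cutters", num_recs=2)
print(popular)
```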


[Go back to top](#top)
<a id='product_association'></a>
## Product Association

In [8]:
# find orders that bought this item
orders = All[All.order_number.isin(All[All.l3==l3].order_number)]
# find other items frequently bought with the target item
recs = list(orders.groupby('l3').size().sort_values(ascending=False).index)
recs.remove(l3)
print(f"Top {num_recs} popular items from people who also bought {l3}: {recs[:num_recs]}")

Top 3 popular items from people who also bought Cable and Wire Cutters: ['Screwdrivers', 'Hex and Torx Key Sets', 'Sockets']
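
The cell above counts rows, so an order that contains several SKUs from the same `l3` group adds more than one to that group's total. A possible variant, sketched below on hypothetical data, counts distinct orders instead. The function name `bought_together` is an assumption, not part of the notebook.

```python
import pandas as pd

def bought_together(df, target_l3, num_recs=3):
    """Rank other l3 items by the number of distinct orders they share with target_l3."""
    target_orders = df.loc[df["l3"] == target_l3, "order_number"].unique()
    # Keep rows from those orders, leaving out the target item itself
    basket = df[df["order_number"].isin(target_orders) & (df["l3"] != target_l3)]
    counts = basket.groupby("l3")["order_number"].nunique()
    return counts.sort_values(ascending=False).head(num_recs)

# Hypothetical toy data: order 1 holds two different screwdriver SKUs
toy = pd.DataFrame({
    "order_number": [1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4],
    "l3": ["Cutters", "Screwdrivers", "Screwdrivers", "Sockets",
           "Cutters", "Sockets",
           "Cutters", "Sockets", "Screwdrivers",
           "Sockets", "Pliers"],
})

together = bought_together(toy, "Cutters", num_recs=2)
print(together)
```

In this toy frame, row counts would tie Screwdrivers and Sockets at three each, while distinct-order counts rank Sockets (three orders) ahead of Screwdrivers (two orders).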


[Go back to top](#top)
<a id='collaborative_filtering'></a>
## Collaborative Filtering

In [24]:
# Jaccard similarity between the target product and every other product,
# computed on the sets of orders each product appears in
jaccard_series = pd.Series(0.0, index=All.l3.unique())
orders = All.groupby('l3').order_number
target_orders = set(orders.get_group(l3))

for i in jaccard_series.index.tolist():
    sample_orders = set(orders.get_group(i))
    intersection = len(target_orders & sample_orders)
    union = len(target_orders | sample_orders)
    jaccard_series[i] = intersection / union

# drop the target itself (its score is 1.0) and sort the rest once
ranked = jaccard_series.drop(l3).sort_values(ascending=False)
recs = list(ranked.index)
scores = list(ranked.values)
print(f"Top {num_recs} products similar to {l3}: {recs[:num_recs]}")
print(f"With score: {scores[:num_recs]}")

Top 3 products similar to Cable and Wire Cutters: ['Wire Strippers and Cable Slitters', 'Cable and Wire Crimping Tools', 'Long Nose and Needle Nose Pliers']
With score: [0.026283240568954855, 0.024378585086042064, 0.0216998191681736]
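
To make the scores easier to interpret, here is a minimal sketch of the same Jaccard calculation on hypothetical data small enough to check by hand. The function name `jaccard_scores` and the order numbers are assumptions. For two products with order sets A and B, the score is |A ∩ B| / |A ∪ B|.

```python
import pandas as pd

def jaccard_scores(df, target_l3):
    """Jaccard similarity between target_l3 and every other l3, based on shared orders."""
    # One set of order numbers per l3 item
    order_sets = df.groupby("l3")["order_number"].apply(set)
    target = order_sets[target_l3]
    scores = order_sets.drop(target_l3).apply(
        lambda s: len(s & target) / len(s | target)
    )
    return scores.sort_values(ascending=False)

# Hypothetical toy data
toy = pd.DataFrame({
    "order_number": [1, 2, 3, 2, 3, 4, 1, 5, 6, 7, 8],
    "l3": ["Cutters"] * 3 + ["Wire Strippers"] * 3 + ["Sockets"] * 4 + ["Pliers"],
})

scores = jaccard_scores(toy, "Cutters")
print(scores)
```

Here Cutters and Wire Strippers share two of four distinct orders (0.5), Sockets shares one of six, and Pliers shares none. The low scores in the real output (around 0.02 to 0.03) mean that even the closest products share only a small fraction of their combined orders with the target.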


[Go back to top](#top)