Personalization is one of the key drivers of the Ecommerce business today. Applying data science to customer feedback data makes it possible to build personalized recommendations, discount campaigns, marketing, and more.

In this project, you will see how personalized recommendations can be generated from **Ecommerce implicit feedback data (views, clicks, time spent viewing, transactions)** as opposed to explicit feedback (ratings).

### Why Implicit feedback data?
- The primary issue with explicit feedback (user ratings) is that it is very sparse (be honest with yourself: do you rate every product you purchase?), whereas implicit feedback can be captured every time the user browses the app.

- The main reason for using implicit feedback is that recommendations can follow the context of the user's current query. For example, suppose you are looking for a wrist watch on Amazon. If you have never bought or rated a wrist watch, no explicit feedback is available and the recommender faces the cold-start problem. With implicit feedback, even if the user never rates an item, we still learn about their taste from views, clicks, time spent viewing a watch, and other, more creative signals. The more watches the user views, the more confident we can be about their taste, and the better we can recommend watches accordingly.

Hence, with a plethora of similar products available on an app, the quicker we can help a user find the right product, the better the user satisfaction and conversion rate will be.
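
The idea that repeated views and clicks raise our confidence in a user's taste can be sketched in a few lines. The event log, the per-event weights, and the scaling factor `ALPHA` below are all illustrative assumptions, not values taken from the dataset or the paper; the linear form `1 + ALPHA * r` is one common convention for implicit feedback.

```python
import pandas as pd

# Hypothetical event log: one row per interaction (toy data, not from the dataset).
events = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u2", "u2"],
    "item": ["watch_a", "watch_a", "watch_b", "watch_a", "watch_c"],
    "event": ["view", "click", "view", "view", "purchase"],
})

# Assumed weights: stronger signals count more. These values are illustrative only.
EVENT_WEIGHT = {"view": 1.0, "click": 2.0, "purchase": 5.0}
ALPHA = 10.0  # assumed scaling factor

events["weight"] = events["event"].map(EVENT_WEIGHT)

# Sum the weights per (user, item) pair to get a raw interaction strength r.
strength = (
    events.groupby(["user", "item"], as_index=False)["weight"]
    .sum()
    .rename(columns={"weight": "r"})
)

# Confidence grows linearly with the observed strength: c = 1 + ALPHA * r.
strength["confidence"] = 1.0 + ALPHA * strength["r"]
print(strength)
```

In this toy log, user `u1` both viewed and clicked `watch_a`, so that pair ends up with a higher confidence than the single view of `watch_b`.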

This project is an attempt to implement the ideas in this paper (https://arxiv.org/pdf/1806.11371.pdf). Do give it a read for a better understanding of the approach and its business impact.

### Dataset
It is actually very difficult to find an open-source implicit feedback dataset. Luckily, I found one from a previous competition, the
RecSys Challenge 2015 (https://2015.recsyschallenge.com/challenge.html). Though the problem statement of the challenge is different, I used the dataset to build our recommendation system.

In [29]:
import pandas as pd
import numpy as np
from numpy.random import randint
import os
import implicit
import scipy.sparse as sparse
import ml_metrics as metrics


pd.options.display.max_rows = 3000

### Load and Pre-process the data

In [15]:
df_action = pd.read_csv('../input/제6회 L.POINT Big Data Competition-분석용데이터-01.온라인 행동 정보.csv', parse_dates=['sess_dt'])
df_transaction = pd.read_csv('../input/제6회 L.POINT Big Data Competition-분석용데이터-02.거래 정보.csv', parse_dates=['de_dt'])
df_client_demo = pd.read_csv('../input/제6회 L.POINT Big Data Competition-분석용데이터-03.고객 Demographic 정보.csv')
df_product = pd.read_csv('../input/제6회 L.POINT Big Data Competition-분석용데이터-04.상품분류 정보.csv')

In [16]:
print(df_action.shape)
df_action.head()

(3196362, 14)


Unnamed: 0,clnt_id,sess_id,hit_seq,action_type,biz_unit,sess_dt,hit_tm,hit_pss_tm,trans_id,sech_kwd,tot_pag_view_ct,tot_sess_hr_v,trfc_src,dvc_ctg_nm
0,7809,1,8,5,A03,2019-09-13,01:16,2571103,,,34.0,2663.0,DIRECT,
1,7809,1,4,2,A03,2019-09-13,01:14,2485909,,,34.0,2663.0,DIRECT,
2,7809,1,11,5,A03,2019-09-13,01:17,2646597,,,34.0,2663.0,DIRECT,
3,7809,1,1,2,A03,2019-09-13,00:46,788304,,,34.0,2663.0,DIRECT,
4,7809,1,9,5,A03,2019-09-13,01:17,2617609,,,34.0,2663.0,DIRECT,


In [17]:
# Drop only the rows where sech_kwd (the search keyword) is NaN
df_action.dropna(subset = ['sech_kwd'], inplace=True)
print(df_action.shape)
df_action.head()

(651638, 14)


Unnamed: 0,clnt_id,sess_id,hit_seq,action_type,biz_unit,sess_dt,hit_tm,hit_pss_tm,trans_id,sech_kwd,tot_pag_view_ct,tot_sess_hr_v,trfc_src,dvc_ctg_nm
2544724,30605,16,1,0,A03,2019-09-07,22:04,14548,,버터,3.0,39.0,DIRECT,
2544725,30605,12,2,0,A03,2019-08-21,23:36,422952,,카누,5.0,467.0,DIRECT,
2544726,30605,13,1,0,A03,2019-08-22,14:47,0,,카누,1.0,,DIRECT,
2544727,28304,1,13,0,A03,2019-07-16,11:36,933562,,비비고만두,56.0,1303.0,PUSH,mobile_web
2544728,28304,1,11,0,A03,2019-07-16,11:35,820901,,어묵,56.0,1303.0,PUSH,mobile_web


In [18]:
print(df_transaction.shape)
df_transaction.head()

(599961, 9)


Unnamed: 0,clnt_id,trans_id,trans_seq,biz_unit,pd_c,de_dt,de_tm,buy_am,buy_ct
0,21922,104999,1,A03,unknown,2019-09-20,12:41,5990,1
1,21279,104907,4,A03,unknown,2019-09-20,10:27,10900,1
2,39423,105124,11,A03,unknown,2019-09-20,17:26,12900,1
3,18362,104010,1,A03,unknown,2019-09-20,09:57,9900,1
4,39423,105124,13,A03,0565,2019-09-20,17:26,2990,1


In [19]:
print(df_client_demo.shape)
df_client_demo.head()

(72399, 3)


Unnamed: 0,clnt_id,clnt_gender,clnt_age
0,1,unknown,unknown
1,2,F,30
2,3,unknown,unknown
3,4,unknown,unknown
4,5,unknown,unknown


In [20]:
print(df_product.shape)
df_product.head()

(1667, 4)


Unnamed: 0,pd_c,clac_nm1,clac_nm2,clac_nm3
0,1,Automotive Products,Automotive Replacement Repair / Maintanance Kits,Automobile Oil / Additives
1,2,Automotive Products,Automotive Replacement Repair / Maintanance Kits,Car Lights
2,3,Automotive Products,Automotive Replacement Repair / Maintanance Kits,Car Paint
3,4,Automotive Products,Automotive Replacement Repair / Maintanance Kits,Filters
4,5,Automotive Products,Automotive Replacement Repair / Maintanance Kits,Wiper Blades


In [21]:
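# Convert df_product['pd_c'] from int to a zero-padded 4-character string
# so that it matches the string codes in df_transaction['pd_c'] (e.g. 347 -> '0347')
df_product['pd_c'] = df_product['pd_c'].apply(lambda num: "{:04d}".format(num))

# Left-merge df_transaction with df_product on the shared column pd_c
df_transaction = pd.merge(df_transaction, df_product, how='left', on='pd_c')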
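
The padding-then-merge step above can be checked on a small example. The two frames below are made-up stand-ins for `df_product` and `df_transaction`; `validate` and `indicator` are optional `pd.merge` arguments added here only as a sanity check, not something the original cell uses.

```python
import pandas as pd

# Toy stand-ins for df_product and df_transaction (values are made up).
product = pd.DataFrame({"pd_c": [5, 347], "clac_nm3": ["Wiper Blades", "Fresh Milk"]})
transaction = pd.DataFrame({"clnt_id": [1, 2, 3], "pd_c": ["0347", "unknown", "0005"]})

# Zero-pad the integer codes to four characters so they match the string keys.
product["pd_c"] = product["pd_c"].astype(str).str.zfill(4)

# Left merge keeps every transaction; `validate` raises if product codes are
# duplicated, and `indicator` records which rows found a match.
merged = pd.merge(transaction, product, how="left", on="pd_c",
                  validate="many_to_one", indicator=True)

print(merged)
print("unmatched rows:", (merged["_merge"] == "left_only").sum())
```

Rows whose `pd_c` is `unknown` stay in the result with empty category columns, which is the behaviour a left merge is expected to give for the real data as well.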

In [None]:
# df = pd.merge(df_action, df_transaction, on='clnt_id')

In [None]:
# df = pd.merge(df_action, df_transaction, how='inner', on='clnt_id')

In [22]:
df_action.loc[df_action['clnt_id'] == 46288]

Unnamed: 0,clnt_id,sess_id,hit_seq,action_type,biz_unit,sess_dt,hit_tm,hit_pss_tm,trans_id,sech_kwd,tot_pag_view_ct,tot_sess_hr_v,trfc_src,dvc_ctg_nm
3195375,46288,37,7,0,A02,2019-07-24,13:47,120486,,레이스커텐,22.0,569.0,unknown,mobile_app
3195376,46288,7,27,0,A02,2019-07-05,10:12,4040256,,험멜,87.0,6443.0,unknown,mobile_app
3195377,46288,74,17,0,A02,2019-08-10,11:08,1017451,,바자르커텐,55.0,4574.0,unknown,mobile_app
3195378,46288,37,11,0,A02,2019-07-24,13:48,179748,,레이스암막커튼,22.0,569.0,unknown,mobile_app
3195379,46288,29,1,0,A02,2019-07-20,11:36,398061,,겨울패딩,33.0,689.0,unknown,mobile_app
3195380,46288,91,1,0,A02,2019-08-19,11:53,55313,,깨끗한나라키친타올,5.0,72.0,unknown,mobile_app
3195381,46288,79,4,0,A02,2019-08-14,11:57,516871,,dhc,21.0,813.0,unknown,mobile_app
3195382,46288,7,24,0,A02,2019-07-05,10:11,3992594,,험멜,87.0,6443.0,unknown,mobile_app
3195383,46288,74,52,0,A02,2019-08-10,12:04,4384136,,바자르커텐,55.0,4574.0,unknown,mobile_app
3195384,46288,103,2,0,A02,2019-08-26,11:49,293809,,역시즌,28.0,753.0,unknown,mobile_app


In [23]:
df_action.loc[df_action['clnt_id'] == 46288, 'sech_kwd'].unique()

array(['레이스커텐', '험멜', '바자르커텐', '레이스암막커튼', '겨울패딩', '깨끗한나라키친타올', 'dhc',
       '역시즌', '올리타리아', '마마인하우스by박홍근', '험멜벤치코트', '바자르커텐 뉴웨이브', '역시즌패딩',
       '모르간', '삼성냉장고', '더블구스코트', '덴트릭스', '냉장고 4도어', '지나송', '박홍근 밍크이불',
       '금산인삼삼계탕', '지나송블리스', '마마인인견', '바자르커튼뉴웨이브', '여성 트렌치코트', '여성반바지5부',
       '지나송 암마꺼튼', '커튼', '폭스퍼야상', '여성 반바지', '2018험멜 벤치코트', 'dhc 화장품',
       '트렌치코트', '바로톡흐는곳 바로톡작성', '벤치코트', '비비고삼계탕', '바자르커텐 타이백', '우산',
       '레이스암마꺼튼', '모르간 트렌치코트', '암막커텐 세트', '바로톡작성하는곳', '레이스암마커튼',
       '삼성냉장고 t9000', '하성아카시아벌꿀', '송지나', 'lg냉장고', '벨라웨딩커튼 지나송', '폭스퍼',
       '삼성4도어냉장고', '삼계탕', '인디핑크', '지나송웨딩로망', '지나송로망', '종가집 열무김치',
       '벨라웨딩커튼', '하성벌꿀', '4시간특가', '커튼타슬', '폭스벤치코트', '콜마', '콜마 화장품',
       '바로톡흐는곳', '커튼타슬 백'], dtype=object)

In [24]:
df_transaction.loc[df_transaction['clnt_id'] == 46288]

Unnamed: 0,clnt_id,trans_id,trans_seq,biz_unit,pd_c,de_dt,de_tm,buy_am,buy_ct,clac_nm1,clac_nm2,clac_nm3
581854,46288,72995,1,A02,777,2019-08-10,12:07,33500,1,Home Decor / Lighting,Curtains / Blinds,Curtains
581855,46288,81493,1,A02,196,2019-08-21,17:57,43900,1,Chilled Foods,Packaged Side Dishes,
581856,46288,92912,1,A02,312,2019-09-04,15:24,59750,1,Cosmetics / Beauty Care,Makeup,Eyebrow
581857,46288,60905,1,A02,64,2019-07-26,11:58,76900,1,Bedding / Handicraft,Adults' Bedding,Adults' Bedding Sets
581858,46288,53545,1,A02,64,2019-07-17,16:02,76900,1,Bedding / Handicraft,Adults' Bedding,Adults' Bedding Sets
581859,46288,104679,1,A02,64,2019-09-19,15:53,72820,1,Bedding / Handicraft,Adults' Bedding,Adults' Bedding Sets
581860,46288,41599,1,A02,981,2019-07-03,18:03,51200,1,Meats,Processed Meats,Processed Meats for Ham
581861,46288,46880,1,A02,64,2019-07-09,21:05,71910,1,Bedding / Handicraft,Adults' Bedding,Adults' Bedding Sets


In [25]:
len(df_action['sech_kwd'].unique())

101952

In [26]:
df_action['sech_kwd'].value_counts()

우유               8985
두부               5210
계란               5039
생수               4283
수박               2694
                 ... 
들기름 320             1
여름골프웨어              1
볶음깨                 1
리바이스오리지널남성청바지       1
나트라케어 팬티라이너         1
Name: sech_kwd, Length: 101952, dtype: int64
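
The counts above suggest a long tail: a few staple keywords dominate while many appear only once. A minimal sketch of how that tail could be quantified is shown below, using a made-up keyword column in place of `df_action['sech_kwd']`; the choice of `k` is arbitrary.

```python
import pandas as pd

# Toy keyword column standing in for df_action['sech_kwd'] (made-up values).
kwd = pd.Series(["milk", "milk", "milk", "tofu", "tofu",
                 "eggs", "sesame oil", "golf wear"])

counts = kwd.value_counts()

# Share of distinct keywords that were searched exactly once.
singleton_share = (counts == 1).mean()

# Fraction of all searches covered by the k most frequent keywords.
k = 2
top_k_coverage = counts.head(k).sum() / counts.sum()

print(f"distinct keywords: {counts.size}")
print(f"searched once: {singleton_share:.0%}")
print(f"top-{k} coverage: {top_k_coverage:.1%}")
```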

In [27]:
df_product.loc[df_product['clac_nm3'] == 'Fresh Milk']

Unnamed: 0,pd_c,clac_nm1,clac_nm2,clac_nm3
346,347,Dairy Products,Milk,Fresh Milk


In [30]:
df_transaction.loc[df_transaction['pd_c'] == '0347']

Unnamed: 0,clnt_id,trans_id,trans_seq,biz_unit,pd_c,de_dt,de_tm,buy_am,buy_ct,clac_nm1,clac_nm2,clac_nm3
198,23275,104855,3,A03,0347,2019-09-20,09:34,4790,1,Dairy Products,Milk,Fresh Milk
310,72091,105063,5,A03,0347,2019-09-20,15:00,4780,1,Dairy Products,Milk,Fresh Milk
311,72091,105063,8,A03,0347,2019-09-20,15:00,2400,1,Dairy Products,Milk,Fresh Milk
312,68923,104942,12,A03,0347,2019-09-20,11:03,4790,1,Dairy Products,Milk,Fresh Milk
313,28522,105027,7,A03,0347,2019-09-20,13:54,4790,1,Dairy Products,Milk,Fresh Milk
...,...,...,...,...,...,...,...,...,...,...,...,...
594079,20602,66431,2,A01,0347,2019-08-02,06:40,34000,1,Dairy Products,Milk,Fresh Milk
594119,4625,82920,1,A01,0347,2019-08-23,10:03,21900,1,Dairy Products,Milk,Fresh Milk
595475,5432,92644,1,A01,0347,2019-09-04,13:29,59900,1,Dairy Products,Milk,Fresh Milk
595965,46078,112683,1,A01,0347,2019-09-30,13:20,18900,1,Dairy Products,Milk,Fresh Milk


In [31]:
df_transaction.loc[df_transaction['clnt_id'] == 37474]

Unnamed: 0,clnt_id,trans_id,trans_seq,biz_unit,pd_c,de_dt,de_tm,buy_am,buy_ct,clac_nm1,clac_nm2,clac_nm3
5440,37474,107384,1,A03,0572,2019-09-23,11:23,7790,1,Fruits,Imported Fruits,Kiwi
5518,37474,107384,9,A03,0670,2019-09-23,11:23,13900,1,Grains,Rice,Rice
5764,37474,107384,4,A03,0172,2019-09-23,11:23,1000,1,Chilled Foods,Chilled Beverages,Chilled Fruit and Vegetable Beverages
5815,37474,107384,6,A03,1213,2019-09-23,11:23,1290,1,Snack Foods,Snacks,General Snacks
5816,37474,107384,7,A03,1213,2019-09-23,11:23,1290,1,Snack Foods,Snacks,General Snacks
5817,37474,107384,8,A03,1213,2019-09-23,11:23,1080,1,Snack Foods,Snacks,General Snacks
5818,37474,107384,2,A03,1213,2019-09-23,11:23,1290,1,Snack Foods,Snacks,General Snacks
5819,37474,107384,5,A03,1215,2019-09-23,11:23,1260,1,Snack Foods,Snacks,Potato Snacks
5840,37474,107384,3,A03,0113,2019-09-23,11:23,900,1,Beverages,Water,Sparkling Water
5842,37474,107384,10,A03,0112,2019-09-23,11:23,1000,1,Beverages,Tea Drinks,Korean Traditional Tea Drinks


In [32]:
df_action.loc[df_action['clnt_id'] == 37474]

Unnamed: 0,clnt_id,sess_id,hit_seq,action_type,biz_unit,sess_dt,hit_tm,hit_pss_tm,trans_id,sech_kwd,tot_pag_view_ct,tot_sess_hr_v,trfc_src,dvc_ctg_nm
2682293,37474,11,38,0,A03,2019-07-12,10:36,1794100,,팬티라이너,42.0,1997.0,DIRECT,
2682294,37474,20,26,0,A03,2019-08-10,12:17,854942,,오이,36.0,1205.0,DIRECT,
2682295,37474,26,43,0,A03,2019-09-11,11:14,1461359,,공기대접,55.0,1695.0,DIRECT,
2682296,37474,33,2,0,A03,2019-09-16,22:06,25442,,한우물볶음밥,6.0,98.0,DIRECT,
2682297,37474,29,3,0,A03,2019-09-15,17:24,139022,,생수,33.0,2083.0,DIRECT,
2682298,37474,29,5,0,A03,2019-09-15,17:34,724005,,생수,33.0,2083.0,DIRECT,
2682299,37474,3,37,0,A03,2019-07-01,11:13,1879299,,장조림용소고기,44.0,2470.0,DIRECT,
2682300,37474,1,16,0,A03,2019-07-01,02:24,625772,,장조림용,13.0,673.0,DIRECT,
2682301,37474,38,3,0,A03,2019-09-23,11:20,137186,,육포,23.0,342.0,DIRECT,
2682302,37474,26,39,0,A03,2019-09-11,11:13,1419140,,종이컵,55.0,1695.0,DIRECT,
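
Since the notebook imports `scipy.sparse` and `implicit`, a likely next step is to turn per-client interactions into a sparse user-item matrix. The sketch below is an assumption about how that could look, not the author's code: it uses a made-up purchase table and `buy_ct` as the interaction strength.

```python
import pandas as pd
import scipy.sparse as sparse

# Toy purchases standing in for df_transaction (made-up values).
purchases = pd.DataFrame({
    "clnt_id": [37474, 37474, 37474, 46288, 46288],
    "pd_c": ["1213", "1213", "0572", "0064", "0777"],
    "buy_ct": [1, 1, 1, 1, 1],
})

# Map raw ids to contiguous row/column indices via categoricals.
users = purchases["clnt_id"].astype("category")
items = purchases["pd_c"].astype("category")

# Duplicate (row, col) pairs are summed when the COO matrix is converted to CSR.
user_item = sparse.coo_matrix(
    (
        purchases["buy_ct"].to_numpy(dtype=float),
        (users.cat.codes.to_numpy(), items.cat.codes.to_numpy()),
    ),
    shape=(users.cat.categories.size, items.cat.categories.size),
).tocsr()

print(user_item.toarray())
```

Rows follow `users.cat.categories` and columns follow `items.cat.categories`, so those two indexes are what map matrix positions back to `clnt_id` and `pd_c`. Before fitting a model, check whether the installed version of `implicit` expects a user-item or an item-user matrix.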
