## Brand Recommender based on an implicit ALS model

To go through the code, point `path` at the appropriate data file and press Shift+Enter to run each cell. Some sample outputs are at the bottom of the page, and I have also attached a CSV file with these predictions. I ran the recommender on 4,185 brands out of the total; the result is the output called `list1`. The numerical value is the confidence that the brand is a match to the first brand in each list, which is the query brand itself and therefore always has a confidence of 1. One property I noticed with ALS is that the confidence tends to increase as a brand's value count decreases. Ideally I would discard brands whose value counts fall below a certain threshold.
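The threshold idea could be sketched as follows. This is an illustration only: `drop_rare_brands` and `min_count` are hypothetical names, the toy frame stands in for the real data, and a sensible threshold would have to be chosen by inspecting `df.name.value_counts()`.

```python
import pandas as pd

def drop_rare_brands(dataframe, min_count):
    # Count how many purchase rows each brand name has
    counts = dataframe['name'].value_counts()
    # Keep only the names that reach the threshold
    keep = counts[counts >= min_count].index
    return dataframe[dataframe['name'].isin(keep)]

# Toy data: brand A has 3 rows, B has 1, C has 2
toy = pd.DataFrame({'name': ['A', 'A', 'A', 'B', 'C', 'C']})
filtered = drop_rare_brands(toy, min_count=2)
```

Applying a filter like this before building the sparse matrix would keep rarely purchased brands out of the model entirely, rather than hiding them after the fact.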




In [73]:
import pandas as pd
import numpy as np
import scipy
from scipy.sparse import coo_matrix
import implicit

In [74]:
#Read in the data file
path = 'brands_filtered.txt'
df = pd.read_table(path, sep='\t')

Set up a 'values' column that assigns a 1 to each purchase of a brand by a user. Because there are many data points, I build a sparse matrix with coo_matrix, which stores only the nonzero entries and so takes up little memory.

In [75]:
def create_sparse(dataframe):
    dataframe['values'] = 1
    shopping_profile_id_u = list(np.sort(dataframe.shopping_profile_id.unique())) #Create sorted profile list
    brand_id_u = list(np.sort(dataframe.brand_id.unique())) #Create sorted brand list
    data = dataframe['values'].astype(float).tolist() #Create list of all the 1's I assigned for a purchase
    #Create row and column objects to build the sparse matrix with.
    #Each id is mapped to its position in the sorted unique list.
    row = pd.Categorical(dataframe.brand_id, categories=brand_id_u).codes
    col = pd.Categorical(dataframe.shopping_profile_id, categories=shopping_profile_id_u).codes
    sparse = coo_matrix((data, (row, col)), shape=(len(brand_id_u), len(shopping_profile_id_u)))
    return sparse

In [76]:
sparse_matrix = create_sparse(df) #Create the sparse matrix
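To make the layout of the matrix concrete, here is a small sketch on made-up data (the ids are invented for illustration). Rows are brands, columns are shopping profiles, and `pd.Categorical` is used to turn each id into its position in the sorted unique list. Repeated purchases of the same brand by the same profile are stored as separate entries and are added together when the matrix is densified.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Toy purchases: profile 7 buys brand 10 twice
toy = pd.DataFrame({
    'brand_id': [10, 10, 20, 30, 10],
    'shopping_profile_id': [7, 8, 7, 9, 7],
})
brands = np.sort(toy.brand_id.unique())               # [10, 20, 30]
profiles = np.sort(toy.shopping_profile_id.unique())  # [7, 8, 9]

# Position of each id in its sorted unique list
row = pd.Categorical(toy.brand_id, categories=brands).codes
col = pd.Categorical(toy.shopping_profile_id, categories=profiles).codes
data = np.ones(len(toy))  # one 1 per purchase

toy_matrix = coo_matrix((data, (row, col)), shape=(len(brands), len(profiles)))
dense = toy_matrix.toarray()  # duplicate (row, col) pairs are summed here
```

In this toy case the cell for brand 10 and profile 7 ends up as 2, so the matrix effectively holds purchase counts rather than plain 0/1 flags.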

When fitting the model I would normally run some form of hyperparameter search to fine-tune the algorithm, but due to time constraints I stuck to the defaults.

In [77]:
def fit_model(sparse):
    model = implicit.als.AlternatingLeastSquares(factors=100)
    print("Fitting the model... \n")
    model.fit(sparse) #Matrix rows are brands, columns are shopping profiles
    print("Done!")
    return model

In [78]:
model = fit_model(sparse_matrix)

Fitting the model... 

Done!
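A hyperparameter search was not run here, but its scaffolding might look roughly like the sketch below. Everything in it is an assumption for illustration: `hold_out` is a hypothetical helper that splits the stored purchases into train and test parts, and the candidate values in `grid` are placeholders, not tuned choices.

```python
import itertools
import numpy as np
from scipy.sparse import coo_matrix

def hold_out(matrix, fraction=0.2, seed=0):
    """Split the stored entries of a sparse matrix into train and test parts."""
    coo = matrix.tocoo()
    rng = np.random.RandomState(seed)
    # Each stored entry goes to the test part with probability `fraction`
    test_mask = rng.rand(coo.nnz) < fraction

    def part(mask):
        return coo_matrix((coo.data[mask], (coo.row[mask], coo.col[mask])),
                          shape=coo.shape)

    return part(~test_mask), part(test_mask)

# Placeholder candidate values for the search
grid = {
    'factors': [50, 100, 200],
    'regularization': [0.01, 0.1],
    'iterations': [15, 30],
}
# One dict of keyword arguments per combination of values
candidates = [dict(zip(grid, combo))
              for combo in itertools.product(*grid.values())]
```

Each entry in `candidates` could then be passed as keyword arguments to `AlternatingLeastSquares`, fitted on the train part, and compared on the held-out part with a ranking metric of your choice.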


This function takes a brand name supplied by the user and the fitted model, and returns the brands most similar to that brand according to the data.

In [79]:
def predict(brand_name, models):
    unique_brands = np.sort(df.brand_id.unique())
    try:
        b_id = df.at[df[df['name']== brand_name].index[0], 'brand_id'] #get the brand id of the input
    except IndexError:
        print("brand does not exist")
        return [] #no such brand, so there is nothing to look up
    arr_val = np.where(unique_brands == b_id) #get the array position in the sparse matrix of the brand
    related = models.similar_items(arr_val[0][0]) #Feed the value into the similarity calculation of the model
    similar = []
    scores = []
    #Convert the sparse matrix positions back to brand names
    for i in related:
        value = int(i[0])
        scores.append(str(i[1]))
        similar_id = unique_brands[value]        
        similar.append(df.loc[df['brand_id'] == similar_id, 'name'].tolist()[0])
    return list(zip(similar, scores))
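As far as I understand it, the scores returned by `similar_items` are cosine similarities between the latent factor vectors the model learns for each brand, which would explain why the query brand always scores 1. The sketch below reproduces that calculation with plain NumPy on random factors. `cosine_scores` and `toy_factors` are hypothetical names, and the numbers have nothing to do with the fitted model.

```python
import numpy as np

def cosine_scores(item_factors, item_index):
    # Length of each item's latent vector
    norms = np.linalg.norm(item_factors, axis=1)
    # Dot product of the query item with every item, scaled by both lengths
    dots = item_factors.dot(item_factors[item_index])
    return dots / (norms * norms[item_index])

rng = np.random.RandomState(0)
toy_factors = rng.rand(5, 3) + 0.1  # 5 made-up brands, 3 latent factors each

scores = cosine_scores(toy_factors, 2)
top = np.argsort(-scores)[:3]  # positions of the 3 highest scores
```

The query item compared with itself gives a score of 1 (up to rounding), so it always comes first in the ranking, just as in the outputs shown further down.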

In [None]:
unique = df['name'].unique()
list1 = []
for name in unique[:4185]: #Loop over the first 4,185 brand names
    list1.append(predict(name, model))
    

In [17]:
list1

[[('BCBG MAX AZRIA', '1.0'),
  ('MICHAEL Michael Kors', '0.212909063687'),
  ('Steve Madden', '0.202693946502'),
  ('Michael Kors', '0.16452204669'),
  ('BCBGirls', '0.152628821119'),
  ('Tory Burch', '0.145836928809'),
  ('J.Crew', '0.138026052446'),
  ('Isaac Mizrahi', '0.125207715556'),
  ('Monique Lhuillier', '0.123510080443'),
  ('Max Mara', '0.11830716188')],
 [('Marc by Marc Jacobs', '1.0'),
  ('Marc Jacobs', '0.384379272803'),
  ('Burberry', '0.239677836706'),
  ('Chlo\xc3\xa9', '0.212122521656'),
  ('Diane von Furstenberg', '0.198390561418'),
  ('Michael Kors', '0.187503188589'),
  ('Jimmy Choo', '0.183985625036'),
  ('Gucci', '0.180952325264'),
  ('Yves Saint Laurent', '0.17006339849'),
  ('Coach', '0.144591906596')],
 [('Steve Madden', '1.0'),
  ('MICHAEL Michael Kors', '0.25409016262'),
  ('BCBG MAX AZRIA', '0.202693946502'),
  ("Victoria's Secret", '0.175590497529'),
  ('Forever 21', '0.164521141151'),
  ('Christian Louboutin', '0.147495562648'),
  ('Juicy Couture', '0.142

In [82]:
predict('Kate Spade', model)

[('Kate Spade', '1.0'),
 ('MICHAEL Michael Kors', '0.231801115111'),
 ('Tory Burch', '0.228936263349'),
 ('Lilly Pulitzer', '0.206599036834'),
 ('Topshop', '0.187133167932'),
 ('Christian Louboutin', '0.186116382761'),
 ('Longchamp', '0.178716982117'),
 ('Jeffrey Campbell', '0.172909921158'),
 ('Marc by Marc Jacobs', '0.172517979584'),
 ('Diane von Furstenberg', '0.15664884989')]

In [81]:
df.name.value_counts()

Marc by Marc Jacobs      119644
Christian Louboutin      115665
Burberry                 112238
Gucci                    102444
Marc Jacobs               93478
Jimmy Choo                88514
Prada                     86677
Diane von Furstenberg     82491
Yves Saint Laurent        82203
Chloé                     82130
Victoria's Secret         74553
Forever 21                67716
Juicy Couture             61811
BCBG MAX AZRIA            59669
GUESS                     55389
Steve Madden              52627
Michael Kors              51732
Tory Burch                50774
J.Crew                    50752
Calvin Klein              48229
Christian Dior            46688
MICHAEL Michael Kors      45815
Miu Miu                   43811
Fendi                     37837
Manolo Blahnik            36018
Free People               34602
Kate Spade                34039
Coach                     33588
Alexander McQueen         33421
Alexander Wang            32484
                          ...  
Catherin