# Recommendation Engines

''' 3. Recommender (User Similarity)
Our recommender will be a modified KNN collaborative filtering algorithm.
Input: The brands that a given user likes
Output: A set (no repeats) of brand recommendations based on
        similar users' preferences

1. When a user's brands are given to us, we calculate that user's
Jaccard similarity with every user in our brandsfor dictionary

2. We pick the K most similar users and recommend the brands
that they like and that the given user doesn't already like

EXAMPLE:
Given User likes ['Target', 'Old Navy', 'Banana Republic', 'H&M']
Outputs: ['Forever 21', 'Gap', 'Steve Madden']
'''
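The two steps above can be sketched end to end on a tiny example. This is a minimal sketch, not the notebook's final code: the `recommend` helper, the `toy_brandsfor` dictionary and the user keys `'a'`, `'b'`, `'c'` are hypothetical and are not taken from `user_brand.csv`.

```python
def jaccard(first, second):
    """Jaccard similarity of two collections, treated as sets."""
    first, second = set(first), set(second)
    union = first | second
    # Assumption: two empty inputs count as similarity 0.0
    return len(first & second) / len(union) if union else 0.0


def recommend(liked, brands_by_user, k=2):
    """Hypothetical helper: brands liked by the k most similar users
    that the input user does not already like."""
    liked = set(liked)
    # Step 1: similarity between the input user and every known user
    similarities = {user: jaccard(liked, brands)
                    for user, brands in brands_by_user.items()}
    # Step 2: keep the k most similar users
    neighbours = sorted(similarities, key=similarities.get, reverse=True)[:k]
    # Step 3: pool their brands, then drop what the input user already likes
    pooled = set()
    for user in neighbours:
        pooled |= set(brands_by_user[user])
    return pooled - liked


# Hypothetical toy data
toy_brandsfor = {
    'a': ['Target', 'Old Navy', 'Gap'],
    'b': ['Target', 'Old Navy', 'Banana Republic', 'Steve Madden'],
    'c': ["Kohl's", 'Nordstrom'],
}

result = recommend(['Target', 'Old Navy', 'Banana Republic', 'H&M'],
                   toy_brandsfor, k=2)
print(sorted(result))  # ['Gap', 'Steve Madden']
```

Users `'b'` (similarity 0.6) and `'a'` (similarity 0.4) are the two nearest neighbours here, so the recommendations come from their brands only.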

In [1]:
# 1. Load and Summarise the data
import pandas as pd

# read in the user/brand data
user_brands = pd.read_csv('../../data/user_brand.csv')

# Series of user IDs, note the duplicates
user_ids = user_brands.ID

In [2]:
# Turn the data frame into a dictionary
# where each key is a user ID (as a string) and each value is
# the list of stores that the user "likes"
brandsfor = {str(k): list(v) for k,v in user_brands.groupby("ID")["Store"]}
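To make the shape of `brandsfor` concrete, here is a pandas-free sketch of the same grouping. The `rows` list is hypothetical data standing in for the `ID` and `Store` columns of the CSV, and `toy_brandsfor` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical (ID, Store) rows standing in for user_brand.csv
rows = [
    (83065, "Kohl's"),
    (83065, 'Target'),
    (82983, 'Target'),
    (82983, 'Old Navy'),
]

# Group stores under the string form of each user ID
grouped = defaultdict(list)
for user_id, store in rows:
    grouped[str(user_id)].append(store)

toy_brandsfor = dict(grouped)
print(toy_brandsfor)
# {'83065': ["Kohl's", 'Target'], '82983': ['Target', 'Old Navy']}
```

Each user ID appears once as a key, even though it is repeated across rows in the source table.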


In [3]:
'''
EXERCISE: Complete the jaccard method below.
          It should take in two lists of brands and output the
          Jaccard similarity between them

This should work with any values that can go in a set, for example
jaccard([1,2,3], [2,3,4,5,6]) is 2/6, or about 0.3333

HINT: set1 & set2 is the intersection
      set1 | set2 is the union

'''

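def jaccard(first, second):
    # convert to sets so duplicates are ignored
    first = set(first)
    second = set(second)
    union = first | second
    # two empty inputs have no union; return 0.0 rather than divide by zero
    if not union:
        return 0.0
    return float(len(first & second)) / len(union)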
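A few properties of this measure are worth checking before using it. The sketch below redefines `jaccard` so that it runs on its own; returning 0.0 for two empty inputs is an assumption, and the brand lists `a` and `b` are made up for illustration.

```python
def jaccard(first, second):
    first, second = set(first), set(second)
    union = first | second
    # Assumption: two empty inputs count as similarity 0.0
    return len(first & second) / len(union) if union else 0.0


a = ['Target', 'Gap', 'Gap']   # duplicate on purpose
b = ['Gap', 'Old Navy']

# Symmetric: the order of the arguments does not matter
assert jaccard(a, b) == jaccard(b, a)
# Duplicates are ignored because inputs are converted to sets
assert jaccard(a, b) == jaccard(['Target', 'Gap'], b)
# Identical sets score 1.0, disjoint sets score 0.0
assert jaccard(a, a) == 1.0
assert jaccard(['Target'], ['Old Navy']) == 0.0

print(jaccard(a, b))  # one shared brand out of three distinct brands
```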

In [4]:
# try it out!
print(brandsfor['83065']) # brands for user 83065
print(brandsfor['82983']) # brands for user 82983
jaccard(brandsfor['83065'], brandsfor['82983'])


["Kohl's", 'Target']
['Hanky Panky', 'Betsey Johnson', 'Converse', 'Steve Madden', 'Old Navy', 'Target', 'Nordstrom']


0.125

In [5]:

given_user = ['Target', 'Old Navy', 'Banana Republic', 'H&M']

# similarity between user 83065 and the given user
brandsfor['83065']
jaccard(brandsfor['83065'], given_user)

0.2
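Both similarity values printed so far can be checked by hand from the definition. User 83065 shares only Target with user 82983 (2 + 7 - 1 = 8 distinct brands in total) and only Target with the given user (2 + 4 - 1 = 5 distinct brands):

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad
J(\text{83065}, \text{82983}) = \frac{1}{8} = 0.125, \qquad
J(\text{83065}, \text{given user}) = \frac{1}{5} = 0.2
```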

In [13]:
# similarity between the given user and every user in brandsfor
similarities = {k: jaccard(given_user, v) for k, v in brandsfor.items()}

similarities

K = 5 #number of similar users to look at

In [14]:
# Now for the top K most similar users, let's aggregate the brands they like.
# I sort by Jaccard similarity so the most similar users come first.
# Sorting a dictionary sorts its keys (the user IDs), so I pass
# key=similarities.get to sort on each user's similarity value instead,
# and reverse=True to put the highest similarities on top
most_similar_users = sorted(similarities, key=similarities.get, reverse=True)[:K]
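When only the top K entries are needed, `heapq.nlargest` from the standard library is an alternative to sorting the whole dictionary. The scores in `toy_similarities` below are hypothetical; this is a sketch of the technique, not a change to the notebook's approach.

```python
import heapq

# Hypothetical similarity scores keyed by user ID
toy_similarities = {'u1': 0.2, 'u2': 0.75, 'u3': 0.5, 'u4': 0.6}
K = 2

# Full sort, then slice (the approach used in the cell above)
by_sorting = sorted(toy_similarities, key=toy_similarities.get, reverse=True)[:K]

# Keep only the K largest while scanning the keys once
by_heap = heapq.nlargest(K, toy_similarities, key=toy_similarities.get)

print(by_sorting)  # ['u2', 'u4']
print(by_heap)     # ['u2', 'u4']
```

For a few thousand users the difference is negligible; the heap version mainly helps when the user dictionary is large and K is small.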


In [15]:
# list of K similar users' IDs
most_similar_users

['81012', '84807', '88549', '82970', '91362']

In [16]:
# let's see what the K most similar users like
for user in most_similar_users:
    print(brandsfor[user])

['Banana Republic', 'Old Navy', 'Target']
['Steve Madden', 'Banana Republic', 'Old Navy', 'Target']
['Banana Republic', 'Old Navy', 'Forever 21', 'Target']
['Banana Republic', 'Gap', 'Old Navy', 'Target']
['Banana Republic', 'Gap', 'Old Navy', 'Target']


In [17]:
# Aggregate all brands liked by the K most similar users into a single set
brands_to_recommend = set()
for user in most_similar_users:
    # add each similar user's brands to the set of brands_to_recommend
    brands_to_recommend.update(brandsfor[user])

brands_to_recommend


{'Banana Republic', 'Forever 21', 'Gap', 'Old Navy', 'Steve Madden', 'Target'}

In [18]:
# EXERCISE: use a set difference so brands_to_recommend only has
# brands that given_user hasn't seen yet
brands_to_recommend = brands_to_recommend - set(given_user)
brands_to_recommend

{'Forever 21', 'Gap', 'Steve Madden'}

In [20]:
import collections

# We can take this one step further and calculate a recommendation "score"
# We define the score as the number of the K most similar users
# who like the brand
brands_to_recommend = []
for user in most_similar_users:
    brands_to_recommend += list(set(brandsfor[user]) - set(given_user))

print(brands_to_recommend)
# Use a counter to count the number of times a brand appears
recommend_with_scores = collections.Counter(brands_to_recommend)

# Now we see Gap has the highest score!
recommend_with_scores


['Steve Madden', 'Forever 21', 'Gap', 'Gap']


Counter({'Forever 21': 1, 'Gap': 2, 'Steve Madden': 1})
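Once the scores are in a `Counter`, the recommendations can be ranked directly. The sketch below reuses the brand list printed above; breaking ties alphabetically is an assumption made here for a stable order, not something the notebook specifies.

```python
import collections

scores = collections.Counter(['Steve Madden', 'Forever 21', 'Gap', 'Gap'])

# Highest-scoring brand
top = scores.most_common(1)
print(top)  # [('Gap', 2)]

# Full ranking: highest score first, ties broken alphabetically (assumption)
ranking = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
print(ranking)
# [('Gap', 2), ('Forever 21', 1), ('Steve Madden', 1)]
```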

## 4. Recommender (Item Similarity)

In [None]:
'''
We can also define a similarity between items using Jaccard similarity.
We can say that the similarity between two brands is the Jaccard similarity
between the sets of people who like each brand.

Example: similarity of Gap to Old Navy is:
'''

In [21]:
# IDs of the users who like Gap, and of the users who like Old Navy
gap_lovers = set(user_brands[user_brands.Store == 'Gap'].ID)
old_navy_lovers = set(user_brands[user_brands.Store == 'Old Navy'].ID)

In [22]:
# similarity between Gap and Old Navy
jaccard(gap_lovers, old_navy_lovers)

0.35437212360289283

In [23]:
guess_lovers = set(user_brands[user_brands.Store == 'Guess'].ID)
# similarity between Gap and Guess
jaccard(guess_lovers, gap_lovers)


0.21257750221434898

In [24]:
calvin_lovers = set(user_brands[user_brands.Store == 'Calvin Klein'].ID)
# similarity between Gap and Calvin Klein
jaccard(calvin_lovers, gap_lovers)


0.2068654019873532
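The pairwise comparisons above can be generalised: invert the user-to-brands dictionary into a brand-to-users dictionary, then rank every other brand by its Jaccard similarity to a chosen brand. The following is a minimal sketch on hypothetical data; `toy_brandsfor`, `lovers` and `most_similar_brands` are illustrative names, and the alphabetical tie-break is an assumption.

```python
from collections import defaultdict


def jaccard(first, second):
    first, second = set(first), set(second)
    union = first | second
    # Assumption: two empty inputs count as similarity 0.0
    return len(first & second) / len(union) if union else 0.0


# Hypothetical user -> brands data
toy_brandsfor = {
    'u1': ['Gap', 'Old Navy', 'Target'],
    'u2': ['Gap', 'Old Navy'],
    'u3': ['Gap', 'Guess'],
    'u4': ['Old Navy', 'Target'],
}

# Invert to brand -> set of users who like it
inverted = defaultdict(set)
for user, brands in toy_brandsfor.items():
    for brand in brands:
        inverted[brand].add(user)
lovers = dict(inverted)


def most_similar_brands(brand, lovers, n=2):
    """Hypothetical helper: the n brands whose fans overlap most with `brand`."""
    target = lovers[brand]
    scores = {other: jaccard(target, users)
              for other, users in lovers.items() if other != brand}
    # Highest similarity first, ties broken alphabetically
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:n]


similar_to_gap = most_similar_brands('Gap', lovers)
print(similar_to_gap)  # Old Navy first (0.5), then Guess (1/3)
```

In this toy data Gap and Old Navy share two of four users, so Old Navy ranks first, mirroring the pattern seen in the real data where Old Navy is closer to Gap than Guess or Calvin Klein.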