# RECOMMENDATION SYSTEMS
Before coding recommendation systems, it's useful to consider general recommendation strategies. Imagine being a salesperson recommending items to a customer: if you know them, you can make personalized suggestions based on their preferences. For a new customer, you might base suggestions on what they browse, or, if they haven't browsed yet, recommend popular items. This need to recommend without prior customer knowledge is called the "cold-start problem." In such cases, recommending popular items is a straightforward approach that online retailers also use when they lack visitor data.

In [1]:
#Popularity-Based Recommendations
import pandas as pd
import numpy as np
interaction=pd.read_csv('https://bradfordtuckfield.com/purchasehistory1.csv')
interaction.set_index("Unnamed: 0", inplace = True)
print(interaction)

            user1  user2  user3  user4  user5
Unnamed: 0                                   
item1           1      1      0      1      1
item2           1      0      1      1      0
item3           1      1      0      1      1
item4           1      0      1      0      1
item5           1      1      0      0      1


In [2]:
interaction_withcounts=interaction.copy()
interaction_withcounts.loc[:,'counts']=interaction_withcounts.sum(axis=1)
interaction_withcounts=interaction_withcounts.sort_values(by='counts',ascending=False)
print(list(interaction_withcounts.index))

['item1', 'item3', 'item2', 'item4', 'item5']


In [4]:
def popularity_based(interaction):
  # Rank items from most to least purchased
  interaction_withcounts=interaction.copy()
  interaction_withcounts.loc[:,'counts']=interaction_withcounts.sum(axis=1)
  # Avoid the name "sorted", which would shadow Python's built-in function
  sorted_items=interaction_withcounts.sort_values(by='counts',ascending=False)
  most_popular=list(sorted_items.index)
  return(most_popular)
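
The function above is defined but never called in this section. The following is a minimal usage sketch of the same idea, assuming the toy purchase matrix printed earlier is re-entered by hand. The names `toy` and `popularity_ranking`, and the `kind="stable"` argument, are additions for illustration and are not part of the original code. For a first-time visitor with no history (the cold-start case), the ranked list would simply be shown as-is.

```python
import pandas as pd

# Toy purchase matrix, re-entered by hand from the printed output:
# rows are items, columns are users, 1 means the user bought the item.
toy = pd.DataFrame(
    {
        "user1": [1, 1, 1, 1, 1],
        "user2": [1, 0, 1, 0, 1],
        "user3": [0, 1, 0, 1, 0],
        "user4": [1, 1, 1, 0, 0],
        "user5": [1, 0, 1, 1, 1],
    },
    index=["item1", "item2", "item3", "item4", "item5"],
)

def popularity_ranking(interaction):
    # Count purchases per item (row sums), then rank from most to least bought.
    counts = interaction.sum(axis=1)
    # kind="stable" is an assumption of this sketch: it makes the order of
    # tied items deterministic instead of depending on the sort algorithm.
    return list(counts.sort_values(ascending=False, kind="stable").index)

ranking = popularity_ranking(toy)
print(ranking)
```

Several items tie on purchase count in this toy data, so only the position of the two most-purchased items (item1 and item3) is meaningful; the order among tied items is a tie-breaking detail.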

In [5]:
#Item-Based Collaborative Filtering
#Measuring Vector Similarity
print(list(interaction.loc['item1',:]))

[1, 1, 0, 1, 1]


In [6]:
#Calculating Cosine Similarity
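def dot_product(vector1,vector2):
  # Convert to arrays so that labeled pandas Series are indexed by position
  vector1=np.asarray(vector1)
  vector2=np.asarray(vector2)
  thedotproduct=np.sum([vector1[k]*vector2[k] for k in range(0,len(vector1))])
  return(thedotproduct)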

In [7]:
def vector_norm(vector):
  thenorm=np.sqrt(dot_product(vector,vector))
  return(thenorm)

In [8]:
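def cosine_similarity(vector1,vector2):
  # Cosine of the angle between two vectors: dot product over the product of norms
  thedotproduct=dot_product(vector1,vector2)
  normproduct=vector_norm(vector1)*vector_norm(vector2)
  if normproduct==0:
    # An all-zero vector (no purchases) has no direction; treat it as dissimilar
    return(0.0)
  thecosine=np.round(thedotproduct/normproduct,4)
  return(thecosine)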

In [9]:
item1=interaction.loc['item1',:]
item3=interaction.loc['item3',:]
print(cosine_similarity(item1,item3))

1.0
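
The three helper functions implement the usual definition of cosine similarity, the dot product of two vectors divided by the product of their norms. As a cross-check, the same quantity can be sketched with NumPy's built-in `np.dot` and `np.linalg.norm`. The function name `cosine_similarity_np` and the hand-typed vectors (copied from the toy matrix rows for item1, item3, and item5) are assumptions for illustration, and returning 0.0 for an all-zero vector is a convention chosen here, not something the source specifies.

```python
import numpy as np

def cosine_similarity_np(vector1, vector2):
    # Convert to float arrays so lists and labeled Series are treated alike.
    a = np.asarray(vector1, dtype=float)
    b = np.asarray(vector2, dtype=float)
    denominator = np.linalg.norm(a) * np.linalg.norm(b)
    if denominator == 0:
        # The cosine is undefined for a zero vector; 0.0 is this sketch's convention.
        return 0.0
    return float(np.round(np.dot(a, b) / denominator, 4))

item1 = [1, 1, 0, 1, 1]
item3 = [1, 1, 0, 1, 1]
item5 = [1, 1, 0, 0, 1]

print(cosine_similarity_np(item1, item3))  # identical purchase patterns
print(cosine_similarity_np(item1, item5))  # 3 / (2 * sqrt(3))
```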


In [10]:
#Implementing Item-Based Collaborative Filtering
ouritem='item1'
otherrows=[rowname for rowname in interaction.index if rowname!=ouritem]
otheritems=interaction.loc[otherrows,:]
theitem=interaction.loc[ouritem,:]

In [12]:
similarities=[]
for items in otheritems.index:
  similarities.append(cosine_similarity(theitem,otheritems.loc[items,:]))
otheritems['similarities']=similarities
recommendations = list(otheritems.sort_values(by='similarities',ascending=False).index)
print(recommendations)

['item3', 'item5', 'item2', 'item4']


In [13]:
def get_item_recommendations(interaction,itemname):
  otherrows=[rowname for rowname in interaction.index if rowname!=itemname]
  otheritems=interaction.loc[otherrows,:]
  theitem=list(interaction.loc[itemname,:])
  similarities=[]
  for items in otheritems.index:
    similarities.append(cosine_similarity(theitem,list(otheritems.loc[items,:])))
  otheritems['similarities']=similarities
  return list(otheritems.sort_values(by='similarities',ascending=False).index)
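
The function above compares one item against every other item in a loop. When recommendations are needed for many items, one alternative is to compute all pairwise item similarities in a single matrix product. The sketch below illustrates that idea on the hand-typed toy matrix; it is not the approach used in this section, and `item_similarity_matrix` and `recommend_for` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy purchase matrix re-entered by hand: rows are items, columns are users.
toy = pd.DataFrame(
    [[1, 1, 0, 1, 1],
     [1, 0, 1, 1, 0],
     [1, 1, 0, 1, 1],
     [1, 0, 1, 0, 1],
     [1, 1, 0, 0, 1]],
    index=["item1", "item2", "item3", "item4", "item5"],
    columns=["user1", "user2", "user3", "user4", "user5"],
)

def item_similarity_matrix(interaction):
    values = interaction.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1)
    # Replace zero norms with 1 so items nobody bought get similarity 0
    # instead of causing a division by zero.
    norms[norms == 0] = 1.0
    # Entry (i, j) is the dot product of rows i and j over the product of their norms.
    similarity = (values @ values.T) / np.outer(norms, norms)
    return pd.DataFrame(similarity, index=interaction.index, columns=interaction.index)

def recommend_for(similarity, itemname):
    # Drop the item itself, then rank the remaining items by similarity.
    others = similarity.loc[itemname].drop(itemname)
    return list(others.sort_values(ascending=False, kind="stable").index)

sim = item_similarity_matrix(toy)
recs = recommend_for(sim, "item1")
print(recs)
```

The trade-off is memory: the full matrix has one entry per pair of items, which is small for a few hundred artists but grows quadratically with catalog size.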

In [14]:
#User-Based Collaborative Filtering
user2=interaction.loc[:,'user2']
user5=interaction.loc[:,'user5']
print(cosine_similarity(user2,user5))

0.866


In [15]:
user3=interaction.loc[:,'user3']
user5=interaction.loc[:,'user5']
print(cosine_similarity(user3,user5))

0.3536


In [16]:
def get_similar_users(interaction,username):
  othercolumns=[columnname for columnname in interaction.columns if columnname!=username]
  # Work on an explicit copy so that adding a row below does not trigger SettingWithCopyWarning
  otherusers=interaction[othercolumns].copy()
  theuser=list(interaction[username])
  similarities=[]
  for users in otherusers.columns:
    similarities.append(cosine_similarity(theuser,list(otherusers.loc[:,users])))
  # Store the similarities as an extra row, then order the columns (users) by that row
  otherusers.loc['similarities',:]=similarities
  return list(otherusers.sort_values(by='similarities',axis=1,ascending=False).columns)

In [17]:
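def get_user_recommendations(interaction,username):
  # Find the single most similar user
  similar_users=get_similar_users(interaction,username)
  purchase_history=interaction[similar_users[0]]
  # Items bought by the most similar user
  purchased=list(purchase_history.loc[purchase_history==1].index)
  # Items already bought by the target user
  purchased2=list(interaction.loc[interaction[username]==1,:].index)
  # Recommend what the similar user bought and the target user has not
  recs=sorted(list(set(purchased) - set(purchased2)))
  return(recs)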

In [18]:
get_user_recommendations(interaction,'user2')

['item4']
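
To make the steps inside `get_user_recommendations` visible, the sketch below traces them for user2 on the hand-typed toy matrix: rank the other users by cosine similarity, pick the closest one, and keep the items that neighbor bought but user2 did not. It stores the similarities in a separate Series rather than appending a row to the DataFrame, which is a design choice of this sketch only; `user_similarities` and `recommend_from_nearest_user` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy purchase matrix re-entered by hand: rows are items, columns are users.
toy = pd.DataFrame(
    [[1, 1, 0, 1, 1],
     [1, 0, 1, 1, 0],
     [1, 1, 0, 1, 1],
     [1, 0, 1, 0, 1],
     [1, 1, 0, 0, 1]],
    index=["item1", "item2", "item3", "item4", "item5"],
    columns=["user1", "user2", "user3", "user4", "user5"],
)

def user_similarities(interaction, username):
    # Cosine similarity between the target user's column and every other column.
    target = interaction[username].to_numpy(dtype=float)
    scores = {}
    for other in interaction.columns:
        if other == username:
            continue
        candidate = interaction[other].to_numpy(dtype=float)
        denominator = np.linalg.norm(target) * np.linalg.norm(candidate)
        # Users with no purchases get similarity 0.0 (a convention of this sketch).
        scores[other] = 0.0 if denominator == 0 else float(target @ candidate / denominator)
    return pd.Series(scores).sort_values(ascending=False, kind="stable")

def recommend_from_nearest_user(interaction, username):
    nearest = user_similarities(interaction, username).index[0]
    bought_by_nearest = set(interaction.loc[interaction[nearest] == 1].index)
    bought_by_user = set(interaction.loc[interaction[username] == 1].index)
    # Items the nearest user bought that the target user has not.
    return sorted(bought_by_nearest - bought_by_user)

similarities = user_similarities(toy, "user2")
print(similarities.round(4))
print(recommend_from_nearest_user(toy, "user2"))
```

On the toy data, user5 is the closest neighbor of user2, and the only item user5 bought that user2 lacks is item4, which agrees with the output shown above.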

In [19]:
#Case Study: Music Recommendations
import pandas as pd
lastfm = pd.read_csv("https://bradfordtuckfield.com/lastfm-matrix-germany.csv")
print(lastfm.head())

   user  a perfect circle  abba  ac/dc  adam green  aerosmith  afi  air  \
0     1                 0     0      0           0          0    0    0   
1    33                 0     0      0           1          0    0    0   
2    42                 0     0      0           0          0    0    0   
3    51                 0     0      0           0          0    0    0   
4    62                 0     0      0           0          0    0    0   

   alanis morissette  alexisonfire  ...  timbaland  tom waits  tool  \
0                  0             0  ...          0          0     0   
1                  0             0  ...          0          0     0   
2                  0             0  ...          0          0     0   
3                  0             0  ...          0          0     0   
4                  0             0  ...          0          0     0   

   tori amos  travis  trivium  u2  underoath  volbeat  yann tiersen  
0          0       0        0   0          0        

In [20]:
lastfm.drop(['user'],axis=1,inplace=True)

In [21]:
lastfmt=lastfm.T

In [22]:
print(lastfmt.shape)

(285, 1257)


In [23]:
get_item_recommendations(lastfmt,'abba')[0:10]

['madonna',
 'robbie williams',
 'elvis presley',
 'michael jackson',
 'queen',
 'the beatles',
 'kelly clarkson',
 'groove coverage',
 'duffy',
 'mika']

In [24]:
print(get_user_recommendations(lastfmt,0)[0:3])

['billy talent', 'bob marley', 'die toten hosen']



Summary: I discussed recommendation systems, beginning with popularity-based models to illustrate how to suggest trending items and bestsellers. I then explored collaborative filtering, covering how to measure item and customer similarity and how to apply these measures to make item-based and user-based recommendations. I presented a case study using collaborative-filtering code to generate recommendations for a music-streaming service. I concluded with advanced considerations, such as alternative approaches and additional data sources that could enhance recommendations.