# Recommender Systems

"A recommender system, or a recommendation system (sometimes replacing system with terms such as platform, engine, or algorithm), is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may offer."

## User-Based Collaborative Filtering

- Build a matrix of things each user bought/viewed/rated.

- Compute similarity scores between users

- Find users similar to you

- Recommend stuff they bought/viewed/rated that you haven't
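The four steps above can be sketched on a tiny, made-up rating matrix. The data, the function name `recommend_user_based`, and the neighbour count `k` are all assumptions for illustration, not part of the dataset used later in this notebook.

```python
import numpy as np
import pandas as pd

# Step 1: toy user x item rating matrix (hypothetical data); NaN means "not rated".
ratings_matrix = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, np.nan],
        "Movie B": [4.0, 5.0, 2.0, 5.0],
        "Movie C": [1.0, 2.0, 5.0, 1.0],
        "Movie D": [np.nan, 5.0, 1.0, 4.0],
    },
    index=["you", "ann", "bob", "cat"],
)


def recommend_user_based(matrix, user, k=2):
    """Suggest unseen items using the k users most similar to `user`."""
    # Step 2: user-user similarity (Pearson correlation over co-rated items).
    user_sims = matrix.T.corr()[user].drop(user)
    # Step 3: keep the k most similar users.
    neighbours = user_sims.nlargest(k).index
    # Step 4: average the neighbours' ratings for items `user` has not rated.
    unseen = matrix.columns[matrix.loc[user].isna()]
    return matrix.loc[neighbours, unseen].mean().sort_values(ascending=False)


recommendations = recommend_user_based(ratings_matrix, "you")
print(recommendations)
```

In this toy data, `cat` and `ann` are the nearest neighbours of `you`, so the only unseen item (`Movie D`) is scored with the mean of their two ratings.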

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcR5KKqpokcJyrw_g1aWT70JcQJhuSnXZnar1A&s)

### Problems with User-Based Collaborative Filtering

- People are fickle (tastes change)

- There are usually many more people than things (finding similarities between people is a bigger computational problem than finding similarities between items)

- People do bad things, such as fabricating fake personas to skew the recommendations (a shilling attack)

Advice: base your system on data from people who spent money (e.g., purchases rather than mere views)

## Item-Based Collaborative Filtering

Recommendations are based on relationships between things (instead of people), that is, on similarities between items.

- A thing will stay the same (e.g., a movie will not change)

- There are usually fewer things than people (less computation to do)

- Harder to game the system

<img src="https://miro.medium.com/v2/resize:fit:1400/1*3ALliiz9hG79_2xopzgyrQ.png" alt="Drawing" style="width: 550px;"/>

One way of doing it:

- Find every pair of movies that were watched by the same person.

- Measure the similarity of their ratings across all users who watched both

- Sort by movie, then by similarity strength
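As a rough sketch of these three steps, the snippet below works on a small, invented long-format table (one row per user/movie rating). The column names `movie_1`, `movie_2`, `similarity`, and `n_users` are illustrative choices, and Pearson correlation is only one possible similarity measure.

```python
import pandas as pd

# Toy long-format ratings (hypothetical data), one row per (user, movie) rating.
toy = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
        "title": ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B"],
        "rating": [5, 4, 1, 4, 5, 2, 1, 2, 5, 2, 1],
    }
)

# Step 1: self-join on user_id to get every pair of movies rated by the same person.
pairs = toy.merge(toy, on="user_id", suffixes=("_1", "_2"))
pairs = pairs[pairs["title_1"] < pairs["title_2"]]  # keep each unordered pair once

# Step 2: correlate the two rating columns across the users who rated both movies.
rows = []
for (movie_1, movie_2), group in pairs.groupby(["title_1", "title_2"]):
    rows.append(
        {
            "movie_1": movie_1,
            "movie_2": movie_2,
            "similarity": group["rating_1"].corr(group["rating_2"]),
            "n_users": len(group),  # how many users the score is based on
        }
    )

# Step 3: sort by movie, then by similarity strength.
pair_sims = (
    pd.DataFrame(rows)
    .sort_values(["movie_1", "similarity"], ascending=[True, False])
    .reset_index(drop=True)
)
print(pair_sims)
```

Keeping `n_users` next to each score makes it easy to discard pairs that are supported by only a handful of shared raters.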

# Movie Similarities

In [14]:
import pandas as pd
import numpy as np

In [10]:
r_cols = ['user_id','movie_id','rating']
ratings = pd.read_csv('.\\MLCourse\\ml-100k\\u.data', sep = "\t", names = r_cols, usecols=range(3))

m_cols = ['movie_id','title']
movies = pd.read_csv('.\\MLCourse\\ml-100k\\u.item', sep = "|", names = m_cols, usecols=range(2), encoding='latin-1')

ratings = pd.merge(movies,ratings)
ratings.shape


(100003, 4)

In [11]:
ratings

,movie_id,title,user_id,rating
0,1,Toy Story (1995),308,4
1,1,Toy Story (1995),287,5
2,1,Toy Story (1995),148,4
3,1,Toy Story (1995),280,4
4,1,Toy Story (1995),66,3
...,...,...,...,...
99998,1678,Mat' i syn (1997),863,1
99999,1679,B. Monkey (1998),863,3
100000,1680,Sliding Doors (1998),863,2
100001,1681,You So Crazy (1994),896,3


In [12]:
ratings['rating'].describe()

count    100003.000000
mean          3.529864
std           1.125704
min           1.000000
25%           3.000000
50%           4.000000
75%           4.000000
max           5.000000
Name: rating, dtype: float64

In [17]:
len(np.unique(ratings['movie_id'])), len(np.unique(ratings['title']))

(1682, 1664)

In [18]:
len(np.unique(ratings['user_id']))

944

Pivot the data into a user-by-movie matrix, a format that is useful for finding correlations between users and between movies.

In [13]:
movieRatings = ratings.pivot_table(index=['user_id'], columns=['title'], values = 'rating')
movieRatings

title,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,,,,,,,,,,,...,,,,,,,,,,
1,,,2.0,5.0,,,3.0,4.0,,,...,,,,5.0,3.0,,,,4.0,
2,,,,,,,,,1.0,,...,,,,,,,,,,
3,,,,,2.0,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
939,,,,,,,,,,,...,,,,,,,,,,
940,,,,,,,,,,,...,,,,,,,,,,
941,,,,,,,,,,,...,,,,,,,,,,
942,,,,,,,,3.0,,3.0,...,,,,,,,,,,


In [19]:
movieRatings.shape

(944, 1664)

In [21]:
starwarsRating = movieRatings['Star Wars (1977)']
starwarsRating.describe()

count    584.000000
mean       4.359589
std        0.880985
min        1.000000
25%        4.000000
50%        5.000000
75%        5.000000
max        5.000000
Name: Star Wars (1977), dtype: float64

In [22]:
#pairwise correlation
similarMovies = movieRatings.corrwith(starwarsRating)
similarMovies = similarMovies.dropna()
df = pd.DataFrame(similarMovies)
df


title,0
'Til There Was You (1997),0.872872
1-900 (1994),-0.645497
101 Dalmatians (1996),0.211132
12 Angry Men (1957),0.184289
187 (1997),0.027398
...,...
Young Guns (1988),0.186377
Young Guns II (1990),0.228615
"Young Poisoner's Handbook, The (1995)",-0.007374
Zeus and Roxanne (1997),0.818182


In [28]:
similarMovies.sort_values(ascending = False)

title
Hollow Reed (1996)            1.0
Commandments (1997)           1.0
Cosi (1996)                   1.0
No Escape (1994)              1.0
Stripes (1981)                1.0
                             ... 
For Ever Mozart (1996)       -1.0
Frankie Starlight (1995)     -1.0
I Like It Like That (1994)   -1.0
American Dream (1990)        -1.0
Theodore Rex (1995)          -1.0
Length: 1410, dtype: float64

Perfect correlations (exactly 1.0 or -1.0) between movies are not plausible.

They most likely come from a very small number of users who watched both Star Wars and some other, obscure film.

Do we really want our recommendation system to treat a score based on so few users as a perfect match?

In [46]:
# for example:
movieRatings[["Hollow Reed (1996)",'Star Wars (1977)']].dropna()

user_id,Hollow Reed (1996),Star Wars (1977)
655,4.0,4.0
662,2.0,3.0
782,2.0,3.0


In [47]:
# for example:
movieRatings[["Commandments (1997)",'Star Wars (1977)']].dropna()

user_id,Commandments (1997),Star Wars (1977)
57,3.0,5.0
345,3.0,5.0
782,2.0,3.0


In [48]:
# for example:
movieRatings[["Theodore Rex (1995)",'Star Wars (1977)']].dropna()

user_id,Theodore Rex (1995),Star Wars (1977)
1,1.0,5.0
276,1.0,5.0
648,1.0,5.0
847,3.0,4.0


So it is important to set a minimum number of people who must have rated a movie before its similarity score is trusted.
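The perfect scores can be reproduced by hand from the shared ratings listed in the three tables above. The sketch below copies those values and computes the Pearson correlation with NumPy; the `min_shared` threshold and the `results` dictionary are hypothetical names added for illustration.

```python
import numpy as np

# Ratings copied from the three tables above: (other movie, Star Wars).
overlaps = {
    "Hollow Reed (1996)": ([4.0, 2.0, 2.0], [4.0, 3.0, 3.0]),
    "Commandments (1997)": ([3.0, 3.0, 2.0], [5.0, 5.0, 3.0]),
    "Theodore Rex (1995)": ([1.0, 1.0, 1.0, 3.0], [5.0, 5.0, 5.0, 4.0]),
}

min_shared = 100  # hypothetical threshold on the number of shared raters

results = {}
for title, (other, star_wars) in overlaps.items():
    r = float(np.corrcoef(other, star_wars)[0, 1])  # Pearson correlation
    n = len(other)                                  # users who rated both movies
    results[title] = (round(r, 6), n)
    trusted = n >= min_shared
    print(f"{title}: r = {r:+.3f} from {n} users, trusted = {trusted}")
```

With three or four points that happen to lie on a straight line, the correlation is exactly +1 or -1 up to rounding, even though it says very little about the movies.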

In [49]:
# count how many times each film has been rated and compute its mean rating
movieStats = ratings.groupby('title').agg({'rating': ['size', 'mean']})
movieStats


title,rating (size),rating (mean)
'Til There Was You (1997),9,2.333333
1-900 (1994),5,2.600000
101 Dalmatians (1996),109,2.908257
12 Angry Men (1957),125,4.344000
187 (1997),41,3.024390
...,...,...
Young Guns II (1990),44,2.772727
"Young Poisoner's Handbook, The (1995)",41,3.341463
Zeus and Roxanne (1997),6,2.166667
unknown,9,3.444444


Filtering data for movies rated by at least 100 people

In [52]:
popularMovies = movieStats['rating']['size'] >=100
movieStats[popularMovies].sort_values([('rating','mean')], ascending = False) #top movies

title,rating (size),rating (mean)
"Close Shave, A (1995)",112,4.491071
Schindler's List (1993),298,4.466443
"Wrong Trousers, The (1993)",118,4.466102
Casablanca (1942),243,4.456790
"Shawshank Redemption, The (1994)",283,4.445230
...,...,...
Spawn (1997),143,2.615385
Event Horizon (1997),127,2.574803
Crash (1996),128,2.546875
Jungle2Jungle (1997),132,2.439394


In [81]:
df = movieStats[popularMovies].droplevel(0,axis=1).join(pd.DataFrame(similarMovies, columns=['similarity'])) #similarity to Star Wars
df

title,size,mean,similarity
101 Dalmatians (1996),109,2.908257,0.211132
12 Angry Men (1957),125,4.344000,0.184289
2001: A Space Odyssey (1968),259,3.969112,0.230884
Absolute Power (1997),127,3.370079,0.085440
"Abyss, The (1989)",151,3.589404,0.203709
...,...,...,...
Willy Wonka and the Chocolate Factory (1971),326,3.631902,0.221902
"Wizard of Oz, The (1939)",246,4.077236,0.266335
"Wrong Trousers, The (1993)",118,4.466102,0.216204
Young Frankenstein (1974),200,3.945000,0.192589


In [82]:
df.sort_values(['similarity'], ascending=False)

title,size,mean,similarity
Star Wars (1977),584,4.359589,1.000000
"Empire Strikes Back, The (1980)",368,4.206522,0.748353
Return of the Jedi (1983),507,4.007890,0.672556
Raiders of the Lost Ark (1981),420,4.252381,0.536117
Austin Powers: International Man of Mystery (1997),130,3.246154,0.377433
...,...,...,...
"Edge, The (1997)",113,3.539823,-0.127167
As Good As It Gets (1997),112,4.196429,-0.130466
Crash (1996),128,2.546875,-0.148507
G.I. Jane (1997),175,3.360000,-0.176734


# Item-Based Collaborative Filtering

In [83]:
movieRatings.head()

title,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,,,,,,,,,,,...,,,,,,,,,,
1,,,2.0,5.0,,,3.0,4.0,,,...,,,,5.0,3.0,,,,4.0,
2,,,,,,,,,1.0,,...,,,,,,,,,,
3,,,,,2.0,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,


In [84]:
corrMatrix = movieRatings.corr()
corrMatrix

title,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
'Til There Was You (1997),1.0,,-1.000000e+00,-0.500000,-0.500000,0.522233,,-4.264014e-01,,,...,,,,,,,,,,
1-900 (1994),,1.0,,,,,,-9.819805e-01,,,...,,,,-0.944911,,,,,,
101 Dalmatians (1996),-1.0,,1.000000e+00,-0.049890,0.269191,0.048973,0.266928,-4.340657e-02,,0.111111,...,,-1.000000,,0.158840,0.119234,0.680414,-4.875600e-17,0.707107,,
12 Angry Men (1957),-0.5,,-4.989024e-02,1.000000,0.666667,0.256625,0.274772,1.788483e-01,,0.457176,...,,,,0.096546,0.068944,-0.361961,1.443376e-01,1.000000,1.0,
187 (1997),-0.5,,2.691910e-01,0.666667,1.000000,0.596644,,-5.547002e-01,,1.000000,...,,0.866025,,0.455233,-0.500000,0.500000,4.753271e-01,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Young Guns II (1990),,,6.804138e-01,-0.361961,0.500000,0.132017,-0.518476,-1.547646e-01,,-0.054554,...,,,,0.355001,0.722460,1.000000,8.660254e-01,,,
"Young Poisoner's Handbook, The (1995)",,,-4.875600e-17,0.144338,0.475327,0.204926,0.623795,-3.417534e-01,,0.707107,...,,,,-0.413197,-0.019672,0.866025,1.000000e+00,,,
Zeus and Roxanne (1997),,,7.071068e-01,1.000000,,,,-1.000000e+00,,,...,,,,,,,,1.000000,,
unknown,,,,1.000000,,,,-1.986027e-16,,,...,,,,0.866025,,,,,1.0,


In [85]:
corrMatrix = movieRatings.corr(method='pearson', min_periods=100) # keep only movie pairs with at least 100 users who rated both
corrMatrix

title,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
'Til There Was You (1997),,,,,,,,,,,...,,,,,,,,,,
1-900 (1994),,,,,,,,,,,...,,,,,,,,,,
101 Dalmatians (1996),,,1.0,,,,,,,,...,,,,,,,,,,
12 Angry Men (1957),,,,1.0,,,,,,,...,,,,,,,,,,
187 (1997),,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Young Guns II (1990),,,,,,,,,,,...,,,,,,,,,,
"Young Poisoner's Handbook, The (1995)",,,,,,,,,,,...,,,,,,,,,,
Zeus and Roxanne (1997),,,,,,,,,,,...,,,,,,,,,,
unknown,,,,,,,,,,,...,,,,,,,,,,
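To see why most entries (including much of the diagonal) become NaN, here is a minimal sketch of how `min_periods` behaves in `DataFrame.corr`. The toy frame and the threshold of 4 are assumptions chosen to keep the example small.

```python
import numpy as np
import pandas as pd

# Toy user x movie matrix (hypothetical data); NaN means "not rated".
toy = pd.DataFrame(
    {
        "A": [5, 4, 1, 2, np.nan],
        "B": [4, 5, 2, 1, np.nan],
        "C": [np.nan, np.nan, 5, 1, 3],
    }
)

loose = toy.corr()                # default min_periods=1
strict = toy.corr(min_periods=4)  # require at least 4 users who rated both movies

print(loose)
print(strict)
```

Movies A and B share four raters, so their score survives. A and C share only two raters, so the strict version replaces their score with NaN. C itself has only three ratings, so even its diagonal entry is NaN under the stricter setting.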


Recommendation for user ID 0

In [88]:
myRatings = movieRatings.loc[0].dropna()
myRatings

title
Empire Strikes Back, The (1980)    5.0
Gone with the Wind (1939)          1.0
Star Wars (1977)                   5.0
Name: 0, dtype: float64

In [93]:
candidates = []

for title, rating in myRatings.items():
    print("Adding similarities for " + title + "...")

    # retrieve the movies similar to this one that I rated
    sims = corrMatrix[title].dropna()

    # scale each similarity by how well I rated this movie
    sims = sims * rating

    # add the scores to the list of similarity candidates
    candidates.append(sims)

# combine all candidates into a single Series (replaces the private Series._append)
similarCandidates = pd.concat(candidates)

print("sorting...")
similarCandidates.sort_values(inplace=True, ascending=False)
print(similarCandidates.head(10))

Adding similarities for Empire Strikes Back, The (1980)...
Adding similarities for Gone with the Wind (1939)...
Adding similarities for Star Wars (1977)...
sorting...
Empire Strikes Back, The (1980)                       5.000000
Star Wars (1977)                                      5.000000
Empire Strikes Back, The (1980)                       3.741763
Star Wars (1977)                                      3.741763
Return of the Jedi (1983)                             3.606146
Return of the Jedi (1983)                             3.362779
Raiders of the Lost Ark (1981)                        2.693297
Raiders of the Lost Ark (1981)                        2.680586
Austin Powers: International Man of Mystery (1997)    1.887164
Sting, The (1973)                                     1.837692
dtype: float64



Refine the results: some movies appear more than once, so sum their scores

In [94]:
similarCandidates = similarCandidates.groupby(similarCandidates.index).sum()

In [95]:
similarCandidates.sort_values(inplace=True, ascending=False)
similarCandidates

Empire Strikes Back, The (1980)              8.877450
Star Wars (1977)                             8.870971
Return of the Jedi (1983)                    7.178172
Raiders of the Lost Ark (1981)               5.519700
Indiana Jones and the Last Crusade (1989)    3.488028
                                               ...   
Annie Hall (1977)                           -0.511775
Real Genius (1985)                          -0.552871
Remains of the Day, The (1993)              -0.560337
This Is Spinal Tap (1984)                   -0.636474
First Wives Club, The (1996)                -0.972480
Length: 268, dtype: float64
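The group-and-sum step is easier to see on a toy Series. The scores and shortened titles below are invented for illustration: one movie appears twice because two different rated movies both suggested it.

```python
import pandas as pd

# Hypothetical candidate scores with a duplicated index label.
scores = pd.Series(
    [3.5, 3.25, 2.5, -0.5],
    index=["Return of the Jedi", "Return of the Jedi", "Raiders", "Annie Hall"],
)

# Group by the index label and add up the scores of duplicated movies.
combined = scores.groupby(scores.index).sum().sort_values(ascending=False)
print(combined)
```

Summing rewards movies that are similar to several of the user's rated movies, which is why the sequels rise to the top once duplicates are merged.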

Filter out movies already rated

In [97]:
myRatings.index #movies already rated

Index(['Empire Strikes Back, The (1980)', 'Gone with the Wind (1939)',
       'Star Wars (1977)'],
      dtype='object', name='title')

In [96]:
filteredSimilarities = similarCandidates.drop(myRatings.index)
filteredSimilarities

Return of the Jedi (1983)                    7.178172
Raiders of the Lost Ark (1981)               5.519700
Indiana Jones and the Last Crusade (1989)    3.488028
Bridge on the River Kwai, The (1957)         3.366616
Back to the Future (1985)                    3.357941
                                               ...   
Annie Hall (1977)                           -0.511775
Real Genius (1985)                          -0.552871
Remains of the Day, The (1993)              -0.560337
This Is Spinal Tap (1984)                   -0.636474
First Wives Club, The (1996)                -0.972480
Length: 265, dtype: float64
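The whole recommendation procedure (score, combine, filter, sort) can be wrapped in one helper. This is a minimal sketch under stated assumptions: the function name `recommend_items`, its `top_n` parameter, and the toy correlation matrix are hypothetical and are not part of the original notebook.

```python
import pandas as pd


def recommend_items(user_ratings, corr_matrix, top_n=5):
    """Score unseen items by rating-weighted similarity to the user's rated items."""
    parts = []
    for title, rating in user_ratings.items():
        if title not in corr_matrix.columns:
            continue  # no similarity data for this item
        # similarities to this item, scaled by the user's rating of it
        parts.append(corr_matrix[title].dropna() * rating)
    if not parts:
        return pd.Series(dtype=float)
    scores = pd.concat(parts)
    # merge duplicates: an item may be similar to several rated items
    scores = scores.groupby(scores.index).sum()
    # drop items the user has already rated
    scores = scores.drop(user_ratings.index, errors="ignore")
    return scores.sort_values(ascending=False).head(top_n)


# Toy item-item correlation matrix and user ratings (hypothetical data).
toy_corr = pd.DataFrame(
    [[1.0, 0.8, -0.5], [0.8, 1.0, 0.2], [-0.5, 0.2, 1.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)
toy_user = pd.Series({"A": 5.0, "C": 1.0})

top = recommend_items(toy_user, toy_corr)
print(top)
```

With the notebook's own objects, the equivalent call would be `recommend_items(myRatings, corrMatrix)`. Passing `errors="ignore"` to `drop` avoids a `KeyError` when a rated movie never made it into the candidate list.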