# Recommender Systems (Movie Recommender in Python)

This is a Python implementation of the tutorial we did in R. We look at three recommender system methods: user-based collaborative filtering, item-based collaborative filtering, and matrix factorisation. We use these approaches to **build a system for recommending movies (or anything else, for that matter) to users based on their past viewing habits**.

    TL;DR: we apply collaborative filtering to build a system for recommending movies to users based on their past viewing habits.

In [1]:
# get modules 

library(tidyverse)

load("output/recommender.RData")
viewed_movies

-- Attaching packages --------------------------------------- tidyverse 1.3.0 --

v ggplot2 3.3.2     v purrr   0.3.4
v tibble  3.0.3     v dplyr   1.0.1
v tidyr   1.1.1     v stringr 1.4.0
v readr   1.3.1     v forcats 0.5.0

-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()



userId,2001: A Space Odyssey (1968),Apocalypse Now (1979),"Big Lebowski, The (1998)","Bourne Identity, The (2002)",Clear and Present Danger (1994),"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)","Departed, The (2006)",Donnie Darko (2001),Ferris Bueller's Day Off (1986),...,Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001),Indiana Jones and the Temple of Doom (1984),Interview with the Vampire: The Vampire Chronicles (1994),Jumanji (1995),Kill Bill: Vol. 2 (2004),"Shining, The (1980)",Sleepless in Seattle (1993),Star Trek: Generations (1994),There's Something About Mary (1998),Up (2009)
<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,...,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,0,1,1,0,1,0,0,0,0,...,0,1,0,0,0,1,0,0,0,0
20,0,0,0,1,0,1,0,0,0,...,1,0,0,1,0,0,0,0,0,0
187,1,0,1,0,0,1,0,1,0,...,0,0,1,0,1,1,0,0,0,0
198,1,0,1,0,0,1,0,0,0,...,0,1,0,0,0,1,0,0,0,0
212,1,0,1,0,0,0,0,0,1,...,1,0,0,0,0,1,0,0,0,1
222,0,1,1,0,0,0,1,0,0,...,0,0,0,1,1,0,0,0,1,1
282,0,1,1,1,0,0,1,0,1,...,0,1,0,0,1,1,0,0,0,0
328,1,1,1,1,0,0,1,1,1,...,1,1,0,0,1,1,0,0,0,1
330,1,1,1,1,0,1,1,1,1,...,1,0,1,1,1,1,1,0,1,0
372,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0


For the sake of neater display throughout, let's shorten the Harry Potter movie title.

In [2]:
viewed_movies <- rename(viewed_movies, `Harry Potter and the Philosopher's Stone (2001)` = `Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)`)

We first need to convert the data to matrix form; otherwise, some of the functions we use later will give an error (see what happens if you don't make the change).

In [3]:
sorted_my_users <- as.character(unlist(viewed_movies[,1]))
viewed_movies <- as.matrix(viewed_movies[,-1])
row.names(viewed_movies) <- sorted_my_users
viewed_movies

Unnamed: 0,2001: A Space Odyssey (1968),Apocalypse Now (1979),"Big Lebowski, The (1998)","Bourne Identity, The (2002)",Clear and Present Danger (1994),"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)","Departed, The (2006)",Donnie Darko (2001),Ferris Bueller's Day Off (1986),"Green Mile, The (1999)",Harry Potter and the Philosopher's Stone (2001),Indiana Jones and the Temple of Doom (1984),Interview with the Vampire: The Vampire Chronicles (1994),Jumanji (1995),Kill Bill: Vol. 2 (2004),"Shining, The (1980)",Sleepless in Seattle (1993),Star Trek: Generations (1994),There's Something About Mary (1998),Up (2009)
1,0,1,1,0,1,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0
20,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0
187,1,0,1,0,0,1,0,1,0,0,0,0,1,0,1,1,0,0,0,0
198,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0
212,1,0,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1
222,0,1,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,1,1
282,0,1,1,1,0,0,1,0,1,1,0,1,0,0,1,1,0,0,0,0
328,1,1,1,1,0,0,1,1,1,0,1,1,0,0,1,1,0,0,0,1
330,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,1,1,0,1,0
372,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0


## User-based collaborative filtering

### The basic idea behind user-based collaborative filtering

A really simple recommender system would just recommend the most popular movies (that a user hasn't seen before). This information is obtained by summing the values of each column of *viewed_movies*:

In [4]:
t(sort(apply(viewed_movies, 2, sum), decreasing = TRUE))

"Shining, The (1980)","Big Lebowski, The (1998)",Apocalypse Now (1979),Kill Bill: Vol. 2 (2004),2001: A Space Odyssey (1968),"Departed, The (2006)",Ferris Bueller's Day Off (1986),"Green Mile, The (1999)","Bourne Identity, The (2002)",Jumanji (1995),"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",Harry Potter and the Philosopher's Stone (2001),Indiana Jones and the Temple of Doom (1984),Up (2009),Donnie Darko (2001),Interview with the Vampire: The Vampire Chronicles (1994),There's Something About Mary (1998),Sleepless in Seattle (1993),Clear and Present Danger (1994),Star Trek: Generations (1994)
11,10,9,8,7,7,7,7,6,6,5,5,5,5,4,4,4,3,2,1


This approach has an intuitive appeal but is pretty unsophisticated: everyone gets the same recommendations, barring the filtering out of seen movies. In other words, everyone's vote counts the same.
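As a rough sketch of this popularity baseline with the seen-movie filter applied, the block below uses a small made-up matrix. The names `toy_views` and `popular_unseen()` are hypothetical and are not part of the tutorial data.

```r
# Toy 0/1 viewing matrix (hypothetical data): rows are users, columns are movies
toy_views <- matrix(c(1, 0, 1, 0,
                      1, 1, 0, 0,
                      1, 1, 0, 1),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("u1", "u2", "u3"),
                                    c("A", "B", "C", "D")))

# Hypothetical helper: most popular movies that the given user has not seen
popular_unseen <- function(user, views) {
  pop <- colSums(views)            # number of users who have seen each movie
  unseen <- views[user, ] == 0     # TRUE for movies this user has not seen
  sort(pop[unseen], decreasing = TRUE)
}

popular_unseen("u1", toy_views)
```

Every user is scored against the same `colSums()` vector; only the filter on seen movies differs between users, which is exactly the limitation described above.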

User-based CF extends the approach by changing how much each person's vote counts. Specifically, when recommending what I should watch next, a user-based CF system will upweight the votes of people that are "more similar" to me. In this context "similar" means "has seen many of the same movies as me". You can think of this as replacing the 1's in the *viewed_movies* matrix with a number that increases with similarity to the user we're trying to recommend a movie to.

There are lots of different similarity measures. The one we'll use is called cosine similarity and is widely used, but search online for others and try them out.

Cosine similarity derives its name from the fact that it measures the cosine of the angle between two non-zero vectors. The closer the vectors lie to each other, the smaller the angle, and the closer the cosine is to 1. It can be shown that for two vectors $\boldsymbol x$ and $\boldsymbol y$:

$$\cos(\theta) = \frac{\boldsymbol x \cdot \boldsymbol y}{\|\boldsymbol x\| \, \|\boldsymbol y\|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \, \sqrt{\sum_{i=1}^{n} y_i^2}}$$

We can use the `crossprod()` function in R to calculate the dot products.

In [5]:
# function calculating cosine similarity
cosine_sim <- function(a, b){crossprod(a, b) / sqrt(crossprod(a) * crossprod(b))}

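For non-negative vectors, such as our 0/1 viewing histories, cosine similarity lies between 0 and 1 inclusive and increases with similarity (for general vectors it can range from -1 to 1). Here are a few test cases to get a feel for it: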

In [6]:
# maximally similar
x1 <- c(1,1,1,0,0)
x2 <- c(1,1,1,0,0)
cosine_sim(x1,x2)

0
1


In [7]:
# maximally dissimilar
x1 <- c(1,1,1,0,0)
x2 <- c(0,0,0,1,1)
cosine_sim(x1,x2)

0
0


In [8]:
# also maximally dissimilar: positions where both vectors are 0 contribute nothing
x1 <- c(1,1,0,0,0)
x2 <- c(0,0,0,1,1)
cosine_sim(x1,x2)

0
0
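The test cases above only cover the extremes. As an illustrative sketch, here is a partial-overlap case, computed once directly from the formula and once with the `crossprod()` version. The vectors are made up, and `by_formula` is a hypothetical name.

```r
# Same definition as cosine_sim() above, repeated so this block runs on its own
cosine_sim <- function(a, b) crossprod(a, b) / sqrt(crossprod(a) * crossprod(b))

x1 <- c(1, 1, 1, 0, 0)
x2 <- c(1, 1, 0, 0, 0)

# Written out from the formula: dot product divided by the product of the norms
by_formula <- sum(x1 * x2) / (sqrt(sum(x1^2)) * sqrt(sum(x2^2)))

by_formula                      # 2 / sqrt(3 * 2), roughly 0.816
as.numeric(cosine_sim(x1, x2))  # same value from the crossprod() version
```

Two shared movies out of three and two seen gives a value strictly between 0 and 1, as expected for partially overlapping histories.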


In [9]:
# try an example from our data
as.numeric(viewed_movies[1,]) # viewing history of the user in row 1 (user 1)
as.numeric(viewed_movies[2,]) # viewing history of the user in row 2 (user 20)
cosine_sim(viewed_movies[1,], viewed_movies[2,])

0
0



Let's get similarities between user pairs. We'll do this with a loop below, because it's easier to see what's going on, but this will be inefficient and very slow for bigger datasets. 

> As an exercise, see if you can do the same without loops.

In [10]:
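n_users <- nrow(viewed_movies)
user_similarities <- matrix(0, nrow = n_users, ncol = n_users)
for (i in 1:(n_users - 1)) {
  for (j in (i + 1):n_users) {
    # cosine similarity between the viewing histories of users i and j
    user_similarities[i,j] <- cosine_sim(viewed_movies[i,], viewed_movies[j,])
  }
}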
user_similarities <- user_similarities + t(user_similarities)
diag(user_similarities) <- 0
row.names(user_similarities) <- row.names(viewed_movies)
colnames(user_similarities) <- row.names(viewed_movies)
round(user_similarities, 3)

Unnamed: 0,1,20,187,198,212,222,282,328,330,372,432,434,495,562,594
1,0.0,0.0,0.309,0.667,0.333,0.309,0.68,0.471,0.408,0.471,0.289,0.594,0.365,0.408,0.167
20,0.0,0.0,0.189,0.204,0.204,0.189,0.167,0.289,0.5,0.0,0.354,0.485,0.0,0.0,0.204
187,0.309,0.189,0.0,0.617,0.463,0.286,0.378,0.546,0.661,0.436,0.401,0.55,0.338,0.189,0.154
198,0.667,0.204,0.617,0.0,0.5,0.154,0.544,0.471,0.51,0.471,0.289,0.594,0.183,0.408,0.0
212,0.333,0.204,0.463,0.5,0.0,0.309,0.408,0.707,0.51,0.471,0.289,0.594,0.365,0.408,0.0
222,0.309,0.189,0.286,0.154,0.309,0.0,0.504,0.546,0.567,0.218,0.535,0.642,0.676,0.0,0.463
282,0.68,0.167,0.378,0.544,0.408,0.504,0.0,0.77,0.667,0.385,0.589,0.728,0.745,0.5,0.136
328,0.471,0.289,0.546,0.471,0.707,0.546,0.77,0.0,0.722,0.5,0.51,0.84,0.645,0.289,0.118
330,0.408,0.5,0.661,0.51,0.51,0.567,0.667,0.722,0.0,0.433,0.619,0.849,0.559,0.5,0.51
372,0.471,0.0,0.436,0.471,0.471,0.218,0.385,0.5,0.433,0.0,0.204,0.42,0.258,0.289,0.236
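One possible loop-free way to build the same kind of similarity matrix is sketched below on a small made-up matrix. It relies on `tcrossprod()` returning all pairwise row dot products at once. The names `toy_views` and `toy_sim` are hypothetical, and the sketch assumes every user has seen at least one movie (a zero row would give a zero norm and `NaN` entries).

```r
# Toy 0/1 viewing matrix (hypothetical data): rows are users, columns are movies
toy_views <- matrix(c(1, 1, 0, 0,
                      1, 1, 1, 0,
                      0, 0, 1, 1),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("u1", "u2", "u3"), NULL))

# All pairwise dot products between rows (users) at once
dots <- tcrossprod(toy_views)

# Row norms are the square roots of the diagonal; outer() builds every norm product
norms <- sqrt(diag(dots))
toy_sim <- dots / outer(norms, norms)

# Match the loop version: a user's similarity to themselves is set to 0
diag(toy_sim) <- 0
round(toy_sim, 3)
```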


In [11]:
# who are the most similar users to user 222?
t(sort(user_similarities["222",]))

222,562,198,20,372,187,1,212,594,282,432,328,330,434,495
0,0,0.1543033,0.1889822,0.2182179,0.2857143,0.3086067,0.3086067,0.46291,0.5039526,0.5345225,0.5455447,0.5669467,0.6416889,0.6761234


Let's see if this makes sense from the viewing histories. Below we show user 222's history, together with the user who is most similar to user 222 (user 495) and another user who is very dissimilar (user 562).

In [12]:
t(viewed_movies[c("222","495","562"),])

Unnamed: 0,222,495,562
2001: A Space Odyssey (1968),0,0,0
Apocalypse Now (1979),1,1,0
"Big Lebowski, The (1998)",1,1,0
"Bourne Identity, The (2002)",0,0,0
Clear and Present Danger (1994),0,0,0
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",0,0,0
"Departed, The (2006)",1,1,0
Donnie Darko (2001),0,0,0
Ferris Bueller's Day Off (1986),0,1,1
"Green Mile, The (1999)",0,0,1


### Recommending movies for a single user

As an example, let's consider the process of recommending a movie to one user, say user 222. How would we do this with a user-based collaborative filtering system? 

First, we need to know which movies they have already seen (so we don't recommend these).

In [13]:
t(viewed_movies["222",])

2001: A Space Odyssey (1968),Apocalypse Now (1979),"Big Lebowski, The (1998)","Bourne Identity, The (2002)",Clear and Present Danger (1994),"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)","Departed, The (2006)",Donnie Darko (2001),Ferris Bueller's Day Off (1986),"Green Mile, The (1999)",Harry Potter and the Philosopher's Stone (2001),Indiana Jones and the Temple of Doom (1984),Interview with the Vampire: The Vampire Chronicles (1994),Jumanji (1995),Kill Bill: Vol. 2 (2004),"Shining, The (1980)",Sleepless in Seattle (1993),Star Trek: Generations (1994),There's Something About Mary (1998),Up (2009)
0,1,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,1,1


The basic idea is now to recommend what's popular by adding up the number of users that have seen each movie, but *to weight each user by their similarity to user 222*. 

Let's work through the calculations for one movie, say 2001: A Space Odyssey (movie 1). The table below shows who's seen 2001: A Space Odyssey, and how similar each person is to user 222.

In [14]:
seen_movie <- viewed_movies[,"2001: A Space Odyssey (1968)"]
sim_to_user <- user_similarities["222",]
cbind(seen_movie,sim_to_user)

Unnamed: 0,seen_movie,sim_to_user
1,0,0.3086067
20,0,0.1889822
187,1,0.2857143
198,1,0.1543033
212,1,0.3086067
222,0,0.0
282,0,0.5039526
328,1,0.5455447
330,1,0.5669467
372,1,0.2182179


The basic idea in user-based collaborative filtering is that user 372's vote counts less than user 434's, because user 434 is more similar to user 222 (in terms of viewing history). 

Note that this only means user 434 counts more in the context of making recommendations to user 222. When recommending to users *other than user 222*, user 372 may carry more weight.

We can now work out an overall recommendation score for 2001: A Space Odyssey by multiplying together the two elements in each row of the table above, and summing these products (taking the dot product):

In [15]:
# overall score for 2001: A Space Odyssey
crossprod(viewed_movies[, "2001: A Space Odyssey (1968)"], user_similarities["222",])

0
2.721023


Note that this score increases with (a) the number of people who've seen the movie (more 1's in the first column above) and (b) the similarity of those people to user 222.

Let's repeat this calculation for all movies and compare recommendation scores:

In [16]:
t(user_similarities["222",] %*% viewed_movies)

0,1
2001: A Space Odyssey (1968),2.7210226
Apocalypse Now (1979),3.9239911
"Big Lebowski, The (1998)",3.9914875
"Bourne Identity, The (2002)",2.9816377
Clear and Present Danger (1994),0.9502956
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.8376355
"Departed, The (2006)",3.4687789
Donnie Darko (2001),2.0398947
Ferris Bueller's Day Off (1986),3.2428631
"Green Mile, The (1999)",2.7100208


To come up with a final recommendation, we just need to remember to remove movies user 222 has already seen, and sort the remaining movies in descending order of recommendation score.

We do that below, after tidying up the results a bit by putting them in a data frame.

In [17]:
user_scores <- data.frame(title = colnames(viewed_movies), 
                          score = as.vector(user_similarities["222",] %*% viewed_movies), 
                          seen = as.vector(viewed_movies["222",]))
user_scores %>% filter(seen == 0) %>% arrange(desc(score))

title,score,seen
<chr>,<dbl>,<dbl>
"Shining, The (1980)",4.0681044,0
Ferris Bueller's Day Off (1986),3.2428631,0
"Bourne Identity, The (2002)",2.9816377,0
2001: A Space Odyssey (1968),2.7210226,0
"Green Mile, The (1999)",2.7100208,0
Harry Potter and the Philosopher's Stone (2001),2.2517693,0
Indiana Jones and the Temple of Doom (1984),2.1540964,0
Donnie Darko (2001),2.0398947,0
Interview with the Vampire: The Vampire Chronicles (1994),1.8500935,0
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.8376355,0


Therefore, our top recommendation for user 222 is "The Shining".

Now that we've understood the calculations, let's get recommendations for one more user, user 372:

In [18]:
# recommendations for user 372
user_scores <- data.frame(title = colnames(viewed_movies), 
                          score = as.vector(user_similarities["372",] %*% viewed_movies), 
                          seen = as.vector(viewed_movies["372",]))
user_scores %>% filter(seen == 0) %>% arrange(desc(score))

title,score,seen
<chr>,<dbl>,<dbl>
"Big Lebowski, The (1998)",4.065063,0
Kill Bill: Vol. 2 (2004),2.8549736,0
Ferris Bueller's Day Off (1986),2.7562755,0
"Green Mile, The (1999)",2.6736052,0
"Departed, The (2006)",2.4185378,0
Indiana Jones and the Temple of Doom (1984),2.2477932,0
"Bourne Identity, The (2002)",1.9421211,0
Harry Potter and the Philosopher's Stone (2001),1.8245012,0
Up (2009),1.8138306,0
Donnie Darko (2001),1.7895325,0


We would recommend "The Big Lebowski" to user 372.

### A simple function to generate a user-based CF recommendation for any user

In [19]:
# a function to generate a recommendation for any user
user_based_recommendations <- function(user, user_sim, viewed_mov){
  
  # turn into character if not already
  user <- ifelse(is.character(user), user, as.character(user))
  
  # get scores
  user_scores <- data.frame(title = colnames(viewed_mov), 
                            score = as.vector(user_sim[user,] %*% viewed_mov), 
                            seen = as.vector(viewed_mov[user,]))
  
  # sort unseen movies by score and remove the 'seen' column
  user_scores %>% 
    filter(seen == 0) %>% 
    arrange(desc(score)) %>% 
    select(-seen)
}

Let's check the function is working by running it on a user we've used before:

In [20]:
user_based_recommendations(user = 222, user_sim = user_similarities, viewed_mov = viewed_movies)

title,score
<chr>,<dbl>
"Shining, The (1980)",4.0681044
Ferris Bueller's Day Off (1986),3.2428631
"Bourne Identity, The (2002)",2.9816377
2001: A Space Odyssey (1968),2.7210226
"Green Mile, The (1999)",2.7100208
Harry Potter and the Philosopher's Stone (2001),2.2517693
Indiana Jones and the Temple of Doom (1984),2.1540964
Donnie Darko (2001),2.0398947
Interview with the Vampire: The Vampire Chronicles (1994),1.8500935
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.8376355


Now do it for all users with `lapply`:

In [21]:
lapply(sorted_my_users, user_based_recommendations, user_similarities, viewed_movies)

title,score
<chr>,<dbl>
Kill Bill: Vol. 2 (2004),3.4251921
Ferris Bueller's Day Off (1986),3.2608851
2001: A Space Odyssey (1968),3.2537526
"Departed, The (2006)",3.1165854
"Bourne Identity, The (2002)",2.4428303
Up (2009),1.9961082
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.9776102
Harry Potter and the Philosopher's Stone (2001),1.8070747
Donnie Darko (2001),1.782348
Jumanji (1995),1.7662853

title,score
<chr>,<dbl>
"Shining, The (1980)",2.391197
"Big Lebowski, The (1998)",2.2266258
Kill Bill: Vol. 2 (2004),2.1719309
"Departed, The (2006)",1.9829487
2001: A Space Odyssey (1968),1.8709769
Apocalypse Now (1979),1.8335194
"Green Mile, The (1999)",1.7094155
Ferris Bueller's Day Off (1986),1.6445372
Up (2009),1.5204062
Donnie Darko (2001),1.4627286

title,score
<chr>,<dbl>
Apocalypse Now (1979),3.6580879
"Departed, The (2006)",3.159634
Ferris Bueller's Day Off (1986),3.1249201
"Green Mile, The (1999)",3.1051156
"Bourne Identity, The (2002)",2.7248402
Harry Potter and the Philosopher's Stone (2001),2.4088939
Indiana Jones and the Temple of Doom (1984),2.3993484
Up (2009),2.24508
Jumanji (1995),2.2413487
There's Something About Mary (1998),1.6514746

title,score
<chr>,<dbl>
Apocalypse Now (1979),3.5950832
Kill Bill: Vol. 2 (2004),3.3629005
Ferris Bueller's Day Off (1986),3.2109569
"Departed, The (2006)",2.7456871
"Bourne Identity, The (2002)",2.6129337
Harry Potter and the Philosopher's Stone (2001),2.2799276
Donnie Darko (2001),2.1930168
Up (2009),2.0084715
Jumanji (1995),1.7515015
Interview with the Vampire: The Vampire Chronicles (1994),1.4161989

title,score
<chr>,<dbl>
Apocalypse Now (1979),3.6982469
Kill Bill: Vol. 2 (2004),3.6450942
"Departed, The (2006)",3.1821842
"Green Mile, The (1999)",3.0429039
"Bourne Identity, The (2002)",2.7125532
Indiana Jones and the Temple of Doom (1984),2.5427769
Donnie Darko (2001),2.2744157
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",2.2714331
Jumanji (1995),1.9058049
There's Something About Mary (1998),1.4130056

title,score
<chr>,<dbl>
"Shining, The (1980)",4.0681044
Ferris Bueller's Day Off (1986),3.2428631
"Bourne Identity, The (2002)",2.9816377
2001: A Space Odyssey (1968),2.7210226
"Green Mile, The (1999)",2.7100208
Harry Potter and the Philosopher's Stone (2001),2.2517693
Indiana Jones and the Temple of Doom (1984),2.1540964
Donnie Darko (2001),2.0398947
Interview with the Vampire: The Vampire Chronicles (1994),1.8500935
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.8376355

title,score
<chr>,<dbl>
2001: A Space Odyssey (1968),3.8795179
Up (2009),2.9988638
Jumanji (1995),2.7902313
Harry Potter and the Philosopher's Stone (2001),2.7389889
Donnie Darko (2001),2.5420384
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",2.4832357
There's Something About Mary (1998),2.0343089
Interview with the Vampire: The Vampire Chronicles (1994),1.7699696
Clear and Present Danger (1994),1.4080207
Sleepless in Seattle (1993),1.3027494

title,score
<chr>,<dbl>
"Green Mile, The (1999)",4.0734508
Jumanji (1995),3.0242372
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",2.8674803
There's Something About Mary (1998),2.2252517
Interview with the Vampire: The Vampire Chronicles (1994),1.8953941
Clear and Present Danger (1994),1.3115726
Sleepless in Seattle (1993),1.1282141
Star Trek: Generations (1994),0.1178511

title,score
<chr>,<dbl>
Up (2009),3.266538
Indiana Jones and the Temple of Doom (1984),3.1557878
Clear and Present Danger (1994),1.257123
Star Trek: Generations (1994),0.5103104

title,score
<chr>,<dbl>
"Big Lebowski, The (1998)",4.065063
Kill Bill: Vol. 2 (2004),2.8549736
Ferris Bueller's Day Off (1986),2.7562755
"Green Mile, The (1999)",2.6736052
"Departed, The (2006)",2.4185378
Indiana Jones and the Temple of Doom (1984),2.2477932
"Bourne Identity, The (2002)",1.9421211
Harry Potter and the Philosopher's Stone (2001),1.8245012
Up (2009),1.8138306
Donnie Darko (2001),1.7895325

title,score
<chr>,<dbl>
"Big Lebowski, The (1998)",4.436197
Apocalypse Now (1979),3.9507542
Ferris Bueller's Day Off (1986),3.2769858
2001: A Space Odyssey (1968),2.9116401
Harry Potter and the Philosopher's Stone (2001),2.3715024
Indiana Jones and the Temple of Doom (1984),2.2771613
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",2.2620839
Donnie Darko (2001),2.1301657
There's Something About Mary (1998),2.0421611
Sleepless in Seattle (1993),1.260947

title,score
<chr>,<dbl>
Interview with the Vampire: The Vampire Chronicles (1994),2.2961831
Sleepless in Seattle (1993),1.5097224
Star Trek: Generations (1994),0.2970443

title,score
<chr>,<dbl>
"Shining, The (1980)",4.5411624
"Green Mile, The (1999)",2.9342563
2001: A Space Odyssey (1968),2.8908235
"Bourne Identity, The (2002)",2.8084241
Up (2009),2.5453229
Indiana Jones and the Temple of Doom (1984),2.4809019
Jumanji (1995),2.2762685
Harry Potter and the Philosopher's Stone (2001),2.1119887
Donnie Darko (2001),2.0849021
There's Something About Mary (1998),1.9600407

title,score
<chr>,<dbl>
"Big Lebowski, The (1998)",3.2898125
Apocalypse Now (1979),2.7771329
2001: A Space Odyssey (1968),2.4466325
Kill Bill: Vol. 2 (2004),2.418621
"Departed, The (2006)",2.2296388
"Bourne Identity, The (2002)",2.006032
Indiana Jones and the Temple of Doom (1984),1.9689752
Harry Potter and the Philosopher's Stone (2001),1.5607269
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.461034
Jumanji (1995),1.421481

title,score
<chr>,<dbl>
Kill Bill: Vol. 2 (2004),2.1497512
"Shining, The (1980)",2.1107601
"Big Lebowski, The (1998)",2.0277428
"Departed, The (2006)",1.9954479
"Green Mile, The (1999)",1.6029033
"Bourne Identity, The (2002)",1.5540878
Ferris Bueller's Day Off (1986),1.4479869
2001: A Space Odyssey (1968),1.3152114
Up (2009),1.1664806
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.1657821


> As an exercise, display all these recommendation scores in the $15 \times 20$ matrix relating users to movies, with blanks in the cells where a user has already watched a movie.
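One way to approach this exercise is sketched below on made-up data rather than the tutorial matrices: a single matrix product gives every user's score for every movie, and `NA` stands in for the blanks. All names (`toy_views`, `toy_sim`, `all_scores`) are hypothetical.

```r
# Toy inputs (hypothetical data): 3 users, 4 movies
toy_views <- matrix(c(1, 1, 0, 0,
                      1, 1, 1, 0,
                      0, 0, 1, 1),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("u1", "u2", "u3"), c("A", "B", "C", "D")))

# User-user cosine similarities with a zero diagonal
dots <- tcrossprod(toy_views)
toy_sim <- dots / outer(sqrt(diag(dots)), sqrt(diag(dots)))
diag(toy_sim) <- 0

# Row u of the product holds user u's score for every movie
all_scores <- toy_sim %*% toy_views

# Blank out the movies each user has already watched
all_scores[toy_views == 1] <- NA
round(all_scores, 3)
```

With the real data, the same two steps applied to `user_similarities` and `viewed_movies` should give the $15 \times 20$ matrix.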

> A variant on the above is a *k-nearest-neighbours* approach that bases recommendations *only on the k most similar users*. This is faster when there are many users. Try to implement this as an additional exercise.
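A minimal sketch of the k-nearest-neighbours variant, again on made-up data: similarities outside the top k are set to zero before scoring. The helper `knn_user_scores()` and its arguments are hypothetical, and ties between equally similar users are broken by whatever order `sort()` returns.

```r
# Hypothetical helper: user-based scores using only the k most similar users
knn_user_scores <- function(user, user_sim, views, k = 2) {
  sims <- user_sim[user, ]
  # Keep the k largest similarities and zero out everyone else
  keep <- names(sort(sims, decreasing = TRUE))[seq_len(min(k, length(sims)))]
  sims[!(names(sims) %in% keep)] <- 0
  scores <- as.vector(sims %*% views)
  names(scores) <- colnames(views)
  # Drop movies the user has already seen, best score first
  sort(scores[views[user, ] == 0], decreasing = TRUE)
}

# Toy inputs (hypothetical data): 4 users, 4 movies
toy_views <- matrix(c(1, 1, 0, 0,
                      1, 1, 1, 0,
                      0, 1, 0, 1,
                      1, 0, 0, 1),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("u1", "u2", "u3", "u4"), c("A", "B", "C", "D")))
dots <- tcrossprod(toy_views)
toy_sim <- dots / outer(sqrt(diag(dots)), sqrt(diag(dots)))
diag(toy_sim) <- 0

knn_user_scores("u1", toy_sim, toy_views, k = 1)  # only the closest user votes
knn_user_scores("u1", toy_sim, toy_views, k = 3)  # all other users vote
```

In this toy example the top recommendation for `u1` changes with k, which shows that the choice of k is a modelling decision and not just a speed-up.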

## Item-based collaborative filtering

### The basic idea behind item-based collaborative filtering

Item-based collaborative filtering works very similarly to its user-based counterpart, although you might find it slightly less intuitive. It is also based on similarities, but similarities between *movies* rather than *users*.

There are two main conceptual parts to item-based collaborative filtering:

1. One movie is similar to another if many of the same users have seen both movies.
2. When deciding what movie to recommend to a particular user, movies are evaluated on how similar they are to movies *that the user has already seen*.

Let's start by computing the similarities between all pairs of movies. We can reuse the same code we used to compute user similarities, if we first transpose the *viewed_movies* matrix.

In [22]:
# transpose the viewed_movies matrix
movies_user <- t(viewed_movies)

# get all similarities between MOVIES (the rows of movies_user)
n_movies <- nrow(movies_user)
movie_similarities <- matrix(0, nrow = n_movies, ncol = n_movies)
for (i in 1:(n_movies - 1)) {
  for (j in (i + 1):n_movies) {
    movie_similarities[i,j] <- cosine_sim(movies_user[i,], movies_user[j,])
  }
}
movie_similarities <- movie_similarities + t(movie_similarities)
diag(movie_similarities) <- 0
row.names(movie_similarities) <- row.names(movies_user)
colnames(movie_similarities) <- row.names(movies_user)
movies_user

Unnamed: 0,1,20,187,198,212,222,282,328,330,372,432,434,495,562,594
2001: A Space Odyssey (1968),0,0,1,1,1,0,0,1,1,1,0,1,0,0,0
Apocalypse Now (1979),1,0,0,0,0,1,1,1,1,1,0,1,1,0,1
"Big Lebowski, The (1998)",1,0,1,1,1,1,1,1,1,0,0,1,1,0,0
"Bourne Identity, The (2002)",0,1,0,0,0,0,1,1,1,0,1,1,0,0,0
Clear and Present Danger (1994),1,0,0,0,0,0,0,0,0,0,0,1,0,0,0
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",0,1,1,1,0,0,0,0,1,0,0,1,0,0,0
"Departed, The (2006)",0,0,0,0,0,1,1,1,1,0,1,1,1,0,0
Donnie Darko (2001),0,0,1,0,0,0,0,1,1,0,0,1,0,0,0
Ferris Bueller's Day Off (1986),0,0,0,0,1,0,1,1,1,0,0,1,1,1,0
"Green Mile, The (1999)",1,0,0,1,0,0,1,0,1,0,1,1,0,1,0


We can use the result to see, for example, what movies are most similar to "Apocalypse Now":

In [23]:
t(sort(movie_similarities[,"Apocalypse Now (1979)"], decreasing = TRUE))

"Departed, The (2006)","Big Lebowski, The (1998)",Kill Bill: Vol. 2 (2004),There's Something About Mary (1998),Ferris Bueller's Day Off (1986),"Shining, The (1980)",Indiana Jones and the Temple of Doom (1984),"Bourne Identity, The (2002)",Jumanji (1995),2001: A Space Odyssey (1968),"Green Mile, The (1999)",Donnie Darko (2001),Clear and Present Danger (1994),Harry Potter and the Philosopher's Stone (2001),Up (2009),Sleepless in Seattle (1993),Interview with the Vampire: The Vampire Chronicles (1994),Star Trek: Generations (1994),"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",Apocalypse Now (1979)
0.7559289,0.7378648,0.7071068,0.6666667,0.6299408,0.6030227,0.5962848,0.5443311,0.5443311,0.5039526,0.5039526,0.5,0.4714045,0.4472136,0.4472136,0.3849002,0.3333333,0.3333333,0.2981424,0


### Recommending movies for a single user

Let's again look at a concrete example of recommending a movie to a particular user, say user 372.

User 372 has seen the following movies:

In [24]:
t(which(viewed_movies["372", ] == 1))

2001: A Space Odyssey (1968),Apocalypse Now (1979),"Shining, The (1980)"
1,2,16


Another way of doing the same thing, this time from the long-format `ratings_red` data frame (one row per user-movie pair):

In [25]:
ratings_red %>% 
  filter(userId == 372) %>% 
  select(userId, title)

userId,title
<int>,<chr>
372,2001: A Space Odyssey (1968)
372,Apocalypse Now (1979)
372,"Shining, The (1980)"


We now implement the main idea behind item-based filtering. For each movie, we find the similarities between that movie and each of the three movies user 372 has seen, and sum up those similarities. The resulting sum is that movie's "recommendation score".

We start by identifying the movies the user has seen:

In [26]:
user_seen <- ratings_red %>% 
        filter(userId == 372) %>% 
        select(title) %>% 
        unlist() %>% 
        as.character()

We then compute the similarities between all movies and these "seen" movies. For example, similarities for the first seen movie, *2001: A Space Odyssey* are:

In [27]:
user_seen[1]
t(sort(movie_similarities[,user_seen[1]], decreasing = TRUE))

"Shining, The (1980)",Donnie Darko (2001),"Big Lebowski, The (1998)","Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",Harry Potter and the Philosopher's Stone (2001),Ferris Bueller's Day Off (1986),Kill Bill: Vol. 2 (2004),Indiana Jones and the Temple of Doom (1984),Up (2009),Apocalypse Now (1979),"Bourne Identity, The (2002)","Departed, The (2006)","Green Mile, The (1999)",Interview with the Vampire: The Vampire Chronicles (1994),There's Something About Mary (1998),Jumanji (1995),Clear and Present Danger (1994),Sleepless in Seattle (1993),2001: A Space Odyssey (1968),Star Trek: Generations (1994)
0.797724,0.7559289,0.7171372,0.6761234,0.6761234,0.5714286,0.5345225,0.5070926,0.5070926,0.5039526,0.46291,0.4285714,0.4285714,0.3779645,0.3779645,0.3086067,0.2672612,0.2182179,0,0


We can do the same for each of the three seen movies or, more simply, do all three at once:

In [28]:
movie_similarities[,user_seen]

Unnamed: 0,2001: A Space Odyssey (1968),Apocalypse Now (1979),"Shining, The (1980)"
2001: A Space Odyssey (1968),0.0,0.5039526,0.797724
Apocalypse Now (1979),0.5039526,0.0,0.6030227
"Big Lebowski, The (1998)",0.7171372,0.7378648,0.7627701
"Bourne Identity, The (2002)",0.46291,0.5443311,0.6154575
Clear and Present Danger (1994),0.2672612,0.4714045,0.4264014
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",0.6761234,0.2981424,0.5393599
"Departed, The (2006)",0.4285714,0.7559289,0.5698029
Donnie Darko (2001),0.7559289,0.5,0.6030227
Ferris Bueller's Day Off (1986),0.5714286,0.6299408,0.6837635
"Green Mile, The (1999)",0.4285714,0.5039526,0.797724


Each movie's recommendation score is obtained by summing across columns, each column representing a seen movie:

In [29]:
t(sort(apply(movie_similarities[, user_seen], 1, sum), decreasing = TRUE))

"Big Lebowski, The (1998)",Ferris Bueller's Day Off (1986),Kill Bill: Vol. 2 (2004),Donnie Darko (2001),Indiana Jones and the Temple of Doom (1984),"Departed, The (2006)","Green Mile, The (1999)",Harry Potter and the Philosopher's Stone (2001),"Bourne Identity, The (2002)","Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",Up (2009),"Shining, The (1980)",There's Something About Mary (1998),2001: A Space Odyssey (1968),Jumanji (1995),Clear and Present Danger (1994),Interview with the Vampire: The Vampire Chronicles (1994),Apocalypse Now (1979),Sleepless in Seattle (1993),Star Trek: Generations (1994)
2.217772,1.885133,1.881231,1.858952,1.777577,1.754303,1.730248,1.662697,1.622699,1.513626,1.493666,1.400747,1.346142,1.301677,1.222212,1.165067,1.163565,1.106975,0.9512734,0.3333333


The preceding explanation hopefully makes the details of the calculations clear, but it is quite unwieldy. We can do all the calculations more neatly as:

In [30]:
user_scores <- tibble(title = row.names(movie_similarities), 
                      score = apply(movie_similarities[,user_seen], 1, sum),
                      seen = viewed_movies["372",])

user_scores %>% 
  filter(seen == 0) %>% 
  arrange(desc(score))

title,score,seen
<chr>,<dbl>,<dbl>
"Big Lebowski, The (1998)",2.217772,0
Ferris Bueller's Day Off (1986),1.8851328,0
Kill Bill: Vol. 2 (2004),1.8812314,0
Donnie Darko (2001),1.8589516,0
Indiana Jones and the Temple of Doom (1984),1.7775772,0
"Departed, The (2006)",1.7543033,0
"Green Mile, The (1999)",1.7302481,0
Harry Potter and the Philosopher's Stone (2001),1.6626969,0
"Bourne Identity, The (2002)",1.6226986,0
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.5136257,0


Again we will end up recommending "The Big Lebowski" to this particular user.
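The row sums over the "seen" columns can also be written as a matrix product, which extends to all users at once. The sketch below checks this on made-up data; `toy_views`, `toy_movie_sim` and the other names are hypothetical, and every toy movie has at least one viewer so that no norm is zero.

```r
# Toy 0/1 viewing matrix (hypothetical data): 4 users, 4 movies
toy_views <- matrix(c(1, 1, 0, 0,
                      1, 1, 1, 0,
                      0, 1, 0, 1,
                      1, 0, 0, 1),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("u1", "u2", "u3", "u4"), c("A", "B", "C", "D")))

# Movie-movie cosine similarities with a zero diagonal
dots <- crossprod(toy_views)
toy_movie_sim <- dots / outer(sqrt(diag(dots)), sqrt(diag(dots)))
diag(toy_movie_sim) <- 0

# Summing the "seen" columns for one user ...
seen <- colnames(toy_views)[toy_views["u1", ] == 1]
by_sum <- rowSums(toy_movie_sim[, seen, drop = FALSE])

# ... equals multiplying the similarity matrix by that user's 0/1 vector
by_product <- as.vector(toy_movie_sim %*% toy_views["u1", ])
all.equal(unname(by_sum), by_product)

# All users at once: row u holds user u's item-based scores
all_item_scores <- toy_views %*% toy_movie_sim
```

Seen movies still need to be filtered out afterwards, exactly as in the data-frame version above.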

Let's repeat the process to generate a recommendation for one more user, user 222:

In [31]:
# do for user 222
user <- "222"
user_seen <- ratings_red %>% 
  filter(userId == user) %>% 
  select(title) %>% 
  unlist() %>% 
  as.character()

user_scores <- tibble(title = row.names(movie_similarities), 
                      score = apply(movie_similarities[,user_seen],1,sum),
                      seen = viewed_movies[user,])

user_scores %>% 
  filter(seen == 0) %>% 
  arrange(desc(score))

title,score,seen
<chr>,<dbl>,<dbl>
"Bourne Identity, The (2002)",4.176571,0
Ferris Bueller's Day Off (1986),3.92318,0
"Shining, The (1980)",3.785343,0
Donnie Darko (2001),3.761971,0
Harry Potter and the Philosopher's Stone (2001),3.589269,0
"Green Mile, The (1999)",3.386454,0
2001: A Space Odyssey (1968),3.377847,0
Indiana Jones and the Temple of Doom (1984),3.091007,0
Interview with the Vampire: The Vampire Chronicles (1994),2.893835,0
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",2.871167,0


Here we see a different top recommendation (The Bourne Identity) from the one produced by the user-based system (The Shining).

### A simple function to generate an item-based CF recommendation for any user

In [32]:
# a function to generate an item-based recommendation for any user
item_based_recommendations <- function(user, movie_sim, viewed_mov){
  
  # turn into character if not already
  user <- as.character(user)
  
  # movies this user has seen (use the function arguments, not the global objects)
  user_seen <- row.names(movie_sim)[viewed_mov[user,] == 1]
  
  # get scores; drop = FALSE keeps a matrix even if only one movie has been seen
  user_scores <- tibble(title = row.names(movie_sim), 
                        score = apply(movie_sim[, user_seen, drop = FALSE], 1, sum),
                        seen = viewed_mov[user,])
  
  # sort unseen movies by score and remove the 'seen' column
  user_scores %>% 
    filter(seen == 0) %>% 
    arrange(desc(score)) %>% 
    select(-seen)
}

Let's check that it's working with a user we've seen before, user 372:

In [33]:
item_based_recommendations(user = 372, movie_sim = movie_similarities, viewed_mov = viewed_movies)

title,score
<chr>,<dbl>
"Big Lebowski, The (1998)",2.217772
Ferris Bueller's Day Off (1986),1.8851328
Kill Bill: Vol. 2 (2004),1.8812314
Donnie Darko (2001),1.8589516
Indiana Jones and the Temple of Doom (1984),1.7775772
"Departed, The (2006)",1.7543033
"Green Mile, The (1999)",1.7302481
Harry Potter and the Philosopher's Stone (2001),1.6626969
"Bourne Identity, The (2002)",1.6226986
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.5136257


And now do it for all users with `lapply`:

In [34]:
lapply(sorted_my_users, item_based_recommendations, movie_similarities, viewed_movies)

title,score
<chr>,<dbl>
"Departed, The (2006)",3.3886514
Kill Bill: Vol. 2 (2004),3.3881969
Ferris Bueller's Day Off (1986),3.3766238
2001: A Space Odyssey (1968),3.2217391
"Bourne Identity, The (2002)",3.1297974
Donnie Darko (2001),2.9142097
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",2.626508
Harry Potter and the Philosopher's Stone (2001),2.6065484
Up (2009),2.6065484
There's Something About Mary (1998),2.3976443

title,score
<chr>,<dbl>
Donnie Darko (2001),2.3622615
Kill Bill: Vol. 2 (2004),2.2477214
"Departed, The (2006)",2.2338844
2001: A Space Odyssey (1968),2.1237636
There's Something About Mary (1998),2.1191721
"Shining, The (1980)",2.0634517
"Big Lebowski, The (1998)",2.035067
Ferris Bueller's Day Off (1986),1.9400052
"Green Mile, The (1999)",1.9252777
Up (2009),1.8954451

title,score
<chr>,<dbl>
"Departed, The (2006)",3.933899
"Bourne Identity, The (2002)",3.884796
Harry Potter and the Philosopher's Stone (2001),3.749938
Ferris Bueller's Day Off (1986),3.734473
Apocalypse Now (1979),3.683423
"Green Mile, The (1999)",3.621454
Jumanji (1995),3.210873
Indiana Jones and the Temple of Doom (1984),3.209954
There's Something About Mary (1998),3.131361
Up (2009),3.115414

title,score
<chr>,<dbl>
Donnie Darko (2001),3.487406
Kill Bill: Vol. 2 (2004),3.439954
Ferris Bueller's Day Off (1986),3.388912
"Bourne Identity, The (2002)",3.307424
Apocalypse Now (1979),3.24322
"Departed, The (2006)",3.132094
Harry Potter and the Philosopher's Stone (2001),3.11923
Clear and Present Danger (1994),2.624082
Up (2009),2.5502
Jumanji (1995),2.258386

title,score
<chr>,<dbl>
Kill Bill: Vol. 2 (2004),3.731699
Donnie Darko (2001),3.676388
"Departed, The (2006)",3.613013
"Bourne Identity, The (2002)",3.489998
Apocalypse Now (1979),3.369208
Indiana Jones and the Temple of Doom (1984),3.195492
"Green Mile, The (1999)",3.071462
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",2.91923
Jumanji (1995),2.469231
There's Something About Mary (1998),2.426209

title,score
<chr>,<dbl>
"Bourne Identity, The (2002)",4.176571
Ferris Bueller's Day Off (1986),3.92318
"Shining, The (1980)",3.785343
Donnie Darko (2001),3.761971
Harry Potter and the Philosopher's Stone (2001),3.589269
"Green Mile, The (1999)",3.386454
2001: A Space Odyssey (1968),3.377847
Indiana Jones and the Temple of Doom (1984),3.091007
Interview with the Vampire: The Vampire Chronicles (1994),2.893835
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",2.871167

title,score
<chr>,<dbl>
Donnie Darko (2001),5.0140289
2001: A Space Odyssey (1968),4.9519103
Harry Potter and the Philosopher's Stone (2001),4.678175
Up (2009),4.6537147
Jumanji (1995),4.1162251
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",4.0084679
There's Something About Mary (1998),3.9275805
Clear and Present Danger (1994),3.5851952
Interview with the Vampire: The Vampire Chronicles (1994),2.9853177
Sleepless in Seattle (1993),2.4465455

title,score
<chr>,<dbl>
"Green Mile, The (1999)",6.3526667
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",5.6483191
Jumanji (1995),5.4656152
There's Something About Mary (1998),5.3220077
Clear and Present Danger (1994),4.3039429
Interview with the Vampire: The Vampire Chronicles (1994),3.9325313
Sleepless in Seattle (1993),2.7752017
Star Trek: Generations (1994),0.3333333

title,score
<chr>,<dbl>
Up (2009),7.226564
Indiana Jones and the Temple of Doom (1984),6.850451
Clear and Present Danger (1994),4.848238
Star Trek: Generations (1994),2.318932

title,score
<chr>,<dbl>
"Big Lebowski, The (1998)",2.217772
Ferris Bueller's Day Off (1986),1.8851328
Kill Bill: Vol. 2 (2004),1.8812314
Donnie Darko (2001),1.8589516
Indiana Jones and the Temple of Doom (1984),1.7775772
"Departed, The (2006)",1.7543033
"Green Mile, The (1999)",1.7302481
Harry Potter and the Philosopher's Stone (2001),1.6626969
"Bourne Identity, The (2002)",1.6226986
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.5136257

title,score
<chr>,<dbl>
"Big Lebowski, The (1998)",4.6457546
Apocalypse Now (1979),4.4392201
Ferris Bueller's Day Off (1986),4.2595257
Donnie Darko (2001),4.222875
Harry Potter and the Philosopher's Stone (2001),3.9604819
There's Something About Mary (1998),3.9487111
2001: A Space Odyssey (1968),3.8459632
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",3.6015145
Indiana Jones and the Temple of Doom (1984),3.4620542
Sleepless in Seattle (1993),2.4913902

title,score
<chr>,<dbl>
Interview with the Vampire: The Vampire Chronicles (1994),5.870082
Sleepless in Seattle (1993),4.518591
Star Trek: Generations (1994),1.241582

title,score
<chr>,<dbl>
"Shining, The (1980)",3.2589613
"Bourne Identity, The (2002)",3.1711468
Donnie Darko (2001),2.9734557
Up (2009),2.8285705
Indiana Jones and the Temple of Doom (1984),2.7919183
"Green Mile, The (1999)",2.7789466
2001: A Space Odyssey (1968),2.7556123
Harry Potter and the Philosopher's Stone (2001),2.6704566
There's Something About Mary (1998),2.6162496
Jumanji (1995),2.4347998

title,score
<chr>,<dbl>
"Big Lebowski, The (1998)",2.2600957
Apocalypse Now (1979),2.1218163
"Bourne Identity, The (2002)",2.0855865
"Departed, The (2006)",2.0737351
Kill Bill: Vol. 2 (2004),2.0464019
2001: A Space Odyssey (1968),2.0159419
Indiana Jones and the Temple of Doom (1984),1.8574158
Donnie Darko (2001),1.836609
Harry Potter and the Philosopher's Stone (2001),1.8117439
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.642713

title,score
<chr>,<dbl>
Kill Bill: Vol. 2 (2004),2.549241
"Departed, The (2006)",2.536271
"Bourne Identity, The (2002)",2.263197
Donnie Darko (2001),2.196923
"Green Mile, The (1999)",2.159227
"Big Lebowski, The (1998)",2.098307
"Shining, The (1980)",2.074231
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",1.998491
Ferris Bueller's Day Off (1986),1.94193
Harry Potter and the Philosopher's Stone (2001),1.923955


> This would once again be better displayed in a user $\times$ movie matrix, with blanks in the already-seen cells.  

## Collaborative filtering with matrix factorization 

In this section we're going to look at a different way of doing collaborative filtering, one based on the idea of *matrix factorization*, a topic from linear algebra.

Matrix factorization, also called matrix decomposition, takes a matrix and represents it as a product of other (usually two) matrices. There are many ways to do matrix factorization, and different problems tend to use different methods. Factorization often involves finding underlying **latent factors** containing information about the dataset. 

In recommendation systems, matrix factorization is used to decompose the ratings matrix into the product of two matrices. This is done in such a way that the known ratings are matched as closely as possible. 

The key feature of matrix factorization for recommendation systems is that while the ratings matrix is incomplete (i.e. some entries are blank), the two matrices the ratings matrix is decomposed into are *complete* (no blank entries). This gives a straightforward way of filling in blank spaces in the original ratings matrix, as we'll see.
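
As a rough illustration of this idea (a toy sketch with made-up numbers, not the notebook's data), the code below builds a small ratings matrix with blanks and two complete factor matrices. The names `toy_ratings`, `toy_user_factors` and `toy_movie_factors` are hypothetical. Their product has no blanks, so it supplies a predicted rating for every user-movie pair, including the missing ones.

```r
# toy ratings: 3 users x 4 movies, NA marks an unrated movie
toy_ratings <- matrix(c( 5, NA,  1, NA,
                        NA,  4, NA,  2,
                         1, NA,  5, NA),
                      nrow = 3, byrow = TRUE)

# complete factor matrices with 2 latent factors (values chosen by hand)
toy_user_factors  <- matrix(c(2, 0,
                              1, 1,
                              0, 2),
                            nrow = 3, byrow = TRUE)   # 3 users x 2 factors
toy_movie_factors <- matrix(c(2.5, 2.0, 0.5, 1.0,
                              0.5, 2.0, 2.5, 1.0),
                            nrow = 2, byrow = TRUE)   # 2 factors x 4 movies

# the product is a complete 3 x 4 matrix of predicted ratings
toy_predicted <- toy_user_factors %*% toy_movie_factors
toy_predicted

# predictions for the cells that were blank in the original matrix
toy_predicted[is.na(toy_ratings)]
```

Here the hand-picked factors happen to reproduce the known ratings exactly. With real data the factors have to be estimated, and the known ratings are only matched approximately.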

It's actually easier to see the underlying logic and calculations in a spreadsheet setting, so we'll first save the ratings matrix as a .csv file and then jump over to Excel for a bit, before returning to work in R again.

In [35]:
# get ratings in wide format
ratings_wide <- ratings_red %>% 
  select(userId,title,rating) %>% 
  complete(userId, title) %>% 
  spread(key = title, value = rating)

# convert data to matrix form 
sorted_my_users <- as.character(unlist(ratings_wide[,1]))
ratings_wide <- as.matrix(ratings_wide[,-1])
row.names(ratings_wide) <- sorted_my_users

# save as csv for Excel demo
write.csv(ratings_wide,"output/ratings_for_excel_example.csv")

Now let's set up the same computations in R, which will be faster and easier to generalise beyond a particular size dataset. We start by defining a function that will compute the root mean squared error (RMSE) between the observed movie ratings and any other set of predicted ratings (for example, ones predicted by matrix factorization). Note that we only count movies that have already been rated in the accuracy calculation.

In [36]:
recommender_accuracy <- function(x, observed_ratings){
    
  # extract user and movie factors from parameter vector (note x is defined such that 
  # the first 75 elements are latent factors for users and rest are for movies)
  user_factors <- matrix(x[1:75], 15, 5)
  movie_factors <- matrix(x[76:175], 5, 20)
  
  # get predictions from dot products of respective user and movie factor
  predicted_ratings <- user_factors %*% movie_factors
  
  # model accuracy is the root mean squared error over all rated movies
  errors <- (observed_ratings - predicted_ratings) ^ 2 
  
  sqrt(mean(errors[!is.na(observed_ratings)]))   # only use rated movies
}

> **Exercise**: This function isn't general, because it refers specifically to a ratings matrix with 15 users, 20 movies, and 5 latent factors. Make the function general.

We'll now optimize the values in the user and movie latent factors, choosing them so that the root mean square error (the square root of the average squared difference between observed and predicted ratings) is a minimum. I've done this using R's inbuilt numerical optimizer `optim()`, with the default "Nelder-Mead" method. There are better ways to do this - experiment! Always check whether the optimizer has converged (although you can't always trust this); see `help(optim)` for details.

In [37]:
set.seed(10)
# optimization step
rec1 <- optim(par = runif(175), recommender_accuracy, 
            observed_ratings = ratings_wide, control = list(maxit = 100000))
rec1$convergence
rec1$value
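
For reference, here is a minimal sketch of how `optim()` reports convergence, using a toy objective unrelated to the ratings data (the function `toy_objective` is made up for illustration). A `convergence` code of 0 means the method's stopping criterion was met; 1 means the iteration limit `maxit` was reached first.

```r
# toy objective with its minimum at (1, 2)
toy_objective <- function(x) sum((x - c(1, 2)) ^ 2)

# default Nelder-Mead with the default iteration limit
toy_fit <- optim(par = c(0, 0), fn = toy_objective)
toy_fit$convergence    # 0: converged

# same problem with a very small iteration limit
toy_short <- optim(par = c(0, 0), fn = toy_objective, control = list(maxit = 5))
toy_short$convergence  # 1: stopped because maxit was reached
```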

The best value of the objective function found by `optim()` is stored in `rec1$value` (use `round(rec1$value, 3)` to display it). Note that the optimizer hasn't converged after 100000 iterations, so we should really run for longer or try another optimizer! Ignoring this for now, we can extract the optimal user and movie factors. With a bit of work, these can be interpreted and often give useful information. Unfortunately we don't have time to look at this further (although it is similar to the interpretation of principal components, if you are familiar with that).

In [38]:
# extract optimal user factors
user_factors <- matrix(rec1$par[1:75], 15, 5)
head(user_factors)

0,1,2,3,4
1.8704994,1.1994809,0.7636127,-0.92636218,-1.200934246
2.3724241,0.3416254,0.3262135,-0.49322154,1.889202017
1.1111085,1.839412,0.5013395,-1.0257056,0.328460876
0.4730928,0.7980941,1.9927642,-1.44234539,-0.006360044
-0.7133853,1.6069591,1.7957153,0.09586535,1.13898548
-0.9429216,2.10161,1.259503,1.24631247,1.399925211


In [39]:
# extract optimal movie factors
movie_factors <- matrix(rec1$par[76:175], 5, 20)
head(movie_factors)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0.45213586,0.5664772,1.5820719,0.1568805,1.6091279,0.4922459,1.17595084,-0.7182418,1.2398087,1.27433003,0.58822856,0.580422,0.89899374,1.3108956,1.1921026,1.0087201,1.8586159,1.7595161,1.8285378,-0.5284306
1.24068067,3.2651238,1.6937969,2.3894585,-2.4668102,-0.1371224,1.26457554,1.2561888,1.5108612,0.45180827,1.6993947,1.004728,0.07560978,0.737289,0.9595835,1.6531187,-0.27566768,-1.0703908,1.7410831,1.6426011
1.38205926,-1.7512861,0.5652246,0.8756689,0.8256912,-0.2268436,1.0642614,-1.9826785,0.4479486,1.64782246,-0.28835189,0.2869269,1.79849298,1.1676913,1.4426121,0.8330061,-0.01684526,-2.0657576,-0.6007035,0.1925521
-0.09118251,0.6880326,0.3485811,0.9556278,-3.0262817,-2.220391,-0.08885995,-2.0247182,0.5343086,-0.67048311,0.04518522,-0.9562937,-2.03705529,0.8482193,-0.2506995,0.2437435,0.62911225,0.3398119,0.6548961,-0.7082477
-0.17154368,-1.0659075,0.2554457,1.0377295,-0.5186627,0.740177,0.98989633,2.5735647,0.1750985,0.07328022,1.38497948,-1.49307,-0.73585067,-0.2217906,1.2143087,0.9179228,-1.02828691,-1.5899725,0.7074759,-0.4665972


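Most importantly, we can get **predicted movie ratings** for any user, by taking the appropriate dot product of user and movie factors. Here we show the predictions for the first user (row 1 of the ratings matrix), with the predicted ratings in the first row and the observed ratings in the second: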

In [40]:
# check predictions for one user
predicted_ratings <- user_factors %*% movie_factors
rbind(round(predicted_ratings[1,], 1), as.numeric(ratings_wide[1,]))

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
3.7,4.3,4.8,1.7,4.1,1.8,3.4,-2.6,3.8,4.7,1.2,5.2,5.9,3.7,3.3,3.2,3.8,2.0,3.6,2.3
,4.0,5.0,,4.0,,,,,5.0,,5.0,,,,3.0,,,,
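
To turn predictions like these into a recommendation, one option is to blank out the movies a user has already rated and pick the highest remaining prediction. The sketch below uses made-up matrices (`toy_observed` and `toy_predicted` are hypothetical stand-ins for `ratings_wide` and `predicted_ratings`).

```r
# toy observed ratings (NA = not yet rated) and toy predictions, 2 users x 3 movies
toy_observed <- matrix(c( 4, NA, NA,
                         NA,  2, NA),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(c("u1", "u2"), c("A", "B", "C")))
toy_predicted <- matrix(c(4.1, 3.0, 4.6,
                          2.5, 2.2, 1.9),
                        nrow = 2, byrow = TRUE,
                        dimnames = dimnames(toy_observed))

# hide predictions for movies that have already been rated
toy_unseen <- toy_predicted
toy_unseen[!is.na(toy_observed)] <- NA

# best unrated movie for each user (which.max ignores NA values)
toy_top <- apply(toy_unseen, 1, function(p) names(which.max(p)))
toy_top
```

For these toy values, user `u1` would be recommended movie C and user `u2` movie A.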


### Adding L2 regularization

One trick that can improve the performance of matrix factorization collaborative filtering is to add L2 regularization. L2 regularization adds a penalty term to the function that we're trying to minimize, which penalizes large parameter values. 

We first rewrite the `recommender_accuracy()` function as a new function, `evaluate_fit_l2()`, that makes use of L2 regularization:

In [41]:
## adds L2 regularization, often improves accuracy

evaluate_fit_l2 <- function(x, observed_ratings, lambda){
  
  # extract user and movie factors from parameter vector
  user_factors <- matrix(x[1:75], 15, 5)
  movie_factors <- matrix(x[76:175], 5, 20)
  
  # get predictions from dot products
  predicted_ratings <- user_factors %*% movie_factors
  
  errors <- (observed_ratings - predicted_ratings) ^ 2 
  
  # L2 norm penalizes large parameter values
  penalty <- sqrt(sum(user_factors ^ 2, movie_factors ^ 2))
  
  # model accuracy contains an error term and a weighted penalty 
  accuracy <- sqrt(mean(errors[!is.na(observed_ratings)])) + lambda * penalty
  
  return(accuracy)
}
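
Written out, the objective that `evaluate_fit_l2()` computes is the RMSE over rated movies plus $\lambda$ times the L2 norm of all factor values. The symbols are introduced here for illustration only: $r_{ij}$ is the observed rating of movie $j$ by user $i$, $\Omega$ is the set of rated (user, movie) pairs, $u_{ik}$ are the user factors and $m_{kj}$ the movie factors.

$$
\sqrt{\frac{1}{|\Omega|}\sum_{(i,j)\in\Omega}\Big(r_{ij} - \sum_{k} u_{ik}\, m_{kj}\Big)^2} \;+\; \lambda \sqrt{\sum_{i,k} u_{ik}^2 + \sum_{k,j} m_{kj}^2}
$$

Note that this follows the code above in using the unsquared norm as the penalty. A common alternative is to penalize the squared norm instead.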

We now rerun the optimization with this new evaluation function:

In [42]:
set.seed(10)
# optimization step
rec2 <- optim(par = runif(175), evaluate_fit_l2, 
            lambda = 3e-2, observed_ratings = ratings_wide, control = list(maxit = 100000))
rec2$convergence
rec2$value

The best value found is **worse** than before, but remember that we changed the objective function to include the L2 penalty term, so the numbers are not comparable. We need to extract just the RMSE that we're interested in. To do that we first need to extract the optimal parameter values (user and movie factors), and multiply these matrices together to get predicted ratings. From there, it's easy to calculate the errors.

In [43]:
# extract optimal user and movie factors
user_factors <- matrix(rec2$par[1:75], 15, 5)
movie_factors <- matrix(rec2$par[76:175], 5, 20)

# get predicted ratings
predicted_ratings <- user_factors %*% movie_factors

# check accuracy
errors <- (ratings_wide - predicted_ratings) ^ 2 
sqrt(mean(errors[!is.na(ratings_wide)]))

Compare this with what we achieved without L2 regularization: did it work? As before, we can extract user and movie factors, and get predictions for any user.

In [44]:
# check predictions for one user
rbind(round(predicted_ratings[1,],1), as.numeric(ratings_wide[1,]))

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
4.2,3.7,4.8,4.3,4.1,4.4,5.2,5.7,4.0,5,0.9,5,5.0,-0.8,1.1,3,6.0,-2.0,2.3,3.8
,4.0,5.0,,4.0,,,,,5,,5,,,,3,,,,


### Adding bias terms

We've already seen bias terms in the Excel example. Bias terms are additive factors that model the fact that some users are more generous than others (and so will give higher ratings, on average) and some movies are better than others (and so will get higher ratings, on average). 
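
In symbols (introduced here for illustration, not taken from the original tutorial), if $u_{ik}$ and $m_{kj}$ are the user and movie factors, $b_i$ is the bias of user $i$, and $c_j$ is the bias of movie $j$, the predicted rating becomes

$$
\hat{r}_{ij} = \sum_{k} u_{ik}\, m_{kj} + b_i + c_j
$$

so a generous user (large $b_i$) or a well-liked movie (large $c_j$) shifts every corresponding prediction upwards by the same amount.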

Let's adapt our evaluation function further to include bias terms for both users and movies:

In [45]:
## add an additive bias term for each user and movie

evaluate_fit_l2_bias <- function(x, observed_ratings, lambda){
  # extract user and movie factors and bias terms from parameter vector
  user_factors <- matrix(x[1:75], 15, 5)
  movie_factors <- matrix(x[76:175], 5, 20)
  # the bias vectors are repeated to make the later matrix calculations easier 
  user_bias <- matrix(x[176:190],nrow = 15, ncol = 20)
  movie_bias <- t(matrix(x[191:210], nrow = 20, ncol = 15))
  
  # get predictions from dot products + bias terms
  predicted_ratings <- user_factors %*% movie_factors + user_bias + movie_bias
  
  errors <- (observed_ratings - predicted_ratings) ^ 2 
  
  # L2 norm penalizes large parameter values (note not applied to bias terms)
  penalty <- sqrt(sum(user_factors ^ 2, movie_factors ^ 2))
  
  # model accuracy contains an error term and a weighted penalty 
  sqrt(mean(errors[!is.na(observed_ratings)])) + lambda * penalty
}

Again, rerun the optimization:

In [46]:
set.seed(10)
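# optimization step (note longer parameter vector to include bias)
# only the first 210 entries are used by evaluate_fit_l2_bias():
# 175 factor values + 15 user biases + 20 movie biases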
rec3 <- optim(par = runif(220), evaluate_fit_l2_bias,
              observed_ratings = ratings_wide, lambda = 3e-2, control = list(maxit = 100000))
rec3$convergence
rec3$value

This value isn't comparable to either of the previous values, for the same reason as before: the objective function has changed to include bias terms. Extracting just the RMSE:

In [47]:
# extract optimal user and movie factors and bias terms
user_factors <- matrix(rec3$par[1:75], 15, 5)
movie_factors <- matrix(rec3$par[76:175], 5, 20)
user_bias <- matrix(rec3$par[176:190], nrow = 15, ncol = 20)
movie_bias <- t(matrix(rec3$par[191:210], nrow = 20, ncol = 15))

# get predicted ratings
predicted_ratings <- user_factors %*% movie_factors + user_bias + movie_bias

# check accuracy
errors <- (ratings_wide - predicted_ratings) ^ 2 
sqrt(mean(errors[!is.na(ratings_wide)]))

This is indeed an improvement over what we've seen before (at least, for the parameter settings above!). 

We can examine and interpret the user or movie latent factors, or bias terms, if we want to. Below we show the movie bias terms, which give some reflection of movie quality (with some notable exceptions!).

In [48]:
data.frame(movies = colnames(viewed_movies), bias = movie_bias[1,]) %>% arrange(desc(bias))

movies,bias
<chr>,<dbl>
Clear and Present Danger (1994),3.3846413
"Green Mile, The (1999)",2.6689232
Ferris Bueller's Day Off (1986),2.1920628
"Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)",2.1114224
Kill Bill: Vol. 2 (2004),1.9200531
Donnie Darko (2001),1.8430949
There's Something About Mary (1998),1.812324
"Shining, The (1980)",1.729378
Apocalypse Now (1979),1.6043468
"Big Lebowski, The (1998)",1.5533379


Finally, we again get predicted ratings for one user:

In [49]:
# check predictions for one user
rbind(round(predicted_ratings[1,], 1), as.numeric(ratings_wide[1,]))

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
1.6,4,5.1,3.1,4,2.7,2.9,4.3,6.0,5,4.4,4.9,4.5,3.4,3.2,3,5.7,1.6,4.8,3.6
,4,5.0,,4,,,,,5,,5.0,,,,3,,,,


## Exercises

There are a few places in the notebook where an exercise is indicated. Specifically:

1. Adapt the pairwise similarity function so that it doesn't use loops.
2. Display the output of the user-based and item-based recommendations in single matrices.
3. Implement a k-nearest-neighbours version of item-based collaborative filtering.
4. Adapt the `recommender_accuracy()` function so that it can be used with an arbitrary number of users and movies.
5. Experiment with the optimizers used in the matrix factorization collaborative filter.