---
title: "Movie recommendations with collaborative filtering"
output:
  html_notebook:
    toc: yes
    toc_float: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
ggplot2::theme_set(ggplot2::theme_bw())
```
In this example, we are going to learn how to make a recommendation system using
collaborative filtering. Collaborative filtering is one of the most common
approaches used to recommend products or services to customers and became very
popular after the famous [Netflix competition](https://en.wikipedia.org/wiki/Netflix_Prize).
By creating a collaborative filtering algorithm with keras, you will also see
how keras' functional API lets us build more customized models than the
sequential API allows.
Learning objectives:
- How to create a neural network collaborative filtering algorithm
- How to create a customized functional keras model
# Requirements
```{r}
library(keras)
library(tidyverse)
library(glue)
```
# Prepare our data
For this module we'll use [MovieLens data](https://grouplens.org/datasets/movielens/),
which provides user rating information for movies. There are multiple dataset
sizes; however, for efficiency we will use the smaller dataset that contains
100,836 ratings of 9,724 movies rated by 610 users.
```{r}
# get path to data
if (stringr::str_detect(here::here(), "conf-2020-user")) {
  data_dir <- "/home/conf-2020-user/data/ml-latest-small"
} else {
  data_dir <- here::here("materials", "data", "ml-latest-small")
}
movies <- read_csv(file.path(data_dir, "movies.csv"))
ratings <- read_csv(file.path(data_dir, "ratings.csv"))
```
Currently our datasets are separate, and movie IDs range from 1 to 193,609 even
though our data contains only 9,724 unique movie IDs. Consequently, the following
code:
1. creates a `dense_movie_id` so there are no gaps, which makes mapping each
   movie ID to a row of the embedding matrix simpler,
2. joins our datasets,
3. cleans up our column names,
4. and converts our IDs to be zero-based, which matches the zero-based indices
   that keras embedding layers expect.
```{r}
movie_data <- ratings %>%
  distinct(movieId) %>%
  rowid_to_column(var = "dense_movie_id") %>%
  inner_join(ratings, by = "movieId") %>%
  inner_join(movies, by = "movieId") %>%
  select(user_id = userId, movie_id = movieId, dense_movie_id, rating, everything()) %>%
  mutate(user_id = user_id - 1, dense_movie_id = dense_movie_id - 1)
movie_data
```
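
To make the dense ID idea concrete, here is a minimal base R sketch of the same
mapping on a handful of made-up movie IDs. The `toy_` objects are hypothetical
and exist only for illustration; they are not part of the MovieLens data.

```{r}
# hypothetical sparse movie IDs as they might appear in a ratings table
toy_movie_ids <- c(1, 50, 50, 193609, 1, 7)

# the order of first appearance of each unique ID defines the dense index
toy_unique_ids <- unique(toy_movie_ids)

# match() returns one-based positions; subtract 1 to get zero-based IDs
toy_dense_ids <- match(toy_movie_ids, toy_unique_ids) - 1
toy_dense_ids
```

Four distinct IDs spread over a range of almost 200,000 collapse to the
indices 0 to 3, so an embedding layer only needs four rows rather than one row
per possible ID.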
Let's extract the number of movies and users. We'll use these parameters later
in our keras model.
```{r}
n_movies <- n_distinct(movie_data$dense_movie_id)
n_users <- n_distinct(movie_data$user_id)
glue("This dataset includes {nrow(movie_data)} ratings by {n_users} users on {n_movies} unique movies")
```
Lastly, let's randomize our data and then create our feature and response
tensors. Note that our feature set simply contains the user and movie ID.
```{r}
set.seed(123)
movie_data <- movie_data %>% sample_frac()
x_train <- movie_data %>% select(c(user_id, dense_movie_id)) %>% as.matrix()
y_train <- movie_data %>% pull(rating)
head(x_train)
```
# Create a collaborative filtering algorithm
Collaborative filtering is a general concept and there are several algorithms to
implement it. Here is a good [article](https://bit.ly/34sQV8g) that discusses
the different types, which can loosely be categorized as:
* Distance-based (e.g. cosine similarity, correlation)
* Matrix factorization ([ℹ️](http://bit.ly/dl-07-Excel))
* Clustering
* Deep learning
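
For contrast with the neural network used in this notebook, here is a rough
sketch of the distance-based idea: compute the cosine similarity between users'
rating vectors. The ratings matrix is made up, and treating unrated movies as 0
is a simplifying assumption rather than a recommended practice.

```{r}
# toy ratings: rows are users, columns are movies, 0 means "not rated"
toy_ratings <- rbind(
  user_a = c(5, 4, 0, 1),
  user_b = c(4, 5, 0, 2),
  user_c = c(1, 0, 5, 4)
)

# cosine similarity: dot product divided by the product of the vector lengths
toy_norms <- sqrt(rowSums(toy_ratings^2))
toy_cosine <- tcrossprod(toy_ratings) / (toy_norms %o% toy_norms)
round(toy_cosine, 2)
```

Users A and B rate the same movies similarly and so have a much higher
similarity to each other than either has to user C.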
The following implements a neural network approach.
## Embeddings
One of the first things we need to do is select the dimension of the embeddings
that we will use for users and movies. As with word embeddings, the dimension of
our embeddings is a tunable hyperparameter. For now, we'll use 64.
```{r}
embedding_dim <- 64
```
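
Conceptually, an embedding layer is a lookup table: each ID selects one row of a
weight matrix. The sketch below imitates that lookup in base R with a small
hypothetical matrix (4 movies, dimension 3 instead of 64); in the real model
these values are learned during training.

```{r}
# hypothetical embedding matrix: 4 movies, embedding dimension 3
toy_embedding_matrix <- matrix(
  c(0.1, 0.2, 0.3,
    0.4, 0.5, 0.6,
    0.7, 0.8, 0.9,
    1.0, 1.1, 1.2),
  nrow = 4, byrow = TRUE
)

# zero-based IDs, in the form we feed to the keras embedding layer
toy_ids <- c(2, 0)

# R is one-based, so add 1 to pick the matching rows
toy_lookup <- toy_embedding_matrix[toy_ids + 1, , drop = FALSE]
toy_lookup
```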
## Basic model
To build our model, we need to take a different approach than the traditional
`keras_model_sequential()` approach. Instead we need to build a model that
resembles this:
![](images/collaborative-filtering-keras-model.png)
First, let's create our input and embedding layers. We create an input and an
embedding for our user IDs and for our movie IDs. Since each of these inputs is
a single ID, we specify `shape = 1` in `layer_input()`.
Our embedding layers build onto each of these inputs:
- `input_dim`: the number of unique IDs (`n_users` or `n_movies`)
- `output_dim`: the desired embedding dimension (64 in this example).
```{r}
# input layers
input_users <- layer_input(shape = 1, name = "users")
input_movies <- layer_input(shape = 1, name = "movies")
user_embeddings <- input_users %>%
  layer_embedding(
    input_dim = n_users,
    output_dim = embedding_dim,
    name = "user_embeddings"
  )
movie_embeddings <- input_movies %>%
  layer_embedding(
    input_dim = n_movies,
    output_dim = embedding_dim,
    name = "movie_embeddings"
  )
```
Recall from our Excel example that we multiplied the user embeddings by the
movie embeddings. This is referred to as a dot product and we can use
`layer_dot()` to execute this computation. Since each embedding layer outputs a
tensor of shape (batch, 1, embedding dimension), we take the dot product along
the embedding axis (`axes = 2`). If our outputs were flattened to
(batch, embedding dimension) we would use `axes = 1`.
We add our final prediction layer with `layer_dense()`. Since our predicted
rating can't be < 0, we use `activation = "relu"` rather than a purely linear
activation.
```{r}
dot <- layer_dot(
  inputs = list(user_embeddings, movie_embeddings),
  axes = 2,
  name = "dot_product"
)
pred <- dot %>% layer_dense(
  units = 1,
  activation = "relu",
  name = "rating_prediction"
)
```
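
As a sanity check on what the dot product layer computes, here is the same
arithmetic in base R for a batch of two (user, movie) pairs. The vectors are
made up and use dimension 3 for readability.

```{r}
# toy embeddings for a batch of 2 (user, movie) pairs, dimension 3
toy_user_vecs  <- rbind(c(1, 0, 2), c(0.5, 1, -1))
toy_movie_vecs <- rbind(c(3, 1, 1), c(2, 2, 1))

# row-wise dot product: multiply element-wise, then sum along the embedding axis
toy_dots <- rowSums(toy_user_vecs * toy_movie_vecs)
toy_dots
```

Each pair yields a single number, which is why the output of the dot product
layer has a trailing dimension of 1 before it reaches the dense layer.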
Now, we just need to combine these layers into a keras model. We use
`keras_model()` to do so and we specify our 2 input layers and map them to our
output layer. We can then add our compilation information as usual.
Note how our model summary illustrates how our layers are connected together.
```{r}
# define model inputs/outputs
model <- keras_model(inputs = c(input_users, input_movies), outputs = pred)
model %>% compile(
  optimizer = "rmsprop",
  loss = "mse",
  metrics = "mae"
)
# inspect model
summary(model)
```
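
The parameter count in the summary can be reproduced by hand. The sketch below
assumes the user and movie counts quoted in the data section (610 and 9,724)
and the embedding dimension of 64; if your data differs, the total will too.

```{r}
# counts quoted earlier for the small MovieLens dataset
toy_n_users  <- 610
toy_n_movies <- 9724
toy_dim      <- 64

# each embedding layer stores one vector of length toy_dim per ID
toy_user_params  <- toy_n_users * toy_dim
toy_movie_params <- toy_n_movies * toy_dim
# a dense layer with 1 unit on a single input value: 1 weight + 1 bias
toy_dense_params <- 2

toy_total_params <- toy_user_params + toy_movie_params + toy_dense_params
toy_total_params
```

Nearly all of the parameters sit in the movie embeddings, since there are far
more movies than users.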
We are now ready to train our model. The only difference in this step is that,
since we have two input layers (`input_users` & `input_movies`), we need to
supply a list of two inputs:
- `x_train[, "user_id", drop = FALSE]`: tensor (matrix) of user IDs
- `x_train[, "dense_movie_id", drop = FALSE]`: tensor (matrix) of movie IDs
```{r}
# train the model
history <- model %>% fit(
  x = list(
    x_train[, "user_id", drop = FALSE],
    x_train[, "dense_movie_id", drop = FALSE]
  ),
  y = y_train,
  epochs = 10,
  batch_size = 32,
  validation_split = 0.2,
  callbacks = list(callback_early_stopping(patience = 2))
)
```
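
One detail worth knowing about `validation_split`: keras holds out the last
share of the rows as they are ordered in the supplied data, before any per-epoch
shuffling. That is why we randomized the rows earlier. The sketch below
approximates the split sizes for the number of ratings quoted in the data
section; the exact rounding keras applies is an assumption here.

```{r}
# number of ratings quoted earlier for the small MovieLens dataset
toy_n <- 100836
toy_split <- 0.2

# the first (1 - validation_split) share of rows is used for training ...
toy_n_train <- floor(toy_n * (1 - toy_split))
# ... and the remaining rows at the end are held out for validation
toy_val_rows <- seq(toy_n_train + 1, toy_n)

c(train = toy_n_train, validation = length(toy_val_rows))
```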
Our model obtains a validation loss (MSE) in the lower 0.8 range.
```{r}
best_epoch <- which.min(history$metrics$val_loss)
loss <- history$metrics$val_loss[best_epoch] %>% round(3)
mae <- history$metrics$val_mae[best_epoch] %>% round(3)
glue("The best epoch had a loss of {loss} and mean absolute error of {mae}")
```
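
To help interpret these two numbers, here is how the loss (mean squared error)
and the metric (mean absolute error) are computed on a few made-up ratings.

```{r}
# hypothetical actual and predicted ratings
toy_actual    <- c(4.0, 3.5, 5.0, 2.0)
toy_predicted <- c(3.5, 3.5, 4.0, 3.0)

toy_errors <- toy_actual - toy_predicted
toy_mse <- mean(toy_errors^2)     # mean squared error (the loss)
toy_mae <- mean(abs(toy_errors))  # mean absolute error (the metric)
c(mse = toy_mse, mae = toy_mae)
```

Because MSE is in squared rating units, taking its square root gives a number
on the rating scale; an MSE of about 0.8 corresponds to a typical error of
roughly 0.9 stars.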
## Accounting for bias
Unfortunately, our simple model does not account for biases. For example, some
people tend to rate everything favorably and some movies are consistently highly
rated. We can capture this extra information by including extra bias weights in
our model [ℹ️](http://bit.ly/dl-07-Excel).
Doing this results in a neural net architecture that looks like:
![](images/collaborative-filtering-keras-model2.png)
We follow the same procedure as before to set up the user and movie embeddings.
We also create two new bias layers (`user_bias` & `movie_bias`) that will have
an output dimension of 1 since this is creating a single bias weight for each
user and movie.
```{r}
# input layers
input_users <- layer_input(shape = 1, name = "users")
input_movies <- layer_input(shape = 1, name = "movies")
user_embeddings <- input_users %>%
  layer_embedding(
    input_dim = n_users,
    output_dim = embedding_dim,
    name = "user_embeddings"
  )
movie_embeddings <- input_movies %>%
  layer_embedding(
    input_dim = n_movies,
    output_dim = embedding_dim,
    name = "movie_embeddings"
  )
user_bias <- input_users %>%
  layer_embedding(
    input_dim = n_users,
    output_dim = 1,
    name = "user_bias"
  )
# the movie bias must be looked up by movie ID, not user ID
movie_bias <- input_movies %>%
  layer_embedding(
    input_dim = n_movies,
    output_dim = 1,
    name = "movie_bias"
  )
```
We create our dot product and then add one more layer that adds the dot product
with the user and movie biases (via `layer_add()`). We then complete our model
with our final prediction layer.
```{r}
dot <- layer_dot(list(user_embeddings, movie_embeddings), axes = 2,
                 name = "dot_product")
dot_bias <- layer_add(list(dot, user_bias, movie_bias), name = "add_bias")
pred <- dot_bias %>% layer_dense(units = 1, activation = "relu",
                                 name = "rating_prediction")
```
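
For a single (user, movie) pair, the value leaving the `add_bias` layer is just
the dot product plus the two bias terms. The sketch below works through that
arithmetic with made-up values; in the real model all of these numbers are
learned, and the final dense layer then rescales the result.

```{r}
# hypothetical learned values for a single (user, movie) pair
toy_user_vec   <- c(0.8, -0.2, 0.5)
toy_movie_vec  <- c(1.0, 0.5, 2.0)
toy_user_bias  <- 0.3   # this user tends to rate slightly high
toy_movie_bias <- -0.1  # this movie tends to be rated slightly low

# interaction term plus the two bias terms, as in the add_bias layer
toy_score <- sum(toy_user_vec * toy_movie_vec) + toy_user_bias + toy_movie_bias
toy_score
```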
We follow the same procedure to build our model with `keras_model()` and then
compile. Our model summary shows our new layers that include, or are connected
to, our biases.
```{r}
# define model inputs/outputs
model <- keras_model(inputs = c(input_users, input_movies), outputs = pred)
model %>% compile(
  optimizer = "rmsprop",
  loss = "mse",
  metrics = "mae"
)
# inspect model
summary(model)
```
We train our model the same way as before:
```{r}
# train the model
history <- model %>% fit(
  x = list(
    x_train[, "user_id", drop = FALSE],
    x_train[, "dense_movie_id", drop = FALSE]
  ),
  y = y_train,
  epochs = 10,
  batch_size = 32,
  validation_split = 0.2,
  callbacks = list(callback_early_stopping(patience = 2))
)
```
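
The early stopping callback explains why training may finish before epoch 10.
Here is a simplified base R simulation of the `patience = 2` rule on made-up
validation losses; it mirrors the idea rather than keras' exact implementation.

```{r}
# hypothetical validation losses, one per epoch
toy_val_loss <- c(0.95, 0.88, 0.84, 0.85, 0.86, 0.83)
toy_patience <- 2

toy_best <- Inf
toy_wait <- 0
toy_stop_epoch <- length(toy_val_loss)

for (toy_epoch in seq_along(toy_val_loss)) {
  if (toy_val_loss[toy_epoch] < toy_best) {
    # improvement: remember it and reset the counter
    toy_best <- toy_val_loss[toy_epoch]
    toy_wait <- 0
  } else {
    # no improvement: stop once patience is used up
    toy_wait <- toy_wait + 1
    if (toy_wait >= toy_patience) {
      toy_stop_epoch <- toy_epoch
      break
    }
  }
}
c(stopped_after_epoch = toy_stop_epoch, best_val_loss = toy_best)
```

Training stops after two consecutive epochs without improvement, so the lower
loss at epoch 6 is never reached. Note also that the model keeps the weights
from the last epoch run, not the best one, unless `restore_best_weights = TRUE`
is passed to `callback_early_stopping()`.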
Our results show an improvement of over 5 percentage points! Spending some time
on hyperparameter optimization could very well lead to even better results.
```{r}
best_epoch <- which.min(history$metrics$val_loss)
loss <- history$metrics$val_loss[best_epoch] %>% round(3)
mae <- history$metrics$val_mae[best_epoch] %>% round(3)
glue("The best epoch had a loss of {loss} and mean absolute error of {mae}")
```
# A closer look at the embeddings
If we want to take a closer look at our embeddings, we can always access them.
For example, let's grab the movie embeddings:
```{r}
movie_embeddings <- model %>%
  get_layer("movie_embeddings") %>%
  get_weights() %>%
  .[[1]]
```
The following adds the actual movie titles to the embeddings after some regex
clean up to remove unnecessary info. Note that the rows of the movie embeddings
are ordered by the zero-based `dense_movie_id` value (i.e. 0, 1, ..., n - 1), so
we need to properly order the titles before adding them as row names.
```{r}
movie_titles <- movie_data %>%
  select(dense_movie_id, title) %>%
  distinct() %>%
  arrange(dense_movie_id) %>%
  mutate(title = title %>% str_remove("\\(.+\\)") %>% str_trim())
row.names(movie_embeddings) <- movie_titles$title
movie_embeddings[1:10, 1:4]
```
We can now use some kind of dimension reduction procedure. The following applies
t-SNE to project our movie embeddings onto two dimensions and then plots them.
If you zoom in you will see some clear themes among the groupings (e.g. Billy
Madison, The Wedding Singer, Dumb & Dumber, Austin Powers are similar comedies).
```{r, fig.width=10, fig.height=6}
n_words_to_plot <- 200
tsne <- Rtsne::Rtsne(
  X = movie_embeddings[1:n_words_to_plot, ],
  perplexity = 30,
  pca = FALSE
)
p <- tsne$Y %>%
  as.data.frame() %>%
  mutate(word = row.names(movie_embeddings)[1:n_words_to_plot]) %>%
  ggplot(aes(x = V1, y = V2, label = word)) +
  geom_text(size = 3)
plotly::ggplotly(p)
```
You could do a similar process to find similar groupings of customers.
# Make a customer prediction
Now that we have a model, we often want to make recommendations to customers
about new products we think they'd like. For example, let's look at customer 53.
The following does some data wrangling to identify the movies that user 53 has
and has not watched. We can use this info to recommend a movie that we think
this customer would enjoy but has not watched yet.
```{r}
# convert customer of interest to align to our zero-based customer IDs
original_customer_id <- 53
new_customer_id <- original_customer_id - 1
# get movies watched by our user
movies_watched <- movie_data %>%
  filter(user_id == new_customer_id) %>%
  pull(dense_movie_id)
# get all available movies
all_movies <- movie_data %>%
  distinct(dense_movie_id) %>%
  pull()
# identify movies not watched
movies_not_watched <- setdiff(all_movies, movies_watched)
movie_options <- movie_data %>%
  filter(dense_movie_id %in% movies_not_watched) %>%
  distinct(dense_movie_id, title)
movie_options
```
To score these unwatched movies, we create a new matrix that includes the user's
zero-based index ID. In this example this column is always 52 since we are
focusing on this one user. We then add a second column of all the
`dense_movie_id`s for the movies that the user has not watched.
```{r}
customer_options <- expand.grid(
  user_id = new_customer_id,
  dense_movie_id = movies_not_watched
) %>%
  as.matrix()
head(customer_options)
```
We can now feed this information into our `predict()` function. Remember, our
keras model takes two inputs (`user_id` & `dense_movie_id`) so our `predict()`
function is going to expect a list of two inputs as well.
```{r}
inputs <- list(
  customer_options[, "user_id", drop = FALSE],
  customer_options[, "dense_movie_id", drop = FALSE]
)
pred <- model %>% predict(inputs)
head(pred)
```
We can now add these predictions to our `customer_options` data, join the
`movie_options` dataset that has the titles for the movies and rank-order our
movies for those that have the highest expected rating.
```{r}
customer_options %>%
  as_tibble() %>%
  mutate(predictions = as.vector(pred)) %>%
  left_join(movie_options, by = "dense_movie_id") %>%
  arrange(desc(predictions))
```
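
In practice we usually surface only the top few recommendations. Here is a
minimal base R sketch of that last step on made-up predictions; the titles and
scores are hypothetical placeholders, not model output.

```{r}
# hypothetical predicted ratings for five unwatched movies
toy_candidates <- data.frame(
  title = c("Movie A", "Movie B", "Movie C", "Movie D", "Movie E"),
  prediction = c(3.1, 4.6, 2.4, 4.9, 3.8),
  stringsAsFactors = FALSE
)

# sort by predicted rating, highest first, and keep the top 3
toy_top <- head(toy_candidates[order(-toy_candidates$prediction), ], 3)
toy_top$title
```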
# Key takeaways
* Collaborative filtering
    - A common and relatively simple approach to make recommendations
    - There are many algorithms to choose from, but matrix factorization and our
      deep learning extension are probably the most common.
    - All we're doing is
        1. creating embeddings for both our users and products,
        2. taking the dot product of these embeddings,
        3. using additional bias weights to account for user/product biases,
        4. and optionally extending this with typical deep learning layers (e.g.
           hidden layers, dropout).
* Keras functional model
    - Allows us flexibility in creating custom models
    - We can have multiple inputs (and subsequent layers) along with multiple
      outputs
    - Naming our layers allows us to easily view the layer connections
    - For more information on keras' functional model see:
        - [Deep Learning with R](https://bit.ly/2PvOrBv), Ch. 7
        - [Guide to the Functional API](https://bit.ly/35wZqAx)
[🏠](https://github.com/rstudio-conf-2020/dl-keras-tf)