## Building a basic recommendation engine

1. Loading and formatting data.
2. Calculating similarity between users.
3. Predicting the unknown ratings for users.
4. Recommending items to users based on user-similarity score.

In [46]:
library(reshape2)
library(data.table)
library(dplyr)

### Data loading

In [47]:
ratings = read.csv("movie_rating.csv")
tail(ratings)

| | critic | title | rating |
|---|---|---|---|
| 26 | Gene Seymour | Lady in the Water | 3.0 |
| 27 | Gene Seymour | Snakes on a Plane | 3.5 |
| 28 | Gene Seymour | Just My Luck | 1.5 |
| 29 | Gene Seymour | Superman Returns | 5.0 |
| 30 | Gene Seymour | You Me and Dupree | 3.5 |
| 31 | Gene Seymour | The Night Listener | 3.0 |


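The acast() function in the reshape2 package casts a long-format data frame into a matrix. Here each title becomes a row, each critic becomes a column, and a title that a critic has not rated is left as NA.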
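As a minimal sketch of what the cast does, the toy data frame below (hypothetical values, not taken from movie_rating.csv) is reshaped the same way. The names `toy` and `toy_matrix` are illustrative assumptions.

```r
library(reshape2)

# Hypothetical long-format ratings: one row per (critic, title) pair.
toy <- data.frame(
  critic = c("A", "A", "B"),
  title  = c("Film 1", "Film 2", "Film 1"),
  rating = c(3.0, 4.5, 2.0)
)

# Rows come from the left side of the formula, columns from the right side.
# Critic B never rated "Film 2", so that cell is filled with NA.
toy_matrix <- acast(toy, title ~ critic, value.var = "rating")
toy_matrix
```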

In [49]:
#data processing and formatting
movie_ratings = as.data.frame(acast(ratings, title~critic, value.var="rating"))
head(movie_ratings)

| | Claudia Puig | Gene Seymour | Jack Matthews | Lisa Rose | Mick LaSalle | Toby |
|---|---|---|---|---|---|---|
| Just My Luck | 3.0 | 1.5 | NA | 3.0 | 2 | NA |
| Lady in the Water | NA | 3.0 | 3.0 | 2.5 | 3 | NA |
| Snakes on a Plane | 3.5 | 3.5 | 4.0 | 3.5 | 4 | 4.5 |
| Superman Returns | 4.0 | 5.0 | 5.0 | 3.5 | 3 | 4.0 |
| The Night Listener | 4.5 | 3.0 | 3.0 | 3.0 | 3 | NA |
| You Me and Dupree | 2.5 | 3.5 | 3.5 | 2.5 | 2 | 1.0 |


In [50]:
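#similarity calculation: Pearson correlation between critics
#use="complete.obs" keeps only the titles rated by all six critics
sim_users = cor(movie_ratings[,1:6],use="complete.obs")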
sim_users

| | Claudia Puig | Gene Seymour | Jack Matthews | Lisa Rose | Mick LaSalle | Toby |
|---|---|---|---|---|---|---|
| Claudia Puig | 1.0 | 0.7559289 | 0.9285714 | 0.9449112 | 0.6546537 | 0.8934051 |
| Gene Seymour | 0.7559289 | 1.0 | 0.9449112 | 0.5 | 0.0 | 0.3812464 |
| Jack Matthews | 0.9285714 | 0.9449112 | 1.0 | 0.7559289 | 0.3273268 | 0.662849 |
| Lisa Rose | 0.9449112 | 0.5 | 0.7559289 | 1.0 | 0.8660254 | 0.9912407 |
| Mick LaSalle | 0.6546537 | 0.0 | 0.3273268 | 0.8660254 | 1.0 | 0.9244735 |
| Toby | 0.8934051 | 0.3812464 | 0.662849 | 0.9912407 | 0.9244735 | 1.0 |
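The similarity values above rest on very little data: with use="complete.obs", cor() drops every title that has a missing rating from any critic. The sketch below rebuilds the ratings matrix from the head(movie_ratings) output to show which titles survive. The names `m`, `complete_titles`, `sim_complete` and `sim_pairwise` are illustrative, and the pairwise variant is shown only as one possible alternative, not as what the notebook uses.

```r
# Ratings matrix copied from the head(movie_ratings) output (NA = not rated).
m <- matrix(
  c(3.0, 1.5, NA,  3.0, 2, NA,
    NA,  3.0, 3.0, 2.5, 3, NA,
    3.5, 3.5, 4.0, 3.5, 4, 4.5,
    4.0, 5.0, 5.0, 3.5, 3, 4.0,
    4.5, 3.0, 3.0, 3.0, 3, NA,
    2.5, 3.5, 3.5, 2.5, 2, 1.0),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Just My Luck", "Lady in the Water", "Snakes on a Plane",
      "Superman Returns", "The Night Listener", "You Me and Dupree"),
    c("Claudia Puig", "Gene Seymour", "Jack Matthews",
      "Lisa Rose", "Mick LaSalle", "Toby"))
)

# Titles rated by every critic: the only rows used with "complete.obs".
complete_titles <- rownames(m)[complete.cases(m)]
complete_titles

# Correlating just those rows reproduces the similarity matrix.
sim_complete <- cor(m[complete.cases(m), ])

# Possible alternative: for each pair of critics, use every title both rated.
sim_pairwise <- cor(m, use = "pairwise.complete.obs")
```

Only three titles are rated by all six critics, so each correlation here is computed from three points. That is worth keeping in mind when reading values such as the 0.0 between Gene Seymour and Mick LaSalle.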


#### Predicting the unknown values
1. Extract the titles that Toby has not rated.
2. For these titles, collect the ratings given by the other critics.
3. Multiply each of those ratings by the similarity between the critic who gave it and Toby.
4. For each title, sum the weighted ratings and divide by the sum of the similarity values of the critics who rated it.
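The four steps above amount to a similarity-weighted average. One way to write it, using notation that is assumed here rather than taken from the notebook ($u$ is the target user, $i$ a title $u$ has not rated, $N_i$ the set of critics who rated $i$, $s_{u,v}$ the similarity between $u$ and critic $v$, and $r_{v,i}$ the rating critic $v$ gave title $i$):

```latex
\hat{r}_{u,i} \;=\; \frac{\sum_{v \in N_i} s_{u,v}\, r_{v,i}}{\sum_{v \in N_i} s_{u,v}}
```

The numerator corresponds to steps 3 and 4 (multiply, then sum), and the denominator is the sum of similarity values from step 4.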


The set* functions in data.table (setDT() is used below) modify their input by reference rather than by value, so the data is transformed in place without making a physical copy.
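A small sketch of this by-reference behaviour, using a hypothetical toy data frame `df` that is not part of the notebook:

```r
library(data.table)

# Hypothetical toy data frame with titles stored as row names.
df <- data.frame(rating = c(4.5, NA), row.names = c("Film 1", "Film 2"))

# setDT() converts df to a data.table in place (no df <- ... needed).
# keep.rownames = TRUE moves the row names into a first column named "rn".
setDT(df, keep.rownames = TRUE)

# setnames() also works by reference: rename "rn" to "title".
setnames(df, "rn", "title")
df
```

This is the same pattern the notebook relies on when it calls setDT(..., keep.rownames = TRUE) to turn the title row names into an ordinary column.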

For each title, sum the sim_rating values (rating multiplied by similarity, as in step 3) and divide that sum by the sum of the similarity values of the critics who rated the title. For example, Toby's predicted rating for Just My Luck is the sum of the sim_rating values for Just My Luck divided by the sum of the similarity values of the four critics who rated it:

(2.6802154 + 0.5718696 + 2.9737221 + 1.8489469) / (0.8934051 + 0.3812464 + 0.9912407 + 0.9244735) = 2.530981
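The arithmetic above can be checked directly in R. The sketch below copies the four similarity values from the Toby column of sim_users and the four Just My Luck ratings from movie_ratings. The variable names are illustrative, and because the similarity values are typed in at seven decimal places, the last digit of the result may differ slightly from a calculation on the unrounded values.

```r
# Toby's similarity to the four critics who rated "Just My Luck".
similarity <- c("Claudia Puig" = 0.8934051, "Gene Seymour" = 0.3812464,
                "Lisa Rose"    = 0.9912407, "Mick LaSalle" = 0.9244735)

# Those critics' ratings for "Just My Luck".
rating <- c("Claudia Puig" = 3.0, "Gene Seymour" = 1.5,
            "Lisa Rose"    = 3.0, "Mick LaSalle" = 2.0)

# Similarity-weighted ratings, then the weighted average.
sim_rating <- rating * similarity
predicted  <- sum(sim_rating) / sum(similarity)
predicted   # approximately 2.53
```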

In [51]:
#sim_users[colnames(sim_users) == 'Toby']
#sim_users[,6]
#predicting the unknown values
#separating the titles that Toby (column 6) has not rated
rating_critic = setDT(movie_ratings[colnames(movie_ratings)[6]],keep.rownames = TRUE)[]
names(rating_critic) = c('title','rating')
titles_na_critic = rating_critic$title[is.na(rating_critic$rating)]
#keep the other critics' ratings for those titles
ratings_t = ratings[ratings$title %in% titles_na_critic,]
#ratings_t

In [52]:
#add similarity values for each user as new variable
x = (setDT(data.frame(sim_users[,6]),keep.rownames = TRUE)[])
names(x) = c('critic','similarity')
ratings_t =  merge(x = ratings_t, y = x, by = "critic", all.x = TRUE)

In [53]:
#multiply each rating by the critic's similarity to Toby
ratings_t$sim_rating = ratings_t$rating*ratings_t$similarity
#predicting the non rated titles: similarity-weighted average per title
result = ratings_t %>% group_by(title) %>% summarise(predicted_rating = sum(sim_rating)/sum(similarity))

In [58]:
#function to make recommendations
#userId is the column index of the critic in movie_ratings (for example, 6 for Toby)
generateRecommendations <- function(userId){
  #titles the critic has not rated
  rating_critic = setDT(movie_ratings[colnames(movie_ratings)[userId]],keep.rownames = TRUE)[]
  names(rating_critic) = c('title','rating')
  titles_na_critic = rating_critic$title[is.na(rating_critic$rating)]
  ratings_t = ratings[ratings$title %in% titles_na_critic,]
  #add similarity values for each user as new variable
  x = (setDT(data.frame(sim_users[,userId]),keep.rownames = TRUE)[])
  names(x) = c('critic','similarity')
  ratings_t = merge(x = ratings_t, y = x, by = "critic", all.x = TRUE)
  #multiply rating with similarity values
  ratings_t$sim_rating = ratings_t$rating*ratings_t$similarity
  #predicting the non rated titles
  result = ratings_t %>% group_by(title) %>% summarise(predicted_rating = sum(sim_rating)/sum(similarity))
  return(result)
}
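As a cross-check that does not depend on movie_rating.csv, the same prediction can be sketched in base R directly on the ratings matrix. This is not the notebook's data.table/dplyr pipeline: the matrix `m` is copied from the head(movie_ratings) output, and `predict_unrated` is a hypothetical helper introduced only for this illustration.

```r
# Ratings matrix copied from the head(movie_ratings) output (NA = not rated).
m <- matrix(
  c(3.0, 1.5, NA,  3.0, 2, NA,
    NA,  3.0, 3.0, 2.5, 3, NA,
    3.5, 3.5, 4.0, 3.5, 4, 4.5,
    4.0, 5.0, 5.0, 3.5, 3, 4.0,
    4.5, 3.0, 3.0, 3.0, 3, NA,
    2.5, 3.5, 3.5, 2.5, 2, 1.0),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Just My Luck", "Lady in the Water", "Snakes on a Plane",
      "Superman Returns", "The Night Listener", "You Me and Dupree"),
    c("Claudia Puig", "Gene Seymour", "Jack Matthews",
      "Lisa Rose", "Mick LaSalle", "Toby"))
)

# Same similarity calculation as in the notebook.
sim <- cor(m, use = "complete.obs")

# Hypothetical helper: similarity-weighted average rating for every title
# that `user` has not rated, using only the critics who did rate it.
predict_unrated <- function(m, sim, user) {
  unrated <- rownames(m)[is.na(m[, user])]
  others  <- setdiff(colnames(m), user)
  sapply(unrated, function(title) {
    r     <- m[title, others]     # other critics' ratings for this title
    w     <- sim[others, user]    # their similarity to the user
    rated <- !is.na(r)
    sum(r[rated] * w[rated]) / sum(w[rated])
  })
}

# Highest predicted rating first, so the top entry is the first recommendation.
toby <- sort(predict_unrated(m, sim, "Toby"), decreasing = TRUE)
round(toby, 3)
```

For Toby this should give roughly 3.348 for The Night Listener, 2.833 for Lady in the Water and 2.531 for Just My Luck, which agrees with the worked Just My Luck calculation earlier in the notebook. With the notebook's own objects loaded, generateRecommendations(6) is expected to return the same three predictions.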

Reference: Suresh Kumar Gorakala, Building Recommendation Engines.
https://www.amazon.com/Building-Recommendation-Engines-Suresh-Gorakala/dp/1785884859