# Recommendations

## Types of Recommendations

1. Knowledge-Based Recommendations
2. Collaborative Filtering Recommendations
    - Model-Based
    - Neighborhood-Based
3. Content-Based Recommendations
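Knowledge-based recommendations start from constraints the user states explicitly, such as a genre or a minimum release year. The sketch below is a minimal illustration and not the method used later in this notebook. It filters on those constraints and then ranks the remaining movies by mean rating. The ranking rule, the function `knowledge_based_recs`, and the toy frames are all hypothetical; they only mimic the column names of the MovieTweetings data loaded below.

```python
import pandas as pd

# Hypothetical toy frames shaped like the movies/reviews data loaded below
toy_movies = pd.DataFrame({
    "movie_id": ["1", "2", "3", "4"],
    "movie": ["A (2001)", "B (2015)", "C (2016)", "D (2017)"],
    "genre": ["Drama", "Comedy|Drama", "Horror", "Drama|Romance"],
    "year": [2001, 2015, 2016, 2017],
})
toy_reviews = pd.DataFrame({
    "user_id": ["u1", "u2", "u1", "u3", "u2", "u3", "u1"],
    "movie_id": ["1", "1", "2", "2", "3", "4", "4"],
    "rating": [7, 9, 8, 6, 10, 9, 7],
})


def knowledge_based_recs(movies, reviews, genre, min_year, min_ratings=1, n=5):
    """Keep movies that satisfy the user's stated constraints, then rank them
    by mean rating (ties broken by number of ratings). Hypothetical helper."""
    stats = reviews.groupby("movie_id")["rating"].agg(["mean", "count"]).reset_index()
    merged = movies.merge(stats, on="movie_id")
    # Match whole genre labels, not substrings
    has_genre = merged["genre"].fillna("").str.split("|").apply(lambda gs: genre in gs)
    mask = has_genre & (merged["year"] >= min_year) & (merged["count"] >= min_ratings)
    ranked = merged[mask].sort_values(["mean", "count"], ascending=False)
    return ranked["movie"].head(n).tolist()


print(knowledge_based_recs(toy_movies, toy_reviews, genre="Drama", min_year=2010))
# ['D (2017)', 'B (2015)']
```

In a real system, the `min_ratings` threshold keeps a single 10/10 rating from outranking well-established movies.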

## Similarity Metrics

1. Pearson's correlation coefficient
2. Spearman's rank correlation coefficient
3. Kendall's tau
4. Euclidean distance
5. Manhattan distance
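The five metrics above can be sketched in NumPy and pandas for two users' rating vectors. The sketch assumes both vectors cover the same movies in the same order, for example only the movies both users rated. The function names are illustrative, and Kendall's tau is written in its tau-b form, which adjusts for ties.

The correlations fall in [-1, 1], where higher means the users are more alike. The two distances are at least 0, where lower means the users are more alike.

```python
import numpy as np
import pandas as pd


def pearson(x, y):
    """Pearson's r: linear correlation of the raw ratings."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def spearman(x, y):
    """Spearman's rho: Pearson's r on the ranks (ties get their average rank)."""
    return pearson(pd.Series(x).rank(), pd.Series(y).rank())


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant pairs), adjusted for ties."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    i, j = np.triu_indices(len(x), k=1)      # every unordered pair once
    sx, sy = np.sign(x[i] - x[j]), np.sign(y[i] - y[j])
    return float((sx * sy).sum() / np.sqrt(np.count_nonzero(sx) * np.count_nonzero(sy)))


def euclidean(x, y):
    """Straight-line distance between the two rating vectors."""
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))


def manhattan(x, y):
    """Sum of absolute rating differences."""
    return float(np.abs(np.asarray(x, float) - np.asarray(y, float)).sum())


# Two hypothetical users' ratings of the same four movies, in the same order
user_a = [1, 2, 3, 4]
user_b = [1, 3, 2, 4]
for metric in (pearson, spearman, kendall_tau_b, euclidean, manhattan):
    print(metric.__name__, round(metric(user_a, user_b), 4))
```

The rank-based metrics (Spearman, Kendall) ignore how far apart the ratings are and only compare their order. This makes them less sensitive to users who rate consistently high or low.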

## Business Cases for Recommendations

1. Relevance
2. Novelty
3. Serendipity
4. Increased Diversity

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read in the datasets
movies = pd.read_csv('https://raw.githubusercontent.com/sidooms/MovieTweetings/master/latest/movies.dat', delimiter='::', header=None, names=['movie_id', 'movie', 'genre'], dtype={'movie_id': object}, engine='python')
reviews = pd.read_csv('https://raw.githubusercontent.com/sidooms/MovieTweetings/master/latest/ratings.dat', delimiter='::', header=None, names=['user_id', 'movie_id', 'rating', 'timestamp'], dtype={'movie_id': object, 'user_id': object, 'timestamp': object}, engine='python')
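The `.dat` files use `::` as the separator. pandas treats a multi-character separator as a regular expression, and only the Python parsing engine supports that, hence `engine='python'`. The IDs are read as strings so that IMDb-style IDs such as `0114508` keep their leading zeros.

The offline sketch below parses two `ratings.dat`-style lines taken from the `reviews` output further down. It uses `io.StringIO` in place of the URL, and the variable names are illustrative.

```python
import io
import pandas as pd

# Two lines in the ratings.dat layout: user_id::movie_id::rating::timestamp
sample = "1::0114508::8::1381006850\n2::0358273::9::1579057827\n"
cols = ["user_id", "movie_id", "rating", "timestamp"]

# "::" is longer than one character, so it is used as a regex -> python engine
with_dtype = pd.read_csv(io.StringIO(sample), sep="::", engine="python",
                         header=None, names=cols,
                         dtype={"user_id": str, "movie_id": str})
no_dtype = pd.read_csv(io.StringIO(sample), sep="::", engine="python",
                       header=None, names=cols)

print(with_dtype.movie_id.tolist())  # ['0114508', '0358273']
print(no_dtype.movie_id.tolist())    # [114508, 358273]: leading zero lost
```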

In [2]:
movies.head()

|   | movie_id | movie | genre |
|---|---|---|---|
| 0 | 8 | Edison Kinetoscopic Record of a Sneeze (1894) | Documentary\|Short |
| 1 | 10 | La sortie des usines Lumière (1895) | Documentary\|Short |
| 2 | 12 | The Arrival of a Train (1896) | Documentary\|Short |
| 3 | 25 | The Oxford and Cambridge University Boat Race ... | NaN |
| 4 | 91 | Le manoir du diable (1896) | Short\|Horror |


In [3]:
reviews.head()

|   | user_id | movie_id | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 114508 | 8 | 1381006850 |
| 1 | 2 | 358273 | 9 | 1579057827 |
| 2 | 2 | 10039344 | 5 | 1578603053 |
| 3 | 2 | 6751668 | 9 | 1578955697 |
| 4 | 2 | 7131622 | 8 | 1579559244 |


In [4]:
movies.shape

(35253, 3)

In [5]:
reviews.shape

(856178, 4)

In [6]:
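# Counts distinct genre strings (pipe-joined combinations such as 'Documentary|Short'), not individual genres
movies.genre.nunique()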

2727

In [7]:
reviews.user_id.nunique()

66653
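These counts show how sparse the user-movie rating matrix is. The data has 856,178 ratings, 66,653 users and 35,253 movies, and 856,178 / (66,653 × 35,253) ≈ 0.00036. So only about 0.036% of the cells are filled. That figure assumes every movie in `movies.dat` is a column and no user rated the same movie twice.

The hypothetical helper below computes the same density for any reviews frame with `user_id` and `movie_id` columns. It is a sketch, not part of the original notebook.

```python
from typing import Optional

import pandas as pd


def rating_density(reviews: pd.DataFrame, n_items: Optional[int] = None) -> float:
    """Fraction of user-item cells that hold a rating (hypothetical helper)."""
    n_users = reviews["user_id"].nunique()
    if n_items is None:
        # Default: count only movies that received at least one rating
        n_items = reviews["movie_id"].nunique()
    # Count each (user, movie) pair once, in case a user rated a movie twice
    n_ratings = len(reviews.drop_duplicates(["user_id", "movie_id"]))
    return n_ratings / (n_users * n_items)


toy = pd.DataFrame({
    "user_id": ["1", "1", "2", "2"],
    "movie_id": ["a", "b", "b", "c"],
    "rating": [8, 9, 5, 7],
})
print(rating_density(toy))             # 4 ratings / (2 users * 3 movies)
print(rating_density(toy, n_items=4))  # 4 ratings / (2 users * 4 movies)
```

With the real data, passing `n_items=movies.shape[0]` reproduces the estimate above.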

In [9]:
movies.isnull().sum()

movie_id      0
movie         0
genre       251
dtype: int64


In [38]:
# The release year is the four characters just before the closing parenthesis, e.g. "(1894)"
movies['year'] = movies.movie.apply(lambda x: x[-5:-1])

In [39]:
# Sanity check: every extracted year is exactly four characters long
assert set([len(i) for i in movies.year]) == {4}
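The slice `x[-5:-1]` assumes every title ends in `(YYYY)`, and the length check above cannot catch a title that does not. Any four characters pass it.

A regex anchored on the parenthesised year returns `NaN` for such titles instead. The sketch below shows both approaches. The third title is a made-up edge case, not a row from the dataset.

```python
import pandas as pd

titles = pd.Series([
    "Edison Kinetoscopic Record of a Sneeze (1894)",
    "Le manoir du diable (1896)",
    "A hypothetical title without a year",  # made-up edge case
])

# Fixed slicing: silently yields ' yea' for the last title, still 4 characters long
sliced = titles.str[-5:-1]

# Regex: four digits inside parentheses at the end of the title, else NaN
extracted = titles.str.extract(r"\((\d{4})\)\s*$", expand=False)
# Nullable integer dtype keeps the missing year as <NA>
years = pd.to_numeric(extracted).astype("Int64")

print(sliced.tolist())
print(years.tolist())
```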

In [41]:
movies.head()

|   | movie_id | movie | genre | year |
|---|---|---|---|---|
| 0 | 8 | Edison Kinetoscopic Record of a Sneeze (1894) | Documentary\|Short | 1894 |
| 1 | 10 | La sortie des usines Lumière (1895) | Documentary\|Short | 1895 |
| 2 | 12 | The Arrival of a Train (1896) | Documentary\|Short | 1896 |
| 3 | 25 | The Oxford and Cambridge University Boat Race ... | NaN | 1895 |
| 4 | 91 | Le manoir du diable (1896) | Short\|Horror | 1896 |


In [42]:
# Bucket each release year by century using its first two digits
century_map = {'18': "1800's", '19': "1900's", '20': "2000's"}
movies['century'] = movies.year.str[:2].map(century_map).fillna("out of century")

In [73]:
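# Count the distinct individual genres: each genre value is a '|'-separated list,
# and missing (NaN) values raise AttributeError on .split, so they are skipped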
genres = []
for val in movies.genre:
    try:
        genres.extend(val.split('|'))
    except AttributeError:
        pass

genres = set(genres)
print("The number of genres is {}.".format(len(genres)))

The number of genres is 28.
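This is the gap between the 2,727 distinct genre strings counted earlier and the 28 individual genres here. Each string is a combination of genres.

The loop above can also be written in vectorised form with `str.split` followed by `explode`. The sketch below is an alternative, not the notebook's own code. The sample strings are illustrative, apart from the first three, which appear in the `movies` output.

```python
import pandas as pd

genre_strings = pd.Series(
    ["Documentary|Short", "Documentary|Short", "Short|Horror", None, "Drama"]
)

n_combinations = genre_strings.nunique()  # 3 distinct strings (None is ignored)

# One row per (movie, genre) pair, then count distinct labels
individual = genre_strings.dropna().str.split("|").explode()
n_genres = individual.nunique()           # Documentary, Short, Horror, Drama
counts = individual.value_counts()        # how many movies carry each genre

print(n_combinations, n_genres)
print(counts)
```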


In [97]:
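# Create one 0/1 dummy column per genre; missing genres become empty strings
movies['genre'] = movies['genre'].fillna('')

for g in genres:
    # Compare against the split labels, not a substring, so 'Music' does not match 'Musical'
    movies[g] = movies.genre.apply(lambda x: int(g in x.split('|')))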
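A plain substring test such as `g in x` marks every `Musical` as `Music`, because one genre name is a prefix of the other. pandas' `Series.str.get_dummies(sep='|')` splits on the separator and matches whole labels in a single call. It is a possible alternative to the loop above.

The sketch below uses hypothetical genre strings to show the difference.

```python
import pandas as pd

genre = pd.Series(["Music|Drama", "Musical", ""])  # hypothetical genre strings

# Substring test: 'Music' in 'Musical' is True, so row 1 is wrongly flagged
naive_music = genre.apply(lambda x: int("Music" in x))

# Whole-label matching; columns come out sorted and empty strings get all zeros
dummies = genre.str.get_dummies(sep="|")

print(naive_music.tolist())  # [1, 1, 0]
print(dummies)
```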

In [98]:
movies.head()

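|   | movie_id | movie | genre | year | century | Sport | Western | Crime | Fantasy | Comedy | ... | Documentary | Adventure | Mystery | Short | Musical | Adult | Drama | Family | Reality-TV | Animation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | Edison Kinetoscopic Record of a Sneeze (1894) | Documentary\|Short | 1894 | 1800's | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 10 | La sortie des usines Lumière (1895) | Documentary\|Short | 1895 | 1800's | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 12 | The Arrival of a Train (1896) | Documentary\|Short | 1896 | 1800's | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 25 | The Oxford and Cambridge University Boat Race ... |  | 1895 | 1800's | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 91 | Le manoir du diable (1896) | Short\|Horror | 1896 | 1800's | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |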


In [100]:
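# One-hot encode the century bucket as 0/1 columns (dtype=int matches the genre dummies)
movies = pd.concat([movies, pd.get_dummies(movies.century, dtype=int)], axis=1)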

In [105]:
reviews

|   | user_id | movie_id | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 0114508 | 8 | 1381006850 |
| 1 | 2 | 0358273 | 9 | 1579057827 |
| 2 | 2 | 10039344 | 5 | 1578603053 |
| 3 | 2 | 6751668 | 9 | 1578955697 |
| 4 | 2 | 7131622 | 8 | 1579559244 |
| ... | ... | ... | ... | ... |
| 856173 | 66652 | 1843866 | 10 | 1396584788 |
| 856174 | 66652 | 1951264 | 10 | 1385101263 |
| 856175 | 66652 | 2267998 | 8 | 1415578263 |
| 856176 | 66652 | 2582846 | 10 | 1402022562 |


In [110]:
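# timestamp was read as strings (dtype=object); cast to int before interpreting it as Unix seconds
reviews['date'] = pd.to_datetime(reviews.timestamp.astype('int64'), unit='s')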
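The `timestamp` column holds Unix epoch seconds, stored as strings because of `dtype={'timestamp': object}`. Casting to an integer first makes `unit='s'` unambiguous. Recent pandas versions deprecate applying `unit` to string input.

The sketch below converts two timestamps copied from the `reviews` output above. It then reads off the year, a derived feature a recommender might use. The derived feature is an illustration, not something the original notebook computes.

```python
import pandas as pd

# Two timestamps from the reviews output, stored as strings like the original column
ts = pd.Series(["1381006850", "1579057827"])

dates = pd.to_datetime(ts.astype("int64"), unit="s")  # naive datetimes in UTC
print(dates.tolist())
print(dates.dt.year.tolist())  # [2013, 2020]
```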

In [112]:
# Save the cleaned frames; index=False keeps the row index out of the files
movies.to_csv('movies_clean.csv', index=False)
reviews.to_csv('reviews_clean.csv', index=False)
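A CSV file does not store column types. When the cleaned files are read back, pandas parses `movie_id` as an integer and drops leading zeros such as the one in `0114508` unless a `dtype` is given. For the same reason the `date` column comes back as text unless `parse_dates=['date']` is passed.

The round trip below uses an in-memory buffer instead of the real files and a one-row frame copied from the `reviews` output.

```python
import io
import pandas as pd

reviews = pd.DataFrame({"user_id": ["1"], "movie_id": ["0114508"], "rating": [8]})

buf = io.StringIO()
reviews.to_csv(buf, index=False)

buf.seek(0)
naive = pd.read_csv(buf)  # movie_id inferred as int64 -> 114508
buf.seek(0)
kept = pd.read_csv(buf, dtype={"user_id": str, "movie_id": str})

print(naive.movie_id.iloc[0], kept.movie_id.iloc[0])
```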