# Exercise 4 - Matrix Factorization

In this exercise we use the [Surprise](http://surpriselib.com/) library to train a matrix factorization model.

In [1]:
import numpy as np
import pandas as pd
from surprise import Dataset, Reader, SVD

import warnings; warnings.simplefilter('ignore')

## Loading the Data

In [2]:
movies = pd.read_csv('data/movies.csv')
movies = movies[['movieId', 'title']]

In [3]:
ratings = pd.read_csv('data/ratings.csv')

In [4]:
# discard all movies with 25 or fewer ratings (the filter keeps only movies with more than 25)
ratings = ratings.groupby('movieId').filter(lambda x: x['movieId'].count() > 25)

# remap movieIds to consecutive indices 0..n-1 (the row index of the filtered movies frame)
movies = movies[movies.movieId.isin(set(ratings.movieId))].reset_index(drop=True)
ratings['movieId'] = ratings.movieId.map({mi: i for i, mi in movies.movieId.items()})
movies['movieId'] = movies.movieId.map({mi: i for i, mi in movies.movieId.items()})
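
The filter and the id remapping above are easier to follow on a tiny example. The sketch below uses made-up toy data and a hypothetical threshold `min_ratings = 1` instead of 25; it computes the per-movie count with `transform('count')`, which should select the same rows as the `groupby(...).filter(...)` call.

```python
import pandas as pd

# Toy data (hypothetical): movie 10 has three ratings, movie 20 one, movie 30 two.
ratings = pd.DataFrame({
    'userId':  [1, 2, 3, 1, 2, 3],
    'movieId': [10, 10, 10, 20, 30, 30],
    'rating':  [4.0, 3.5, 5.0, 2.0, 3.0, 4.5],
})
movies = pd.DataFrame({
    'movieId': [10, 20, 30],
    'title':   ['A', 'B', 'C'],
})

min_ratings = 1  # hypothetical threshold; the notebook uses 25

# Keep only ratings of movies that have more than `min_ratings` ratings.
counts = ratings.groupby('movieId')['rating'].transform('count')
ratings = ratings[counts > min_ratings].copy()

# Keep only the surviving movies and renumber them 0..n-1.
movies = movies[movies.movieId.isin(ratings.movieId)].reset_index(drop=True)
id_map = {old_id: new_id for new_id, old_id in movies.movieId.items()}
ratings['movieId'] = ratings.movieId.map(id_map)
movies['movieId'] = movies.movieId.map(id_map)

print(movies)
print(ratings)
```

Movie 20 is dropped, and the remaining ids 10 and 30 become 0 and 1 in both frames, so `movieId` can later be used directly as a row position in `movies`.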

Load the data into a Surprise `Dataset`.\
Hint: [Getting Started - dataset from a pandas dataframe](https://surprise.readthedocs.io/en/stable/getting_started.html?highlight=Reader#load-dom-dataframe-py)

In [5]:
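# load the dataset; rating_scale is the range that predictions are clipped to
# (if ratings.csv contains ratings below 1, e.g. 0.5, the lower bound should be widened accordingly)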
reader = Reader(rating_scale=(1, 5))
dataset = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

Now we create the trainset and the testset.\
Hints:\
[surprise.dataset.DatasetAutoFolds.build_full_trainset](https://surprise.readthedocs.io/en/stable/dataset.html#surprise.dataset.DatasetAutoFolds.build_full_trainset)\
[surprise.Trainset.build_anti_testset](https://surprise.readthedocs.io/en/stable/trainset.html#surprise.Trainset.build_anti_testset)\
[surprise.Trainset.build_testset](https://surprise.readthedocs.io/en/stable/trainset.html#surprise.Trainset.build_testset)



In [6]:
# build the trainset from all available ratings
trainset = dataset.build_full_trainset()
# build the testset from the anti-testset (all missing ratings) plus the testset derived from the trainset
testset = trainset.build_anti_testset() + trainset.build_testset()
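
To make the two testset parts concrete, here is a minimal pure-Python sketch of what they contain, assuming a toy rating dictionary (all names and values are hypothetical). It mirrors the documented behavior: `build_testset()` returns every known rating, while `build_anti_testset()` returns every unrated (user, item) pair, filled by default with the global mean rating.

```python
from itertools import product

# Toy ratings (hypothetical): (user, item) -> rating
known = {(0, 0): 4.0, (0, 1): 2.0, (1, 1): 5.0}

users = sorted({u for u, _ in known})
items = sorted({i for _, i in known})
global_mean = sum(known.values()) / len(known)

# Counterpart of build_testset(): every known rating as (user, item, rating).
testset_known = [(u, i, r) for (u, i), r in known.items()]

# Counterpart of build_anti_testset(): every unrated pair, filled with the global mean.
testset_missing = [
    (u, i, global_mean)
    for u, i in product(users, items)
    if (u, i) not in known
]

testset = testset_missing + testset_known
print(len(testset))  # one entry per cell of the user-item matrix
```

The combined list covers the full user-item matrix, so its length is the number of users times the number of items. This also explains why the `r_ui` column shows the same value for all unrated pairs in the prediction output: it is the fill value, not a real rating.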

Now we can compute the matrix factorization and generate recommendations.\
Hint:
[how-to-get-the-top-n-recommendations-for-each-user](https://surprise.readthedocs.io/en/stable/FAQ.html#how-to-get-the-top-n-recommendations-for-each-user)

In [7]:
# initialize and train the recommender
recommender = SVD(n_factors=150, n_epochs=100)
recommender.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x11efba1a0>
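
For intuition about what `fit` does, the following is a minimal NumPy sketch of biased matrix factorization trained with stochastic gradient descent, using the prediction rule documented for Surprise's `SVD` (global mean + user bias + item bias + dot product of the factor vectors). It is an illustration under simplifying assumptions, not Surprise's implementation: `fit_mf`, `predict`, the toy triples and the hyperparameter values are hypothetical.

```python
import numpy as np

def fit_mf(triples, n_users, n_items, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Biased matrix factorization trained with plain SGD (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in triples])          # global mean rating
    bu = np.zeros(n_users)                            # user biases
    bi = np.zeros(n_items)                            # item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))      # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))      # item factors
    for _ in range(n_epochs):
        for u, i, r in triples:
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu_old = P[u].copy()                      # use the old user vector for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu_old - reg * Q[i])
    return mu, bu, bi, P, Q

def predict(model, u, i, low=1.0, high=5.0):
    """Estimated rating, clipped to the rating scale."""
    mu, bu, bi, P, Q = model
    return float(np.clip(mu + bu[u] + bi[i] + P[u] @ Q[i], low, high))

# Toy ratings (hypothetical): (user, item, rating)
triples = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 1.0)]
model = fit_mf(triples, n_users=3, n_items=3)
for u, i, r in triples:
    print(u, i, r, round(predict(model, u, i), 2))
```

The clipping step is why estimates never leave the configured rating scale: any raw score above the upper bound is reported as exactly 5.0. With many factors and many epochs the model can fit the training ratings very closely, so predictions for known ratings tend to be much more accurate than predictions for unseen pairs.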

In [8]:
# compute all predictions and merge them with the movie titles into one DataFrame
recommendations = pd.DataFrame(recommender.test(testset))
recommendations = recommendations.merge(movies, left_on='iid', right_on='movieId')
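
As an alternative to the DataFrame route, the top-N items per user can be collected directly from the list of predictions, along the lines of the FAQ entry linked above. The sketch below assumes each prediction unpacks into `(uid, iid, r_ui, est, details)`, as Surprise's `Prediction` tuples do; the toy predictions are made up.

```python
from collections import defaultdict

def get_top_n(predictions, n=10):
    """Map each user id to their n highest-estimated (item id, estimate) pairs."""
    top_n = defaultdict(list)
    for uid, iid, r_ui, est, details in predictions:
        top_n[uid].append((iid, est))
    # Sort each user's items by estimated rating and keep the best n.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda pair: pair[1], reverse=True)
        top_n[uid] = user_ratings[:n]
    return top_n

# Toy predictions (hypothetical values) in the same field order as Surprise's output.
predictions = [
    (0, 0, 3.5, 4.0, {}),
    (0, 1, 3.5, 2.5, {}),
    (0, 2, 3.5, 4.5, {}),
    (1, 0, 3.5, 3.0, {}),
]
top = get_top_n(predictions, n=2)
print(top[0])
print(top[1])
```

With the notebook's objects this would be called as `get_top_n(recommender.test(testset), n=10)`.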

In [9]:
ratings.sort_values(['userId', 'movieId'])

Unnamed: 0,userId,movieId,rating,timestamp,title
6,0,358,2.0,1260759108,Antz
18,0,384,3.0,1260759179,Dumbo
14,0,411,3.5,1260759125,Dracula
5,0,426,2.5,1260759113,The Fly (1986)
8,0,464,2.0,1260759151,The Deer Hunter
...,...,...,...,...,...
99714,670,891,3.0,1064891236,A Grand Day Out
99756,670,961,2.5,1063503739,City Slickers II: The Legend of Curly's Gold
99720,670,965,4.0,1064891534,Searching for Bobby Fischer
99744,670,969,4.0,1063502750,Roger & Me


In [10]:
recommendations.sort_values(['uid', 'iid']).head(20)

Unnamed: 0,uid,iid,r_ui,est,details,movieId,title
285175,0,0,3.669661,3.059559,{'was_impossible': False},0,Inception
54351,0,1,3.669661,3.425769,{'was_impossible': False},1,The Dark Knight
455609,0,2,3.669661,2.821915,{'was_impossible': False},2,Avatar
540155,0,3,3.669661,2.461706,{'was_impossible': False},3,The Avengers (2012)
334829,0,4,3.669661,2.9277,{'was_impossible': False},4,Interstellar
559614,0,5,3.669661,3.059636,{'was_impossible': False},5,Django Unchained
348920,0,6,3.669661,2.389299,{'was_impossible': False},6,Guardians of the Galaxy
57706,0,7,3.669661,3.171762,{'was_impossible': False},7,Fight Club
284504,0,8,3.669661,2.762307,{'was_impossible': False},8,The Hunger Games
341539,0,9,3.669661,2.848403,{'was_impossible': False},9,Mad Max: Fury Road


In [11]:
# show the ratings of one user
user_id = 45
ratings[ratings.userId == user_id].sort_values('rating', ascending=False).head(20)

Unnamed: 0,userId,movieId,rating,timestamp,title
7286,45,78,5.0,1366393308,Ratatouille
7287,45,725,5.0,1366389683,The Craft
7320,45,20,5.0,1366391137,The Lord of the Rings: The Two Towers
7319,45,754,5.0,1366391610,A Christmas Story
7316,45,939,5.0,1366389759,Repo Man
7315,45,46,5.0,1366393591,Harry Potter and the Deathly Hallows: Part 1
7314,45,601,5.0,1366389699,Entrapment
7313,45,647,5.0,1366389671,Analyze This
7311,45,16,5.0,1366390910,The Lord of the Rings: The Return of the King
7310,45,34,5.0,1366392989,Back to the Future


In [12]:
# show the recommendations for one user
recommendations[recommendations.uid == user_id].sort_values('est', ascending=False).head(20)

Unnamed: 0,uid,iid,r_ui,est,details,movieId,title
666345,45,509,3.669661,5.0,{'was_impossible': False},509,Cinema Paradiso
465045,45,123,3.669661,5.0,{'was_impossible': False},123,District 9
436194,45,129,3.669661,5.0,{'was_impossible': False},129,The Curious Case of Benjamin Button
439547,45,29,3.669661,5.0,{'was_impossible': False},29,Inglourious Basterds
440890,45,252,3.669661,5.0,{'was_impossible': False},252,Moon
184560,45,418,3.669661,5.0,{'was_impossible': False},418,Office Space
181208,45,67,3.669661,5.0,{'was_impossible': False},67,The Truman Show
445587,45,346,3.669661,5.0,{'was_impossible': False},346,Zoolander
179863,45,126,3.669661,5.0,{'was_impossible': False},126,The Godfather: Part II
179197,45,293,3.669661,5.0,{'was_impossible': False},293,The Godfather: Part III
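
Because the testset also contains the pairs from `build_testset()`, the `recommendations` frame includes movies the user has already rated. If only new movies should be suggested, those rows can be filtered out first. The sketch below shows one possible way with toy stand-ins for `ratings` and `recommendations`; the helper `unseen_top_n` and all values are hypothetical.

```python
import pandas as pd

# Toy stand-ins (hypothetical) for the notebook's `ratings` and `recommendations`.
ratings = pd.DataFrame({'userId': [45, 45], 'movieId': [1, 2], 'rating': [5.0, 4.0]})
recommendations = pd.DataFrame({
    'uid': [45, 45, 45, 45],
    'iid': [1, 2, 3, 4],
    'est': [4.9, 4.1, 4.7, 3.2],
})

def unseen_top_n(recommendations, ratings, user_id, n=10):
    """Top-n estimated ratings for one user, excluding movies they already rated."""
    seen = set(ratings.loc[ratings.userId == user_id, 'movieId'])
    user_recs = recommendations[recommendations.uid == user_id]
    unseen = user_recs[~user_recs.iid.isin(seen)]
    return unseen.sort_values('est', ascending=False).head(n)

top = unseen_top_n(recommendations, ratings, user_id=45, n=2)
print(top)
```

Movies 1 and 2 are skipped even though their estimates are high, because user 45 has already rated them.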
