# Exercise 2.1 - Content Based Recommender - Similar Movies

The recommendations we made in Exercise 1 are not yet very good, because every person receives the same recommendations regardless of personal taste. To make better recommendations, we will now bring in the movies' metadata. For example, after a person has watched a movie, we can suggest similar movies, such as movies with a similar cast or a similar description.

In [1]:
import ast
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

Loading the movie metadata

In [2]:
list_columns = ['genres', 'keywords', 'production_companies', 'production_countries', 'spoken_languages', 'cast', 'director', 'producer', 'writer', 'music']

movies = pd.read_csv('data/movies.csv', keep_default_na=False, converters={col: ast.literal_eval for col in list_columns})
movies = movies.set_index('title', drop=False)
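
The `converters` argument parses the list-valued columns, which are presumably stored in the CSV as Python list literals, back into real lists, while `keep_default_na=False` keeps empty text fields as empty strings instead of NaN. A minimal sketch on a made-up two-row CSV (the rows and the reduced column set are assumptions for illustration, not taken from `data/movies.csv`):

```python
import ast
import io

import pandas as pd

# Hypothetical miniature of the movies file; the real file has many more columns.
csv_text = '''title,genres,tagline
Movie A,"['Action', 'Drama']",Some tagline
Movie B,[],
'''

toy_movies = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,                    # empty fields stay '' instead of NaN
    converters={'genres': ast.literal_eval},  # parse the list literal into a list
)

print(toy_movies['genres'].tolist())       # list objects, not strings
print(repr(toy_movies.loc[1, 'tagline']))  # empty string for the missing tagline
```

Without `keep_default_na=False`, the empty tagline would be read as NaN, and concatenating it with another text column would yield NaN as well.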

In [3]:
movies.head()

title,movieId,title,genres,release_date,runtime,tagline,overview,keywords,production_companies,production_countries,spoken_languages,cast,director,producer,writer,music,vote_average,vote_count,popularity
Inception,0,Inception,"[Action, Thriller, Science Fiction, Mystery, A...",2010-07-14,148.0,Your mind is the scene of the crime.,"Cobb, a skilled thief who commits corporate es...","[loss of lover, dream, kidnapping, sleep, subc...","[Legendary Pictures, Warner Bros., Syncopy]","[United Kingdom, United States of America]",[English],"[Leonardo DiCaprio, Joseph Gordon-Levitt, Elle...",[Christopher Nolan],"[Christopher Nolan, Emma Thomas]",[],[],8.1,14075,29.108149
The Dark Knight,1,The Dark Knight,"[Drama, Action, Crime, Thriller]",2008-07-16,152.0,Why So Serious?,Batman raises the stakes in his war on crime. ...,"[dc comics, crime fighter, secret identity, sc...","[DC Comics, Legendary Pictures, Warner Bros., ...","[United Kingdom, United States of America]","[English, 普通话]","[Christian Bale, Michael Caine, Heath Ledger, ...",[Christopher Nolan],"[Charles Roven, Christopher Nolan, Emma Thomas...",[],[],8.3,12269,123.167259
Avatar,2,Avatar,"[Action, Adventure, Fantasy, Science Fiction]",2009-12-10,162.0,Enter the World of Pandora.,"In the 22nd century, a paraplegic Marine is di...","[culture clash, future, space war, space colon...","[Ingenious Film Partners, Twentieth Century Fo...","[United States of America, United Kingdom]","[English, Español]","[Sam Worthington, Zoe Saldana, Sigourney Weave...",[James Cameron],"[James Cameron, Jon Landau]",[James Cameron],[],7.2,12114,185.070892
The Avengers (2012),3,The Avengers (2012),"[Science Fiction, Action, Adventure]",2012-04-25,143.0,Some assembly required.,When an unexpected enemy emerges and threatens...,"[new york, shield, marvel comic, superhero, ba...","[Paramount Pictures, Marvel Studios]",[United States of America],[English],"[Robert Downey Jr., Chris Evans, Mark Ruffalo,...",[Joss Whedon],[Kevin Feige],[],[],7.4,12000,89.887648
Deadpool,4,Deadpool,"[Action, Adventure, Comedy]",2016-02-09,108.0,Witness the beginning of a happy ending,Deadpool tells the origin story of former Spec...,"[anti hero, mercenary, marvel comic, superhero...","[Twentieth Century Fox Film Corporation, Marve...",[United States of America],[English],"[Ryan Reynolds, Morena Baccarin, Ed Skrein, T....",[Tim Miller],"[Ryan Reynolds, Simon Kinberg, Lauren Shuler D...",[],[Junkie XL],7.4,11444,187.860492


### Movie Description Based Recommender
We will now implement a recommender that takes the movie descriptions into account.\
Hint: [sklearn.feature_extraction.text.TfidfVectorizer](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html)

In [19]:
# Transform the descriptions (tagline and overview) into a TF-IDF matrix with the TfidfVectorizer
tf = TfidfVectorizer(analyzer='word', stop_words='english')
tfidf_matrix = tf.fit_transform(movies['tagline'] + ' ' + movies['overview'])
tfidf_matrix.shape


(9025, 30483)

Hint: [sklearn.feature_extraction.text.TfidfVectorizer.get_feature_names_out](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer.get_feature_names_out)

In [20]:
# To understand the tfidf_matrix better, we can convert it back into a DataFrame,
# with the movie titles as rows (index) and the feature names as columns
text_features = pd.DataFrame(tfidf_matrix.toarray(), index=movies.index, columns=tf.get_feature_names_out())
text_features

title,00,000,007,01,05,05pm,06,08,09,10,...,élan,émigré,état,étienne,évocateur,ôtomo,østergaard,žižek,ˈfil,ˌrän
Inception,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
The Dark Knight,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Avatar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
The Avengers (2012),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Deadpool,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
The Fern Flower,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Wonderland,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
To Have (Or Not),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Swedish Auto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [21]:
text_features.loc['Toy Story'].sort_values(ascending=False).head(20)

buzz             0.497922
woody            0.444590
andy             0.428771
lightyear        0.189024
toys             0.161764
aside            0.153902
separate         0.143716
plots            0.142924
differences      0.141424
afraid           0.139359
circumstances    0.137479
happily          0.135753
duo              0.133654
birthday         0.131294
room             0.122936
scene            0.121770
losing           0.118374
led              0.112888
brings           0.111280
owner            0.109787
Name: Toy Story, dtype: float64

#### Cosine Similarity

We will now compute a similarity matrix for all movies. Use the cosine similarity for this, which is defined as follows:

$\mathrm{cosine}(x, y) = \frac{x \cdot y^\intercal}{\lVert x \rVert \, \lVert y \rVert}$

Hint: [sklearn.metrics.pairwise.cosine_similarity](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html)
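
As a sanity check of the formula, the following sketch computes the cosine similarity of two small made-up vectors by hand and compares it with `cosine_similarity`. The vectors are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up feature vectors
x = np.array([1.0, 2.0, 0.0])
y = np.array([2.0, 1.0, 2.0])

# Formula from the text: dot product divided by the product of the norms
manual = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# cosine_similarity expects 2D inputs (one row per sample) and returns a matrix
from_sklearn = cosine_similarity(x.reshape(1, -1), y.reshape(1, -1))[0, 0]

print(manual, from_sklearn)
```

`cosine_similarity` also accepts sparse input, so the TF-IDF matrix could be passed directly without converting it to a dense array first.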

In [22]:
# Compute the similarity matrix
cosine_sime = cosine_similarity(text_features.values, text_features.values)
cosine_sime.shape

(9025, 9025)

In [23]:
# We can convert the similarity matrix back into a DataFrame as well,
# now with the movie titles both in the index and in the columns
text_similarities = pd.DataFrame(cosine_sime, index=movies.index, columns=movies.index)
text_similarities.head()

title,Inception,The Dark Knight,Avatar,The Avengers (2012),Deadpool,Interstellar,Django Unchained,Guardians of the Galaxy,Fight Club,The Hunger Games,...,Life After Tomorrow,Life Is Sacred,"This World, Then the Fireworks",Hav Plenty,Dear Jesse,The Fern Flower,Wonderland,To Have (Or Not),Swedish Auto,Men with Guns
Inception,1.0,0.013137,0.0,0.0,0.004119,0.0,0.007773,0.031783,0.0,0.019787,...,0.0,0.0,0.0,0.0,0.0,0.0,0.004991,0.006115,0.006811,0.0
The Dark Knight,0.013137,1.0,0.0,0.016492,0.0,0.0,0.036738,0.0,0.0,0.016152,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Avatar,0.0,0.0,1.0,0.009178,0.0,0.0,0.0,0.0,0.0,0.00521,...,0.0,0.011221,0.0,0.0,0.0,0.0,0.0,0.0,0.010981,0.0
The Avengers (2012),0.0,0.016492,0.009178,1.0,0.0,0.0,0.0,0.013542,0.0,0.004654,...,0.0,0.010024,0.0,0.0,0.0,0.0,0.0,0.066166,0.009809,0.0
Deadpool,0.004119,0.0,0.0,0.0,1.0,0.0,0.006275,0.0,0.004492,0.029084,...,0.0,0.0,0.0,0.007663,0.0,0.0,0.008716,0.029266,0.013205,0.0


Now that we have the pairwise similarities of all movies, the next step is to find the 20 most similar movies for a given movie.

In [24]:
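# Implement the following function, which returns the 20 most similar movies for a given movie title.
# Note: the queried movie itself is part of the result, since its similarity to itself is 1.
def get_recommendations(similarities, title):
    if title not in similarities:
        # Unknown title: return an empty Series so that callers can still chain .head()
        return pd.Series(dtype=float)
    return similarities[title].sort_values(ascending=False).head(20)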
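
A movie is always most similar to itself, so it ends up as the first entry of the result. A possible variant, sketched here under the hypothetical name `get_recommendations_without_self` on a made-up similarity matrix, drops the queried title before taking the top `n`:

```python
import pandas as pd

def get_recommendations_without_self(similarities, title, n=20):
    # Hypothetical variant, not part of the exercise: excludes the queried movie
    if title not in similarities:
        return pd.Series(dtype=float)
    scores = similarities[title].drop(labels=title)
    return scores.sort_values(ascending=False).head(n)

# Made-up symmetric similarity matrix for three movies
titles = ['A', 'B', 'C']
toy_similarities = pd.DataFrame(
    [[1.0, 0.2, 0.7],
     [0.2, 1.0, 0.1],
     [0.7, 0.1, 1.0]],
    index=titles, columns=titles,
)

print(get_recommendations_without_self(toy_similarities, 'A', n=2))
```

The sketch assumes that titles are unique; with duplicate titles, selecting a column by title would return several columns instead of one.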

Now we can determine the most similar movies for a few movies

In [25]:
get_recommendations(text_similarities, 'The Dark Knight').head(30)

title
The Dark Knight                            1.000000
The Dark Knight Rises                      0.305602
Batman Returns                             0.240747
Batman: The Dark Knight Returns, Part 2    0.234297
Batman Forever                             0.218458
Batman: Under the Red Hood                 0.211061
Batman: Mask of the Phantasm               0.184428
Batman: Year One                           0.183901
Batman: The Dark Knight Returns, Part 1    0.174043
Batman                                     0.152652
Batman Begins                              0.152145
JFK                                        0.120650
Batman v Superman: Dawn of Justice         0.117945
Criminal Law                               0.117328
Q & A                                      0.109159
To End All Wars                            0.101575
Batman & Robin                             0.099902
Law Abiding Citizen                        0.099254
The Wrong Man                              0.090724
The Ro

### Metadata Based Recommender

The previous recommender is based only on the movie descriptions. However, the actors and the director are also a large part of what defines a movie.
We will therefore now build a recommender that also takes these aspects into account.

First we create a function that builds a one-hot encoding from a column *col* containing lists.
To do so, we take the first *take_n* elements of each entry and create a one-hot encoding of those elements that occur at least *min_occurence* times.

In [26]:
# We shrink the dataset a little to speed up the computations
movies_small = movies[movies.vote_count > 1000]
movies_small.shape

(1019, 19)

In [27]:
# Let's look at the column with the actors
col = movies_small.cast
take_n = 3
min_occurence = 2
col

title
Inception              [Leonardo DiCaprio, Joseph Gordon-Levitt, Elle...
The Dark Knight        [Christian Bale, Michael Caine, Heath Ledger, ...
Avatar                 [Sam Worthington, Zoe Saldana, Sigourney Weave...
The Avengers (2012)    [Robert Downey Jr., Chris Evans, Mark Ruffalo,...
Deadpool               [Ryan Reynolds, Morena Baccarin, Ed Skrein, T....
                                             ...                        
Secret Window          [Johnny Depp, John Turturro, Maria Bello, Timo...
Ouija                  [Olivia Cooke, Ana Coto, Daren Kagasoff, Bianc...
One Day                [Anne Hathaway, Jim Sturgess, Patricia Clarkso...
Goldfinger             [Sean Connery, Honor Blackman, Gert Fröbe, Shi...
The Ugly Truth         [Katherine Heigl, Gerard Butler, Eric Winter, ...
Name: cast, Length: 1019, dtype: object

Hint: the string functions can also be applied to a pandas.Series that consists of lists\
Select the first n elements: [pandas.Series.str.slice](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.slice.html)\
Join the elements of the list: [pandas.Series.str.join](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.join.html)\
One-hot encoding of a string: [pandas.Series.str.get_dummies](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.get_dummies.html)
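
The three string functions from the hints can be chained directly on a Series of lists. A minimal sketch with made-up names (the data is an assumption for illustration):

```python
import pandas as pd

# Made-up cast lists, indexed by movie title
toy_cast = pd.Series(
    [['Ann', 'Bob', 'Cid', 'Dan'], ['Bob', 'Eve'], []],
    index=['Movie A', 'Movie B', 'Movie C'],
)

first_two = toy_cast.str.slice(0, 2)    # first two elements of every list
joined = first_two.str.join('|')        # one '|'-separated string per movie
toy_one_hot = joined.str.get_dummies()  # '|' is the default separator

print(toy_one_hot)
```

A movie with an empty list simply gets a row of zeros.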

In [32]:
# Select the first take_n elements
col_n = col.apply(lambda x: x[:take_n] if len(x) > take_n else x)
col_n

title
Inception              [Leonardo DiCaprio, Joseph Gordon-Levitt, Elle...
The Dark Knight            [Christian Bale, Michael Caine, Heath Ledger]
Avatar                  [Sam Worthington, Zoe Saldana, Sigourney Weaver]
The Avengers (2012)       [Robert Downey Jr., Chris Evans, Mark Ruffalo]
Deadpool                     [Ryan Reynolds, Morena Baccarin, Ed Skrein]
                                             ...                        
Secret Window                  [Johnny Depp, John Turturro, Maria Bello]
Ouija                           [Olivia Cooke, Ana Coto, Daren Kagasoff]
One Day                 [Anne Hathaway, Jim Sturgess, Patricia Clarkson]
Goldfinger                    [Sean Connery, Honor Blackman, Gert Fröbe]
The Ugly Truth             [Katherine Heigl, Gerard Butler, Eric Winter]
Name: cast, Length: 1019, dtype: object

In [33]:
# Join the elements of the lists, separated by '|' (the default separator of get_dummies)
col_n_str = col_n.apply(lambda x: '|'.join(x))
col_n_str

title
Inception              Leonardo DiCaprio|Joseph Gordon-Levitt|Ellen Page
The Dark Knight                Christian Bale|Michael Caine|Heath Ledger
Avatar                      Sam Worthington|Zoe Saldana|Sigourney Weaver
The Avengers (2012)           Robert Downey Jr.|Chris Evans|Mark Ruffalo
Deadpool                         Ryan Reynolds|Morena Baccarin|Ed Skrein
                                             ...                        
Secret Window                      Johnny Depp|John Turturro|Maria Bello
Ouija                               Olivia Cooke|Ana Coto|Daren Kagasoff
One Day                     Anne Hathaway|Jim Sturgess|Patricia Clarkson
Goldfinger                        Sean Connery|Honor Blackman|Gert Fröbe
The Ugly Truth                 Katherine Heigl|Gerard Butler|Eric Winter
Name: cast, Length: 1019, dtype: object

In [34]:
# One-hot encoding with get_dummies()
col_one_hot = col_n_str.str.get_dummies()
col_one_hot

title,A.J. Cook,Aaron Eckhart,Aaron Paul,Aaron Taylor-Johnson,Aasif Mandvi,Abbie Cornish,Abigail Breslin,Abigail Hargrove,Adam Baldwin,Adam Driver,...,Zach Galligan,Zach Gilford,Zachary Levi,Zachary Quinto,Zhang Ziyi,Zoe Saldana,Zooey Deschanel,Zoë Bell,Óscar Jaenada,Моррис Честнат
Inception,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
The Dark Knight,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Avatar,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
The Avengers (2012),0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Deadpool,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Secret Window,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Ouija,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
One Day,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Goldfinger,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Hint: [pandas.DataFrame.sum](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sum.html)

In [36]:
# Keep only the columns (elements) that occur at least min_occurence times
col_one_hot = col_one_hot.loc[:, col_one_hot.sum() >= min_occurence]
col_one_hot

title,Aaron Eckhart,Aaron Taylor-Johnson,Abbie Cornish,Adam Sandler,Adrien Brody,Al Pacino,Alan Rickman,Alan Tudyk,Albert Brooks,Alden Ehrenreich,...,William H. Macy,William Moseley,Winona Ryder,Woody Allen,Woody Harrelson,Zac Efron,Zach Galifianakis,Zachary Quinto,Zoe Saldana,Zooey Deschanel
Inception,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
The Dark Knight,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Avatar,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
The Avengers (2012),0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Deadpool,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Secret Window,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Ouija,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
One Day,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Goldfinger,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Now combine the individual steps from above in the following function, so that a one-hot encoding can be created for any column

In [37]:
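# One-hot encoder for a column of lists
def one_hot_encoder(col, take_n=3, min_occurence=2):
    # Keep the first take_n elements of each list.
    # Caution: a negative take_n such as -1 slices x[:-1] and thus drops the last element.
    col_n = col.apply(lambda x: x[:take_n] if len(x) > take_n else x)
    # Join the elements with '|', the default separator of get_dummies
    col_n_str = col_n.apply(lambda x: '|'.join(x))
    col_one_hot = col_n_str.str.get_dummies()
    # Keep only the columns that occur at least min_occurence times
    col_one_hot = col_one_hot.loc[:, col_one_hot.sum() >= min_occurence]
    return col_one_hot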
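
One detail of the slicing is easy to miss: with a negative `take_n` such as `-1`, the expression `x[:take_n]` removes the last element of each list instead of keeping all of them. If the intent is to keep every element, a variant could treat `take_n=None` as "no limit". The sketch below uses the hypothetical name `one_hot_encoder_all` and made-up data; it is one possible adjustment, not the exercise's reference solution.

```python
import pandas as pd

def one_hot_encoder_all(col, take_n=None, min_occurence=1):
    # Hypothetical variant: take_n=None keeps all elements of each list
    col_n = col.apply(lambda x: x if take_n is None else x[:take_n])
    col_n_str = col_n.apply(lambda x: '|'.join(x))
    col_one_hot = col_n_str.str.get_dummies()
    return col_one_hot.loc[:, col_one_hot.sum() >= min_occurence]

# Made-up genre lists
toy_genres = pd.Series(
    [['Drama', 'Action', 'Thriller'], ['Action', 'Comedy']],
    index=['Movie A', 'Movie B'],
)

all_columns = list(one_hot_encoder_all(toy_genres).columns)        # every genre is kept
sliced_columns = list(one_hot_encoder_all(toy_genres, -1).columns)  # last genre of each list is dropped

print(all_columns)
print(sliced_columns)
```

In the second call, 'Thriller' and 'Comedy' disappear because each of them is the last element of its list.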


Now use one_hot_encoder to encode several columns and combine them into one large features DataFrame\
Hint: [pandas.concat](https://pandas.pydata.org/docs/reference/api/pandas.concat.html)
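
A small sketch of the concatenation step with two made-up one-hot frames. `axis=1` places the frames side by side and aligns them on the index. Because the same name can occur in more than one metadata column (for instance a person who both acts and directs), the sketch also prefixes the column names with `add_prefix`; the prefixes are an assumption for illustration and are not used in the exercise.

```python
import pandas as pd

index = ['Movie A', 'Movie B']

# Made-up one-hot encodings for two metadata columns
toy_cast = pd.DataFrame({'Ann': [1, 0], 'Bob': [1, 1]}, index=index)
toy_director = pd.DataFrame({'Bob': [0, 1], 'Eve': [1, 0]}, index=index)

# Without prefixes the result would contain two columns named 'Bob'
toy_features = pd.concat(
    [toy_cast.add_prefix('cast: '), toy_director.add_prefix('director: ')],
    axis=1,
)

print(toy_features)
```

Unique column names keep a lookup such as `toy_features.loc['Movie A']` unambiguous, since every feature then appears exactly once.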

In [38]:
# Build the meta_features DataFrame
meta_features = pd.concat([
    one_hot_encoder(movies_small.cast),
    one_hot_encoder(movies_small.director),
    #one_hot_encoder(movies_small.producer),
    #one_hot_encoder(movies_small.writer),
    #one_hot_encoder(movies_small.music),
    one_hot_encoder(movies_small.genres, -1),
    #one_hot_encoder(movies_small.keywords, -1),
    one_hot_encoder(movies_small.production_companies),
    #one_hot_encoder(movies_small.production_countries),
    one_hot_encoder(movies_small.spoken_languages),
], axis=1)

meta_features = meta_features.set_index(movies_small.title)

In [39]:
meta_features

title,Aaron Eckhart,Aaron Taylor-Johnson,Abbie Cornish,Adam Sandler,Adrien Brody,Al Pacino,Alan Rickman,Alan Tudyk,Albert Brooks,Alden Ehrenreich,...,עִבְרִית,اردو,العربية,فارسی,हिन्दी,ภาษาไทย,广州话 / 廣州話,日本語,普通话,한국어/조선말
Inception,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
The Dark Knight,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
Avatar,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
The Avengers (2012),0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Deadpool,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Secret Window,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Ouija,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
One Day,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Goldfinger,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0


In [40]:
# Look at the features of a movie that are set to 1
meta_features.loc['The Dark Knight'].sort_values(ascending=False).head(30)

Heath Ledger          1
Christopher Nolan     1
Legendary Pictures    1
Warner Bros.          1
DC Comics             1
Michael Caine         1
Action                1
English               1
Christian Bale        1
Crime                 1
Drama                 1
普通话                   1
Timur Bekmambetov     0
Todd Phillips         0
Wes Anderson          0
Tim Story             0
Tim Johnson           0
Tom Hooper            0
Tim Burton            0
Tom McGrath           0
Tom Shadyac           0
Tom Tykwer            0
Tony Scott            0
Vicky Jenson          0
Aaron Eckhart         0
Wes Ball              0
Wes Craven            0
Wilfred Jackson       0
Will Gluck            0
Terry Jones           0
Name: The Dark Knight, dtype: int64

Now we can compute a new similarity matrix from our meta features, for example with the Jaccard similarity\
Hint: [sklearn.metrics.pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html)
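
For two binary feature vectors, the Jaccard similarity is the number of features set in both vectors divided by the number of features set in at least one of them. `pairwise_distances` with `metric='jaccard'` returns the Jaccard distance, so the similarity is one minus that value. A minimal sketch on made-up one-hot rows (the data is an assumption for illustration):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Made-up one-hot features for three movies, cast to bool for the Jaccard metric
toy_features = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=bool)

toy_jaccard = 1 - pairwise_distances(toy_features, metric='jaccard')

# By hand for the first two rows: 2 shared features, 3 features in the union
manual = (toy_features[0] & toy_features[1]).sum() / (toy_features[0] | toy_features[1]).sum()

print(toy_jaccard)
print(manual)
```

Unlike the cosine similarity on TF-IDF weights, this measure only looks at which features are present, which suits one-hot encoded metadata.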

In [None]:
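# Compute the Jaccard similarity matrix (1 - Jaccard distance) from the boolean meta features
jaccard_dist = pairwise_distances(meta_features.values.astype(bool), metric='jaccard')
meta_similarities_jaccard = pd.DataFrame(1 - jaccard_dist, index=meta_features.index, columns=meta_features.index)
meta_similarities_jaccard.shape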

Now, as before, we can generate recommendations again using the new similarity matrix

In [None]:
get_recommendations(meta_similarities_jaccard, 'The Dark Knight').head(10)