# <center>Movie Recommendation Systems</center>

So, let’s make a real world Movie Recommendation System ourselves.

### Requirements:
- Data/Dataset: https://www.kaggle.com/code/rounakbanik/movie-recommender-systems/input
- Programming language: Python v3.9 or higher.
- IDE / environment: Jupiter Notebook.
- Python Packages: Numpy, Pandas, scikit-learn.


Our Movie Recommendation System is going to include the following:
- Classic / simple recommender system.
- Content-based recommender system.
- Metadata-based recommender system.
- User-based recommender system (Hybrid Approaches).


# Data Preparation
First, we need to load & clean up the date before start using it.

In [21]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [22]:
movies_df = pd.read_csv("data/movies_metadata.csv", low_memory=False)

movies_df

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45461,False,,0,"[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'n...",http://www.imdb.com/title/tt6209470/,439050,tt6209470,fa,رگ خواب,Rising and falling between a man and woman.,...,,0.0,90.0,"[{'iso_639_1': 'fa', 'name': 'فارسی'}]",Released,Rising and falling between a man and woman,Subdue,False,4.0,1.0
45462,False,,0,"[{'id': 18, 'name': 'Drama'}]",,111109,tt2028550,tl,Siglo ng Pagluluwal,An artist struggles to finish his work while a...,...,2011-11-17,0.0,360.0,"[{'iso_639_1': 'tl', 'name': ''}]",Released,,Century of Birthing,False,9.0,3.0
45463,False,,0,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",,67758,tt0303758,en,Betrayal,"When one of her hits goes wrong, a professiona...",...,2003-08-01,0.0,90.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,A deadly game of wits.,Betrayal,False,3.8,6.0
45464,False,,0,[],,227506,tt0008536,en,Satana likuyushchiy,"In a small town live two brothers, one a minis...",...,1917-10-21,0.0,87.0,[],Released,,Satan Triumphant,False,0.0,0.0




Removing / excluding unnecessary columns.

In [23]:
movies_df.drop(columns=[
    "homepage",
    "poster_path",
    "release_date",
    "runtime",
    "revenue", 
    "video", 
    "status", 
    "spoken_languages", 
    "belongs_to_collection", 
    "production_countries", 
    "original_language", 
    "original_title", 
    #"production_companies",
    "imdb_id",
    "popularity",
    "budget",
    "adult"
], inplace=True)

In [24]:
movies_df.dropna(subset=["title", "vote_average", "vote_count"], inplace=True)
movies_df["tagline"].replace(np.nan, "", inplace=True)
movies_df["overview"].replace(np.nan, "", inplace=True)


Formating `genres` and `production_companies` as a python list

In [25]:
movies_df["genres"] = movies_df["genres"].apply(eval).apply(lambda l: [f"{d['name']}" for d in l] if isinstance(l, list) else [])
movies_df["production_companies"] = movies_df["production_companies"].apply(eval).apply(lambda l: [f"{d['name']}" for d in l] if isinstance(l, list) else [])

Casting `id` and `cote_count` columns into integers values

In [26]:
#movies_df["adult"] = movies_df["adult"].apply(lambda x: True if x == "True" else False).astype(bool)
movies_df["id"] = movies_df["id"].astype(int)
movies_df["vote_count"] = movies_df["vote_count"].astype(int)

In [27]:
movies_df.isnull().sum()

genres                  0
id                      0
overview                0
production_companies    0
tagline                 0
title                   0
vote_average            0
vote_count              0
dtype: int64

In [28]:
movies_df.dtypes

genres                   object
id                        int64
overview                 object
production_companies     object
tagline                  object
title                    object
vote_average            float64
vote_count                int64
dtype: object

#### Now `movies_df` is a clean dataframe. 

In [29]:
movies_df.head()

Unnamed: 0,genres,id,overview,production_companies,tagline,title,vote_average,vote_count
0,"[Animation, Comedy, Family]",862,"Led by Woody, Andy's toys live happily in his ...",[Pixar Animation Studios],,Toy Story,7.7,5415
1,"[Adventure, Fantasy, Family]",8844,When siblings Judy and Peter discover an encha...,"[TriStar Pictures, Teitler Film, Interscope Co...",Roll the dice and unleash the excitement!,Jumanji,6.9,2413
2,"[Romance, Comedy]",15602,A family wedding reignites the ancient feud be...,"[Warner Bros., Lancaster Gate]",Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,6.5,92
3,"[Comedy, Drama, Romance]",31357,"Cheated on, mistreated and stepped on, the wom...",[Twentieth Century Fox Film Corporation],Friends are the people who let you be yourself...,Waiting to Exhale,6.1,34
4,[Comedy],11862,Just when George Banks has recovered from his ...,"[Sandollar Productions, Touchstone Pictures]",Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,5.7,173


<hr>
<br>
<br>

# Classic / Simple recommender System

Now, let create a classic / simple recommender system, <br>to do so, we need to give / create a score for each movie to sort by, as well as adding the ability to filter by `genre`.


In [30]:
# Simple Recommender System

# Calculating the score
def score(vote_average, vote_count):
    return np.log((vote_average * vote_count) + 1)

def simple_recommender(df, n_results=10, genre=None):
    # make a copy of the data first
    df = df.copy()
    # Filter by genres.
    if isinstance(genre, (list, tuple)):
        df = df[df['genres'].apply(lambda x: set(genre) <= set(eval(str(x))))] # is a subset
    elif genre is not None:
        df = df[df['genres'].apply(lambda x: genre in str(x))]
    
    # Calgulating the score
    df["score"] = score(df['vote_average'], df['vote_count'])
    df.sort_values(by='score', ascending=False, ignore_index=True, inplace=True)
    
    return df.head(n_results)

In [31]:
simple_recommender(movies_df, n_results=10, genre=["Action", "Drama"])

Unnamed: 0,genres,id,overview,production_companies,tagline,title,vote_average,vote_count,score
0,"[Drama, Action, Crime, Thriller]",155,Batman raises the stakes in his war on crime. ...,"[DC Comics, Legendary Pictures, Warner Bros., ...",Why So Serious?,The Dark Knight,8.3,12269,11.531096
1,"[Action, Crime, Drama, Thriller]",49026,Following the death of District Attorney Harve...,"[Legendary Pictures, Warner Bros., DC Entertai...",The Legend Ends,The Dark Knight Rises,7.6,9263,11.161946
2,"[Action, Crime, Drama]",272,"Driven by tragedy, billionaire Bruce Wayne ded...","[DC Comics, Legendary Pictures, Warner Bros., ...",Evil fears the knight.,Batman Begins,7.5,7511,10.939045
3,"[Drama, Action, Thriller, War]",16869,"In Nazi-occupied France during World War II, a...","[Universal Pictures, A Band Apart, The Weinste...",Once upon a time in Nazi occupied France...,Inglourious Basterds,7.9,6598,10.861404
4,"[Action, Drama, Science Fiction]",263115,"In the near future, a weary Logan cares for an...","[Twentieth Century Fox Film Corporation, Donne...",His time has come,Logan,7.6,6310,10.77806
5,"[Action, Drama, Adventure]",98,"In the year 180, the death of emperor Marcus A...","[DreamWorks SKG, Universal Pictures, Scott Fre...",A Hero Will Rise.,Gladiator,7.9,5566,10.691317
6,"[Adventure, Drama, Action]",87827,"The story of an Indian boy named Pi, a zookeep...","[Ingenious Film Partners, Ingenious Media, Dun...",Believe The Unbelievable,Life of Pi,7.2,5912,10.658844
7,"[Action, Drama, Horror, Science Fiction, Thril...",72190,Life for former United Nations investigator Ge...,"[Paramount Pictures, GK Films, Skydance Produc...",Remember Philly!,World War Z,6.7,5683,10.547368
8,"[Drama, Horror, Action, Thriller, Science Fict...",6479,Robert Neville is a scientist who was unable t...,"[Village Roadshow Pictures, Original Film, Wee...",The last man on Earth is not alone,I Am Legend,6.9,4977,10.444133
9,"[Science Fiction, Action, Drama, Thriller]",119450,A group of scientists in San Francisco struggl...,"[Ingenious Media, Chernin Entertainment, TSG E...",One last chance for peace.,Dawn of the Planet of the Apes,7.3,4511,10.402179


As you can see in the table above, we've added the `score` column, and the results are sorted in desending order,<br>
and that brought us the Best Movies of all time for `action` and `drama` genres.

## Simple Recommender System usage / benefits:
- This type of Recommendation System (classic one) comes in handy / useful for showing `Trending`, or `Trending per genre`.
- Useful for making suggestions for new users (Those who have no history or preferences yet).


<hr>
<br>
<br>

# Content-Based Recommender System

In this type of recommendation system we're going to focus on movies content (e.g. `title`, `tagline`, `production_companies`, `overview`)<br>
This type of recommendation is a little bit computation hangry, so to achieve that, we do the following:<br>


- To address issue of the required computation power, not all the movies are going to be in the recommendation system,<br>
so, let's set a `vote` and `vote_count` threshold as a movie minimum requirements in order to enter the recommandation system.
- Once we got our movies dataframe `reduced_df` that is going to be used for recommendation system, <br>
We can use `TfidfVectorizer` from sklearn to convert text into vectors, this process called vectorazation / embedding.
- We got a `matrix` that represents `reduced_df` after the vectorazation process.<br>
by the way, this matrix could be caled a vector database or a vectorstore.
- moreover, we calculate a similarity score for each movie after vectorizing the `query` input.
- in the end, we append the `cos_similarity` column to the result dataframe, then sorting the last one on the descending order.

In [141]:
# Functions will be needed later

# returns an interger value (-1 to 1).
def cos_similarity(matrix, vec):
    return (matrix @ vec.T) / np.sqrt(np.sum(matrix ** 2, axis=1) * np.sum(vec.T ** 2, axis=0))


In [34]:
# Content Based Recommender System.
from sklearn.feature_extraction.text import TfidfVectorizer

# data needs to be reduced.
threshold = movies_df['vote_count'].quantile(0.95)
vote_mean = movies_df['vote_average'].mean() / 2

print("Vote_count threshold:", threshold)
print("Vote mean:", vote_mean)
reduced_df = movies_df[(movies_df['vote_count'] >= threshold) & (movies_df['vote_average'] >= vote_mean)] 

reduced_df = reduced_df.reset_index(drop=True)

print("Dataframe size: %s rows" % reduced_df.shape[0])


# Making a Vector database.
tf = TfidfVectorizer(analyzer='word', stop_words='english')

descriptions = reduced_df.apply(lambda s: f"{s['title']} {s['tagline']} {s['production_companies']} {s['overview']}", axis=1)
matrix = tf.fit_transform(descriptions).toarray()


def content_based_recommender(query, df, n_results=10):
    df = df.copy()

    # Calculate the similarity (cosine)
    vec = tf.transform([query]).toarray().ravel()
    
    # Append & sort to dataframe
    df['cos_similarity'] = cos_similarity(matrix, vec)
    df.sort_values(by='cos_similarity', ascending=False, ignore_index=True, inplace=True)

    return df.head(n_results)



Vote_count threshold: 434.0
Vote mean: 2.8091036075669447
Dataframe size: 2274 rows


In [35]:
content_based_recommender("Batman", reduced_df, n_results=10)

Unnamed: 0,genres,id,overview,production_companies,tagline,title,vote_average,vote_count,cos_similarity
0,"[Action, Fantasy]",364,"Having defeated the Joker, Batman now faces th...","[PolyGram Filmed Entertainment, Warner Bros.]","The Bat, the Cat, the Penguin.",Batman Returns,6.6,1706,0.536871
1,"[Action, Animation, Comedy, Family, Fantasy]",324849,In the irreverent spirit of fun that made “The...,"[Lin Pictures, Warner Bros. Animation, Warner ...",Always be yourself. Unless you can be Batman.,The Lego Batman Movie,7.2,1473,0.433553
2,"[Action, Animation]",40662,Batman faces his ultimate challenge as the mys...,"[DC Comics, Warner Bros. Animation]",Dare to Look Beneath the Hood.,Batman: Under the Red Hood,7.6,459,0.40184
3,"[Action, Crime, Fantasy]",414,The Dark Knight of Gotham City confronts a das...,"[Warner Bros., Polygram Filmed Entertainment]","Courage now, truth always...",Batman Forever,5.2,1529,0.369199
4,"[Action, Animation, Crime, Drama]",382322,"As Batman hunts for the escaped Joker, the Clo...","[DC Comics, Warner Bros. Animation]",The madness begins.,Batman: The Killing Joke,6.2,485,0.339751
5,"[Action, Crime, Drama]",272,"Driven by tragedy, billionaire Bruce Wayne ded...","[DC Comics, Legendary Pictures, Warner Bros., ...",Evil fears the knight.,Batman Begins,7.5,7511,0.325422
6,"[Drama, Action, Crime, Thriller]",155,Batman raises the stakes in his war on crime. ...,"[DC Comics, Legendary Pictures, Warner Bros., ...",Why So Serious?,The Dark Knight,8.3,12269,0.292171
7,"[Action, Crime, Fantasy]",415,Along with crime-fighting partner Robin and ne...,"[PolyGram Filmed Entertainment, Warner Bros.]",Strength. Courage. Honor. And loyalty.,Batman & Robin,4.2,1447,0.281644
8,"[Action, Adventure, Fantasy]",209112,Fearing the actions of a god-like Super Hero l...,"[DC Comics, Atlas Entertainment, Warner Bros.,...",Justice or revenge,Batman v Superman: Dawn of Justice,5.7,7189,0.270846
9,"[Action, Crime, Drama, Thriller]",49026,Following the death of District Attorney Harve...,"[Legendary Pictures, Warner Bros., DC Entertai...",The Legend Ends,The Dark Knight Rises,7.6,9263,0.261025


As you can see in the table above, using `Batman` as a query input leads in a best relevant results like:
- Batman Returns
- The Lego Batman Movie
- Batman: Under the Red Hood
- Batman Forever
- Batman: The Killing Joke
- The Dark Knight
- The Dark Knight Rises
So far everything is working as intended.

## Content-Based Recommender System Usage / Benefits:
This type of recommendation system is powerful and has various applications, for example:
- Could be used as an advanced web search engine.
- `Giving relevent (ads/posts/videos/products) recommendations basing on sound/microphone inputs as a query input (the same technique has been adopted by Facebook, Google & Amazon)`


<br>
<hr>
<br>
<br>

# Metadata-Based Recommender System

We perform the same approach as the content-based recommendation but for movies metadata.




In [35]:
# Metadata recommender System

credits_df = pd.read_csv("data/credits.csv")

credits_df

Unnamed: 0,cast,crew,id
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602
3,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862
...,...,...,...
45471,"[{'cast_id': 0, 'character': '', 'credit_id': ...","[{'credit_id': '5894a97d925141426c00818c', 'de...",439050
45472,"[{'cast_id': 1002, 'character': 'Sister Angela...","[{'credit_id': '52fe4af1c3a36847f81e9b15', 'de...",111109
45473,"[{'cast_id': 6, 'character': 'Emily Shaw', 'cr...","[{'credit_id': '52fe4776c3a368484e0c8387', 'de...",67758
45474,"[{'cast_id': 2, 'character': '', 'credit_id': ...","[{'credit_id': '533bccebc3a36844cf0011a7', 'de...",227506


In [41]:
# Useful functions

# Get movie first n casts from metadata dataframe
def get_movie_cast(s, limit=3):
    l = eval(str(s))
    return [f"{cast['name']} - {cast['character']}" for cast in l[:limit]]

# Get movie director from metadata dataframe
def get_movie_director(s):
    l = eval(str(s))
    return l[0]['name'] if l else ""

# Get a dataframe subset movies by movie ids
def get_movies_by_ids(df, ids, key='id'):
    return df[df[key].isin(ids)]

#### Cleaning up the credits/metadata dataframe.

In [37]:
casts = credits_df['cast'].apply(get_movie_cast)
directors = credits_df['crew'].apply(get_movie_director)

metadata_df = pd.DataFrame({"casts": casts, "director": directors, 'movie_id': credits_df['id'], 'genres': movies_df['genres']})

metadata_df.dropna(inplace=True)
metadata_df = metadata_df[metadata_df['movie_id'].isin(reduced_df['id'])]

metadata_df

Unnamed: 0,casts,director,movie_id,genres
0,"[Tom Hanks - Woody (voice), Tim Allen - Buzz L...",John Lasseter,862,"[Animation, Comedy, Family]"
1,"[Robin Williams - Alan Parrish, Jonathan Hyde ...",Larry J. Franco,8844,"[Adventure, Fantasy, Family]"
5,"[Al Pacino - Lt. Vincent Hanna, Robert De Niro...",Michael Mann,949,"[Action, Crime, Drama, Thriller]"
9,"[Pierce Brosnan - James Bond, Sean Bean - Alec...",Martin Campbell,710,"[Adventure, Action, Thriller]"
15,"[Robert De Niro - Sam 'Ace' Rothstein, Sharon ...",Martin Scorsese,524,"[Drama, Crime]"
...,...,...,...,...
44634,[Noomi Rapace - Monday / Tuesday / Wednesday /...,Tommy Wirkola,406990,[Animation]
44642,"[Charlize Theron - Lorraine Broughton, James M...",David Leitch,341013,[Drama]
44688,"[Fionn Whitehead - Tommy, Tom Glynn-Carney - P...",Hans Zimmer,374720,[]
44852,"[Mark Wahlberg - Cade Yeager, Josh Duhamel - C...",Akiva Goldsman,335988,"[Drama, Thriller]"


In [46]:
# Create metadata content

# This function creates a text/string for each movie, this string is going to be vectorized.
def format_metadata(row):
    casts = ", ".join(eval(str(row['casts'])))
    genres = ", ".join(eval(str(row['genres'])))
    return f"{row['director']}, {casts}\n{genres}"

metadata_content = metadata_df.apply(format_metadata, axis=1)

metadata_content.head(10)

0     John Lasseter, Tom Hanks - Woody (voice), Tim ...
1     Larry J. Franco, Robin Williams - Alan Parrish...
5     Michael Mann, Al Pacino - Lt. Vincent Hanna, R...
9     Martin Campbell, Pierce Brosnan - James Bond, ...
15    Martin Scorsese, Robert De Niro - Sam 'Ace' Ro...
17    Combustible Edison, Tim Roth - Ted the Bellhop...
18    Steve Oedekerk, Jim Carrey - Ace Ventura, Ian ...
31    Terry Gilliam, Bruce Willis - James Cole, Made...
33    Andrew Lesnie, Christine Cavanaugh - Babe the ...
38    Amy Heckerling, Alicia Silverstone - Cher Horo...
dtype: object

In [47]:
from sklearn.feature_extraction.text import TfidfVectorizer

tf2 = TfidfVectorizer(analyzer='word', stop_words='english')

matrix2 = tf2.fit_transform(metadata_content).toarray()


def metadata_recommender(query, df, n_results=10):
    df = df.copy()

    # Calculate the similarity (cosine)
    vec = tf2.transform([query]).toarray().ravel()
    
    # Append & sort to dataframe
    df['metadata_cos_similarity'] = cos_similarity(matrix2, vec)
    df.sort_values(by='metadata_cos_similarity', ascending=False, ignore_index=True, inplace=True)

    df = df.head(n_results)
    # Get movies titles
    df2 = get_movies_by_ids(reduced_df, df['movie_id'])[['title', 'vote_average', 'vote_count']]
    
    
    #df = df.assign(title=titles.tolist())
    df = df.assign(title=df2['title'].tolist(), vote_average=df2['vote_average'].tolist(), vote_count=df2['vote_count'].tolist())
    
    # reorder columns
    df = df.reindex(columns=['movie_id', 'title', 'director', 'genres', 'vote_average', 'vote_count', 'metadata_cos_similarity'])
    
    return df


In [49]:
metadata_recommender('Arnold Schwarzenegger', metadata_df, n_results=15)

Unnamed: 0,movie_id,title,director,genres,vote_average,vote_count,metadata_cos_similarity
0,36955,True Lies,Mark Goldblatt,"[Action, Thriller]",6.8,1138,0.44173
1,280,Last Action Hero,Dody Dorn,"[Action, Thriller, Science Fiction]",6.1,725,0.356553
2,9593,Terminator 2: Judgment Day,John McTiernan,"[Adventure, Fantasy, Action, Comedy, Family]",7.7,4274,0.348099
3,218,The Terminator,Gale Anne Hurd,"[Action, Thriller, Science Fiction]",7.4,4208,0.347082
4,296,Jingle All the Way,Jonathan Mostow,"[Action, Thriller, Science Fiction]",5.5,583,0.345141
5,8452,Conan the Barbarian,Trevor Rabin,[Science Fiction],6.6,663,0.337804
6,861,Total Recall,Paul Verhoeven,"[Action, Adventure, Science Fiction]",7.1,1745,0.335016
7,107846,End of Days,David Lazan,[Horror],5.5,488,0.328579
8,9279,The Running Man,David Newman,"[Family, Comedy]",6.4,713,0.320551
9,87101,The 6th Day,Ronna Kress,"[Comedy, Drama]",5.7,603,0.31881


<br>

As you can see in the table above, we performed a search about `Arnold Schwarzenegger` and we got a very relevant movie results, all the movies mentioned in the results (table above) are `Arnold Schwarzenegger` movies, and that is great an proofs that the Metadata-Based recommendation system is working as intended.

<br>
<br>
<hr>
<br>

# User-Based Recommender System

This type of recommendation system is based on user's preferences and history to delever much relevant suggesstions / recommendations.

To achieve this, for each user we need to perform the followings:
- Get user's movie preferences.
- Exclude unliked user movies.
- Vectorizing user movies (preferences).
- Getting recommendations for that specific user based on his interest.

In [51]:
# Loading users data.

ratings_df = pd.read_csv("data/ratings_small.csv")

ratings_df

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205
...,...,...,...,...
99999,671,6268,2.5,1065579370
100000,671,6269,4.0,1065149201
100001,671,6365,4.0,1070940363
100002,671,6385,2.5,1070979663


In [56]:

# Get user's movie preferences by user ID. 
def get_user_ratings(id):
    return ratings_df[ratings_df['userId'] == id].sort_values('rating', ascending=False)


# Vectorizing users movies preferences, return a matrix.
def get_vectorized_user_movies(id, min_rating=5):
    user_df = get_user_ratings(id)
    user_df = user_df[user_df['rating'] >= min_rating]
    user_movies_df = get_movies_by_ids(reduced_df, user_df['movieId'])
    
    user_descriptions_df = user_movies_df.apply(lambda s: f"{s['title']} {s['tagline']} {s['production_companies']} {s['overview']}", axis=1)

    return tf.transform(user_descriptions_df).toarray()

In [111]:
# Getting movies that the user (id=28) has watched. 
user_id = 28

user_df = get_user_ratings(user_id)
user_df['rating'] = user_df['rating'] * 2
user_movies_df = get_movies_by_ids(reduced_df, user_df['movieId'])

user_movies_df.head(15)

Unnamed: 0,genres,id,overview,production_companies,tagline,title,vote_average,vote_count
97,"[Animation, Comedy, Drama, Family, Fantasy]",2300,In a desperate attempt to win a basketball mat...,"[Warner Bros. Family Entertainment, Northern L...",Get ready to jam.,Space Jam,6.5,1335
299,[Drama],1366,"When world heavyweight boxing champion, Apollo...",[United Artists],His whole life was a million-to-one shot.,Rocky,7.5,1843
351,"[Action, Comedy, Crime, Thriller]",2109,When Hong Kong Inspector Lee is summoned to Lo...,[New Line Cinema],The Fastest Hands in the East Meet the Biggest...,Rush Hour,6.8,1254
417,"[Mystery, Drama]",345,"After Dr. Bill Hartford's wife, Alice, admits ...","[Hobby Films, Pole Star, Stanley Kubrick Produ...",Cruise. Kidman. Kubrick.,Eyes Wide Shut,7.1,1266
466,"[Comedy, Family, Science Fiction]",926,The stars of a 1970s sci-fi show - now scrapin...,"[DreamWorks SKG, Gran Via Productions]",A comedy of Galactic Proportions.,Galaxy Quest,6.9,722
786,"[Fantasy, Horror, Action]",924,A group of surviving people take refuge in a s...,"[New Amsterdam Entertainment, Strike Entertain...","When the undead rise, civilization will fall.",Dawn of the Dead,6.8,1039
898,"[Family, Animation]",953,Zoo animals leave the comforts of man-made hab...,"[DreamWorks SKG, Pacific Data Images (PDI), Dr...",Someone's got a zoo loose.,Madagascar,6.6,3322
980,"[Animation, Adventure, Comedy, Family]",920,"Lightning McQueen, a hotshot rookie race car d...","[Walt Disney Pictures, Pixar Animation Studios]",Ahhh... it's got that new movie smell.,Cars,6.6,3991
984,"[Adventure, Fantasy, Action]",58,Captain Jack Sparrow works his way out of a bl...,"[Walt Disney Pictures, Jerry Bruckheimer Films...",Jack is back!,Pirates of the Caribbean: Dead Man's Chest,7.0,5380
1035,"[Thriller, Action, Fantasy, Horror]",1250,"In order to save his dying father, young stunt...","[Columbia Pictures Corporation, Relativity Med...",Hell Is About To Be Unleashed,Ghost Rider,5.2,1754


As you may noticed in the table above, the user with id=28 has watched the following movies (his favorate movies):
- Rocky
- Rush Hour
- Cars
- Ghost Rider
- Pirates of the Caribbean: Dead Man's Chest

#### <b>Note:</b> remember that this user has watched those movies (will need them later for comparison).

# OK, The Main Question is, How are We going to get Recommendations for that specific User?

First, I would like to explain the concept, then see the coding.<br>
- let's suppose that each movie could be represented with just `x` and `y` in a 2D space.
- in the `figure-1`, after ploting user's favorite movies, we can draw a rectangle around those movies (using just minimum/maximum of `x` & `y`) 

<img src="imgs/figure_1.png" >

- This rectangle is representing the user's movie interest, in other words, movies that fall inside this rectengle are a great match to the user's interest.
- <b>So, any movie inside the rectengle could be used as a recommendation.</b>

<br>
<br>
Now, let vectorize his favorite movies (rating >= 5).

In [112]:
vectorized_user_df = get_vectorized_user_movies(user_id) # default: min_rating=5

vectorized_user_df.shape

(8, 16266)

In [113]:
matrix.shape

(2274, 16266)

#### To organize this, we could create a user class for it.

In [114]:

class User:
    def __init__(self, user_id, db_matrix, min_rating=5):
        self._db_matrix = db_matrix
        self._min_rating = min_rating
        
        self._user_df = get_user_ratings(user_id)
        # chanching the rating to be (0 to 10).
        self._user_df['rating'] = self._user_df['rating'] * 2
        self._rated_movies_df = get_movies_by_ids(reduced_df, self._user_df['movieId'])
        
    def rated_movies(self):
        return self._rated_movies_df
        
    def vectorized_rated_movies(self):
        # Getting just movies that are rated higher than min_rating. 
        user_df = self._user_df[self._user_df['rating'] >= self._min_rating]
        user_movies_df = get_movies_by_ids(reduced_df, user_df['movieId'])
        
        # Generate a user description to be vectorized. 
        user_descriptions_df = user_movies_df.apply(lambda s: f"{s['title']} {s['tagline']} {s['production_companies']} {s['overview']}", axis=1)
        
        # Vectorizing the user description.
        return tf.transform(user_descriptions_df).toarray()
    
    def recommendations(self, vec=None):
        '''
        Return movies recommendations based on user interest.
        '''
        if vec is None:
            vec = np.mean(self.vectorized_rated_movies(), axis=0)

        cos_sim = cos_similarity(self._db_matrix, vec)
        user_recommended_df = reduced_df.copy()
        user_recommended_df['user_cos_similarity'] = cos_sim
        
        # Exclude already watched movies from recommendations
        user_recommended_df = user_recommended_df[~user_recommended_df['id'].isin(self._user_df['movieId'])]
        
        # Sorting values based on cos_similarity.
        user_recommended_df.sort_values('user_cos_similarity', ascending=False, inplace=True)

        return user_recommended_df
        
    def variant_interest_vector(self):
        '''
        Return a random vector between top movie & good movie vectors, in other words, the same user interest.
        '''
        vec = self.vectorized_rated_movies()
        min_vec = vec.min(axis=0)
        max_vec = vec.max(axis=0)
        
        #user_space = max_vec - min_vec
        #print("user_space min:", user_space.min())
        #print("user_space max:", user_space.max())
        
        # Create a random vector with the same user interest.
        return np.array([np.random.uniform(min_v, max_v) for min_v, max_v in np.c_[min_vec, max_vec]]) 
    
    def variant_recommendations(self):
        '''
        Return movies recommendations randomly based on user interest.
        '''
        vec = self.variant_interest_vector()
        return self.recommendations(vec=vec)

In [115]:
user = User(28, matrix)

user.recommendations().head(10)

Unnamed: 0,genres,id,overview,production_companies,tagline,title,vote_average,vote_count,user_cos_similarity
1058,"[Adventure, Fantasy, Action]",285,"Captain Barbossa, long believed to be dead, ha...","[Walt Disney Pictures, Jerry Bruckheimer Films...","At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4627,0.152797
1482,"[Animation, Family, Adventure, Comedy]",49013,Star race car Lightning McQueen and his pal Ma...,"[Walt Disney Pictures, Pixar Animation Studios]",Ka-ciao!,Cars 2,5.8,2088,0.15177
2104,[Drama],312221,The former World Heavyweight Champion Rocky Ba...,"[New Line Cinema, Warner Bros., Metro-Goldwyn-...",Your legacy is more than a name,Creed,7.3,1963,0.142251
1466,"[Adventure, Action, Fantasy]",1865,Captain Jack Sparrow crosses paths with a woma...,"[Walt Disney Pictures, Jerry Bruckheimer Films...",Live Forever Or Die Trying.,Pirates of the Caribbean: On Stranger Tides,6.4,5068,0.13303
2263,"[Family, Comedy, Animation, Adventure]",260514,Blindsided by a new generation of blazing-fast...,"[Walt Disney Pictures, Pixar Animation Studios]","From this moment, everything will change",Cars 3,6.6,718,0.127239
1562,"[Action, Fantasy, Thriller]",71676,When the devil resurfaces with aims to take ov...,"[Columbia Pictures, Imagenation Abu Dhabi FZ, ...",He Rides Again.,Ghost Rider: Spirit of Vengeance,4.7,1163,0.126549
1076,"[Action, Comedy, Crime, Thriller]",5174,After an attempted assassination on Ambassador...,[New Line Cinema],The Rush Is On!,Rush Hour 3,6.1,801,0.125648
605,"[Action, Comedy, Crime, Thriller]",5175,It's vacation time for Carter as he finds hims...,[New Line Cinema],Get ready for a second Rush!,Rush Hour 2,6.4,1078,0.116921
792,[Horror],923,During an ever-growing epidemic of zombies tha...,"[Laurel Group, Dawn Associates]","When there's no more room in hell, the dead wi...",Dawn of the Dead,7.4,597,0.116142
377,[Drama],1375,A lifetime of taking shots has ended Rocky's c...,[United Artists],Go for it!,Rocky V,5.3,688,0.113532


<br>

As you may noticed in the table above, the user with id=28 who has watched the following movies:
- Rocky
- Rush Hour
- Cars
- Ghost Rider
- Pirates of the Caribbean: Dead Man's Chest

The User-Based Recommender System has suggested to him the following movies:
- Cars 2
- Rocky V
- Cars 3
- Rush Hour 2
- Rush Hour 3
- Pirates of the Caribbean: At World's End
- Pirates of the Caribbean: On Stranger Tides
- Ghost Rider: Spirit of Vengeance

### So, the user has got suggestions based on his interest, this is really great 😃.

In [116]:
# Randomized / Varianted recommendations
user.variant_recommendations().head(10)

Unnamed: 0,genres,id,overview,production_companies,tagline,title,vote_average,vote_count,user_cos_similarity
1466,"[Adventure, Action, Fantasy]",1865,Captain Jack Sparrow crosses paths with a woma...,"[Walt Disney Pictures, Jerry Bruckheimer Films...",Live Forever Or Die Trying.,Pirates of the Caribbean: On Stranger Tides,6.4,5068,0.134078
1482,"[Animation, Family, Adventure, Comedy]",49013,Star race car Lightning McQueen and his pal Ma...,"[Walt Disney Pictures, Pixar Animation Studios]",Ka-ciao!,Cars 2,5.8,2088,0.1304
377,[Drama],1375,A lifetime of taking shots has ended Rocky's c...,[United Artists],Go for it!,Rocky V,5.3,688,0.123848
727,"[Adventure, Fantasy, Action]",22,"Jack Sparrow, a freewheeling 17th-century pira...","[Walt Disney Pictures, Jerry Bruckheimer Films]",Prepare to be blown out of the water.,Pirates of the Caribbean: The Curse of the Bla...,7.5,7191,0.121699
1058,"[Adventure, Fantasy, Action]",285,"Captain Barbossa, long believed to be dead, ha...","[Walt Disney Pictures, Jerry Bruckheimer Films...","At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4627,0.112754
376,[Drama],1374,Rocky must come out of retirement to battle a ...,[United Artists],He's facing the ultimate challenge. And fighti...,Rocky IV,6.6,984,0.110781
1562,"[Action, Fantasy, Thriller]",71676,When the devil resurfaces with aims to take ov...,"[Columbia Pictures, Imagenation Abu Dhabi FZ, ...",He Rides Again.,Ghost Rider: Spirit of Vengeance,4.7,1163,0.109249
1025,[Drama],1246,When he loses a highly publicized virtual boxi...,"[Columbia Pictures, Revolution Studios, Rogue ...",It ain't over 'til it's over.,Rocky Balboa,6.5,858,0.106766
792,[Horror],923,During an ever-growing epidemic of zombies tha...,"[Laurel Group, Dawn Associates]","When there's no more room in hell, the dead wi...",Dawn of the Dead,7.4,597,0.098265
374,[Drama],1367,After Rocky goes the distance with champ Apoll...,[United Artists],Once he fought for a dream. Now he's fighting ...,Rocky II,6.9,948,0.098217


### The same great User-Based suggestions, but random 🥳

<br>
<hr>
<br>

## ⚠️ So far So good, but our User-Based Recommender System could have ONE downside 😱.

Let me explain, for example, we have a user that is watching/likes `Action` movies as well as `Romance` movies.<br>
    
in the `figure-2`, at top-right, we have user's `Action` movies, while at the buttom-left we have user's `Romance` movies.<br>

<img src="imgs/figure_2.png" >


<b>The issue is when we draw the rectangle of the user's interest, the rectangle is much bigger and contains more movies that the user may not like (out of his interest).</b>

## So, How are We going to Solve this issue?

### The perfect approach is to use more accurate movie spaces, to provide more relevant suggestions (see `figure-3`):

<img src="imgs/figure_3.png" >

### So, Any movies that fall in one of each these circles are great suggestions for that specific user.

<br>

To achieve this we can use `NearestNeighbors` algorithm from sklearn.

In [131]:
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import scale


#user = User(28, matrix, min_rating=5)

X = matrix
X_test = user.vectorized_rated_movies()

n, p = X.shape

nn = NearestNeighbors(n_neighbors=3)
nbrs = nn.fit(X)
distances, indices = nbrs.radius_neighbors(X_test, radius=2, return_distance=True, sort_results=True) 


print("X_test: ", X_test.shape)
print("distance:", distances.shape)
print("indices:", indices.shape)

# Filter the important movies indices.
distances = np.array([d[1:5] for d in distances]) # starts from 1 to remove the same movie index.
indices = np.array([d[1:5] for d in indices])     # starts from 1 to remove the same movie index.

X_test:  (10, 16266)
distance: (10,)
indices: (10,)


In [132]:
#display(distances)
display(indices)

array([[ 987, 1366,   22, 1799],
       [ 377,  376, 2104,  374],
       [1076,  605, 1566,  445],
       [1493,  178,  126,  589],
       [1479, 1445,  583,  422],
       [ 792,  128,  624, 2115],
       [1482, 2263, 2038, 1743],
       [1058,  727, 1466, 1959],
       [1562, 1029, 1523,  678],
       [  12, 1646,  861, 1989]])

In [140]:
import random

# number of movies to choose from, lower is more accurate.
suggestion_range = 30
n_results = 15

l = np.unique(indices.T.ravel())
l = l[:suggestion_range]  # select just most relevant results.
movies_indices = np.array(random.sample(list(l), n_results))
reduced_df.loc[movies_indices]


Unnamed: 0,genres,id,overview,production_companies,tagline,title,vote_average,vote_count
1482,"[Animation, Family, Adventure, Comedy]",49013,Star race car Lightning McQueen and his pal Ma...,"[Walt Disney Pictures, Pixar Animation Studios]",Ka-ciao!,Cars 2,5.8,2088
376,[Drama],1374,Rocky must come out of retirement to battle a ...,[United Artists],He's facing the ultimate challenge. And fighti...,Rocky IV,6.6,984
1466,"[Adventure, Action, Fantasy]",1865,Captain Jack Sparrow crosses paths with a woma...,"[Walt Disney Pictures, Jerry Bruckheimer Films...",Live Forever Or Die Trying.,Pirates of the Caribbean: On Stranger Tides,6.4,5068
422,"[Action, Drama, History]",967,Spartacus is a 1960 American historical drama ...,"[Bryna Productions, Universal International Pi...",More titanic than any story ever told!,Spartacus,7.3,472
987,"[Comedy, Drama, Family, Music, TV Movie]",10947,"Troy (Zac Efron), the popular captain of the b...",[Disney Channel],This School Rocks Like No Other!,High School Musical,6.1,1048
128,[Horror],10331,A group of people try to survive an attack of ...,"[Laurel Group, Off Color Films, Image Ten, Mar...","If it doesn't scare you, you're already dead!",Night of the Living Dead,7.5,591
445,"[Adventure, Action, Thriller]",253,James Bond must investigate a mysterious murde...,"[United Artists, Eon Productions, Danjaq]",Roger Moore is James Bond.,Live and Let Die,6.4,540
377,[Drama],1375,A lifetime of taking shots has ended Rocky's c...,[United Artists],Go for it!,Rocky V,5.3,688
792,[Horror],923,During an ever-growing epidemic of zombies tha...,"[Laurel Group, Dawn Associates]","When there's no more room in hell, the dead wi...",Dawn of the Dead,7.4,597
12,"[Adventure, Animation, Drama, Family]",10530,History comes gloriously to life in Disney's e...,"[Walt Disney Pictures, Walt Disney Feature Ani...",An American legend comes to life.,Pocahontas,6.7,1509


Based on the table/results above, we've got a very relevent suggestions like:
- Cars 2
- Rocky IV
- Rocky V
- Rocky II
- Pirates of the Caribbean: On Stranger Tides
- Pirates of the Caribbean: The Curse of the Bla...
- Dawn of the Dead

### Congratulations 😁🎉 , Now our User-Based Recommender System provides more accurate suggestions for any user preferences.
<br>
<br>
<br>

# 📌 Conclusion

So far, We've discussed various of Recommendation Systems:
- Classic / simple recommender system.
- Content-based recommender system.
- Metadata-based recommender system.
- User-based recommender system.

Each one of them has Pros and Cons/limitations and its proper use cases.

**In the end, I would like to mention that with AI/Machine Learning, we can create marvelous and much more powerful things 🦾, that will end up elevating the quality of life for humans, and that's the GOAL.**