## In this project we build a content-based recommendation system on data from the IMDB site to suggest new movies for users to watch

#### First, we import the required libraries:

In [39]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Read the movies and ratings datasets:

In [40]:
movies_df = pd.read_csv('movies.csv')
ratings_df = pd.read_csv('ratings.csv')

### Show the size of our datasets:

In [41]:
print(movies_df.shape)
print(ratings_df.shape)

(9742, 3)
(100836, 4)


### Show the head of each dataset:

In [42]:
movies_df.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [43]:
ratings_df.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


### Modify the movies_df dataset and split the year out of the movie title:

In [44]:
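# Extract the "(YYYY)" part of the title, then keep only the four digits
movies_df['year'] = movies_df.title.str.extract(r'(\(\d\d\d\d\))', expand=False)
movies_df['year'] = movies_df.year.str.extract(r'(\d\d\d\d)', expand=False)
# Remove the "(YYYY)" part from the title; regex=True is required for pattern matching
movies_df['title'] = movies_df.title.str.replace(r'(\(\d\d\d\d\))', '', regex=True)
# Trim the whitespace left behind at the end of the title
movies_df['title'] = movies_df['title'].str.strip()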
movies_df.head()



Unnamed: 0,movieId,title,genres,year
0,1,Toy Story,Adventure|Animation|Children|Comedy|Fantasy,1995
1,2,Jumanji,Adventure|Children|Fantasy,1995
2,3,Grumpier Old Men,Comedy|Romance,1995
3,4,Waiting to Exhale,Comedy|Drama|Romance,1995
4,5,Father of the Bride Part II,Comedy,1995
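The year-splitting step can be checked on a few hand-written titles. The sketch below is a minimal, self-contained illustration: the `sample` frame and its titles are made up for the example, and it assumes the year always appears as four digits in parentheses.

```python
import pandas as pd

# Hypothetical sample titles (not read from movies.csv)
sample = pd.DataFrame({'title': ['Toy Story (1995)', 'Jumanji (1995)', 'No Year Here']})

# Capture the four digits inside the parentheses; titles without a year give NaN
sample['year'] = sample['title'].str.extract(r'\((\d{4})\)', expand=False)

# Remove the "(YYYY)" part, then trim the leftover whitespace
sample['title'] = sample['title'].str.replace(r'\(\d{4}\)', '', regex=True).str.strip()

print(sample)
```

Titles that carry no year keep their text unchanged and get a missing value in `year`, which is worth keeping in mind before any later `fillna` call.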


### Split the genres of each movie into a list and make a copy of the dataset:

In [45]:
movies_df['genres'] = movies_df.genres.str.split('|')
moviesG_df = movies_df.copy()
moviesG_df


Unnamed: 0,movieId,title,genres,year
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995
2,3,Grumpier Old Men,"[Comedy, Romance]",1995
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995
4,5,Father of the Bride Part II,[Comedy],1995
...,...,...,...,...
9737,193581,Black Butler: Book of the Atlantic,"[Action, Animation, Comedy, Fantasy]",2017
9738,193583,No Game No Life: Zero,"[Animation, Comedy, Fantasy]",2017
9739,193585,Flint,[Drama],2017
9740,193587,Bungo Stray Dogs: Dead Apple,"[Action, Animation]",2018


### In the copy of the dataset, we create one new column per genre. Each column marks whether the movie belongs to that genre with a 1 or a 0

In [46]:
for index, row in movies_df.iterrows():
    for genre in row['genres']:
        moviesG_df.at[index, genre] = 1


#Filling in the NaN values with 0 to show that a movie doesn't have that column's genre
moviesG_df = moviesG_df.fillna(0)
moviesG_df.head()

Unnamed: 0,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995,1.0,1.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3,Grumpier Old Men,"[Comedy, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,5,Father of the Bride Part II,[Comedy],1995,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
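The loop above fills the genre columns one cell at a time. As an alternative sketch (not the notebook's method), pandas can build equivalent indicator columns in one step by re-joining each genre list and calling `Series.str.get_dummies`. The `sample` frame is made up for the example; this variant orders the genre columns alphabetically, and the dtype of the indicators may differ from the 1.0/0.0 floats shown above.

```python
import pandas as pd

# Hypothetical sample with genres already split into lists, as in moviesG_df
sample = pd.DataFrame({
    'title': ['Toy Story', 'Jumanji', 'Grumpier Old Men'],
    'genres': [['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
               ['Adventure', 'Children', 'Fantasy'],
               ['Comedy', 'Romance']],
})

# Re-join each list with '|' and let get_dummies create one indicator column per genre
genre_flags = sample['genres'].str.join('|').str.get_dummies(sep='|')

# Attach the indicator columns to the original frame
encoded = pd.concat([sample, genre_flags], axis=1)
print(encoded)
```

Because missing genres are already encoded as 0, this variant needs no `fillna` step and so leaves other columns with missing values untouched.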




### We clean the ratings dataset by dropping the columns we don't need, such as the timestamp:

In [48]:
# Pass the column by keyword; the positional axis argument is deprecated
ratings_df = ratings_df.drop(columns='timestamp')
ratings_df.head()



Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


### Now our movies and ratings datasets are clean!
Next, we collect the new user's input:

In [49]:
userInput = [
    {'title': 'Grumpier Old Men', 'rating': 5},
    {'title': 'Flint', 'rating': 2.5},
    {'title': 'Jumanji', 'rating': 2},
    {'title': 'Andrew Dice Clay: Dice Rules', 'rating': 5},
    {'title': 'Father of the Bride Part II', 'rating': 4.5},
]
inputMovies = pd.DataFrame(userInput)
inputMovies

Unnamed: 0,title,rating
0,Grumpier Old Men,5.0
1,Flint,2.5
2,Jumanji,2.0
3,Andrew Dice Clay: Dice Rules,5.0
4,Father of the Bride Part II,4.5


## To create a recommendation system based on the user's content, we follow three steps:
1. Find the genres of the movies that the new user rated. To answer this question, we build the User_Movie_matrix.

2. Find the user's favorite genres and create the Weighted_movies_rating_matrix.

3. Compare the user's Weighted_movies_rating_matrix with genreTable and return the most similar movies.

## First step: Create User Movie Matrix:

In [50]:
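# Look up the movieId and genre columns for each title the user rated
inputId = moviesG_df[moviesG_df['title'].isin(inputMovies['title'].tolist())]

# Merge on the shared 'title' column so each row also carries the user's rating
inputMovies = pd.merge(inputId, inputMovies)
inputMovies = inputMovies.drop(columns=['genres', 'year'])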

# inputMovies





### Create User Genre Matrix:

In [51]:

# Keep only the genre indicator columns
userGenreTable = inputMovies.drop(columns=['movieId', 'title', 'rating'])
userGenreTable



Unnamed: 0,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [52]:
inputMovies['rating']

0    2.0
1    5.0
2    4.5
3    2.5
4    5.0
Name: rating, dtype: float64
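
Note that the ratings above no longer follow the order in which the user entered them. `pd.merge` with its default inner join keeps the row order of the left frame (here `inputId`, which is ordered like `moviesG_df`), so the ratings are rearranged to line up with the genre rows. The sketch below reproduces this on made-up frames; `catalog` and `user_input` are hypothetical names, and `on='title'` is spelled out for clarity although the notebook relies on the default of joining on shared column names.

```python
import pandas as pd

# Hypothetical catalog, ordered by movieId like moviesG_df
catalog = pd.DataFrame({
    'movieId': [2, 3, 5],
    'title': ['Jumanji', 'Grumpier Old Men', 'Father of the Bride Part II'],
})

# Hypothetical user input, in the order the user typed it
user_input = pd.DataFrame({
    'title': ['Grumpier Old Men', 'Jumanji', 'Father of the Bride Part II'],
    'rating': [5.0, 2.0, 4.5],
})

# Inner join on the title; the result follows the left frame's row order
merged = pd.merge(catalog, user_input, on='title')
print(merged)
```

Joining on the title assumes that titles are unique. If two movies share a title once the year is stripped, the merge returns one row per match, and that rating is counted more than once.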

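## Second step: Create the Weighted User Rating Matrix
We get it by taking the dot product of the transposed userGenreTable and the user's ratings, so each genre's weight is the sum of the ratings of the rated movies in that genre.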

In [53]:
userProfile = userGenreTable.transpose().dot(inputMovies['rating'])
userProfile

Adventure              2.0
Animation              0.0
Children               2.0
Comedy                14.5
Fantasy                2.0
Romance                5.0
Drama                  2.5
Action                 0.0
Crime                  0.0
Thriller               0.0
Horror                 0.0
Mystery                0.0
Sci-Fi                 0.0
War                    0.0
Musical                0.0
Documentary            0.0
IMAX                   0.0
Western                0.0
Film-Noir              0.0
(no genres listed)     0.0
dtype: float64

Now we have a weight for each of the user's genre preferences. This is known as the User Profile. Using it, we can recommend movies that match the user's preferences.
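
To see where these numbers come from: each weight is the sum of the ratings given to the movies that carry that genre. Comedy, for example, is 5.0 + 4.5 + 5.0 = 14.5. The sketch below recomputes the profile from the five genre rows and ratings shown above, restricted to the six genres with a non-zero weight; the variable names are illustrative.

```python
import pandas as pd

genres = ['Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance', 'Drama']

# Genre rows copied from userGenreTable above (only the non-zero genres)
user_genres = pd.DataFrame(
    [[1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 0],
     [0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 1],
     [0, 0, 1, 0, 0, 0]],
    columns=genres, dtype=float)

# Ratings in the same row order as the genre rows
ratings = pd.Series([2.0, 5.0, 4.5, 2.5, 5.0])

# (genres x movies) dot (movies,) -> one summed rating per genre
profile = user_genres.transpose().dot(ratings)
print(profile)
```

The weights add up to 28.0, which is the normalizing constant used when the movies are scored.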


## Third step: Compare the User Profile to the genreTable of all movies

In [54]:
# Genre table for all movies, indexed by movieId
genreTable = moviesG_df.set_index(moviesG_df['movieId'])
genreTable = genreTable.drop(columns=['movieId', 'title', 'genres', 'year'])
genreTable.head()



movieId,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
1,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Now compare this to the userProfile and return the most similar movies:

In [55]:
recommendationTable_df = ((genreTable*userProfile).sum(axis=1))/(userProfile.sum())
recommendationTable_df = recommendationTable_df.sort_values(ascending=False)
recommendationTable_df


movieId
45672     0.928571
1907      0.928571
108540    0.928571
4306      0.910714
56152     0.910714
            ...   
92475     0.000000
7321      0.000000
1330      0.000000
92420     0.000000
1214      0.000000
Length: 9742, dtype: float64
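
The score of a movie is the sum of the user's weights over that movie's genres, divided by the total weight, so it always lies between 0 and 1. The top score of 0.928571, for instance, corresponds to 26 of the 28 total weight points. The sketch below applies the same formula to the first three rows of `genreTable.head()`; the names `user_profile`, `genre_table` and `scores` are illustrative, and genres that are zero everywhere in this small sample are left out.

```python
import pandas as pd

# User profile weights copied from the output above (other zero-weight genres omitted)
user_profile = pd.Series({'Adventure': 2.0, 'Animation': 0.0, 'Children': 2.0,
                          'Comedy': 14.5, 'Fantasy': 2.0, 'Romance': 5.0, 'Drama': 2.5})

# Genre rows for movieId 1, 2 and 3, copied from genreTable.head()
genre_table = pd.DataFrame(
    {'Adventure': [1.0, 1.0, 0.0], 'Animation': [1.0, 0.0, 0.0],
     'Children': [1.0, 1.0, 0.0], 'Comedy': [1.0, 0.0, 1.0],
     'Fantasy': [1.0, 1.0, 0.0], 'Romance': [0.0, 0.0, 1.0],
     'Drama': [0.0, 0.0, 0.0]},
    index=pd.Index([1, 2, 3], name='movieId'))

# Multiply each genre column by its weight, sum per movie, normalize by total weight
scores = (genre_table * user_profile).sum(axis=1) / user_profile.sum()
print(scores.sort_values(ascending=False))
```

Here movieId 1 scores 20.5/28, movieId 3 scores 19.5/28 and movieId 2 scores 6/28, so a movie covering many of the user's heavily weighted genres ranks first.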

### Now that we have the movieIds, we can look up the recommended movies by title:

In [56]:
Recom = movies_df.loc[movies_df['movieId'].isin(recommendationTable_df.head(20).keys())]
Recom


Unnamed: 0,movieId,title,genres,year
222,258,"Kid in King Arthur's Court, A","[Adventure, Children, Comedy, Fantasy, Romance]",1995
505,587,Ghost,"[Comedy, Drama, Fantasy, Romance, Thriller]",1990
1390,1907,Mulan,"[Adventure, Animation, Children, Comedy, Drama...",1998
1530,2065,"Purple Rose of Cairo, The","[Comedy, Drama, Fantasy, Romance]",1985
2103,2797,Big,"[Comedy, Drama, Fantasy, Romance]",1988
2350,3108,"Fisher King, The","[Comedy, Drama, Fantasy, Romance]",1991
2510,3358,Defending Your Life,"[Comedy, Drama, Fantasy, Romance]",1991
3194,4306,Shrek,"[Adventure, Animation, Children, Comedy, Fanta...",2001
3249,4392,Alice,"[Comedy, Drama, Fantasy, Romance]",1990
4356,6373,Bruce Almighty,"[Comedy, Drama, Fantasy, Romance]",2003
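
Filtering with `isin` returns the matching rows in the order of `movies_df`, so the ranking by score is lost in the table above. One possible way to keep the ranking, sketched here on made-up data, is to turn the top scores into a frame and merge the titles onto it. The names `scores`, `movies` and `top_n` are assumptions for this illustration.

```python
import pandas as pd

# Hypothetical scores indexed by movieId, already sorted from best to worst
scores = pd.Series([0.9, 0.8, 0.1], index=pd.Index([30, 10, 20], name='movieId'))

# Hypothetical movie catalog
movies = pd.DataFrame({'movieId': [10, 20, 30], 'title': ['A', 'B', 'C']})

top_n = 2
# Keep the best top_n scores as a frame with the columns movieId and score
top = scores.head(top_n).rename('score').reset_index()

# Left-merge the titles onto the ranked frame; the left frame's order is preserved
ranked = top.merge(movies, on='movieId', how='left')
print(ranked)
```

The resulting frame lists the best match first and keeps the score next to each title, which makes ties easy to spot.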


In [57]:
# Thank you so much for your perfect course. Cheers.