# Movie Recommendation


The task is to recommend movies that are similar to a given movie. We will solve this problem with a content-based recommendation method.

We will build a recommendation system that recommends items using the following approach.
### Content-Based Recommendation
This is a similarity-based approach: it computes the cosine similarity between the text features of the given movie and those of every other movie in the dataset, then recommends the movies with the highest scores.
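
To make the idea concrete, here is a minimal sketch of TF-IDF vectors and cosine similarity on three toy titles. The titles and the names `toy_titles`, `toy_vectorizer`, `toy_vectors`, and `toy_similarity` are assumptions made for illustration only; they are not part of the notebook's dataset or code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy titles used only for this illustration
toy_titles = [
    "Iron Man (2008)",
    "Iron Man 2 (2010)",
    "Toy Story (1995)",
]

# Turn each title into a TF-IDF vector over its words
toy_vectorizer = TfidfVectorizer(analyzer="word")
toy_vectors = toy_vectorizer.fit_transform(toy_titles)

# Entry [i, j] is the cosine similarity between title i and title j
toy_similarity = cosine_similarity(toy_vectors)
print(toy_similarity.round(2))
```

The two "Iron Man" titles share words and therefore get a positive score, while "Toy Story (1995)" shares no word with either of them and scores zero. Each title has a similarity of 1 with itself.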


In [1]:
#import your libraries

import pandas as pd
from IPython.display import display
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

#your info here
__author__ = "Kanchan Pandhare"
__email__ = "kanchan.pandhare08@gmail.com"


ds = pd.read_csv("movies.csv")
display(ds.head())

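#Function to get the movieId for a movie title (returns -1 if the title is not in the dataset)
#Note: despite its name, this returns the value of the movieId column, not the DataFrame row index
def get_index_from_title(title):
    matches = ds.loc[ds['title'] == title, 'movieId']
    if matches.empty:
        return -1
    return matches.values[0]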

#Build the text used to compare movies; at present only the title is used
def combine_features(row):
    #Replace a missing title with an empty string so the vectorizer always receives text
    return row['title'] if pd.notna(row['title']) else ''

#Function to get movie title from movie Id
#The full title is returned; splitting on ' - ' would truncate titles that contain a dash
def item(movie_id):
    return ds.loc[ds['movieId'] == movie_id, 'title'].tolist()[0]


Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [None]:
ds['combined_features'] = ds.apply(combine_features,axis = 1)

#Create a TF-IDF matrix from the combined features (currently the movie titles), using unigrams and bigrams
#min_df=1 is the scikit-learn default and keeps every term; newer scikit-learn releases may reject the integer 0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(ds['combined_features'])
#Calculate the pairwise similarity matrix using cosine similarity
cosine_similarities = cosine_similarity(tfidf_matrix)
display(pd.DataFrame(cosine_similarities))
results = {}

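#For each movie, sort the similarity scores in descending order so that the most similar movies come first

for idx, row in ds.iterrows():
    #Positions of the 99 highest scores; idx is used as a position, which assumes the default RangeIndex
    similar_indices = cosine_similarities[idx].argsort()[:-100:-1]
    #Skip the movie itself explicitly instead of assuming it is ranked first (tied scores can reorder it)
    similar_items = [(cosine_similarities[idx][i], ds['movieId'][i]) for i in similar_indices if i != idx]
    results[row['movieId']] = similar_items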
    
#Recommendation function to display the top n recommendations for a given movie
#item_id is the movie title (for example 'Iron Man (2008)'); num is the number of recommendations
#It only reads the precomputed results out of the dictionary
def recommend(item_id, num):
    movie_id = get_index_from_title(item_id)
    if movie_id == -1:
        print("Movie does not exist")
    else:
        print("Recommending " + str(num) + " movies similar to " + item(movie_id) + "...")
        print("-------")
        recs = results[movie_id][:num]
        for rec in recs:
            print("Recommended: " + item(rec[1]) + " (score:" + str(rec[0]) + ")")



Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,9732,9733,9734,9735,9736,9737,9738,9739,9740,9741
0,1.000000,0.091698,0.059942,0.066840,0.062670,0.096481,0.091698,0.068927,0.071890,0.089969,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0
1,0.091698,1.000000,0.072030,0.080319,0.075308,0.115936,0.110190,0.082827,0.086387,0.108111,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0
2,0.059942,0.072030,1.000000,0.052503,0.049228,0.075786,0.072030,0.054143,0.056470,0.070671,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0
3,0.066840,0.080319,0.052503,1.000000,0.054893,0.084507,0.080319,0.060373,0.062968,0.078803,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0
4,0.062670,0.075308,0.049228,0.054893,1.000000,0.079235,0.075308,0.056607,0.059040,0.073887,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9737,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,1.000000,0.050790,0.076061,0.0,0.0
9738,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.050790,1.000000,0.092424,0.0,0.0
9739,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.076061,0.092424,1.000000,0.0,0.0
9740,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,1.0,0.0


In [None]:
recommend(item_id='Iron Man (2008)', num=10)
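
The ranking step behind `recommend` can be illustrated on a small, made-up similarity matrix. This is a minimal sketch rather than the notebook's own code: `toy_sim` and `top_n_similar` are hypothetical names, and the scores are invented for the example.

```python
import numpy as np

# Hypothetical 4x4 similarity matrix; row i holds movie i's similarity to every movie
toy_sim = np.array([
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.5],
    [0.3, 0.0, 0.5, 1.0],
])

def top_n_similar(sim, row, n):
    """Return (position, score) pairs for the n entries most similar to `row`, excluding `row` itself."""
    order = np.argsort(sim[row])[::-1]   # positions sorted by descending score
    order = order[order != row]          # drop the movie itself
    return [(int(i), float(sim[row, i])) for i in order[:n]]

print(top_n_similar(toy_sim, row=0, n=2))
```

For row 0 the sketch returns positions 1 and 3, the two highest scores once the movie itself is removed. Filtering out the query row explicitly avoids relying on it being ranked first.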


## Content-based Filtering Advantages & Disadvantages
### Advantages

The model doesn't need any data about other users, since the recommendations are specific to this user. This makes it easier to scale to a large number of users.

The model can capture the specific interests of a user, and can recommend niche items that very few other users are interested in.
### Disadvantages

Since the feature representation of the items is hand-engineered to some extent, this technique requires a lot of domain knowledge. Therefore, the model can only be as good as the hand-engineered features.

The model can only make recommendations based on the existing interests of the user. In other words, the model has limited ability to expand on the user's existing interests.
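
The point about hand-engineered features can be illustrated with a small sketch. The notebook compares movies by title only; the example below assumes, purely for illustration, that the genre names are appended to the title before vectorizing. The helper `combine_title_and_genres` and the names `toy_movies`, `title_only`, and `with_genres` are hypothetical, and the three rows are copied from the `movies.csv` preview shown earlier.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three rows copied from the movies.csv preview
toy_movies = pd.DataFrame({
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
    ],
})

def combine_title_and_genres(row):
    """Hypothetical helper: join the title and the genre names into one text field."""
    return row["title"] + " " + row["genres"].replace("|", " ")

toy_movies["combined"] = toy_movies.apply(combine_title_and_genres, axis=1)

# Similarity from titles alone versus titles plus genres
title_only = cosine_similarity(TfidfVectorizer().fit_transform(toy_movies["title"]))
with_genres = cosine_similarity(TfidfVectorizer().fit_transform(toy_movies["combined"]))

print("Toy Story vs Jumanji, title only: ", round(title_only[0, 1], 3))
print("Toy Story vs Jumanji, with genres:", round(with_genres[0, 1], 3))
```

With titles alone, these two movies share only the year token. Adding the genres gives them three more shared terms and raises their score. Which fields to include, and how to weight them, is exactly the kind of hand-engineering decision described above.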