# Movie Recommendation


The task is to recommend movies that are similar to a given movie. We implement this with a content-based recommendation method.

We will build a recommendation system that recommends items based on the following:
### Content Based Recommendation:
It scores the similarity between items from their content. Here, each movie title is turned into a TF-IDF vector, and the cosine similarity between the vector of the given movie and the vectors of all other movies is used to rank the candidate recommendations.
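
To make the idea concrete, here is a minimal sketch on three made-up titles (the list `titles` is illustrative and not taken from `movies.csv`). It only shows that titles sharing words get a higher cosine similarity than titles sharing none.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy titles chosen for illustration only (assumption, not the real dataset)
titles = ["Iron Man (2008)", "Iron Man 2 (2010)", "Toy Story (1995)"]

# Turn each title into a TF-IDF vector
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(titles)

# Pairwise cosine similarity: entry [i, j] compares title i with title j
sim = cosine_similarity(tfidf)

print(sim.round(2))
```

The first two titles share the words "iron" and "man", so `sim[0, 1]` is positive, while `sim[0, 2]` is zero because the first and third titles have no words in common.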


In [2]:
#import your libraries

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

#your info here
__author__ = "Kanchan Pandhare"
__email__ = "kanchan.pandhare08@gmail.com"


ds = pd.read_csv("movies.csv")
#print(ds.head())

#Function to get the movieId for a movie title (returns -1 if the title is not found)
def get_index_from_title(title):
    if title in ds.title.values:
        return ds[ds.title == title]["movieId"].values[0]
    else:
        return -1

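#Build the text that represents each movie's content (currently only the title)
def combine_features(row):
    try:
        return row['title']
    except KeyError:
        #Show the offending row, then re-raise so that a missing column is not silently turned into None
        print(row)
        raise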

#Function to get the movie title from a movieId
def item(movie_id):
    return ds.loc[ds['movieId'] == movie_id]['title'].tolist()[0].split(' - ')[0]


In [3]:
ds['combined_features'] = ds.apply(combine_features,axis = 1)

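#Calculate the similarity matrix using cosine similarity on TF-IDF vectors of the combined features
#min_df=1 (the default) keeps every term; recent scikit-learn versions are expected to reject the integer 0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')
count_matrix = tf.fit_transform(ds['combined_features'])
#Alternative using raw term counts (requires importing CountVectorizer):
#cv = CountVectorizer()
#count_matrix = cv.fit_transform(ds['combined_features'])
cosine_similarities = cosine_similarity(count_matrix)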

results = {}

#For each movie, sort the similarities in descending order so that the most similar movies come first.
#The slice [:-100:-1] keeps the 99 highest-scoring entries; the first one is normally the movie itself and is dropped.

for idx, row in ds.iterrows():
    similar_indices = cosine_similarities[idx].argsort()[:-100:-1]
    similar_items = [(cosine_similarities[idx][i], ds['movieId'][i]) for i in similar_indices]
    results[row['movieId']] = similar_items[1:]
    
#Recommendation function: reads the precomputed results out of the dictionary and
#displays the top num recommendations for a given movie (item_id is the movie title)
def recommend(item_id, num):
    item_id = get_index_from_title(item_id)
    if item_id == -1:
        print("Movie does not exist")
    else:
        print("Recommending " + str(num) + " movies similar to " + item(item_id) + "...")
        print("-------")
        recs = results[item_id][:num]
        for rec in recs:
            print("Recommended: " + item(rec[1]) + " (score:" + str(rec[0]) + ")")
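
The `argsort()[:-100:-1]` slice used above is easy to misread, so here is a small, self-contained sketch of the same ranking step on a made-up similarity row. The helper `top_similar` is hypothetical and not part of the notebook; it removes the query item by index instead of assuming it is always ranked first, which may matter when two movies have identical feature text.

```python
import numpy as np

def top_similar(sim_row, query_idx, k):
    """Return the indices of the k most similar items, excluding the query itself."""
    order = sim_row.argsort()[::-1]      # indices sorted by descending similarity
    order = order[order != query_idx]    # drop the query item explicitly
    return order[:k]

# Made-up similarities of item 1 to five items (item 1 matches itself with 1.0)
scores = np.array([0.2, 1.0, 0.7, 0.0, 0.5])
print(top_similar(scores, query_idx=1, k=3))

# A reversed slice ending at -100 yields 99 elements, not 100
print(len(np.arange(200)[:-100:-1]))
```

After the first entry is dropped, the notebook's `results` dictionary therefore holds at most 98 candidates per movie, which is more than enough for a top-10 list.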



In [4]:
recommend(item_id='Iron Man (2008)', num=10)
print(cosine_similarities.shape)

Recommending 10 movies similar to Iron Man (2008)...
-------
Recommended: Iron Man 2 (2010) (score:0.5909000712263456)
Recommended: Iron Man 3 (2013) (score:0.5843071168270364)
Recommended: Iron Man (1931) (score:0.5408978245408712)
Recommended: Ip Man (2008) (score:0.47511539496593247)
Recommended: Yes Man (2008) (score:0.46776347087231107)
Recommended: Invincible Iron Man, The (2007) (score:0.457104712191622)
Recommended: Iron Man & Hulk: Heroes United (2013) (score:0.34019130794111696)
Recommended: W. (2008) (score:0.29530773594974863)
Recommended: Never Back Down (2008) (score:0.29530773594974863)
Recommended: Iron Will (1994) (score:0.2540879355037116)
(9742, 9742)


## Content-based Filtering Advantages & Disadvantages
### Advantages

The model doesn't need any data about other users, since the recommendations are specific to this user. This makes it easier to scale to a large number of users.

The model can capture the specific interests of a user, and can recommend niche items that very few other users are interested in.
### Disadvantages

Since the feature representation of the items is hand-engineered to some extent, this technique requires a lot of domain knowledge. Therefore, the model can only be as good as the hand-engineered features.
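
As an illustration of how much the chosen features matter, the sketch below combines the title with a genre string before vectorizing. It assumes a `genres` column with `|`-separated values; the toy DataFrame and the helper `combine_title_and_genres` are hypothetical and not part of the notebook above.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data for illustration; the 'genres' column and its format are assumptions
toy = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Iron Man (2008)", "Avengers, The (2012)", "Yes Man (2008)"],
    "genres": ["Action|Adventure|Sci-Fi", "Action|Adventure|Sci-Fi", "Comedy"],
})

def combine_title_and_genres(row):
    # Join title and genres into one text field, separating genres with spaces
    return row["title"] + " " + row["genres"].replace("|", " ")

toy["combined_features"] = toy.apply(combine_title_and_genres, axis=1)

tfidf = TfidfVectorizer(stop_words="english").fit_transform(toy["combined_features"])
sim = cosine_similarity(tfidf)

# Compare "Iron Man" with the other two toy movies
print(sim[0].round(2))
```

With titles alone, "Iron Man (2008)" would match "Yes Man (2008)" on shared words and the release year; once genres are added, the superhero film with the same genres scores higher in this toy example.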

The model can only make recommendations based on the existing interests of the user. In other words, the model has limited ability to expand on the user's existing interests.