### This recommendation system is used to match movies with each other based on the similarity of their genres.

For this two methods were used: genres multi-hot encoding and cosine similarity and CountVectorizer and cosine similarity. 

The former was not successful as comparing similarity between 20 genres is extremely memory intensive.

 The latter was successful as the CountVectorizer was able to vectorize the description of the movie and the cosine similarity was able to find the similarity between the movies.

The description of the movie is vectorized using CountVectorizer and then cosine similarity is used to find the similarity between the movies. The movies are then recommended based on the similarity score.

In [13]:
import pandas as pd

In [14]:
movie = pd.read_csv('ml-25m/movies.csv')
ratings = pd.read_csv('ml-25m/ratings.csv')

In [15]:
movie_count_vect = movie.copy()
movie_count_vect = movie_count_vect[:10000]
movie_count_vect.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [16]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

cv = CountVectorizer()
count_matrix = cv.fit_transform(movie_count_vect["genres"])

cosine_similarity = cosine_similarity(count_matrix)

print(count_matrix)

  (0, 1)	1
  (0, 2)	1
  (0, 3)	1
  (0, 4)	1
  (0, 8)	1
  (1, 1)	1
  (1, 3)	1
  (1, 8)	1
  (2, 4)	1
  (2, 16)	1
  (3, 4)	1
  (3, 16)	1
  (3, 7)	1
  (4, 4)	1
  (5, 0)	1
  (5, 5)	1
  (5, 18)	1
  (6, 4)	1
  (6, 16)	1
  (7, 1)	1
  (7, 3)	1
  (8, 0)	1
  (9, 1)	1
  (9, 0)	1
  (9, 18)	1
  :	:
  (9987, 0)	1
  (9988, 18)	1
  (9988, 11)	1
  (9989, 4)	1
  (9990, 16)	1
  (9990, 7)	1
  (9990, 13)	1
  (9991, 7)	1
  (9992, 4)	1
  (9992, 16)	1
  (9992, 7)	1
  (9993, 4)	1
  (9993, 7)	1
  (9993, 13)	1
  (9994, 7)	1
  (9994, 19)	1
  (9995, 7)	1
  (9995, 19)	1
  (9996, 0)	1
  (9996, 17)	1
  (9996, 9)	1
  (9997, 16)	1
  (9997, 7)	1
  (9998, 4)	1
  (9999, 4)	1


In [17]:
def get_title_from_index(index):
    return movie_count_vect[movie_count_vect.index == index]["title"].values[0]
def get_movie_id_from_title(title):
    return movie_count_vect[movie_count_vect.title == title]["movieId"].values[0]

In [18]:
movie_name = "Toy Story (1995)"
movie_id = get_movie_id_from_title(movie_name)
similar_movies = list( enumerate(cosine_similarity[movie_id]))

In [19]:
sorted_similar_movies = sorted(similar_movies,key=lambda x:x[1],reverse=True)[1:]

In [20]:
print(f"Top 10 similar movies to {movie_name} are:\n")
for i, element in enumerate(sorted_similar_movies[:10], 1):
    print(get_title_from_index(element[0]))

Top 10 similar movies to Toy Story (1995) are:

Indian in the Cupboard, The (1995)
NeverEnding Story III, The (1994)
Escape to Witch Mountain (1975)
Darby O'Gill and the Little People (1959)
Return to Oz (1985)
NeverEnding Story, The (1984)
NeverEnding Story II: The Next Chapter, The (1990)
Santa Claus: The Movie (1985)
Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Magic in the Water (1995)


NB: The algorithm can be improved by using the description, the cast, the actors, the director, the year of release, the rating, the number of ratings, etc. to find the similarity between the movies. 
This will make the recommendation system more accurate and will provide better recommendations.