# Recommender Systems Homework

* This notebook is the recommender systems homework for Applied AI.
* The dataset used for this homework is [The Movies Dataset](https://www.kaggle.com/rounakbanik/the-movies-dataset/data).

## **Dataset Description** 

This dataset covers about 45k movies along with their features, such as genre and crew. It also contains ratings of these movies as a user-movie interaction table.

**Tables in Dataset:**

* movies_metadata : Features of the movies (~45k)
* keywords : Keywords extracted from plot of the movies
* credits : Cast and crew information
* links : TMDB and IMDB IDs of all movies
* ratings : User-Movie interactions

## **Task Description**

You are supposed to build a **recommendation system** that recommends movies to the user. The input of the system is a movie, and the output is a list of movies similar to the given one.

* This task uses the **content-based** approach to recommender systems.
* Similarities between movies can be found by looking at their **common cast**.
* Other movie features can be added to the system as you wish.


## **What will you report?**

* There is no limitation or scoring function for this task.
* You can look at the distances between similar movies for comparison.
* Recommend movies to yourself using your system and evaluate the results fairly 😀

---




## Preparation

* Mount Drive first

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


* Import libraries

In [1]:
import pandas as pd
import numpy as np


* Read the _credits_ and _movies_metadata_ files

In [3]:
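credits = pd.read_csv('credits.csv', low_memory=False)
movies = pd.read_csv('movies_metadata.csv', low_memory=False)
# Truncate both tables to the same length so one row index can address both.
# Note: this only equalizes lengths; it assumes row i describes the same movie in both files.
NUM_MOVIES = min(len(credits), len(movies))
credits = credits[:NUM_MOVIES]
movies = movies[:NUM_MOVIES]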
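
Truncating both tables to a common length does not by itself guarantee that the rows line up. As a hedged alternative, the sketch below joins the two tables on a shared `id` column instead of relying on row position. It assumes that both files carry such a column and that the ids in `movies_metadata` may be read as text with occasional malformed values. The `*_demo` frames and their contents are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (hypothetical rows, assumed shared `id` column)
movies_demo = pd.DataFrame({
    "id": ["1", "2", "not-an-id"],          # ids read as text, one malformed row
    "original_title": ["Movie A", "Movie B", "Broken Row"],
})
credits_demo = pd.DataFrame({
    "id": [2, 1],                           # different row order than movies_demo
    "cast": ["[{'name': 'Actor Y'}]", "[{'name': 'Actor X'}]"],
})

# Coerce ids to numbers and drop rows whose id cannot be parsed
movies_demo["id"] = pd.to_numeric(movies_demo["id"], errors="coerce")
movies_demo = movies_demo.dropna(subset=["id"]).astype({"id": "int64"})

# An inner join keeps only movies that appear in both tables
merged_demo = movies_demo.merge(credits_demo, on="id", how="inner")
print(merged_demo[["original_title", "cast"]])
```

With a join like this, each title is paired with its own cast regardless of row order, at the cost of dropping movies that are missing from either table.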

* Build an uppercase list of movie titles for case-insensitive lookup

In [25]:
# Uppercase every title so that lookups do not depend on letter case
MOVIES = movies["original_title"].str.upper().tolist()
print("TOY STORY" in MOVIES)

True


## Recommendation System

In [29]:
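# Function for getting the cast list of a movie
def get_cast_list(movie_name: str = "toy story"):
    movie_name = movie_name.upper()
    if movie_name not in MOVIES:
        raise ValueError("The given name does not exist in the database.")
    # Index of the first movie with this title; the same row is used in `credits`
    i = MOVIES.index(movie_name)
    names = []
    # Each cast entry contains "'name': '<actor>'"; take the text up to the closing quote.
    # Names serialized with double quotes (e.g. containing an apostrophe) are not matched.
    for j in credits['cast'][i].split("'name': '")[1:]:
        name = j.split("'")[0]
        names.append(name)
    return names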

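# Function for calculating the similarity between two movies.
# Similarity is the ratio of the number of actors common to both movies to the
# number of all actors in the pair (the Jaccard index when casts have no duplicates).
def similarity(query, candidate):
    ql = get_cast_list(query)
    cl = get_cast_list(candidate)
    comm = 0  # actors appearing in both casts
    diff = 0  # actors appearing in only one of the casts
    for actor in ql:
        if actor in cl:
            comm += 1
        else:
            diff += 1
    for actor in cl:
        if actor not in ql:
            diff += 1
    total = comm + diff
    # If both casts are empty, return 0 instead of dividing by zero
    return comm / total if total else 0.0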
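
Splitting on `'name': '` is compact but sensitive to how each name is quoted. The sketch below shows one possible alternative, assuming each `cast` cell is a Python-literal list of dictionaries (as the split pattern above suggests). `parse_cast_names` is a hypothetical helper, and the `raw` string is made up for illustration.

```python
import ast

def parse_cast_names(cast_field: str) -> list:
    """Return the actor names stored in one raw `cast` cell (hypothetical helper)."""
    try:
        entries = ast.literal_eval(cast_field)  # safely evaluate the literal
    except (ValueError, SyntaxError):
        return []                               # unparsable cell: treat as empty cast
    if not isinstance(entries, list):
        return []
    return [e["name"] for e in entries if isinstance(e, dict) and "name" in e]

# Made-up cell: the second name contains an apostrophe, so it is double-quoted
raw = "[{'cast_id': 1, 'name': 'Actor One'}, {'cast_id': 2, 'name': \"Pat O'Neil\"}]"
print(parse_cast_names(raw))
```

On this input the string split would return only the first name, because the second one is wrapped in double quotes, while the literal-based parser returns both.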

## Recommendation Function

In [39]:
def recommend_movie(movie: str, num_rec: int = 3):
    """Recommend movies based on cast similarity.

    Args:
        movie (str): Name of the movie, not case sensitive.
        num_rec (int, optional): Max number of recommendations. Defaults to 3.

    Returns:
        pd.DataFrame: Columns "Name" and "Similarity", most similar first.
    """
    # Score every movie in the catalogue against the query
    sims = []
    for candidate in MOVIES:
        sims.append((candidate, similarity(movie, candidate)))
    sims_sorted = sorted(sims, key=lambda x: x[1], reverse=True)
    # Skip the first entry, which is expected to be the query movie itself
    d = pd.DataFrame(sims_sorted[1:num_rec+1], columns=["Name", "Similarity"])
    return d
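
The loop above re-parses the query's cast for every candidate. A minimal sketch of one way to avoid that is to store each cast as a set once and rank candidates with `heapq.nlargest`. The names `cast_sets`, `jaccard`, and `top_similar` are hypothetical, and the toy catalogue is made up. This is not the notebook's method, only an illustration of the same similarity ratio on precomputed sets.

```python
import heapq

# Toy catalogue: title -> set of actor names (made-up data for illustration)
cast_sets = {
    "A": {"x", "y", "z"},
    "B": {"x", "y"},
    "C": {"z", "w"},
    "D": set(),
}

def jaccard(a: set, b: set) -> float:
    """Shared actors divided by all distinct actors; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def top_similar(title: str, k: int = 3) -> list:
    """Return up to k (title, similarity) pairs, excluding the query itself."""
    query = cast_sets[title]
    scored = (
        (other, jaccard(query, cast))
        for other, cast in cast_sets.items()
        if other != title
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

print(top_similar("A", 2))
```

Excluding the query by title, rather than dropping the first sorted entry, also keeps the result correct when another movie happens to have an identical cast.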

In [40]:
recs = recommend_movie("toy story", 10)

In [41]:
print("Recommended movies and their similarity scores to the movie you selected can be seen below:")
recs

Recommended movies and their similarity scores to the movie you selected can be seen below:


| | Name | Similarity |
|---|---|---|
| 0 | TOY STORY 2 | 0.37037 |
| 1 | TWA FLIGHT 800 | 0.235294 |
| 2 | THE LEGEND OF MOR'DU | 0.235294 |
| 3 | TOY STORY 3 | 0.1875 |
| 4 | MAGGIE SIMPSON IN THE LONGEST DAYCARE | 0.178571 |
| 5 | THE RED BERET | 0.172414 |
| 6 | CHAPPIE | 0.138889 |
| 7 | THE PIXAR STORY | 0.086957 |
| 8 | QUEST FOR CAMELOT | 0.083333 |
| 9 | ERNEST GOES TO SCHOOL | 0.076923 |
