# Movie Recommendation

## A project for practicing the data structures, methods, and functions of pandas and NumPy

The goal of the project is to create movie recommendations for a person, based on the person’s and critics’ ratings of the movies. 

The following files will be required to run the program:
1. `IMDB.csv`: A table with movie information
2. `ratings.csv`: A table with ratings of all movies listed in the movies data 
    by 100 critics. The column names in the critics data correspond to the name of each critic.
3. `pX.csv`: A table with one person’s ratings of a subset of the movies in the movies data set, 
    where X is a number. The column name in the file indicates the name of the person.
    
    
All personal ratings are integer numbers in the 1..10 range.
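
To make the expected layout concrete, here is a minimal sketch that builds tiny stand-ins for the three files in memory. The titles, critic names, person name, and ratings are all made up for illustration; the `Title` key column and the `Genre1`, `Year`, and `Runtime` columns are taken from the code in this notebook, not from the real files.

```python
import io
import pandas as pd

# Hypothetical miniature versions of the three input files (not the real data).
movies_csv = "Title,Genre1,Year,Runtime\nAlpha,Action,2015,120 min\nBeta,Drama,2012,\n"
ratings_csv = "Title,CriticA,CriticB\nAlpha,9,4\nBeta,7,8\n"
person_csv = "Title,Pat\nAlpha,8\n"

movies = pd.read_csv(io.StringIO(movies_csv))
ratings = pd.read_csv(io.StringIO(ratings_csv))
person = pd.read_csv(io.StringIO(person_csv))

# The second column name of the personal file is the person's name.
person_name = person.columns[1]

# Check that the personal ratings are integers in the 1..10 range.
is_integer = pd.api.types.is_integer_dtype(person[person_name])
in_range = person[person_name].between(1, 10).all()
valid = bool(is_integer and in_range)
print(person_name, valid)
```

The empty `Runtime` field for the second movie is read as `NaN`, which is the kind of missing value the output step has to skip.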

**How the program works:** <br>
1. The user is asked for the subfolder of the current working directory where the files are stored, followed by the names of the movies, critics, and personal ratings files.
2. The program determines and prints the names of the three critics whose ratings are closest to the person’s ratings, measured by `Euclidean distance`.
3. The ratings by the critics identified in step 2 determine which movies to recommend. Information about the recommended movies is displayed as described below.<br>
a. The recommendations consist of the top-rated movies in each genre among the movies the person has not rated, based on the average rating each movie received from the three critics identified in step 2.<br>
b. A movie’s genre is determined by the `Genre1` column of the movies data.<br>
c. Recommendations are listed in alphabetical order by genre.<br>
d. Missing data (e.g. running time) is omitted from the output.
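
As an illustration of step 2, the Euclidean distance between a critic's ratings $c_i$ and the person's ratings $p_i$ over the $n$ movies both have rated is $\sqrt{\sum_{i=1}^{n} (c_i - p_i)^2}$. The sketch below computes it for three hypothetical critics and keeps the two closest ones; the names and numbers are invented, and `nsmallest` is used here only as a compact alternative to sorting and slicing.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: rows are movies, columns are critics.
critics = pd.DataFrame(
    {"CriticA": [9, 7, 5], "CriticB": [4, 8, 6], "CriticC": [8, 6, 5]},
    index=["Alpha", "Beta", "Gamma"],
)
# Hypothetical personal ratings for the same movies.
person = pd.Series([8, 6, 5], index=["Alpha", "Beta", "Gamma"])

# Subtract the person's rating from every critic's rating, movie by movie.
diff = critics.sub(person, axis=0)
# Square, sum over movies (one total per critic), then take the square root.
distance = np.sqrt((diff ** 2).sum())
# The critics with the smallest distances are the most similar.
closest = distance.nsmallest(2).index.tolist()
print(closest)
```

Here `CriticC` matches the person exactly (distance 0) and `CriticA` is off by one point on two movies (distance $\sqrt{2}$), so those two are selected.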

In [1]:
import os.path
import pandas as pd
import numpy as np

def main():
    '''
    The main function that is called to start the program.  
    '''
    # adjacent string literals are joined, so no indentation leaks into the prompt
    filesNames = input('Please enter the name of the folder with files, the name of movies file,\n'
                       'the name of critics file, the name of personal ratings file, separated by spaces:\n')
    print() #print a new line
    filesNamesLst = filesNames.split() # split on any run of whitespace
    currentWorkDir = os.getcwd()
    subfolderName = filesNamesLst[0]
    #create a DataFrame for movies with selected columns
    movieFileName = filesNamesLst[1] 
    movieFilePath = os.path.join(currentWorkDir, subfolderName, movieFileName)
    movieDataFrame = pd.read_csv(movieFilePath, \
                                 encoding = 'unicode_escape').loc[:, ['Title', 'Genre1', 'Year', 'Runtime']] 
    #create a DataFrame for critics ratings
    criticsFileName = filesNamesLst[2] 
    criticsFilePath = os.path.join(currentWorkDir, subfolderName, criticsFileName)
    criticsDataFrame = pd.read_csv(criticsFilePath) 
    #create a DataFrame for personal ratings
    personalFileName = filesNamesLst[3] 
    personalFilePath = os.path.join(currentWorkDir, subfolderName, personalFileName)
    personalDataFrame = pd.read_csv(personalFilePath) 
    #call functions to run the program
    topThreeCriticsLst = findClosestCritics(criticsDataFrame, personalDataFrame) 
    print(topThreeCriticsLst, '\n') 
    movieRecommendation = recommendMovies(criticsDataFrame, personalDataFrame, \
                                          topThreeCriticsLst, movieDataFrame)
    personName = personalDataFrame.columns[1]
    printRecommendations(movieRecommendation, personName)
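
`main()` assumes the user types exactly four names. A possible way to guard against a malformed answer is sketched below; `parse_file_names` is a hypothetical helper, not part of the notebook, and like the original it cannot handle names that contain spaces.

```python
import os

def parse_file_names(line, base_dir):
    """Split the user's answer into the three data file paths.

    Hypothetical helper: expects the folder name followed by the movies,
    critics, and personal ratings file names, separated by whitespace.
    """
    parts = line.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 names, got {len(parts)}")
    folder, *file_names = parts
    # Build one path per file inside the chosen subfolder.
    return [os.path.join(base_dir, folder, name) for name in file_names]

paths = parse_file_names("data1 IMDB.csv ratings.csv p8.csv", os.getcwd())
print(paths)
```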

In [2]:
def findClosestCritics(criticsDataFrame, personalDataFrame):
    '''
    This function returns a list of the three critics whose movie ratings are most similar to 
    the personal ratings, measured by Euclidean distance. The lower the distance, 
    the more similar the critic's ratings are to the person's. 
     
    Parameters:
    criticsDataFrame - provides data about critics ratings
    personalDataFrame - provides data about personal ratings 
    '''
    
    # merge critics file and personal file on the movie title (inner join),
    # which keeps only the movies the person has rated
    criticsPersonRating = pd.merge(criticsDataFrame, personalDataFrame, on = 'Title') 
    # a new DataFrame with only critics' ratings after merging without Title column
    criticRating = criticsPersonRating.iloc[:,1:-1] 
    # indexed by the movie titles
    criticRating.index = criticsPersonRating['Title'] 
    # person's rating value without the person's name
    personRatingValue = criticsPersonRating[personalDataFrame.columns[1]] 
    # to keep the index the same as the critics' rating DataFrame    
    personRatingValue.index = criticsPersonRating['Title'] 
    ratingDifference = criticRating.sub(personRatingValue, axis = 0)
    # Euclidean distance per critic: square root of the column-wise sum of squared differences
    eucliDistance = np.sqrt((ratingDifference**2).sum())
    eucliDistance = eucliDistance.sort_values() # sort the result from smallest to largest
    # select only the 3 critics with the smallest Euclidean distance 
    topThreeCritics = eucliDistance.iloc[:3] 
    topThreeCriticsLst = list(topThreeCritics.index.values) # generate a list of the critics' names
    
    return topThreeCriticsLst

In [3]:
def recommendMovies(criticsDataFrame, personalDataFrame, topThreeCriticsLst, movieDataFrame):  
    '''
    This function is to compute the top-rated unwatched movies in each genre category 
    based on the average of the three critics' ratings
     
    Parameters:
    criticsDataFrame - provides data about critics' ratings
    personalDataFrame - provides data about personal ratings 
    topThreeCriticsLst - a list of three critics, whose ratings of movies are most similar to 
    those provided in the personal ratings data
    movieDataFrame - provides data about movies info
    '''
    # prepare the DataFrames for critics rating, person's rating and movie indexed by movie title.
    # set_index returns new DataFrames, so the caller's DataFrames are left unchanged
    criticsDataFrame = criticsDataFrame.set_index('Title')
    personalDataFrame = personalDataFrame.set_index('Title')
    movieDataFrame = movieDataFrame.set_index('Title')
    # prepare the unwatched movie DataFrame with average ratings 
    # from the three critics whose ratings are similar to the person's
    unwatchedCriticRating = criticsDataFrame.loc[criticsDataFrame.index.difference(personalDataFrame.index)]
    topThreeCriticsRating = unwatchedCriticRating[topThreeCriticsLst]
    averageCriticsRating = topThreeCriticsRating.mean(axis = 1).round(2)
    # movies the person has already rated get NaN here and never match a genre maximum
    movieDataFrame['Average Rating'] = averageCriticsRating 
    movieDataFrame = movieDataFrame.sort_values('Genre1')
    # keep every movie whose average rating equals the highest in its genre (ties are kept)
    movieRecommendation = movieDataFrame[movieDataFrame.groupby(by = 'Genre1')['Average Rating'].\
                                         transform('max') == movieDataFrame['Average Rating']]
    
    return movieRecommendation
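
The selection at the end of `recommendMovies` relies on `groupby(...).transform(...)`, which returns the per-genre maximum aligned with the original rows. The sketch below shows the same pattern on a small made-up table; the titles `M1` to `M6` and their ratings are hypothetical.

```python
import pandas as pd

# Hypothetical movies; NaN stands for a movie the person has already rated.
movies = pd.DataFrame(
    {
        "Genre1": ["Crime", "Action", "Crime", "Action", "Crime", "Crime"],
        "Average Rating": [8.33, 9.67, 8.33, 7.0, 6.5, float("nan")],
    },
    index=["M1", "M2", "M3", "M4", "M5", "M6"],
)

# One value per row: the highest average rating within that row's genre.
genre_max = movies.groupby("Genre1")["Average Rating"].transform("max")
# Keep the rows that reach their genre's maximum, then order by genre.
top = movies[movies["Average Rating"] == genre_max].sort_values("Genre1")
print(top)
```

Both `M1` and `M3` are kept because they tie for the best Crime rating, which is why a genre can appear more than once in the output. `M6` is dropped because `NaN` never compares equal to the maximum.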

In [4]:
def printRecommendations(movieRecommendation, personName):
    '''
    This function prints all the recommended movies in alphabetical order by genre.
    
    Parameters:
    movieRecommendation - provides data about the recommended movies, indexed by title
    personName - the name of the person for whom the recommendations are made
    '''
    print('Recommendations for ', personName, ':', sep = '')
    # get the length of the longest title for formatting later
    movieTitle = list(movieRecommendation.index.values)
    longestTitle = len(max(movieTitle, key = len))
    # get each field (i.e. title, genre etc.) and then print with designed format 
    for title, row in movieRecommendation.iterrows():
        genre1 = row['Genre1']
        year = row['Year']
        runTime = row['Runtime']
        rating = row['Average Rating']
        # spaces that align the genre column after the quoted title
        padding = (longestTitle - len(title))*' '
        if pd.notnull(runTime):
            print('"', title, '" ', padding, \
                  '(', genre1, '), ', 'rating: ', rating, ', ', year, ', runs ', runTime, sep = '')
        else:
            # omit the running time when it is missing
            print('"', title, '" ', padding, \
                  '(', genre1, '), ', 'rating: ', rating, ', ', year, sep = '')
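
The same aligned layout could also be produced with an f-string and a width specifier instead of building the padding by hand. This is only a sketch of an alternative: `format_line` and its parameters are hypothetical, the sample values are invented, and the caller is assumed to pass `None` when the running time is missing.

```python
def format_line(title, genre, rating, year, runtime=None, width=0):
    """Format one recommendation; `width` is the length of the longest title."""
    quoted = f'"{title}"'
    # Left-align the quoted title in a field two characters wider than the
    # longest title, to account for the surrounding quotes.
    line = f"{quoted:<{width + 2}} ({genre}), rating: {rating}, {year}"
    if runtime is not None:
        line += f", runs {runtime}"
    return line

print(format_line("Alpha", "Action", 9.67, 2015, runtime="120 min", width=5))
print(format_line("Up", "Animation", 9.0, 2009, width=5))
```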

In [5]:
main()

Please enter the name of the folder with files, the name of movies file,    
the name of critics file, the name of personal ratings file, separated by spaces:
data1 IMDB.csv ratings.csv p8.csv

['Quartermaine', 'Arvon', 'Merrison'] 

Recommendations for Catulpa:
"Star Wars: The Force Awakens"    (Action), rating: 9.67, 2015, runs 136 min
"The Grand Budapest Hotel"        (Adventure), rating: 9.0, 2014, runs 99 min
"The Martian"                     (Adventure), rating: 9.0, 2015, runs 144 min
"Kubo and the Two Strings"        (Animation), rating: 9.67, 2016
"How to Train Your Dragon"        (Animation), rating: 9.67, 2010
"Hacksaw Ridge"                   (Biography), rating: 9.33, 2016, runs 139 min
"What We Do in the Shadows"       (Comedy), rating: 9.0, 2014
"Prisoners"                       (Crime), rating: 8.33, 2013, runs 153 min
"Spotlight"                       (Crime), rating: 8.33, 2015, runs 128 min
"The Perks of Being a Wallflower" (Drama), rating: 9.67, 2012, runs 102 min
"