# Making Recommendations

**Author : Soutik Chakraborty**

This is code from Chapter 2: Making Recommendations. 

The code for the exercise is in another Jupyter notebook named <b>Making Recommendations - Excercise</b>.

## Critics Data

Below is a nested dictionary that holds each critic's scores, out of 5, for a variety of movies. You can add your own ratings to the dictionary if needed.

In [1]:
# A dictionary of movie critics and their ratings of a small
# set of movies
critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
 'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
 'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
 'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
 'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
 'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
 'The Night Listener': 4.5, 'Superman Returns': 4.0,
 'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
 'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
 'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
 'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}
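
As a rough sketch of how the nested dictionary is read and extended, the snippet below works on a two-critic excerpt of the data. The variable name `sample` and the critic `'New Critic'` with its ratings are made up for illustration and are not part of the original data.

```python
# Two-critic excerpt of the critics data (same structure: critic -> movie -> score)
sample = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0},
}

# Look up one critic's score for one movie
print(sample['Lisa Rose']['Snakes on a Plane'])  # 3.5

# Add a new critic (hypothetical name and ratings, for illustration only)
sample['New Critic'] = {'Snakes on a Plane': 4.0}

# Add or update a single rating for an existing critic
sample['New Critic']['Superman Returns'] = 3.0

# Movies rated by both Lisa Rose and Toby
shared = [movie for movie in sample['Lisa Rose'] if movie in sample['Toby']]
print(shared)  # ['Snakes on a Plane']
```

The similarity functions in the following sections only compare movies that appear in both critics' inner dictionaries, which is what the last two lines compute.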

## Euclidean Distance

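This function takes the critics dict and the names of two critics as inputs and returns a similarity score based on the Euclidean distance between them. The distance is computed over the movies that both critics have rated, then converted to a score of 1/(1 + distance), so identical ratings give 1 and the score approaches 0 as the ratings diverge. If the two critics have no movies in common, the function returns 0.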

In [2]:
from math import sqrt
from pprint import pprint

# Returns a distance-based similarity score for person1 and person2
def sim_distance(prefs, person1, person2):
    # Get the list of shared_items
    si={}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1

    # if they have no ratings in common, return 0
    if len(si) == 0: 
        return 0

    # Add up the squares of all the differences
    sum_of_squares = 0
    for item in si:
        sum_of_squares = sum_of_squares + pow(prefs[person1][item] - prefs[person2][item], 2)

    return 1/(1+sqrt(sum_of_squares))

In [3]:
print(sim_distance(critics, 'Lisa Rose', 'Gene Seymour'))

0.294298055086
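
The result above can be checked by hand. The sketch below redoes the calculation for Lisa Rose and Gene Seymour using their six shared ratings copied from the critics data. The list names `lisa` and `gene` exist only for this illustration.

```python
from math import sqrt

# Ratings copied from the critics data, in the same movie order for both critics:
# Lady in the Water, Snakes on a Plane, Just My Luck,
# Superman Returns, You, Me and Dupree, The Night Listener
lisa = [2.5, 3.5, 3.0, 3.5, 2.5, 3.0]
gene = [3.0, 3.5, 1.5, 5.0, 3.5, 3.0]

# Squared differences: 0.25 + 0 + 2.25 + 2.25 + 1.0 + 0 = 5.75
sum_of_squares = sum((a - b) ** 2 for a, b in zip(lisa, gene))

distance = sqrt(sum_of_squares)   # about 2.398
similarity = 1 / (1 + distance)   # about 0.294, matching sim_distance
print(round(similarity, 4))       # 0.2943
```

Adding 1 to the distance before inverting avoids a division by zero when two critics agree on every shared movie, in which case the score is exactly 1.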


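## Pearson's Correlation

This function takes the critics dict and the names of two critics as inputs and returns the Pearson correlation coefficient between them. The coefficient is computed over the movies that both critics have rated and measures how well the two sets of ratings fit a straight line. It ranges from -1 to 1, and the function returns 0 when the critics share no movies or when either critic's shared ratings have no variation.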

In [4]:
def sim_pearson(prefs, person1, person2):
    # Get the similar items
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1
    
    # Find the number of similar items
    n = len(si)
    
    # Return 0 if no common items
    if n == 0:
        return 0
    
    # Add up all the preferences
    sum1 = sum([prefs[person1][item] for item in si])
    sum2 = sum([prefs[person2][item] for item in si])
    
    # Sum up the squares
    sum1sq = sum([pow(prefs[person1][item], 2) for item in si])
    sum2sq = sum([pow(prefs[person2][item], 2) for item in si])
    
    # Sum up the products
    pSum = sum([prefs[person1][item] * prefs[person2][item] for item in si])
    
    # Calculate Pearson's coef
    num = pSum - (sum1 * sum2)/n
    den = sqrt((sum1sq - pow(sum1, 2)/n) * (sum2sq - pow(sum2, 2)/n))
    if den == 0:
        return 0
    else:
        return num/den

In [5]:
print(sim_pearson(critics, 'Lisa Rose', 'Gene Seymour'))

0.396059017191
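
The same pair of critics can be used to cross-check this value. The sketch below uses the mean-centered form of the coefficient, which is algebraically equivalent to the sums-based formula in `sim_pearson`. The helper name `pearson` and the list names are hypothetical and chosen only for this illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (mean-centered form)."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # Numerator: how the two lists vary together around their means
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the two spreads
    den = sqrt(sum((x - mean_x) ** 2 for x in xs) *
               sum((y - mean_y) ** 2 for y in ys))
    return num / den if den != 0 else 0

# Shared ratings copied from the critics data, same movie order for both
lisa = [2.5, 3.5, 3.0, 3.5, 2.5, 3.0]
gene = [3.0, 3.5, 1.5, 5.0, 3.5, 3.0]
print(round(pearson(lisa, gene), 4))   # 0.3961

# Shifting all of Gene's ratings by a constant leaves the coefficient unchanged
shifted = [g + 1.0 for g in gene]
print(abs(pearson(lisa, shifted) - pearson(lisa, gene)) < 1e-9)   # True
```

The second check illustrates a property the distance-based score lacks: a critic who consistently rates one point higher than another still correlates with them just as strongly.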


## Similarity

This section finds the critics whose tastes most closely match those of a specified critic. It uses the similarity functions defined above. Any other similarity function with the same signature can be passed in through the `similarity` parameter.

In [6]:
# Returns the best matches for person from the prefs dictionary.
# Number of results and similarity function are optional params.
def topMatches(prefs, person, n=5, similarity=sim_pearson):
    scores=[(similarity(prefs, person, other), other) for other in prefs if other != person]

    # Sort the list so the highest scores appear at the top
    scores.sort(reverse=True)
    return scores[0:n]

In [7]:
pprint(topMatches(prefs = critics, person = 'Toby', n=3))

[(0.9912407071619299, 'Lisa Rose'),
 (0.9244734516419049, 'Mick LaSalle'),
 (0.8934051474415647, 'Claudia Puig')]
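
To illustrate how a different similarity function can be plugged in, here is a minimal sketch with a deliberately simple metric. The names `shared_count`, `top_matches`, and `toy` are hypothetical, and counting shared items is only a stand-in for a real similarity measure.

```python
def shared_count(prefs, person1, person2):
    # Hypothetical similarity: the number of items both people have rated
    return len(set(prefs[person1]) & set(prefs[person2]))

def top_matches(prefs, person, n=5, similarity=shared_count):
    # Score everyone except the person themselves, highest score first
    scores = [(similarity(prefs, person, other), other)
              for other in prefs if other != person]
    return sorted(scores, reverse=True)[:n]

# Made-up data with the same critic -> item -> score structure
toy = {
    'A': {'x': 1.0, 'y': 2.0, 'z': 3.0},
    'B': {'x': 2.0, 'y': 1.0},
    'C': {'z': 5.0},
    'D': {'w': 4.0},
}

print(top_matches(toy, 'A', n=2))   # [(2, 'B'), (1, 'C')]
```

Because the ranking function only relies on the `(prefs, person1, person2)` signature and a score where higher means more similar, `sim_distance` can be passed to `topMatches` in exactly the same way.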


## Get Recommendations

In this section we write a function that recommends movies to a person. Each movie the person has not rated is scored with a weighted average of the other critics' ratings, where each critic's weight is their similarity to that person.


In [23]:
def get_recommendations(perfs, person, similarity = sim_pearson):
    totals = {}
    simSums = {}
    
    for other in perfs:
        # Get similarity between "person" and every other critic; skip the person themselves
        if other != person:
            sim = similarity(perfs, person, other)
        else:
            continue
        
        # Ignore similarity scores of <= 0
        if sim > 0:
            for item in perfs[other]:
                # Score only movies that "person" hasn't watched
                if item not in perfs[person] or perfs[person][item] == 0:
                    # Similarity * Scores
                    totals.setdefault(item, 0)
                    totals[item] += perfs[other][item] * sim
                    # Get sum of simScores
                    simSums.setdefault(item, 0)
                    simSums[item] += sim
                    
    # Create the normalized list
    rankings = [(total/simSums[item], item) for item, total in totals.items()]

    # Return the list sorted from highest to lowest score
    rankings.sort(reverse=True)
    return rankings

In [24]:
get_recommendations(critics, "Toby")

[(3.3477895267131013, 'The Night Listener'),
 (2.8325499182641614, 'Lady in the Water'),
 (2.5309807037655645, 'Just My Luck')]
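
To see why the totals are divided by the sum of the similarities, consider a small sketch with made-up numbers. The similarities and ratings below are hypothetical and are not taken from the critics data.

```python
# Each pair is (similarity to the target person, that critic's rating of one movie)
# Hypothetical values for illustration only
votes = [(0.9, 4.0), (0.5, 2.0)]

total = sum(sim * rating for sim, rating in votes)   # 0.9*4.0 + 0.5*2.0 = 4.6
sim_sum = sum(sim for sim, _ in votes)               # 0.9 + 0.5 = 1.4

predicted = total / sim_sum
print(round(predicted, 2))   # 3.29
```

Without the division, a movie reviewed by many critics would outrank one reviewed by few simply because more terms are added to its total. With positive weights, the predicted score stays between the lowest and highest contributing ratings and leans toward the rating given by the more similar critic.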