## Collaborative Filtering

A collaborative filtering algorithm usually works by searching a large group of people and finding a smaller set whose tastes are similar to yours. It then looks at the other things they like and combines them into a ranked list of suggestions.

In [8]:
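'''
We shall use a self-prepared nested dictionary as a small example dataset.
'''
# Import the module as well, because later cells refer to recommendations.critics
import recommendations
from recommendations import critics
# example run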
# # check_1---------------------------------------------------
# critics['Lisa Rose']['Lady in the Water']

# # check_2---------------------------------------------------
# critics['Toby']['Snakes on a Plane']=4.5
# critics['Toby']
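
The checks above index the dataset first by critic name and then by movie title. A minimal sketch of that shape, using hypothetical names and ratings (these are not the actual contents of `recommendations.critics`):

```python
# Hypothetical data for illustration only; not the real `critics` dataset
toy_critics = {
    'Critic A': {'Movie X': 2.5, 'Movie Y': 4.0},
    'Critic B': {'Movie X': 3.0},
}

# Look up a single rating: outer key is the person, inner key is the item
print(toy_critics['Critic A']['Movie Y'])

# Add (or overwrite) a rating for an existing person
toy_critics['Critic B']['Movie Y'] = 4.5
print(toy_critics['Critic B'])
```

A plain nested dictionary keeps lookups and updates simple for a small in-memory dataset.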

## Finding Similar Users
This can be achieved by comparing each person with every other person and calculating a *similarity score*.
There are several ways to compute this score. The two used here are:
1. Euclidean Distance
2. Pearson Correlation
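
Comparing each person with every other person means visiting every unordered pair exactly once. A small sketch of that enumeration, assuming the seven critic names that appear in the printed output of this notebook; `itertools.combinations` is one way to do it and is not necessarily how the cells in this notebook are written:

```python
from itertools import combinations

# Critic names as they appear in the notebook's printed output
names = ['Jack Matthews', 'Mick LaSalle', 'Claudia Puig', 'Lisa Rose',
         'Toby', 'Gene Seymour', 'Michael Phillips']

# Every unordered pair, with no self-pairs and no duplicates
pairs = list(combinations(names, 2))

print(len(pairs))  # 7 * 6 / 2 = 21
print(pairs[0])
```

With n people there are n(n-1)/2 pairs, so the number of comparisons grows quadratically with the number of users.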

### Euclidean Distance Score
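Written as a formula, the score returned by the `sim_distance` function in the next cell can be sketched as follows. The symbols are notation introduced here for illustration: $S$ is the set of items both people have rated, and $r_{p,i}$ is person $p$'s rating of item $i$.

$$
\mathrm{sim}(p_1, p_2) = \frac{1}{1 + \sum_{i \in S} \left(r_{p_1,i} - r_{p_2,i}\right)^2}
$$

The sum is the squared Euclidean distance between the two people's ratings of shared items. Identical ratings give a score of 1, larger differences push the score toward 0, and when $S$ is empty the function returns 0 directly.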

In [19]:
from math import sqrt

# Returns a similarity score for person1 and person2, based on the
# squared Euclidean distance between their ratings of shared items
def sim_distance(prefs, person1, person2):
    # Collect the items that both people have rated
    shared_items = [item for item in prefs[person1] if item in prefs[person2]]

    # If they have no ratings in common, return 0
    if len(shared_items) == 0:
        return 0

    # Add up the squares of all the differences
    sum_of_squares = sum(pow(prefs[person1][item] - prefs[person2][item], 2)
                         for item in shared_items)

    # No square root is taken here, so the score uses the squared distance.
    # Adding 1 avoids division by zero and caps the score at 1.
    return 1 / (1 + sum_of_squares)

In [24]:
round(sim_distance(recommendations.critics, 'Lisa Rose', 'Gene Seymour'), 6)

0.148148
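
To see how the score behaves at its extremes, here is a self-contained sketch on hypothetical ratings. The function name `squared_distance_score`, the people `A` to `D`, and all the numbers are made up for illustration; the function simply restates the same formula so that the block runs on its own.

```python
def squared_distance_score(prefs, a, b):
    # Items rated by both people
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0
    # Sum of squared rating differences over the shared items
    total = sum((prefs[a][item] - prefs[b][item]) ** 2 for item in shared)
    return 1 / (1 + total)

# Hypothetical ratings, not taken from the critics dataset
toy = {
    'A': {'x': 1.0, 'y': 2.0},
    'B': {'x': 2.0, 'y': 4.0},
    'C': {'x': 1.0, 'y': 2.0},
    'D': {'z': 5.0},
}

print(squared_distance_score(toy, 'A', 'C'))  # identical ratings -> 1.0
print(squared_distance_score(toy, 'A', 'B'))  # 1 / (1 + 1 + 4) = 1/6
print(squared_distance_score(toy, 'A', 'D'))  # no shared items -> 0
```

Read the other way, the 0.148148 above implies a sum of squared differences of about 5.75, since 1 / (1 + 5.75) ≈ 0.148148.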

In [35]:
'''
Let us find the similarity score of each person against the rest.
'''
all_critics = list(recommendations.critics.keys())

# Track the pairs already printed so that each pair is compared only once
done_matches = []
for p1 in all_critics:
    for p2 in all_critics:
        compare = sorted([p1, p2])
        # Compare names by value (!=) rather than by identity (is not)
        if p2 != p1 and compare not in done_matches:
            score = round(sim_distance(recommendations.critics, p1, p2), 6)
            print("Similarity score for {} and {}: {}".format(p1, p2, score))
            done_matches.append(compare)
            

Similarity score for Jack Matthews and Mick LaSalle: 0.137931
Similarity score for Jack Matthews and Claudia Puig: 0.181818
Similarity score for Jack Matthews and Lisa Rose: 0.210526
Similarity score for Jack Matthews and Toby: 0.117647
Similarity score for Jack Matthews and Gene Seymour: 0.8
Similarity score for Jack Matthews and Michael Phillips: 0.181818
Similarity score for Mick LaSalle and Claudia Puig: 0.173913
Similarity score for Mick LaSalle and Lisa Rose: 0.333333
Similarity score for Mick LaSalle and Toby: 0.307692
Similarity score for Mick LaSalle and Gene Seymour: 0.129032
Similarity score for Mick LaSalle and Michael Phillips: 0.285714
Similarity score for Claudia Puig and Lisa Rose: 0.285714
Similarity score for Claudia Puig and Toby: 0.235294
Similarity score for Claudia Puig and Gene Seymour: 0.133333
Similarity score for Claudia Puig and Michael Phillips: 0.571429
Similarity score for Lisa Rose and Toby: 0.222222
Similarity score for Lisa Rose and Gene Seymour: 0.1481