# Collaborative Filtering

Given the item ratings of two people as vectors, how similar are the two people? One common measure is Pearson's correlation coefficient:

$$ W_{ij} = \frac {\sum_{k}(R_{ik} - \overline{R_i}) (R_{jk} - \overline{R_j}) } {\sqrt{\sum_{k}(R_{ik} - \overline{R_i})^2} \sqrt{\sum_{k}(R_{jk} - \overline{R_j})^2}} $$


$W_{ij}$ is the similarity between user $i$ and user $j$.

$R_{ik}$ is the rating user $i$ gave to item $k$.

$\overline{R_i}$ is the average of all the ratings of user $i$.
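
As a hand check of the formula, here is a small worked example. The two rating vectors, $(1, 2, 3)$ for user $i$ and $(2, 5, 3)$ for user $j$, are illustrative values chosen for easy arithmetic.

$$ \overline{R_i} = 2, \qquad \overline{R_j} = \tfrac{10}{3} $$

$$ \sum_{k}(R_{ik} - \overline{R_i})(R_{jk} - \overline{R_j}) = (-1)\left(-\tfrac{4}{3}\right) + (0)\left(\tfrac{5}{3}\right) + (1)\left(-\tfrac{1}{3}\right) = 1 $$

$$ \sum_{k}(R_{ik} - \overline{R_i})^2 = 2, \qquad \sum_{k}(R_{jk} - \overline{R_j})^2 = \tfrac{16}{9} + \tfrac{25}{9} + \tfrac{1}{9} = \tfrac{14}{3} $$

$$ W_{ij} = \frac{1}{\sqrt{2}\,\sqrt{\tfrac{14}{3}}} = \sqrt{\tfrac{3}{28}} \approx 0.327 $$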

In [14]:
import math

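def pearsons(r, i, j, num_items):
    """ Computes Pearson's correlation coefficient between users i and j.
        The coefficient is a quantity between -1 and 1, where 1 means the two
        users' ratings rise and fall together and -1 means they move in
        exactly opposite directions.

        parameters
        r         -- list of per-user rating lists, where r[i][k] is the
                     rating user i gave to item k
        i         -- index of the first user
        j         -- index of the second user
        num_items -- number of items to compare
    """

    # Each user's mean over all of their ratings
    avg_r_i = sum(r[i]) / len(r[i])
    avg_r_j = sum(r[j]) / len(r[j])

    # Deviations of each rating from the user's own mean
    i_ratings = [(r[i][k] - avg_r_i) for k in range(num_items)]
    j_ratings = [(r[j][k] - avg_r_j) for k in range(num_items)]

    # Numerator: sum of the products of the deviations
    sim = sum(a * b for a, b in zip(i_ratings, j_ratings))

    # Denominator: product of the root sums of squared deviations
    i_sqrd = (a * a for a in i_ratings)
    j_sqrd = (b * b for b in j_ratings)
    denom = math.sqrt(sum(i_sqrd)) * math.sqrt(sum(j_sqrd))

    # The coefficient is undefined when a user gave every item the same
    # rating (zero denominator). Returning 0.0 is a convention chosen here,
    # not part of the formula.
    if denom == 0:
        return 0.0

    return sim / denom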

In [5]:
bob_ratings = [1,2,3]
alice_ratings = [2,5,3]
pearsons([bob_ratings, alice_ratings], 0, 1, 3)

0.3273268353539885

In [6]:
pearsons([bob_ratings, bob_ratings], 0, 1, 3)

0.9999999999999998

In [10]:
anti_alice = [8, 5, 7]
pearsons([alice_ratings, anti_alice], 0, 1, 3)

-0.9999999999999998

In [7]:
anti_bob = [10, 8, 7]
pearsons([bob_ratings, anti_bob], 0, 1, 3)

-0.9819805060619655
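
Because the formula subtracts each user's own mean and divides by the spread of their ratings, the coefficient compares rating patterns rather than rating levels. The sketch below illustrates this with a hypothetical standalone helper, `pearson_pair` (not the notebook's `pearsons`), and made-up rating lists: a user who rates everything two points higher, or on a scale twice as wide, still comes out as fully similar.

```python
import math

def pearson_pair(u, v):
    """Pearson's coefficient for two equal-length rating lists (hypothetical helper)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [a - mean_u for a in u]  # deviations from u's mean
    dv = [b - mean_v for b in v]  # deviations from v's mean
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    return num / den

base = [1, 2, 3]
generous = [x + 2 for x in base]   # same tastes, every rating 2 points higher
stretched = [x * 2 for x in base]  # same tastes, wider rating scale

shifted_sim = pearson_pair(base, generous)
scaled_sim = pearson_pair(base, stretched)
print(shifted_sim, scaled_sim)  # both equal 1 up to floating-point rounding
```

This is why the measure suits ratings data, where some users are habitually generous and others habitually harsh.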

### Predicting ratings

Given the ratings of *item k* by the top-N users most similar to *user i*, can we predict how *user i* would rate *item k*?

$$ R_{ik} = \overline{R_i} + \alpha \sum_{X_j\in{N_i}}{W_{ij}(R_{jk} - \overline{R_j})} $$ 
   

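$$ \alpha = \frac{1}{\sum_{X_j\in{N_i}}{ |W_{ij}| }} $$

Here $N_i$ is the set of top-N users most similar to user $i$, $X_j$ is a user in that set, and $\alpha$ normalizes the weights so that their absolute values sum to 1.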
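
The prediction formula can be sketched in code as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `predict`, `pearson_pair`, and `mean` are hypothetical helpers, the ratings are made up, the neighbourhood $N_i$ is taken to be the `top_n` users ranked by similarity, and similarities are computed only over the items other than the one being predicted.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def pearson_pair(u, v):
    """Pearson's coefficient for two equal-length rating lists (hypothetical helper)."""
    mean_u, mean_v = mean(u), mean(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    # Assumption: treat the undefined zero-variance case as "no similarity".
    return num / den if den else 0.0

def predict(ratings, i, k, top_n=2):
    """Predict user i's rating of item k; ratings[i][k] is never read."""
    # Assumption: similarity and user i's mean use every item except k.
    other_items = [c for c in range(len(ratings[i])) if c != k]
    r_i = [ratings[i][c] for c in other_items]
    mean_i = mean(r_i)

    # W_ij for every other user j
    weights = []
    for j in range(len(ratings)):
        if j != i:
            r_j = [ratings[j][c] for c in other_items]
            weights.append((pearson_pair(r_i, r_j), j))

    # Assumption: N_i is the top_n users ranked by similarity to user i.
    neighbours = sorted(weights, reverse=True)[:top_n]

    norm = sum(abs(w) for w, _ in neighbours)  # this is 1 / alpha
    if norm == 0:
        return mean_i  # no informative neighbours: fall back to user i's mean

    # Weighted sum of each neighbour's deviation from their own mean
    offset = sum(w * (ratings[j][k] - mean(ratings[j])) for w, j in neighbours)
    return mean_i + offset / norm

ratings = [
    [1, 2, 3, None],  # user 0: rating of item 3 is unknown
    [2, 3, 4, 5],     # user 1: same pattern as user 0, rated item 3 highly
    [3, 2, 1, 0],     # user 2: opposite pattern, rated item 3 poorly
]
prediction = predict(ratings, 0, 3)
print(prediction)  # about 3.5
```

Note how the negatively correlated user contributes: user 2 rated item 3 below their own average, and because $W_{ij}$ is negative that pushes the prediction for user 0 upward, in the same direction as user 1's above-average rating.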