# Approximation formula

In the collaborative approach, once you have identified objects similar to the target, you use them to predict the target's preferences for items. There is a formula for this. I found it difficult to understand, so this page focuses on explaining it.

In [4]:
import numpy as np

from sklearn.datasets import make_blobs
from IPython.display import HTML

## Task generation

The following cell generates an example that I'll use to illustrate the transformations.

In [8]:
r_width = 10
r_height = 20

R, c = make_blobs(
    n_samples=r_height,
    n_features=r_width,
    centers=3,
    random_state=10
)
R = np.round((R-R.min())*10/(R.max()-R.min())).astype(int)
# add bias for each object
bias = np.random.randint(-2,3, [R.shape[0], 1])
R = R + bias

# the bias can push ratings outside the 0..10 range, so clip them back
R = np.where(R<0, 0, R)
R = np.where(R>10, 10, R)
R

array([[ 5,  8,  0,  2,  6,  3,  5,  1,  7,  5],
       [ 6,  0,  4,  6,  3,  1,  0,  6,  0,  1],
       [ 7, 10,  1,  4,  9,  6,  8,  4,  8,  7],
       [ 4,  0,  2,  5,  2,  3,  5,  4,  4,  4],
       [ 7, 10,  2,  7,  9,  8,  9,  5, 10,  8],
       [ 4,  0,  2,  4,  3,  3,  5,  3,  4,  4],
       [ 7,  9,  0,  5,  7,  5,  6,  2,  8,  7],
       [10, 10,  3,  8, 10,  8,  9,  5, 10,  9],
       [ 4,  1,  3,  6,  3,  4,  6,  4,  6,  5],
       [ 7, 10,  1,  5,  8,  6,  8,  4, 10,  7],
       [10,  3,  8,  9,  7,  4,  5,  9,  4,  3],
       [ 9,  2,  7,  9,  6,  3,  3,  9,  4,  2],
       [ 8, 10,  2,  6,  9,  7, 10,  4, 10,  8],
       [ 8,  2,  7,  8,  7,  4,  2,  9,  3,  1],
       [ 6,  0,  5,  5,  4,  0,  0,  6,  0,  0],
       [ 6,  1,  5,  7,  5,  5,  6,  5,  6,  5],
       [ 9,  2,  8,  9,  6,  4,  3,  9,  3,  2],
       [ 9,  2,  8,  8,  7,  4,  3,  8,  3,  3],
       [ 8,  4,  6,  9,  7,  7,  8,  7,  9,  9],
       [ 6,  2,  5,  7,  4,  4,  7,  6,  8,  6]])

Let $k'$ be the object for which we need to recommend something.

In the example under consideration we'll use $k' = 5$.

In [29]:
consideration_object = 5

## Collaboration

The collaboration for object $k'$ is the set of objects that we consider similar to it. We measure similarity with the Pearson correlation coefficient.
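For reference, the Pearson correlation between objects $k'$ and $i$ can be written as below. The notation is an assumption introduced here, not taken from the original text: $r_{ij}$ is the rating in row $i$ and column $j$ of $R$, $\bar{r}_i$ is the mean of row $i$, and the sums run over all items $j$.

$$
c_{k'i} = \frac{\sum_{j}\left(r_{k'j}-\bar{r}_{k'}\right)\left(r_{ij}-\bar{r}_{i}\right)}{\sqrt{\sum_{j}\left(r_{k'j}-\bar{r}_{k'}\right)^2}\,\sqrt{\sum_{j}\left(r_{ij}-\bar{r}_{i}\right)^2}}
$$

Each row is centred on its own mean, so adding a constant bias to all ratings of an object leaves the coefficient unchanged. In the generated example only the clipping to the 0..10 range can alter the similarities, not the bias itself.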

So we can define the collaboration as the set of objects with $c_{k'i} > c'$, or more formally $U_{k'}=\left\{i\in U \mid c_{k'i} > c' \right\}$, where $c_{k'i}$ is the correlation coefficient between objects $k'$ and $i$. The threshold $c'$ is a hyper-parameter of the algorithm that controls how many objects are used to approximate the preferences of $k'$.
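The selection rule above can be sketched in a few lines. This is a minimal illustration rather than the notebook's own code: the four rows are copied from the example matrix `R` (objects 0, 3, 5 and 8), while the names `ratings`, `object_ids`, `target_pos`, `threshold` and the value $c' = 0.5$ are assumptions chosen for the demonstration. The target object is excluded from its own collaboration, since its correlation with itself is always 1.

```python
import numpy as np

# Four rows copied from the example matrix R (objects 0, 3, 5 and 8).
ratings = np.array([
    [5, 8, 0, 2, 6, 3, 5, 1, 7, 5],  # object 0
    [4, 0, 2, 5, 2, 3, 5, 4, 4, 4],  # object 3
    [4, 0, 2, 4, 3, 3, 5, 3, 4, 4],  # object 5, the target k'
    [4, 1, 3, 6, 3, 4, 6, 4, 6, 5],  # object 8
])
object_ids = np.array([0, 3, 5, 8])
target_pos = 2    # position of k' = 5 inside `ratings`
threshold = 0.5   # hypothetical value of c'

# np.corrcoef treats each row as a variable, so entry [a, b] is the
# Pearson correlation between the ratings of object a and object b.
corr_to_target = np.corrcoef(ratings)[target_pos]

# U_{k'}: objects whose correlation with k' exceeds c', excluding k' itself.
is_neighbour = corr_to_target > threshold
is_neighbour[target_pos] = False
neighbours = object_ids[is_neighbour]

print(neighbours)
```

With these rows the script prints `[3 8]`: objects 3 and 8 correlate strongly with object 5, while object 0 does not. Raising `threshold` shrinks the collaboration, and lowering it admits less similar objects.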

The next cell shows a table with the correlation coefficients between each object in the example and $k'=5$.

In [94]:
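# every object except the one under consideration
objects_values = np.concatenate([
    np.arange(0,consideration_object), 
    np.arange(consideration_object+1, R.shape[0])
])
# Pearson correlation of each remaining object with the considered one;
# the cell that defined these values is not shown, so they are recomputed
# here (this overwrites the cluster labels returned by make_blobs)
c = np.corrcoef(R)[consideration_object, objects_values]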

header = (
    "<tr>" + 
        "<th>object</th>" + 
        "".join([f"<th>{str(i)}</th>" for i in range(R.shape[1])]) +
        "<th>corr. coef</th>"
    "</tr>"
)

content = "".join([
    (
        "<tr>" + 
        f"<td>{obj}</td>" + 
        "".join([f"<td>{val}</td>" for val in R[obj,:]]) +
        f"<td>{str(c[i])}</td>" + 
        "</tr>"
    )
    for i, obj in enumerate(objects_values)
])
HTML("<table>" + header + content + "</table>")

object,0,1,2,3,4,5,6,7,8,9,corr. coef
0,5,8,0,2,6,3,5,1,7,5,-0.1032600252843457
1,6,0,4,6,3,1,0,6,0,1,0.1088931012960941
2,7,10,1,4,9,6,8,4,8,7,-0.0526540924351022
3,4,0,2,5,2,3,5,4,4,4,0.9329650014620616
4,7,10,2,7,9,8,9,5,10,8,0.0968649524639667
6,7,9,0,5,7,5,6,2,8,7,0.0230571487955358
7,10,10,3,8,10,8,9,5,10,9,0.1858261556206646
8,4,1,3,6,3,4,6,4,6,5,0.9126423311754246
9,7,10,1,5,8,6,8,4,10,7,0.0230571487955358
10,10,3,8,9,7,4,5,9,4,3,0.1942386638560546
