# Approximation formula

In the collaborative approach, once you have identified similar objects, you use them to predict preferences for items. There is a formula for this. I found it difficult to understand, so this page focuses on explaining it.

In [1]:
import numpy as np

from sklearn.datasets import make_blobs
from IPython.display import HTML

## Task generation

The following cell generates an example that I'll use to illustrate the transformations.

In [2]:
r_width = 10
r_height = 20

R, groups = make_blobs(
    n_samples=r_height,
    n_features=r_width,
    centers=3,
    random_state=10
)
R = np.round((R-R.min())*10/(R.max()-R.min())).astype(int)
# add a per-object bias (not seeded, so it changes between runs)
bias = np.random.randint(-2, 3, [R.shape[0], 1])
R = R + bias

# the bias can push ratings outside the 0..10 range,
# so clip them back into it
R = np.clip(R, 0, 10)
R

array([[ 5,  8,  0,  2,  6,  3,  5,  1,  7,  5],
       [ 7,  0,  5,  7,  4,  2,  1,  7,  1,  2],
       [ 8, 10,  2,  5, 10,  7,  9,  5,  9,  8],
       [ 6,  1,  4,  7,  4,  5,  7,  6,  6,  6],
       [ 7, 10,  2,  7,  9,  8,  9,  5, 10,  8],
       [ 4,  0,  2,  4,  3,  3,  5,  3,  4,  4],
       [ 9, 10,  2,  7,  9,  7,  8,  4, 10,  9],
       [ 6,  8,  0,  4,  6,  4,  5,  1,  7,  5],
       [ 5,  2,  4,  7,  4,  5,  7,  5,  7,  6],
       [ 8, 10,  2,  6,  9,  7,  9,  5, 10,  8],
       [ 7,  0,  4,  5,  3,  0,  1,  5,  0,  0],
       [ 7,  0,  5,  7,  4,  1,  1,  7,  2,  0],
       [ 9, 10,  3,  7, 10,  8, 10,  5, 10,  9],
       [ 9,  3,  8,  9,  8,  5,  3, 10,  4,  2],
       [ 7,  0,  6,  6,  5,  1,  1,  7,  1,  0],
       [ 8,  3,  7,  9,  7,  7,  8,  7,  8,  7],
       [ 9,  2,  8,  9,  6,  4,  3,  9,  3,  2],
       [ 7,  0,  6,  6,  5,  2,  1,  6,  1,  1],
       [ 8,  4,  6,  9,  7,  7,  8,  7,  9,  9],
       [ 4,  0,  3,  5,  2,  2,  5,  4,  6,  4]])

Let $k'$ be the object for which we need to recommend something.

In this example we use $k' = 5$.

In [3]:
consideration_object = 5

## Collaboration

The collaboration of object $k'$ is the set of objects that we consider similar to it. We measure similarity with the Pearson correlation coefficient.
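
For reference, the Pearson correlation coefficient between objects $k'$ and $i$ can be written as below. The notation is an assumption chosen to match the symbols in the prediction formula: $r_{ij}$ denotes the rating of object $i$ for item $j$ (an element of $R$), $\overline{r}_i$ is the mean of row $i$ of $R$, and the sums run over all items $j$.

$$c_{k'i}=\frac{\sum_{j}(r_{k'j}-\overline{r}_{k'})(r_{ij}-\overline{r}_i)}{\sqrt{\sum_{j}(r_{k'j}-\overline{r}_{k'})^2}\sqrt{\sum_{j}(r_{ij}-\overline{r}_i)^2}}$$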

So we can define the collaboration as the set of objects with $c_{k'i} > c'$, or more formally $U_{k'}=\left\{i\in U \mid c_{k'i} > c' \right\}$, where $c_{k'i}$ is the correlation coefficient between objects $k'$ and $i$, and $U$ is the set of all objects except $k'$. The threshold $c'$ is a hyperparameter of the algorithm that controls how many objects are used to approximate the preferences of object $k'$.

The next cell shows a table with the correlation coefficients between each object and object $k'=5$ on the left, and its collaboration for $c'=0.8$ on the right.

In [29]:
# indices of all objects except the one
# for which we are generating predictions
other_indices = np.concatenate([
    np.arange(0, consideration_object),
    np.arange(consideration_object + 1, R.shape[0])
])
other_R = R[other_indices, :]
# np.corrcoef stacks the target row below the rows
# of other_R, so the last row of the result holds the
# correlations of the target with every other object
correlations = np.corrcoef(
    other_R, R[consideration_object, :]
)[-1, :-1]

# HTML code for the input table
# that is displayed on the left side
header = (
    "<tr>"
        "<th rowspan=\"2\">object</th>"
        f"<th colspan=\"{R.shape[1]}\" style='text-align:center'>Ratings of the items</th>"
        "<th rowspan=\"2\">corr. coef</th>"
    "</tr>"
    "<tr>"+
        "".join([f"<th>{i}</th>" for i in range(R.shape[1])])+
    "</tr>"
)
content = "".join([
    (
        "<tr>" + 
        f"<td>{obj}</td>" + 
        "".join([f"<td>{val}</td>" for val in R[obj,:]]) +
        f"<td>{correlations[i]:.3f}</td>" + 
        "</tr>"
    )
    for i, obj in enumerate(other_indices)
])
input_table = "<table>" + header + content + "</table>"
del header, content


# HTML code for the table that represents
# the collaboration, displayed on the right side
collaboration_indices = other_indices[correlations > 0.8]
collaboration = R[collaboration_indices,:]
header = (
    "<tr>"
        "<th rowspan=\"2\">object</th>"
        f"<th colspan=\"{R.shape[1]}\" style='text-align:center'>Ratings of the items</th>"
    "</tr>"
    "<tr>"+
        "".join([f"<th>{i}</th>" for i in range(R.shape[1])])+
    "</tr>"
)
content = "".join([
    "<tr>" +
        f"<td>{object_ind}</td>"+
        "".join([f"<td>{v}</td>" for v in obj])+
    "</tr>"
     for obj, object_ind in zip(collaboration, collaboration_indices)
])
collaboration_table = "<table>"+header+content+"</table>"
del header, content

HTML(
    "<div style='display: flex;justify-content: space-around;'>"+
    "<div>" + 
        "<p style='font-size:17px;text-align:center'>Input correlations</p>" + 
        input_table + 
    "</div>" +
    "<div style='font-size:100px'>→</div>" +
    "<div>" + 
        "<p style='font-size:17px;text-align:center'>Collaboration</p>" + 
        collaboration_table + 
    "</div>" +
    "</div>"
)

Input correlations (columns 0 to 9 are the ratings of the items):

| object | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | corr. coef |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 8 | 0 | 2 | 6 | 3 | 5 | 1 | 7 | 5 | -0.103 |
| 1 | 7 | 0 | 5 | 7 | 4 | 2 | 1 | 7 | 1 | 2 | 0.196 |
| 2 | 8 | 10 | 2 | 5 | 10 | 7 | 9 | 5 | 9 | 8 | 0.043 |
| 3 | 6 | 1 | 4 | 7 | 4 | 5 | 7 | 6 | 6 | 6 | 0.946 |
| 4 | 7 | 10 | 2 | 7 | 9 | 8 | 9 | 5 | 10 | 8 | 0.097 |
| 6 | 9 | 10 | 2 | 7 | 9 | 7 | 8 | 4 | 10 | 9 | 0.121 |
| 7 | 6 | 8 | 0 | 4 | 6 | 4 | 5 | 1 | 7 | 5 | -0.038 |
| 8 | 5 | 2 | 4 | 7 | 4 | 5 | 7 | 5 | 7 | 6 | 0.913 |
| 9 | 8 | 10 | 2 | 6 | 9 | 7 | 9 | 5 | 10 | 8 | 0.102 |
| 10 | 7 | 0 | 4 | 5 | 3 | 0 | 1 | 5 | 0 | 0 | 0.151 |
| 11 | 7 | 0 | 5 | 7 | 4 | 1 | 1 | 7 | 2 | 0 | 0.167 |
| 12 | 9 | 10 | 3 | 7 | 10 | 8 | 10 | 5 | 10 | 9 | 0.190 |
| 13 | 9 | 3 | 8 | 9 | 8 | 5 | 3 | 10 | 4 | 2 | 0.021 |
| 14 | 7 | 0 | 6 | 6 | 5 | 1 | 1 | 7 | 1 | 0 | 0.084 |
| 15 | 8 | 3 | 7 | 9 | 7 | 7 | 8 | 7 | 8 | 7 | 0.887 |
| 16 | 9 | 2 | 8 | 9 | 6 | 4 | 3 | 9 | 3 | 2 | 0.105 |
| 17 | 7 | 0 | 6 | 6 | 5 | 2 | 1 | 6 | 1 | 1 | 0.117 |
| 18 | 8 | 4 | 6 | 9 | 7 | 7 | 8 | 7 | 9 | 9 | 0.917 |
| 19 | 4 | 0 | 3 | 5 | 2 | 2 | 5 | 4 | 6 | 4 | 0.848 |

Collaboration:

| object | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 6 | 1 | 4 | 7 | 4 | 5 | 7 | 6 | 6 | 6 |
| 8 | 5 | 2 | 4 | 7 | 4 | 5 | 7 | 5 | 7 | 6 |
| 15 | 8 | 3 | 7 | 9 | 7 | 7 | 8 | 7 | 8 | 7 |
| 18 | 8 | 4 | 6 | 9 | 7 | 7 | 8 | 7 | 9 | 9 |
| 19 | 4 | 0 | 3 | 5 | 2 | 2 | 5 | 4 | 6 | 4 |

So the collaboration in this case is $U_5=\{3,8,15,18,19\}$: the set of indices of the objects that belong to the collaboration of object 5.

## Formula

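Now that we have the collaboration, we can predict the preferences of object 5 for the items.

The estimated preference of object $k'$ for the $j$-th item can be computed as:

$$a_{k', j}=\overline{r}_{k'} + \frac{\sum_{l\in U_{k'}}(r_{lj}-\overline{r}_l)c_{k'l}}{\sum_{l \in U_{k'}}|c_{k'l}|}$$

Here $r_{lj}$ is the rating of object $l$ for item $j$, $\overline{r}_l$ is the mean rating of object $l$, and $c_{k'l}$ is the correlation coefficient between objects $k'$ and $l$.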
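
To make the formula more concrete, here is a minimal sketch of how it could be computed with NumPy. It runs on a tiny made-up matrix rather than on the example above. The function name `approximate_preferences`, the toy ratings, and the fallback to the object's own mean when the collaboration is empty are all assumptions made for illustration.

```python
import numpy as np


def approximate_preferences(R, k, threshold):
    """Estimate the ratings of object k for every item (hypothetical helper)."""
    R = np.asarray(R, dtype=float)
    # Pearson correlations between the rows of R; row k holds c_{k l}
    c = np.corrcoef(R)[k]
    # collaboration U_k: objects other than k with c_{k l} > threshold
    in_collab = c > threshold
    in_collab[k] = False
    means = R.mean(axis=1)
    if not in_collab.any():
        # assumption: with an empty collaboration, fall back to the mean of k
        return np.full(R.shape[1], means[k])
    weights = c[in_collab]
    # deviations r_{lj} - mean(r_l) for every l in U_k
    deviations = R[in_collab] - means[in_collab, None]
    # weighted sum of deviations, normalised by the sum of |c_{k l}|
    return means[k] + weights @ deviations / np.abs(weights).sum()


# toy ratings, made up for this sketch: 4 objects, 4 items
toy_R = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [4, 3, 2, 1],
    [2, 4, 6, 8],
]
prediction = approximate_preferences(toy_R, k=0, threshold=0.8)
print(prediction)
```

With these toy ratings, objects 1 and 3 are perfectly correlated with object 0 and object 2 is perfectly anti-correlated, so only objects 1 and 3 enter the collaboration. Their deviations from their own means are averaged and added to the mean of object 0 (2.5), which gives approximately `[0.25, 1.75, 3.25, 4.75]`.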