# Approximation formula

In the collaborative approach, once you have identified similar objects, you need to use them to predict preferences for items. There is a formula that allows you to do this. This formula was difficult for me to understand, so this page focuses on understanding this formula.

In [1]:
import numpy as np

from sklearn.datasets import make_blobs
from IPython.display import HTML

## Task generation

The following cell generates an example that I'll use to show the sense of some transformations.

In [2]:
r_width = 10
r_height = 20

R, groups = make_blobs(
    n_samples=r_height,
    n_features=r_width,
    centers=3,
    random_state=10
)
R = np.round((R-R.min())*10/(R.max()-R.min())).astype(int)
# add bias for each object
bias = np.random.randint(-2,3, [R.shape[0], 1])
R = R + bias

# sometimes bias can lead to ratings
R = np.where(R<0, 0, R)
R = np.where(R>10, 10, R)
R

array([[ 9, 10,  3,  6, 10,  7,  9,  5, 10,  9],
       [10,  2,  8, 10,  7,  5,  4, 10,  4,  5],
       [ 8, 10,  2,  5, 10,  7,  9,  5,  9,  8],
       [ 6,  1,  4,  7,  4,  5,  7,  6,  6,  6],
       [ 4,  7,  0,  4,  6,  5,  6,  2,  7,  5],
       [ 8,  4,  6,  8,  7,  7,  9,  7,  8,  8],
       [ 8, 10,  1,  6,  8,  6,  7,  3,  9,  8],
       [10, 10,  3,  8, 10,  8,  9,  5, 10,  9],
       [ 6,  3,  5,  8,  5,  6,  8,  6,  8,  7],
       [ 7, 10,  1,  5,  8,  6,  8,  4, 10,  7],
       [ 9,  1,  6,  7,  5,  2,  3,  7,  2,  1],
       [ 6,  0,  4,  6,  3,  0,  0,  6,  1,  0],
       [ 8, 10,  2,  6,  9,  7, 10,  4, 10,  8],
       [ 8,  2,  7,  8,  7,  4,  2,  9,  3,  1],
       [ 7,  0,  6,  6,  5,  1,  1,  7,  1,  0],
       [ 6,  1,  5,  7,  5,  5,  6,  5,  6,  5],
       [10,  3,  9, 10,  7,  5,  4, 10,  4,  3],
       [ 9,  2,  8,  8,  7,  4,  3,  8,  3,  3],
       [ 6,  2,  4,  7,  5,  5,  6,  5,  7,  7],
       [ 7,  3,  6,  8,  5,  5,  8,  7,  9,  7]])

So let $k'$ - is object to which we need to recoomend something.

So for the example under consideration we'll use $k' = 5$.

In [3]:
consideration_object = 5

## Collaboration

The collaboration for $k'$ object is a set of objects that we think are similar to it. We measure similarity by the Pearson correlation coefficient.

So we can define similarity as the set of objects that have $c_{ik'} > c'$ or more formally $U_{k'}=\left\{i\in U | c_{k'i} > c' \right\}$. So now we have a hyper-parameter of the algorithm $c'$ that controls how many objects are used to approximate preferences for the object.

The next cell shows a table with the correlation coefficients of the objects in the example with $k'=5$.  

In [32]:
# it's indices of objects excluding
# the object for which we are generating 
# predictions
other_indices = np.concatenate([
    np.arange(0,consideration_object), 
    np.arange(consideration_object+1, R.shape[0])
])
other_R = R[other_indices, :]
correlations = np.corrcoef(
    other_R, R[consideration_object, :]
)[0,1:]

# HTML code for input
# table that will be displayed
# on the left sides
header = (
    "<tr>"
        "<th rowspan=\"2\">object</th>"
        f"<th colspan=\"{R.shape[1]}\" style='text-align:center'>Ranks of the items</th>"
        "<th rowspan=\"2\">corr. coef</th>"
    "</tr>"
    "<tr>"+
        "".join([f"<th>{str(i)}</th>" for i in range(R.shape[1])])+
    "</tr>"
)
content = "".join([
    (
        "<tr>" + 
        f"<td>{obj}</td>" + 
        "".join([f"<td>{val}</td>" for val in R[obj,:]]) +
        f"<td>{str(correlations[i])}</td>" + 
        "</tr>"
    )
    for i, obj in enumerate(other_indices)
])
input_table = "<table>" + header + content + "</table>"
del header, content


collatoratoin_indices = other_indices[correlations > 0.8]
collaboration = R[collatoratoin_indices,:]
# HTML code for table that represents
# collaboration that is on the right side

For our case we choose $c'=0.8$ - so all objects that have a correlation greater than $0.8$ will correspond to the collaboration of the object $k'$. So let's display collaboaration for our case.

In [43]:
content = "".join([
    "<tr>" +
        f"<td>{object_ind}</td>"+
        ''.join(['<td>'+str(v)+'</td>' for v in obj])+
    "</tr>"
     for obj, object_ind in zip(collaboration, collatoratoin_indices)
])
HTML("<table>" + content + "</table>")

0,1,2,3,4,5,6,7,8,9,10
1,10,2,8,10,7,5,4,10,4,5
3,6,1,4,7,4,5,7,6,6,6
4,4,7,0,4,6,5,6,2,7,5
6,8,10,1,6,8,6,7,3,9,8
8,6,3,5,8,5,6,8,6,8,7
11,6,0,4,6,3,0,0,6,1,0


In [14]:
%%HTML
<html>
<head>
	<style>
		.table-container {
			display: flex;
			justify-content: space-around;
		}
		.table-left {
			float: left;
			margin-right: 50px;
		}
		.table-right {
			float: right;
		}
	</style>
</head>
<body>
	<div class="table-container">
		<div class="table-left">
			<table border="1">
				<tr>
					<th>Header 1</th>
					<th>Header 2</th>
				</tr>
				<tr>
					<td>Row 1, Cell 1</td>
					<td>Row 1, Cell 2</td>
				</tr>
				<tr>
					<td>Row 2, Cell 1</td>
					<td>Row 2, Cell 2</td>
				</tr>
			</table>
		</div>
		<div class="table-right">
			<table border="1">
				<tr>
					<th>Header 1</th>
					<th>Header 2</th>
				</tr>
				<tr>
					<td>Row 1, Cell 1</td>
					<td>Row 1, Cell 2</td>
				</tr>
				<tr>
					<td>Row 2, Cell 1</td>
					<td>Row 2, Cell 2</td>
				</tr>
			</table>
		</div>
	</div>
</body>
</html>

Header 1,Header 2
"Row 1, Cell 1","Row 1, Cell 2"
"Row 2, Cell 1","Row 2, Cell 2"

Header 1,Header 2
"Row 1, Cell 1","Row 1, Cell 2"
"Row 2, Cell 1","Row 2, Cell 2"
