<a href="https://colab.research.google.com/github/raulbenitez/postgrau_IML_exploratory/blob/master/RECOMENDADORES/Recomendadores_distancias.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Distances between users

We use the same example as the recommender systems chapter of the course materials (-1 indicates that the user has not rated the item):

In [2]:
user_1 = [-1, 3, 1, -1, 4, 3, 5, -1]
user_2 = [4, 1, 3, -1, 5, -1, -1, 2]
user_3 = [2, 1, -1, 5, -1, -1, -1, 1]
user_4 = [3, -1, 2, -1, -1, 5, -1, 4]

## Pearson correlation

Note that the correlation must be computed only over the items that both users have rated: any position where either of the two users has a -1 is ignored.

In [3]:
user_1_common_with_2 = [3, 1, 4]
user_2_common_with_1 = [1, 3, 5]
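The common ratings above are written out by hand. As a minimal sketch, assuming -1 always marks a missing rating, they could also be extracted programmatically. The helper name `common_ratings` and its `missing` parameter are hypothetical and not part of the original notebook.

```python
def common_ratings(a, b, missing=-1):
    """Return the ratings of a and b restricted to the items both users rated."""
    # Keep only the positions where neither user has the missing marker
    pairs = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not pairs:
        # No item in common: zip(*pairs) would fail, so return empty lists
        return [], []
    xs, ys = zip(*pairs)
    return list(xs), list(ys)


user_1 = [-1, 3, 1, -1, 4, 3, 5, -1]
user_2 = [4, 1, 3, -1, 5, -1, -1, 2]

print(common_ratings(user_1, user_2))  # ([3, 1, 4], [1, 3, 5])
```

The result matches the hand-written `user_1_common_with_2` and `user_2_common_with_1` lists.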


We will use the `pearsonr` function from `scipy.stats`, which computes the Pearson correlation coefficient between two vectors: <br>
From: https://www.geeksforgeeks.org/python-pearson-correlation-test-between-two-variables/
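For reference, the standard definition of the Pearson correlation coefficient between two vectors $x$ and $y$ of length $n$ is given here (the symbols are our own notation, not taken from the course materials):

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

For $x = (3, 1, 4)$ and $y = (1, 3, 5)$ we have $\bar{x} = 8/3$ and $\bar{y} = 3$, so the numerator is $2$ and the denominator is $\sqrt{(14/3) \cdot 8} \approx 6.11$, giving $r_{xy} \approx 0.33$.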

In [4]:
from scipy.stats import pearsonr 

corr, _ = pearsonr(user_1_common_with_2, user_2_common_with_1) 
print('Pearsons correlation: %.2f' % corr) 



Pearsons correlation: 0.33


The Pearson correlation is symmetric, so swapping the two arguments gives the same value:


In [5]:
from scipy.stats import pearsonr 

corr, _ = pearsonr(user_2_common_with_1, user_1_common_with_2) 
print('Pearsons correlation: %.2f' % corr) 


Pearsons correlation: 0.33


Users 1 and 3 have only one rating in common:

In [6]:
user_1_common_with_3 = [3]
user_3_common_with_1 = [1]

In [7]:
corr, _ = pearsonr(user_1_common_with_3, user_3_common_with_1) 
print('Pearsons correlation: %.3f' % corr) 

ValueError: ignored

Because they have only one rating in common, the Pearson correlation cannot be computed: `pearsonr` needs at least two points, since a single point does not define a fit line.
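One way to handle this case is to treat the similarity as undefined instead of letting the error propagate. The following is a minimal sketch in plain Python (it does not use `pearsonr`, so it runs without SciPy). The function name `safe_pearson` and the choice of returning `None` are assumptions made for illustration.

```python
import math


def safe_pearson(x, y):
    """Pearson correlation of x and y, or None when it is undefined."""
    if len(x) != len(y) or len(x) < 2:
        return None  # fewer than two common ratings: no line can be fitted
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # one user gave the same rating every time
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / math.sqrt(var_x * var_y)


print(safe_pearson([3], [1]))        # None
print(safe_pearson([1, 3], [2, 5]))  # 1.0
```

A caller can then skip any neighbour whose similarity is `None` when ranking users.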

In [8]:
user_1_common_with_4 = [1, 3]
user_4_common_with_1 = [2, 5]

In [9]:
corr, _ = pearsonr(user_1_common_with_4, user_4_common_with_1) 
print('Pearsons correlation: %.3f' % corr) 

Pearsons correlation: 1.000


Users 1 and 4 have a correlation of 1.0; this is because, regardless of scale factors, their tendency to rate films is the same.

On the other hand, if we take users 3 and 4:

In [10]:
user_3_common_with_4 = [2, 1]
user_4_common_with_3 = [3, 4]

In [11]:
corr, _ = pearsonr(user_3_common_with_4, user_4_common_with_3) 
print('Pearsons correlation: %.3f' % corr) 

Pearsons correlation: -1.000


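Users 3 and 4 rate in opposite directions, so their coefficient is -1.0. Note, however, that with only two common ratings the coefficient is always exactly 1 or -1 (or undefined when one user gave the same rating twice), so more common ratings are needed for a reliable measurement.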

### Prediction

Once we have the Pearson similarities, we can use the most similar users to predict ratings for items a user has not rated yet.

We can use the formula from the book:

![image.png](attachment:image.png)
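In case the image does not render, the computation carried out in cell `In [16]` corresponds to the following mean-centered weighted average. This is reconstructed from the notebook's code, so the notation is our own and the book's formula may differ in detail (for example, some formulations use $|\mathrm{sim}(u, v)|$ in the denominator):

$$
\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N} \mathrm{sim}(u, v)\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in N} \mathrm{sim}(u, v)}
$$

Here $\hat{r}_{u,i}$ is the predicted rating of user $u$ for item $i$, $\bar{r}_u$ is the mean of the ratings given by user $u$, $N$ is the set of most similar users who rated item $i$, and $r_{v,i}$ is the rating of user $v$ for item $i$.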

For example, to predict user 1's rating for the last item in the list (item 8):

First we compute the mean of user 1's ratings (only over the items the user has rated):

In [12]:
# user_1 = [-1, 3, 1, -1, 4, 3, 5, -1]
user_1_valorations = [3, 1, 4, 3, 5]
user_1_mean = sum(user_1_valorations) / len(user_1_valorations) 
print(user_1_mean)


3.2


We take the two users most similar to user 1: <br>
user 2 (Pearson correlation = 0.33) <br>
user 4 (Pearson correlation = 1.00)

In [13]:
sim_1_to_2 = 0.33
sim_1_to_4 = 1.0

We also need the mean ratings of users 2 and 4:

In [14]:
#user_2 = [4, 1, 3, -1, 5, -1, -1, 2]
user_2_valorations = [4, 1, 3, 5, 2]
user_2_mean = sum(user_2_valorations) / len(user_2_valorations) 

#user_4 = [3, -1, 2, -1, -1, 5, -1, 4]
user_4_valorations = [3, 2, 5, 4]
user_4_mean = sum(user_4_valorations) / len(user_4_valorations) 

And their ratings for the target item (item 8):

In [15]:
user_2_val_8 = 2
user_4_val_8 = 4

In [16]:
pred_user1_item8 = user_1_mean + (sim_1_to_2*(user_2_val_8-user_2_mean)+sim_1_to_4*(user_4_val_8-user_4_mean))/(sim_1_to_2+sim_1_to_4)
print('Prediction item 8 for user 1: %.2f' % pred_user1_item8) 

Prediction item 8 for user 1: 3.33
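The step-by-step computation above can be wrapped in a small function. This is a sketch under stated assumptions: the names `mean_rating` and `predict` are hypothetical, -1 marks a missing rating, the similarities are passed in already computed, and the denominator is the plain sum of similarities, as in the cell above.

```python
def mean_rating(ratings, missing=-1):
    """Mean of the ratings a user actually gave (missing entries are skipped)."""
    rated = [r for r in ratings if r != missing]
    return sum(rated) / len(rated)


def predict(target, neighbours, item, missing=-1):
    """Predict the target user's rating for `item` (0-based index).

    neighbours is a list of (similarity, ratings) pairs.
    """
    num = 0.0
    den = 0.0
    for sim, ratings in neighbours:
        if ratings[item] == missing:
            continue  # this neighbour did not rate the item
        num += sim * (ratings[item] - mean_rating(ratings, missing))
        den += sim
    if den == 0:
        # No usable neighbour: fall back to the user's own mean
        return mean_rating(target, missing)
    return mean_rating(target, missing) + num / den


user_1 = [-1, 3, 1, -1, 4, 3, 5, -1]
user_2 = [4, 1, 3, -1, 5, -1, -1, 2]
user_4 = [3, -1, 2, -1, -1, 5, -1, 4]

# Item 8 is at index 7; similarities are the Pearson values computed earlier
pred = predict(user_1, [(0.33, user_2), (1.0, user_4)], item=7)
print('Prediction item 8 for user 1: %.2f' % pred)  # Prediction item 8 for user 1: 3.33
```

Summing the raw similarities in the denominator can behave badly when negative similarities are included, which is one reason to restrict the neighbours to positively correlated users, as done here.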


## Euclidean similarity


In [18]:
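from scipy.spatial.distance import euclidean

# Euclidean distance between the ratings that users 1 and 2 have in common
dist = euclidean(user_1_common_with_2, user_2_common_with_1)
print('distance {}'.format(dist))

# Map the distance to a similarity in (0, 1]: identical vectors give 1
simil = 1/(1+dist)
print('similarity {}'.format(simil))


distance 3.0
similarity 0.25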
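The same computation can be written without SciPy. The sketch below uses `math.dist` from the standard library (Python 3.8+) in place of `scipy.spatial.distance.euclidean`. The function name `euclidean_similarity` is hypothetical.

```python
import math


def euclidean_similarity(x, y):
    """Similarity in (0, 1] derived from the Euclidean distance between x and y."""
    dist = math.dist(x, y)  # sqrt of the sum of squared differences
    return 1 / (1 + dist)


# Common ratings of users 1 and 2: the differences are (2, -2, -1), so dist = 3
print(euclidean_similarity([3, 1, 4], [1, 3, 5]))  # 0.25
# Identical rating vectors are at distance 0, so the similarity is 1
print(euclidean_similarity([3, 1, 4], [3, 1, 4]))  # 1.0
```

Unlike the Pearson correlation, this measure is sensitive to scale: two users with the same rating tendency but different rating levels get a similarity below 1.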
