# Personalized Techniques
- personalized: based on the user's previous ratings
- collaborative filtering (CF): based on other users' ratings

## Main CF Techniques
1. memory based = find "similar" users/items, use them for prediction
   1.1 nearest neighbors (user, item)

### 1.1.1 User-based Nearest Neighbor CF
1. item _i_ not rated by Alice
2. find "similar" users to Alice who have rated _i_
   - How do we measure similarity?
     - Pearson correlation
     - Spearman correlation
     - Cosine similarity
     - Adjusted cosine similarity
     - => there is no fundamental reason to prefer one metric; the choice is mostly based on practical experience and may depend on the application
3. average their ratings of _i_ to predict Alice's rating
   - How many neighbors should we consider?
     - neighbors N = k most similar users
   - How do we generate a prediction from the neighbors' ratings?
     - average of neighbors' ratings
     - improved versions
4. recommend items with the highest predicted rating

In [72]:
import pandas as pd

sample_df = pd.DataFrame(
    {
        'Item1': [5, 3, 4, 3, 1],
        'Item2': [3, 1, 3, 3, 5],
        'Item3': [4, 2, 4, 1, 5],
        'Item4': [4, 3, 3, 5, 2],
        'Item5': [None, 3, 5, 4, 1],
    }, index=['Alice', 'User1', 'User2', 'User3', 'User4'])
sample_df

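| | Item1 | Item2 | Item3 | Item4 | Item5 |
|---|---|---|---|---|---|
| Alice | 5 | 3 | 4 | 4 | NaN |
| User1 | 3 | 1 | 2 | 3 | 3.0 |
| User2 | 4 | 3 | 4 | 3 | 5.0 |
| User3 | 3 | 3 | 1 | 5 | 4.0 |
| User4 | 1 | 5 | 5 | 2 | 1.0 |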


In [73]:
alice = sample_df.loc['Alice']
alice

Item1    5.0
Item2    3.0
Item3    4.0
Item4    4.0
Item5    NaN
Name: Alice, dtype: float64

In [74]:
other_users = sample_df.drop('Alice')
other_users

| | Item1 | Item2 | Item3 | Item4 | Item5 |
|---|---|---|---|---|---|
| User1 | 3 | 1 | 2 | 3 | 3.0 |
| User2 | 4 | 3 | 4 | 3 | 5.0 |
| User3 | 3 | 3 | 1 | 5 | 4.0 |
| User4 | 1 | 5 | 5 | 2 | 1.0 |


In [75]:
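def pearson(individual, group):
    # Row-wise Pearson correlation of each member of `group` with `individual`;
    # items that either side has not rated are ignored.
    return group.corrwith(individual, axis=1, method='pearson')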

pearson(alice, other_users)

User1    0.852803
User2    0.707107
User3    0.000000
User4   -0.792118
dtype: float64
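
To make explicit what the correlation above measures, here is a minimal by-hand sketch for a single pair of users. The helper name `pearson_by_hand` and the two standalone rating vectors are illustrative assumptions, not part of the notebook's own code.

```python
import numpy as np
import pandas as pd

def pearson_by_hand(a: pd.Series, b: pd.Series) -> float:
    """Pearson correlation of two rating vectors over their co-rated items."""
    co_rated = a.notna() & b.notna()           # items both users have rated
    a, b = a[co_rated], b[co_rated]
    a_centered = a - a.mean()                  # remove each user's mean rating
    b_centered = b - b.mean()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return float((a_centered * b_centered).sum() / denominator)

alice_ratings = pd.Series({'Item1': 5, 'Item2': 3, 'Item3': 4, 'Item4': 4, 'Item5': np.nan})
user1_ratings = pd.Series({'Item1': 3, 'Item2': 1, 'Item3': 2, 'Item4': 3, 'Item5': 3})

print(round(pearson_by_hand(alice_ratings, user1_ratings), 6))  # 0.852803
```

Because both vectors are centered on their own means, a user who rates everything one point lower than Alice would still count as perfectly similar.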

In [76]:
def spearman(individual, group):
    # Same as pearson(), but computed on the ranks of the ratings instead of their values.
    return group.corrwith(individual, axis=1, method='spearman')

spearman(alice, other_users)

User1    0.833333
User2    0.707107
User3    0.000000
User4   -0.833333
dtype: float64
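
Spearman correlation can be read as Pearson correlation applied to ranks. The sketch below checks this for Alice and User1 on their co-rated items; the standalone Series are assumptions that copy the sample data.

```python
import numpy as np
import pandas as pd

alice_ratings = pd.Series({'Item1': 5, 'Item2': 3, 'Item3': 4, 'Item4': 4, 'Item5': np.nan})
user1_ratings = pd.Series({'Item1': 3, 'Item2': 1, 'Item3': 2, 'Item4': 3, 'Item5': 3})

# Restrict to co-rated items first, so both rankings cover the same items.
co_rated = alice_ratings.notna() & user1_ratings.notna()
alice_ranks = alice_ratings[co_rated].rank()   # tied ratings share the average rank
user1_ranks = user1_ratings[co_rated].rank()

spearman_via_ranks = alice_ranks.corr(user1_ranks, method='pearson')
print(round(spearman_via_ranks, 6))  # 0.833333
```

The result agrees with the value reported for User1 above.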

In [77]:
from numpy import dot
from numpy.linalg import norm

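def cosine(individual, group):
    # Compare only on the items that `individual` has rated.
    indices_of_nans = individual.isna()
    individual = individual.loc[~indices_of_nans]
    group = group.loc[:, ~indices_of_nans]

    # Row-wise cosine of the angle between each group member and `individual`.
    return group.apply(lambda x: x.dot(individual) / (norm(x) * norm(individual)), axis=1)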

cosine(alice, other_users)

User1    0.975321
User2    0.992243
User3    0.890724
User4    0.796687
dtype: float64

In [78]:
def adjusted_cosine(individual, group):
    indices_of_nans = individual.isna()
    individual = individual.loc[~indices_of_nans]
    group = group.loc[:, ~indices_of_nans]

    # Each vector is centered by its own mean over the co-rated items, so this
    # matches Pearson correlation here (hence the identical output below).
    centered = individual - individual.mean()
    return group.apply(
        lambda x: dot(x - x.mean(), centered) / (norm(x - x.mean()) * norm(centered)),
        axis=1)

adjusted_cosine(alice, other_users)

User1    0.852803
User2    0.707107
User3    0.000000
User4   -0.792118
dtype: float64
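
For comparison, adjusted cosine similarity as it is usually defined for item-based CF subtracts each *user's* mean rating before comparing two item columns. The following is a minimal sketch under that definition; the function name `item_adjusted_cosine` is a hypothetical helper and `ratings` is a copy of the sample data.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame(
    {
        'Item1': [5, 3, 4, 3, 1],
        'Item2': [3, 1, 3, 3, 5],
        'Item3': [4, 2, 4, 1, 5],
        'Item4': [4, 3, 3, 5, 2],
        'Item5': [np.nan, 3, 5, 4, 1],
    }, index=['Alice', 'User1', 'User2', 'User3', 'User4'])

def item_adjusted_cosine(ratings: pd.DataFrame, item_a: str, item_b: str) -> float:
    """Cosine of two item columns after subtracting each user's mean rating."""
    user_means = ratings.mean(axis=1)              # missing ratings are skipped
    centered = ratings.sub(user_means, axis=0)     # rating minus that user's mean
    pair = centered[[item_a, item_b]].dropna()     # users who rated both items
    a, b = pair[item_a], pair[item_b]
    return float(a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(item_adjusted_cosine(ratings, 'Item5', 'Item1'), 4))  # 0.8049
```

Centering by user mean removes differences in how generously individual users rate, which plain cosine similarity ignores.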

In [79]:
def get_k_similarities(individual, group, k, similarity_measure):
    # The k highest similarity scores, sorted in descending order.
    return similarity_measure(individual, group).nlargest(k)

get_k_similarities(alice, other_users, 2, pearson)

User1    0.852803
User2    0.707107
dtype: float64

In [80]:
def get_k_neighbors(individual, group, k, similarity_measure):
    # Rating rows of the k most similar group members.
    similarities = get_k_similarities(individual, group, k, similarity_measure)
    return group.loc[similarities.index]

get_k_neighbors(alice, other_users, 2, pearson)

| | Item1 | Item2 | Item3 | Item4 | Item5 |
|---|---|---|---|---|---|
| User1 | 3 | 1 | 2 | 3 | 3.0 |
| User2 | 4 | 3 | 4 | 3 | 5.0 |


In [81]:
def get_prediction(individual, group, k, similarity_measure):
    neighbors = get_k_neighbors(individual, group, k, similarity_measure)
    # Unweighted mean of the neighbors' ratings, one value per item (column).
    return neighbors.mean()

get_prediction(alice, other_users, 2, pearson).loc['Item5']

4.0
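
The last step of the procedure, recommending the items with the highest predicted rating, can be sketched as follows. The `recommend` function and its parameters `k` and `n` are illustrative assumptions; it uses Pearson similarity and the simple unweighted average, and the sample data has only one item Alice has not rated.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame(
    {
        'Item1': [5, 3, 4, 3, 1],
        'Item2': [3, 1, 3, 3, 5],
        'Item3': [4, 2, 4, 1, 5],
        'Item4': [4, 3, 3, 5, 2],
        'Item5': [np.nan, 3, 5, 4, 1],
    }, index=['Alice', 'User1', 'User2', 'User3', 'User4'])

def recommend(ratings: pd.DataFrame, user: str, k: int = 2, n: int = 3) -> pd.Series:
    """Top-n unrated items for `user`, scored by the mean rating of the k nearest users."""
    target = ratings.loc[user]
    others = ratings.drop(user)
    # Pearson similarity of every other user to the target user.
    similarities = others.corrwith(target, axis=1, method='pearson')
    neighbors = others.loc[similarities.nlargest(k).index]
    predictions = neighbors.mean()                 # one predicted rating per item
    unrated = target[target.isna()].index          # only recommend unseen items
    return predictions.loc[unrated].nlargest(n)

recommendations = recommend(ratings, 'Alice')
print(recommendations)
```

For Alice this returns a single entry, `Item5` with a predicted rating of 4.0.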

##### Improvements:
- user bias: consider difference from average rating ($r_{b,i} - \bar{r}_b$)
- user similarities: weighted average, weight sim(a, b)
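
Combining both improvements gives a prediction of the following form, where $N$ is the set of the k most similar users, $\bar{r}_a$ is the active user's average rating and $\bar{r}_b$ is neighbor $b$'s average rating. Taking the absolute value in the denominator is one common convention and is assumed here.

```latex
\mathrm{pred}(a, i) = \bar{r}_a +
  \frac{\sum_{b \in N} \mathrm{sim}(a, b)\,\bigl(r_{b,i} - \bar{r}_b\bigr)}
       {\sum_{b \in N} \lvert \mathrm{sim}(a, b) \rvert}
```

For Alice and Item5 with the two nearest Pearson neighbors (User1 and User2, with average ratings 2.4 and 3.8), this works out to 4 + (0.853 · 0.6 + 0.707 · 1.2) / (0.853 + 0.707) ≈ 4.87.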

In [108]:
def get_improved_prediction(individual, group, k, similarity_measure, item):
    similarities = get_k_similarities(individual, group, k, similarity_measure)
    neighbors = get_k_neighbors(individual, group, k, similarity_measure)
    # Each neighbor's deviation from their own mean rating (the mean includes `item`).
    diff_from_avg_rating = neighbors.loc[:, item] - neighbors.mean(axis=1)
    # Similarity-weighted deviation, added back onto the individual's own mean.
    weighted_diff = sum(similarities * diff_from_avg_rating) / sum(abs(similarities))
    return individual.mean() + weighted_diff

get_improved_prediction(alice, other_users, 2, pearson, 'Item5')

4.871979899370592

##### Improvements (cont.):
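- number of co-rated items: trust similarities based on few shared ratings less
- agreement on more "exotic" items is more informative than agreement on commonly liked ones
- case amplification: give more weight to very similar neighbors
- neighbor selection: e.g. a fixed k or a similarity threshold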
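
Two of these refinements are easy to sketch as re-weightings of the similarity scores. The function names, the exponent `rho` and the `threshold` are assumptions chosen for illustration, not values taken from this notebook.

```python
import pandas as pd

def amplify(similarities: pd.Series, rho: float = 2.5) -> pd.Series:
    """Case amplification: sim * |sim|^(rho - 1); keeps the sign, shrinks weak weights."""
    return similarities * similarities.abs() ** (rho - 1)

def significance_weight(similarities: pd.Series, n_co_rated: pd.Series,
                        threshold: int = 50) -> pd.Series:
    """Scale similarities down linearly when fewer than `threshold` items are co-rated."""
    return similarities * n_co_rated.clip(upper=threshold) / threshold

similarities = pd.Series({'User1': 0.852803, 'User2': 0.707107, 'User4': -0.792118})
n_co_rated = pd.Series({'User1': 4, 'User2': 4, 'User4': 4})

amplified = amplify(similarities)
damped = significance_weight(similarities, n_co_rated, threshold=5)
print(amplified)
print(damped)
```

After amplification the gap between User1 and User2 widens, so the most similar neighbor dominates the weighted average more strongly.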

### Item-based Nearest Neighbor CF
- compute similarity between **items**
- use this similarity to predict ratings
- more computationally efficient when the number of items is much smaller than the number of users, which is often the case
- practical advantage (over user-based filtering): results can be sanity-checked using intuition about the items

> User-based CF: find similar users to a user, use ratings of those users to predict item ratings by the user

> Item-based CF: find similar items to an item, use ratings of those items to predict ratings of the item by users
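
Since item similarities depend only on the rating matrix, they can be computed for all item pairs at once. A minimal sketch using Pearson correlation between columns follows; `ratings` is a copy of the sample data and the variable names are assumptions.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame(
    {
        'Item1': [5, 3, 4, 3, 1],
        'Item2': [3, 1, 3, 3, 5],
        'Item3': [4, 2, 4, 1, 5],
        'Item4': [4, 3, 3, 5, 2],
        'Item5': [np.nan, 3, 5, 4, 1],
    }, index=['Alice', 'User1', 'User2', 'User3', 'User4'])

# DataFrame.corr compares columns pairwise and skips users with a missing rating.
item_similarities = ratings.corr(method='pearson')

# The two items most similar to Item5 (excluding Item5 itself).
nearest_items = item_similarities['Item5'].drop('Item5').nlargest(2)
print(nearest_items)
```

On the sample data the nearest items to Item5 are Item1 (about 0.969) and Item4 (about 0.582).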

In [83]:
item5 = sample_df.T.loc['Item5']
item5

Alice    NaN
User1    3.0
User2    5.0
User3    4.0
User4    1.0
Name: Item5, dtype: float64

In [84]:
other_items = sample_df.T.drop('Item5')
other_items

| | Alice | User1 | User2 | User3 | User4 |
|---|---|---|---|---|---|
| Item1 | 5.0 | 3.0 | 4.0 | 3.0 | 1.0 |
| Item2 | 3.0 | 1.0 | 3.0 | 3.0 | 5.0 |
| Item3 | 4.0 | 2.0 | 4.0 | 1.0 | 5.0 |
| Item4 | 4.0 | 3.0 | 3.0 | 5.0 | 2.0 |


In [85]:
pearson(item5, other_items)

Item1    0.969458
Item2   -0.478091
Item3   -0.427618
Item4    0.581675
dtype: float64

In [86]:
get_prediction(item5, other_items, 2, pearson).loc['Alice']

4.5

In [87]:
get_improved_prediction(item5, other_items, 2, pearson, 'Alice')

4.6

In [88]:
def get_prediction_for_cosine_similarity(individual, group, k, similarity_measure, item):
    similarities = get_k_similarities(individual, group, k, similarity_measure)
    neighbors = get_k_neighbors(individual, group, k, similarity_measure)
    # Similarity-weighted average of the neighbors' ratings; assumes the selected similarities are positive.
    return sum(similarities * neighbors.loc[:, item]) / sum(similarities)

display(get_prediction_for_cosine_similarity(item5, other_items, 2, cosine, 'Alice'))
get_prediction_for_cosine_similarity(item5, other_items, 2, adjusted_cosine, 'Alice')


4.5141032559718095

4.625
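
The weighted average above divides by the plain sum of the similarities, which is only safe while the selected neighbors all have positive similarity. One possible guard, sketched here as an assumption rather than as part of the original notebook, is to keep only positively correlated neighbors. The helper name `weighted_average` and the hard-coded similarity values (copied from the Pearson output for Item5) are illustrative.

```python
import pandas as pd

def weighted_average(similarities: pd.Series, neighbor_ratings: pd.Series) -> float:
    """Similarity-weighted mean of neighbor ratings, using positive similarities only."""
    positive = similarities[similarities > 0]
    if positive.empty:
        return float('nan')                        # no usable neighbor
    ratings = neighbor_ratings.loc[positive.index]
    return float((positive * ratings).sum() / positive.sum())

item_similarities = pd.Series({'Item1': 0.969458, 'Item4': 0.581675, 'Item2': -0.478091})
alice_ratings = pd.Series({'Item1': 5, 'Item4': 4, 'Item2': 3})

print(round(weighted_average(item_similarities, alice_ratings), 3))  # 4.625
```

Item2 is dropped because of its negative similarity, so the result matches the value obtained from the two nearest items alone.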