# Unit 4: Neighborhood-based Collaborative Filtering for Rating Prediction

In this unit we generate personalized recommendations for the first time. We exploit similarities in rating behavior to identify similar users and items, which in turn help us find relevant items to recommend to each user.

This is the fundamental idea behind Collaborative Filtering (CF). Using $k$-nearest neighbors (kNN) is a neighborhood-based approach to CF; in a later unit we will also look at model-based approaches.

This is also the first time we predict ratings for items a user has not rated yet. We then take the top-$N$ items with the highest rating predictions and recommend them to the user.

In [3]:
from collections import OrderedDict
import itertools
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd

In [1]:
from recsys_training.data import Dataset
from recsys_training.evaluation import get_relevant_items
from recsys_training.utils import get_entity_sim

In [None]:
ml100k_ratings_filepath = '../data/raw/ml-100k/u.data'

## Load Data

In [None]:
data = Dataset(ml100k_ratings_filepath)
data.rating_split(seed=42)
user_ratings = data.get_user_ratings()

The idea behind this recommender is to use the item ratings of the $k$ most similar users (neighbors). We identify these _nearest neighbors_ with a similarity metric that we apply to the ratings the root user and a candidate neighbor have in common, i.e. to the items both of them have rated. Similarity thus means having a similar opinion on movies.
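To make "similarity on the ratings both users have in common" concrete, here is a minimal sketch of the Pearson Correlation Coefficient restricted to co-rated items. It assumes ratings are stored as `{user_id: {item_id: rating}}`, and `pearson_sim` is a hypothetical name; it is not the implementation behind `recsys_training.utils.get_entity_sim`, which may differ in details such as whether the user means are taken over co-rated items only (as here) or over all of a user's ratings.

```python
import math
from typing import Dict

# Assumption: ratings are stored as {user_id: {item_id: rating}}
Ratings = Dict[int, Dict[int, float]]


def pearson_sim(u: int, v: int, ratings: Ratings) -> float:
    """Pearson correlation between users u and v over their co-rated items.

    Returns 0.0 when fewer than two items are co-rated or when either user's
    co-rated ratings have zero variance (the correlation is undefined then).
    """
    common = sorted(ratings[u].keys() & ratings[v].keys())
    if len(common) < 2:
        return 0.0
    ru = [ratings[u][i] for i in common]
    rv = [ratings[v][i] for i in common]
    mean_u = sum(ru) / len(ru)  # mean over co-rated items only
    mean_v = sum(rv) / len(rv)
    du = [x - mean_u for x in ru]
    dv = [y - mean_v for y in rv]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den if den > 0 else 0.0


toy_ratings = {
    1: {10: 5.0, 20: 3.0, 30: 4.0},
    2: {10: 4.0, 20: 2.0, 30: 3.0},  # same pattern as user 1, shifted down by 1
    3: {10: 1.0, 20: 5.0, 30: 2.0},  # roughly opposite taste
}
print(pearson_sim(1, 2, toy_ratings))  # 1.0
print(pearson_sim(1, 3, toy_ratings))  # negative: opposite taste
```

Because Pearson centers each user's ratings on their mean, user 2 counts as perfectly similar to user 1 even though they rate everything one star lower.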

The steps are as follows:

1. Compute user-user similarities (we use the Pearson Correlation Coefficient here, but feel free to try other similarity metrics)

2. For each user:

    1. Get the $k$ nearest neighbors along with their similarities
    2. Collect the neighborhood item ratings and ignore those already rated by the root user
    3. Item Rating Prediction: Compute the similarity-weighted average of neighborhood item ratings
    4. Recommendations: Among the items with a minimum rating count, get the $N$ items with the highest rating predictions
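Steps 3 and 4 can be written down compactly. One common formulation (an assumption; the notebook itself only asks for a similarity-weighted average) uses $\mathcal{N}_k(u)$ for the $k$ nearest neighbors of user $u$, $s_{u,v}$ for the user-user similarity, $r_{v,i}$ for neighbor $v$'s rating of item $i$, and $\mathcal{N}_k^{i}(u) \subseteq \mathcal{N}_k(u)$ for the neighbors who rated $i$:

$$
\hat{r}_{u,i} = \frac{\sum_{v \in \mathcal{N}_k^{i}(u)} s_{u,v}\, r_{v,i}}{\sum_{v \in \mathcal{N}_k^{i}(u)} \lvert s_{u,v} \rvert},
\qquad
c_{u,i} = \bigl\lvert \mathcal{N}_k^{i}(u) \bigr\rvert
$$

Only items $i$ not yet rated by $u$ with a rating count $c_{u,i} \ge C$ are candidates, and the $N$ candidates with the largest $\hat{r}_{u,i}$ are recommended. If all neighbor similarities are positive, the absolute values in the denominator change nothing.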

## 1. User-User Similarities

In [None]:
sim_metric = 'pearson'
user_user_sims = {}
user_pairs = itertools.combinations(data.users, 2)

The following takes a few seconds to finish ...

In [None]:
for pair in user_pairs:
    user_user_sims[pair] = get_entity_sim(pair[0], pair[1],
                                          user_ratings,
                                          sim_metric)

In [None]:
user_user_sims[(1,4)]
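`itertools.combinations` yields each unordered pair exactly once, in the order the users appear in `data.users`, so `user_user_sims` holds a key such as `(1, 4)` but not `(4, 1)`. For the 943 users of ML-100k that is $943 \cdot 942 / 2 = 444{,}153$ pairs. Any lookup of a user's similarity to a neighbor (as needed in Task A) therefore has to check both key orders. The helper `lookup_sim` below is hypothetical and assumes each pair is stored once:

```python
import itertools
import math
from typing import Dict, Tuple


def lookup_sim(sims: Dict[Tuple[int, int], float], u: int, v: int) -> float:
    """Return the similarity of users u and v regardless of key order.

    Hypothetical helper: assumes every unordered pair is stored exactly once,
    under whichever order itertools.combinations produced.
    """
    return sims[(u, v)] if (u, v) in sims else sims[(v, u)]


users = [1, 2, 3, 4]
pairs = list(itertools.combinations(users, 2))
print(pairs)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(math.comb(943, 2))  # 444153 unordered pairs for ML-100k's 943 users

toy_sims = {pair: float(sum(pair)) for pair in pairs}  # dummy similarity values
print(lookup_sim(toy_sims, 4, 1) == lookup_sim(toy_sims, 1, 4))  # True
```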

## 2. Computing Recommendations

### A. Implement Nearest Neighbors for a given user

![](Parrot.png)

**Task:** It's your turn again. Complete `get_k_nearest_neighbors` to return the $k$ nearest neighbors of a given user as a list of `(neighbor_id, similarity)` tuples, sorted by similarity in descending order.

In [4]:
def get_k_nearest_neighbors(user: int, k: int, user_user_sims: dict) -> List[Tuple[int, float]]:
    neighbors = set(data.users)
    neighbors.remove(user)

    nearest_neighbors = []  # (neighbor_id, similarity) tuples, sorted by similarity, descending
    
    pass
    
    return nearest_neighbors[:k]

In [None]:
user_neighbors = get_k_nearest_neighbors(1, k=10, user_user_sims=user_user_sims)

In [None]:
user_neighbors
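For comparison, here is one possible sketch of the neighbor selection on a toy similarity dictionary. It is self-contained and not the notebook's reference solution: `k_nearest_neighbors` and `toy_sims` are hypothetical names, and it assumes each unordered pair is stored once (as `itertools.combinations` produces) and that all similarities are finite numbers. `heapq.nlargest` returns the top-$k$ entries already sorted from highest to lowest.

```python
import heapq
from typing import Dict, List, Tuple


def k_nearest_neighbors(user: int, k: int,
                        sims: Dict[Tuple[int, int], float]) -> List[Tuple[int, float]]:
    """Return the k users most similar to `user` as (neighbor, similarity), descending."""
    candidates = []
    for (u, v), sim in sims.items():
        if u == user:      # pair stored as (user, other)
            candidates.append((v, sim))
        elif v == user:    # pair stored as (other, user)
            candidates.append((u, sim))
    # nlargest returns at most k entries, sorted from highest to lowest similarity
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


toy_sims = {(1, 2): 0.9, (1, 3): -0.2, (1, 4): 0.5,
            (2, 3): 0.7, (2, 4): 0.1, (3, 4): 0.3}
print(k_nearest_neighbors(1, 2, toy_sims))  # [(2, 0.9), (4, 0.5)]
print(k_nearest_neighbors(4, 2, toy_sims))  # [(1, 0.5), (3, 0.3)]
```

Scanning all pairs is fine for a toy example, but in the evaluation loop it would repeat a full pass over every user pair for each user; building a per-user neighbor list once is likely cheaper.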

### B. Obtain the Neighborhood Ratings

![](Parrot.png)

**Task:** Now, use the nearest neighbors and get their ratings, but leave out the items our root user has already rated (known positives). Return a mapping from unknown item to a list of dicts with neighbor similarity and item rating.

In [None]:
def get_neighborhood_ratings(user, user_neighbors: List[Tuple[int, float]]) -> Dict[int, List[Dict[str, float]]]:
    neighborhood_ratings = dict()
    
    pass
    
    return neighborhood_ratings

In [None]:
neighborhood_ratings = get_neighborhood_ratings(1, user_neighbors)

In [None]:
neighborhood_ratings
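A possible sketch of Task B under stated assumptions: ratings are stored as `{user_id: {item_id: rating}}` (an assumption about the data structure, not necessarily what `data.get_user_ratings()` returns), and each neighbor record uses the hypothetical keys `'sim'` and `'rating'`.

```python
from typing import Dict, List, Tuple

# Assumption: ratings are stored as {user_id: {item_id: rating}}
Ratings = Dict[int, Dict[int, float]]


def neighborhood_ratings_sketch(user: int,
                                neighbors: List[Tuple[int, float]],
                                ratings: Ratings) -> Dict[int, List[Dict[str, float]]]:
    """Map each item the root user has not rated to its neighbors' (similarity, rating) records."""
    known = ratings.get(user, {}).keys()
    result: Dict[int, List[Dict[str, float]]] = {}
    for neighbor, sim in neighbors:
        for item, rating in ratings.get(neighbor, {}).items():
            if item in known:
                continue  # skip items the root user already rated
            result.setdefault(item, []).append({'sim': sim, 'rating': rating})
    return result


toy_ratings = {1: {10: 5.0, 20: 3.0},
               2: {10: 4.0, 30: 5.0, 40: 2.0},
               4: {30: 4.0}}
print(neighborhood_ratings_sketch(1, [(2, 0.9), (4, 0.5)], toy_ratings))
# {30: [{'sim': 0.9, 'rating': 5.0}, {'sim': 0.5, 'rating': 4.0}], 40: [{'sim': 0.9, 'rating': 2.0}]}
```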

### C. Compute Rating Predictions from Neighborhood Ratings

![](Parrot.png)

**Task:** In this step, we estimate ratings for the root user based on the neighborhood ratings, using a similarity-weighted average of the neighbor ratings. Return a mapping from each item to its prediction and the number of neighbor ratings it received.

In [None]:
def compute_rating_pred(neighborhood_ratings: dict) -> dict:
    rating_preds = dict()
    
    pass

    return rating_preds

In [None]:
rating_preds = compute_rating_pred(neighborhood_ratings)

In [None]:
list(rating_preds.items())[:20]
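A sketch of Task C, assuming neighbor records of the form `{'sim': ..., 'rating': ...}` and producing records `{'pred': ..., 'count': ...}` (both sets of key names are hypothetical). It normalizes by the sum of absolute similarities, one common choice that avoids dividing by a sum that is zero or negative when some neighbors have a negative Pearson similarity; with only positive similarities it is the plain weighted average.

```python
from typing import Dict, List


def rating_preds_sketch(neighborhood: Dict[int, List[Dict[str, float]]]) -> Dict[int, Dict[str, float]]:
    """Similarity-weighted average rating per item plus the number of neighbor ratings."""
    preds: Dict[int, Dict[str, float]] = {}
    for item, records in neighborhood.items():
        weight = sum(abs(r['sim']) for r in records)
        if weight == 0:
            continue  # no usable weights: leave the item without a prediction
        score = sum(r['sim'] * r['rating'] for r in records) / weight
        preds[item] = {'pred': score, 'count': len(records)}
    return preds


toy_neighborhood = {30: [{'sim': 0.9, 'rating': 5.0}, {'sim': 0.5, 'rating': 4.0}],
                    40: [{'sim': 0.9, 'rating': 2.0}]}
print(rating_preds_sketch(toy_neighborhood))
# item 30: (0.9*5 + 0.5*4) / 1.4 ≈ 4.64 from 2 ratings; item 40: 2.0 from 1 rating
```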

### D. Compute the Top-$N$ Recommendation Items

![](Parrot.png)

**Task:** The last step takes the rating predictions and returns the $N$ items with the highest predictions among those that have a minimum rating count, i.e. that were rated by at least `min_count` neighbors from the neighborhood.

In [None]:
def compute_top_n(rating_preds: dict, min_count: int, N: int) -> OrderedDict:
    sorted_rating_preds = []  # (item, prediction) pairs with count >= min_count, sorted descending
    pass
    
    return OrderedDict(sorted_rating_preds[:N])

In [None]:
top_n_recs = compute_top_n(rating_preds, min_count=2, N=10)

In [None]:
top_n_recs
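Task D can be sketched as a filter followed by a sort. `top_n_sketch` and the `'pred'`/`'count'` keys are hypothetical and do not correspond to any particular reference solution. `OrderedDict` keeps the ranking order explicit (plain dicts also preserve insertion order since Python 3.7).

```python
from collections import OrderedDict
from typing import Dict


def top_n_sketch(preds: Dict[int, Dict[str, float]], min_count: int, N: int) -> OrderedDict:
    """Top-N items by prediction among those rated by at least min_count neighbors."""
    eligible = [(item, p) for item, p in preds.items() if p['count'] >= min_count]
    eligible.sort(key=lambda pair: pair[1]['pred'], reverse=True)  # highest first
    return OrderedDict(eligible[:N])


toy_preds = {30: {'pred': 4.6, 'count': 2},
             40: {'pred': 4.9, 'count': 1},   # high score, but too few ratings
             50: {'pred': 3.5, 'count': 3}}
print(list(top_n_sketch(toy_preds, min_count=2, N=10)))  # [30, 50]
```

The minimum count guards against items whose high prediction rests on a single neighbor's rating, as item 40 illustrates.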

### Combine all steps in `get_recommendations`

In [None]:
def get_recommendations(user: int,
                        user_user_sims: dict,
                        k: int,
                        C: int,
                        N: int):
    user_neighbors = get_k_nearest_neighbors(user, k=k, user_user_sims=user_user_sims)
    neighborhood_ratings = get_neighborhood_ratings(user, user_neighbors)
    rating_preds = compute_rating_pred(neighborhood_ratings)
    top_n_recs = compute_top_n(rating_preds, min_count=C, N=N)
    return top_n_recs

In [None]:
get_recommendations(1, user_user_sims, k=10, C=2, N=10)

### Evaluation

Let's check the performance of the user-based neighborhood recommender, measured as precision@$N$ on the test ratings, for a neighborhood size of $k = 60$, a minimum rating count of $C = 10$ and, as before, $N = 10$ recommendations.

In [None]:
k = 60
C = 10
N = 10

In [None]:
relevant_items = get_relevant_items(data.test_ratings)

In [None]:
users = relevant_items.keys()
prec_at_N = dict.fromkeys(data.users)

for user in users:
    recommendations = get_recommendations(user, user_user_sims, k, C, N)
    recommendations = list(recommendations.keys())
    hits = np.intersect1d(recommendations,
                          relevant_items[user])
    prec_at_N[user] = len(hits)/N

In [None]:
np.mean([val for val in prec_at_N.values() if val is not None])
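The loop above computes precision@$N$: the number of recommended items that appear among the user's relevant test items, divided by $N$. A small standalone version (the name `precision_at_n` is hypothetical) makes one property explicit: because the denominator is always $N$, a user for whom fewer than $N$ items pass the minimum rating count $C$ is scored as if the empty slots were misses.

```python
from typing import List

import numpy as np


def precision_at_n(recommended: List[int], relevant: List[int], N: int) -> float:
    """Share of the N recommendation slots that contain a relevant test item."""
    hits = np.intersect1d(recommended[:N], relevant)
    return len(hits) / N  # denominator is always N, even if fewer items were recommended


print(precision_at_n([1, 2, 3, 4], [2, 4, 9], N=4))  # 0.5
print(precision_at_n([2], [2, 7], N=10))  # 0.1: one hit, nine empty slots count as misses
```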