# Unit 2: Popularity Recommendations

In this section we build a recommender that ranks items by popularity, measured as the number of ratings they received. We then recommend the $N$ most popular items.
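The core idea fits in a few lines of plain Python. Below is a minimal sketch that runs on made-up `(user, item, rating)` triples. The helper `most_popular` is a hypothetical name and is not part of the course code:

```python
from collections import Counter
from typing import List, Tuple


def most_popular(ratings: List[Tuple[int, int, float]], N: int) -> List[int]:
    """Return the N items with the most ratings, most popular first.

    `ratings` is a list of (user, item, rating) triples; the rating value
    itself is ignored, only the number of ratings per item counts.
    """
    counts = Counter(item for _, item, _ in ratings)
    # most_common sorts by count in descending order
    return [item for item, _ in counts.most_common(N)]


# toy data: item 10 has 3 ratings, item 20 has 2, item 30 has 1
toy_ratings = [(1, 10, 4.0), (2, 10, 2.0), (3, 10, 5.0),
               (1, 20, 3.0), (2, 20, 4.0),
               (3, 30, 1.0)]
print(most_popular(toy_ratings, 2))  # [10, 20]
```

The rating values play no role here. The only thing that matters is how often each item was rated.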

In [None]:
from typing import Dict, List

import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from recsys_training.data import Dataset

In [None]:
ml100k_ratings_filepath = '../data/raw/ml-100k/u.data'

## Load Data

We load the MovieLens 100k dataset (100,000 ratings) and split it 80/20 into a training set and a test set.
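`Dataset.rating_split` comes from the course's `recsys_training` package, and its internals are not shown here. If we assume it performs a plain random 80/20 split over individual ratings, the idea can be sketched with the standard library alone. The helper `random_split` is hypothetical and is not taken from that package:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar('T')


def random_split(ratings: Sequence[T], train_fraction: float = 0.8,
                 seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle ratings reproducibly and cut them into train and test lists."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# toy (user, item, rating) triples: 10 users x 10 items = 100 ratings
toy_ratings = [(u, i, 3.0) for u in range(10) for i in range(10)]
train, test = random_split(toy_ratings)
print(len(train), len(test))  # 80 20
```

This splits individual ratings rather than users. In principle, a user could therefore end up with no training ratings at all. MovieLens 100k guarantees at least 20 ratings per user, which makes that outcome very unlikely.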

In [None]:
data = Dataset(ml100k_ratings_filepath)
data.rating_split(seed=42)

In [None]:
data.train_ratings

In [None]:
data.test_ratings

Build a mapping from each user id to that user's training ratings (item id to rating). We will need it later.

In [None]:
# build user rating maps
user_ratings = {}
grouped = data.train_ratings[['user', 'item', 'rating']].groupby('user')
for user in data.users:
    vals = grouped.get_group(user)[['item', 'rating']].values
    user_ratings[user] = dict(zip(vals[:, 0].astype(int),
                                  vals[:, 1].astype(float)))

In [None]:
user_ratings[1]

## Popularity Ranking

How do we define _popularity_? There are several reasonable choices:
- pure count: count the number of ratings or interactions an item received, regardless of their value
- positive count: count only the ratings or interactions that we assume reflect a preference for the item, e.g. ratings at or above the user's mean rating
- time dependency: apart from evergreen favorites, items may be popular only for a limited time; how can we account for this?
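Here is a minimal sketch of the three notions, computed on made-up `(user, item, rating, timestamp)` tuples with only the standard library. The helper names are hypothetical. The exponential half-life decay is just one possible way to handle time dependency, chosen here for illustration:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (user, item, rating, timestamp in days): toy data made up for illustration
Rating = Tuple[int, int, float, float]


def pure_count(ratings: List[Rating]) -> Counter:
    """Number of ratings per item, regardless of their value."""
    return Counter(item for _, item, _, _ in ratings)


def positive_count(ratings: List[Rating]) -> Counter:
    """Number of ratings per item that are at or above the rater's mean rating."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for user, _, rating, _ in ratings:
        sums[user] += rating
        counts[user] += 1
    means = {user: sums[user] / counts[user] for user in sums}
    return Counter(item for user, item, rating, _ in ratings
                   if rating >= means[user])


def decayed_count(ratings: List[Rating], now: float,
                  half_life: float) -> Dict[int, float]:
    """Time-weighted popularity: a rating loses half its weight every `half_life` days."""
    scores: Dict[int, float] = defaultdict(float)
    for _, item, _, t in ratings:
        scores[item] += 0.5 ** ((now - t) / half_life)
    return dict(scores)


toy = [(1, 10, 5.0, 0.0), (1, 20, 1.0, 9.0),
       (2, 10, 4.0, 1.0), (2, 20, 2.0, 10.0),
       (3, 20, 3.0, 10.0), (3, 30, 5.0, 10.0)]

print(pure_count(toy).most_common())      # [(20, 3), (10, 2), (30, 1)]
print(positive_count(toy).most_common())  # [(10, 2), (30, 1)]
print(decayed_count(toy, now=10.0, half_life=1.0))  # {10: 0.0029296875, 20: 2.5, 30: 1.0}
```

On this toy data the three definitions rank the items differently. Item 20 has the most ratings, but none of them is positive. Time decay favors the recently rated items 20 and 30.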

Note that popularity ranking involves no personalization. We obtain a single ranking of items that is independent of the user, and we serve the same top-$N$ items to every user.

### Popularity based on simple Interaction Counts

**Task**: Infer the item popularity order from training ratings as an array with items in descending order of popularity.

In [None]:
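# number of training ratings per item, sorted in descending order
item_popularity = data.train_ratings['item'].value_counts()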

In [None]:
item_popularity

In [None]:
# item ids ordered from most to least popular
item_order = item_popularity.index.values

### Popularity based on positive Interaction Counts

To count only positive interactions, we first remove all ratings that fall below the user's mean rating. We take this mean as the user's individual threshold between a positive and a negative opinion of an item.

In [None]:
user_mean_ratings = data.train_ratings[['user', 'rating']].groupby('user')
user_mean_ratings = user_mean_ratings.mean().reset_index()
user_mean_ratings.rename(columns={'rating': 'user_mean_rating'},
                         inplace=True)

In [None]:
user_mean_ratings

In [None]:
positive_train_ratings = data.train_ratings.merge(user_mean_ratings,
                                                  on='user',
                                                  how='left')

In [None]:
keep_ratings = (positive_train_ratings['rating'] >= positive_train_ratings['user_mean_rating'])

In [None]:
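# keep only positive ratings; drop the helper column on a new frame
# instead of in place on a filtered slice (avoids SettingWithCopyWarning)
positive_train_ratings = (positive_train_ratings[keep_ratings]
                          .drop(columns='user_mean_rating'))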

In [None]:
positive_train_ratings

In [None]:
item_popularity_positive = positive_train_ratings.item.value_counts()

In [None]:
item_popularity_positive

In [None]:
item_order_positive = item_popularity_positive.index.values

#### How strongly do both orderings correlate with each other?

Compute the Spearman rank correlation between the two popularity counts. This quantifies how much the ordering changes when only positive ratings are counted.
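Spearman's rank correlation is the Pearson correlation computed on ranks instead of raw values. It therefore only measures whether two quantities order the items in the same way. The pure-Python sketch below shows the mechanics. The helper names are hypothetical. Tied values receive the average of the ranks they occupy, which is also how `scipy.stats.spearmanr` handles ties:

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank vectors (assumes neither input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


print(spearman_rho([10, 20, 30, 40], [1, 3, 2, 4]))  # 0.8
```

In this toy input only two neighbors swap places, which gives $\rho = 0.8$. A value close to 1 for `joint_counts` would mean that counting only positive ratings barely changes the popularity ordering.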

In [None]:
# per item with at least one positive rating: [all ratings, positive ratings]
common_items = np.intersect1d(item_popularity_positive.index.values,
                              item_popularity.index.values)
joint_counts = np.array([[item_popularity.loc[item], item_popularity_positive.loc[item]]
                         for item in common_items])

In [None]:
joint_counts

In [None]:
spearmanr(joint_counts)

### Using Popularity Ordering for top-$N$ Recommendations

In [None]:
item_order

In [None]:
item_order_positive

**Task**: Write the function `get_recommendations` that returns the top-$N$ most popular items excluding known positives, i.e. items the user has already rated in the training set.

In [None]:
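def get_recommendations(user: int,
                        user_ratings: Dict[int, Dict[int, float]],
                        item_popularity_order: np.ndarray,
                        N: int) -> List[int]:
    # items the user already rated in training are known positives; skip them
    known_items = user_ratings.get(user, {})
    recommendations = [int(item) for item in item_popularity_order
                       if item not in known_items][:N]

    return recommendations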

In [None]:
get_recommendations(1, user_ratings, item_order, 10)

## Evaluating the Relevance of Recommendations

In [None]:
def get_relevant_items(test_ratings: pd.DataFrame) -> Dict[int, np.ndarray]:
    """
    Return {user: array of items} with the relevant (rated) items
    of every user found in the test dataset.
    """
    relevant_items = test_ratings[['user', 'item']]
    relevant_items = relevant_items.groupby('user')
    relevant_items = {user: relevant_items.get_group(user)['item'].values
                      for user in relevant_items.groups.keys()}

    return relevant_items

In [None]:
relevant_items = get_relevant_items(data.test_ratings)

In [None]:
relevant_items[1]

### $Precision@10$

Now we can compute the intersection between the top-$N$ recommended items and the items each user interacted with in the test set. Ideally, every recommendation is a hit, i.e. an item the user actually consumed. In that case the intersection contains $N$ items for $N$ recommendations, which is a precision of $\frac{N}{N} = 100\%$.
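Written out, the metric looks as follows. The notation is introduced here for convenience: $R_u^N$ is the set of top-$N$ recommendations for user $u$, $T_u$ is the set of items $u$ rated in the test set, and $U$ is the set of test users.

$$
\text{Precision@}N(u) = \frac{|R_u^N \cap T_u|}{N}, \qquad
\overline{\text{Precision@}N} = \frac{1}{|U|} \sum_{u \in U} \text{Precision@}N(u)
$$

For example, 3 hits among 10 recommendations give $\text{Precision@}10(u) = 3/10 = 0.3$.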

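We compute $Precision@N$ for every user and take the mean over all users. This mean $Precision@N$ is often loosely called _mean average precision at N_, or $MAP@N$ for short. Strictly speaking, however, $MAP@N$ averages each user's average precision $AP@N$, which also rewards placing hits near the top of the list.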
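The sketch below contrasts per-user $Precision@N$ with average precision $AP@N$, which is the per-user quantity that the standard $MAP@N$ averages over users. The helper names are hypothetical. $AP@N$ is normalized here by the smaller of the number of relevant items and $N$; this is one of several conventions in use:

```python
from typing import List, Set


def precision_at_n(recommended: List[int], relevant: Set[int], N: int) -> float:
    """Fraction of the top-N recommendations that are relevant."""
    hits = sum(1 for item in recommended[:N] if item in relevant)
    return hits / N


def average_precision_at_n(recommended: List[int], relevant: Set[int], N: int) -> float:
    """Sum of Precision@k over the ranks k <= N that are hits, normalized.

    Normalized by min(len(relevant), N); other normalizations exist.
    """
    hits, score = 0, 0.0
    for k, item in enumerate(recommended[:N], start=1):
        if item in relevant:
            hits += 1
            score += hits / k  # Precision@k at this hit
    denom = min(len(relevant), N)
    return score / denom if denom else 0.0


relevant = {1, 3}
for recs in ([1, 2, 3, 4], [2, 4, 1, 3]):
    print(precision_at_n(recs, relevant, 4), average_precision_at_n(recs, relevant, 4))
```

Both lists contain the same two hits, so $Precision@4 = 0.5$ in each case. $AP@4$, however, drops from $5/6$ to $5/12$ when the hits move further down the list.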

**Task:** Compute the $MAP@N$ for popularity recommendations

#### Item Order

In [None]:
N = 10

In [None]:
users = relevant_items.keys()
prec_at_N = dict.fromkeys(users)

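for user in users:
    recommendations = get_recommendations(user, user_ratings, item_order, N)
    hits = np.intersect1d(recommendations, relevant_items[user])  # recommended items seen in test
    prec_at_N[user] = len(hits) / N
np.mean(list(prec_at_N.values()))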