# Unit 3: Demographic Recommendations

In this unit we leave the boring field of unpersonalized content and take our first steps toward personalization. Before tailoring content to individuals, we tailor it to groups of individuals who are similar by some criterion and are therefore assumed to consume similar content.

We split individuals into groups using the demographic information we have about them. This can be any of
* age
* gender
* citizenship
* income
* etc.

In [None]:
import itertools
from typing import List

import numpy as np
import pandas as pd

In [None]:
from recsys_training.data import Dataset
from recsys_training.evaluation import get_relevant_items

In [None]:
ml100k_ratings_filepath = '../data/raw/ml-100k/u.data'
ml100k_user_filepath = '../data/raw/ml-100k/u.user'

## Load Data

In [None]:
data = Dataset(ml100k_ratings_filepath)
data.rating_split(seed=42)
user_ratings = data.get_user_ratings()

MovieLens also provides some demographic data on users along with the datasets. We will use _age_ and _gender_ in this tutorial to create different groups.

In [None]:
users = pd.read_csv(ml100k_user_filepath, sep='|', header=None,
                    names=['user', 'age', 'gender', 'occupation', 'zip'])

## Explore Data

In [None]:
users.head()

In [None]:
users.age.hist()

Let's define 2 x 6 user groups by splitting on gender and age bracket (see the advice [here](https://support.google.com/analytics/answer/2799357?hl=de)).

In [None]:
gender_groups = ['M', 'F']
age_groups = [(18, 24),
              (25, 34),
              (35, 44),
              (45, 54),
              (55, 64),
              (65, 73)]

user_groups = list(itertools.product(gender_groups, age_groups))
user_group_indices = range(len(user_groups))
user_groups = dict(zip(user_group_indices, user_groups))

In [None]:
user_groups

In [None]:
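def assign_group(row, age_groups=age_groups):
    # Return the (gender, age bracket) pair for one user row.
    # Pick the first bracket whose upper bound covers the age, so ages below
    # the first bracket fall into the first bracket. If no bracket matches,
    # the loop ends on the last bracket, which catches ages above all bounds.
    for age_group in age_groups:
        if row['age'] <= age_group[1]:
            break
    return (row['gender'], age_group)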

In [None]:
users['group'] = users.apply(lambda row: assign_group(row, age_groups), axis=1)

In [None]:
users['group'] = users['group'].map(lambda val: list(user_groups.values()).index(val))

In [None]:
users['group'].value_counts()
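
As an alternative to the row-wise loop, the age bracketing can be sketched with `pd.cut`. This is a minimal illustration on made-up toy data, not the notebook's own method; the names `toy_users`, `age_edges` and `age_labels` are hypothetical, and the bin edges assume non-overlapping brackets ending at 24, 34, 44, 54, 64 and 73.

```python
import pandas as pd

# Toy stand-in for the `users` frame (values are made up for illustration).
toy_users = pd.DataFrame({
    'user': [1, 2, 3, 4],
    'age': [18, 24, 25, 70],
    'gender': ['M', 'F', 'M', 'F'],
})

# Bin edges are right-inclusive: (17, 24], (24, 34], ..., (64, 73].
age_edges = [17, 24, 34, 44, 54, 64, 73]
age_labels = ['18-24', '25-34', '35-44', '45-54', '55-64', '65-73']

# Assign every age to its bracket in one vectorised call.
toy_users['age_group'] = pd.cut(toy_users['age'], bins=age_edges, labels=age_labels)

print(toy_users[['age', 'gender', 'age_group']])
```

With `pd.cut`, ages outside all bins become `NaN`, which makes out-of-range users easy to spot before they are assigned to a group.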

**Task**: For each group, we generate popularity recommendations based on the group's historical viewing popularity. Compute `group_popularities` as a mapping from group index to an array of items ordered by their popularity within that group.

In [None]:
group_popularities = dict.fromkeys(user_group_indices)

In [None]:
for group_idx in user_group_indices:
    pass

In [None]:
group_popularities
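
One possible way to approach this task is sketched below on toy data. It assumes the ratings are available as a DataFrame with `user` and `item` columns and that popularity means the number of ratings an item received within a group; `toy_ratings`, `toy_user_group` and `toy_group_popularities` are hypothetical names, and how the training ratings are exposed by `Dataset` is not shown here.

```python
import pandas as pd

# Hypothetical toy data; in the notebook these would come from the training
# ratings and from the user-to-group assignment.
toy_ratings = pd.DataFrame({
    'user': [1, 1, 2, 2, 3, 3, 3, 4],
    'item': [10, 20, 10, 30, 20, 30, 40, 40],
})
toy_user_group = {1: 0, 2: 0, 3: 1, 4: 1}

# Attach each rating to the group of the user who gave it.
toy_ratings['group'] = toy_ratings['user'].map(toy_user_group)

# Count ratings per item within each group; value_counts() sorts the items
# by descending count, so its index is the popularity ordering.
toy_group_popularities = {
    group: frame['item'].value_counts().index.values
    for group, frame in toy_ratings.groupby('group')
}

print(toy_group_popularities)
```

Items with equal counts may appear in any order, so ties would need an explicit rule if a reproducible ranking matters.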

**Task:** Adapt the recommendation method from the popularity recommendation notebook accordingly and compute the $MAP@10$ for demographic recommendations.

In [None]:
user_group_map = dict(zip(users['user'].values, users['group'].values))

In [None]:
def get_recommendations(user: int,
                        user_ratings: dict,
                        user_group_map: dict,
                        group_popularities: dict,
                        N: int) -> List[int]:
    pass
    
    return recommendations
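
A minimal sketch of how such a function might look is given below. It assumes that `user_ratings[user]` is a dict keyed by item id and that each entry of `group_popularities` is an array of item ids ordered from most to least popular; both are assumptions about structures not shown in this notebook, and `get_recommendations_sketch` is a hypothetical name.

```python
from typing import Dict, List

import numpy as np


def get_recommendations_sketch(user: int,
                               user_ratings: Dict[int, Dict[int, float]],
                               user_group_map: Dict[int, int],
                               group_popularities: Dict[int, np.ndarray],
                               N: int) -> List[int]:
    # Items the user has already rated (assumed: a dict keyed by item id).
    known_items = set(user_ratings.get(user, {}))
    # Popularity ordering of the user's demographic group.
    ranked_items = group_popularities[user_group_map[user]]
    # Keep the group's ordering, drop known items, then truncate to N.
    recommendations = [int(item) for item in ranked_items
                       if item not in known_items]
    return recommendations[:N]


# Toy inputs (made up for illustration).
toy_user_ratings = {1: {10: 4.0}, 2: {}}
toy_user_group_map = {1: 0, 2: 0}
toy_group_popularities = {0: np.array([10, 20, 30])}

print(get_recommendations_sketch(1, toy_user_ratings, toy_user_group_map,
                                 toy_group_popularities, N=2))
```

Filtering out already-rated items mirrors the usual popularity baseline; whether that filter is wanted here depends on how the earlier notebook defined its recommendations.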

## Evaluating the Relevance of Recommendations

In [None]:
relevant_items = get_relevant_items(data.test_ratings)

Computing $MAP@10$

In [None]:
N = 10

In [None]:
users = relevant_items.keys()
prec_at_N = dict.fromkeys(users)

for user in users:
    recommendations = get_recommendations(user,
                                          user_ratings,
                                          user_group_map,
                                          group_popularities,
                                          N=N)
    hits = np.intersect1d(recommendations,
                          relevant_items[user])
    prec_at_N[user] = len(hits)/N

In [None]:
np.mean(list(prec_at_N.values()))

What is the $MAP@10$ for each specific group?

In [None]:
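# One independent list per group; a shared default list would mix all groups.
group_maps = {group: [] for group in user_group_indices}
# Collect each user's precision under the user's group.
for user in users:
    group_maps[user_group_map[user]].append(prec_at_N[user])
# Average the collected precisions per group.
for group in user_group_indices:
    group_maps[group] = np.mean(group_maps[group])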

In [None]:
group_maps
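
The per-group averaging can also be sketched with a pandas `groupby`, which additionally reports how many users back each mean. The values below are made up, and `toy_prec_at_N`, `toy_user_group_map` and `toy_group_means` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Toy per-user precision values and group assignments (made up).
toy_prec_at_N = {1: 0.2, 2: 0.4, 3: 0.0}
toy_user_group_map = {1: 0, 2: 0, 3: 1}

toy_scores = pd.DataFrame({
    'user': list(toy_prec_at_N.keys()),
    'prec': list(toy_prec_at_N.values()),
})
toy_scores['group'] = toy_scores['user'].map(toy_user_group_map)

# Mean precision and number of evaluated users per group.
toy_group_means = toy_scores.groupby('group')['prec'].agg(['mean', 'count'])

print(toy_group_means)
```

The `count` column helps judge how reliable each group's mean is, since small groups produce noisy averages.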