# Laboratorium 5 - rekomendacje grupowe

## Przygotowanie

 * pobierz i wypakuj dataset: https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
   * więcej możesz poczytać tutaj: https://grouplens.org/datasets/movielens/
 * [opcjonalnie] Utwórz wirtualne środowisko
 `python3 -m venv ./recsyslab5`
 * zainstaluj potrzebne biblioteki:
 `pip install numpy pandas matplotlib`

## Część 1. - przygotowanie danych

In [42]:
# importujemy wszystkie potrzebne pakiety

import math
import numpy as np
import pandas

from random import choice, sample
from statistics import mean, stdev

from reco_utils import *

In [43]:
# wczytujemy oceny uytkownikow i obliczamy (za pomocą collaborative filtering) wszystkie przewidywane oceny filmow

raw_ratings = pandas.read_csv('ml-latest-small/ratings.csv').drop(columns=['timestamp'])
movies = list(raw_ratings['movieId'].unique())
users = list(raw_ratings['userId'].unique())
ratings = get_predicted_ratings(raw_ratings)
ratings

Total error: 215458.061347808
Total error: 208517.29336294954
Total error: 202071.34512947357
Total error: 196070.1282819043
Total error: 190469.82199874998
Total error: 185231.95507165158
Total error: 180322.6422998679
Total error: 175711.94573161835
Total error: 171373.33759164895
Total error: 167283.24655695906
Total error: 163420.67275455536
Total error: 159766.85973492483
Total error: 156305.0139260574
Total error: 153020.06384528964
Total error: 149898.4527513547
Total error: 146927.95954026564
Total error: 144097.54358894107
Total error: 141397.20997767473
Total error: 138817.8921132275
Total error: 136351.3492567004
Total error: 133990.0768562638
Total error: 131727.22791136804
Total error: 129556.54386557947
Total error: 127472.29375029354
Total error: 125469.22048961095
Total error: 123542.49343436596
Total error: 121687.66632600778
Total error: 119900.64000310995
Total error: 118177.62925821809
Total error: 116515.13333341638
Total error: 114909.9096117162
Total error: 11335

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
1,7,6,6,5,7,6,6,6,6,7,...,7,4,8,6,7,6,6,6,6,4
2,0,0,6,10,9,0,3,10,0,0,...,8,7,8,3,5,9,4,5,0,10
3,4,5,5,9,4,10,7,5,8,7,...,9,9,10,10,8,0,9,5,3,7
4,8,6,7,5,4,8,7,7,7,8,...,7,5,10,7,4,7,4,7,7,5
5,8,10,10,6,10,4,4,1,0,6,...,10,8,8,1,0,10,7,5,2,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,6,7,6,6,6,6,7,6,6,6,...,5,6,5,5,6,6,6,6,6,6
607,9,6,0,10,9,9,4,1,4,4,...,5,3,9,8,8,3,10,5,4,10
608,5,6,6,6,6,6,6,6,6,6,...,5,6,5,5,6,6,6,5,6,6
609,3,4,2,6,5,2,6,0,8,4,...,4,10,3,6,10,0,10,3,10,6


In [44]:
# wczytujemy nazwy filmow i kategorie

movies_metadata = pandas.read_csv('ml-latest-small/movies.csv').set_index('movieId')
movies_metadata

Unnamed: 0_level_0,title,genres
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
...,...,...
193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
193585,Flint (2017),Drama
193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


In [45]:
# wczytujemy przykladowe grupy uzytkownikow
groups = pandas.read_csv('groups.csv', header=None).values.tolist()
groups

[[606, 274, 474, 599, 448],
 [111, 307, 474, 599, 414],
 [469, 182, 232, 448, 600],
 [508, 581, 497, 402, 566],
 [300, 515, 245, 568, 507],
 [2, 371, 252, 518, 37],
 [269, 360, 469, 287, 308],
 [243, 527, 418, 118, 370],
 [186, 559, 327, 553, 314]]

In [46]:
# przygotowujemy funkcje pomocnicza

def describe_group(group, N=10):
    print(f'\n\nUser ids: {group}')
    group_size = len(group)
    
    mean_stdev = ratings.loc[group].std(axis=0).mean()
    median_stdev = ratings.loc[group].std(axis=0).median()
    std_stdev = ratings.loc[group].std(axis=0).std()
    print(f'\nMean ratings deviation: {mean_stdev}')
    print(f'Median ratings deviation: {median_stdev}')
    print(f'Standard deviation of ratings deviation: {std_stdev}')
    
    average_scores = ratings.iloc[group].mean(axis=0)
    average_scores = average_scores.sort_values()
    best_movies = [(movies_metadata['title'][movie_id], average_scores[movie_id]) for movie_id in list(average_scores[-N:].index)]
    worst_movies = [(movies_metadata['title'][movie_id], average_scores[movie_id]) for movie_id in list(average_scores[:N].index)]
    
    print('\nBest movies:')
    for movie, score in best_movies[::-1]:
        print(f'{movie}, {score}*')
    print('\nWorst movies:')
    for movie, score in worst_movies:
        print(f'{movie}, {score}*')

describe_group(groups[2])



User ids: [469, 182, 232, 448, 600]

Mean ratings deviation: 0.7382431733751043
Median ratings deviation: 0.7071067811865476
Standard deviation of ratings deviation: 0.377500371102978

Best movies:
Heidi Fleiss: Hollywood Madam (1995), 10.0*
Songs From the Second Floor (Sånger från andra våningen) (2000), 10.0*
Nasu: Summer in Andalusia (2003), 10.0*
Che: Part Two (2008), 10.0*
Sightseers (2012), 10.0*
Funny Girl (1968), 10.0*
Spotlight (2015), 10.0*
London Has Fallen (2016), 9.8*
Winter Light (Nattvardsgästerna) (1963), 9.8*
Paper Towns (2015), 9.8*

Worst movies:
Miss Congeniality 2: Armed and Fabulous (2005), 0.6*
Days of Being Wild (A Fei jingjyuhn) (1990), 0.6*
Saturday Night Fever (1977), 0.8*
Imaginarium of Doctor Parnassus, The (2009), 0.8*
Double Impact (1991), 1.0*
Lightning in a Bottle (2004), 1.0*
Dressed to Kill (1980), 1.0*
Topkapi (1964), 1.0*
Elizabeth (1998), 1.0*
Magdalene Sisters, The (2002), 1.0*


## Część 2. - algorytmy proste

In [47]:
# zdefiniujmy interfejs dla wszystkich algorytmow rekomendacyjnych

class Recommender:
    def recommend(self, movies, ratings, group, size):
        pass


# jako pierwszy zaimplementujemy algorytm losowy - dla porownania
    
class RandomRecommender(Recommender):
    def __init__(self):
        self.name = 'random'
        
    def recommend(self, movies, ratings, group, size):
        return sample(movies, size)

In [48]:
# algorytm rekomendujacy filmy o najwyzszej sredniej ocen

class AverageRecommender(Recommender):
    def __init__(self):
        self.name = 'average'
    
    def recommend(self, movies, ratings, group, size):
        selected_rows = ratings.loc[ratings.index.isin(group)]
        movies_averages = dict()
        for column in selected_rows.columns:
            movies_averages[column] = selected_rows[column].mean()
        best_movies = [score[0] for score in sorted(movies_averages.items(), key=lambda x: x[1], reverse=True)[:size]]
        return best_movies            

In [49]:
# algorytm rekomendujacy filmy o najwyzszej sredniej ocen,
#   ale rownoczesnie wykluczajacy te filmy, ktore otrzymaly choc jedna ocene ponizej thresholdu

class AverageWithoutMiseryRecommender(Recommender):
    def __init__(self, score_threshold):
        self.name = 'average_without_misery'
        self.score_threshold = score_threshold
        
    def recommend(self, movies, ratings, group, size):
        selected_rows = ratings.loc[ratings.index.isin(group)]
        movies_averages = dict()
        for column in selected_rows.columns:
            if (selected_rows[column] < self.score_threshold).any():
                continue
            else :
                movies_averages[column] = selected_rows[column].mean()
        best_movies = [score[0] for score in sorted(movies_averages.items(), key=lambda x: x[1], reverse=True)[:size]]
        return best_movies

In [50]:
# algorytm uwzgledniajacy preferencje tylko jednego uzytkownika w kazdej iteracji

class FairnessRecommender(Recommender):
    def __init__(self):
        self.name = 'fairness'
        
    def recommend(self, movies, ratings, group, size):
        selected_rows = ratings.loc[ratings.index.isin(group)]
        best_movies = []
        group_size = len(group)
        for i in range(size):
            user = group[i % group_size]
            found = False
            counter = 1
            while not found and counter < size:
                max_columns = selected_rows.loc[user].nlargest(counter).index
                for column in max_columns:
                    if column not in best_movies:
                        best_movies.append(column)
                        found = True
                        break
                counter += 1
        return best_movies

In [51]:
# wybrany algorytm wyborczy (dyktatura, glosowanie proste, Borda, Copeland)

class VotingRecommender(Recommender):
    def __init__(self, score_threshold):
        self.name = 'simple_voter'
        self.threshold = score_threshold
    
    def recommend(self, movies, ratings, group, size):
        selected_rows = ratings.loc[ratings.index.isin(group)]
        movies_scores = dict()
        for column in selected_rows.columns:
            movies_scores[column] = selected_rows[column].apply(lambda col: (col > self.threshold)).sum()
        best_movies = [score[0] for score in sorted(movies_scores.items(), key=lambda x: x[1], reverse=True)[:size]]
        return best_movies       

In [52]:
# algorytm zachlanny, aproksymujacy metode Proportional Approval Voting
#   w kazdej iteracji wybieramy ten film, ktory najbardziej zwieksza zadowolenie zgodnie z punktacja PAV

class ProportionalApprovalVotingRecommender(Recommender):
    def __init__(self, threshold):
        self.threshold = threshold
        self.name = 'PAV'

    def recommend(self, movies, ratings, group, size):
        selected_rows = ratings.loc[ratings.index.isin(group)]
        scores = {user: 1 for user in group}
        best_movies = []
        for _ in range(size):
            movies_scores = {movie: 1 for movie in movies}
            for movie in [col for col in selected_rows.columns if col not in best_movies]:
                for user in group:
                    if selected_rows.loc[user, movie] > self.threshold:
                        movies_scores[movie] += (1 / scores[user])

            best_movie = sorted(movies_scores.items(),
                                key=lambda x: x[1], reverse=True)[0][0]
            best_movies.append(best_movie)
            for user in group:
                if selected_rows.loc[user, best_movie] > self.threshold:
                    scores[user] += 1
        return best_movies

## Część 3. - funkcje celu

In [78]:
# dwie funkcje pomocnicze:
#  - znajdujaca ulubione filmy danego uzytkownika
#  - obliczajaca sume ocen wystawionych przez uzytkownika wszystkim filmom w rekomendacji

def top_n_movies_for_user(ratings, movies, user_id, n):
    return ratings.loc[user_id].nlargest(n).index

def total_score(recommendation, user_id, ratings):
    return ratings.loc[user_id, recommendation].sum()

In [81]:
def overall_user_satisfaction(recommendation, user_id, movies, ratings):
    recommendation_score = total_score(recommendation, user_id, ratings)
    best_score = total_score(top_n_movies_for_user(
 ratings, movies, user_id, len(recommendation)), user_id, ratings)
    if best_score == 0:
        return 1
    else:
        return recommendation_score / best_score


def overall_group_satisfaction(recommendation, group, movies, ratings):
    if len(group) == 0:
        return 0
    score = 0
    for user in group:
        score += overall_user_satisfaction(recommendation,
                                           user, movies, ratings)
    return score / len(group)


def group_disagreement(recommendation, group, movies, ratings):
    if len(group) == 0:
        return 0
    max = 0
    min = 2
    for user in group:
        score = overall_user_satisfaction(
            recommendation, user, movies, ratings)
        if score > max:
            max = score
        if score < min:
            min = score
    return max - min

## Część 4. - Sequential Hybrid Aggregation

In [69]:
# algorytm balansujacy pomiedzy wyborem elementow o najwyzszej sredniej ocen
#   i o najwyzszej minimalnej ocenie
#   score(G, d_z, j) = (1 - alfa) * avgScore(G,d_z,j)   +   alfa * minScore(G,d_z,j)
#  a minScore(G, d_z, j) to minimalna ocena filmu d_z przez uzytkownikow z grupy G w iteracji j
#  a avgScore(G, d_z, j) to srednia ocena filmu d_z przez uzytkownikow z grupy G w iteracji j
#  a alfa jest równy group_disagreement z poprzedniej iteracji (nie jest parametrem, tylko wyliczane na bieżąco)

class SequentialHybridAggregationRecommender(Recommender):
    def __init__(self):
        self.name = 'balanced'
    
    def recommend(self, movies, ratings, group, size):
        selected_rows = ratings.loc[ratings.index.isin(group)]
        alfa = 0
        best_movies = []
        for _ in range(size):
            movies_scores = dict()
            for movie in [col for col in selected_rows.columns if col not in best_movies]:
                movies_scores[movie] = (1 - alfa) * selected_rows[movie].mean() + alfa * selected_rows[movie].min()
            best_movie = sorted(movies_scores.items(),key=lambda x: x[1], reverse=True)[0][0]
            best_movies.append(best_movie)
            alfa = group_disagreement(best_movies, group, movies, ratings)
        return best_movies

## Część 5. - porównanie algorytmów

In [82]:
recommenders = [
    RandomRecommender(),
    AverageRecommender(),
    AverageWithoutMiseryRecommender(5),
    FairnessRecommender(),
    VotingRecommender(5),
    ProportionalApprovalVotingRecommender(5),
    SequentialHybridAggregationRecommender()
]

recommendation_size = 10

# dla kazdego algorytmu:
#  - wygenerujmy jedna rekomendacje dla kazdej grupy
#  - obliczmy wartosci obu funkcji celu dla kazdej rekomendacji
#  - obliczmy srednia i odchylenie standardowe dla obu funkcji celu
#  - wypiszmy wyniki na konsole

for recommender in recommenders:
    print(f'\n\n{recommender.name}')
    group_satisfaction = []
    group_disagreements = []
    for group in groups:
        recommendation = recommender.recommend(movies, ratings, group, recommendation_size)
        group_satisfaction.append(overall_group_satisfaction(recommendation, group, movies, ratings))
        group_disagreements.append(group_disagreement(recommendation, group, movies, ratings))
    print(f'Group satisfaction: {mean(group_satisfaction)} +- {stdev(group_satisfaction)}')
    print(f'Group disagreement: {mean(group_disagreements)} +- {stdev(group_disagreements)}')



random
Group satisfaction: 0.6599890260631002 +- 0.12556941917676964
Group disagreement: 0.1763139329805996 +- 0.1267237354477698


average
Group satisfaction: 0.9581446208112875 +- 0.02748050620884482
Group disagreement: 0.08436507936507937 +- 0.05774509039609002


average_without_misery
Group satisfaction: 0.9581446208112875 +- 0.02748050620884482
Group disagreement: 0.08436507936507937 +- 0.05774509039609002


fairness
Group satisfaction: 0.7193788947677836 +- 0.12593397605389126
Group disagreement: 0.18085537918871253 +- 0.08149717261133048


simple_voter
Group satisfaction: 0.8300883793846756 +- 0.033789798761796774
Group disagreement: 0.1420546737213404 +- 0.09548130688342475


PAV
Group satisfaction: 0.8140124436605919 +- 0.03811234492663847
Group disagreement: 0.1082451499118166 +- 0.08173203035860623


balanced
Group satisfaction: 0.9582398589065255 +- 0.027326962465283544
Group disagreement: 0.08706349206349208 +- 0.059582352163559345
