### Algorithms to deal with repeat recommendations


In this notebook we review some basic algorithms for dealing with repetitions in a recommender system. For illustration, we use a database of user behaviour on a movie website. The example is taken from "Practical Recommender Systems" by Kim Falk. To replicate the example, follow the instructions for building the application locally: https://github.com/practical-recommender-systems/moviegeek

In [27]:
from builder.implicit_ratings_calculator import (
    query_log_for_users, 
    query_aggregated_log_data_for_user)
import pandas as pd
from collections import defaultdict
import random

Here is an example of the events generated by a single user:

In [13]:
df = pd.DataFrame(list(query_aggregated_log_data_for_user('400003').values()))
df.head()

Unnamed: 0,id,created,user_id,content_id,event,session_id,count
0,7,2023-01-04 16:35:45,400003,4501244,details,794773,1
1,8,2023-01-04 16:35:45,400003,3521164,moreDetails,794773,1
2,18,2023-01-04 16:35:45,400003,2241351,addToList,794773,1
3,22,2023-01-04 16:35:45,400003,1700841,moreDetails,794773,1
4,24,2023-01-04 16:35:45,400003,1608290,moreDetails,794773,1


Now we use a simple algorithm to generate recommendations for this user:

In [55]:
def calculate_implicit_ratings_for_user(user_id):
    
    w1 = 100  # weight of a 'buy' event
    w2 = 50   # weight of a 'details' event
    w3 = 15   # weight of a 'moredetails' event

    data = query_aggregated_log_data_for_user(user_id)

    agg_data = dict()
    max_rating = 0

    for row in data:
        content_id = str(row['content_id'])
        if content_id not in agg_data.keys():
            agg_data[content_id] = defaultdict(int)

        agg_data[content_id][row['event']] = row['count']

    ratings = dict()
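    for k, v in agg_data.items():

        # Note: the sample log above records this event as 'moreDetails';
        # because v is a defaultdict, a key that does not match contributes 0.
        rating = w1 * v['buy'] + w2 * v['details'] + w3 * v['moredetails']
        max_rating = max(max_rating, rating)

        ratings[k] = rating

    # Scale ratings to the 0-10 range; skip when all ratings are 0
    # to avoid dividing by zero.
    if max_rating > 0:
        for content_id in ratings.keys():
            ratings[content_id] = 10 * ratings[content_id] / max_rating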
        
    return ratings
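
The rating logic can also be tried without the MovieGEEK database. The following is a minimal sketch under stated assumptions: `implicit_ratings_from_rows`, `WEIGHTS` and `toy_rows` are hypothetical names, the rows are made up, and the event key uses the `moreDetails` spelling seen in the sample log.

```python
from collections import defaultdict

# Hypothetical weights, mirroring w1, w2 and w3 above.
WEIGHTS = {"buy": 100, "details": 50, "moreDetails": 15}


def implicit_ratings_from_rows(rows, weights=WEIGHTS):
    """Turn aggregated (content_id, event, count) rows into 0-10 ratings."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        counts[str(row["content_id"])][row["event"]] += row["count"]

    # Weighted sum of event counts per item; unknown events weigh 0.
    raw = {
        item: sum(weights.get(event, 0) * n for event, n in events.items())
        for item, events in counts.items()
    }

    # Normalize so the strongest item gets 10; leave all-zero input untouched.
    top = max(raw.values(), default=0)
    if top == 0:
        return raw
    return {item: 10 * value / top for item, value in raw.items()}


# Made-up rows, for illustration only.
toy_rows = [
    {"content_id": 1, "event": "buy", "count": 1},
    {"content_id": 1, "event": "details", "count": 2},
    {"content_id": 2, "event": "details", "count": 1},
    {"content_id": 3, "event": "moreDetails", "count": 2},
]
toy_ratings = implicit_ratings_from_rows(toy_rows)
```

Item 1 has the largest weighted sum (200), so it is scaled to 10 and the other items are expressed relative to it.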

In [116]:
def get_recommendations(user_id):
    d = calculate_implicit_ratings_for_user(user_id)
    sorted_by_value = sorted(d.items(), key=lambda item: item[1])
    
    return sorted_by_value[::-1]

In [117]:
rec = get_recommendations('400003')

In [118]:
rec[:10]

[('4513674', 10.0),
 ('3783958', 9.519774011299434),
 ('3381008', 9.40677966101695),
 ('3110958', 9.322033898305085),
 ('1878870', 9.307909604519773),
 ('2005151', 9.138418079096045),
 ('5512872', 8.841807909604519),
 ('2869728', 8.502824858757062),
 ('1289401', 7.7824858757062145),
 ('4651520', 7.5141242937853105)]

### Impression discount

In [None]:
def apply_impression_discount(scores, impressions, decay_factor=0.5):
    discounted_scores = {}
    for item, score in scores.items():
        num_impressions = impressions.get(item, 0)
        discounted_score = score * (decay_factor ** num_impressions)
        discounted_scores[item] = discounted_score
    return discounted_scores
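
As an illustration of where the `impressions` argument might come from, here is a small sketch that counts impressions from a list of already shown item ids and applies the same exponential decay. The `shown_items` log and the scores are made up, and the assumption is that impressions are tracked per user.

```python
from collections import Counter

# Hypothetical log of item ids already shown to the user.
shown_items = ["4513674", "4513674", "3783958", "4513674"]
impressions = Counter(shown_items)

# Made-up scores, for illustration only.
scores = {"4513674": 10.0, "3783958": 8.0, "3381008": 6.0}

# Same rule as apply_impression_discount: each impression multiplies
# the score by decay_factor.
decay_factor = 0.5
discounted = {
    item: score * decay_factor ** impressions.get(item, 0)
    for item, score in scores.items()
}

# Items ordered from highest to lowest discounted score.
ranking = sorted(discounted, key=discounted.get, reverse=True)
```

With these numbers the item shown three times drops from first to last place, while the item never shown keeps its score.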

### Frequency capping

In [44]:
def frequency_capping(scores, impressions, max_impressions_per_item):
    capped_scores = {}
    for item, score in scores.items():
        num_impressions = impressions.get(item, 0)
        if num_impressions < max_impressions_per_item:
            capped_scores[item] = score
    return capped_scores
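
Unlike the impression discount, which only demotes items, capping removes them from the candidate set. A minimal sketch of capping followed by a top-N cut could look like this; `cap_then_rank` is a hypothetical helper and the scores and impression counts are made up.

```python
def cap_then_rank(scores, impressions, max_impressions_per_item, top_n=10):
    """Drop items shown too often, then return the top_n remaining by score."""
    eligible = {
        item: score
        for item, score in scores.items()
        if impressions.get(item, 0) < max_impressions_per_item
    }
    ranked = sorted(eligible.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Made-up data: "a" has already reached the cap of 3 impressions.
scores = {"a": 9.0, "b": 8.0, "c": 7.0}
impressions = {"a": 3, "b": 1}
top = cap_then_rank(scores, impressions, max_impressions_per_item=3, top_n=2)
```

Here "a" is excluded even though it has the highest score, so the list is filled with the next best items.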

### First strategy: Hacker News algorithm

This strategy re-ranks the recommendations by applying a discount at the item level. The strength of the discount is set by an exponent called gravity, and the quantity it penalizes (age in the code below) can be any item-level metric.

In [119]:
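def hn_algo(score, user_id='400003'):
    # Hacker News-style ranking: (score - 1) / (age + 2) ** gravity.
    # user_id is not used yet; age and gravity are hard-coded placeholders.
    gravity = 1.8
    age = 10
    result = (score - 1) / pow(age + 2, gravity)

    return result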
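
With a constant `age`, every score is divided by the same number, so the order of the items cannot change. The sketch below shows how the same formula might re-rank items once each one has its own age. It assumes a hypothetical `ages` mapping (for example, time since the item was last recommended, in any consistent unit); `hn_rerank` and the data are illustrative and not part of MovieGEEK.

```python
def hn_rerank(scores, ages, gravity=1.8):
    """Re-rank items with (score - 1) / (age + 2) ** gravity, using per-item ages."""
    ranked = {
        item: (score - 1) / (ages.get(item, 0) + 2) ** gravity
        for item, score in scores.items()
    }
    return sorted(ranked.items(), key=lambda pair: pair[1], reverse=True)


# Made-up data: "old_hit" scores higher but is much older than "fresh".
scores = {"old_hit": 10.0, "fresh": 6.0}
ages = {"old_hit": 10, "fresh": 0}
reranked = hn_rerank(scores, ages)
```

In this toy case the newer item overtakes the older one despite its lower score, which is the effect the gravity term is meant to produce.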

In [122]:
new_recs = {k[0]:hn_algo(k[1]) for k in rec}
new_recs = sorted(new_recs.items(), key=lambda item: item[1])[::-1]
new_recs[:10]

[('4513674', 9.988585056739463),
 ('3783958', 9.508359068038898),
 ('3381008', 9.395364717756413),
 ('3110958', 9.310618955044548),
 ('1878870', 9.296494661259237),
 ('2005151', 9.127003135835508),
 ('5512872', 8.830392966343982),
 ('2869728', 8.491409915496526),
 ('1289401', 7.771070932445678),
 ('4651520', 7.502709350524774)]

Note: the values above equal `score - 1/pow(age+2, gravity)`, that is, each original score lowered by about 0.0114, so this cell appears to have been run before the parentheses around `score-1` were added. With `hn_algo` as written, the top item would score about 0.103. The order is the same in both cases because `age` and `gravity` are constant across items.