# 1. Intro

- What is a recommender system?
    - A computer program that recommends the "best" items to users in different contexts
    - The notion of a "best" match is typically obtained by optimizing objective metrics such as 'clicks', 'impressions', 'GMV', 'CP', etc.

## 1.1 Overview of recommender systems for web services
- input 'signals'
    - what we know about items
    - what we know about users
        - user profiles
    - how users interacted with items
        - clickstream events
- processing
    - algorithmic techniques
- output 'objectives' & 'metrics'
    - objectives: push vs pull
        - pull: retrieving items that are relevant to the user's explicit information needs; normally associated with the sub-topic 'search'
        - push: showing items that are likely to engage users; the primary focus of 'recommendation'
    - metrics
        - clicks, impressions, GMV, purchases (usage) per 1000 impressions, CP
- pitfalls
    - cold start 
    - explore & exploit
## 1.2 Simple recommender

# 2. Classic Methods
## 2.1 Item Characterization

# 3. Explore-Exploit
# 4. Evaluation Methods
## 4.4 Offline replay
- https://github.com/sb-ai-lab/RePlay

- Similarity
    - significance weighting: a discount factor is applied to the similarity between two users when the number of ratings they have in common is below a threshold B
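
A common way to write this discount, assumed here because the notes do not spell out the formula, is to scale the similarity by min(n, B) / B, where n is the number of co-rated items. The function name `discounted_similarity`, its parameters, and the threshold values below are hypothetical and only serve as a minimal sketch.

```python
def discounted_similarity(sim, n_common, threshold_b=50):
    """Shrink a similarity score that rests on few co-rated items.

    Assumed form: sim * min(n_common, B) / B. The factor reaches 1 once
    n_common >= B, so well-supported similarities are left unchanged.
    """
    if threshold_b <= 0:
        raise ValueError("threshold_b must be positive")
    return sim * min(n_common, threshold_b) / threshold_b


# Perfect agreement on only 3 common items, with B = 10 (hypothetical values).
weak = discounted_similarity(1.0, n_common=3, threshold_b=10)
# The same similarity backed by 25 common ratings: no discount applies.
strong = discounted_similarity(1.0, n_common=25, threshold_b=10)
print(weak, strong)
```

With this form, a pair of users who happen to agree on a handful of items no longer outranks a pair whose similarity is supported by many common ratings.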

  

# Recommender systems


In [1]:
ratings = [
    [7, 6, 7, 4, 5, 4],
    [6, 7, None, 4, 3, 4],
    [None, 3, 3, 1, 1, None],
    [1, 2, 2, 3, 3, 4],
    [1, None, 1, 2, 3, 3]
]

In [4]:
def filter_out_none(l):
    return [x for x in l if x is not None]

In [6]:
means = [sum(filter_out_none(x))/len(filter_out_none(x)) for x in ratings]
means

[5.5, 4.8, 2.0, 2.5, 2.0]

In [41]:
def filter_both_exist(a, b):
    f_a = []
    f_b = []
    for i in range(0, len(a)):
        if a[i] is not None and b[i] is not None:
            f_a.append(a[i])
            f_b.append(b[i])
    return f_a, f_b

def calculate_pearson(a, b, m_a, m_b):
    # a, b: the two users' ratings on co-rated items only
    # m_a, m_b: each user's mean over all of their own ratings (not just the co-rated ones)
    numerator = sum((x - m_a) * (y - m_b) for x, y in zip(a, b))
    denominator = sum((x - m_a) ** 2 for x in a) ** (1/2) * sum((y - m_b) ** 2 for y in b) ** (1/2)

    return numerator / denominator


def calculate_cosine(a, b):
    numerator = sum([(x)*(y) for x, y in zip(a, b)])
    denominator = sum([(x)**2 for x in a]) ** (1/2) * sum([(x)**2 for x in b]) ** (1/2)

    return numerator / denominator
 
    

In [43]:
for i in range(0, len(ratings)):
    pearson = calculate_pearson(*filter_both_exist(ratings[i], ratings[2]), means[i], means[2])
    print(f"pearson: {i} to {2} {pearson}")

print() 

for i in range(0, len(ratings)):
    cosine = calculate_cosine(*filter_both_exist(ratings[i], ratings[2]))
    print(f"cosine: {i} to {2} {cosine}")
    

pearson: 0 to 2 0.8944271909999159
pearson: 1 to 2 0.9384742644069303
pearson: 2 to 2 1.0
pearson: 3 to 2 -1.0
pearson: 4 to 2 -0.8164965809277259

cosine: 0 to 2 0.9561828874675148
cosine: 1 to 2 0.9813994921258943
cosine: 2 to 2 0.9999999999999998
cosine: 3 to 2 0.7893522173763263
cosine: 4 to 2 0.6446583712203042


In [35]:
(1.5**2 + 1.5 ** 2+ 1.5 ** 2 + 0.5 ** 2) ** (1/2)

2.6457513110645907

In [34]:
1.5 ** 2

2.25

In [44]:
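# user 3, item 1: average of the raw ratings from users 1 and 2, weighted by their Pearson similarity to user 3
r_hat_31 = (7 * 0.894 + 6 * 0.939) / (0.894 + 0.939)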
print(r_hat_31)

6.48772504091653


In [45]:
# user 3, item 6: same weighted average of the raw ratings from users 1 and 2
r_hat_36 = (4 * 0.894 + 4 * 0.939) / (0.894 + 0.939)
print(r_hat_36)

4.0


In [46]:
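# mean-centered: user 3's mean (2) plus the weighted average of the neighbors'
# deviations from their own means (7 - 5.5 = 1.5 and 6 - 4.8 = 1.2)
r_hat_31 = 2 + (1.5 * 0.894 + 1.2 * 0.939) / (0.894 + 0.939)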
print(r_hat_31)

3.346317512274959
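
The hand computations in the last three cells can be generalized. The sketch below is one possible way to do so, assuming the neighbors are all users with a positive Pearson similarity who rated the target item. `user_mean`, `pearson`, and `predict_user_based` are hypothetical names; the ratings matrix and the similarity convention are copied from the cells above so that the block runs on its own.

```python
ratings = [
    [7, 6, 7, 4, 5, 4],
    [6, 7, None, 4, 3, 4],
    [None, 3, 3, 1, 1, None],
    [1, 2, 2, 3, 3, 4],
    [1, None, 1, 2, 3, 3],
]


def user_mean(row):
    known = [x for x in row if x is not None]
    return sum(known) / len(known)


def pearson(a, b):
    # Deviations use each user's mean over all of their own ratings,
    # summed over co-rated items only (same convention as calculate_pearson).
    m_a, m_b = user_mean(a), user_mean(b)
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    num = sum((x - m_a) * (y - m_b) for x, y in pairs)
    den = (sum((x - m_a) ** 2 for x, _ in pairs) ** 0.5
           * sum((y - m_b) ** 2 for _, y in pairs) ** 0.5)
    return num / den if den else 0.0


def predict_user_based(ratings, user, item, mean_centered=True):
    # Assumption: neighbors = users with positive similarity who rated `item`.
    target_mean = user_mean(ratings[user])
    num = den = 0.0
    for v, row in enumerate(ratings):
        if v == user or row[item] is None:
            continue
        sim = pearson(ratings[user], row)
        if sim <= 0:
            continue
        value = row[item] - user_mean(row) if mean_centered else row[item]
        num += sim * value
        den += sim
    if den == 0:
        return target_mean  # fallback when no usable neighbor exists
    return target_mean + num / den if mean_centered else num / den


# "User 3, item 1" in the 1-based naming of the notes is index (2, 0) here.
raw = predict_user_based(ratings, 2, 0, mean_centered=False)
centered = predict_user_based(ratings, 2, 0)
print(raw, centered)
```

Because the similarities are not rounded to three decimals here, the two values agree with the hand-computed cells only to about three decimal places.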


In [51]:
adjusted_ratings = [[e - means[idx] if e is not None else None for e in r ] for idx, r in enumerate(ratings)]
adjusted_ratings

[[1.5, 0.5, 1.5, -1.5, -0.5, -1.5],
 [1.2000000000000002,
  2.2,
  None,
  -0.7999999999999998,
  -1.7999999999999998,
  -0.7999999999999998],
 [None, 1.0, 1.0, -1.0, -1.0, None],
 [-1.5, -0.5, -0.5, 0.5, 0.5, 1.5],
 [-1.0, None, -1.0, 0.0, 1.0, 1.0]]

In [54]:
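# adjusted cosine between items 1 and 3: cosine over the mean-centered item columns,
# restricted to users who rated both items
item_1 = [r[0] for r in adjusted_ratings]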
item_3 = [r[2] for r in adjusted_ratings]
calculate_cosine(*filter_both_exist(item_1, item_3))

0.9116846116771036
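
The adjusted cosine above is the similarity that item-based collaborative filtering builds on. As a rough sketch, assuming the prediction is a similarity-weighted average of the user's own raw ratings on the k most similar items (k = 2 is an arbitrary choice), the missing rating of user 3 for item 1 could be estimated as follows. `mean_center`, `adjusted_cosine`, and `predict_item_based` are hypothetical helper names; the ratings are repeated so the block is self-contained.

```python
ratings = [
    [7, 6, 7, 4, 5, 4],
    [6, 7, None, 4, 3, 4],
    [None, 3, 3, 1, 1, None],
    [1, 2, 2, 3, 3, 4],
    [1, None, 1, 2, 3, 3],
]


def mean_center(ratings):
    # Subtract each user's mean from that user's known ratings.
    centered = []
    for row in ratings:
        known = [x for x in row if x is not None]
        m = sum(known) / len(known)
        centered.append([x - m if x is not None else None for x in row])
    return centered


def adjusted_cosine(centered, i, j):
    # Cosine between item columns i and j over users who rated both.
    pairs = [(r[i], r[j]) for r in centered
             if r[i] is not None and r[j] is not None]
    num = sum(x * y for x, y in pairs)
    den = (sum(x * x for x, _ in pairs) ** 0.5
           * sum(y * y for _, y in pairs) ** 0.5)
    return num / den if den else 0.0


def predict_item_based(ratings, user, item, k=2):
    centered = mean_center(ratings)
    # Score every other item this user has rated, then keep the k most similar.
    candidates = [
        (adjusted_cosine(centered, item, j), ratings[user][j])
        for j in range(len(ratings[user]))
        if j != item and ratings[user][j] is not None
    ]
    top = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    top = [(s, r) for s, r in top if s > 0]  # assumption: drop non-positive similarities
    if not top:
        return None
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


sim_13 = adjusted_cosine(mean_center(ratings), 0, 2)
pred = predict_item_based(ratings, user=2, item=0)
print(sim_13, pred)
```

For this matrix the two items most similar to item 1 are items 2 and 3, and user 3 rated both of them 3, so the estimate comes out at 3 up to floating-point error. That is noticeably lower than the raw user-based estimate, because it relies only on user 3's own ratings.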

- SVD: https://www.youtube.com/watch?v=xy3QyyhiuY4


In [55]:
ratings = [
    [1, 1, 1],
    [7, 7, 7],
    [3, 1, 1],
    [5, 7, 7],
    [3, 1, None],
    [5, 7, None],
    [3, 1, None],
    [5, 7, None],
    [3, 1, None],
    [5, 7, None],
    [3, 1, None],
    [5, 7, None]
]

In [57]:
for i in range(3):
    s = 0
    cnt = 0
    for j in range(12):
        if ratings[j][i] is not None:
            s += ratings[j][i]
            cnt += 1
    print(f"{i} - mean : {s / cnt}")

0 - mean : 4.0
1 - mean : 4.0
2 - mean : 4.0


In [62]:
# mean-fill: replace each missing rating with its column mean (4.0 for every column here)
filled_ratings = [[e if e is not None else 4.0 for e in row] for row in ratings]
filled_ratings

[[1, 1, 1],
 [7, 7, 7],
 [3, 1, 1],
 [5, 7, 7],
 [3, 1, 4.0],
 [5, 7, 4.0],
 [3, 1, 4.0],
 [5, 7, 4.0],
 [3, 1, 4.0],
 [5, 7, 4.0],
 [3, 1, 4.0],
 [5, 7, 4.0]]

In [64]:
import numpy as np
x = np.array(filled_ratings).T
np.cov(x)

array([[2.54545455, 4.36363636, 2.18181818],
       [4.36363636, 9.81818182, 3.27272727],
       [2.18181818, 3.27272727, 3.27272727]])

In [67]:
none_filtered_ratings = [x for x in ratings if all(e is not None for e in x)]
x = np.array(none_filtered_ratings).T
np.cov(x)

array([[ 6.66666667,  8.        ,  8.        ],
       [ 8.        , 12.        , 12.        ],
       [ 8.        , 12.        , 12.        ]])

In [68]:
none_filtered_ratings

[[1, 1, 1], [7, 7, 7], [3, 1, 1], [5, 7, 7]]
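
Both estimates above are distorted in some way: mean-filling shrinks the variance of the sparsely rated third item (3.27 instead of 12), while dropping incomplete rows throws away 8 of the 12 users for the first two items. One alternative, sketched here as an assumption rather than something these notes prescribe, is to estimate each covariance entry from only the users who rated both items. `pairwise_cov` is a hypothetical helper, and the ratings are repeated so the block runs on its own.

```python
import numpy as np

ratings = [
    [1, 1, 1], [7, 7, 7], [3, 1, 1], [5, 7, 7],
    [3, 1, None], [5, 7, None], [3, 1, None], [5, 7, None],
    [3, 1, None], [5, 7, None], [3, 1, None], [5, 7, None],
]


def pairwise_cov(ratings):
    # NaN marks a missing rating so NumPy can mask it.
    x = np.array(
        [[np.nan if e is None else e for e in row] for row in ratings],
        dtype=float,
    )
    n_items = x.shape[1]
    cov = np.full((n_items, n_items), np.nan)
    for i in range(n_items):
        for j in range(n_items):
            both = ~np.isnan(x[:, i]) & ~np.isnan(x[:, j])
            n = int(both.sum())
            if n < 2:
                continue  # not enough co-rated users; leave the entry as NaN
            a, b = x[both, i], x[both, j]
            # Sample covariance (divide by n - 1), matching np.cov's default.
            cov[i, j] = ((a - a.mean()) * (b - b.mean())).sum() / (n - 1)
    return cov


cov = pairwise_cov(ratings)
print(np.round(cov, 2))
```

The entries for the first two items use all 12 users and match the mean-filled result, while every entry involving the third item uses only the 4 users who rated it and matches the complete-rows result. A matrix assembled entry by entry like this is not guaranteed to be positive semi-definite, which matters if it is later fed into an eigendecomposition.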