# Importing libraries

In [1]:
import pandas as pd
import numpy as np
from tqdm import tqdm

from surprise import Reader, Dataset, SVD, accuracy
from surprise.model_selection.validation import cross_validate

from surprise.model_selection import train_test_split

# Reading data

In [2]:
data = pd.read_csv('data/dataset.csv')
data = data.sort_values(['timestamp'])

Here I use collaborative filtering with the Surprise library for Python: https://surprise.readthedocs.io

In [3]:
# making dataset
reader = Reader()
data = data.drop('timestamp', axis=1)
data = Dataset.load_from_df(data, reader) 

Let's make train (80%) and test (20%) datasets using the function from Surprise. Given how the task is designed, we don't need to shuffle: I think the author of the task wanted us to make predictions based on older observations. As shown in my "draft" notebook, the training data comes mostly from 1997 and the test data from 1998. This makes our predictions closer to real life, where we always have older observations and have to predict for the current ones.

In [4]:
train, test = train_test_split(data, test_size=0.2, shuffle=False)
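The idea of a chronological split can be illustrated on a toy table with plain pandas. This is only a sketch: the toy values are made up, and `rating` is an assumed name for the third column (`user_id`, `item_id` and `timestamp` are the names used in this notebook).

```python
import pandas as pd

# Toy ratings with made-up values; 'rating' is an assumed column name.
toy = pd.DataFrame({
    "user_id":   [1, 2, 1, 3, 2, 3, 1, 2, 3, 1],
    "item_id":   [10, 10, 11, 12, 13, 10, 14, 11, 13, 12],
    "rating":    [4, 5, 3, 2, 4, 5, 1, 3, 4, 5],
    "timestamp": [50, 10, 90, 30, 70, 20, 100, 40, 80, 60],
})

# Sort by time, then cut at 80%: older rows go to train, newer rows to test.
toy = toy.sort_values("timestamp").reset_index(drop=True)
cut = int(len(toy) * 0.8)
toy_train, toy_test = toy.iloc[:cut], toy.iloc[cut:]

print(len(toy_train), len(toy_test))
# Every training observation is older than every test observation.
print(toy_train["timestamp"].max(), toy_test["timestamp"].min())
```

With this kind of split the model never sees ratings that were made after the ones it is asked to predict.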

# Fitting the model

Our data does not have enough features to justify neural networks or other more sophisticated algorithms. That is why I suggest trying collaborative filtering here. Collaborative filtering is a technique that picks out items a user might like based on the reactions of similar users. We have users, items, and ratings (how the users rated the items). To avoid problems such as _scalability_ and _sparsity_, I propose using SVD (Singular Value Decomposition). As mentioned above, the beautiful Surprise framework provides this algorithm (and many more!).
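As a rough illustration of what the fitted model computes, the sketch below evaluates the prediction rule documented for Surprise's `SVD` (global mean plus user bias plus item bias plus the dot product of the user and item factor vectors), clipped to the rating scale. All numbers are made-up toy values, and `predict_rating` is a hypothetical helper, not part of Surprise.

```python
import numpy as np

# Toy parameters (made up); in the real model they are learned from the train set.
mu = 3.5                      # global mean rating
b_u = 0.2                     # user bias
b_i = -0.1                    # item bias
p_u = np.array([0.5, -0.2])   # user factors (2 factors here, 10 in this notebook)
q_i = np.array([0.4, 0.1])    # item factors


def predict_rating(mu, b_u, b_i, p_u, q_i, low=1.0, high=5.0):
    """Hypothetical helper: mu + b_u + b_i + q_i . p_u, clipped to [low, high]."""
    raw = mu + b_u + b_i + float(q_i @ p_u)
    return float(np.clip(raw, low, high))


toy_estimate = predict_rating(mu, b_u, b_i, p_u, q_i)
print(round(toy_estimate, 2))
```

The bounds 1 and 5 correspond to the default rating scale of `Reader()`.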

Let's fit the model with 10 latent factors.

In [5]:
svd = SVD(n_factors=10, random_state=17)
# fit
svd.fit(train)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x11e0420f0>

Now we can build predictions for our test dataset.

In [8]:
predictions = svd.test(test)

print(len(predictions))

20001


# Recommender

Our model can now estimate the rating for a given user and item. In our case we need to recommend the most relevant items to each user. I suggest keeping the top n items per user, that is, the items with the highest estimated ratings. So once we have the predictions, we sort each user's items by estimated rating and keep the first n.

In [9]:
from collections import defaultdict

In [14]:
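def first_recs(pred, n):
    """Return the n items with the highest estimated rating for each user."""
    first_n = defaultdict(list)
    # collect (item, estimated rating) pairs per user;
    # iterate over the `pred` argument, not the global `predictions`
    for user_id, item_id, _, estimation, _ in pred:
        first_n[user_id].append((item_id, estimation))

    # sort each user's items by estimated rating, highest first, and keep n
    for user_id, est_ratings in first_n.items():
        est_ratings.sort(key=lambda a: a[1], reverse=True)
        first_n[user_id] = est_ratings[:n]

    return first_n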

In [12]:
def recommend(user, first_n):
    return [x[0] for x in first_n[user]]

# Get first n items for each user

In [15]:
first_10 = first_recs(predictions, 10)
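To make the shape of the result concrete, here is a self-contained sketch of the same top-n idea on a few made-up tuples. The tuples follow the field order of Surprise predictions (user, item, true rating, estimate, details); `top_n` is a hypothetical variant that uses `heapq.nlargest` instead of a full sort.

```python
from collections import defaultdict
import heapq

# Made-up predictions: (user, item, true rating, estimated rating, details)
toy_predictions = [
    ("u1", "a", 4.0, 3.5, {}),
    ("u1", "b", 5.0, 4.8, {}),
    ("u1", "c", 2.0, 2.1, {}),
    ("u2", "a", 3.0, 4.0, {}),
]


def top_n(preds, n):
    """Hypothetical variant: keep the n highest-estimated items per user."""
    per_user = defaultdict(list)
    for uid, iid, _true, est, _details in preds:
        per_user[uid].append((iid, est))
    # nlargest returns the items sorted from highest to lowest estimate
    return {uid: heapq.nlargest(n, items, key=lambda p: p[1])
            for uid, items in per_user.items()}


toy_top2 = top_n(toy_predictions, 2)
print(toy_top2)
# {'u1': [('b', 4.8), ('a', 3.5)], 'u2': [('a', 4.0)]}
```

A user with fewer than n predictions simply gets a shorter list.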

# Metric

This function was provided by the task author, but we still need to understand what happens under the hood. Average precision rewards the model for uninterrupted runs of correct recommendations at the top of the list. If the first m recommended items are all correct, we get the maximum possible value for those m items. But if we miss at some position l < m, every later correct recommendation is rewarded less, because each hit contributes (hits so far) / (position).

In [16]:
def average_precision(actual, recommended, k=30):
    ap_sum = 0
    hits = 0
    for i in range(k):
        product_id = recommended[i] if i < len(recommended) else None
        if product_id is not None and product_id in actual:
            hits += 1
            ap_sum += hits / (i + 1)
    return ap_sum / k


def normalized_average_precision(actual, recommended, k=30):
    actual = set(actual)
    if len(actual) == 0:
        return 0.0

    ap = average_precision(actual, recommended, k=k)
    ap_ideal = average_precision(actual, list(actual)[:k], k=k)
    return ap / ap_ideal
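A small worked example shows how the position of a miss changes the value. The sketch restates the logic of `average_precision` in a compact, self-contained form (`toy_ap` is a hypothetical name) and uses made-up item ids with k=3.

```python
def toy_ap(actual, recommended, k):
    """Compact restatement of average_precision for illustration."""
    hits, ap_sum = 0, 0.0
    for pos, item in enumerate(recommended[:k], start=1):
        if item in actual:
            hits += 1
            ap_sum += hits / pos   # each hit adds (hits so far) / (position)
    return ap_sum / k


actual = {1, 2, 3}
perfect = toy_ap(actual, [1, 2, 3], k=3)     # (1/1 + 2/2 + 3/3) / 3
miss_at_2 = toy_ap(actual, [1, 9, 2], k=3)   # (1/1 + 2/3) / 3
miss_at_1 = toy_ap(actual, [9, 1, 2], k=3)   # (1/2 + 2/3) / 3

print(perfect, round(miss_at_2, 3), round(miss_at_1, 3))
# 1.0 0.556 0.389
```

Both imperfect lists contain two correct items, but the list that misses earlier scores lower.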

# Score

Okay, let's calculate the score and see if our solution worked.

In [18]:
data_for_score = pd.read_csv('data/dataset.csv')
data_for_score = data_for_score.sort_values(['timestamp'])

test_for_score = data_for_score[80000:]

In [20]:
scores = []
for user in tqdm(test_for_score['user_id'].unique()):
    actual = list(test_for_score[test_for_score['user_id'] == user]['item_id'])
    recommended = recommend(user, first_10)
    
    scores.append(normalized_average_precision(actual, recommended))

np.mean(scores)

100%|██████████| 301/301 [00:00<00:00, 1046.14it/s]


0.5061151854702257

We got 0.506 > 0.1. Awesome result, but I think we can improve the solution further. Let's discuss that in the next section.

# My suggestions for model improvement

First, I should note that improvement is not only about getting a high value of the metric. A model can score high on some data and low on other data, so we need to build a _stable_ model.

1. SVD has several hyperparameters that can, in theory, improve the result.
    1.1 For this I suggest running a grid search to find the best combination of hyperparameter values.
    1.2 The learning rates are also crucial in this algorithm and should be tuned.

2. In my "draft" notebook I spent time extracting information from the timestamp variable. I am sure this information could be useful for more sophisticated algorithms (for example, stacking different models that can use those features).
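The grid search from point 1 could be sketched with Surprise's `GridSearchCV`. The candidate values below are illustrative assumptions, not tuned recommendations, and `n_combinations` and `tune_svd` are hypothetical helper names. Two caveats: this search optimizes RMSE rather than the average-precision metric used above, and the default cross-validation folds are shuffled, so they do not respect the chronological split.

```python
from math import prod

# Illustrative candidate values (assumptions, not recommendations).
param_grid = {
    "n_factors": [10, 50],
    "n_epochs": [20, 40],
    "lr_all": [0.002, 0.005],   # learning rate shared by all parameters
    "reg_all": [0.02, 0.1],     # regularization shared by all parameters
}


def n_combinations(grid):
    """Number of models fitted per fold: product of the candidate counts."""
    return prod(len(values) for values in grid.values())


def tune_svd(data, cv=3):
    """Hypothetical helper: grid-search SVD on a Surprise Dataset by RMSE."""
    # Imported here so the grid can be inspected without Surprise installed.
    from surprise import SVD
    from surprise.model_selection import GridSearchCV

    search = GridSearchCV(SVD, param_grid, measures=["rmse"], cv=cv)
    search.fit(data)  # data: e.g. the object built with Dataset.load_from_df
    return search.best_params["rmse"], search.best_score["rmse"]


print(n_combinations(param_grid))
# 16
```

Calling `tune_svd(data)` with the `Dataset` object built earlier would fit 16 parameter combinations on each of the 3 folds, so the grid should be kept small.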