# nbgallery python recommender demo

We'll use the [Surprise](https://surprise.readthedocs.io) library (installed as `scikit-surprise`) to get the top-N notebook recommendations for each user.  This example is based on [this FAQ entry](https://surprise.readthedocs.io/en/stable/FAQ.html#how-to-get-the-top-n-recommendations-for-each-user); [this article](https://towardsdatascience.com/how-to-build-a-memory-based-recommendation-system-using-python-surprise-55f3257b2cf4) is also a good overview.

In [None]:
from collections import defaultdict

import numpy as np
import pandas as pd
import surprise  # pip install scikit-surprise

import nbgallery.database.dataframes as nbgdf

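Surprise can build a dataset from a dataframe containing user, item, and rating columns.  Our `clicks_rollup_pivot` gives us a good start; we just need to add an implicit rating based on how much a user has used a notebook.  The formula here is made up for this example: each count is normalized by its column maximum, executions are weighted twice as heavily as views, and the result is scaled to the range 0 to 1.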

In [None]:
df = nbgdf.clicks_rollup_pivot()
# Implicit rating in [0, 1]: max-normalized counts, executions weighted 2x views.
df['rating'] = ((df['executed'] / df['executed'].max() * 2) + (df['viewed'] / df['viewed'].max() * 1)) / 3
df
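
To make the weighting concrete, here is a minimal pure-Python sketch of the same made-up formula applied to a few hypothetical click counts. The function name `implicit_rating`, the sample rows, and the zero-maximum guard are assumptions for illustration and are not part of nbgallery.

```python
def implicit_rating(executed, viewed, max_executed, max_viewed,
                    w_executed=2.0, w_viewed=1.0):
    """Weighted, max-normalized usage score in the range [0, 1]."""
    # Guard against a zero maximum (e.g. nothing was ever executed),
    # which would otherwise be a division by zero.
    exec_part = executed / max_executed if max_executed else 0.0
    view_part = viewed / max_viewed if max_viewed else 0.0
    return (w_executed * exec_part + w_viewed * view_part) / (w_executed + w_viewed)


# Hypothetical rows: (user_id, notebook_id, executed, viewed)
rows = [
    (1, 10, 4, 8),
    (1, 11, 0, 2),
    (2, 10, 8, 4),
]
max_executed = max(r[2] for r in rows)
max_viewed = max(r[3] for r in rows)

ratings = {
    (user_id, notebook_id): implicit_rating(e, v, max_executed, max_viewed)
    for user_id, notebook_id, e, v in rows
}
for key, rating in ratings.items():
    print(key, round(rating, 3))
```

Because both counts are divided by their maximum and the weights are divided out at the end, every rating stays inside the `rating_scale` of 0 to 1 that the reader is given later.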

Import the data into Surprise.  We'll use all the data for training and then build an "anti-testset" containing every (user, notebook) pair that has no rating.  Note that its size is roughly the number of users times the number of notebooks, so this may not be feasible for large numbers of users and notebooks.

In [None]:
reader = surprise.Reader(rating_scale=(0.0, 1.0))
data = surprise.Dataset.load_from_df(df[['user_id', 'notebook_id', 'rating']], reader)
trainset = data.build_full_trainset()
testset = trainset.build_anti_testset()
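
As a rough illustration of why the anti-testset grows so quickly, the following sketch enumerates the unrated pairs for a tiny, hypothetical set of observations. It only mimics the idea in plain Python and does not call Surprise.

```python
from itertools import product

# Hypothetical observed (user_id, notebook_id) pairs, i.e. pairs that have a rating.
observed = {(1, 10), (1, 11), (2, 10), (3, 12)}

users = sorted({user_id for user_id, _ in observed})
notebooks = sorted({notebook_id for _, notebook_id in observed})

# Every known user paired with every known notebook, minus the rated pairs.
anti_pairs = [
    (user_id, notebook_id)
    for user_id, notebook_id in product(users, notebooks)
    if (user_id, notebook_id) not in observed
]

expected_size = len(users) * len(notebooks) - len(observed)
print(len(anti_pairs), expected_size)
```

With 3 users and 3 notebooks this is only a handful of pairs, but the same arithmetic with thousands of users and notebooks yields millions of predictions to compute.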

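Choose an algorithm, fit it on the training set, and predict a rating for every pair in the anti-testset.  `KNNBasic` is Surprise's basic nearest-neighbors collaborative filtering algorithm.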

In [None]:
algo = surprise.KNNBasic()
algo.fit(trainset)
predictions = algo.test(testset)
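
For intuition, a basic k-nearest-neighbors recommender estimates a rating as a similarity-weighted average of the ratings given by the most similar users. The sketch below is a simplified illustration of that idea, not Surprise's implementation; the function name `knn_basic_estimate` and the sample similarities are hypothetical.

```python
import heapq


def knn_basic_estimate(neighbors, k=40):
    """Similarity-weighted average over the k most similar neighbors.

    neighbors: list of (similarity, rating) tuples, one per user who
    rated the item. Returns None when no neighbor has positive similarity.
    """
    nearest = heapq.nlargest(k, neighbors, key=lambda pair: pair[0])
    positive = [(sim, rating) for sim, rating in nearest if sim > 0]
    sim_total = sum(sim for sim, _ in positive)
    if sim_total == 0:
        return None
    return sum(sim * rating for sim, rating in positive) / sim_total


# Hypothetical (similarity, rating) pairs for one user/notebook prediction.
neighbors = [(0.9, 0.8), (0.5, 0.2), (0.1, 1.0)]
estimate = knn_basic_estimate(neighbors, k=2)
print(round(estimate, 3))
```

The estimate is pulled toward the rating of the most similar neighbor, which is why the choice of similarity measure and `k` matters for this family of algorithms.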

This function returns the top-N predictions for each user.  It is copied from the [FAQ](https://surprise.readthedocs.io/en/stable/FAQ.html#how-to-get-the-top-n-recommendations-for-each-user).

In [None]:
def get_top_n(predictions, n=10):
    """Return the top-N recommendations for each user from a set of predictions.

    Args:
        predictions(list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n(int): The number of recommendations to output for each user. Default
            is 10.

    Returns:
        A dict where keys are user (raw) ids and values are lists of tuples
        [(raw item id, rating estimation), ...] of size at most n.
    """

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n
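
The grouping and ranking can be tried without Surprise by feeding in plain 5-tuples shaped like prediction objects. The variant below is a sketch that uses `heapq.nlargest` instead of a full sort; the name `get_top_n_heap` and the sample data are hypothetical, and the FAQ version above remains the reference.

```python
import heapq
from collections import defaultdict


def get_top_n_heap(predictions, n=10):
    """Group (uid, iid, true_r, est, details) tuples by user, keep the n best."""
    by_user = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        by_user[uid].append((iid, est))

    # nlargest returns the items in descending order of estimated rating.
    return {
        uid: heapq.nlargest(n, user_ratings, key=lambda pair: pair[1])
        for uid, user_ratings in by_user.items()
    }


# Hypothetical stand-ins for prediction objects.
fake_predictions = [
    ("alice", 10, None, 0.30, {}),
    ("alice", 11, None, 0.75, {}),
    ("alice", 12, None, 0.50, {}),
    ("bob", 10, None, 0.20, {}),
]
top = get_top_n_heap(fake_predictions, n=2)
print(top)
```

Selecting with a heap avoids sorting every user's full candidate list, which may help when the anti-testset is large and `n` is small.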


Finally, print out the recommendations.

In [None]:
top_n = get_top_n(predictions)

for user_id, user_ratings in top_n.items():
    print(f"user: {user_id}")
    for notebook_id, rating in user_ratings:
        print(f"    notebook: {notebook_id}, rating: {rating:.3f}")