## Random model tutorial


In this notebook, we present a more verbose version of the standard submission.py script, explaining in detail how the main abstractions work and showing how easy it is to participate in the challenge.

_NOTE_: this notebook is a coding guide to the evaluation script and a walk-through of a baseline submission that explains how to participate in the challenge. While you're free to experiment with this or other notebooks, and even submit to the leaderboard from here, the _final_ submission should comply with the template scripts, as explained in the README.

Please contact the organizers on Slack if you have any doubts.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
# check we are using the right interpreter with the right RecList version
!which python
!pip install -r ../requirements.txt

In [None]:
import os
import sys
sys.path.insert(0, '../')

_Basic imports, read the credentials from the env file_

In [None]:
import numpy as np
import pandas as pd
from dotenv import load_dotenv

load_dotenv('../upload.env')

EMAIL = os.getenv('EMAIL')  # the e-mail you used to sign up
assert EMAIL != '' and EMAIL is not None
BUCKET_NAME = os.getenv('BUCKET_NAME') # you received it in your e-mail
PARTICIPANT_ID = os.getenv('PARTICIPANT_ID') # you received it in your e-mail
AWS_ACCESS_KEY = os.getenv('AWS_ACCESS_KEY') # you received it in your e-mail
AWS_SECRET_KEY = os.getenv('AWS_SECRET_KEY') # you received it in your e-mail
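
The cell above only asserts that EMAIL is set. As a minimal sketch, the same check can be extended to all five variables read from the env file. `missing_credentials` and `REQUIRED_VARS` are hypothetical names introduced for illustration; they are not part of the challenge code.

```python
import os

# Names of the variables this tutorial reads from upload.env.
REQUIRED_VARS = (
    "EMAIL",
    "BUCKET_NAME",
    "PARTICIPANT_ID",
    "AWS_ACCESS_KEY",
    "AWS_SECRET_KEY",
)


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper, not part of the challenge code.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit dictionary instead of the real environment.
example_env = {"EMAIL": "me@example.com", "BUCKET_NAME": "", "PARTICIPANT_ID": "abc"}
print(missing_credentials(example_env))
```

Calling `missing_credentials()` with no argument inspects the real environment, so it can be run right after `load_dotenv` to catch an incomplete env file before the runner is created.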

_Specify some other global variables to improve local iteration and debugging, for example setting a LIMIT to work with a smaller, faster test set_

In [None]:
LIMIT = 1000

_NOTE: as long as there is a limit specified, the runner won't upload results: make sure to have LIMIT=0 when you want to submit to the leaderboard!_

In [None]:
from evaluation.EvalRSRunner import EvalRSRunner
from evaluation.EvalRSRunner import ChallengeDataset
from reclist.abstractions import RecModel

_Declare our model, in this case a random generator: any model needs to implement "train" and "predict"; "predict" takes user IDs as input and returns a DataFrame with predictions as output._

In [None]:
class MyModel(RecModel):
    
    def __init__(self, items: pd.DataFrame, top_k: int=100, **kwargs):
        super(MyModel, self).__init__()
        self.items = items
        self.top_k = top_k
        # kwargs may contain additional arguments in case, for example, you
        # have data augmentation strategies
        print("Received additional arguments: {}".format(kwargs))
        return

    def train(self, train_df: pd.DataFrame):
        """
        Implement here your training logic. Since our example method is a simple random model,
        we actually don't use any training data to build the model, but you should ;-)

        At the end of training, make sure the class contains a trained model you can use in the predict method.
        """
        print(train_df.head(1))
        print("Training completed!")
        return 

    def predict(self, user_ids: pd.DataFrame) -> pd.DataFrame:
        """
        
        This function takes as input all the users that we want to predict the top-k items for, and 
        returns all the predicted songs.

        While this example is just a random generator, the same logic in your implementation 
        would allow for batch predictions of all the target data points.
        
        """
        k = self.top_k
        num_users = len(user_ids)
        pred = self.items.sample(n=k*num_users, replace=True).index.values
        pred = pred.reshape(num_users, k)
        pred = np.concatenate((user_ids[['user_id']].values, pred), axis=1)
        return pd.DataFrame(pred, columns=['user_id', *[str(i) for i in range(k)]]).set_index('user_id')
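
To make the output format of `predict` concrete, here is a minimal sketch that repeats its steps on toy data. The 5-track catalog, the 3 users, and `k = 2` are assumptions made for illustration; the sketch also assumes, as the `.index.values` call above suggests, that the index of the items DataFrame holds the track IDs.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (assumptions): a 5-track catalog indexed by track ID, and 3 users.
items = pd.DataFrame(
    {"title": list("abcde")},
    index=pd.Index([10, 11, 12, 13, 14], name="track_id"),
)
user_ids = pd.DataFrame({"user_id": [1, 2, 3]})
k = 2

# Same steps as predict(): sample k track IDs per user, then build one row per user.
pred = items.sample(n=k * len(user_ids), replace=True, random_state=0).index.values
pred = pred.reshape(len(user_ids), k)
pred = np.concatenate((user_ids[["user_id"]].values, pred), axis=1)
toy_predictions = pd.DataFrame(
    pred, columns=["user_id", *[str(i) for i in range(k)]]
).set_index("user_id")

print(toy_predictions.shape)          # (3, 2)
print(list(toy_predictions.columns))  # ['0', '1']
```

The result has one row per user, indexed by `user_id`, and `k` columns named `'0'` to `str(k - 1)`, each holding a track ID. Because sampling is done with replacement, the same track can appear more than once in a user's row.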

_Get the dataset and inspect the basic entities: tracks, users, and the interaction dataset_

In [None]:
dataset = ChallengeDataset(force_download=False)  # note: if True, the dataset will be downloaded again

In [None]:
dataset.df_tracks.head()

In [None]:
dataset.df_users.head()

In [None]:
train, test = dataset.get_sample_train_test()

In [None]:
print(train.shape, test.shape)

In [None]:
train.head(5)

_When we are happy with our model class, we can instantiate it and then initialize the runner with our credentials_

In [None]:
my_model = MyModel(
    items=dataset.df_tracks,
    # kwargs may contain additional arguments that you wish to use
    my_custom_argument='my_custom_argument' 
)

In [None]:
runner = EvalRSRunner(
    dataset=dataset,
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY,
    participant_id=PARTICIPANT_ID,
    bucket_name=BUCKET_NAME,
    email=EMAIL
    )

_Finally, we run the evaluation code: remember, if LIMIT is not 0, your submission won't be uploaded but the loop may still be useful for you to debug / iterate locally_

In [None]:
runner.evaluate(model=my_model, limit=LIMIT)

### Customizing RecList for your submission

A major motivation behind the Challenge is building, as a community, shareable insights in the form of working tests for our use case.

While your leaderboard score is ONLY influenced by the official tests as stated in the evaluation README, we ask that your final submission also include custom tests that you found helpful / insightful when improving your model.

The snippet below shows a working example of how to _extend_ the default RecList with additional tests, and run the same evaluation code.

In [None]:
from reclist.abstractions import rec_test
from evaluation.EvalRSRecList import EvalRSRecList

class myRecList(EvalRSRecList):
    
    @rec_test(test_type='custom_test')
    def lucky_user_test(self):
        """
        Custom test, returning a randomly chosen (lucky) user from the test set
        """
        from random import choice

        return {
          "lucky_user": str(choice(self._x_test['user_id'].unique()))
        }


_Re-run the evaluation with the additional test, which gets executed together with the default ones that produce the leaderboard score._

In [None]:
runner.evaluate(
    model=my_model, 
    limit=1,
    custom_RecList=myRecList
)
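
A custom test can also compute a metric over the predictions. The sketch below shows only the computation, as a standalone function applied to a predictions DataFrame in the format returned by `predict`. `catalog_coverage` is a hypothetical name and this is not one of the official tests. How the predictions are exposed inside a RecList is not covered in this notebook, so wrapping the function in a `@rec_test` method is left as an assumption to check against the EvalRSRecList code.

```python
import pandas as pd


def catalog_coverage(predictions: pd.DataFrame, catalog_size: int) -> float:
    """Fraction of the catalog that appears in at least one recommendation list.

    Hypothetical metric for illustration, not one of the official tests.
    """
    distinct_items = pd.unique(predictions.values.ravel())
    return len(distinct_items) / catalog_size


# Toy predictions (assumption): 3 users, top-2 track IDs each, catalog of 5 tracks.
toy = pd.DataFrame(
    {"0": [10, 10, 12], "1": [11, 12, 13]},
    index=pd.Index([1, 2, 3], name="user_id"),
)
print(catalog_coverage(toy, catalog_size=5))  # 0.8
```

Four distinct tracks (10, 11, 12, 13) are recommended out of a catalog of five, hence 0.8. A model that recommends the same few popular tracks to every user would score close to zero on such a metric.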

### Final submission to the committee

Since this is a code competition, you'll be required to submit your repository for statistical verification of your scores. 

Please consult the README carefully to make sure your project complies with the rules and follows the provided template script.