In [None]:
from __future__ import division
import pandas as pd
import numpy as np

from polara.tools.movielens import get_movielens_data
from polara.recommender.data import RecommenderData
from polara.recommender.models import NonPersonalized, SVDModel
from collections import namedtuple

# Preparing data

##### get test users data

In [None]:
test_data = pd.read_csv("https://github.com/Evfro/RecSys_ISP2017/raw/master/test_data_new.gz", compression='gzip')
test_data.head()

This data is not part of the `Movielens-1M` dataset, but it contains ratings for the same movies. Use this dataset to generate recommendations with your recommendation model.

##### get movielens data

Use the `Movielens-1M` dataset to train your model.

In [None]:
ml_data = get_movielens_data()

As before, you need to convert it into the appropriate format:

In [None]:
data_model = RecommenderData(ml_data, 'userid', 'movieid', 'rating')

## Important:

Because you'll use custom test data, an extra step is needed to prepare the data model. You only have to do it once!

In [None]:
data_model._training = data_model._data #set training data to full movielens dataset
data_model._test = test_data.copy() # setting custom test data

You also have to remove gaps in the user and movie indices. Typically this is done automatically by the `_prepare()` method, but it is not applicable in your custom setting (as this method does much more processing on the data).

However, this step is still relatively easy with the `_reindex_data()` method. It not only builds a new index but also saves the index mapping in a special attribute, `index.itemid`:

In [None]:
data_model._reindex_data() # build a new index of users and movies with no gaps; the mapping is stored in the index.itemid attribute
data_model._test['movieid'] = data_model._test['movieid'].map(data_model.index.itemid.set_index('old').new)
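
To make the mapping step easier to follow, here is a minimal sketch on toy data. The frames `toy_index` and `toy_test` are made up for illustration; the only thing taken from the notebook is the assumption that the mapping table has `old` and `new` columns, as `index.itemid` does above. The sketch also shows that an id missing from the mapping becomes `NaN`, which may be worth checking for on the real test data.

```python
import pandas as pd

# Hypothetical mapping table with the same old/new layout as index.itemid
toy_index = pd.DataFrame({'old': [10, 25, 40], 'new': [0, 1, 2]})

# Toy test ratings; movie 99 is deliberately absent from the mapping
toy_test = pd.DataFrame({'userid': [1, 1, 2], 'movieid': [25, 40, 99]})

# Same pattern as above: look up each old id in a Series indexed by 'old'
old_to_new = toy_index.set_index('old')['new']
toy_test['movieid'] = toy_test['movieid'].map(old_to_new)

# Ids that are missing from the mapping are turned into NaN by map()
n_unmapped = int(toy_test['movieid'].isnull().sum())
print(toy_test)
print('unmapped movies: {}'.format(n_unmapped))
```

If the same check reports unmapped movies on the real data, those rows refer to movies the model never saw during training.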

The last step is to "emulate" the splitting of the test data into observed data and holdout:

In [None]:
data_model._test = namedtuple('TestData', 'testset evalset')._make([data_model._test, None])
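
As a rough illustration of what this wrapper does, the sketch below builds the same two-field `namedtuple` around placeholder values. The reading of `testset` as the observed data and `evalset` as the holdout follows the description above; the placeholder list is made up.

```python
from collections import namedtuple

# Same two-field container as in the cell above
TestData = namedtuple('TestData', 'testset evalset')

# Placeholder standing in for the observed ratings; there is no holdout here
observed = ['observed ratings go here']
toy_split = TestData._make([observed, None])

# Fields are available by name, which is how the container is meant to be read
print(toy_split.testset)
print(toy_split.evalset)
```

Setting `evalset` to `None` reflects that the holdout is not available locally.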

# Building your model

In [None]:
svd = SVDModel(data_model)

In [None]:
svd.build()

In [None]:
recs = svd.get_recommendations()
recs.shape

# Submitting your solution

Before submitting, you have to map the movie index back to its original values. It can be done in one line:

In [None]:
recs = pd.Series(recs.ravel()).map(data_model.index.itemid.set_index('new').old).values.reshape(recs.shape)
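
The one-liner is dense, so here is the same idea unpacked on toy data. Both `toy_index` and `toy_recs` are invented for illustration; only the `old`/`new` column layout is assumed to match `index.itemid`.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping table with the same old/new layout as index.itemid
toy_index = pd.DataFrame({'old': [10, 25, 40], 'new': [0, 1, 2]})

# Toy recommendations: one row per user, values are *new* (gap-free) movie indices
toy_recs = np.array([[2, 0, 1],
                     [1, 2, 0]])

# Step 1: a Series that translates new index -> original movie id
new_to_old = toy_index.set_index('new')['old']

# Step 2: flatten, translate every entry, then restore the original 2D shape
flat = pd.Series(toy_recs.ravel())
restored = flat.map(new_to_old).values.reshape(toy_recs.shape)

print(restored)
```

The shape of the array is unchanged; only the values switch from internal indices to original movie ids.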

Save your model's recommendations and submit the results. Note that both the upload address and the leaderboard itself have a new location:

In [None]:
np.savez('svd_baseline', recs=recs)
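
Before uploading, it can be useful to confirm that the archive reads back as expected. The sketch below does a round trip with a toy array in a temporary directory; the file name `toy_submission` and the array contents are made up, while the `recs` key matches the one used above.

```python
import os
import tempfile

import numpy as np

# Toy stand-in for the recommendations array
toy_recs = np.array([[40, 10, 25],
                     [25, 40, 10]])

# Write to a temporary directory so the real submission file is untouched
path = os.path.join(tempfile.mkdtemp(), 'toy_submission')
np.savez(path, recs=toy_recs)  # NumPy appends the .npz extension

# Read the archive back and compare with what was saved
loaded = np.load(path + '.npz')
same = np.array_equal(loaded['recs'], toy_recs)
loaded.close()

print('round trip ok: {}'.format(same))
```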

In [None]:
import requests

url = "http://isp2017.azurewebsites.net/team/upload"

# open the archive in a context manager so the file handle is closed after the upload
with open('svd_baseline.npz', 'rb') as f:
    r = requests.post(url, files={'upload': f})

In [None]:
print('{} {}'.format(r.status_code, r.reason))  # works in both Python 2 and Python 3

##### Viewing results:

http://isp2017.azurewebsites.net/team/leaderboard