# Recommendations for the MovieLens dataset

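This tutorial shows how to train a WSKNN model on the MovieLens dataset. We load the data from a flat file and then transform it into the two mappings that the k-NN model works with: session-items (which items each session contains) and item-sessions (which sessions each item appears in).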
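To make the two mappings concrete, here is a minimal plain-Python sketch that builds them from `(session, item, timestamp)` rows. The function name `build_mappings` and the exact layout of the dictionaries are assumptions made for illustration; the structures that WSKNN builds internally may differ.

```python
from collections import defaultdict


def build_mappings(events):
    """Build session->items and item->sessions maps.

    `events` is an iterable of (session_id, item_id, timestamp) tuples.
    The output layout is illustrative only, not WSKNN's own format.
    """
    # Process events in chronological order
    ordered = sorted(events, key=lambda e: e[2])

    # session_id -> [list of items, list of timestamps]
    session_items = defaultdict(lambda: [[], []])
    # item_id -> list of sessions in which the item appears
    item_sessions = defaultdict(list)

    for session_id, item_id, ts in ordered:
        items, times = session_items[session_id]
        items.append(item_id)
        times.append(ts)
        if session_id not in item_sessions[item_id]:
            item_sessions[item_id].append(session_id)

    return dict(session_items), dict(item_sessions)


# Hypothetical toy events: two users, two movies
events = [("u1", "m10", 3), ("u2", "m10", 1), ("u1", "m20", 2)]
session_items, item_sessions = build_mappings(events)
```

In this toy example each user plays the role of a session, which matches how the tutorial treats the MovieLens user id.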

*MovieLens dataset*

```
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets:
History and Context. ACM Transactions on Interactive Intelligent
Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages.
DOI=http://dx.doi.org/10.1145/2827872
```

*Data schema*

```
u.data     -- The full u data set, 100000 ratings by 943 users on 1682 items.
              Each user has rated at least 20 movies.  Users and items are
              numbered consecutively from 1.  The data is randomly
              ordered. This is a tab separated list of
              user id | item id | rating | timestamp.
              The time stamps are unix seconds since 1/1/1970 UTC
```
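The schema above can be read with the standard library alone. The sketch below parses tab-separated rows in the `user id | item id | rating | timestamp` layout and converts the Unix timestamp to a UTC datetime. The helper name `read_ratings` and the two sample rows are illustrative assumptions; in the tutorial itself this step is handled by `parse_flat_file`.

```python
import csv
import io
from datetime import datetime, timezone

# Illustrative rows in the u.data layout: user id, item id, rating, timestamp
sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"


def read_ratings(handle):
    """Yield (user_id, item_id, rating, datetime) tuples from a u.data-style handle."""
    reader = csv.reader(handle, delimiter="\t")
    for user_id, item_id, rating, timestamp in reader:
        # Timestamps are Unix seconds since 1970-01-01 UTC
        when = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
        yield user_id, item_id, int(rating), when


rows = list(read_ratings(io.StringIO(sample)))
```

To read the real file, pass an open file handle instead of the `io.StringIO` object.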

In [1]:
import numpy as np
from wsknn import fit
from wsknn.preprocessing.parse_static import parse_flat_file

In [2]:
fpath = 'demo-data/movielens/ml-100k/u.data'
ds = parse_flat_file(fpath, sep='\t', session_index=0, product_index=1, time_index=3, time_to_numeric=True)

In [4]:
model = fit(sessions=ds[1],
            items=ds[0],
            number_of_recommendations=5,
            number_of_neighbors=10,
            sampling_strategy='recent',
            sample_size=50,
            weighting_func='log',
            ranking_strategy='log',
            return_events_from_session=False,
            recommend_any=False)

In [5]:
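def get_sample_sessions(set_of_sessions, n_sessions=100):
    """Return a random sample of session records, without repeats."""
    sessions_keys = list(set_of_sessions.keys())
    # replace=False so the same session is never drawn twice
    key_sample = np.random.choice(sessions_keys, n_sessions, replace=False)
    sampled = [set_of_sessions[dk] for dk in key_sample]
    return sampled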

In [6]:
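def get_movie_name(movie_id: str):
    """Return the title for movie_id from u.item, or None if the id is not found."""
    # u.item is pipe-separated: movie id | movie title | ...
    with open('demo-data/movielens/ml-100k/u.item', 'r', encoding="ISO-8859-1") as fin:
        for line in fin:
            splitted = line.split('|')
            if movie_id == splitted[0]:
                return splitted[1]
    return None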
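
The lookup above reopens `u.item` on every call, which is fine for a handful of sessions. For larger samples, one option is to read the file once into a dictionary. The sketch below is an assumption-labeled alternative, not part of WSKNN: `load_movie_names` is a hypothetical helper, and the sample lines only imitate the pipe-separated `u.item` layout.

```python
import io


def load_movie_names(handle):
    """Map movie id (str) -> title from a pipe-separated u.item-style handle."""
    names = {}
    for line in handle:
        fields = line.rstrip("\n").split("|")
        # Skip blank or malformed lines that lack a title field
        if len(fields) >= 2:
            names[fields[0]] = fields[1]
    return names


# Illustrative lines in the u.item layout: movie id | movie title | release date
sample = io.StringIO("1|Toy Story (1995)|01-Jan-1995\n2|GoldenEye (1995)|01-Jan-1995\n")
movie_names = load_movie_names(sample)
```

With the real file, open it with `encoding="ISO-8859-1"` as in the cell above and use `movie_names.get(movie_id)` in place of `get_movie_name(movie_id)`.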

In [7]:
test_sessions = get_sample_sessions(set_of_sessions=ds[1].session_items_actions_map, n_sessions=5)

In [8]:
for ts in test_sessions:
    print('User watched')
    print(str([get_movie_name(x) for x in ts[0]]))
    print('Recommendations')
    recs = model.recommend(ts)
    for rec in recs:
        print('Item:', get_movie_name(rec[0]), '| weight:', rec[1])
    print('---')
    print('')
    

User watched
['Down Periscope (1996)', 'Smoke (1995)', 'Grifters, The (1990)', 'Strange Days (1995)', 'I Shot Andy Warhol (1996)', 'This Is Spinal Tap (1984)', 'Forrest Gump (1994)', "Things to Do in Denver when You're Dead (1995)", 'Mission: Impossible (1996)', 'Ransom (1996)', 'Streetcar Named Desire, A (1951)', 'Liar Liar (1997)', 'Crimson Tide (1995)', 'Legends of the Fall (1994)', 'Lion King, The (1994)', 'Emma (1996)', 'African Queen, The (1951)', 'Get Shorty (1995)', 'Cold Comfort Farm (1995)', 'Short Cuts (1993)', 'Nutty Professor, The (1996)', 'Taxi Driver (1976)', 'Austin Powers: International Man of Mystery (1997)', 'Mighty Aphrodite (1995)', 'Contact (1997)', 'Secrets & Lies (1996)', 'Shall We Dance? (1996)', 'Mother (1996)', 'Big Night (1996)', 'Little Buddha (1993)', 'Wizard of Oz, The (1939)', 'GoodFellas (1990)', 'Silence of the Lambs, The (1991)', 'Butch Cassidy and the Sundance Kid (1969)', 'Babe (1995)', 'Piano, The (1993)', 'Trainspotting (1996)', 'Lost Horizon (193