# Recommendations for the MovieLens dataset

This tutorial shows how to train a WSKNN model on the MovieLens dataset. We load the data from a flat file and then transform it into the two k-NN mappings the model is fitted on: session-items and item-sessions.

*MovieLens dataset*

```
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets:
History and Context. ACM Transactions on Interactive Intelligent
Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages.
DOI=http://dx.doi.org/10.1145/2827872
```

*Data schema*

```
u.data     -- The full u data set, 100000 ratings by 943 users on 1682 items.
              Each user has rated at least 20 movies.  Users and items are
              numbered consecutively from 1.  The data is randomly
              ordered. This is a tab separated list of
              user id | item id | rating | timestamp.
              The time stamps are unix seconds since 1/1/1970 UTC.
```
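
To make the two mappings concrete, here is a minimal sketch that groups rows in the `u.data` layout into a session-items dictionary and an item-sessions dictionary using only the standard library. The rows are made up for illustration, `build_mappings` is a hypothetical helper, and the `[items, timestamps]` value layout is an assumption about the general shape of such mappings, not a description of what `wsknn` stores internally.

```python
from collections import defaultdict

# Made-up rows in the u.data layout: user id, item id, rating, timestamp.
SAMPLE_ROWS = "1\t10\t5\t300\n2\t20\t3\t100\n1\t20\t4\t200\n"


def build_mappings(text):
    """Group tab-separated rows into session->items and item->sessions maps."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        user, item, _rating, timestamp = line.split("\t")
        rows.append((int(timestamp), user, item))
    # The file is randomly ordered, so sort chronologically first.
    rows.sort()

    session_items = defaultdict(lambda: [[], []])  # user -> [items, timestamps]
    item_sessions = defaultdict(lambda: [[], []])  # item -> [users, timestamps]
    for timestamp, user, item in rows:
        session_items[user][0].append(item)
        session_items[user][1].append(timestamp)
        item_sessions[item][0].append(user)
        item_sessions[item][1].append(timestamp)
    return dict(session_items), dict(item_sessions)


session_items, item_sessions = build_mappings(SAMPLE_ROWS)
print(session_items["1"])   # items rated by user "1", oldest first
print(item_sessions["20"])  # users who rated item "20", oldest first
```

Here each user is treated as one session, which matches the `session_index=0` choice used in this tutorial.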

In [5]:
import numpy as np
from wsknn import fit
from wsknn.preprocessing.parse_static import parse_flat_file

In [2]:
fpath = 'demo-data/movielens/ml-100k/u.data'
ds = parse_flat_file(fpath, sep='\t', session_index=0, product_index=1, action_index=2, time_index=3, time_to_numeric=True)

In [3]:
ds

(<wsknn.preprocessing.structure.item.Items at 0x7f9fe586b5e0>,
 <wsknn.preprocessing.structure.session.Sessions at 0x7f9fe586b400>)

In [4]:
model = fit(sessions=ds[1],
            items=ds[0],
            number_of_recommendations=5,
            number_of_neighbors=10,
            sampling_strategy='recent',
            sample_size=50,
            weighting_func='log',
            ranking_strategy='log',
            return_events_from_session=False,
            recommend_any=False)
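
The parameters above follow the usual steps of session-based k-NN: sample candidate sessions, keep the closest neighbors, and rank the items they contain. The sketch below illustrates that general idea on toy data. It is not WSKNN's implementation: `recommend_sketch` is a hypothetical function, Jaccard similarity is a stand-in chosen for brevity, and the `'log'` weighting and ranking options are not modeled.

```python
def recommend_sketch(current_items, session_items,
                     n_neighbors=2, n_recs=2, sample_size=50):
    """Toy session-based k-NN recommender (illustration only)."""
    current = set(current_items)

    # 1. Candidates: sessions sharing at least one item with the current one.
    candidates = [
        (max(timestamps), session_id, set(items))
        for session_id, (items, timestamps) in session_items.items()
        if current & set(items)
    ]

    # 2. Sampling: keep only the most recent candidate sessions.
    candidates.sort(key=lambda c: c[0], reverse=True)
    candidates = candidates[:sample_size]

    # 3. Neighbors: the candidates with the highest Jaccard similarity.
    neighbors = sorted(
        ((len(current & items) / len(current | items), items)
         for _, _, items in candidates),
        key=lambda n: n[0],
        reverse=True,
    )[:n_neighbors]

    # 4. Ranking: score unseen items by the summed similarity of the
    #    neighbors that contain them.
    scores = {}
    for similarity, items in neighbors:
        for item in items - current:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n_recs]


toy_sessions = {
    "a": [["m1", "m2", "m3"], [1, 2, 3]],
    "b": [["m2", "m4"], [4, 5]],
    "c": [["m5"], [6]],
}
recs = recommend_sketch(["m1", "m2"], toy_sessions)
for item, weight in recs:
    print(item, round(weight, 3))
```

In this toy run, session `c` shares no items with the current session and is never considered, while `m3` outranks `m4` because session `a` overlaps more with the current session than session `b` does.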

In [6]:
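def get_sample_sessions(set_of_sessions, n_sessions=100):
    """Return the records of n_sessions randomly chosen sessions."""
    sessions_keys = list(set_of_sessions.keys())
    # replace=False so that the same session is not drawn twice
    key_sample = np.random.choice(sessions_keys, n_sessions, replace=False)
    sampled = [set_of_sessions[dk] for dk in key_sample]
    return sampled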

In [14]:
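def get_movie_name(movie_id: str):
    """Return the title for movie_id from u.item, or None if it is not found."""
    with open('demo-data/movielens/ml-100k/u.item', 'r', encoding="ISO-8859-1") as fin:
        for line in fin:
            # u.item is pipe-separated: movie id | movie title | ...
            fields = line.split('|')
            if movie_id == fields[0]:
                return fields[1]
    return None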
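
The lookup above rescans the item file on every call. When many titles are needed, one alternative is to read the file once into a dictionary, as in the sketch below. `load_movie_titles` is a hypothetical helper, and the demonstration writes a small temporary file with made-up rows so that it runs without the dataset.

```python
import os
import tempfile


def load_movie_titles(path, encoding="ISO-8859-1"):
    """Read a pipe-separated item file once and return {movie_id: title}."""
    titles = {}
    with open(path, "r", encoding=encoding) as fin:
        for line in fin:
            fields = line.rstrip("\n").split("|")
            if len(fields) >= 2:  # skip blank or malformed lines
                titles[fields[0]] = fields[1]
    return titles


# Demonstration on a temporary file with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "u.item")
    with open(demo_path, "w", encoding="ISO-8859-1") as fout:
        fout.write("1|First Example (1990)|01-Jan-1990\n")
        fout.write("2|Second Example (1991)|01-Jan-1991\n")
    titles = load_movie_titles(demo_path)

print(titles.get("1"))
print(titles.get("999", "<unknown>"))
```

With the real dataset, the call would take the `u.item` path used above, and `titles.get(movie_id)` would stand in for `get_movie_name(movie_id)`.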

In [9]:
test_sessions = get_sample_sessions(set_of_sessions=ds[1].session_items_actions_map, n_sessions=5)

In [15]:
for ts in test_sessions:
    print('User watched')
    # ts[0] holds the item ids of the session
    print([get_movie_name(x) for x in ts[0]])
    print('Recommendations')
    recs = model.recommend(ts)
    for rec in recs:
        print('Item:', get_movie_name(rec[0]), '| weight:', rec[1])
    print('---')
    print('')
    

User watched
['Casablanca (1942)', 'Madness of King George, The (1994)', 'Graduate, The (1967)', 'Great Race, The (1965)', 'Nightmare Before Christmas, The (1993)', 'Brazil (1985)', "Singin' in the Rain (1952)", 'Babe (1995)', 'Toy Story (1995)', 'Sense and Sensibility (1995)', 'Return of Martin Guerre, The (Retour de Martin Guerre, Le) (1982)', 'Sex, Lies, and Videotape (1989)', 'Cool Hand Luke (1967)', 'Shawshank Redemption, The (1994)', 'Farewell My Concubine (1993)', 'Courage Under Fire (1996)', 'Braveheart (1995)', 'Princess Bride, The (1987)', 'Field of Dreams (1989)', 'Like Water For Chocolate (Como agua para chocolate) (1992)', 'African Queen, The (1951)', 'Apt Pupil (1998)', 'Leaving Las Vegas (1995)', 'Hudsucker Proxy, The (1994)', 'Glory (1989)', 'Pulp Fiction (1994)', 'Titanic (1997)', 'Return of the Jedi (1983)', 'Sting, The (1973)', 'Terminator 2: Judgment Day (1991)', 'Heidi Fleiss: Hollywood Madam (1995) ', 'Back to the Future (1985)', 'Amadeus (1984)', 'Jaws (1975)', '