# Running recommender models

This notebook runs each recommender model in this project individually, as well as the hybrid recommender system.

Run all cells to see how the system works.

In [116]:
# Reload when changing modules
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [117]:
# Import and enable importing modules
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

# Base path and data directory, kept separate so the notebook works in different environments
# Change these to point at the data in your local environment
PATH = os.getcwd()
DATA = '/data/'
filepath = PATH + DATA

# Allow wider printing of pandas dataframes
# (pandas must be imported before this call, otherwise `pd` is undefined)
import pandas as pd
pd.set_option('display.max_colwidth', None)

In [118]:
# Import modules and libraries
import pandas as pd
import numpy as np
import json

In this project, the data has been preprocessed in different ways to suit the different methods. Three files were created from the original events dataset:

* events.csv
* articles.csv
* users.csv

An overview of these datasets is given below.

NB! If these are unavailable, you might have to run the appropriate preprocessing scripts to get the models to work.
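
Before running the cells below, it can help to check which of these files are actually present. The following is a minimal sketch, not part of the project code: the helper name `missing_data_files` is hypothetical, and it assumes all three files live in a single data directory such as the `filepath` defined above.

```python
import os

# File names taken from the list above; keeping them in one directory is an assumption.
REQUIRED_FILES = ("events.csv", "articles.csv", "users.csv")


def missing_data_files(data_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir."""
    return [
        name for name in required
        if not os.path.isfile(os.path.join(data_dir, name))
    ]
```

Calling `missing_data_files(filepath)` returns an empty list when everything is in place. Any names it does return point to preprocessing steps that still need to be run.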

In [123]:
# Method for loading original dataset
def load_active_data(path):
    """
        Load events from files and convert to dataframe.
    """
    map_lst=[]
    for f in os.listdir(path):
        file_name=os.path.join(path, f)
        if os.path.isfile(file_name):
            # Use a context manager so each file is closed after reading
            with open(file_name) as fh:
                for line in fh:
                    obj = json.loads(line.strip())
                    if obj is not None:
                        map_lst.append(obj)
    return pd.DataFrame(map_lst)

# Load data
try:
    users_df = pd.read_csv(filepath + 'users.csv')
    items_df = pd.read_csv(filepath + 'articles.csv')
except FileNotFoundError:
    print('Please make sure the data is in the correct path')
    print('Current path:', filepath)
    print('If you are missing the necessary data (users.csv and articles.csv), please visit the following link to download them:')
    print('https://drive.google.com/drive/folders/1osf88CZsjEeatSWAjds0xZShTG4HZNwC?usp=sharing')
    print('Please change the path to the data in your local environment')
    print('Exiting...')
    sys.exit()
events = load_active_data(PATH + '/active1000/')
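
`load_active_data` assumes that every file in the directory holds one JSON object per line. The per-file step can be sketched in isolation with only the standard library. Note that `read_json_lines` is a hypothetical helper rather than project code, and that the UTF-8 encoding and the skipping of blank lines are assumptions about the event files.

```python
import json


def read_json_lines(file_name):
    """Parse a file with one JSON value per line, skipping blank lines and nulls."""
    records = []
    # Assumption: the event files are UTF-8 encoded.
    with open(file_name, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # json.loads raises on an empty string
            obj = json.loads(line)
            if obj is not None:
                records.append(obj)
    return records
```

The returned list of dictionaries can be passed straight to `pd.DataFrame`, which is what the loader does with its accumulated list.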

In [126]:
# Import model pipelines
from recommenders.collaborative_filtering import cf_pipeline
from recommenders.content_based_articles import cb_articles_pipeline
from recommenders.content_based_kmeans import cb_kmeans_pipeline

In [125]:
# Seed to reproduce results
np.random.seed(42)

# Choose random user to recommend to
user_id = users_df.sample()['user_id'].values[0]
user_id

# Select number of articles to recommend to user
k = 10

In [122]:
# Run collaborative filtering model
cf_model, cf_rec, cf_test_mse, cf_train_mse = cf_pipeline(user_id, k, events)
cf_rec


Running collaborative filtering pipeline ...
Done!


| Unnamed: 0 | title | documentId |
| --- | --- | --- |
| 0 | Mulig å bygge superbusstasjoner til en fjerdedel av prisen | cdfcf41324251f750be0db169cb5ae1b739a15bf |
| 1 | I morgen blir det kaffe og boller på stasjonen | b5805afe6f42ea5da09270c7184be7a7c4cd8c34 |
| 2 | RBK-keeper aktuell for 1.-divisjonsklubb | e55e646ae3ba16dee0b2b721c03ba008ec0fd62d |
| 3 | Sju biler i trafikkulykke | e706c4c8ece6b57112deba9d160f21a5ee6719e4 |
| 4 | Politianmelder trøndersk boligutbygger etter syv konkurser | ac728afb25512bfe85b58926a9c32659137f4fea |
| 5 | Bil fikk grillen smadret av Isklump på nesten 30 kilo | a4108a6448e235d895c62807f493d918a9b8125e |
| 6 | Nye Veier styrker staben | 307ba9431026536f74cc929286af914f23e969da |
| 7 | Lastebil og personbil involvert i trafikkulykke i Meldal | e649420577596b19d15945c38921b47449520d9e |
| 8 | Bilfører fikk kniv i støtfanger på E6 | 967f174e48b18d7c7c346d7e9957858aa61dbb33 |
| 9 | Håper på over 400 deltakere på isfiskefestival | 2428f6b0b85f66aba29543310080c7e38fe7b928 |


In [127]:
# Run content-based article model
cba_model, cba_rec, cba_eval = cb_articles_pipeline(user_id, k, users_df, items_df, events)
cba_rec

Running content based articles recommender...


In [None]:
# Run content-based kmeans model
user_row = users_df[users_df['user_id'] == user_id]
cbk_model, cbk_rec, cbk_test_score, cbk_train_score = cb_kmeans_pipeline(user_row, k, users_df, items_df, events)

Running content-based K-means pipeline ...
Content-based KMeans recommendation:
Title | Document id
Da han kom tilbake fra alpinbakken så bilen slik ut | aa6a5862cb2ae9fb8996f35a692192559b9083e1
Slik slipper du å kjøpe DAB-radio | f1846d55be374246d0b9a76e0027936642ca3f1a
Skulle til Hitra, havnet på Frosta | fb8b9ca2ddee2ea8dd0d6a2145008dcff48d08fc
Hysterisk morsomme veiskilt i Trøndelag | 62134a1d8e747e1734f1573981ec161553e26d36
- Budskapet er at du kan møte opp og få deg et ligg | d333a6b9c64b858e0a1280dfebda505e409db1dd
John Kåre oppdaget DAB-smutthull i tv-kontakten | b6f08e0fe6567ed39d3b244f2afeceef43e0dbd3
Bakeri i Trondheim anmeldt etter aksjon av politiet og Arbeidstilsynet | 21124879767ab94be2415310b1e04c380051b59f
Idyllen ble brutt av en brysk kommuneansatt med markeringsbehov | 094e3ca8251f2a81626da7af88e25ef03ae7bd86
Julian (13) filmet da raset gikk | fa3f4c9983712f837925cbfe9bd096d09143a2ac
Hjelp oss å finne Midt-Norges dårligste veier | b9ad78ebb5acf15b97a2172aae104903e2a4

## Running Hybrid model

The hybrid model is an expert model in which a voting mechanism chooses which recommender to use. The mechanism applies the following rules:

* If the user is a new user without previous history or data, use the content-based kmeans model to predict
* If the user exists in the dataset with more than 
* If the user exists in the dataset with more