# Running recommender models

This notebook runs the individual recommender models in this project, as well as the hybrid recommender system.

Run all cells to see how the system works.

In [1]:
# Reload when changing modules
%load_ext autoreload
%autoreload 2

In [2]:
# Add the parent directory to sys.path so the project modules can be imported
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

# Build the data path from the working directory so it works across environments
# Change this path to point at the data in your local environment
PATH = os.getcwd()
DATA = '/data/'
filepath = PATH + DATA

In [3]:
# Import modules and libraries
import pandas as pd
import numpy as np
import json

In this project, the data has been preprocessed in different ways to suit the different methods. Three files were created from the original events dataset:

* events.csv
* articles.csv
* users.csv

An overview of these datasets is given below.

NB! If these files are unavailable, run the appropriate preprocessing scripts first, since the models depend on them.
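
Before loading anything, it can help to check that the three preprocessed files are actually present. The following is a minimal sketch and not part of the project: `missing_data_files` is a hypothetical helper, and the location of the files (a `data/` folder under the current working directory) is an assumption based on the path setup earlier in this notebook.

```python
import os

# File names taken from the list above
REQUIRED_FILES = ['events.csv', 'articles.csv', 'users.csv']

def missing_data_files(data_dir, required=REQUIRED_FILES):
    """Return the required file names not found in data_dir (hypothetical helper)."""
    return [name for name in required
            if not os.path.isfile(os.path.join(data_dir, name))]

# Assumption: the data lives in ./data, as in the path setup cell
data_dir = os.path.join(os.getcwd(), 'data')
missing = missing_data_files(data_dir)
if missing:
    print('Missing files, run the preprocessing scripts first:', ', '.join(missing))
else:
    print('All preprocessed files found.')
```

Checking up front gives a readable message instead of a `FileNotFoundError` partway through the loading cell.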

In [4]:
# Method for loading original dataset
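def load_active_data(path):
    """
    Load events from every file in `path` (one JSON object per line)
    and return them as a single DataFrame.
    """
    map_lst = []
    for f in os.listdir(path):
        file_name = os.path.join(path, f)
        if os.path.isfile(file_name):
            # The context manager closes each file once it has been read
            with open(file_name, encoding='utf-8') as fh:
                for line in fh:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines, which json.loads rejects
                    obj = json.loads(line)
                    if obj is not None:
                        map_lst.append(obj)
    return pd.DataFrame(map_lst)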

# Load data
users_df = pd.read_csv(filepath + 'users.csv')
items_df = pd.read_csv(filepath + 'articles.csv')
events = load_active_data(PATH + '/active1000/')
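
The loader above reads every file in a directory and parses each line as one JSON object. The sketch below illustrates that input format on a temporary directory using only the standard library. The sample events and their field names are invented for illustration and may not match the real dataset, and `read_json_lines` is a hypothetical stand-in rather than the project's loader.

```python
import json
import os
import tempfile

# Invented sample events; the real field names may differ
sample_events = [
    {'userId': 'u1', 'documentId': 'd1'},
    {'userId': 'u2', 'documentId': None},
]

def read_json_lines(directory):
    """Collect one dict per non-empty line from every file in directory."""
    records = []
    for name in sorted(os.listdir(directory)):
        file_path = os.path.join(directory, name)
        if os.path.isfile(file_path):
            with open(file_path, encoding='utf-8') as fh:
                records.extend(json.loads(line) for line in fh if line.strip())
    return records

with tempfile.TemporaryDirectory() as tmp:
    # Write the sample in JSON Lines form: one object per line
    with open(os.path.join(tmp, 'day1'), 'w', encoding='utf-8') as fh:
        for event in sample_events:
            fh.write(json.dumps(event) + '\n')
    loaded = read_json_lines(tmp)

print(len(loaded))  # 2
```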

In [90]:
# Import model pipelines
from recommenders.collaborative_filtering import cf_pipeline
from recommenders.content_based_articles import cb_articles_pipeline
from recommenders.content_based_kmeans import cb_kmeans_pipeline

In [6]:
# Seed to reproduce results
np.random.seed(42)

# Choose random user to recommend to
user_id = users_df.sample()['user_id'].values[0]
user_id

# Select number of articles to recommend to user
k = 10
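
Seeding NumPy's global generator makes `DataFrame.sample()` repeatable only as long as nothing else draws random numbers between the seed call and the sample, which is easy to break when cells are re-run out of order. As a minimal sketch of an alternative, the example below passes `random_state` directly to `sample`. It uses a toy DataFrame in place of `users_df`; the ids and the `pick_user` helper are made up for illustration.

```python
import pandas as pd

# Toy stand-in for users_df; the ids are invented for this example
toy_users = pd.DataFrame({'user_id': ['u1', 'u2', 'u3', 'u4', 'u5']})

def pick_user(df, seed=42):
    """Pick one user_id reproducibly, independent of global NumPy state."""
    return df.sample(n=1, random_state=seed)['user_id'].values[0]

first = pick_user(toy_users)
second = pick_user(toy_users)
print(first == second)  # True: same seed, same row
```

With this approach the selected user stays the same no matter how many times the cell is executed or what ran before it.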

In [37]:
# Run collaborative filtering model
cf_rec, cf_test_mse = cf_pipeline(user_id, k, events)


Running collaborative filtering pipeline ...
Predictions from collaborative filtering:
Title | Document id
Iiiiiiiiiskaldt nyttårsbad | 2a00c43cf84f7c433431027845505a0fdc77a55d
Test av 16 «hjemmebakte» knekkebrød | 4367f42833890ae1e7b5fe4794e34989385e5c3b
Si sannheten: Dette handler ikke om gamle reiseregninger! | e834b3195717583b48b43207a7f775fd7e891f1d
Sliter du med elbilen i kulda? Her er ti tips. | 7784fe5dc3ebfbdd0ccab8cc0e424afa32df287f
Hjelp! Hvilken bil skal jeg velge? | 91f01709d1490371ed695032e5b96b501b0d3392
RBK-keeper aktuell for 1.-divisjonsklubb | e55e646ae3ba16dee0b2b721c03ba008ec0fd62d
Pallen glapp for nordmennene da Kraft vant igjen | 64e1972ab204fb83c41bd82794621084ce3747ba
- Det blir mye promp i år | ee79eb96a0f25ce70837e4aef3de242c60062839
Bil fikk grillen smadret av Isklump på nesten 30 kilo | a4108a6448e235d895c62807f493d918a9b8125e
Tidligere toppscorer ut mot kvaliteten i Eliteserien: – Jeg klarer ikke å se en spiller som skal score mange mål | 505c6ab9e135314d3f

In [66]:
# Run content-based article model
cba_rec, cba_eval = cb_articles_pipeline(user_id, k, users_df, items_df, events)

Running content based articles recommender...
Content based articles recommendation:
Title | Document id
- Trondheim som by kommer veldig godt ut av det | 282a2341f172ce655143446a1d9175d0cb950e4e
- Britannia skal ikke bli noe Rema-hotell | a3ddb6e8602543b89d28cce185cdc51bcc8e6b6b
Slik vil de endre Trondheim | e6304c6d56145afe17d11d69ef300121443c210b
Eldre føler seg utrygge i sykehjem | fee2c74bad36aacda7eeb6e6d7c0c5a08d670c9d
Trondheim i 1908 | 50cdfe236af2680f8cb4284c5d6a917ed6b7fb63
Åpenhjertig Northug om triumfene og nedturene: - Jeg var ikke sulten nok og rotet bort et gull | 98c45f91400519953369d9e67a743d36195bfc40
Åpenhjertig Northug om triumfene og nedturene: - Jeg var ikke sulten nok og rotet bort et gull | 459ae7920cecb0d957af9e9c743907dd55122d57
En kopp kaffe, to knekkebrød og en flaske urin | dcbf6920ea2657c3361b513f27412c750ba93ea7
- Det var dette eller ingenting for oss | 47da3ec4aadc858c7209a6f88d1ad8e62311fdc3
- Det var dette eller ingenting for oss | 5e3619f3d7f08180a4e

In [91]:
# Run content-based kmeans model
user_row = users_df[users_df['user_id'] == user_id]
cbk_rec, cbk_test_mse = cb_kmeans_pipeline(user_row, k, users_df, items_df, events)

Running content-based K-means pipeline ...
Content-based KMeans recommendation:
Title | Document id
Da han kom tilbake fra alpinbakken så bilen slik ut | aa6a5862cb2ae9fb8996f35a692192559b9083e1
Slik slipper du å kjøpe DAB-radio | f1846d55be374246d0b9a76e0027936642ca3f1a
Skulle til Hitra, havnet på Frosta | fb8b9ca2ddee2ea8dd0d6a2145008dcff48d08fc
Hysterisk morsomme veiskilt i Trøndelag | 62134a1d8e747e1734f1573981ec161553e26d36
- Budskapet er at du kan møte opp og få deg et ligg | d333a6b9c64b858e0a1280dfebda505e409db1dd
John Kåre oppdaget DAB-smutthull i tv-kontakten | b6f08e0fe6567ed39d3b244f2afeceef43e0dbd3
Bakeri i Trondheim anmeldt etter aksjon av politiet og Arbeidstilsynet | 21124879767ab94be2415310b1e04c380051b59f
Idyllen ble brutt av en brysk kommuneansatt med markeringsbehov | 094e3ca8251f2a81626da7af88e25ef03ae7bd86
Julian (13) filmet da raset gikk | fa3f4c9983712f837925cbfe9bd096d09143a2ac
Hjelp oss å finne Midt-Norges dårligste veier | b9ad78ebb5acf15b97a2172aae104903e2a4