# Recommender System

For our Recommender System we wrote several functions, stored in the file recommender_system.py. <br>
We chose to implement several scoring metrics in order to evaluate the differences between them. <br>
The logic of these functions is described below:
<br>

* **init(self, preferencies, categories, s23_file):** This function is used to instantiate a recommender system object.

* **cosine_similarity(self, v, w):** This function defines one of our scoring metrics. Cosine similarity measures the cosine of the angle between the vectors v and w; we use it to assign each Wikipedia page a score for a user, based on the user's preferences.

* **pearson_correlation(self, v, w):** This function defines one of our scoring metrics. Pearson correlation measures the linear correlation between two variables that, in our case, are two users' preference vectors. Its value lies between +1 and −1, where +1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.

* **eucliedean_distance(self, v, w):** This function defines one of our scoring metrics. The Euclidean distance (or Euclidean metric) is the straight-line distance between two points in Euclidean space.

* **most_relevant_pages(self, user_id, metrics):** This function takes a user_id as input and returns the 6 pages ordered by the score computed with one of the three available metrics.
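
As a rough illustration of the three metrics listed above, here is a minimal pure-Python sketch. It is not the code from recommender_system.py: the standalone functions (written without `self`) and the toy vectors are assumptions made only for this example.

```python
import math

def cosine_similarity(v, w):
    """Cosine of the angle between v and w (0.0 if either vector has zero length)."""
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norm if norm else 0.0

def pearson_correlation(v, w):
    """Pearson correlation, computed as the cosine similarity of the mean-centred vectors."""
    mean_v = sum(v) / len(v)
    mean_w = sum(w) / len(w)
    return cosine_similarity([a - mean_v for a in v], [b - mean_w for b in w])

def euclidean_distance(v, w):
    """Straight-line distance between the points v and w."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

# Toy binary preference vectors, made up for illustration.
v = [1, 1, 0, 0]
w = [1, 0, 1, 0]
print(cosine_similarity(v, w))    # approximately 0.5
print(pearson_correlation(v, w))  # 0.0
print(euclidean_distance(v, w))   # approximately 1.414 (square root of 2)
```

Note that the first two are similarity scores (higher means a closer match), while the third is a distance (lower means a closer match).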

In [1]:
from recommender_system import RecommenderSystem
import pickle
from itertools import islice

The categories pickle stores the mapping from each category name to its ID.

In [3]:
categories = pickle.load(open('DATA/categories.pkl', 'rb'))
first20pairs = {k: categories[k] for k in list(categories)[:20]}
first20pairs

{'1981 births': 0,
 '1987 births': 10,
 '20th-century Japanese actresses': 3,
 '21st-century Japanese actresses': 4,
 'Actresses from Kanagawa Prefecture': 2,
 'Ambassadors of supra-national bodies': 11,
 'Australian Open (tennis) champions': 12,
 'Expatriate sportspeople in the United States': 13,
 'French Open champions': 14,
 "Grand Slam (tennis) champions in women's singles": 15,
 'Guggenheim Fellows': 8,
 'Japanese film actresses': 5,
 'Japanese stage actresses': 6,
 'Japanese television actresses': 7,
 'Living people': 1,
 'Maria Sharapova': 9,
 'Olympic medalists in tennis': 16,
 'Olympic silver medalists for Russia': 17,
 'Olympic tennis players of Russia': 18,
 'Sportspeople from Bradenton, Florida': 19}

In the preferencies pickle, each user is assigned a binary (multi-hot) vector that contains 1 in the positions of the pages the user likes and 0 elsewhere. <br>
The cosine similarity is computed on this vector.
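
The encoding described above can be sketched as follows. This is only an assumption about how such a vector might be built: the helper `build_preference_vector`, the `item_index` mapping, and the placeholder names are hypothetical and do not come from the project code.

```python
def build_preference_vector(liked_items, item_index):
    """Return a 0/1 list with a 1 at the index of every liked item."""
    vector = [0] * len(item_index)
    for item in liked_items:
        vector[item_index[item]] = 1
    return vector

# Hypothetical index of three items; the real data is loaded from the pickles.
item_index = {'page_a': 0, 'page_b': 1, 'page_c': 2}

print(build_preference_vector(['page_a', 'page_c'], item_index))  # [1, 0, 1]
```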

In [4]:
preferencies = pickle.load(open('DATA/preferencies.pkl', 'rb'))
first20pairs = {k: preferencies[k] for k in list(preferencies)[:20]}
first20pairs

{'100618369': array([1, 1, 0, ..., 0, 0, 0], dtype=int32),
 '101684764': array([1, 1, 0, ..., 0, 0, 0], dtype=int32),
 '1018670268': array([1, 1, 0, ..., 0, 0, 0], dtype=int32),
 '101935414': array([1, 1, 1, ..., 0, 0, 0], dtype=int32),
 '101948722': array([1, 1, 0, ..., 0, 0, 0], dtype=int32),
 '1025469163': array([1, 1, 0, ..., 0, 0, 0], dtype=int32),
 '103232008': array([0, 1, 0, ..., 0, 0, 0], dtype=int32),
 '103461258': array([0, 1, 0, ..., 0, 0, 0], dtype=int32),
 '1040719962': array([0, 1, 0, ..., 0, 0, 0], dtype=int32),
 '104239528': array([1, 1, 0, ..., 0, 0, 0], dtype=int32),
 '104265805': array([0, 1, 0, ..., 0, 0, 0], dtype=int32),
 '104431004': array([1, 1, 0, ..., 0, 0, 0], dtype=int32),
 '105143827': array([0, 1, 0, ..., 0, 0, 0], dtype=int32),
 '105150125': array([0, 1, 0, ..., 0, 0, 0], dtype=int32),
 '105320169': array([0, 1, 0, ..., 0, 0, 0], dtype=int32),
 '105553307': array([1, 1, 0, ..., 0, 0, 0], dtype=int32),
 '105567943': array([1, 1, 0, ..., 0, 0, 0], dtype=in

Then we can instantiate a Recommender System object and use the function **most_relevant_pages** to retrieve the sorted ranking of the 6 pages proposed to each user in the file S23.tsv.

In [5]:
rs = RecommenderSystem(preferencies, categories, open('DATA/S23.tsv', 'r'))

Below is an example of our recommender system: with metrics = 'all', it suggests to a user the top 3 pages under each of the metrics, based on the user's interests:

In [6]:
rs.most_relevant_pages('101684764', metrics = 'all')

Top 3 (cosine similarity):  ['Heather_Headley', 'Jennifer_Lopez', 'Ryan_Seacrest']
Top 3 (euclidean distance):  ['Jennifer_Lopez', 'Heather_Headley', 'Maxene_Magalona']
Top 3 (pearson correlation):  ['Heather_Headley', 'Jennifer_Lopez', 'Ryan_Seacrest']


If we want, we can also inspect the full list of scores for each metric:

In [10]:
rs.most_relevant_pages('101684764', metrics = 'cosine')

[(0.12893946815413682, 'Heather_Headley'),
 (0.11877671491968554, 'Jennifer_Lopez'),
 (0.11310517053068245, 'Ryan_Seacrest'),
 (0.10735733858321811, 'Maxene_Magalona'),
 (0.09552948218914535, 'Julian_Assange'),
 (0.09273288940232355, 'ITV_News_Anglia')]

In [11]:
rs.most_relevant_pages('101684764', metrics = 'pearson')

[(0.12071388481136608, 'Heather_Headley'),
 (0.1143576980106862, 'Jennifer_Lopez'),
 (0.10512785429755785, 'Ryan_Seacrest'),
 (0.09974802024396087, 'Maxene_Magalona'),
 (0.0884025629236834, 'Julian_Assange'),
 (0.08525707535796076, 'ITV_News_Anglia')]

In [12]:
rs.most_relevant_pages('101684764', metrics = 'euclidean')

[(48.33218389437829, 'Jennifer_Lopez'),
 (48.569537778323564, 'Heather_Headley'),
 (48.67237409455183, 'Maxene_Magalona'),
 (48.67237409455183, 'Ryan_Seacrest'),
 (48.703182647543684, 'Julian_Assange'),
 (48.76474136094644, 'ITV_News_Anglia')]
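
The outputs above suggest that cosine and Pearson scores are ranked from highest to lowest, whereas Euclidean distances are ranked from smallest to largest, since a smaller distance means a closer match. A minimal sketch of such a ranking step is shown below; the helper `rank_pages` and its parameters are hypothetical and not taken from recommender_system.py, and the sample pairs are copied from the outputs above.

```python
def rank_pages(scores, higher_is_better=True, top_k=6):
    """Sort (score, page) pairs by score and keep the first top_k."""
    ordered = sorted(scores, key=lambda pair: pair[0], reverse=higher_is_better)
    return ordered[:top_k]

# Three (score, page) pairs per metric, taken from the outputs above, in arbitrary order.
cosine_scores = [
    (0.11877671491968554, 'Jennifer_Lopez'),
    (0.12893946815413682, 'Heather_Headley'),
    (0.11310517053068245, 'Ryan_Seacrest'),
]
euclidean_scores = [
    (48.569537778323564, 'Heather_Headley'),
    (48.33218389437829, 'Jennifer_Lopez'),
    (48.67237409455183, 'Maxene_Magalona'),
]

# Similarity: the largest score comes first.
print([page for _, page in rank_pages(cosine_scores, top_k=3)])
# ['Heather_Headley', 'Jennifer_Lopez', 'Ryan_Seacrest']

# Distance: the smallest score comes first.
print([page for _, page in rank_pages(euclidean_scores, higher_is_better=False, top_k=3)])
# ['Jennifer_Lopez', 'Heather_Headley', 'Maxene_Magalona']
```

This difference in direction is one reason why the Euclidean top 3 can differ from the cosine and Pearson top 3 for the same user.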

Below are some examples of how our recommender system works for other users:

In [13]:
rs.most_relevant_pages('104431004', metrics = 'all')

Top 3 (cosine similarity):  ['The_Beatles', 'Miki_Nadal', 'La_Terra']
Top 3 (euclidean distance):  ['Berto_Romero', 'Damien_Rice', 'Chris_Boswell']
Top 3 (pearson correlation):  ['The_Beatles', 'Miki_Nadal', 'La_Terra']


In [14]:
rs.most_relevant_pages('10569722', metrics = 'all')

Top 3 (cosine similarity):  ['Travis_Frederick', 'James_Coe', 'Tom_Upton']
Top 3 (euclidean distance):  ['Peter_Somerville', 'John_Lavan', 'Tim_Pawlenty']
Top 3 (pearson correlation):  ['Travis_Frederick', 'James_Coe', 'Tom_Upton']


In [15]:
rs.most_relevant_pages('104239528', metrics = 'all')

Top 3 (cosine similarity):  ['Sabina_Guzzanti', 'Luciano_Ligabue', 'Ivan_Basso']
Top 3 (euclidean distance):  ['Luciano_Ligabue', 'Peter_Diamandis', 'Lovato']
Top 3 (pearson correlation):  ['Sabina_Guzzanti', 'Luciano_Ligabue', 'Ivan_Basso']
