# RQ3
- Main research question: _'Can a hybrid recommendation system using metadata perform better than the current recommendation system of NPO Start which uses collaborative filtering?'_
- RQ3: _'Which metadata features improve the recommendation the most?'_

## Context 
NPO Start shows recommendations in a ribbon called 'Aanbevolen voor jou' ('Recommended for you'). The ribbon contains several items that suit a user, selected on the basis of that user's history of actions.

The current recommendation system applies collaborative filtering to the series watched by a user. For each series, the series that are watched together with it are matched, resulting in a list of (series, series) tuples that are frequently watched together. The top 100 of these frequent co-occurrences are then used to output a list for a single user.
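
The co-occurrence idea can be illustrated with a minimal sketch. The toy viewing histories, the function names and the way the top pairs are turned into a per-user list are assumptions for illustration only; the actual NPO Start implementation is not described here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(history):
    """Count how many users watched each unordered pair of series."""
    counts = Counter()
    for watched in history.values():
        # sorted() gives every pair a single canonical orientation
        for a, b in combinations(sorted(set(watched)), 2):
            counts[(a, b)] += 1
    return counts


def recommend(user, history, counts, top_pairs=100, n=5):
    """Score unseen series by their co-occurrence with the user's series."""
    seen = set(history[user])
    scores = defaultdict(int)
    # Only the most frequent pairs are considered (assumed use of the top 100)
    for (a, b), count in counts.most_common(top_pairs):
        if a in seen and b not in seen:
            scores[b] += count
        elif b in seen and a not in seen:
            scores[a] += count
    # Highest score first; ties are broken alphabetically
    return sorted(scores, key=lambda s: (-scores[s], s))[:n]


# Hypothetical toy data: user -> watched series
history = {
    "u1": ["news", "quiz", "drama"],
    "u2": ["news", "quiz"],
    "u3": ["news", "drama", "sport"],
    "u4": ["quiz"],
}
counts = cooccurrence_counts(history)
print(recommend("u4", history, counts))  # ['news', 'drama']
```

User `u4` has only watched `quiz`, which co-occurs twice with `news` and once with `drama`, so those two series are recommended in that order.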

To improve on the current recommendation system, this study investigates whether a hybrid recommendation system performs better. The hybrid system additionally uses several content features for every item.

## Operationalization
#### Evaluation
To assess the quality of the recommendations, three evaluation metrics are used: precision@k (precision at k), AP (average precision) and CTR (click-through rate).

* __Precision@k__ (mean)<br>
Precision@k is a metric that evaluates the proportion of the top-k recommended items that are relevant to the user. A recommended item that is relevant counts as a true positive (TP). Precision@k is then the number of true positives ranked in the top k, divided by k.
\begin{equation}
P@k = \frac{|\{i \in TP\mid i\mbox{ ranked in top k}\}|}{k}
\end{equation}

* __AP__ (MAP)<br>
AP evaluates both the quality and the rank of the recommended items. This metric is lower when relevant items do not appear at the top of the item list. It sums the precision at each rank k that holds a relevant item and divides the sum by the total number of TPs. Here m is the length of the recommendation list and rel(k) is 1 if the item at rank k is relevant and 0 otherwise.
\begin{equation}
AP = \frac{1}{|TP|}\sum_{k=1}^m\frac{|\{i \in TP\mid i\mbox{ ranked in top k}\}|}{k}\cdot rel(k)
\end{equation}

* __CTR__ <br>
The CTR measures how often users click on the ribbon with recommended items, relative to the number of times the ribbon was offered.
\begin{equation}
CTR = \frac{\textit{number of click-throughs}}{\textit{number of offers}}
\end{equation}


For all three metrics, a higher value is better. The version with the highest precision and CTR is the most successful at recommending items that users are interested in, and the version with the highest AP is the most successful at ranking the recommendations in a personalized manner.
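
The three metrics can be sketched in a few lines of Python. This is a minimal illustration, assuming the common definition of AP in which only ranks that hold a relevant item contribute; the function names and the toy ranking are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k that hold a relevant item."""
    hits = 0
    total = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at this rank
    return total / hits if hits else 0.0


def click_through_rate(clicks, offers):
    """Number of click-throughs divided by the number of offers."""
    return clicks / offers if offers else 0.0


# Hypothetical example: five recommended items, two of which are relevant
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}

print(precision_at_k(ranked, relevant, k=5))  # 0.4
print(average_precision(ranked, relevant))    # (1/1 + 2/3) / 2 = 0.8333...
print(click_through_rate(clicks=3, offers=12))  # 0.25
```

Moving the relevant item `c` from rank 3 to rank 2 would leave precision@5 unchanged but raise AP, which is why AP is the metric that reflects ranking quality.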

#### Data
The data for the recommendation systems consists of interaction information and item information.

Interaction information consists of NPO event data. This data contains offers and choices, and describes the recommended items, the rank of each item, the total number of items in a row and which item was chosen.

Item information describes the metadata of each item, which is provided by the Publieke Omroep Media Service (POMS) of the NPO. For each item nine variables are chosen: age rating, broadcaster, credits, description, genres, mid, series reference, subtitles and title. These content features are transformed into vectors and used for content-based recommendation. A snippet of the item information is shown below; it results in the encoded vector {'title_bestezangers': 1, 'broadcaster_avtr': 1, 'title_debestezangersvannederland': 1, 'broadcaster_tros': 1}.

<table>
  <tr>
    <th>mid</th>
    <th>feature</th>
    <th>value</th>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>broadcaster</td>
    <td>TROS</td>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>broadcaster</td>
    <td>AVTR</td>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>title</td>
    <td>bestezangers</td>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>title</td>
    <td>debestezangersvannederland</td>
  </tr>
</table>
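
The encoding of the snippet above can be reproduced with a short sketch. The `rows` list mirrors the table, while the function name and the `feature_value` key format (lowercased value) are inferred from the encoded vector shown in the text, not taken from the original code.

```python
from collections import defaultdict

# (mid, feature, value) rows as in the table above
rows = [
    ("AT_2033328", "broadcaster", "TROS"),
    ("AT_2033328", "broadcaster", "AVTR"),
    ("AT_2033328", "title", "bestezangers"),
    ("AT_2033328", "title", "debestezangersvannederland"),
]


def encode_items(rows):
    """Build one sparse one-hot dictionary per item (keyed by mid)."""
    encoded = defaultdict(dict)
    for mid, feature, value in rows:
        # Each distinct (feature, value) combination becomes its own dimension
        encoded[mid][f"{feature}_{value.lower()}"] = 1
    return dict(encoded)


item_vectors = encode_items(rows)
print(item_vectors["AT_2033328"])
```

Storing only the non-zero entries keeps the representation compact, since each item has only a handful of the many possible feature values.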

#### Experimental setup
A hybrid recommendation system was set up using the Python library LightFM. LightFM is a hybrid matrix factorisation model that represents users and items as linear combinations of the latent factors of their content features. The model was trained with a weighted approximate-rank pairwise (WARP) loss, which learns to rank the items. The model learns user and item representations from the interaction data. Because items and users are represented as linear combinations of their content features, it can also compute recommendations for new items and users.
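
The idea of representing an item as a linear combination of its features' latent factors can be illustrated with NumPy. This is a conceptual sketch and not LightFM's implementation: the embeddings are random instead of learned, bias terms are left out, and the sizes and the second item are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_components = 4  # latent dimensionality (hypothetical, kept small)

# One latent vector per content feature (random here, learned in a real model)
feature_names = ["broadcaster_tros", "broadcaster_avtr", "title_bestezangers"]
feature_embeddings = rng.normal(size=(len(feature_names), n_components))

# Row i marks which content features item i has
item_features = np.array([
    [1, 1, 1],  # item 0: both broadcasters and the title token
    [1, 0, 0],  # item 1: only broadcaster_tros
])

# An item's representation is the sum of the latent vectors of its features
item_embeddings = item_features @ feature_embeddings

# A user's latent vector (random here); the score is a dot product
user_embedding = rng.normal(size=n_components)
scores = item_embeddings @ user_embedding

# Items sorted from highest to lowest score for this user
ranking = np.argsort(-scores)
print(ranking)
```

Because the representation is built from features, an item that has never been watched still gets an embedding as long as its features occurred on other items during training.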

The steps taken in implementing the LightFM model are:
1. Loading and cleaning data
    - removing duplicates and thresholding (making sure users have liked at least 5 items, and items are liked by at least 5 users)
2. Preparing content feature vectors
    - one-hot-encoding the content features
3. Transforming data into an interactions matrix
4. Preparing a 5-fold cross-validation
5. Executing the model (performing training and testing)
6. Optimizing the hyperparameters
    - using the library scikit-optimize on the parameters number of epochs, learning rate, number of components and alpha
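
Steps 1 and 4 can be sketched with the standard library only. The sketch assumes that thresholding is applied repeatedly until both conditions hold (dropping a user can push an item below the threshold) and that the folds are random splits over interactions; the original pipeline may differ on both points, and all names and data below are hypothetical.

```python
import random
from collections import Counter


def threshold(interactions, min_count=5):
    """Drop users and items with fewer than min_count interactions, repeatedly."""
    pairs = set(interactions)  # a set also removes duplicate (user, item) pairs
    while True:
        user_counts = Counter(user for user, _ in pairs)
        item_counts = Counter(item for _, item in pairs)
        kept = {
            (user, item)
            for user, item in pairs
            if user_counts[user] >= min_count and item_counts[item] >= min_count
        }
        if kept == pairs:  # nothing was removed, so both conditions hold
            return kept
        pairs = kept


def k_fold(pairs, k=5, seed=0):
    """Yield (train, test) splits; every interaction is in exactly one test fold."""
    shuffled = sorted(pairs)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]
    for f in range(k):
        train = [p for g, fold in enumerate(folds) if g != f for p in fold]
        yield train, folds[f]


# Hypothetical toy data: 5 users x 5 items, one sparse user and one duplicate
interactions = [(f"u{u}", f"i{i}") for u in range(5) for i in range(5)]
interactions += [("u_rare", "i0"), ("u0", "i0")]

clean = threshold(interactions)
splits = list(k_fold(clean))
print(len(clean), [len(test) for _, test in splits])  # 25 [5, 5, 5, 5, 5]
```

The sparse user `u_rare` and the duplicate interaction are removed, leaving 25 interactions that are divided over five test folds of equal size.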

First, offline experiments are performed to assess the precision of the hybrid recommendation system, using historical choice event data from March. Versions of the hybrid recommendation system that use different content features are tested to compare the accuracy of their predictions.

So far, the precision@k of four versions has been tested:
<table class="tg">
   <tr>
    <th class="tg-s268">Version</th>
    <th class="tg-s268">Content features</th>
  </tr>
  <tr>
    <td class="tg-s268">Version 1</td>
    <td class="tg-s268">None</td>
  </tr>
  <tr>
    <td class="tg-s268">Version 2</td>
    <td class="tg-s268">'broadcaster'</td>
  </tr>
  <tr>
    <td class="tg-s268">Version 3</td>
    <td class="tg-s268">'title'</td>
  </tr>
  <tr>
    <td class="tg-s268">Version 4</td>
    <td class="tg-s268">'broadcaster' + 'title'</td>
  </tr>
</table>

## Result

For now, each version was run on seven days of event data and evaluated on precision@k with k = 5. The results are shown below.

<table>
  <tr>
    <th rowspan="2">Version</th>
    <th rowspan="2">Content features</th>
    <th rowspan="2">Precision@k</th>
    <th rowspan="2">Standard deviation</th>
    <th colspan="5">Hyperparameters</th>
  </tr>
  <tr>
    <th>Epochs</th>
    <th>Learning rate</th>
    <th>Number of components</th>
    <th>Alpha</th>
    <th>Scaling</th>
  </tr>
  <tr>
    <td>1</td>
    <td>none</td>
    <td>0.17</td>
    <td>0.05</td>
    <td>49</td>
    <td>0.02</td>
    <td>80</td>
    <td>7.7E-04</td>
    <td>x</td>
  </tr>
  <tr>
    <td>2</td>
    <td>broadcaster</td>
    <td>0.15</td>
    <td>0.04</td>
    <td>2</td>
    <td>0.04</td>
    <td>42</td>
    <td>1.9E-05</td>
    <td>0.27</td>
  </tr>
  <tr>
    <td>3</td>
    <td>title</td>
    <td>0.15</td>
    <td>0.05</td>
    <td>6</td>
    <td>0.01</td>
    <td>142</td>
    <td>4.61E-04</td>
    <td>0.70</td>
  </tr>
  <tr>
    <td>4</td>
    <td>broadcaster + title</td>
    <td>0.14</td>
    <td>0.04</td>
    <td>7</td>
    <td>0.01</td>
    <td>135</td>
    <td>1.37E-04</td>
    <td>0.00</td>
  </tr>
</table>

The results indicate that the highest precision was achieved by the model that used no content features, i.e. the model that relied on matrix factorization alone. The lowest precision was obtained when the content features broadcaster and title were both incorporated into the hybrid recommendation model. Both hybrid models with a single content feature had a precision of around 0.15. All versions have about the same standard deviation. This suggests that, on this seven-day sample, incorporating these content features into the model did not improve the precision of the recommendations.
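
To put the differences in perspective, the gap between each version and version 1 can be compared with the reported standard deviations. The sketch below uses only the values from the table above; it is a rough sanity check and not a significance test, and the variable names are hypothetical.

```python
# (mean precision@5, standard deviation) per version, taken from the table above
results = {
    "1 (none)": (0.17, 0.05),
    "2 (broadcaster)": (0.15, 0.04),
    "3 (title)": (0.15, 0.05),
    "4 (broadcaster + title)": (0.14, 0.04),
}

base_mean, base_std = results["1 (none)"]

within_one_std = {}
for name, (mean, std) in results.items():
    gap = base_mean - mean
    # Is the gap to version 1 smaller than the larger of the two spreads?
    within_one_std[name] = gap <= max(base_std, std)
    print(f"version {name}: gap to version 1 = {gap:.2f}, within one std: {within_one_std[name]}")
```

Every gap (at most 0.03) is smaller than the reported standard deviations (0.04 to 0.05), so the ordering of the versions should be read with caution until more data is included.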

## Further steps
- Include event data from more days
- Get results for offer data in addition to choice data
- Include more content features (and try different text processing techniques)
- Evaluate on more of the evaluation metrics
- Look into other hybrid recommendation systems that use PySpark