# RQ3
- Main research question: _'Can content-based filtering using metadata enhance the current recommendation system of NPO Start which uses collaborative filtering?'_
- RQ3: _'Which metadata features improve the recommendation the most?'_

## Context 
NPO Start shows recommendations in a ribbon called 'Aanbevolen voor jou' ('Recommended for you'). This ribbon displays several items that are expected to suit a user, based on that user's history of actions.

The current recommendation system applies collaborative filtering to the series watched by a user. For each series, the other series that are watched together with it are matched, resulting in a list of (series, series) pairs that are frequently watched together. The top 100 of these most frequent co-occurrences are then used to produce a recommendation list for an individual user.
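
The co-occurrence approach described above can be illustrated with a minimal Python sketch. It assumes that each user's watch history is available as a list of series identifiers, and the function names are hypothetical. How the top pairs are turned into a list for one user (here: summing the pair counts of unseen series) is an assumption, since the exact scoring step of the current system is not described.

```python
from collections import Counter
from itertools import combinations


def co_watched_pairs(histories, top_n=100):
    """Count how often two series are watched by the same user; keep the top_n pairs."""
    counts = Counter()
    for watched in histories.values():
        # Sorting the unique series counts each unordered pair once per user.
        for a, b in combinations(sorted(set(watched)), 2):
            counts[(a, b)] += 1
    return counts.most_common(top_n)


def recommend(user_history, top_pairs, n=5):
    """Score unseen series by the counts of top pairs that contain a series the user watched."""
    seen = set(user_history)
    scores = Counter()
    for (a, b), count in top_pairs:
        if a in seen and b not in seen:
            scores[b] += count
        elif b in seen and a not in seen:
            scores[a] += count
    return [series for series, _ in scores.most_common(n)]


# Hypothetical example: four users and their watched series.
histories = {"u1": ["A", "B", "C"], "u2": ["A", "B"], "u3": ["B", "C"], "u4": ["A", "D"]}
top_pairs = co_watched_pairs(histories, top_n=100)
print(recommend(["A"], top_pairs))
```

A user who only watched "A" gets "B" first, because (A, B) is the most frequent pair that contains "A".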

To improve this system, this research investigates whether integrating a content-based filtering method into it improves performance. The combined system is thus a hybrid recommender system, which uses several content features for every item (series).

## Operationalization
#### Evaluation
To assess the quality of the recommendations, three evaluation metrics were computed: precision@k (precision at k), AP@k (average precision at k) and CTR (click-through rate).

* __Precision@k__ <br>
Precision@k is the proportion of the top-k recommended items that are relevant to the user. A relevant item among the recommendations is counted as a true positive (TP). Precision@k is then the number of true positives divided by the number of recommended items k.
\begin{equation}
\textit{Precision@k} = \frac{TP}{k}
\end{equation}

* __AP@k__ <br>
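AP@k evaluates both the relevance and the ranking of the recommended items: it is lower when the relevant items do not appear at the top of the list. It computes the precision at every rank up to k at which a relevant item appears, and averages these values over the number of true positives.
\begin{equation}
\textit{AP@k} = \frac{1}{TP}\sum_{i=1}^{k}\frac{\textit{TP seen}_i}{i}\,\mathrm{rel}(i)
\end{equation}
where $\textit{TP seen}_i$ is the number of true positives among the top $i$ items and $\mathrm{rel}(i)$ is 1 if the item at rank $i$ is relevant and 0 otherwise. AP@k is defined as 0 when there are no true positives.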

* __CTR__ <br>
The CTR measures the proportion of offers of the ribbon with recommended items that resulted in a user clicking through. The click-through rate is given by the equation below.
\begin{equation}
CTR = \frac{\textit{number of click-throughs}}{\textit{number of offers}}
\end{equation}


For all three metrics, higher values are better. The version with the highest precision@k and CTR is the most successful at recommending items that users are interested in, and the version with the highest AP@k is the most successful at ranking each user's relevant items near the top of the list.
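
The three metrics can be computed with a few lines of Python. This is a minimal sketch, assuming a ranked list of recommended item IDs for one user and a set of items that user actually chose; the function names are hypothetical. AP@k is computed by averaging precision@i over the ranks i at which a relevant item appears.

```python
def precision_at_k(recommended, relevant, k):
    """TP / k: share of the top-k recommendations that are relevant."""
    tp = sum(1 for item in recommended[:k] if item in relevant)
    return tp / k


def average_precision_at_k(recommended, relevant, k):
    """Average of precision@i over the ranks i <= k that hold a relevant item."""
    tp_seen = 0
    total = 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            tp_seen += 1
            total += tp_seen / i  # precision@i, only added at relevant ranks
    return total / tp_seen if tp_seen else 0.0


def click_through_rate(n_click_throughs, n_offers):
    """Number of click-throughs divided by the number of times the ribbon was offered."""
    return n_click_throughs / n_offers if n_offers else 0.0


# Hypothetical example: a ribbon of five items for one user.
ribbon = ["a", "b", "c", "d", "e"]
print(precision_at_k(ribbon, {"a", "c"}, k=5))
print(average_precision_at_k(ribbon, {"a", "c"}, k=5))
print(average_precision_at_k(ribbon, {"b", "c"}, k=5))  # same TP count, lower ranks
print(click_through_rate(30, 1000))
```

Both relevance sets contain two items, so precision@5 is 0.4 in both cases, but AP@5 drops from about 0.83 to about 0.58 when the relevant items move from ranks 1 and 3 to ranks 2 and 3. For an offline evaluation, the per-user values would be averaged over all test users; LightFM also provides `lightfm.evaluation.precision_at_k` for this.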

#### Data
The data for the recommendation systems consists of interaction information and item information.

The interaction information consists of NPO event data. This data contains offers and choices: it describes which items were recommended, the rank of each item, the total number of items in a row, and which item was chosen.
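
As an illustration of how offer and choice events can be turned into training interactions, the sketch below keeps, for every choice, the chosen item and its rank in the most recent row offered to that user. The event structure (the keys `type`, `user`, `items` and `item`) is hypothetical, since the actual NPO event schema is not described here.

```python
def interactions_from_events(events):
    """Return (user, item, rank) for every chosen item that appeared in the user's last offered row.

    Hypothetical event format:
      offer:  {"type": "offer", "user": ..., "items": [rank-1 item, rank-2 item, ...]}
      choice: {"type": "choice", "user": ..., "item": ...}
    """
    last_offer = {}
    positives = []
    for event in events:
        if event["type"] == "offer":
            last_offer[event["user"]] = event["items"]
        elif event["type"] == "choice":
            offered = last_offer.get(event["user"], [])
            if event["item"] in offered:
                rank = offered.index(event["item"]) + 1  # 1-based position in the row
                positives.append((event["user"], event["item"], rank))
    return positives


events = [
    {"type": "offer", "user": "u1", "items": ["s1", "s2", "s3"]},
    {"type": "choice", "user": "u1", "item": "s2"},
    {"type": "offer", "user": "u2", "items": ["s3", "s1"]},
    {"type": "choice", "user": "u2", "item": "s9"},  # not in the offered row: ignored
]
print(interactions_from_events(events))  # [('u1', 's2', 2)]
```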

The item information describes the metadata of each item, which is provided by the Publieke Omroep Media Service (POMS) of the NPO. For each item, nine variables are selected: age rating, broadcaster, credits, description, genres, mid, series reference, subtitles and title. These content features are transformed into vectors and used for content-based recommendation.
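
A minimal sketch of turning categorical metadata into item feature vectors is multi-hot encoding: every (field, value) pair, such as a broadcaster or a genre, becomes one column. The field names and example values below are hypothetical, and free-text fields such as title or description would first need to be tokenized, which is not shown. In LightFM, a comparable step is usually done with `lightfm.data.Dataset.build_item_features`, which takes a list of feature names per item.

```python
def _values(metadata, field):
    """Return a field's values as a list, so single strings and lists are handled alike."""
    values = metadata.get(field, [])
    return [values] if isinstance(values, str) else list(values)


def build_feature_index(items, fields):
    """Assign a column index to every (field, value) pair found in the item metadata."""
    index = {}
    for metadata in items.values():
        for field in fields:
            for value in _values(metadata, field):
                index.setdefault(f"{field}:{value}", len(index))
    return index


def item_feature_vector(metadata, fields, index):
    """Multi-hot vector with 1.0 in the column of every (field, value) pair the item has."""
    vector = [0.0] * len(index)
    for field in fields:
        for value in _values(metadata, field):
            column = index.get(f"{field}:{value}")
            if column is not None:
                vector[column] = 1.0
    return vector


# Hypothetical metadata for two series.
items = {
    "s1": {"broadcaster": "B1", "genres": ["Documentary"]},
    "s2": {"broadcaster": "B2", "genres": ["News", "Documentary"]},
}
fields = ["broadcaster", "genres"]
index = build_feature_index(items, fields)
print(index)
print(item_feature_vector(items["s2"], fields, index))
```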

#### Experimental setup
A hybrid recommendation system was set up using the Python library LightFM, which implements a hybrid matrix factorisation model. LightFM represents users and items as linear combinations of the latent factors of their content features, and learns these latent representations from the interaction data. Because every user and item is expressed through its content features, the model can also compute recommendations for new items and users.
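
The way LightFM combines content features can be sketched in plain Python: the latent vector of a user or item is the sum of the latent vectors of its features, its bias is the sum of the feature biases, and the score is the dot product of the two vectors plus both biases. The toy vectors below are hand-picked stand-ins for learned parameters, and the function names are hypothetical; this is not LightFM's own code.

```python
def representation(features, embeddings):
    """Latent vector of a user or item: the sum of the latent vectors of its features."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for feature in features:
        for d, value in enumerate(embeddings[feature]):
            total[d] += value
    return total


def score(user_features, item_features, user_emb, item_emb, user_bias, item_bias):
    """Dot product of user and item representations plus the summed feature biases."""
    user_vec = representation(user_features, user_emb)
    item_vec = representation(item_features, item_emb)
    bias = sum(user_bias[f] for f in user_features) + sum(item_bias[f] for f in item_features)
    return sum(a * b for a, b in zip(user_vec, item_vec)) + bias


# Toy stand-ins for learned parameters (2 latent components).
user_emb = {"user:u1": [1.0, 1.0]}
user_bias = {"user:u1": 0.0}
item_emb = {"broadcaster:B1": [1.0, 2.0], "genre:Documentary": [0.5, -1.0]}
item_bias = {"broadcaster:B1": 0.0, "genre:Documentary": 0.1}

# A new series without viewing history is scored through its content features alone.
new_series = ["broadcaster:B1", "genre:Documentary"]
print(representation(new_series, item_emb))
print(score(["user:u1"], new_series, user_emb, item_emb, user_bias, item_bias))
```

Because a new series only needs features whose vectors were learned from other items, it can be ranked before anyone has watched it, which pure collaborative filtering cannot do.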


First, offline experiments are performed to assess the precision of the hybrid recommendation system, using historical choice event data from March. Different versions of the hybrid recommendation system, each using different content features, are tested to assess the accuracy of their predictions.

So far, the precision@k of four different versions has been tested:
<table class="tg">
   <tr>
    <th class="tg-s268">Version</th>
    <th class="tg-s268">Content features</th>
  </tr>
  <tr>
    <td class="tg-s268">Version 1</td>
    <td class="tg-s268">None</td>
  </tr>
  <tr>
    <td class="tg-s268">Version 2</td>
    <td class="tg-s268">'broadcaster'</td>
  </tr>
  <tr>
    <td class="tg-s268">Version 3</td>
    <td class="tg-s268">'title'</td>
  </tr>
  <tr>
    <td class="tg-s268">Version 4</td>
    <td class="tg-s268">'broadcaster' + 'title'</td>
  </tr>
</table>

All four versions were trained using cross-validation on a 95%/5% train-test split. The hyperparameters of each version were then optimized using scikit-optimize.
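
The split and the tuning loop can be sketched as follows. This is a minimal, dependency-free version under stated assumptions: the interactions are (user, item) pairs, a single random 95%/5% hold-out is used instead of full cross-validation, and plain random search is substituted for the Bayesian optimization that scikit-optimize provides (for example through `skopt.gp_minimize`). The search ranges are assumptions, the key `no_components` mirrors the LightFM argument of that name, and `objective` is a hypothetical callback that would train a model with the given hyperparameters and return its precision@5 on held-out data.

```python
import math
import random


def train_test_split(interactions, test_fraction=0.05, seed=42):
    """Randomly hold out test_fraction of the interactions (5% by default)."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Assumed search ranges for the hyperparameters reported in the results table.
SEARCH_SPACE = {
    "epochs": ("int", 1, 100),
    "learning_rate": ("log", 1e-4, 1e-1),
    "no_components": ("int", 10, 200),
    "alpha": ("log", 1e-6, 1e-3),
}


def sample_params(rng):
    """Draw one hyperparameter setting: integers uniformly, rates on a log scale."""
    params = {}
    for name, (kind, low, high) in SEARCH_SPACE.items():
        if kind == "int":
            params[name] = rng.randint(low, high)
        else:
            params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
    return params


def random_search(objective, n_calls=20, seed=0):
    """Evaluate n_calls random settings and keep the one with the highest objective value."""
    rng = random.Random(seed)
    best_params, best_value = None, float("-inf")
    for _ in range(n_calls):
        params = sample_params(rng)
        value = objective(params)  # e.g. precision@5 of a model trained with params
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value


# Hypothetical usage with 200 (user, item) pairs and a toy objective.
pairs = [(u, i) for u in range(20) for i in range(10)]
train, test = train_test_split(pairs)
print(len(train), len(test))
best, value = random_search(lambda p: -abs(p["epochs"] - 50))
print(best, value)
```

Since `gp_minimize` minimizes its objective, a scikit-optimize version of this loop would return the negative precision instead.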

## Result

So far, each version has been trained and its precision@k evaluated with k = 5. The results are shown below.

<table class="tg">
  <tr>
    <th class="tg-0lax" rowspan="2">Version</th>
    <th class="tg-0lax" rowspan="2">Content features</th>
    <th class="tg-0lax" rowspan="2">Precision@5</th>
    <th class="tg-0lax" colspan="4">Hyperparameters</th>
  </tr>
  <tr>
    <td class="tg-0lax">Epochs</td>
    <td class="tg-0lax">Learning rate</td>
    <td class="tg-0lax">Number of components</td>
    <td class="tg-0lax">Alpha</td>
  </tr>
  <tr>
    <td class="tg-0lax">1</td>
    <td class="tg-0lax">none</td>
    <td class="tg-0lax">0.0818</td>
    <td class="tg-0lax">87</td>
    <td class="tg-0lax">0.0018</td>
    <td class="tg-0lax">148</td>
    <td class="tg-0lax">4.1657E-05</td>
  </tr>
  <tr>
    <td class="tg-0lax">2</td>
    <td class="tg-0lax">broadcaster</td>
    <td class="tg-0lax">0.0823</td>
    <td class="tg-0lax">6</td>
    <td class="tg-0lax">0.0151</td>
    <td class="tg-0lax">138</td>
    <td class="tg-0lax">2.7892E-05</td>
  </tr>
  <tr>
    <td class="tg-0lax">3</td>
    <td class="tg-0lax">title</td>
    <td class="tg-0lax">0.0810</td>
    <td class="tg-0lax">3</td>
    <td class="tg-0lax">0.0323</td>
    <td class="tg-0lax">160</td>
    <td class="tg-0lax">2.8477E-05</td>
  </tr>
  <tr>
    <td class="tg-0lax">4</td>
    <td class="tg-0lax">broadcaster + title</td>
    <td class="tg-0lax">0.0820</td>
    <td class="tg-0lax">89</td>
    <td class="tg-0lax">0.0022</td>
    <td class="tg-0lax">96</td>
    <td class="tg-0lax">4.4000E-04</td>
  </tr>
</table>

The results indicate that the highest precision was achieved by the hybrid recommendation model enhanced with the broadcaster feature (0.0823). The lowest precision was obtained with the title feature (0.0810), which suggests that titles may not be a useful content feature. The model without content features, which is essentially a plain matrix factorization model, had the second-lowest precision (0.0818), so two of the three models that use content features achieved a higher precision than this baseline. The differences between the versions are small, however: at most 0.0013 in precision@5.
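
The comparison can be made explicit by computing each version's relative change in precision@5 with respect to the version without content features. The values are copied from the results table; the function name is hypothetical.

```python
# Precision@5 per version, copied from the results table.
results = {
    "none": 0.0818,
    "broadcaster": 0.0823,
    "title": 0.0810,
    "broadcaster + title": 0.0820,
}


def relative_change(results, baseline="none"):
    """Percentage change in precision@5 of every version relative to the baseline version."""
    base = results[baseline]
    return {name: (value - base) / base * 100 for name, value in results.items()}


changes = relative_change(results)
for name in sorted(results, key=results.get, reverse=True):
    print(f"{name:<20} {results[name]:.4f} {changes[name]:+.2f}%")
```

Relative to the version without content features, this gives roughly +0.61% for broadcaster, +0.24% for broadcaster + title and -0.98% for title.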