# RQ3
- Main research question: <br>
<center>_'Can a hybrid recommendation system using metadata perform better than the current recommendation system of NPO Start which uses collaborative filtering?'_</center>
- RQ3: _'Which metadata features improve the recommendation the most?'_

## Context 
NPO Start is a service that lets users watch video content on demand. This video content is displayed to users in so-called "ribbons", rows that share a certain theme, such as 'Populair' (Popular), 'Nieuw' (New) and 'Aanbevolen voor jou' (Recommended for you). Each ribbon consists of a ranked list of several items. 

Users with an account can receive personalized ribbons containing items that are recommended specifically for them. These recommendations appear on the front page of the service in the ribbon called 'Aanbevolen voor jou'. This ribbon applies collaborative filtering to the history of user interactions with items. The interactions are grouped at series level, and pairs of series that are frequently watched together are matched against the viewing history of the user. The 100 most frequent co-occurring pairs are extracted and ordered by frequency, which results in a personalized ribbon of items for a single user. 

However, a lot of metadata is available about the offered content that the current recommendation system does not use. In this thesis, the metadata of broadcasts is used in a hybrid recommendation system to determine whether it can improve on the performance of the current video recommendation system.

## Operationalization
#### Evaluation
To assess the quality of the recommendations, three evaluation metrics were computed: precision@k (precision at k), AP (average precision) and CTR (click-through rate).  

* __Precision@k__
<br><font color='red'>(average)</font><br>
Precision@k is a metric that evaluates the proportion of the top-k recommended items that are relevant to the user. A relevant item is an item that was chosen by a user when it was offered in a ribbon. Relevant items are denoted as true positives (TP). Precision@k is then the number of true positives among the top-k recommended items divided by k.
\begin{equation}
P@k = \frac{|\{i \in TP\mid i\mbox{ ranked in top k}\}|}{k}
\end{equation}
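
The formula can be illustrated with a minimal Python sketch. The function name `precision_at_k` and the series ids are hypothetical and only serve the example; the ribbon is assumed to be a list ordered by rank, and the chosen items a set.

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k ranked items that the user actually chose."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    # Divide by k (not by len(top_k)), as in the equation above.
    return hits / k


# Hypothetical ribbon of five series ids; the user chose two of them.
ribbon = ["s1", "s2", "s3", "s4", "s5"]
chosen = {"s2", "s5"}
print(precision_at_k(ribbon, chosen, k=5))  # 0.4
```

Averaging this value over all users (or over all offered ribbons) gives a single precision@k score per model version.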

* __AP__
<br><font color='red'>(MAP)</font><br>
AP evaluates both the quality and the rank of the recommended items. The metric is lower when relevant items do not appear at the top of the item list. It sums the precision at each rank $k$ at which a relevant item appears and divides this sum by the total number of TPs. Here $m$ is the length of the ranked list, and $rel(k)$ is 1 if the item at rank $k$ is a TP and 0 otherwise.
\begin{equation}
AP = \frac{1}{|TP|}\sum_{k=1}^m rel(k)\cdot\frac{|\{i \in TP\mid i\mbox{ ranked in top k}\}|}{k}
\end{equation}
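
A minimal Python sketch of this metric follows. It assumes the usual definition of AP, in which the precision is only added at ranks where a chosen item appears, and it divides by the number of chosen items found in the list. The function name and the series ids are hypothetical.

```python
def average_precision(ranked_items, relevant_items):
    """Mean of the precision values at the ranks where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked_items, start=1):
        if item in relevant_items:
            hits += 1
            precision_sum += hits / k  # precision@k at this relevant rank
    # Without any relevant item in the list, AP is reported as zero.
    return precision_sum / hits if hits else 0.0


# Hypothetical ribbon: the chosen items sit at ranks 2 and 5.
ribbon = ["s1", "s2", "s3", "s4", "s5"]
chosen = {"s2", "s5"}
ap_low = average_precision(ribbon, chosen)

# The same chosen items placed at the top give a higher score.
ap_high = average_precision(["s2", "s5", "s1", "s3", "s4"], chosen)
print(ap_low < ap_high)  # True
```

Taking the mean of AP over all users gives the MAP (mean average precision).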

* __CTR__ <br>
The CTR is the number of times a user clicked on an item in the recommended ribbon divided by the number of times the ribbon was offered.
\begin{equation}
CTR = \frac{\textit{number of click-throughs}}{\textit{number of offers}}
\end{equation}
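
As a small illustration, the CTR could be computed from an event log as sketched below. The event structure, with a `type` field that is either `"offer"` or `"choice"`, is an assumption made for this example and not the actual NPO event format.

```python
from collections import Counter


def click_through_rate(events):
    """Number of choice events divided by the number of offer events."""
    counts = Counter(event["type"] for event in events)
    if counts["offer"] == 0:
        return 0.0  # no offers logged, so the rate is reported as zero
    return counts["choice"] / counts["offer"]


# Hypothetical log: the ribbon was offered four times and clicked once.
events = [
    {"type": "offer"},
    {"type": "offer"},
    {"type": "choice"},
    {"type": "offer"},
    {"type": "offer"},
]
print(click_through_rate(events))  # 0.25
```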


For all three metrics, a higher value is better. The version with the highest precision and CTR is the most successful at recommending items that users are interested in, and the version with the highest AP is the most successful at ranking the recommendations in a personalized manner. 

#### Data
The input data for the hybrid recommendation system consists of user interaction data and content features. 

The user interaction data consists of NPO event data, which has two types: offers and choices. Offers describe the ranked items that were shown in the 'Aanbevolen voor jou' ribbon to a user. Choices describe the chosen items out of offered ribbons and their rank. 

The content features consist of metadata for each item, which is provided by the Publieke Omroep Media Service (POMS) of the NPO. 
Nine content features were chosen for the hybrid recommendation system, namely age rating, broadcaster, credits, description, genres, mid, series reference, subtitles and title. These content features are evaluated to determine which feature, or combination of features, improves the recommendation system the most. The features are used for the content-based side of the recommendation by transforming them into vectors. An example of broadcaster and title information for a series is shown below; it results in the encoded vector {'title_bestezangers': 1, 'broadcaster_avtr': 1, 'title_debestezangersvannederland': 1, 'broadcaster_tros': 1}. 

<table>
  <tr>
    <th>mid</th>
    <th>feature</th>
    <th>value</th>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>broadcaster</td>
    <td>TROS</td>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>broadcaster</td>
    <td>AVTR</td>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>title</td>
    <td>bestezangers</td>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>title</td>
    <td>debestezangersvannederland</td>
  </tr>
</table>
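
The encoding of the table rows above into such a vector can be sketched in a few lines of Python. The helper name `encode_features` and the key format (feature name, underscore, lower-cased value) are assumptions inferred from the example vector, not the actual implementation.

```python
# Rows copied from the table above: (mid, feature, value).
rows = [
    ("AT_2033328", "broadcaster", "TROS"),
    ("AT_2033328", "broadcaster", "AVTR"),
    ("AT_2033328", "title", "bestezangers"),
    ("AT_2033328", "title", "debestezangersvannederland"),
]


def encode_features(rows):
    """Build one sparse one-hot vector (a dict) per mid."""
    encoded = {}
    for mid, feature, value in rows:
        key = f"{feature}_{value.lower()}"
        encoded.setdefault(mid, {})[key] = 1
    return encoded


vectors = encode_features(rows)
print(sorted(vectors["AT_2033328"]))
```

Each distinct key becomes one dimension of the content vector, so an item with two broadcasters simply has two active broadcaster dimensions.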

#### Experimental setup
<font color='red'>Look at writing</font><br>
A hybrid recommendation system was set up using the Python library LightFM. LightFM is a hybrid matrix factorisation model that represents users and items as linear combinations of the latent factors of their content features. The model was trained with a weighted approximate-rank pairwise (WARP) loss, which learns to rank the items. The model learns user and item representations from the interaction data. Because these representations are built from content features, it can also compute recommendations for new items and users. 
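
The idea of representing an item as a linear combination of the latent factors of its features can be illustrated with the plain-Python sketch below. This is not LightFM code: the embeddings are made-up two-dimensional vectors, the names are hypothetical, and the bias terms that LightFM also learns are left out for brevity.

```python
def embed(feature_weights, feature_embeddings):
    """Sum the latent vectors of the active features, weighted by their value."""
    dim = len(next(iter(feature_embeddings.values())))
    vector = [0.0] * dim
    for feature, weight in feature_weights.items():
        for d, value in enumerate(feature_embeddings[feature]):
            vector[d] += weight * value
    return vector


def score(user_vector, item_vector):
    """Dot product of the user and item representations."""
    return sum(u * i for u, i in zip(user_vector, item_vector))


# Hypothetical learned embeddings, one latent vector per feature.
item_feature_embeddings = {
    "broadcaster_tros": [1.0, 0.0],
    "title_bestezangers": [0.0, 2.0],
}
user_feature_embeddings = {"user_42": [0.5, 0.25]}

item_vector = embed(
    {"broadcaster_tros": 1, "title_bestezangers": 1}, item_feature_embeddings
)
user_vector = embed({"user_42": 1}, user_feature_embeddings)
print(score(user_vector, item_vector))  # 1.0
```

A new item that has no interactions yet can still be scored this way, as long as its features already have learned embeddings.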

The steps taken in implementing the LightFM model are:
1. Loading and cleaning the user interaction data
    1. removing duplicates and thresholding (making sure users have liked at least 5 items, and items are liked by at least 5 users)
2. Preparing content feature vectors
    2. one-hot-encoding the content features
3. Transforming the user interaction data into an interactions matrix
4. Preparing a 5-fold cross-validation 
5. Executing the model
6. Optimizing the hyperparameters 
    6. using the library scikit-optimize to tune the number of epochs, learning rate, number of components and alpha
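
Steps 1 and 4 can be sketched as follows, under stated assumptions: interactions are given as (user, item) pairs, the thresholding is repeated until both conditions hold at the same time, and the folds are made by shuffling the individual interactions. The function names and the toy data are hypothetical, and the real pipeline may split the data differently.

```python
import random
from collections import Counter


def threshold_interactions(pairs, min_count=5):
    """Drop duplicate (user, item) pairs, then repeatedly remove users and
    items with fewer than min_count interactions until nothing changes."""
    kept = set(pairs)
    while True:
        user_counts = Counter(user for user, _ in kept)
        item_counts = Counter(item for _, item in kept)
        filtered = {
            (user, item)
            for user, item in kept
            if user_counts[user] >= min_count and item_counts[item] >= min_count
        }
        if filtered == kept:
            return kept
        kept = filtered


def k_fold_splits(pairs, n_folds=5, seed=0):
    """Yield (train, test) lists; each interaction is in exactly one test fold."""
    shuffled = sorted(pairs)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        yield train, test


# Toy data: 5 users who each interacted with the same 5 items, one duplicate
# pair, and one sparse user who is filtered out.
raw = [(f"u{u}", f"i{i}") for u in range(5) for i in range(5)]
raw += [("u0", "i0"), ("u_sparse", "i0")]

kept = threshold_interactions(raw)
splits = list(k_fold_splits(kept))
print(len(kept), len(splits))  # 25 5
```

Repeating the filter matters because removing a sparse user can push an item below the threshold, and vice versa.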

First, offline experiments are performed to assess the precision of the hybrid recommendation system, using historical choice event data from March. Different versions of the hybrid recommendation system, each using different content features, are tested to assess the accuracy of their predictions. 

<br><font color='red'> Add the size of the content vectors </font>

For now the precision@k of four different versions has been tested: 
<table class="tg">
   <tr>
    <th class="tg-s268">Version</th>
    <th class="tg-s268">Implementation</th>
  </tr>
  <tr>
    <td class="tg-s268">1</td>
    <td class="tg-s268">MF</td>
  </tr>
  <tr>
    <td class="tg-s268">2</td>
    <td class="tg-s268">MF + 'broadcaster'</td>
  </tr>
  <tr>
    <td class="tg-s268">3</td>
    <td class="tg-s268">MF + 'title'</td>
  </tr>
  <tr>
    <td class="tg-s268">4</td>
    <td class="tg-s268">MF + 'broadcaster' + 'title'</td>
  </tr>
</table>

## Result

<font color='red'>Look at writing</font><br>
For now, each version was run on seven days of event data and evaluated on precision@k with k=5. The results are shown below.

<table>
  <tr>
    <th rowspan="2">Version</th>
    <th rowspan="2">Implementation</th>
    <th rowspan="2">Precision @k</th>
    <th rowspan="2">Standard deviation</th>
    <th colspan="5">Hyperparameters</th>
  </tr>
  <tr>
    <td>Epochs</td>
    <td>Learning rate</td>
    <td>Number of components</td>
    <td>Alpha</td>
    <td>Scaling</td>
  </tr>
  <tr>
    <td>1</td>
    <td>MF</td>
    <td>0,17</td>
    <td>0,05</td>
    <td>49</td>
    <td>0,02</td>
    <td>80</td>
    <td>7,7E-04</td>
    <td>x</td>
  </tr>
  <tr>
    <td>2</td>
    <td>MF + 'broadcaster'</td>
    <td>0,15</td>
    <td>0,04</td>
    <td>2</td>
    <td>0,04</td>
    <td>42</td>
    <td>1,9E-05</td>
    <td>0,27</td>
  </tr>
  <tr>
    <td>3</td>
    <td>MF + 'title'</td>
    <td>0,15</td>
    <td>0,05</td>
    <td>6</td>
    <td>0,01</td>
    <td>142</td>
    <td>4,61E-04</td>
    <td>0,70</td>
  </tr>
  <tr>
    <td>4</td>
    <td>MF + 'broadcaster' + 'title'</td>
    <td>0,14</td>
    <td>0,04</td>
    <td>7</td>
    <td>0,01</td>
    <td>135</td>
    <td>1,37E-04</td>
    <td>0,00</td>
  </tr>
</table>

The results indicate that the highest precision was achieved by the model that used no content features and relied on matrix factorization alone. The lowest precision was obtained when both content features, broadcaster and title, were incorporated into the hybrid recommendation model. Both hybrid models with a single content feature had a precision of around 0,15. All versions have about the same standard deviation. This suggests that incorporating content features into the model did not improve the precision of the recommendations. Since the differences between the versions are smaller than one standard deviation, they should be interpreted with caution. 

One possible explanation for why incorporating content features into the matrix factorization algorithm did not help is bias in the choice event data that was fed into the model. The training data comes from the current recommendation system of NPO Start, which uses collaborative filtering and therefore tends to show items that are frequently watched together. Content features are said to favour items that have fewer ratings and are less well known. In this case that lowers the precision, since these more diverse items were never shown to the user in the first place and so could not have been chosen. 

The results of different pre-processing steps on the title are shown below. 
<table>
  <tr>
    <th rowspan="2">Version</th>
    <th rowspan="2">Content features</th>
    <th rowspan="2">Precision @k</th>
    <th rowspan="2">Standard deviation</th>
    <th colspan="5">Hyperparameters</th>
  </tr>
  <tr>
    <td>Epochs</td>
    <td>Learning rate</td>
    <td>Number of components</td>
    <td>Alpha</td>
    <td>Scaling</td>
  </tr>
  <tr>
    <td>3</td>
    <td>title</td>
    <td>0,15</td>
    <td>0,05</td>
    <td>6</td>
    <td>0,01</td>
    <td>142</td>
    <td>4,61E-04</td>
    <td>0,70</td>
  </tr>
  <tr>
    <td>3a</td>
    <td>title words</td>
    <td>0,14</td>
    <td>0,05</td>
    <td>2</td>
    <td>0,02</td>
    <td>21</td>
    <td>7,93</td>
    <td>0,05</td>
  </tr>
  <tr>
    <td>3b</td>
    <td>title words without stopwords</td>
    <td>0,15</td>
    <td>0,05</td>
    <td>3</td>
    <td>0,01</td>
    <td>25</td>
    <td>8,35E-05</td>
    <td>0,08</td>
  </tr>
</table>

## Further steps
<font color='red'>Look at writing</font><br>
- Evaluation
    - Add confidence interval
    - Add the MAP metric in the evaluation
- Data
    - Include event data from more days
    - Obtain results using offer data in addition to choice data
- Method
    - Include more content features (also use different text processing techniques)