# RQ3
- Main research question: _'Can a hybrid recommendation system using metadata perform better than the current recommendation system of NPO Start which uses collaborative filtering?'_
- RQ3: _'Which metadata features improve the recommendation the most?'_

## Context 
NPO Start is a service that offers users the ability to watch video content on demand. This video content is displayed to users in so-called "ribbons" or rows that have a certain theme, like 'Populair', 'Nieuw' and 'Aanbevolen voor jou'. Each ribbon consists of a ranked list of items.

Users of the service who have an account receive several personalized ribbons containing items that are recommended specifically for them. These recommendations are shown on the front page of the service in the ribbon 'Aanbevolen voor jou'. This ribbon uses the history of user interactions with items to perform collaborative filtering. The interactions are grouped at the series level, and the system looks for pairs of series that are frequently watched together (that is, that often coincide in viewing histories) and that involve a series from the user's own history. The top 100 of these pairs, ordered by frequency, are then extracted, which results in a personalized ribbon of items for a single user.
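
The co-occurrence approach described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the NPO implementation: histories are assumed to be already grouped per series, a pair counts once per user who watched both series, and unseen series are ranked by the frequency of the pair they form with a series in the user's history. The series names are hypothetical toy data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(histories):
    """Count how often each unordered pair of series appears in the same user history."""
    counts = Counter()
    for series in histories.values():
        # sorted() gives each unordered pair a single canonical key
        for a, b in combinations(sorted(set(series)), 2):
            counts[(a, b)] += 1
    return counts


def recommend(user, histories, counts, top_n=100):
    """Rank unseen series by the frequency of the pairs they form with the user's history."""
    seen = set(histories[user])
    candidate_pairs = []
    for (a, b), freq in counts.items():
        # keep pairs in which exactly one series has already been watched by the user
        if a in seen and b not in seen:
            candidate_pairs.append((freq, b))
        elif b in seen and a not in seen:
            candidate_pairs.append((freq, a))
    # highest frequency first; ties broken alphabetically for a deterministic order
    candidate_pairs.sort(key=lambda pair: (-pair[0], pair[1]))
    ribbon = []
    for _, series in candidate_pairs[:top_n]:
        if series not in ribbon:
            ribbon.append(series)
    return ribbon


# Hypothetical toy histories, already grouped per series
histories = {
    "u1": ["journaal", "nieuwsuur", "boer zoekt vrouw"],
    "u2": ["journaal", "nieuwsuur"],
    "u3": ["journaal", "boer zoekt vrouw", "wie is de mol"],
    "u4": ["journaal"],
}
counts = cooccurrence_counts(histories)
print(recommend("u4", histories, counts))
```

With `top_n=100` this mirrors the top-100 pairs mentioned above; because one unseen series can appear in several pairs, the resulting ribbon can contain fewer than 100 distinct items.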

However, there is a lot of metadata available about the offered content which is unused by the current recommendation system. In this thesis, the metadata of broadcasts will be utilised in a hybrid recommendation system to determine if it can improve the performance of the current video recommendation system.

## Operationalization
#### Evaluation
To assess the quality of recommendations, three metrics are evaluated: precision@k (precision at k), AP (average precision) and CTR (click-through rate).

* __Precision@k__
Precision@k is a metric that evaluates the proportion of the top-k recommended items that are relevant to the user. A relevant item is an item that the user chose when it was offered in a ribbon; relevant items among the recommendations are counted as true positives (TP). Precision@k is then the number of true positives in the top k divided by k, the number of recommended items considered.
\begin{equation}
P@k = \frac{|\{i \in TP\mid i\mbox{ ranked in top k}\}|}{k}
\end{equation}
The precision@k over all users is given by the mean precision: the precision@k is calculated for each recommendation list, and the mean is taken over all these scores.

* __AP__
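AP evaluates both the quality and the rank of the recommended items. This metric is lower when relevant items (TPs) do not appear at the top of the item list. It takes the precision at each rank k where a relevant item appears and divides the sum of these precisions by the total number of TPs. Here m is the length of the recommendation list, and rel(k) equals 1 if the item at rank k is relevant and 0 otherwise.
\begin{equation}
AP = \frac{1}{|TP|}\sum_{k=1}^m\frac{|\{i \in TP\mid i\mbox{ ranked in top k}\}|}{k}\cdot \mathrm{rel}(k)
\end{equation}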
The AP over all users is given by the MAP (mean average precision): the AP is calculated for each recommendation list, and the mean is taken over all these scores. This metric shows how well the model ranks its recommendations. In the equation below, R is the number of recommendation lists in the set and AP(r) is the average precision of a given recommendation list r.
\begin{equation}
MAP = \frac{\sum_{r=1}^R AP(r)}{R}
\end{equation}

* __CTR__ <br>
The CTR measures how often an item in the recommended ribbon is clicked: the number of click-throughs divided by the number of times the ribbon was offered, as shown in the equation below.
\begin{equation}
CTR = \frac{\textit{number of click-throughs}}{\textit{number of offers}}
\end{equation}
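
The ranking formulas above can be sketched as plain Python functions. This is a minimal sketch assuming that each recommendation list is a ranked list of item ids and that the relevant items are the set of items the user chose; following the AP definition above, AP is divided by the number of true positives in the list, and it is taken to be 0 when there are none (an assumption).

```python
def precision_at_k(ranked, relevant, k):
    """P@k: share of the top-k recommended items that the user chose."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision(ranked, relevant):
    """AP: mean of P@k over the ranks k that hold a relevant item (rel(k) = 1)."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # P@k at a relevant rank
    return total / hits if hits else 0.0  # divide by |TP|


def mean_over_lists(metric, recommendations, **kwargs):
    """Average a per-list metric over R recommendation lists (mean P@k or MAP)."""
    scores = [metric(ranked, relevant, **kwargs) for ranked, relevant in recommendations]
    return sum(scores) / len(scores)


recommendations = [
    (["a", "b", "c", "d", "e"], {"b", "e"}),  # relevant items at ranks 2 and 5
    (["x", "y", "z", "v", "w"], {"x"}),       # relevant item at rank 1
]
print(mean_over_lists(precision_at_k, recommendations, k=5))  # mean P@5
print(mean_over_lists(average_precision, recommendations))    # MAP
```

In the first list the relevant items sit at ranks 2 and 5, so its AP is (1/2 + 2/5) / 2 = 0.45, while moving them to ranks 1 and 2 would give an AP of 1; this is the rank sensitivity that P@5 (0.4 in both cases) does not capture.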


For all metrics, higher values are better. The version with the highest mean precision and CTR is the most successful at recommending items that users are interested in, and the version with the highest MAP is the most successful at ranking the recommendations in a personalized manner.

#### Data
The input data for the hybrid recommendation system consists of user interaction data and content features. 

The user interaction data consists of NPO event data of two types: offers and choices. Offers describe the ranked items that were shown to a user in the 'Aanbevolen voor jou' ribbon. Choices describe which items users chose from an offered ribbon, together with the rank at which each chosen item was shown.
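
As an illustration of how the CTR defined above could be computed from these two event types, the sketch below counts offers and choices per ribbon. The event field names (`type`, `ribbon`, `user`, `item`, `rank`) are hypothetical, and it assumes that every offer event corresponds to one display of a ribbon and every choice event to one click-through.

```python
from collections import Counter


def click_through_rates(events):
    """CTR per ribbon: number of choice events divided by number of offer events."""
    offers = Counter(e["ribbon"] for e in events if e["type"] == "offer")
    choices = Counter(e["ribbon"] for e in events if e["type"] == "choice")
    # Counter returns 0 for ribbons that were offered but never chosen from
    return {ribbon: choices[ribbon] / n for ribbon, n in offers.items()}


# Hypothetical event records
events = [
    {"type": "offer", "ribbon": "Aanbevolen voor jou", "user": "u1"},
    {"type": "offer", "ribbon": "Aanbevolen voor jou", "user": "u2"},
    {"type": "offer", "ribbon": "Aanbevolen voor jou", "user": "u3"},
    {"type": "offer", "ribbon": "Aanbevolen voor jou", "user": "u4"},
    {"type": "choice", "ribbon": "Aanbevolen voor jou", "user": "u2",
     "item": "AT_2033328", "rank": 3},
    {"type": "offer", "ribbon": "Populair", "user": "u1"},
]
print(click_through_rates(events))
```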

The content features consist of the metadata of each item, which is provided by the Publieke Omroep Media Service (POMS) of the NPO.
Nine content features were chosen for the hybrid recommendation system, namely age rating, broadcaster, credits, description, genres, mid, series reference, subtitles and title. These content features are evaluated to determine which feature, or combination of features, improves the recommendation system the most. They are used for the content-based side of the recommendation system by transforming them into vectors. The table below shows example broadcaster and title information for a series, which results in the encoded vector {'title_bestezangers': 1, 'broadcaster_avtr': 1, 'title_debestezangersvannederland': 1, 'broadcaster_tros': 1}.

<table>
  <tr>
    <th>mid</th>
    <th>feature</th>
    <th>value</th>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>broadcaster</td>
    <td>TROS</td>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>broadcaster</td>
    <td>AVTR</td>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>title</td>
    <td>bestezangers</td>
  </tr>
  <tr>
    <td>AT_2033328</td>
    <td>title</td>
    <td>debestezangersvannederland</td>
  </tr>
</table>
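
The encoding step can be sketched as follows: each (mid, feature, value) row becomes a key of the form `feature_value`, which reproduces the encoded vector given above for the example table. The normalisation rule (lower-casing and keeping only ASCII letters and digits) is an assumption inferred from the example; the actual pre-processing may differ. The number of distinct keys over all items gives the length of the encoded content vector.

```python
import re


def normalise(value):
    """Lower-case a metadata value and keep only ASCII letters and digits (assumed rule)."""
    return re.sub(r"[^0-9a-z]", "", value.lower())


def encode_features(rows):
    """Turn (mid, feature, value) rows into one sparse one-hot dict per mid."""
    encoded = {}
    for mid, feature, value in rows:
        encoded.setdefault(mid, {})[f"{feature}_{normalise(value)}"] = 1
    return encoded


def vocabulary(encoded):
    """All distinct feature keys; its size is the encoded vector length."""
    return sorted({key for features in encoded.values() for key in features})


rows = [
    ("AT_2033328", "broadcaster", "TROS"),
    ("AT_2033328", "broadcaster", "AVTR"),
    ("AT_2033328", "title", "bestezangers"),
    ("AT_2033328", "title", "debestezangersvannederland"),
]
encoded = encode_features(rows)
print(encoded["AT_2033328"])
print(len(vocabulary(encoded)))
```

LightFM's `lightfm.data.Dataset.build_item_features` accepts pairs of the form (item id, {feature name: weight}), so, once the feature names have been registered with `Dataset.fit`, dictionaries like these could be passed to it more or less directly.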

#### Experimental setup
A hybrid recommendation system was set up using the Python library LightFM, which implements a hybrid matrix factorisation model that represents users and items as linear combinations of the latent factors of their content features. A weighted approximate-rank pairwise (WARP) loss was used, which trains the model to rank items. The model learns user and item representations from the input data using this latent representation approach. Because users and items are represented as linear combinations of their content features, the model can also compute recommendations for new items and users.
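
To make the "linear combination of latent factors" concrete, the sketch below computes scores following the LightFM model definition: the latent vector and bias of a user (or item) are the weighted sums of the latent vectors and biases of its features, and the score is the dot product of the user and item vectors plus both biases. It uses NumPy only; the embeddings are random stand-ins for the values that WARP training would learn, and the feature matrices are hypothetical toy data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_components = 4  # the "number of components" hyperparameter

# Hypothetical item features: columns 0-2 are per-item identity features,
# columns 3-4 are two broadcaster features.
item_features = np.array([
    [1, 0, 0, 1, 0],  # item 0, broadcaster A
    [0, 1, 0, 1, 0],  # item 1, broadcaster A
    [0, 0, 1, 0, 1],  # item 2, broadcaster B
], dtype=float)
user_features = np.eye(2)  # two users, described only by identity features

# One latent vector and one bias per feature (random here, learned in practice).
item_embeddings = rng.normal(scale=0.1, size=(item_features.shape[1], n_components))
user_embeddings = rng.normal(scale=0.1, size=(user_features.shape[1], n_components))
item_biases = rng.normal(scale=0.1, size=item_features.shape[1])
user_biases = rng.normal(scale=0.1, size=user_features.shape[1])


def predict(user_features, item_features):
    """Score matrix (users x items): q_u . p_i + b_u + b_i."""
    q = user_features @ user_embeddings  # user vector = sum of its features' vectors
    p = item_features @ item_embeddings  # item vector = sum of its features' vectors
    b_u = user_features @ user_biases
    b_i = item_features @ item_biases
    return q @ p.T + b_u[:, None] + b_i[None, :]


scores = predict(user_features, item_features)
ranking_user0 = np.argsort(-scores[0])  # item indices for user 0, best first

# A new item without interaction history, described only by broadcaster B.
new_item = np.array([[0, 0, 0, 0, 1]], dtype=float)
print(scores.shape, ranking_user0, predict(user_features, new_item).ravel())
```

With only identity features this reduces to plain matrix factorization, as in the baseline version. The new item in the example has no identity feature with a learned vector, so its representation comes entirely from its broadcaster feature; this is how content features let the model score items that have no interactions yet.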

The steps taken in implementing the LightFM model are:
1. Loading and cleaning the user interaction data
    - removing duplicates and thresholding (making sure users have liked at least 5 items, and items are liked by at least 5 users)
2. Preparing content feature vectors
    - one-hot encoding the content features
3. Transforming the user interaction data into an interactions matrix
4. Preparing a 5-fold cross-validation
5. Executing the model
6. Optimizing the hyperparameters
    - using the library scikit-optimize on the parameters epochs, learning rate, number of components and alpha
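
Steps 1, 3 and 4 can be sketched in plain Python as below. This is an illustrative sketch, not the code used in this thesis: it assumes interactions arrive as (user, item) pairs, it applies the threshold repeatedly (removing an item can push a user below the threshold and vice versa), and it splits individual interactions, not users, into folds. The demo uses a threshold of 2 to keep the hypothetical toy data small.

```python
import random
from collections import Counter


def clean(interactions, min_count=5):
    """Remove duplicate (user, item) pairs, then repeatedly drop users and items
    with fewer than min_count interactions until both thresholds hold."""
    pairs = set(interactions)
    while True:
        user_counts = Counter(u for u, _ in pairs)
        item_counts = Counter(i for _, i in pairs)
        kept = {(u, i) for u, i in pairs
                if user_counts[u] >= min_count and item_counts[i] >= min_count}
        if kept == pairs:
            return pairs
        pairs = kept


def to_coordinates(pairs):
    """Map ids to row/column indices; the result lists the non-zero cells of a
    binary users x items interactions matrix."""
    users = {u: n for n, u in enumerate(sorted({u for u, _ in pairs}))}
    items = {i: n for n, i in enumerate(sorted({i for _, i in pairs}))}
    return users, items, sorted((users[u], items[i]) for u, i in pairs)


def k_fold_splits(coords, k=5, seed=42):
    """Shuffle the interactions once and yield (train, test) lists for k folds."""
    shuffled = list(coords)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[n::k] for n in range(k)]
    for n in range(k):
        train = [c for m, fold in enumerate(folds) if m != n for c in fold]
        yield train, folds[n]


raw = [("u1", "a"), ("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "c")]
pairs = clean(raw, min_count=2)  # u3 and item c fall below the threshold
users, items, coords = to_coordinates(pairs)
for train, test in k_fold_splits(coords, k=2):
    print(len(train), len(test))
```

LightFM expects the interactions as a SciPy sparse matrix; the (row, column) coordinates returned by `to_coordinates` could be turned into one with `scipy.sparse.coo_matrix((values, (rows, cols)), shape=(n_users, n_items))`.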

The performance of this model is assessed in an offline setting on historical user interaction data from March. Different versions of the model that incorporate different content features are tested to assess the precision of their recommendations. So far, the precision@k of six versions has been tested: a baseline model that only employs matrix factorization (MF), and five others that employ either a single content feature, with or without pre-processing, or a combination of content features. The versions and the length of the resulting encoded content vector are shown below.
<table>
  <tr>
    <th>Version</th>
    <th>Content features</th>
    <th>Encoded vector length</th>
  </tr>
  <tr>
    <td>1</td>
    <td>none</td>
    <td>x</td>
  </tr>
  <tr>
    <td>2</td>
    <td>broadcaster</td>
    <td>28</td>
  </tr>
  <tr>
    <td>3</td>
    <td>title</td>
    <td>1244</td>
  </tr>
  <tr>
    <td>3a</td>
    <td>title words</td>
    <td>1802</td>
  </tr>
  <tr>
    <td>3b</td>
    <td>title words without stopwords</td>
    <td>1699</td>
  </tr>
  <tr>
    <td>4</td>
    <td>broadcaster + title</td>
    <td>1272</td>
  </tr>
</table>
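
The three title variants in the table can be illustrated with the sketch below. The raw title "De Beste Zangers van Nederland" is assumed from the encoded example above, the small stopword set is hypothetical (the stopword list used for version 3b is not specified here), and the tokenisation rule (lower-case, split on anything that is not an ASCII letter or digit) is an assumption.

```python
import re

# Hypothetical, deliberately tiny Dutch stopword list; a real run would use a full list.
STOPWORDS = {"de", "het", "een", "en", "van", "in", "op"}


def title_tokens(title, split_words=False, drop_stopwords=False):
    """Encode a title as feature tokens in the style of versions 3, 3a and 3b."""
    words = re.findall(r"[0-9a-z]+", title.lower())
    if not split_words:
        return {"title_" + "".join(words)}  # version 3: the whole title as one token
    if drop_stopwords:
        words = [w for w in words if w not in STOPWORDS]  # version 3b
    return {"title_" + w for w in words}  # version 3a (or 3b after filtering)


title = "De Beste Zangers van Nederland"
print(title_tokens(title))
print(sorted(title_tokens(title, split_words=True)))
print(sorted(title_tokens(title, split_words=True, drop_stopwords=True)))
```

Because every key carries its feature prefix (`title_`, `broadcaster_`), the vocabularies of different features never overlap, so the vector length of a combination is the sum of the individual lengths, as in version 4 (28 + 1244 = 1272).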

## Result

Each version described above has been executed on user interaction data from March 1-7. The mean precision@k (k=5) has been evaluated for each version; the results, together with the standard deviation and the optimized hyperparameters, are shown below.

<table>
  <tr>
    <th rowspan="2">Version</th>
    <th rowspan="2">Implementation</th>
    <th rowspan="2">Precision @k</th>
    <th rowspan="2">Standard deviation</th>
    <th colspan="5">Hyperparameters</th>
  </tr>
  <tr>
    <td>Epochs</td>
    <td>Learning rate</td>
    <td>Number of components</td>
    <td>Alpha</td>
    <td>Scaling</td>
  </tr>
  <tr>
    <td>1</td>
    <td>MF</td>
    <td>0,17</td>
    <td>0,05</td>
    <td>49</td>
    <td>0,02</td>
    <td>80</td>
    <td>7,7E-04</td>
    <td>x</td>
  </tr>
  <tr>
    <td>2</td>
    <td>MF + 'broadcaster'</td>
    <td>0,15</td>
    <td>0,04</td>
    <td>2</td>
    <td>0,04</td>
    <td>42</td>
    <td>1,9E-05</td>
    <td>0,27</td>
  </tr>
  <tr>
    <td>3</td>
    <td>MF + 'title'</td>
    <td>0,15</td>
    <td>0,05</td>
    <td>6</td>
    <td>0,01</td>
    <td>142</td>
    <td>4,61E-04</td>
    <td>0,70</td>
  </tr>
  <tr>
    <td>4</td>
    <td>MF + 'broadcaster' + 'title'</td>
    <td>0,14</td>
    <td>0,04</td>
    <td>7</td>
    <td>0,01</td>
    <td>135</td>
    <td>1,37E-04</td>
    <td>0,00</td>
  </tr>
</table>

The results indicate that the highest precision was achieved by the model that used no content features, i.e. the one that only used matrix factorization. The lowest precision was obtained when both the broadcaster and title content features were incorporated into the hybrid recommendation model. Both hybrid models with a single content feature had a precision of around 0,15, and all versions have about the same standard deviation. Note that the differences between the versions (at most 0,03) are smaller than their standard deviations (0,04 to 0,05). This suggests that incorporating content features into the model did not improve the precision of the recommendations in this offline setting.

One possible explanation for the lower precision of the models with content features is a drawback of offline experiments. Offline experiments assume that members would have behaved the same, e.g. played the same videos, if the new model being evaluated had been used to generate the recommendations (Gomez, 2015). In the offline data, users could only choose items that the current NPO recommendation system actually offered, so a recommendation from a new model that was never offered cannot count as relevant, even if the user might have chosen it. New models whose recommendations differ from those of the current system are therefore unlikely to score higher than it. Since the user interaction data is biased towards the current recommendation system, this may lower the precision measured in an offline environment.

The results of the different pre-processing steps applied to the title are shown below.
<table>
  <tr>
    <th rowspan="2">Version</th>
    <th rowspan="2">Content features</th>
    <th rowspan="2">Precision @k</th>
    <th rowspan="2">Standard deviation</th>
    <th colspan="5">Hyperparameters</th>
  </tr>
  <tr>
    <td>Epochs</td>
    <td>Learning rate</td>
    <td>Number of components</td>
    <td>Alpha</td>
    <td>Scaling</td>
  </tr>
  <tr>
    <td>3</td>
    <td>title</td>
    <td>0,15</td>
    <td>0,05</td>
    <td>6</td>
    <td>0,01</td>
    <td>142</td>
    <td>4,61E-04</td>
    <td>0,70</td>
  </tr>
  <tr>
    <td>3a</td>
    <td>title words</td>
    <td>0,14</td>
    <td>0,05</td>
    <td>2</td>
    <td>0,02</td>
    <td>21</td>
    <td>7,93</td>
    <td>0,05</td>
  </tr>
  <tr>
    <td>3b</td>
    <td>title words without stopwords</td>
    <td>0,15</td>
    <td>0,05</td>
    <td>3</td>
    <td>0,01</td>
    <td>25</td>
    <td>8,35E-05</td>
    <td>0,08</td>
  </tr>
</table>

The results indicate that there is little difference in precision between the versions; however, the optimized hyperparameters differ more between the pre-processed titles and the unprocessed titles.

## Further steps
- Data
    - Include event data from more days (21 days)
    - Get results using not only choice data but also offer data
- Method
    - Include more content features 
    - Perform different text processing techniques (df, tf-idf, stemming, embedding)
- Results
    - Investigate the produced lists.
- Evaluation
    - Add the MAP metric in the evaluation
    - Investigate the distribution of P@k's and standard deviation per user (confidence interval)
    - Investigate if overfitting takes place
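
For the planned analysis of the per-user P@k distribution, one simple option is a normal-approximation 95% confidence interval around the mean, sketched below with the standard library. The per-user scores are made-up illustration values, not results; because P@k only takes values in steps of 1/k, the normal approximation is rough for small samples, and a bootstrap interval would be an alternative.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Mean of per-user scores with a normal-approximation confidence interval
    (z = 1.96 for roughly 95%)."""
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, (mean - half_width, mean + half_width)


per_user_p_at_5 = [0.2, 0.0, 0.4, 0.2, 0.2]  # illustrative values, not results
mean, (low, high) = mean_with_ci(per_user_p_at_5)
print(f"P@5 = {mean:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```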