# Recommender Systems Using Content-Based Method

> *Recommender Systems*  
> *MSc in Data Science, Department of Informatics*  
> *Athens University of Economics and Business*

---

Find a dataset that can be used to inform a ***content-based*** recommender system.

Build a Python notebook that:

- Loads the dataset
- Creates a content-based recommender system
- Uses quantitative metrics to evaluate the recommendations of the system


## *Table of Contents*

- [*1. Introduction*](#introduction)
    - [*1.1. Libraries*](#libraries)
    - [*1.2. Data*](#data)
    - [*1.3. Data Preprocessing*](#data_preprocessing)
- [*2. Recommendations Using Content-Based Method*](#content_based_method)
    - [*2.1. Load Preprocessed Data*](#load_preprocessed_data)
    - [*2.2. Recommendation Example*](#recommendation_example)
    - [*2.3. Generate Fake Users*](#generate_fake_users)
    - [*2.4. Make Recommendations*](#content_based_recommendations)
    - [*2.5. Evaluate Recommendations Using Ranking Based Methods*](#ranking_based_methods)
        - [*2.5.1. nDCG*](#ndcg)
        - [*2.5.2. Mean Reciprocal Rank*](#mrr)
        - [*2.5.3. Average Precision*](#ap)

---

## Introduction <a class='anchor' id='introduction'></a>

### *Libraries* <a class='anchor' id='libraries'></a>

In [18]:
import pandas as pd
import numpy as np

import time

from functions.data_preprocessing import keep_and_rename_desired_columns
from functions.data_preprocessing import preprocess_data

from functions.content_based_recommendations import load_and_prepare_data
from functions.content_based_recommendations import compute_similarity
from functions.content_based_recommendations import recommend_movies
from functions.content_based_recommendations import generate_fake_users
from functions.content_based_recommendations import exploit_simulate
from functions.content_based_recommendations import evaluate_recommendations_using_nDCG
from functions.content_based_recommendations import evaluate_recommendations_using_MRR
from functions.content_based_recommendations import evaluate_recommendations_using_AP

In [2]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

### *Data* <a class='anchor' id='data'></a>

- *The data contains movies and TV shows hosted on the Netflix platform*
- *There are 29 attributes for each movie and TV show*
- *We will focus on only 10 of them, which are the following:*
    - *Title*
    - *Genre*
    - *Tags*
    - *Runtime*
    - *Director*
    - *Actors*
    - *Rating*
    - *Release Year*
    - *Production House*
    - *Summary*

##### *Read the data*

In [22]:
# define the filepath
filepath = './data/Netflix Dataset Latest 2021.xlsx'

# read the data
df = pd.read_excel(filepath, engine='openpyxl')

# shape
print(f'df.shape: {df.shape}')

# preview
df.head(3)

df.shape: (9425, 29)


Unnamed: 0,Title,Genre,Tags,Languages,Series or Movie,Hidden Gem Score,Country Availability,Runtime,Director,Writer,Actors,View Rating,IMDb Score,Rotten Tomatoes Score,Metacritic Score,Awards Received,Awards Nominated For,Boxoffice,Release Date,Netflix Release Date,Production House,Netflix Link,IMDb Link,Summary,IMDb Votes,Image,Poster,TMDb Trailer,Trailer Site
0,Lets Fight Ghost,"Crime, Drama, Fantasy, Horror, Romance","Comedy Programmes,Romantic TV Comedies,Horror ...","Swedish, Spanish",Series,4.3,Thailand,< 30 minutes,Tomas Alfredson,John Ajvide Lindqvist,"Lina Leandersson, Kåre Hedebrant, Per Ragnar, ...",R,7.9,98.0,82.0,74.0,57.0,2122065.0,2008-12-12,2021-03-04,"Canal+, Sandrew Metronome",https://www.netflix.com/watch/81415947,https://www.imdb.com/title/tt1139797,A med student with a supernatural gift tries t...,205926.0,https://occ-0-4708-64.1.nflxso.net/dnm/api/v6/...,https://m.media-amazon.com/images/M/MV5BOWM4NT...,https://www.youtube.com/watch?v=LqB6XJix-dM,YouTube
1,HOW TO BUILD A GIRL,Comedy,"Dramas,Comedies,Films Based on Books,British",English,Movie,7.0,Canada,1-2 hour,Coky Giedroyc,Caitlin Moran,"Cleo, Paddy Considine, Beanie Feldstein, Dónal...",R,5.8,79.0,69.0,1.0,,70632.0,2020-05-08,2021-03-04,"Film 4, Monumental Pictures, Lionsgate",https://www.netflix.com/watch/81041267,https://www.imdb.com/title/tt4193072,"When nerdy Johanna moves to London, things get...",2838.0,https://occ-0-1081-999.1.nflxso.net/dnm/api/v6...,https://m.media-amazon.com/images/M/MV5BZGUyN2...,https://www.youtube.com/watch?v=eIbcxPy4okQ,YouTube
2,The Con-Heartist,"Comedy, Romance","Romantic Comedies,Comedies,Romantic Films,Thai...",Thai,Movie,8.6,Thailand,> 2 hrs,Mez Tharatorn,"Pattaranad Bhiboonsawade, Mez Tharatorn, Thods...","Kathaleeya McIntosh, Nadech Kugimiya, Pimchano...",,7.4,,,,,,2020-12-03,2021-03-03,,https://www.netflix.com/watch/81306155,https://www.imdb.com/title/tt13393728,After her ex-boyfriend cons her out of a large...,131.0,https://occ-0-2188-64.1.nflxso.net/dnm/api/v6/...,https://m.media-amazon.com/images/M/MV5BODAzOG...,https://www.youtube.com/watch?v=md3CmFLGK6Y,YouTube


##### *Keep and rename the desired columns*

In [4]:
# execute function
df = keep_and_rename_desired_columns(df)

# shape
print(f'df.shape: {df.shape}')

# preview
df.head(3)

df.shape: (9425, 10)


Unnamed: 0,title,genre,tags,runtime,director,actors,rating,ryear,prod_house,summary
0,Lets Fight Ghost,"Crime, Drama, Fantasy, Horror, Romance","Comedy Programmes,Romantic TV Comedies,Horror ...",< 30 minutes,Tomas Alfredson,"Lina Leandersson, Kåre Hedebrant, Per Ragnar, ...",7.9,2008-12-12,"Canal+, Sandrew Metronome",A med student with a supernatural gift tries t...
1,HOW TO BUILD A GIRL,Comedy,"Dramas,Comedies,Films Based on Books,British",1-2 hour,Coky Giedroyc,"Cleo, Paddy Considine, Beanie Feldstein, Dónal...",5.8,2020-05-08,"Film 4, Monumental Pictures, Lionsgate","When nerdy Johanna moves to London, things get..."
2,The Con-Heartist,"Comedy, Romance","Romantic Comedies,Comedies,Romantic Films,Thai...",> 2 hrs,Mez Tharatorn,"Kathaleeya McIntosh, Nadech Kugimiya, Pimchano...",7.4,2020-12-03,,After her ex-boyfriend cons her out of a large...


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9425 entries, 0 to 9424
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   title       9425 non-null   object        
 1   genre       9400 non-null   object        
 2   tags        9389 non-null   object        
 3   runtime     9424 non-null   object        
 4   director    7120 non-null   object        
 5   actors      9314 non-null   object        
 6   rating      9417 non-null   float64       
 7   ryear       9217 non-null   datetime64[ns]
 8   prod_house  4393 non-null   object        
 9   summary     9420 non-null   object        
dtypes: datetime64[ns](1), float64(1), object(8)
memory usage: 736.5+ KB


### *Data Preprocessing* <a class='anchor' id='data_preprocessing'></a>

##### *Preprocessing the data to bring it to the appropriate format*

In [6]:
# execute function
df = preprocess_data(df)

# shape
print(f'df.shape: {df.shape}')

# preview
df.head(3)

df.shape: (6945, 12)


Unnamed: 0,index,title,genre,tags,runtime,director,actors,rating,ryear,prod_house,summary,star_actor
0,0,Lets Fight Ghost,"Crime, Drama, Fantasy, Horror, Romance","Comedy Programmes,Romantic TV Comedies,Horror ...",15,Tomas Alfredson,"Lina Leandersson, Kare Hedebrant, Per Ragnar, ...",7.9,2008,Canal+,A med student with a supernatural gift tries t...,Lina Leandersson
1,1,HOW TO BUILD A GIRL,Comedy,"Dramas,Comedies,Films Based on Books,British",90,Coky Giedroyc,"Cleo, Paddy Considine, Beanie Feldstein, Donal...",5.8,2020,Film 4,"When nerdy Johanna moves to London, things get...",Cleo
2,2,The Con-Heartist,"Comedy, Romance","Romantic Comedies,Comedies,Romantic Films,Thai...",150,Mez Tharatorn,"Kathaleeya McIntosh, Nadech Kugimiya, Pimchano...",7.4,2020,Unknown,After her ex-boyfriend cons her out of a large...,Kathaleeya McIntosh


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6945 entries, 0 to 6944
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   index       6945 non-null   int64  
 1   title       6945 non-null   object 
 2   genre       6945 non-null   object 
 3   tags        6945 non-null   object 
 4   runtime     6945 non-null   int64  
 5   director    6945 non-null   object 
 6   actors      6945 non-null   object 
 7   rating      6945 non-null   float64
 8   ryear       6945 non-null   int64  
 9   prod_house  6945 non-null   object 
 10  summary     6945 non-null   object 
 11  star_actor  6945 non-null   object 
dtypes: float64(1), int64(3), object(8)
memory usage: 651.2+ KB


##### *Save the preprocessed data*

In [8]:
df.to_csv('./data/data_preprocessed.csv', index=False)

---

## Recommendations Using Content-Based Method <a class='anchor' id='content_based_method'></a>

### *Load Preprocessed Data* <a class='anchor' id='load_preprocessed_data'></a>

- *In this step, we will load the data that we preprocessed and saved above*
- *We will use the function `load_and_prepare_data`*
- *The function takes a file path as an argument and reads a CSV file containing the preprocessed data*
- *It creates a new class `movie` which defines the attributes of a movie object*
- *The function returns three items:*
    - *A list called `movies` holding each movie object*
    - *A dictionary called `title_index` holding the index of each movie title*
    - *A tensor called `summaries_sim_matrix` holding the cosine similarity between the movie summaries*
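*The idea behind the three returned items can be sketched with a tiny, self-contained example. This is only an illustration under stated assumptions: the helper names (`cosine_similarity`, `build_similarity_matrix`) and the toy summaries are hypothetical, and plain word counts stand in for whatever text representation `load_and_prepare_data` actually uses to build its `torch.Tensor`.*

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similarity_matrix(summaries):
    """Pairwise cosine similarity between bag-of-words summary vectors."""
    vectors = [Counter(s.lower().split()) for s in summaries]
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy rows standing in for the preprocessed CSV.
titles = ['Movie A', 'Movie B', 'Movie C']
summaries = [
    'a chess prodigy fights addiction',
    'a chess champion fights rivals',
    'two friends open a bakery',
]

# Map each title to its row/column position in the similarity matrix.
title_index = {title: i for i, title in enumerate(titles)}
sim_matrix = build_similarity_matrix(summaries)

a, b, c = title_index['Movie A'], title_index['Movie B'], title_index['Movie C']
print(round(sim_matrix[a][b], 2))  # 0.6 (three shared words out of five each)
print(round(sim_matrix[a][c], 2))  # 0.2 (one shared word)
```

*Precomputing the full matrix once is what makes later look-ups cheap: a recommendation only needs to read one row, indexed through `title_index`.*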

##### *Load the data*

In [9]:
# define the filepath
filepath = './data/data_preprocessed.csv'

# start time
st = time.time()

# execute function
movies, title_index, summaries_sim_matrix = load_and_prepare_data(filepath)

# end time
et = time.time()

# print
print('Finished.')
print()
print(f'Elapsed time: {int(et-st)} secs.')

Finished.

Elapsed time: 58 secs.


In [10]:
print(type(movies))
print(type(title_index))
print(type(summaries_sim_matrix))

<class 'list'>
<class 'dict'>
<class 'torch.Tensor'>


### *Recommendation Example* <a class='anchor' id='recommendation_example'></a>

- *In this step, we will run a recommendation example to see a demonstration of how our recommender system works*
- *We will use the function `recommend_movies`*
- *The function takes an input movie title, along with some other parameters*
- *The parameter `weights` contains the weight we want to assign to each factor; each weight can range between 0 and 1*
- *The function returns candidate recommended movies, sorted by their similarity score with the input movie title*
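*A weighted, per-factor score of this kind can be sketched as follows. This is a minimal illustration, not the actual `recommend_movies` implementation: the helpers `jaccard`, `closeness` and `score`, the three factors chosen, and the toy movies are all hypothetical. It only mirrors the shape of the output, in which the total similarity is the sum of the weighted factor contributions.*

```python
def jaccard(a, b):
    """Overlap between two comma-separated label strings, in [0, 1]."""
    set_a = {x.strip() for x in a.split(',')}
    set_b = {x.strip() for x in b.split(',')}
    return len(set_a & set_b) / len(set_a | set_b)


def closeness(x, y, scale):
    """1 when two numbers are equal, falling linearly to 0 at distance `scale`."""
    return max(0.0, 1.0 - abs(x - y) / scale)


def score(movie, other, weights):
    """Return (total similarity, per-factor contributions sorted high to low)."""
    raw = {
        'genre': jaccard(movie['genre'], other['genre']),
        'director': 1.0 if movie['director'] == other['director'] else 0.0,
        'rating': closeness(movie['rating'], other['rating'], scale=10.0),
    }
    # Each factor contributes its similarity multiplied by its weight.
    contributions = {f: round(weights.get(f, 0) * s, 2) for f, s in raw.items()}
    ranked = sorted(((f, c) for f, c in contributions.items() if c > 0),
                    key=lambda fc: fc[1], reverse=True)
    return round(sum(contributions.values()), 2), ranked


# Hypothetical toy movies; the real objects carry all ten factors.
query = {'genre': 'Drama, Sport', 'director': 'X', 'rating': 8.6}
candidate = {'genre': 'Drama, Biography', 'director': 'X', 'rating': 7.6}
example_weights = {'genre': 1, 'director': 1, 'rating': 1}

total, ranked = score(query, candidate, example_weights)
print(total)   # 2.23
print(ranked)  # [('director', 1.0), ('rating', 0.9), ('genre', 0.33)]
```

*Setting a weight to 0 removes that factor from the score, which is a simple way to test how much each factor drives the recommendations.*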

##### *Define the weights for each factor*

In [11]:
weights = {
    'genre':1,
    'tags':1,
    'runtime':1,
    'director':1,
    'actors':1,
    'rating':1,
    'ryear':1,
    'prod_house':1,
    'summary':1,
    'star_actor':1,
}

##### *Run recommendation example*

In [12]:
# movie for which to search for similar movies
input_movie = 'The Queens Gambit' # or try Schindlers List

# start time
st = time.time()

# recommend similar movies
recommendations = recommend_movies(input_movie,
                                   movies,
                                   title_index,
                                   summaries_sim_matrix,
                                   weights)

# end time
et = time.time()

# print
print('Finished.')
print()
print(f'Elapsed time: {int(et-st)} secs.')

Finished.

Elapsed time: 4 secs.


##### *View the top 10 recommended similar movies*

In [13]:
print(f'Movies similar to "{input_movie}"')
print('='*37)
    
for i, mv in enumerate(recommendations):
    
    # display first 10 recommendations
    if i == 11: break
    
    # skip the same movie
    if mv[0] == input_movie: continue
    
    print()
    print(f'Title: {mv[0]}')
    print(f'Similarity: {mv[1][0]}')
    print(f'Factor Contribution: {mv[1][1]}')
    print()

Movies similar to "The Queens Gambit"

Title: Philomena
Similarity: 3.28
Factor Contribution: [('director', 1.0), ('rating', 0.76), ('runtime', 0.56), ('genre', 0.5), ('summary', 0.39), ('ryear', 0.07)]


Title: Gandhi
Similarity: 3.19
Factor Contribution: [('genre', 1.0), ('runtime', 1.0), ('rating', 0.8), ('ryear', 0.23), ('summary', 0.16)]


Title: Schindlers List
Similarity: 3.19
Factor Contribution: [('genre', 1.0), ('runtime', 1.0), ('rating', 0.89), ('summary', 0.18), ('ryear', 0.12)]


Title: A Man for All Seasons
Similarity: 3.18
Factor Contribution: [('genre', 1.0), ('runtime', 1.0), ('rating', 0.77), ('ryear', 0.4), ('summary', 0.01)]


Title: Hidden Figures
Similarity: 3.16
Factor Contribution: [('genre', 1.0), ('runtime', 1.0), ('rating', 0.78), ('summary', 0.27), ('ryear', 0.11)]


Title: Gone with the Wind
Similarity: 3.09
Factor Contribution: [('runtime', 1.0), ('rating', 0.81), ('ryear', 0.66), ('genre', 0.4), ('summary', 0.22)]


Title: A Very English Scandal
Similari

### *Generate Fake Users* <a class='anchor' id='generate_fake_users'></a>

- *In this step, we will generate some fake users and use them to evaluate the performance of our recommender system*
- *We will use the function `generate_fake_users`*
- *The function will generate 10 fake users with 5 random seed movies each*
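*The seed-sampling part of this step can be sketched as below. This is an assumption-laden illustration: `make_fake_users` and the toy titles are hypothetical, and the real `generate_fake_users` also receives the similarity matrix and the factors, so it presumably does more than pick random seeds.*

```python
import random


def make_fake_users(titles, n_users=10, n_seeds=5, seed=0):
    """Assign each fake user a set of distinct, randomly chosen seed movies."""
    rng = random.Random(seed)  # fixed seed so the users are reproducible
    return {f'user_{u + 1}': rng.sample(titles, n_seeds) for u in range(n_users)}


# Hypothetical catalogue of 100 titles.
titles = [f'Movie {i}' for i in range(100)]
fake_users = make_fake_users(titles)

print(len(fake_users))               # 10
print(len(fake_users['user_1']))     # 5
```

*Fixing the random seed keeps the evaluation repeatable, so metric changes reflect changes to the recommender rather than to the sampled users.*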

##### *Define the factors to be taken into account when generating fake users*

In [14]:
factors = [
    'genre',
    'tags',
    'runtime',
    'director',
    'actors',
    'rating',
    'ryear',
    'prod_house',
    'summary',
    'star_actor'
]

##### *Generate fake users*

In [15]:
# start time
st = time.time()

# execute function
generated_users = generate_fake_users(movies,
                                      title_index,
                                      summaries_sim_matrix,
                                      factors)

# end time
et = time.time()

# print
print()
print(f'Elapsed time: {int(et-st)} secs.')

Creating fake user 1...
Creating fake user 2...
Creating fake user 3...
Creating fake user 4...
Creating fake user 5...
Creating fake user 6...
Creating fake user 7...
Creating fake user 8...
Creating fake user 9...
Creating fake user 10...

Fake users have been created successfully!

Elapsed time: 202 secs.


### *Make Recommendations* <a class='anchor' id='content_based_recommendations'></a>

- *In this step we will use the previously generated fake users and make recommendations for them*
- *We will use the function `exploit_simulate`*
- *The function will recommend 10 movies to each fake user*
- *We will also have the chance to see how many of the recommended movies would actually be liked by each user*

##### *Make recommendations to the generated users*

In [16]:
# start time
st = time.time()

# execute function
total_users_dict = exploit_simulate(generated_users,
                                   movies,
                                   title_index,
                                   summaries_sim_matrix,
                                   factors,
                                   num_rec_per_user=10)

# end time
et = time.time()

# print
print()
print(f'Elapsed time: {int(et-st)} secs.')

 1/10 - Oasis: Supersonic [No]
 2/10 - Passion [Yes]
 3/10 - Carlitos Way [Yes]
 4/10 - Scarface [No]
 5/10 - The Untouchables [Yes]
 6/10 - The Shadow [Yes]
 7/10 - The Hunt for Red October [Yes]
 8/10 - Indiana Jones and the Last Crusade [Yes]
 9/10 - Indiana Jones and the Raiders of the Lost Ark [Yes]
10/10 - Indiana Jones and the Temple of Doom [Yes]
User 1: 8/10 liked movies (122 secs.)

 1/10 - Sohni Mahiwal [Yes]
 2/10 - Flower of Evil [Yes]
 3/10 - Thottappan [Yes]
 4/10 - The Pilgrimage to Kevlaar [Yes]
 5/10 - Pororoca [Yes]
 6/10 - ARASHIs Diary -Voyage- [Yes]
 7/10 - Pieces of a Woman [Yes]
 8/10 - Ingeborg Holm [Yes]
 9/10 - Repast [Yes]
10/10 - Yearning [Yes]
User 2: 10/10 liked movies (184 secs.)

 1/10 - Breaking In [No]
 2/10 - Patriot Games [No]
 3/10 - Les Miserables [No]
 4/10 - The Berlin File [Yes]
 5/10 - The City of Violence [No]
 6/10 - Arahan [Yes]
 7/10 - Crying Fist [Yes]
 8/10 - The Champion [Yes]
 9/10 - The Unjust [Yes]
10/10 - Flower of Evil [Yes]
User 3

### *Evaluate Recommendations Using Ranking Based Methods* <a class='anchor' id='ranking_based_methods'></a>

<p style='text-align: justify;'><i>Metrics like <b>Precision</b> or <b>Recall</b> summarize the overall performance of the results we get from the RecSys, but they provide no information on how the items were ordered. A model can have good <b>Precision</b> or <b>Recall</b>, but if the top three items it recommends are not relevant to the user, the recommendation is not very useful. If the user has to scroll down to find relevant items, what is the point of recommendations in the first place? Even without recommendations, users can scroll to look for items they like. <b>Ranking-based evaluation methods</b> show how the suggested items are ordered in terms of their relevance to the user, and so measure the quality of the ranking itself.</i></p>
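*The order-blindness of precision is easy to see in a small sketch. The two relevance lists are hypothetical: they contain the same items in opposite orders, and precision cannot tell them apart.*

```python
def precision(relevance):
    """Fraction of recommended items that are relevant (order is ignored)."""
    return sum(relevance) / len(relevance)


# Hypothetical relevance flags (1 = relevant, 0 = not), listed in ranked order.
relevant_first = [1, 1, 1, 0, 0, 0]
relevant_last = [0, 0, 0, 1, 1, 1]

print(precision(relevant_first))  # 0.5
print(precision(relevant_last))   # 0.5
```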

### *nDCG* <a class='anchor' id='ndcg'></a>

<p style='text-align: justify;'><i>nDCG has three parts. The first is <b>"CG"</b>, which stands for <b>Cumulative Gain</b>. It captures the idea that the most relevant items are more useful than somewhat relevant items, which are in turn more useful than irrelevant items. It sums the relevance scores of the items, hence the term cumulative. Suppose we are asked to score the items based on their relevancy as:</i></p>

- *(Highly liked movies) Most relevant score = 2*
- *(Moderately liked movies) Somewhat relevant score = 1*
- *(Disliked movies) Least relevant score = 0*

*Summing these scores gives the cumulative gain for the given items:*

$$ \mathrm{CG_{p}} = \sum_{i=1}^{p} rel_i $$

| Items Ranking | Relevancy Score |
| :-----------: | --------------- |
| Movie 1 | 1 |
| Movie 3 | 2 |
| Movie 2 | 2 |
| Movie 5 | 0 |
| Movie 4 | 1 |
| **CG =** | **6** |

<p style='text-align: justify;'><i>But CG doesn’t account for the position of the items on the list, so changing an item's position won’t change the CG. This is where the second part of nDCG, the "D", comes into play.</i></p>

<p style='text-align: justify;'><i><b>Discounted Cumulative Gain</b>, <b>DCG</b> for short, penalizes relevant items that appear lower in the list. A relevant item appearing at the end of the list is a sign of a poor recommender system, so that item's contribution should be discounted to reflect this. To do so, we divide the relevance score of each item by the base-2 logarithm of its rank plus one.</i></p>

$$ \mathrm{DCG_{p}} = \sum_{i=1}^{p} \frac{rel_{i}}{\log_{2}(i+1)} $$

| Items Ranking | Relevancy Score |
| :-----------: | --------------- |
| Movie 1 | 1 |
| Movie 3 | 2 |
| Movie 2 | 2 |
| Movie 5 | 0 |
| Movie 4 | 1 |
| **CG =** | **6** |
| **DCG =** | **3.65** |

<p style='text-align: justify;'><i>DCG helps with the ranking, but suppose we are comparing different recommendation lists. The DCG of each list depends on its length and on where the recommender places the items. How should the DCG of a 20-item list with the most relevant item at the 10th position be compared with the DCG of an 11-item list with a somewhat relevant item at the 10th position? To make such lists comparable, the third part of nDCG, the "n", comes into play.</i></p>

<p style='text-align: justify;'><i><b>nDCG</b> normalizes the DCG values of lists with different numbers of items. To do so, we sort the item list by relevancy and calculate the DCG for that sorted list. This is the ideal DCG, the highest score that the list can achieve. We then divide the DCG of each list by its ideal DCG to get the normalized score for that list. The <b>nDCG</b> score ranges from 0 to 1, where 1 indicates a perfect ranking of the relevant items and 0 indicates that there are no relevant items in the recommendation list.</i></p>

$$ {\mathrm {nDCG_{{p}}}}={\frac {DCG_{{p}}}{IDCG_{{p}}}}={\frac{3.65}{4.19}}=0.87 $$

*where <b>IDCG</b> is the ideal discounted cumulative gain,*

$$ \mathrm{IDCG_p} = \sum_{i=1}^{|REL_p|} \frac{rel_i}{\log_2(i+1)} $$

*and $REL_p$ represents the list of relevant documents (ordered by their relevance) in the corpus up to position p, with ${|REL_p|}$ being its length.*

| Perfect Ranking | Relevancy Score |
| :-----------: | --------------- |
| Movie 3 | 2 |
| Movie 2 | 2 |
| Movie 1 | 1 |
| Movie 4 | 1 |
| Movie 5 | 0 |
| **CG =** | **6** |
| **IDCG =** | **4.19** |
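*The three formulas can be checked on the example ranking with a short sketch. The function names `dcg` and `ndcg` are illustrative and are not the notebook's `evaluate_recommendations_using_nDCG`. Note that the absolute DCG and IDCG values depend on the base of the logarithm, whereas their ratio, the nDCG, does not.*

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the same scores sorted best-first."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevancy scores in ranked order: Movie 1, Movie 3, Movie 2, Movie 5, Movie 4.
ranked_relevances = [1, 2, 2, 0, 1]

print(sum(ranked_relevances))                                   # 6    (CG)
print(round(dcg(ranked_relevances), 2))                         # 3.65 (DCG)
print(round(dcg(sorted(ranked_relevances, reverse=True)), 2))   # 4.19 (IDCG)
print(round(ndcg(ranked_relevances), 2))                        # 0.87 (nDCG)
```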

##### *Compute nDCG*

In [19]:
nDCG_list = []

# for each user in generated fake users
for user, recommended_movies in total_users_dict.items():
    # compute current user's nDCG result
    current_user_nDCG = evaluate_recommendations_using_nDCG(recommended_movies)
    # append to the list
    nDCG_list.append(current_user_nDCG)

# get the average performance
nDCG = round(np.mean(nDCG_list),2)

print(f'nDCG: {nDCG}')

nDCG: 0.74


### *Mean Reciprocal Rank* <a class='anchor' id='mrr'></a>

<p style='text-align: justify;'><i>Mean Reciprocal Rank, <b>MRR</b> for short, rewards lists whose relevant items appear early. All else being equal, <b>MRR</b> for a list with the first relevant item at the 3rd position will be greater than for a list with the first relevant item at the 4th position. In its standard form, <b>MRR</b> takes the reciprocal of the position of only the first relevant item in each list and averages it over all lists. The variant illustrated here takes the reciprocals of the positions of all relevant items in a list and averages them. If the relevant items are at positions 2, 3 and 5 of an item list, this gives $\frac{1/2 + 1/3 + 1/5}{3}$.</i></p>

| Items Ranking | Relevant Items | Reciprocal Ranking |
| :-----------: | :------------: | :----------------: |
| Movie 1 | No | 0 |
| Movie 3 | Yes | 1/2 |
| Movie 2 | Yes | 1/3 |
| Movie 5 | No | 0 |
| Movie 4 | Yes | 1/5 |

$$ \mathrm{MRR} = {\frac{\frac{1}{2} + \frac{1}{3} + \frac{1}{5}}{3}} = 0.34 $$
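*The worked example can be reproduced with the sketch below. Both function names are hypothetical and are not the notebook's `evaluate_recommendations_using_MRR`. The first function averages the reciprocal ranks of all relevant items, as in the formula above. The second returns the reciprocal rank of only the first relevant item, which is the quantity usually averaged over users in the standard definition of MRR.*

```python
def mean_reciprocal_rank_of_relevant(relevant_flags):
    """Average of 1/rank over every relevant item in one ranked list."""
    ranks = [i for i, flag in enumerate(relevant_flags, start=1) if flag]
    if not ranks:
        return 0.0
    return sum(1 / r for r in ranks) / len(ranks)


def first_relevant_reciprocal_rank(relevant_flags):
    """1/rank of the first relevant item, or 0.0 if there is none."""
    for i, flag in enumerate(relevant_flags, start=1):
        if flag:
            return 1 / i
    return 0.0


# Ranked order from the table: Movie 1, Movie 3, Movie 2, Movie 5, Movie 4.
flags = [False, True, True, False, True]

print(round(mean_reciprocal_rank_of_relevant(flags), 2))  # 0.34
print(first_relevant_reciprocal_rank(flags))              # 0.5
```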

<p style='text-align: justify;'><i>A higher <b>MRR</b> score indicates better performance. A score of 1.0 is reached only when the relevant item sits at the very top of every list, while a score of 0.0 means that no relevant item was recommended at all. As a rough rule of thumb, an <b>MRR</b> score above 0.2 is sometimes considered good, but this varies with the specific task and dataset.</i></p>

<p style='text-align: justify;'><i>It's important to keep in mind that the <b>MRR</b> score is just one evaluation metric for a recommender system, and it should be considered alongside other metrics such as precision and recall. It's also important to consider the specific requirements of the application and the preferences of the users to determine what constitutes good performance.</i></p>

##### *Compute MRR*

In [20]:
MRR_list = []

# for each user in generated fake users
for user, recommended_movies in total_users_dict.items():
    # compute current user's MRR result
    current_user_MRR = evaluate_recommendations_using_MRR(recommended_movies)
    # append to the list
    MRR_list.append(current_user_MRR)

# get the average performance
MRR = round(np.mean(MRR_list),2)

print(f'Mean Reciprocal Rank: {MRR}')

Mean Reciprocal Rank: 0.2


### *Average Precision* <a class='anchor' id='ap'></a>

<p style='text-align: justify;'><i>Precision helps to understand the overall performance of the model but doesn’t tell if the items were ranked properly. <b>Average Precision</b>, AP for short, measures the quality of the ranking produced by the recommender model. It averages the precision computed at the position of each relevant item in the recommendation list.</i></p>

<p style='text-align: justify;'><i>Suppose our model recommends 8 items, as depicted below, out of which 4 are correct and 4 are incorrect. We take the first relevant item and calculate the precision at its position. In our case this is the first item, so its precision is 1/1. Next, we calculate the precision for the second relevant item (item 3). Its precision is 2/3, because from the 1st item up to the current one there are two correctly predicted items out of 3 items in total. We do the same for all the relevant items. Lastly, we take the mean of these precision values to compute <b>AP</b>. The overall precision for this example is $0.5$, while the <b>AP</b> is $0.75$. A higher <b>AP</b> indicates a better ranking. <b>AP</b> takes values from 0 (if there are no relevant recommendations) to 1 (if every relevant item is ranked ahead of every irrelevant one, which includes the case where all recommendations are relevant).</i></p>

<p style='text-align: justify;'><i><b>AP</b> gives more weight to the precision of the top recommendations, while precision gives equal weight to all the recommendations. This means that if the relevant items are mainly concentrated at the top of the recommendation list, <b>AP</b> will be higher than precision. On the other hand, if the relevant items are distributed randomly throughout the recommendation list, <b>AP</b> may be lower than precision. Therefore, it's possible for the AP to be either higher or lower than the precision, depending on the order in which the recommendations are presented and the distribution of the relevant items within the list.</i></p>

<br>

<div style="text-align:center">
    <img src="./images/average_precision.png" alt="Average precision example with 8 recommended items">
</div>

$$ \mathrm{AP} = {\frac{\frac{1}{1} + \frac{2}{3} + \frac{3}{4} + \frac{4}{7}}{4}} = 0.75 $$
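*The same calculation can be written as a short sketch. The function name `average_precision` is illustrative and is not the notebook's `evaluate_recommendations_using_AP`. The flags encode the example of 8 recommendations with relevant items at positions 1, 3, 4 and 7.*

```python
def average_precision(relevant_flags):
    """Mean of precision@k taken at the rank k of each relevant item."""
    hits = 0
    precisions = []
    for k, flag in enumerate(relevant_flags, start=1):
        if flag:
            hits += 1
            precisions.append(hits / k)  # precision over the first k items
    return sum(precisions) / len(precisions) if precisions else 0.0


# Relevant items at positions 1, 3, 4 and 7 of 8 recommendations.
flags = [True, False, True, True, False, False, True, False]

print(sum(flags) / len(flags))             # 0.5  (overall precision)
print(round(average_precision(flags), 2))  # 0.75 (AP)
```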

##### *Compute AP*

In [21]:
AP_list = []

# for each user in generated fake users
for user, recommended_movies in total_users_dict.items():
    # compute current user's AP result
    current_user_AP = evaluate_recommendations_using_AP(recommended_movies)
    # append to the list
    AP_list.append(current_user_AP)

# get the average performance
AP = round(np.mean(AP_list),2)

print(f'Average Precision: {AP}')

Average Precision: 0.64


---

*Thank you!*

---