# How to measure diversity in a recommendation list?


Diversity measures how narrow or wide the spectrum of recommended products is. A recommender that only recommends the music of one artiste is pretty narrow; one that recommends across multiple artistes is more diverse.

There are two main ways of measuring diversity—based on item and based on users.

Measuring diversity based on item is straightforward. We can do this based on metadata of the recommended items:

+ How many different categories/genres?
+ How many different artistes/authors/sellers?
+ What is the kurtosis (“tailedness”) of the price distribution?
+ How different (i.e., distant) are the product embeddings?
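The item-based questions above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `recs` table, its column names and the two-dimensional embeddings are made-up values chosen only to show the calculations.

```python
import numpy as np
import pandas as pd

# Hypothetical metadata for one recommendation list (illustrative values only)
recs = pd.DataFrame({
    "item": ["a", "b", "c", "d"],
    "genre": ["rock", "rock", "jazz", "pop"],
    "artist": ["A1", "A1", "A2", "A3"],
    "price": [9.99, 10.49, 24.00, 5.00],
})

n_genres = recs["genre"].nunique()     # number of distinct genres in the list
n_artists = recs["artist"].nunique()   # number of distinct artists in the list
price_kurtosis = recs["price"].kurt()  # sample excess kurtosis of the prices

# Hypothetical 2-d item embeddings, one row per recommended item
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalise rows
cos_sim = unit @ unit.T                                  # pairwise cosine similarity
upper = np.triu_indices(len(emb), k=1)                   # each pair once, no diagonal
mean_cosine_distance = float(np.mean(1.0 - cos_sim[upper]))
```

A larger `mean_cosine_distance` means the items lie further apart in embedding space, i.e. the list is more diverse under this particular measure.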

Another approach is measuring diversity based on existing customers. For each item in the recommended set, who has consumed it? If the items have a relatively large proportion of common users, the recommended items are likely very similar.
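One possible way to quantify the "proportion of common users" is the average Jaccard overlap between the consumer sets of each pair of items. The sketch below assumes a hypothetical `consumers` mapping from item id to the set of users who consumed it; both the helper name and the data are invented for illustration.

```python
from itertools import combinations

# Hypothetical mapping: item id -> set of users who consumed the item
consumers = {
    "item_a": {1, 2, 3, 4},
    "item_b": {3, 4, 5},
    "item_c": {6, 7},
}

def mean_user_overlap(items, consumers):
    """Average Jaccard overlap of consumer sets over all item pairs.

    Assumes the list contains at least two items.
    """
    overlaps = []
    for i, j in combinations(items, 2):
        union = consumers[i] | consumers[j]
        inter = consumers[i] & consumers[j]
        overlaps.append(len(inter) / len(union) if union else 0.0)
    return sum(overlaps) / len(overlaps)

overlap = mean_user_overlap(["item_a", "item_b", "item_c"], consumers)
```

A high overlap suggests the items are consumed by the same audience, so the list is less diverse from the user perspective.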

Which approach to use depends on what you care about:

+ Do you care about how different the recommended items look? If so, measure diversity based on item features.
+ Do you care about the cross-pollination of communities and user groups (e.g., Facebook group and meet-up group recommendations)? If so, consider diversity based on users.


Source: https://eugeneyan.com/writing/serendipity-and-accuracy-in-recommender-systems/


+ Low diversity: suggesting the next movie of a trilogy.
+ High diversity: a completely random suggestion.

## Calculating diversity for the MovieLens dataset (small)

In [1]:
# import libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.sparse as sp
from typing import List

from sklearn.metrics.pairwise import cosine_similarity

#from surprise import SVD, Dataset, Reader
#from surprise.model_selection import train_test_split





# Recmetrics

Source: https://github.com/statisticianinstilettos/recmetrics

A GitHub repo with small functions for evaluating recommender systems offline.

__Installation:__

pip install recmetrics

import recmetrics

    -> raises an error: cannot import signature from sklearn

My workaround: copy the required functions from the GitHub repo into this notebook.

## Functions from recmetrics

+ single list similarity
+ intra-list similarity

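Note: the `u: int` argument of `_single_list_similarity` is the index of the user whose list is evaluated. It is only referenced in the commented-out empty-list check.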

In [2]:
def _single_list_similarity(predicted: list, feature_df: pd.DataFrame, u: int) -> float:
    """
    Computes the intra-list similarity for a single list of recommendations.
    Parameters
    ----------
    predicted : a list
        Ordered predictions
        Example: ['X', 'Y', 'Z']
    feature_df: dataframe
        A dataframe with one hot encoded or latent features.
        The dataframe should be indexed by the id used in the recommendations.
    u:  User
    Returns:
    -------
    ils_single_user: float
        The intra-list similarity for a single list of recommendations.
    """
    # exception predicted list empty
    #if not(predicted):
    #    raise Exception('Predicted list is empty, index: {0}'.format(u))

    #get features for all recommended items
    recs_content = feature_df.loc[predicted]
    recs_content = recs_content.dropna()
    recs_content = sp.csr_matrix(recs_content.values)

    #calculate similarity scores for all items in list
    similarity = cosine_similarity(X=recs_content, dense_output=False)

    #get indices for upper right triangle w/o diagonal
    upper_right = np.triu_indices(similarity.shape[0], k=1)

    #calculate average similarity score of all recommended items in list
    ils_single_user = np.mean(similarity[upper_right])
    return ils_single_user

In [3]:
def intra_list_similarity(predicted: List[list], feature_df: pd.DataFrame) -> float:
    """
    Computes the average intra-list similarity of all recommendations.
    This metric can be used to measure diversity of the list of recommended items.
    Parameters
    ----------
    predicted : a list of lists
        Ordered predictions
        Example: [['X', 'Y', 'Z'], ['X', 'Y', 'Z']]
    feature_df: dataframe
        A dataframe with one hot encoded or latent features.
        The dataframe should be indexed by the id used in the recommendations.
    Returns:
    -------
        The average intra-list similarity for recommendations.
    """
    feature_df = feature_df.fillna(0)
    Users = range(len(predicted))
    ils = [_single_list_similarity(predicted[u], feature_df, u) for u in Users]
    return np.mean(ils)

In [4]:
def diversity(ils):
    '''
    Calculates the diversity of a recommendation based on the intra-list similarity
    '''
    d = (1- ils)*100
    return d
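To make the two quantities concrete, here is a small worked example on made-up data. It is a simplified NumPy re-implementation of the same idea (mean pairwise cosine similarity over the upper triangle), not the recmetrics code itself; `toy_features` and `toy_ils` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Hypothetical one-hot genre features, indexed by item id
toy_features = pd.DataFrame(
    {"Comedy": [1, 1, 0], "Drama": [0, 1, 0], "Horror": [0, 0, 1]},
    index=["X", "Y", "Z"],
)

def toy_ils(predicted, feature_df):
    """Mean pairwise cosine similarity of the items in one list."""
    x = feature_df.loc[predicted].to_numpy(dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # assumes no all-zero rows
    sim = x @ x.T                                     # pairwise cosine similarity
    upper = np.triu_indices(len(predicted), k=1)      # each pair once, no diagonal
    return float(np.mean(sim[upper]))

ils_toy = toy_ils(["X", "Y", "Z"], toy_features)
diversity_toy = (1 - ils_toy) * 100  # same rescaling as diversity() above
```

Only the pair (X, Y) shares a genre, with cosine similarity 1/√2; the other two pairs have similarity 0. The mean over the three pairs is therefore about 0.236, which corresponds to a diversity of about 76.4.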

## Preprocessing of movielens data

In [5]:
# load the data
df_movies = pd.read_csv("../data/ml-latest-small/movies.csv")
df_ratings = pd.read_csv("../data/ml-latest-small/ratings.csv")

In [6]:
df_movies.head()

,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [7]:
df_ratings.head()


,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [8]:
# Split the genres at the pipe and convert it to a new list in a new column
df_movies["genre"] = df_movies \
    .genres.str.split("|") \
    .to_list()

In [9]:
# For every row in the dataframe, iterate through the list of genres and place 1 into the corresponding column
for index, row in df_movies.iterrows():
    for genre in row['genre']:
        df_movies.at[index, genre] = 1

# Filling in the NaN values with 0 and remove the genre columns
df_movies = df_movies \
    .fillna(0) \
    .drop("genres", axis=1) \
    .drop("genre", axis=1)

df_movies.head()

,movieId,title,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1,Toy Story (1995),1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,Jumanji (1995),1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3,Grumpier Old Men (1995),0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4,Waiting to Exhale (1995),0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,5,Father of the Bride Part II (1995),0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
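As an alternative to the row-by-row loop, pandas can build the same kind of one-hot genre table in a single call with `Series.str.get_dummies`. The sketch below uses a three-row stand-in for `movies.csv` so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

# Small stand-in for movies.csv (same column layout, three rows only)
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
    ],
})

# Split on the pipe and one-hot encode every genre in one step
genre_features = movies["genres"].str.get_dummies(sep="|")

# Index the feature table by movieId, as the similarity functions expect
genre_features.index = movies["movieId"]
```

Compared with the loop, the columns come out in alphabetical order and hold integers instead of floats. Neither difference affects the cosine similarity.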


In [10]:
df_movies = df_movies.drop("title", axis=1)

In [11]:
# change index of df_movie : movie id is new index
df_movies = df_movies.set_index("movieId")

In [12]:
df_movies

movieId,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
1,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193581,0.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
193583,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
193585,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
193587,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### First, we calculate the single list similarity (on genre features) for the generic recommendation list from the baseline model
This list is based on how often the movies appear in the rating list (counts).

pred_list1 = [356,  318,  296,  593, 2571,  260,  480,  110,  589,  527]


| movieId | title |
| --- | --- |
| 110 | Braveheart (1995) |
| 260 | Star Wars: Episode IV - A New Hope (1977) |
| 296 | Pulp Fiction (1994) |
| 318 | Shawshank Redemption, The (1994) |
| 356 | Forrest Gump (1994) |
| 480 | Jurassic Park (1993) |
| 527 | Schindler's List (1993) |
| 589 | Terminator 2: Judgment Day (1991) |
| 593 | Silence of the Lambs, The (1991) |
| 2571 | Matrix, The (1999) |



In [13]:
# Calculate sls for recommendation list for user 1
pred_list1 = [356,  318,  296,  593, 2571,  260,  480,  110,  589,  527]
sls1 = _single_list_similarity(pred_list1, df_movies, 1)
sls1

0.29912845088700013

In [14]:
# Diversity of recommendation list for user 1
diversity(sls1)

70.08715491129999

### Second, we calculate the single list similarity (on genre features) for the generic recommendation list from the baseline model
This list is based on the median rating of each movie.

| movieId | title |
| --- | --- |
| 1151 | Lesson Faust (1994) |
| 3851 | I'm the One That I Want (2000) |
| 3942 | Sorority House Massacre II (1990) |
| 5416 | Cherish (2002) |
| 86237 | Connections (1978) |
| 114265 | Laggies (2014) |
| 115122 | What We Do in the Shadows (2014) |
| 146662 | Dragons: Gift of the Night Fury (2011) |
| 146684 | Cosmic Scrat-tastrophe (2015) |
| 147250 | The Adventures of Sherlock Holmes and Doctor W... |

In [15]:
# Calculate sls for recommendation list for user 2
pred_list2 = [3942, 147250, 115122, 86237, 1151, 146662, 114265, 146684, 3851, 5416]
sls2 = _single_list_similarity(pred_list2, df_movies, 2)
sls2

0.2461189292444546

In [16]:
# Diversity of recommendation list for user 2
diversity(sls2)

75.38810707555454

### First content-based list, based on the cosine similarity of the descriptions

Original list: [2348, 7320, 5947, 2385, 6393, 4447, 837, 6316, 4755, 9143]

Problem: 6393, 4755 and 9143 are not in the index. The list derives from the content-based notebook (Alex), which was merged with TMDB data, so some movies must have been lost in the merge process.


| movieId | title |
| --- | --- |
| 2348 | Toy Story 2 |
| 7320 | Toy Story 3 |
| 5947 | The 40 Year Old Virgin |
| 2385 | Man on the Moon |
| 6393 | Factory Girl |
| 4447 | What's Up, Tiger Lily? |
| 837 | Rebel Without a Cause |
| 6316 | For Your Consideration |
| 4755 | Rivers and Tides |
| 9143 | Welcome to Happiness |
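Rather than removing missing ids by hand, the list can be split programmatically into ids that exist in the feature index and ids that do not. This is a small sketch with a hypothetical helper, `split_known_ids`, and a made-up feature table; in the notebook the second argument would be `df_movies`.

```python
import pandas as pd

def split_known_ids(predicted, feature_df):
    """Split a recommendation list into ids present in / absent from the index."""
    known = [i for i in predicted if i in feature_df.index]
    missing = [i for i in predicted if i not in feature_df.index]
    return known, missing

# Hypothetical feature table that contains only some of the recommended ids
features = pd.DataFrame({"Comedy": [1, 0, 1]}, index=[2348, 7320, 5947])

known, missing = split_known_ids([2348, 7320, 6393, 5947, 9143], features)
```

Only `known` should be passed on to the similarity function, because `feature_df.loc[...]` raises a `KeyError` for labels that are not in the index. Reporting `missing` alongside the score makes it visible how many recommendations were dropped.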

In [17]:
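# check out the missing movie
# NOTE: .iloc is positional: this returns the row at position 9143 (movieId 147328, see `Name` in the output), not movieId 9143.
# To test the id itself, use `9143 in df_movies.index`.
df_movies.iloc[9143]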

Adventure             0.0
Animation             0.0
Children              0.0
Comedy                0.0
Fantasy               0.0
Romance               0.0
Drama                 0.0
Action                0.0
Crime                 1.0
Thriller              0.0
Horror                0.0
Mystery               0.0
Sci-Fi                0.0
War                   0.0
Musical               0.0
Documentary           0.0
IMAX                  0.0
Western               0.0
Film-Noir             0.0
(no genres listed)    0.0
Name: 147328, dtype: float64

In [18]:
# compute the sls with movies that occur in the index (missing movies excluded)
pred_list3 = [2348, 7320, 5947, 2385, 4447, 837, 6316]
sls3 = _single_list_similarity(pred_list3, df_movies, 3)
sls3

0.26846487355123716

In [19]:
# Diversity of recommendation list for user 3
diversity(sls3)

73.15351264487629

In [20]:
# sanity check: a list that repeats a single movie should yield a similarity of 1.0
pred_list4 = [2348, 2348, 2348, 2348, 2348, 2348, 2348, 2348, 2348, 2348]
sls4 = _single_list_similarity(pred_list4, df_movies, 4)
sls4

1.0

In [21]:
# Diversity of recommendation list for user 4
diversity(sls4)

0.0

### Intra-list similarity (ILS)
Here, the single list similarity is computed separately for each of two recommendation lists and the scores are then averaged. The lists are not compared with each other.

In [22]:
# List of two lists
pred_list = [pred_list1, pred_list2]

In [23]:
# Compute the average intra-list similarity over both lists
ils = intra_list_similarity(pred_list, df_movies)

In [24]:
# calculate the diversity of the two lists (in percent)
diversity(ils)

72.73763099342726
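The result can be checked by hand from the two single-list scores reported earlier in this notebook: the ILS is their mean, and the diversity is that mean rescaled to a percentage. The variable names below are chosen for this check only.

```python
# Single list similarity scores reported above for pred_list1 and pred_list2
sls_list1 = 0.29912845088700013
sls_list2 = 0.2461189292444546

# ILS over both lists is the plain average of the per-list scores
ils_check = (sls_list1 + sls_list2) / 2

# Same rescaling as diversity(): (1 - ils) * 100
diversity_check = (1 - ils_check) * 100
```

The average is about 0.2726, which gives a diversity of about 72.74 and matches the output of the cell above.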

### Findings: The 

## Changing the movie features
Before, we calculated the similarity based on genre. Now, different movie features will be chosen to 

## Conclusion

The similarity scores are not spread out enough: for all three real recommendation lists they lie between roughly 0.25 and 0.30. As a solution, I suggest defining better movie features so that the movies can be compared more meaningfully.

