In [2]:
import numpy as np

The rows of the matrix are the topics and the columns are the users, plus a last column for Home. Entry (i, j) is the number of news articles of topic i that user j has encountered on the homepage.

In [3]:
matrix = np.array([[1, 2, 1, 0, 0, 0, 0, 0, 0],
                   [2, 1, 3, 1, 0, 0, 0, 0, 4],
                   [0, 0, 2, 3, 0, 0, 0, 0, 2],
                   [0, 0, 0, 0, 6, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 8, 0, 0, 0],
                   [0, 0, 0, 0, 0, 1, 8, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 3, 0],
                   [0, 0, 0, 0, 0, 0, 0, 5, 0],
                   [7, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 7, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 4, 2, 0, 1, 0, 2, 0],
                   [0, 0, 0, 4, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 4, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 2, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 2]])

### Click-through rate (CTR)

CTR (Click-Through Rate) is a metric that measures **the percentage of clicks on recommended articles compared to the total number of recommended articles**.

- Original CTR: In the original paper, CTR is **the fraction of the N recommended articles that the user clicks on (r_i = 1)**, that is, **the sum of clicks on recommended items divided by N**. It captures a specific user behavior towards recommended content.

- Our CTR: *calculate_CTR* computes a CTR-like metric from the matrix: **the fraction of views in the "Home" column relative to total views**. It counts article views on the homepage rather than actual clicks.

Both metrics aim to measure user interaction with recommended content, but they differ in what they count: clicks on recommended articles in the paper, homepage views here.
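For contrast, the paper-style definition described above can be sketched as follows. The click indicators `r` are made-up values for illustration only; nothing in the matrix records clicks.

```python
import numpy as np

# Hypothetical click indicators r_i for N = 5 recommended articles
# (1 = clicked, 0 = not clicked); illustrative values, not real data.
r = np.array([1, 0, 0, 1, 0])

# Paper-style CTR: clicks on recommended items divided by N
N = len(r)
paper_ctr = r.sum() / N
print(paper_ctr)  # 0.4
```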

In [4]:
# Define a function to compute the CTR-like metric
def calculate_CTR(matrix):
    # Total number of views in the matrix
    total_views = np.sum(matrix)

    # Number of views in the "Home" column (last column)
    home_views = np.sum(matrix[:, -1])

    # The metric is the fraction of views that fall in the "Home" column
    CTR = home_views / total_views

    return CTR

# Compute the CTR
ctr = calculate_CTR(matrix)
print("Click-Through Rate (CTR):", ctr)

Click-Through Rate (CTR): 0.09090909090909091


Our Click-Through Rate (CTR) result is approximately 0.0909: the "Home" column accounts for 8 of the 88 views in the matrix, or about 9.09% of total views. Because the matrix records views rather than clicks, this value is a share of views, not a click rate.
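As a quick check of the arithmetic behind this value, the sketch below recomputes the per-column totals from the same matrix. The variable names `column_totals`, `total_views`, and `home_views` are chosen here for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 1, 0, 0, 0, 0, 0, 0],
                   [2, 1, 3, 1, 0, 0, 0, 0, 4],
                   [0, 0, 2, 3, 0, 0, 0, 0, 2],
                   [0, 0, 0, 0, 6, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 8, 0, 0, 0],
                   [0, 0, 0, 0, 0, 1, 8, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 3, 0],
                   [0, 0, 0, 0, 0, 0, 0, 5, 0],
                   [7, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 7, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 4, 2, 0, 1, 0, 2, 0],
                   [0, 0, 0, 4, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 4, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 2, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 2]])

# One total per column: the users first, Home last
column_totals = matrix.sum(axis=0)
total_views = int(matrix.sum())
home_views = int(column_totals[-1])

print(column_totals)                       # [10 10 10 10 10 10 10 10  8]
print(home_views, total_views)             # 8 88
print(round(home_views / total_views, 4))  # 0.0909
```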

### Average document stance

- Original average document stance: the **average partisan score of the articles that are shown to the users**.

- Our average document stance: *calculate_average_document_stance* takes **the matrix and a 1-based topic index as input and calculates the average document stance for that specific topic**. The function sums the entries in the topic's row and divides by the number of non-zero entries, so zero values are excluded from the average.

In [5]:
def calculate_average_document_stance(matrix, topic_index):
    # Extract the row for the topic (subtract 1 because indices start at 0)
    topic_row = matrix[topic_index - 1]

    # Sum the scores for the topic
    total_score = np.sum(topic_row)

    # Average over the non-zero values only
    avg_stance = total_score / np.sum(topic_row != 0)

    return avg_stance

# Compute the average document stance for the three topics: Republican, Democratic, neutral
avg_stance_repubblicano = calculate_average_document_stance(matrix, 1)  # Republican topic
avg_stance_democratico = calculate_average_document_stance(matrix, 2)   # Democratic topic
avg_stance_neutrale = calculate_average_document_stance(matrix, 3)      # Neutral topic

print("Average Document Stance for the Republican topic:", avg_stance_repubblicano)
print("Average Document Stance for the Democratic topic:", avg_stance_democratico)
print("Average Document Stance for the Neutral topic:", avg_stance_neutrale)

Average Document Stance for the Republican topic: 1.3333333333333333
Average Document Stance for the Democratic topic: 2.2
Average Document Stance for the Neutral topic: 2.3333333333333335


These results are the averages for the first three rows of the matrix, which are treated here as the Republican, Democratic, and neutral topics: approximately 1.33, 2.2, and 2.33 respectively. Each value is the mean of the non-zero entries in that row. The values are specific to this display matrix and may vary with actual data and user preferences.
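Averaging over non-zero entries divides by zero when a row contains only zeros. The sketch below shows one way to guard against that case; `mean_of_nonzero` is a hypothetical helper, and returning NaN for an empty row is an assumption about the desired behavior.

```python
import numpy as np

def mean_of_nonzero(row):
    """Average the non-zero entries of a row; NaN if every entry is zero."""
    row = np.asarray(row, dtype=float)
    nonzero = row[row != 0]
    if nonzero.size == 0:
        # Assumption: signal "no data" with NaN instead of dividing by zero
        return float("nan")
    return float(nonzero.mean())

print(mean_of_nonzero([1, 2, 1, 0, 0, 0, 0, 0, 0]))  # first row of the matrix
print(mean_of_nonzero([0, 0, 0]))                    # nan
```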

### Normalized document stance

- Original normalized document stance: p_i is the fraction of articles shown to the users that have stance i. Normalized stance entropy is the entropy of this distribution, normalized by log m so that its maximum is 1, where m = 5 in their case (representing the five political stances).

- Our normalized document stance: *calculate_normalized_stance_entropy* takes the matrix and the value of m (in our case, 3) as input and **calculates the normalized stance entropy based on the fractions of articles shown to users for each of the three stances**.
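Written out, the normalized entropy described in the list above takes the following form, with p_i and m as defined there. The convention 0 log 0 = 0 is a standard assumption and is not stated in the source.

```latex
H_{\text{norm}} = -\frac{1}{\log m} \sum_{i=1}^{m} p_i \log p_i,
\qquad \sum_{i=1}^{m} p_i = 1
```

The maximum value of 1 is reached when the distribution is uniform, that is, when p_i = 1/m for every i.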

In [6]:
# Define a function to compute the normalized stance entropy
def calculate_normalized_stance_entropy(matrix, m):
    # Fractions of articles shown to users for each stance (one per row)
    position_fractions = np.sum(matrix, axis=1) / np.sum(matrix)

    # Entropy of the distribution, normalized by log(m)
    entropy = -np.sum(position_fractions * np.log(position_fractions)) / np.log(m)

    return entropy

# Value of m (number of stances)
m = 3

# Compute the normalized stance entropy with m = 3
normalized_stance_entropy = calculate_normalized_stance_entropy(matrix, m)
print("Normalized Stance Entropy:", normalized_stance_entropy)

Normalized Stance Entropy: 2.369491306537947


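This metric is meant to capture the diversity of stances among the articles recommended to users, with a higher value indicating greater diversity. The value obtained here, about 2.37, exceeds the intended maximum of 1: `np.sum(matrix, axis=1)` yields one fraction per row (15 entries), while the entropy is normalized by log 3. The result is therefore not directly comparable to the paper's normalized stance entropy.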
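Normalized entropy stays within [0, 1] only when m equals the number of categories in the distribution. The sketch below illustrates this with a hypothetical helper, `normalized_entropy`, which takes m from the length of its input and skips zero counts (treating 0 log 0 as 0). It is an illustration under those assumptions, not the paper's implementation.

```python
import numpy as np

def normalized_entropy(counts):
    """Shannon entropy of `counts`, divided by log(number of categories)."""
    counts = np.asarray(counts, dtype=float)
    m = counts.size
    if m < 2:
        return 0.0
    p = counts / counts.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    h = -(p * np.log(p)).sum()
    # Clamp at 0 so rounding never yields -0.0 or a tiny negative number
    return float(max(0.0, h / np.log(m)))

# Row totals of the matrix above (one per topic row, 15 in total)
row_totals = [4, 11, 7, 6, 8, 9, 3, 5, 7, 7, 9, 4, 4, 2, 2]

print(normalized_entropy([5, 5, 5]))   # uniform over 3 categories
print(normalized_entropy([9, 0, 0]))   # all mass on one category
print(normalized_entropy(row_totals))  # normalized by log(15)
```

With m tied to the number of categories, a uniform distribution scores 1 and a single-category distribution scores 0.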

### Normalized topic entropy

- Original normalized topic entropy: similar to normalized stance entropy, it measures the diversity of topics. This provides a measure of topical diversity, in addition to the stance diversity above. The metric is the same as Equation 3, where p_i is instead the probability of articles having topic i in a sequence of recommendations, and m = 14 since there are 14 topics.

- Our normalized topic entropy: *calculate_normalized_topic_entropy* takes the matrix and the value of m (in our case, 15, for the 15 total topics) as input and **calculates the normalized topic entropy based on the fractions of articles that have each topic in a recommendation sequence**.

In [11]:
# Define a function to compute the normalized topic entropy
def calculate_normalized_topic_entropy(matrix, m):
    # Fractions of articles that have each topic in a sequence of recommendations
    topic_fractions = np.sum(matrix, axis=0) / np.sum(matrix)

    # Entropy of the distribution, normalized by log(m)
    entropy = -np.sum(topic_fractions * np.log(topic_fractions)) / np.log(m)

    return entropy

# Compute the normalized topic entropy with m = 15
m = 15
normalized_topic_entropy = calculate_normalized_topic_entropy(matrix, m)
print("Normalized Topic Entropy:", normalized_topic_entropy)

Normalized Topic Entropy: 0.8105601210727017


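This metric is meant to capture the diversity of topics among the articles recommended to users, with a higher value indicating greater diversity. The value here is about 0.81. Note that `np.sum(matrix, axis=0)` sums over the rows, so it yields one fraction per column (nine entries: the users plus Home) rather than one per topic. As computed, the result describes how views are spread across columns.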
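Since the choice of `axis` decides which distribution the entropy is taken over, the sketch below shows the difference on a small made-up matrix laid out like the one above (rows are topics, columns are users). The array `toy` and its values are hypothetical.

```python
import numpy as np

# Hypothetical 3-topic x 2-column matrix (rows = topics, columns = users)
toy = np.array([[1, 2],
                [0, 4],
                [3, 0]])

per_column = toy.sum(axis=0)  # collapses the rows: one total per column
per_topic = toy.sum(axis=1)   # collapses the columns: one total per row (topic)

print(per_column)  # [4 6]
print(per_topic)   # [3 4 3]
```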