In [1]:
import pandas as pd
import numpy as np

Miscalibration Error uses the Kullback-Leibler (KL) divergence to measure the difference between a user's preference distribution over item categories, computed from the user profile, and the category distribution of the user's recommendation set. Both are built from the distribution of categories $c$ for each item $i$, denoted by $p(c|i)$.

- $p(c|u)$: the distribution over categories $c$ of the set of items $\mathcal{H}_u$ interacted with by user $u$ in the past.

    \begin{equation}\label{input_preference}
        p(c|u) = \frac{\sum_{i \in \mathcal{H}_u} w_{u, i} \cdot p(c|i)}{\sum_{i \in \mathcal{H}_u} w_{u, i}},
    \end{equation}

    where $w_{u,i}$ is the weight of item $i$ for user $u$, e.g. how recently it was liked or clicked on, or its popularity or rank.
    
- $q(c|u)$: the distribution across categories $c$ of the list of items recommended to user $u$.

    \begin{equation}
        q(c|u) = \frac{\sum_{i \in \mathcal{I}_u} w_{r(i)} \cdot p(c|i)}{\sum_{i \in \mathcal{I}_u} w_{r(i)}},
    \end{equation}
    
    where $\mathcal{I}_u$ is the set of recommended items and $w_{r(i)}$ is the weight of an item and can be measured by its rank $r(i)$ in the recommendation list.
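
Both definitions are the same weighted average of rows of $p(c|i)$, only with different item sets and weights. A minimal sketch under that reading follows; the function name `weighted_category_distribution` and the toy values are hypothetical and not part of this notebook.

```python
import numpy as np

def weighted_category_distribution(p_c_given_i, item_rows, weights=None):
    """Weighted average of the per-item category distributions p(c|i).

    p_c_given_i: array of shape (n_items, n_categories), each row sums to 1.
    item_rows:   row indices of the items in H_u (or I_u).
    weights:     one non-negative weight per item; uniform if omitted.
    """
    rows = np.asarray(p_c_given_i, dtype=float)[np.asarray(item_rows)]
    if weights is None:
        weights = np.ones(len(item_rows))
    weights = np.asarray(weights, dtype=float)
    # numerator: sum_i w_i * p(c|i); denominator: sum_i w_i
    return weights @ rows / weights.sum()

# Toy data (hypothetical): 3 items, 2 categories.
p_c_i = np.array([[1.0, 0.0],
                  [0.5, 0.5],
                  [0.0, 1.0]])

uniform = weighted_category_distribution(p_c_i, [0, 1, 2])
weighted = weighted_category_distribution(p_c_i, [0, 1, 2], weights=[2.0, 1.0, 1.0])
print(uniform, weighted)
```

With uniform weights this reduces to a plain mean over the selected rows, which is what the rest of this notebook computes.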


KL-divergence is used to measure the difference between these two probability distributions, i.e. the divergence of $p$ from $q$. The metric is defined as:

\begin{equation} \label{kl}
MC_{KL}(p||q) = KL(p||\tilde{q})= \sum_{c \in C}{p(c|u)\log\frac{p(c|u)}{\tilde{q}(c|u)}},
\end{equation}

where $p(c|u)$ is the target distribution. If $q$ is similar to $p$, $MC_{KL}$ takes small values, and in the case of perfect calibration it is 0. The plain KL divergence $KL(p||q)$ diverges if $q(c|u)=0$ and $p(c|u)>0$ for some category $c$, so instead of $q$ we use the smoothed distribution:
\begin{equation}
    \tilde{q}(c|u) = (1 - \alpha) \cdot q(c|u) + \alpha \cdot p(c|u),
\end{equation}

where $0 < \alpha < 1$ is chosen small, so that $q \approx \tilde{q}$. We set $\alpha = 0.01$ in this experiment.
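
To see what the smoothing buys us, here is a small sketch with a recommendation distribution that misses a category the user likes. The helper name `smoothed_kl` and the toy distributions are assumptions made for illustration; base-2 logarithms are used to match the function defined later in this notebook (the base only rescales the result).

```python
import numpy as np

def smoothed_kl(p, q, alpha=0.01):
    """KL(p || q_tilde) with q_tilde = (1 - alpha) * q + alpha * p, in bits."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    q_tilde = (1 - alpha) * q + alpha * p
    mask = p > 0  # terms with p = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log2(p[mask] / q_tilde[mask])))

p = np.array([0.5, 0.5, 0.0])   # user profile
q = np.array([1.0, 0.0, 0.0])   # recommendations never show category 2

value = smoothed_kl(p, q)       # finite, although q is 0 where p > 0
identical = smoothed_kl(p, p)   # perfect calibration
print(value, identical)
```

Without smoothing, the second category would give a division by zero; with it, the value stays finite and the perfectly calibrated case still evaluates to 0.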

We rename this metric to $MC_{KL}$ instead of $C_{KL}$ which is described in \cite{steck2018calibrated}, since it specifies the degree to which we have miscalibration in our recommendations and it is more in line with the values that KL-divergence takes. For example, if $p$ and $q$ are very similar, KL-divergence takes lower values, so miscalibration is low and vice versa.

$MC_{KL}$ is sensitive to small differences when $p$ is small. For example, if a user liked a category 2\% of the time and it is recommended to her 1\% of the time, $MC_{KL}$ considers it a significant change compared to a situation where a user likes a category 50\% of the time, while it's recommended to her 49\% of the time.
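
The example above can be checked numerically by looking at the contribution of a single category to the sum, $p \log(p/q)$. This is only a sketch: it skips the smoothing step for clarity, uses base-2 logarithms, and the helper `kl_term` is a hypothetical name.

```python
import numpy as np

def kl_term(p_c, q_c):
    """Contribution of one category to KL(p||q), in bits (no smoothing)."""
    return p_c * np.log2(p_c / q_c)

rare = kl_term(0.02, 0.01)     # liked 2% of the time, recommended 1%
common = kl_term(0.50, 0.49)   # liked 50% of the time, recommended 49%

print(f"rare category:   {rare:.4f}")
print(f"common category: {common:.4f}")
```

Both cases differ by one percentage point, yet the rare category contributes more to the sum, because the ratio $p/q$ is 2 in the first case and only about 1.02 in the second.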



In [2]:
def KullbackLeiblerDivergence(interactDist, recommendedDist):
    # p = interactDist (user profile), q = recommendedDist (recommendations)
    alpha = 0.01
    # not really a tuning parameter,
    # it's there to make the computation more numerically stable.

    klDive = 0.0

    # over all the genres
    for i in range(len(interactDist)):
        # By convention, 0 * log(0/a) = 0, so we can skip genres where p is zero
        if interactDist[i] == 0.0:
            continue

        # KL divergence diverges when q is zero while p is positive,
        # so we use the smoothed q-tilde = (1 - alpha) * q + alpha * p.
        # It is kept in a local variable so the caller's array is not modified.
        qTilde = (1 - alpha) * recommendedDist[i] + alpha * interactDist[i]

        # log base 2 gives the result in bits
        klDive += interactDist[i] * np.log2(interactDist[i] / qTilde)

    return klDive

In [3]:
def ComputeGenreDistribution(itemList, p_g_i_final):
    '''
    Either we pass the list of items from the training data (observed movies) for a user,
    or we pass the list of items recommended to that user, to create the distribution.
    itemList holds row indices into p_g_i_final.
    '''
    return p_g_i_final[itemList,:].sum(axis=0) / len(itemList)


- The first distribution is calculated based on the training dataset. The same p(g|i) is used to calculate both p(g|u) and q(g|u).

- p(g|u) is the sum of p(g|i) over the observed movies divided by the number of items observed by the user.

- p(g|i) is the distribution of the genres for each movie. Each movie has one or more genres assigned to it; they all get the same probability, so the probabilities sum to 1. We therefore calculate the genre probabilities for each movie separately.
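
As a small illustration of the last point, the sketch below turns a binary genre-flag matrix into p(g|i) by dividing each row by its number of genres. The toy matrix is made up for this example, and the guard for movies without any genre is an added assumption.

```python
import numpy as np

# Toy genre flags (hypothetical): 3 movies x 5 genres, 1 = movie has the genre.
genre_flags = np.array([[0, 0, 1, 1, 1],
                        [1, 1, 0, 0, 0],
                        [0, 0, 0, 0, 1]], dtype=float)

# Number of genres per movie, kept as a column so it broadcasts over the rows.
counts = genre_flags.sum(axis=1, keepdims=True)

# Equal probability for every genre of a movie; rows without genres stay all zero.
p_g_i_toy = np.divide(genre_flags, counts,
                      out=np.zeros_like(genre_flags), where=counts > 0)

print(p_g_i_toy)
print(p_g_i_toy.sum(axis=1))  # each row sums to 1
```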


# Testing this function through an example

In [5]:
# import pandas as pd

# col = ["movie id",'movie title', 'release date', 'video release date', 'IMDb URL',
#            'unknown', 'Action', 'Adventure', 'Animation', "Children's", 'Comedy', 'Crime', 
#            'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical', 'Mystery',
#            'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

# uitem = pd.read_csv('movielens-100k-dataset/ml-100k/uitem.csv', sep=";", skiprows=1,
#                     engine = "python", names=col)
# # uitem.head(3)

In [6]:
# uitem.drop(['movie title', 'release date', 'video release date', 'IMDb URL'],
#            axis='columns', inplace=True)
# uitem.head()

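|   | movie id | unknown | Action | Adventure | Animation | Children's | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 4 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |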


## p(g|i)

In [7]:
# import numpy as np

In [8]:
# p_g_i = np.array(uitem)
# print(p_g_i)
# # the first column is the movie ids

# print ('\n', p_g_i[:,1:].shape)

# # sum of all p_g_i (excluding the first column which is the item id)
# vect = np.sum(p_g_i[:,1:], axis=1)[:,None]
# print('\n', vect)

[[   1    0    0 ...    0    0    0]
 [   2    0    1 ...    1    0    0]
 [   3    0    0 ...    1    0    0]
 ...
 [1680    0    0 ...    0    0    0]
 [1681    0    0 ...    0    0    0]
 [1682    0    0 ...    0    0    0]]

 (1682, 19)

 [[3]
 [3]
 [1]
 ...
 [2]
 [1]
 [1]]


Each movie has one or more genres assigned to it. They all get the same probability, so the probabilities sum to 1. We therefore calculate the genre probabilities for each movie separately.

In [13]:
# p_g_i_final = p_g_i[:,1:]/vect[:]
# p_g_i_final[:2]

array([[0.        , 0.        , 0.        , 0.33333333, 0.33333333,
        0.33333333, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.33333333, 0.33333333, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.33333333, 0.        , 0.        ]])

## p(g|u)
p(g|u) is the sum of p(g|i) over the observed movies divided by the number of items observed by the user.

In [21]:
# # let's imagine that item 1 to 5 are seen by the user, then..
# p_g_i_final[:5].sum(axis=0)/5.0

array([0.        , 0.13333333, 0.06666667, 0.06666667, 0.06666667,
       0.13333333, 0.06666667, 0.        , 0.13333333, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.33333333, 0.        , 0.        ])

In [25]:
# p_g_i_final[[1,2,3,4,5]].sum(axis=0)/5.0

array([0.        , 0.13333333, 0.06666667, 0.        , 0.        ,
       0.06666667, 0.06666667, 0.        , 0.33333333, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.33333333, 0.        , 0.        ])

In [27]:
# p_g_i_final[1:5].sum(axis=0)/5.0

array([0.        , 0.13333333, 0.06666667, 0.        , 0.        ,
       0.06666667, 0.06666667, 0.        , 0.13333333, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.33333333, 0.        , 0.        ])

In [33]:
# p_g_i_final[:2,:]

array([[0.        , 0.        , 0.        , 0.33333333, 0.33333333,
        0.33333333, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.33333333, 0.33333333, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.33333333, 0.        , 0.        ]])

In [41]:
# p_g_i_final[[1,2,3,4,5],:].sum(axis=0)/5.0

array([0.        , 0.13333333, 0.06666667, 0.        , 0.        ,
       0.06666667, 0.06666667, 0.        , 0.33333333, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.33333333, 0.        , 0.        ])

In [49]:
# ComputeGenreDistribution([1,2,3,4,5], p_g_i_final)

array([0.        , 0.13333333, 0.06666667, 0.        , 0.        ,
       0.06666667, 0.06666667, 0.        , 0.33333333, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.33333333, 0.        , 0.        ])

## aha!

this is then p(g|u)!

We can call this function for every user & compare her recommendations and her profile items.
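
A rough sketch of that per-user loop is given below. Everything in it is illustrative: the toy `p_g_i_toy` matrix, the `profiles` and `recommendations` dictionaries, and the helper names are hypothetical, and the item lists are assumed to hold 0-based row indices (the MovieLens movie ids start at 1, so they would need to be shifted first).

```python
import numpy as np

def genre_distribution(item_rows, p_g_i):
    """Mean of p(g|i) over the given rows, i.e. p(g|u) or q(g|u) with uniform weights."""
    return p_g_i[item_rows, :].sum(axis=0) / len(item_rows)

def miscalibration(p, q, alpha=0.01):
    """KL(p || q_tilde) in bits, with q_tilde = (1 - alpha) * q + alpha * p."""
    q_tilde = (1 - alpha) * q + alpha * p
    mask = p > 0  # genres the user never interacted with contribute 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q_tilde[mask])))

# Toy p(g|i): 4 items x 3 genres, each row sums to 1.
p_g_i_toy = np.array([[1.0, 0.0, 0.0],
                      [0.5, 0.5, 0.0],
                      [0.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])

profiles = {"u1": [0, 1], "u2": [2, 3]}          # items each user interacted with
recommendations = {"u1": [1, 0], "u2": [0, 1]}   # items recommended to each user

scores = {}
for user, seen in profiles.items():
    p = genre_distribution(seen, p_g_i_toy)
    q = genre_distribution(recommendations[user], p_g_i_toy)
    scores[user] = miscalibration(p, q)

print(scores)
```

Here `u1` receives exactly the genres found in the profile and scores 0, while `u2` is recommended genres that do not match the profile and gets a clearly positive score.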