In [1]:
%matplotlib inline
import pymc3 as pm
import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import arviz as az
from IPython.display import display, Markdown
az.style.use('arviz-darkgrid')
np.random.seed(44)


In [2]:
plt.rcParams['font.size'] = 15
plt.rcParams['legend.fontsize'] = 'medium'
plt.rcParams.update({
    'figure.figsize': [12.0, 5.0],
    'figure.facecolor': '#fffff8',
    'axes.facecolor': '#fffff8',
    'figure.constrained_layout.use': True,
    'font.size': 14.0,
    'hist.bins': 'auto',
    'lines.linewidth': 3.0,
    'lines.markeredgewidth': 2.0,
    'lines.markerfacecolor': 'none',
    'lines.markersize': 8.0, 
})

In [3]:

az.style.use('arviz-darkgrid')

# Hierarchical models without predictors


Giorgio Corani <br/>
*Bayesian Data Analysis and Probabilistic Programming*
<br/>
<br/>
``giorgio.corani@supsi.ch``





# Based on 

* Alicia A. Johnson, Miles Q. Ott, Mine Dogucu, Bayes Rules! An Introduction to Applied Bayesian Modeling, Chapter 16,  *Hierarchical Models without Predictors*, https://www.bayesrulesbook.com/chapter-16.html

* Some images are taken from that book.

# Our first hierarchical model: song popularity

* We have a response variable $Y$, but no predictors $X$.

* Spotify provides a database of songs and their *popularity* score, which is a rating on the 0-100 scale.
* In general, the more recent plays a song has on the platform, the higher its popularity rating.

# Research questions 


*    What’s the typical popularity of a Spotify song?
*   To what extent does popularity vary from artist to artist?
*   For any single artist, how much might popularity vary from song to song?

* A priori we can expect the average popularity rating to be around 50; apart from that, we don’t have any strong prior information.

In [8]:
# The full data set is available from the bayesrules package for R. This is a reduced version which only contains artist, title and popularity.
# The data set contains 350 songs by 44 artists.
spotify = pd.read_csv("data/spotify.csv")

spotify

Unnamed: 0,artist,title,popularity
0,Alok,On & On,79
1,Alok,All The Lies,56
2,Alok,Hear Me Now,75
3,Alok,The Wall,65
4,Alok,Hear Me Now,52
...,...,...,...
345,Zeds Dead,Frontlines,58
346,Zeds Dead,Stardust,44
347,Zeds Dead,Save My Grave,54
348,Zeds Dead,Shake,49


# Hierarchical data set

The data set is hierarchical:

* it comprises  multiple songs for each of 44 artists 
* the artists  were sampled from the population of all artists that have songs on Spotify 

<img src='img/spotify-hierarchical-data-diagram.png' width=600 align="center" >



In [21]:
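# Mean popularity of the songs of each artist.
# There are major differences between artists, both in the number of songs and in their popularity.
# The numeric column is selected explicitly: recent pandas versions raise an error when averaging the text column 'title'.

artist_popularity = spotify.groupby('artist')[['popularity']].mean().sort_values('popularity')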
artist_popularity

Unnamed: 0_level_0,popularity
artist,Unnamed: 1_level_1
Mia X,13.25
Chris Goldarg,16.4
Soul&Roll,24.2
Honeywagon,31.666667
Röyksopp,33.25
Freestyle,33.666667
DA Image,36.666667
Jean Juan,36.8
TV Noise,38.142857
Kid Frost,40.666667


In [19]:
# Number of songs per artist, which varies between 2 and 40.
artist_count = spotify.groupby(['artist']).count().sort_values('popularity')
artist_count

Unnamed: 0_level_0,title,popularity
artist,Unnamed: 1_level_1,Unnamed: 2_level_1
Sean Kingston,2,2
David Lee Roth,3,3
Lil Skies,3,3
Tamar Braxton,3,3
Honeywagon,3,3
Michael Kiwanuka,3,3
Freestyle,3,3
The Wrecks,3,3
Elisa,3,3
León Larregui,3,3


# The structure of the data

* We use the subscript $j$ to denote the artist and the subscript $i$ to denote the $i$-th song of that artist.
* $n_j$ is the number of songs of artist $j$, $j \in \{1,2,…,44\}$.
* $Y_{ij}$ represents the popularity of the $i$-th song of artist $j$, where $i \in \{1,2,…,n_j\}$ and $j \in \{1,2,…,44\}$.

* The sample of 350 songs is a collection of 44 smaller samples, one for each artist:

$Y := \left((Y_{11}, Y_{21}, \ldots, Y_{n_1,1}), (Y_{12}, Y_{22}, \ldots, Y_{n_2,2}), \ldots, (Y_{1,44}, Y_{2,44}, \ldots, Y_{n_{44},44})\right)  $
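The nested structure of $Y$ can be sketched in code. This is only an illustration: it uses the nine rows visible in the preview of the data set above (two of the 44 artists), and the names `preview`, `samples` and `n` are introduced here for the example.

```python
import pandas as pd

# Rows copied from the preview of the data set shown above
# (only 2 of the 44 artists, so this is not the full sample).
preview = pd.DataFrame({
    "artist": ["Alok"] * 5 + ["Zeds Dead"] * 4,
    "title": ["On & On", "All The Lies", "Hear Me Now", "The Wall", "Hear Me Now",
              "Frontlines", "Stardust", "Save My Grave", "Shake"],
    "popularity": [79, 56, 75, 65, 52, 58, 44, 54, 49],
})

# One sub-sample (Y_1j, ..., Y_{n_j, j}) per artist j.
samples = {artist: group["popularity"].tolist()
           for artist, group in preview.groupby("artist")}

# n_j: number of songs of artist j.
n = {artist: len(y) for artist, y in samples.items()}

for artist, y in samples.items():
    print(artist, "n_j =", n[artist], "Y_j =", y)
```

The full sample is recovered by concatenating the per-artist sub-samples, so the $n_j$ sum to the total number of songs.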

# Three different modelling approaches

* *Complete pooling*: ignore artists and lump all songs together

* *No pooling*:  analyze each artist independently.

* *Partial pooling* via a hierarchical model: model both the population of artists and the population of songs of each artist; share information across artists to obtain more robust estimates for artists with few songs.
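The first two approaches can be contrasted with simple sample means. The sketch below is an illustration, not the models fitted later: it uses the nine rows visible in the data preview (two artists only), and `pooled_mean` and `artist_means` are names chosen for this example.

```python
import pandas as pd

# Rows copied from the preview of the data set (2 of the 44 artists).
preview = pd.DataFrame({
    "artist": ["Alok"] * 5 + ["Zeds Dead"] * 4,
    "popularity": [79, 56, 75, 65, 52, 58, 44, 54, 49],
})

# Complete pooling: a single mean for all songs, whoever the artist is.
pooled_mean = preview["popularity"].mean()

# No pooling: a separate mean for each artist, using only that artist's songs.
artist_means = preview.groupby("artist")["popularity"].mean()

print("complete pooling:", round(pooled_mean, 2))
print("no pooling:")
print(artist_means)
```

Complete pooling returns the same value for every artist, while no pooling relies on as few as two songs for some artists in the full data set. Partial pooling sits between these two extremes.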

# Complete pooling 

* The completely pooled model ignores the clustering (or *grouping*) structure implied by the different artists.
* It treats all songs as a sample from the same population, without modelling the sub-populations (due to the different artists).
* As a simplifying assumption, we assume the ratings to be normally distributed, even if they are somewhat left skewed.

<img src='img/spotify-density.png' width=400 align="center" >

# Quiz yourself

* What would be the prediction of the completely pooled model for:

    * a new song by Mia X, the artist with the lowest mean popularity (13)?
    * a new song by Beyoncé, the artist with nearly the highest mean popularity in our sample (70)?
    * a new song by Mohsen Beats, a group not present in the sample?




# Normal-normal model

* The Normal-normal model has a likelihood which is normal and a prior which is also normal.


* The popularity of every song is a draw from a normal distribution $N(\mu,\sigma^2)$. Thus $\mu$ and $\sigma$ are shared by every song; they are *global* parameters which do not vary by artist.

* $\mu$: global mean of the popularity
* $\sigma$: global standard deviation of the popularity, expressing variability  from song to song.

* A priori, we believe that the global mean is around 50, but we are uncertain about it. We can represent this uncertainty with a normal prior with standard deviation 25, which places about 68% probability on the global mean lying within (25, 75). Of course, different choices are also plausible. The current specification allows the mean to vary roughly in (-25, 125), i.e. three standard deviations around 50, which goes beyond the actual limits of the 0-100 scale.

* We also set a broad prior on $\sigma$. 

* **show that these priors can be chosen by sampling random variables and then looking at the quantiles**

\begin{equation}
\begin{split}
Y_{ij} | \mu, \sigma & \sim N(\mu, \sigma^2) \\
\mu    & \sim N(50, 25^2) \\
\sigma & \sim \text{HalfNormal}(20) \\
\end{split}
\tag{16.1}
\end{equation}
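One way to check the priors in (16.1) is to sample from them and look at the quantiles of the draws. The following is a minimal sketch using NumPy only. The number of draws, the seed and all variable names are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(44)
n_draws = 100_000

# Draws from the priors of model (16.1).
mu = rng.normal(loc=50, scale=25, size=n_draws)
# A HalfNormal(20) draw is the absolute value of a N(0, 20^2) draw.
sigma = np.abs(rng.normal(loc=0, scale=20, size=n_draws))

# Prior probability that the global mean lies within (25, 75).
p_within_1sd = np.mean((mu > 25) & (mu < 75))

# Quantiles of the prior draws: 2.5%, 50% and 97.5%.
probs = [0.025, 0.5, 0.975]
mu_q = np.quantile(mu, probs)
sigma_q = np.quantile(sigma, probs)

# Prior predictive draws: one simulated popularity per (mu, sigma) pair.
y_sim = rng.normal(loc=mu, scale=sigma)
frac_outside = np.mean((y_sim < 0) | (y_sim > 100))

print("P(25 < mu < 75):", round(p_within_1sd, 3))
print("mu quantiles:", np.round(mu_q, 1))
print("sigma quantiles:", np.round(sigma_q, 1))
print("fraction of simulated ratings outside [0, 100]:", round(frac_outside, 3))
```

If the quantiles or the fraction of simulated ratings outside the 0-100 scale look implausible, the prior scales (25 and 20 here) can be adjusted and the check repeated.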