# Metadata

```
Course:  DS 5001
Module:  03 Lab
Topic:   Entropy and Perplexity
Author:  R.C. Alvarado
Purpose: Clarify concept of perplexity.
```

# Entropy

**Probability $p$**

$\Large p = \frac{n}{N}$

$p(w) = \Large\frac{n_w}{N_{corpus}}$ 

`p = n / n.sum()`

Most terms have low probability.

**Surprise $s$**

$\Large s = \Large\frac{1}{p}$

$s(w) = p(w)^{-1}$

Surprise $s$ is the inverse of $p$. Note how inverting $p$ adds variance to the long tail; the curve now looks like a simple quadratic. We can see a more gradual increase in surprise as terms become rarer.

<!-- V.s.value_counts().plot(style='*-') -->

**Information $i$**

$\Large i= log_2(s)$

$i(w) = log_2(s(w))$

As log-scaled surprise, information now has a long tail structure. But notice also the range of information: it is between 1 and 18. What does this correspond to?

<!-- V.i.value_counts().plot(style='*-'); -->

**Entropy $h$**

$\Large h = p i$

$h(w) = p(w)i(w)$

For the self-entropy of each term, we multiply $p$ and $i$. When summed over all terms, this gives us the expectation of the information in the distribution, i.e. its entropy.

<!-- V.h.value_counts().plot(style='*-'); -->
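
The four quantities above can be traced on a small example. This is a minimal sketch using only the standard library; the toy `tokens` list and the `vocab` dictionary are hypothetical stand-ins for the Austen `V` table used later in this lab.

```python
import math
from collections import Counter

# Hypothetical toy corpus (the lab itself uses the Austen vocabulary table)
tokens = "the cat sat on the mat and the dog sat".split()

counts = Counter(tokens)      # n_w for each term
N = sum(counts.values())      # corpus size in tokens

vocab = {}
for term, n in counts.items():
    p = n / N                 # probability
    s = 1 / p                 # surprise
    i = math.log2(s)          # information, in bits
    h = p * i                 # this term's contribution to entropy
    vocab[term] = {"n": n, "p": p, "s": s, "i": i, "h": h}

# Summing the per-term values gives the entropy of the distribution
H = sum(row["h"] for row in vocab.values())
print(f"N = {N}, H = {H:.3f} bits")
```

Frequent terms such as `the` get low information, while terms seen once get the highest information in the table.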

**Perplexity $PP$**

$\Large PP = \Large 2^{H}$, where $H = \sum_w h(w)$ is the expected information.
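
One way to read perplexity is as 2 raised to the average information (the entropy), which turns bits back into an effective number of equally likely choices. The sketch below illustrates that reading; the helper names `entropy` and `perplexity` and both example distributions are assumptions made for illustration, not part of the lab code.

```python
import math

def entropy(probs):
    """Expected information, in bits, of a discrete distribution."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def perplexity(probs):
    """Two raised to the entropy."""
    return 2 ** entropy(probs)

uniform = [1 / 8] * 8          # eight equally likely outcomes
skewed = [0.9, 0.05, 0.05]     # three outcomes, one dominant

pp_uniform = perplexity(uniform)
pp_skewed = perplexity(skewed)
print(pp_uniform, pp_skewed)
```

For the uniform case the perplexity equals the number of outcomes; for the skewed case it falls well below three, because one outcome is nearly certain.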

**Chiasmus**

The process of computing entropy follows a chiasmus pattern.

$A_1 \rightarrow B_1 \rightarrow B_2 \rightarrow A_2$  

<!--
$p := A_1, s := B_1, i := B_2, h := A_2$
-->

$p \rightarrow s \rightarrow i \rightarrow h$ 

$A: \{p,h\}$

$B: \{s,i\}$

# Setup

## Libraries

In [1]:
import pandas as pd

## Config

In [6]:
data_home = "../data"
ohco = ['book_id','chap_num','para_num','sent_num','token_num']

## Import data

In [68]:
K = pd.read_csv(f"{data_home}/output/austen-combo-TOKENS.csv").set_index(ohco)
V = pd.read_csv(f"{data_home}/output/austen-combo-VOCAB.csv").set_index('term_str')
LM = {}
for n in range(1, 4):
    widx = [f"w{i}" for i in range(n)]
    LM[n] = pd.read_csv(f"{data_home}/output/austen-combo-LM{n}.csv").set_index(widx)

# Compute Perplexity

In [52]:
K['i'] = K.term_str.map(V.i)
K['h'] = K.term_str.map(V.h)

In [53]:
2**((V.n * V.i).sum() / V.n.sum())

568.0180578365186

In [54]:
2**(K.i.sum() / K.shape[0])

568.0180578365186

In [55]:
2**K.i.mean()

568.0180578365186

In [56]:
K.i.mean()

9.149792984886869
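
The cells above return the same value because a count-weighted average over the vocabulary is the same sum as a plain average over the tokens. A minimal sketch of that identity, using a hypothetical toy token list in place of `K` and `V`:

```python
import math
from collections import Counter

# Hypothetical toy data standing in for the token table K and vocab table V
tokens = "a b a c a b d a".split()
counts = Counter(tokens)
N = len(tokens)

# Information per term, from unsmoothed relative frequencies
info = {term: math.log2(N / n) for term, n in counts.items()}

# Vocabulary-level: weight each term's information by its count
i_mean_vocab = sum(counts[t] * info[t] for t in counts) / N

# Token-level: look up each token's information and take the mean
i_mean_tokens = sum(info[t] for t in tokens) / len(tokens)

pp_vocab = 2 ** i_mean_vocab
pp_tokens = 2 ** i_mean_tokens
print(i_mean_vocab, i_mean_tokens, pp_vocab)
```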

In [69]:
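for n in range(1, 4):

    M = LM[n]
    # Unigrams use plain information 'i'; higher orders use conditional information 'ci'
    i_col = 'i' if n == 1 else 'ci'

    N = int(M.n.sum())                # total n-gram count
    i_sum = (M.n * M[i_col]).sum()    # count-weighted information
    i_mean = i_sum / N                # average bits per n-gram
    pp = 2**i_mean                    # perplexity

    print('model:', n, 'N:', N, 'i_sum:', i_sum, 'i_mean:', i_mean, 'pp:', pp)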

model: 1 N: 232972 i_sum: 2025835.1778587154 i_mean: 8.695616545587947 pp: 414.6115641028022
model: 2 N: 232971 i_sum: 828137.8382266178 i_mean: 3.5546820772826564 pp: 11.750759298549298
model: 3 N: 232970 i_sum: 238769.1546711369 i_mean: 1.0248922808564918 pp: 2.03480744918621


In [None]:
M

In [None]:
M.cp.sort_values().plot(style='.', rot=45, title='CP');

In [None]:
M.ci.sort_values().plot(style='.', rot=45, title='CI');

In [None]:
M.ch.sort_values().plot(style='.', rot=45, title='CH');

# Notes

## Cross Entropy and Perplexity

### Probabilities of Sequences

$ W = W_1^N = (w_1, w_2, \ldots, w_N)$

True distribution: $ p = p(W) $

Model distribution: $ q = q(W) $

### Cross Entropy

$ H(p, q) = - \sum_{x}^{} p(x) log_2(q(x)) $ 

$ H(p, q) = \sum_{x}^{} p(x) log_2(\frac{1}{q(x)}) $ 

$ i_q(x) = log_2(\frac{1}{q(x)}) $

$ H(p, q) = \sum_{x} p(x) i_q(x) $ 

$ H(p, q) = \vec{p} \cdot \vec{i_q} $
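
The dot-product form can be checked numerically. In this minimal sketch the distributions `p` and `q` and the helper `cross_entropy` are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math

# Hypothetical true and model distributions over the same three outcomes
p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 0.25, "b": 0.25, "c": 0.5}

def cross_entropy(p, q):
    """Sum over x of p(x) * log2(1 / q(x)), in bits."""
    return sum(p[x] * math.log2(1 / q[x]) for x in p if p[x] > 0)

H_pp = cross_entropy(p, p)   # entropy of p itself
H_pq = cross_entropy(p, q)   # cost of describing p with q
print(H_pp, H_pq)
```

The cross entropy is never smaller than the entropy of $p$; the gap is the penalty for using the wrong model.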

### Cross Entropy relative to MaxEnt

$ N = C(x) = \sum_x c(x) $

$ p_{u} = \frac{1}{N} $ 

$ H_{cross} = H(p_u, q) $

$ H_{cross} = \sum_{x} \frac{1}{N} i_q(x) $

$ H_{cross} = \frac{1}{N} \sum_{x} i_q(x) $

$ H_{cross} = \frac{\sum_x i_q(x)}{N} $

$ H_{cross} = \frac{ |\vec{i_q}|_1 }{ N } $



#### Perplexity

$ PP(W) = P(w_1, w_2, \ldots, w_N)^{-1/N} $

$ PP(p) = 2^{H(p)}$

$ PP(p_u, q) = 2^{H_{cross}}$
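
The inverse-probability form and the exponentiated cross-entropy form give the same number. A minimal sketch, assuming a hypothetical unigram model `q` in which tokens are independent, so that the sequence probability is a simple product:

```python
import math

# Hypothetical unigram model and a short test sequence
q = {"a": 0.5, "b": 0.25, "c": 0.25}
W = list("aabac")
N = len(W)

# Form 1: inverse sequence probability, normalized by length
P_W = math.prod(q[w] for w in W)
pp_direct = P_W ** (-1 / N)

# Form 2: two raised to the mean information of the tokens under q
H_cross = sum(math.log2(1 / q[w]) for w in W) / N
pp_entropy = 2 ** H_cross

print(pp_direct, pp_entropy)
```

The second form is usually preferred in practice because summing logs avoids the underflow that a long product of probabilities would cause.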

#### Redundancy

$ H_{max} = log_2(N) $

$ H_{max} = i(p_u) $

$ R = 1 - \frac{H}{H_{max}} $
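
Redundancy measures how far a distribution falls short of the maximum entropy it could have. The sketch below assumes $N$ in $H_{max}$ counts the number of distinct outcomes; the distribution `probs` is hypothetical.

```python
import math

# Hypothetical distribution over four outcomes
probs = [0.5, 0.25, 0.125, 0.125]

H = sum(p * math.log2(1 / p) for p in probs)   # observed entropy
H_max = math.log2(len(probs))                  # entropy if all outcomes were equally likely
R = 1 - H / H_max                              # redundancy

print(H, H_max, R)
```

A uniform distribution has zero redundancy; the more predictable the distribution, the closer $R$ gets to one.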

## From J & M
<img src="perplexity.png">

## From Stack Exchange (Cross Validated)
https://stats.stackexchange.com/questions/129352/how-to-find-the-perplexity-of-a-corpus
<img src="stackover1.png">
<img src="stackover2.png">

## Perplexity

$ PP(w) = 2^{-\frac{\ell(w)}{M}} $

$ -\ell(w) = i_2(w) $

$ PP(w) = 2^{\frac{i_2(w)}{M}} $

## Perplexity

$ PP(W) = 2^{-\frac{\ell(W)}{N}} $

$ i_2(W) = -\ell(W) $

$ PP(W) = 2^{\frac{i_2(W)}{N}} $