In [11]:
import numpy as np
from scipy.stats import entropy
from IPython.display import display, Math

We want to look at the 'peakiness' of the distribution of PMI values for a given word.  Given a PMI matrix in which each row holds the PMI of a given word with all the other words in the sentence, define peakiness as a function of a row of PMI values:
$$\text{peakiness}(\text{row}) = 1- \frac{S(\text{row}) }{ \log_2(\text{sentence length}) }$$
where $$S(\text{row}) = -\sum_{i \in \hat{\text{row}}} i \log_2(i)$$ is the entropy of $\hat{\text{row}}$, the row normalized to sum to 1 and treated as a probability vector.
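
As a worked example (assuming the usual convention $0 \log_2 0 = 0$), take the row $(0,1,1,1)$ of length 4. Its normalization is $(0, \tfrac13, \tfrac13, \tfrac13)$, so
$$S = -3 \cdot \tfrac{1}{3}\log_2\tfrac{1}{3} = \log_2 3 \approx 1.585, \qquad \text{peakiness} = 1 - \frac{\log_2 3}{\log_2 4} \approx 0.208.$$
A one-hot row such as $(1,0,0,0)$ has $S = 0$ and peakiness 1, while a uniform row such as $(1,1,1,1)$ has $S = \log_2 4$ and peakiness 0.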

In [184]:
# examples
def peakiness(vec):
    return 1 - entropy(vec, base=2)/np.log2(len(vec))
examples = ([1,0,0,0],[0,1,1,1],[1,1,1,1])
display(Math(r"\text{peakiness}: V \to \mathbb{R}[0,1]"))
for row in examples:
    display(Math(rf"{row} \mapsto {peakiness(row)}"))


One issue with treating a sequence of PMI scores as a probability vector, however, is that PMI may be negative.  An intuitive way to handle the negative values still eludes me.  Shifting the values by the minimum value does not preserve the intuitive peakiness (for instance, $[0,0,0,1,-1] \to [1,1,1,2,0]$ becomes much 'flatter' than it should be).  Taking the absolute value also doesn't seem right, since the distinction between positive and negative is meaningful (in taking $[0,0,0,1,-1] \to [0,0,0,1,1]$, we lose the information that the penultimate position has the maximum PMI).
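
To make the two failure modes concrete, here is a minimal sketch that scores the shifted and absolute-value versions of $[0,0,0,1,-1]$. It re-implements the entropy with NumPy only, so the block is self-contained. That re-implementation (and the names `shifted` and `absolute`) is an assumption for illustration; it is meant to match the normalize-then-entropy behaviour described above.

```python
import numpy as np

def peakiness(vec):
    """1 - (base-2 entropy of vec rescaled to sum to 1) / log2(len(vec))."""
    p = np.asarray(vec, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]  # convention: 0 * log2(0) = 0
    return 1 - (-(nz * np.log2(nz)).sum()) / np.log2(len(p))

row = np.array([0, 0, 0, 1, -1], dtype=float)
shifted = row - row.min()   # [1, 1, 1, 2, 0]
absolute = np.abs(row)      # [0, 0, 0, 1, 1]

print("shifted: ", peakiness(shifted))    # roughly 0.17, close to flat
print("absolute:", peakiness(absolute))   # roughly 0.57
# The original row has a unique maximum; the absolute-value row has a tie.
print(np.argmax(row), np.flatnonzero(absolute == absolute.max()))
```

The shifted row scores low because the three zero entries each receive as much mass as half the peak, and the absolute-value row can no longer tell positions 3 and 4 apart.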

Perhaps we should use some notion of total variation to measure peakiness?
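
One possible reading of "total variation" is the sum of absolute differences between adjacent entries of the row. The sketch below is only that reading, not a settled definition; `total_variation` is a hypothetical helper name. Its appeal here is that it is unchanged by adding a constant, so negative PMI values need no preprocessing.

```python
import numpy as np

def total_variation(vec):
    """Sum of |x[i+1] - x[i]| over adjacent entries (one reading of 'total variation')."""
    return np.abs(np.diff(np.asarray(vec, dtype=float))).sum()

row = [0, 0, 0, 1, -1]
shifted = [1, 1, 1, 2, 0]            # same row shifted by its minimum
flat = [0.5, 0.5, 0.5, 0.5, 0.5]

print(total_variation(row))      # 3.0
print(total_variation(shifted))  # 3.0, identical: the measure is shift-invariant
print(total_variation(flat))     # 0.0
```

Unlike the entropy-based score, this quantity is unbounded, scales with the magnitude of the PMI values, and depends on the order of the entries, so it would still need some normalization before rows from different sentences could be compared.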

In [155]:
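# example: peakiness of each row of the symmetrized PMI matrix, per sentence
RESULTS_DIR = "results/distilbert-base-cased(5)_pad10_2020-06-30-13-43/"
npz = np.load(RESULTS_DIR + 'pmi_matrices.npz')

for sentence, matrix in npz.items():
    print(sentence)
    matrix = matrix + np.transpose(matrix) # symmetrize
    for i, row in enumerate(matrix):
        # shift to remove negative values
        # (the minimum is taken before the diagonal entry is dropped)
        row -= min(row)
        row = row[np.arange(len(row))!=i] # remove diagonal
        # a two-token sentence leaves a single entry, and log2(1) = 0 gives nan
        print(peakiness(row))
    print()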

We 're about to see if advertising works .
0.5184355844078246
0.2486014563695923
0.15419496887546558
0.3623848885319584
0.29113989940705765
0.2479175793387406
0.18143920109804623
0.2830238145592764
0.21864999485332337

Odds and Ends
0.02699802995067968
0.016740503786128125
0.0825769775345

Not his autograph ; power-hitter McGwire 's .
0.11545121521735024
0.13459092655179405
0.15198408157112553
0.24378931866111697
0.13362304303492478
0.2600062100941012
0.15749299334349165
0.10205521779744886

FRANKFURT :
nan
nan

Other brokerage firms , including Merrill Lynch & Co. , were plotting out potential new ad strategies .
0.1143776777044534
0.05777756483148322
0.1082898765501169
0.26014044565979433
0.11820719121620504
0.16943801238939205
0.07416295604587408
0.10607769499187591
0.10876216521325721
0.09359218548666293
0.19446574654765514
0.3242239425304474
0.4385697938472425
0.23936938034928645
0.07590997982803793
0.031906424758163854
0.042134580538705
0.059202807821267966
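
The `nan` values for "FRANKFURT :" come from the normalization: with two tokens, each row has a single entry once the diagonal is removed, so the denominator is $\log_2(1) = 0$. A minimal sketch of a guarded variant follows. The name `peakiness_safe` and the choice to return `nan` explicitly (rather than, say, 0) are assumptions for illustration, and the entropy is written out in NumPy to keep the block self-contained.

```python
import numpy as np

def peakiness_safe(vec):
    """Peakiness as defined above, but explicitly undefined (nan) for
    rows with fewer than two entries or with no positive mass."""
    p = np.asarray(vec, dtype=float)
    if len(p) < 2 or p.sum() <= 0:
        return float("nan")  # log2(1) = 0 or nothing to normalize
    p = p / p.sum()
    nz = p[p > 0]  # convention: 0 * log2(0) = 0
    return 1 - (-(nz * np.log2(nz)).sum()) / np.log2(len(p))

print(peakiness_safe([0.7]))         # nan: single-entry row
print(peakiness_safe([0, 0, 0]))     # nan: all mass removed by the shift
print(peakiness_safe([1, 0, 0, 0]))  # 1.0
```

Making the degenerate cases explicit avoids the divide-by-zero warnings and makes it easier to filter such rows out before averaging over sentences.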

