In this simple notebook we use the package `lm-scorer` to calculate the log probability of a sentence based on a GPT2 language model.

The initial idea was to compute the probabilities manually from GPT2's output scores, following this thread:

https://discuss.huggingface.co/t/generation-probabilities-how-to-compute-probabilities-of-output-scores-for-gpt2/3175

But it seemed complicated and unintuitive (and I was lazy), so `lm-scorer` it is.
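Roughly, the manual approach comes down to applying a softmax to the model's output scores (logits) at each position, reading off the probability of the token that actually comes next, and summing the logs to get the sequence log-probability. Below is a minimal pure-Python sketch of that arithmetic. The toy logits and token ids are hypothetical stand-ins for real GPT2 outputs; no model is loaded here.

```python
import math

def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sequence_log_prob(step_logits, target_ids):
    # step_logits[i]: next-token logits after seeing the tokens before position i
    # target_ids[i]: the token actually observed at position i
    total = 0.0
    for logits, tok in zip(step_logits, target_ids):
        total += log_softmax(logits)[tok]
    return total

# Hypothetical toy example: a 3-token vocabulary and 2 positions
step_logits = [[2.0, 1.0, 0.0], [0.5, 0.5, 3.0]]
target_ids = [0, 2]
print(sequence_log_prob(step_logits, target_ids))
```

With a real model, `step_logits` would come from a forward pass (one vector per position); this bookkeeping is essentially what `lm-scorer` hides from us.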

The goal now is to fine-tune a GPT2 model on poetry and use its scores to judge whether a text "sounds" like poetry or not (the fluency feature in Erato).

In [3]:
#!git clone https://github.com/simonepri/lm-scorer.git

Cloning into 'lm-scorer'...
remote: Enumerating objects: 396, done.
remote: Counting objects: 100% (114/114), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 396 (delta 106), reused 101 (delta 101), pack-reused 282
Receiving objects: 100% (396/396), 4.68 MiB | 3.71 MiB/s, done.
Resolving deltas: 100% (214/214), done.


In [7]:
#!mv lm-scorer/lm_scorer/ ./

In [8]:
from lm_scorer.models.auto import AutoLMScorer as LMScorer

In [10]:
import torch
from lm_scorer.models.auto import AutoLMScorer as LMScorer

# Available models
list(LMScorer.supported_model_names())
# => ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl", "distilgpt2"]

# Load model to cpu or cuda
device = "cuda:0" if torch.cuda.is_available() else "cpu"
batch_size = 1
scorer = LMScorer.from_pretrained("gpt2", device=device, batch_size=batch_size)

# Return token probabilities (provide log=True to return log probabilities)
scorer.tokens_score("I like this package.")
# => (scores, ids, tokens)
# scores = [0.018321, 0.0066431, 0.080633, 0.00060745, 0.27772, 0.0036381]
# ids    = [40,       588,       428,      5301,       13,      50256]
# tokens = ["I",      "Ġlike",   "Ġthis",  "Ġpackage", ".",     "<|endoftext|>"]

([0.018321018666028976,
  0.006643158383667469,
  0.08063224703073502,
  0.0006074536358937621,
  0.27771326899528503,
  0.003638095920905471],
 [40, 588, 428, 5301, 13, 50256],
 ['I', 'Ġlike', 'Ġthis', 'Ġpackage', '.', '<|endoftext|>'])
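Each score is the probability of a token given the tokens to its left. Note that `<|endoftext|>` is scored too, so the sentence score also accounts for the sentence ending there. By the chain rule, the log-probability of the whole sequence is the sum of the token log-probabilities. A quick check in plain Python, using the six probabilities printed above and the natural log (consistent with the `log=True` output further down):

```python
import math

# Token probabilities copied from the tokens_score output above
probs = [0.018321018666028976, 0.006643158383667469, 0.08063224703073502,
         0.0006074536358937621, 0.27771326899528503, 0.003638095920905471]

# Natural-log probability of each token given its left context
log_probs = [math.log(p) for p in probs]

# Chain rule: log P(sentence) = sum over tokens of log p(token | previous tokens)
sentence_log_prob = sum(log_probs)
print(sentence_log_prob)
```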

In [11]:
# Compute sentence score as the product of tokens' probabilities
scorer.sentence_score("I like this package.", reduce="prod")
# => 6.0231e-12

6.023056185744391e-12
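The `reduce="prod"` score is just the product of the six token probabilities. Computing it directly and via `exp(sum(log p))` gives the same number, which is the identity that lets the library work in log space:

```python
import math

probs = [0.018321018666028976, 0.006643158383667469, 0.08063224703073502,
         0.0006074536358937621, 0.27771326899528503, 0.003638095920905471]

# Direct product of the token probabilities (math.prod needs Python 3.8+)
naive = math.prod(probs)

# Same quantity computed in log space: exp of the summed log-probabilities
via_logs = math.exp(sum(math.log(p) for p in probs))

print(naive, via_logs)
```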

In [12]:
# Compute sentence score as the mean of tokens' probabilities
scorer.sentence_score("I like this package.", reduce="mean")
# => 0.064593


0.06459253281354904
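The arithmetic mean is simply the average of the token probabilities. Reproducing it also shows why this reduction is sensitive to a single easy token: here the final "." alone accounts for roughly 70% of the sum.

```python
probs = [0.018321018666028976, 0.006643158383667469, 0.08063224703073502,
         0.0006074536358937621, 0.27771326899528503, 0.003638095920905471]

# Plain average of the token probabilities
mean = sum(probs) / len(probs)

# Fraction of the total contributed by the "." token (index 4)
share_of_period = probs[4] / sum(probs)

print(mean, share_of_period)
```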

In [13]:
# Compute sentence score as the geometric mean of tokens' probabilities
scorer.sentence_score("I like this package.", reduce="gmean")
# => 0.013489



0.013488681055605412
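The geometric mean is the exponential of the average token log-probability, so it is a length-normalized score. Its reciprocal is the usual per-token perplexity, a common way to report how "surprised" a language model is by a text:

```python
import math

probs = [0.018321018666028976, 0.006643158383667469, 0.08063224703073502,
         0.0006074536358937621, 0.27771326899528503, 0.003638095920905471]

# Average log-probability per token
mean_log_prob = sum(math.log(p) for p in probs) / len(probs)

# Geometric mean of the probabilities
gmean = math.exp(mean_log_prob)

# Per-token perplexity, equivalently math.exp(-mean_log_prob)
perplexity = 1.0 / gmean

print(gmean, perplexity)
```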

In [14]:
# Compute sentence score as the harmonic mean of tokens' probabilities
scorer.sentence_score("I like this package.", reduce="hmean")
# => 0.0028008



0.002800857648253441
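The harmonic mean goes the other way: it is dominated by the least likely token, here "Ġpackage", which contributes about three quarters of the sum of reciprocals. A sketch that reproduces it and checks the usual ordering harmonic ≤ geometric ≤ arithmetic mean:

```python
import math

probs = [0.018321018666028976, 0.006643158383667469, 0.08063224703073502,
         0.0006074536358937621, 0.27771326899528503, 0.003638095920905471]

# Harmonic mean: n divided by the sum of reciprocals
inverse_sum = sum(1.0 / p for p in probs)
hmean = len(probs) / inverse_sum

# Share of the reciprocal sum coming from "Ġpackage" (index 3)
share_of_package = (1.0 / probs[3]) / inverse_sum

# The other two means, for comparison
gmean = math.exp(sum(math.log(p) for p in probs) / len(probs))
mean = sum(probs) / len(probs)

print(hmean, share_of_package)
print(hmean <= gmean <= mean)
```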

In [15]:
# Get the log of the sentence score.
scorer.sentence_score("I like this package.", log=True)
# => -25.835



-25.835426330566406
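The value returned with `log=True` matches the natural log of the `reduce="prod"` result, so the default reduction here is the product, as the numbers confirm. Dividing it by the number of scored tokens (six, counting `<|endoftext|>`) gives the log of the geometric mean, the length-normalized variant that is fairer when comparing texts of different length. A consistency check on the printed outputs:

```python
import math

prod_score = 6.023056185744391e-12   # reduce="prod" output above
log_score = -25.835426330566406      # log=True output above
gmean_score = 0.013488681055605412   # reduce="gmean" output above
n_tokens = 6                         # including <|endoftext|>

# log=True with the default product reduction: natural log of the product
print(math.log(prod_score), log_score)

# Per-token average of the log score: log of the geometric mean
print(log_score / n_tokens, math.log(gmean_score))
```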

In [16]:
# Score multiple sentences.
scorer.sentence_score(["Sentence 1", "Sentence 2"])
# => [1.1508e-11, 5.6645e-12]

# NB: Computations are done in log space so they should be numerically stable.

[1.1507757420592402e-11, 5.66442205640616e-12]
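The NB comment above matters for longer texts such as whole poems: multiplying many probabilities below 1 quickly underflows a float, while summing their logs stays well within range. A small illustration with a hypothetical 400-token text in which every token has probability 0.001:

```python
import math

token_probs = [1e-3] * 400  # hypothetical: 400 tokens, each with p = 0.001

# Direct product: 1e-1200 is far below the smallest positive float (~5e-324)
naive = 1.0
for p in token_probs:
    naive *= p

# Log space: just a sum of logs, no underflow
log_score = sum(math.log(p) for p in token_probs)

print(naive)      # 0.0
print(log_score)  # about -2763.1
```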

In [17]:
# Compare a sentence with a scrambled version of the same words.
scorer.sentence_score(["This is bullshit", "This bullshit is"])
# => [3.6964e-11, 8.7977e-15]

[3.696429967670056e-11, 8.797713635887421e-15]
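Both sentences contain the same three words, so their token counts are presumably equal and the raw product scores can be compared directly (a length-normalized score would be safer in general). In log space the gap is about 8.3 nats, i.e. the model finds the grammatical order roughly 4,200 times more likely. This is the kind of contrast a fluency feature would exploit:

```python
import math

scores = [3.696429967670056e-11, 8.797713635887421e-15]  # outputs above

# Difference of log-probabilities (nats) and the corresponding likelihood ratio
log_ratio = math.log(scores[0]) - math.log(scores[1])
ratio = math.exp(log_ratio)

print(round(log_ratio, 2), round(ratio))
```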