# Estimating surprisal from language models
Take sentences from CommitmentBank, MegaAttitudes, and the experimental stimuli, mask the attitude predicate, and get the model's predicted probability for the target verb in the masked position. Then compute the verb's surprisal from that probability.
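Surprisal is the negative log of the probability the model assigns to the verb in its context. A minimal sketch of that conversion, assuming base-2 logs (bits); the helper name `surprisal` and the choice of base are assumptions, not something this notebook fixes:

```python
import math

def surprisal(prob, base=2.0):
    """Return -log_base(prob), the surprisal of an outcome with probability prob."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log(prob) / math.log(base)

# A probability of 0.5 corresponds to one bit of surprisal.
print(surprisal(0.5))
```

Lower probabilities give higher surprisal, so a score returned by the fill-mask pipeline can be passed straight to this function.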

- Info on fill-mask pipeline: https://huggingface.co/transformers/main_classes/pipelines.html#transformers.FillMaskPipeline
- Info on particular models: https://huggingface.co/models


- Once the issues above are worked out, the rest of this should be fairly straightforward.

- For discussion about which model is best, check out the following Twitter thread: https://twitter.com/bruno_nicenboim/status/1379168059311656963

- We should probably use GPT-3 rather than BERT.

Helpful articles:
- https://towardsdatascience.com/how-to-use-bert-from-the-hugging-face-transformer-library-d373a22b0209

In [631]:
from transformers import pipeline
import pandas as pd
import numpy as np
import re
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

In [632]:
# This makes the display show more info
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

In [None]:
d = pd.read_csv("../data/data_for_lm.csv")

# Contents
1. [BERT](#BERT)
    1. [Mask-fill with complement](#Mask-fill-with-complement)
    2. [Mask-fill without complement](#Mask-fill-without-complement)
2. [GPT-2](#GPT-2)

# BERT


#### How to get the probability of a multi-token word that isn't in the model vocabulary
- This issue suggests that a space may be needed before the target verb: https://github.com/huggingface/transformers/issues/547
- This post covers how to get the probability of a multi-token word in the mask position: https://stackoverflow.com/questions/59435020/get-probability-of-multi-token-word-in-mask-position
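One possible way to score a verb that the tokenizer splits into several wordpieces is to use one mask per wordpiece, fill the masks left to right, and multiply the probabilities. This is only an approximation, since a masked language model does not define a proper joint distribution over several masks. In the sketch below, `piece_prob` is a hypothetical callback that a real implementation would back with the model, and the wordpiece split shown is illustrative rather than the tokenizer's actual output:

```python
import math

def multi_token_log_prob(pieces, piece_prob):
    """Sum the log-probabilities of wordpieces filled in left to right.

    pieces     -- wordpieces of the target verb (illustrative split)
    piece_prob -- hypothetical callback piece_prob(filled, target, n_remaining)
                  returning P(target | context, already-filled pieces), where
                  n_remaining masks (including the target's) are still unfilled
    """
    log_prob = 0.0
    for i, target in enumerate(pieces):
        filled = list(pieces[:i])          # pieces already placed in the sentence
        n_remaining = len(pieces) - i      # masks still open at this step
        log_prob += math.log(piece_prob(filled, target, n_remaining))
    return log_prob

# Toy stand-in for the model: every wordpiece gets probability 0.5.
def toy_prob(filled, target, n_remaining):
    return 0.5

toy_log_prob = multi_token_log_prob(["sur", "##mise"], toy_prob)
print(math.exp(toy_log_prob))
```

Working in log space avoids underflow when the per-piece probabilities are very small.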

In [680]:
unmasker = pipeline('fill-mask', model='bert-large-uncased')

Some weights of the model checkpoint at bert-large-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [806]:
# using indexing and key to get the relevant output score
unmasker("Dana was [MASK] that Mars has no water.",targets=" surprise")[0]['score']

1.3996995221532416e-05

In [807]:
unmasker(f"Dana was {unmasker.tokenizer.mask_token} that Mars has no water.",targets=" surprised")[0]['score']

0.0471670962870121

In [808]:
# using indexing and key to get the relevant output score
unmasker("Dana was [MASK]",targets=" surprised")[0]['score']

1.1388302745274359e-08

In [718]:
# count the masks in each sentence so that cases with more than one mask can be removed
d["match"] = d["Masked"].apply(lambda s: re.findall(r"MASK", s))

In [728]:
d["len"] = d["match"].apply(len)

In [759]:
# keep rows with exactly one mask; .copy() makes an independent DataFrame,
# which avoids pandas' SettingWithCopyWarning when columns are added later
single = d.loc[d["len"] == 1].copy()

In [737]:
single["BERT_score_CC"] = ""


## Mask-fill with complement

In [745]:
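# score each verb at the mask position; a target missing from the vocabulary is
# replaced by a wordpiece (see warnings), so its score is not for the full verb
scores = [unmasker(x, targets=y)[0]['score'] for x, y in zip(single["Masked"], single["Verb"])]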

The specified target token `hypothesize` does not exist in the model vocabulary. Replacing with `h`.
The specified target token `surmise` does not exist in the model vocabulary. Replacing with `sur`.
The specified target token `surmise` does not exist in the model vocabulary. Replacing with `sur`.
The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `ascertain` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `depress` does not exist in the model vocabulary. Replacing with `de`.
The specified target token `disapprove` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `flatter` does not exist in the model vocabulary. Replacing with `flat`.
The specified target token `hearten` does not exist in the model vocabulary. Replacing with `heart`.
The specified target token `ruminate` does not exist in the model vocabulary. Replacing with `rum`.

The specified target token `diagnose` does not exist in the model vocabulary. Replacing with `dia`.
The specified target token `stun` does not exist in the model vocabulary. Replacing with `stu`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `fantasize` does not exist in the model vocabulary. Replacing with `fan`.
The specified target token `reaffirm` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `feign` does not exist in the model vocabulary. Replacing with `fei`.
The specified target token `jest` does not exist in the model vocabulary. Replacing with `je`.
The specified target token `fret` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `certify` does not exist in the model vocabulary. Replacing with `ce`.
The specified target token `stutter` does not exist in the model vocabulary. Replacing with `stu`.
The specified targ

The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `unsettle` does not exist in the model vocabulary. Replacing with `un`.
The specified target token `agitate` does not exist in the model vocabulary. Replacing with `ag`.
The specified target token `scoff` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `invigorate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `presume` does not exist in the model vocabulary. Replacing with `pre`.
The specified target token `irritate` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `articulate` does not exist in the model vocabulary. Replacing with `art`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `overhear` does not exist in the model vocabulary. Replacing with `over`.
The

The specified target token `pinpoint` does not exist in the model vocabulary. Replacing with `pin`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `excite` does not exist in the model vocabulary. Replacing with `ex`.
The specified target token `perplex` does not exist in the model vocabulary. Replacing with `per`.
The specified target token `hoot` does not exist in the model vocabulary. Replacing with `ho`.
The specified target token `flatter` does not exist in the model vocabulary. Replacing with `flat`.
The specified target token `despise` does not exist in the model vocabulary. Replacing with `des`.
The specified target token `relish` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `repress` does not exist in the model vocabulary. Replacing with `rep`.
The specified target token `congratulate` does not exist in the model vocabulary. Replacing with `cong`.
The specified

The specified target token `bicker` does not exist in the model vocabulary. Replacing with `bi`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `bellow` does not exist in the model vocabulary. Replacing with `bell`.
The specified target token `astound` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `reiterate` does not exist in the model vocabulary. Replacing with `rei`.
The specified target token `tweet` does not exist in the model vocabulary. Replacing with `t`.
The specified target token `fabricate` does not exist in the model vocabulary. Replacing with `fabric`.
The specified target token `conceive` does not exist in the model vocabulary. Replacing with `con`.
The specified target token `chastise` does not exist in the model vocabulary. Replacing with `cha`.
The specified target token `posit` does not exist in the model vocabulary. Replacing with `po`.
The specified t

The specified target token `recap` does not exist in the model vocabulary. Replacing with `rec`.
The specified target token `scribble` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `corroborate` does not exist in the model vocabulary. Replacing with `co`.
The specified target token `prejudge` does not exist in the model vocabulary. Replacing with `pre`.
The specified target token `irritate` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `attest` does not exist in the model vocabulary. Replacing with `at`.
The specified target token `delude` does not exist in the model vocabulary. Replacing with `del`.
The specified target token `frustrate` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `omit` does not exist in the model vocabulary. Replacing with `om`.
The specified target token `deduce` does not exist in the model vocabulary. Replacing with `de`.
The specified ta

The specified target token `conceive` does not exist in the model vocabulary. Replacing with `con`.
The specified target token `miff` does not exist in the model vocabulary. Replacing with `mi`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `disapprove` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `insinuate` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `ascertain` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `nonplus` does not exist in the model vocabulary. Replacing with `non`.
The specified target token `agonize` does not exist in the model vocabulary. Replacing with `ago`.
The specified target token `overhear` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `sadden` does not exist in the model vocabulary. Replacing with `sad`.
The specif

The specified target token `rationalize` does not exist in the model vocabulary. Replacing with `rational`.
The specified target token `rant` does not exist in the model vocabulary. Replacing with `ran`.
The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `baffle` does not exist in the model vocabulary. Replacing with `ba`.
The specified target token `deplore` does not exist in the model vocabulary. Replacing with `de`.
The specified target token `enthrall` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `publicize` does not exist in the model vocabulary. Replacing with `public`.
The specified target token `underscore` does not exist in the model vocabulary. Replacing with `un

The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `infuriate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `signify` does not exist in the model vocabulary. Replacing with `sign`.
The specified target token `squeal` does not exist in the model vocabulary. Replacing with `sq`.
The specified target token `cringe` does not exist in the model vocabulary. Replacing with `cr`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `envision` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `disgruntle` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `startle` does not exist in the model vocabulary. Replacing with `start`.
The specified target token `gladden` does not exist in the model vocabulary. Replacing with `glad`.
The spe

The specified target token `annoy` does not exist in the model vocabulary. Replacing with `ann`.
The specified target token `astonish` does not exist in the model vocabulary. Replacing with `aston`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `allege` does not exist in the model vocabulary. Replacing with `all`.
The specified target token `irk` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `whine` does not exist in the model vocabulary. Replacing with `w`.
The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `obsess` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `allege` does not exist in the model vocabulary. Replacing with `all`.
The specified target t

The specified target token `baffle` does not exist in the model vocabulary. Replacing with `ba`.
The specified target token `yearn` does not exist in the model vocabulary. Replacing with `year`.
The specified target token `mistrust` does not exist in the model vocabulary. Replacing with `mist`.
The specified target token `compel` does not exist in the model vocabulary. Replacing with `com`.
The specified target token `ordain` does not exist in the model vocabulary. Replacing with `or`.
The specified target token `screech` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `foresee` does not exist in the model vocabulary. Replacing with `fore`.
The specified target token `disappoint` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `taunt` does not exist in the model vocabulary. Replacing with `tau`.
The specified target token `mumble` does not exist in the model vocabulary. Replacing with `mum`.
The specified t

The specified target token `yearn` does not exist in the model vocabulary. Replacing with `year`.
The specified target token `excite` does not exist in the model vocabulary. Replacing with `ex`.
The specified target token `mistrust` does not exist in the model vocabulary. Replacing with `mist`.
The specified target token `ridicule` does not exist in the model vocabulary. Replacing with `rid`.
The specified target token `gloat` does not exist in the model vocabulary. Replacing with `g`.
The specified target token `enthrall` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `foresee` does not exist in the model vocabulary. Replacing with `fore`.
The specified target token `energize` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `confide` does not exist in the model vocabulary. Replacing with `con`.
The specified tar

The specified target token `unsettle` does not exist in the model vocabulary. Replacing with `un`.
The specified target token `fret` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `tweet` does not exist in the model vocabulary. Replacing with `t`.
The specified target token `entice` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `terrorize` does not exist in the model vocabulary. Replacing with `terror`.
The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `hearten` does not exist in the model vocabulary. Replacing with `heart`.
The specified target token `tempt` does not exist in the model vocabulary. Replacing with `te`.
The specified target token `infuriate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `ridicule` does not exist in the model vocabulary. Replacing with `rid`.
The specified

The specified target token `misjudge` does not exist in the model vocabulary. Replacing with `mis`.
The specified target token `irk` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `inquire` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `deem` does not exist in the model vocabulary. Replacing with `dee`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `stun` does not exist in the model vocabulary. Replacing with `stu`.
The specified target token `infer` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `reaffirm` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `wager` does not exist in the model vocabulary. Replacing with `wage`.
The specified target token `o

The specified target token `invigorate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `chastise` does not exist in the model vocabulary. Replacing with `cha`.
The specified target token `bicker` does not exist in the model vocabulary. Replacing with `bi`.
The specified target token `hustle` does not exist in the model vocabulary. Replacing with `hu`.
The specified target token `deem` does not exist in the model vocabulary. Replacing with `dee`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `pester` does not exist in the model vocabulary. Replacing with `pest`.
The specified target token `insure` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `ordain` does not exist in the model vocabulary. Replacing with `or`.
The specified 

The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `coerce` does not exist in the model vocabulary. Replacing with `coe`.
The specified target token `diagnose` does not exist in the model vocabulary. Replacing with `dia`.
The specified target token `hanker` does not exist in the model vocabulary. Replacing with `hank`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `categorize` does not exist in the model vocabulary. Replacing with `cat`.
The specified target token `tickle` does not exist in the model vocabulary. Replacing with `tick`.
The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `presume` does not exist in the model vocabulary. Replacing with `pre`.
T

The specified target token `annoy` does not exist in the model vocabulary. Replacing with `ann`.
The specified target token `rant` does not exist in the model vocabulary. Replacing with `ran`.
The specified target token `tantalize` does not exist in the model vocabulary. Replacing with `tan`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `appease` does not exist in the model vocabulary. Replacing with `app`.
The specified target token `insinuate` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `omit` does not exist in the model vocabulary. Replacing with `om`.
The specified target token `deceive` does not exist in the model vocabulary. Replacing with `dec`.
The specified target token `compel` does not exist in the model vocabulary. Replacing with `com`.
The specified target token `amuse` does not exist in the model vocabulary. Replacing with `am`.
The specified targe

The specified target token `mystify` does not exist in the model vocabulary. Replacing with `my`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.
The specified target token `signify` does not exist in the model vocabulary. Replacing with `sign`.
The specified target token `notify` does not exist in the model vocabulary. Replacing with `not`.
The specified target token `dissatisfy` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `mortify` does not exist in the model vocabulary. Replacing with `mort`.
The specified target token `terrify` does not exist in the model vocabulary. Replacing with `terri`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.
The specified target token `certify` does not exist in the model vocabulary. Replacing with `ce`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.
The specif

The specified target token `reason_out` does not exist in the model vocabulary. Replacing with `reason`.
The specified target token `flip_out` does not exist in the model vocabulary. Replacing with `flip`.
The specified target token `piece_together` does not exist in the model vocabulary. Replacing with `piece`.
The specified target token `come_out` does not exist in the model vocabulary. Replacing with `come`.
The specified target token `freak_out` does not exist in the model vocabulary. Replacing with `freak`.
The specified target token `point_out` does not exist in the model vocabulary. Replacing with `point`.
The specified target token `flip_out` does not exist in the model vocabulary. Replacing with `flip`.
The specified target token `figure_out` does not exist in the model vocabulary. Replacing with `figure`.
The specified target token `freak_out` does not exist in the model vocabulary. Replacing with `freak`.
The specified target token `piece_together` does not exist in the mode
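
The warnings above mean that, for each listed verb, the returned score belongs to the replacement wordpiece rather than to the verb itself. A minimal sketch for separating such verbs before scoring, assuming the vocabulary is available as a collection of token strings (with Hugging Face tokenizers, `unmasker.tokenizer.get_vocab()` returns a token-to-id dict). `split_by_vocab` is a hypothetical helper, and the lowercasing is an assumption that fits an uncased model:

```python
def split_by_vocab(verbs, vocab):
    """Partition verbs into single-token vocabulary entries and everything else."""
    in_vocab, out_of_vocab = [], []
    for verb in verbs:
        # assumption: uncased model, so compare stripped, lowercased forms
        key = verb.strip().lower()
        if key in vocab:
            in_vocab.append(verb)
        else:
            out_of_vocab.append(verb)
    return in_vocab, out_of_vocab

# Toy vocabulary standing in for the model's; not the real BERT vocabulary.
toy_vocab = {"know", "think", "say"}
known, unknown = split_by_vocab(["know", "surmise", "figure_out"], toy_vocab)
print(known, unknown)
```

Scores for the out-of-vocabulary group could then be set aside or computed with a method that handles multiple wordpieces.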

In [748]:
len(scores)

6252

In [770]:
single["BertScoreCC"] = scores

In [768]:
single = single.drop(columns = ["BERT_score"])

In [763]:
single = single.drop(columns = ["match","len"]).rename(columns={"bert2":"BertScore"})

## Mask-fill without complement

In [None]:
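# Reference pattern only (df and its 'text' column are not defined in this notebook):
# keep the text before a delimiter
# df['text_new1'] = [x.split('::')[0] for x in df['text']]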

In [795]:
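# remove the complement clause (CC): keep the text up to and including the mask
# (the text before the mask already ends in a space, so none is added)
single["Masked_noCC"] = [x.split('[MASK]')[0] + "[MASK]" for x in single['Masked']]
# regex alternative:
# single["Masked_noCC"] = single["Masked"].str.replace(r'(?<=\[MASK\]).*$', '', regex=True)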
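
The truncation step can also be written as a small function, which makes edge cases easier to check. A sketch, where `truncate_after_mask` is a hypothetical helper name and a sentence without a mask is returned unchanged (an assumption; the notebook filters to exactly one mask beforehand):

```python
def truncate_after_mask(sentence, mask="[MASK]"):
    """Drop everything after the first mask token, keeping the mask itself."""
    head, sep, _tail = sentence.partition(mask)
    if not sep:
        # no mask found: leave the sentence as it is
        return sentence
    return head + mask

print(truncate_after_mask("Dana was [MASK] that Mars has no water."))
# prints: Dana was [MASK]
```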

In [797]:
single["BertScoreNoCC"] = ""

In [798]:
# same scoring as above, but on the sentences truncated at the mask
scores_noCC = [unmasker(x, targets=y)[0]['score'] for x, y in zip(single["Masked_noCC"], single["Verb"])]

The specified target token `hypothesize` does not exist in the model vocabulary. Replacing with `h`.
The specified target token `surmise` does not exist in the model vocabulary. Replacing with `sur`.
The specified target token `surmise` does not exist in the model vocabulary. Replacing with `sur`.
The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `ascertain` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `depress` does not exist in the model vocabulary. Replacing with `de`.
The specified target token `disapprove` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `flatter` does not exist in the model vocabulary. Replacing with `flat`.
The specified target token `hearten` does not exist in the model vocabulary. Replacing with `heart`.
The specified target token `ruminate` does not exist in the model vocabulary. Replacing with `rum`.

The specified target token `diagnose` does not exist in the model vocabulary. Replacing with `dia`.
The specified target token `stun` does not exist in the model vocabulary. Replacing with `stu`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `fantasize` does not exist in the model vocabulary. Replacing with `fan`.
The specified target token `reaffirm` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `feign` does not exist in the model vocabulary. Replacing with `fei`.
The specified target token `jest` does not exist in the model vocabulary. Replacing with `je`.
The specified target token `fret` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `certify` does not exist in the model vocabulary. Replacing with `ce`.
The specified target token `stutter` does not exist in the model vocabulary. Replacing with `stu`.
The specified targ

The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `unsettle` does not exist in the model vocabulary. Replacing with `un`.
The specified target token `agitate` does not exist in the model vocabulary. Replacing with `ag`.
The specified target token `scoff` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `invigorate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `presume` does not exist in the model vocabulary. Replacing with `pre`.
The specified target token `irritate` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `articulate` does not exist in the model vocabulary. Replacing with `art`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `overhear` does not exist in the model vocabulary. Replacing with `over`.
The

The specified target token `pinpoint` does not exist in the model vocabulary. Replacing with `pin`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `excite` does not exist in the model vocabulary. Replacing with `ex`.
The specified target token `perplex` does not exist in the model vocabulary. Replacing with `per`.
The specified target token `hoot` does not exist in the model vocabulary. Replacing with `ho`.
The specified target token `flatter` does not exist in the model vocabulary. Replacing with `flat`.
The specified target token `despise` does not exist in the model vocabulary. Replacing with `des`.
The specified target token `relish` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `repress` does not exist in the model vocabulary. Replacing with `rep`.
The specified target token `congratulate` does not exist in the model vocabulary. Replacing with `cong`.
The specified

The specified target token `bicker` does not exist in the model vocabulary. Replacing with `bi`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `bellow` does not exist in the model vocabulary. Replacing with `bell`.
The specified target token `astound` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `reiterate` does not exist in the model vocabulary. Replacing with `rei`.
The specified target token `tweet` does not exist in the model vocabulary. Replacing with `t`.
The specified target token `fabricate` does not exist in the model vocabulary. Replacing with `fabric`.
The specified target token `conceive` does not exist in the model vocabulary. Replacing with `con`.
The specified target token `chastise` does not exist in the model vocabulary. Replacing with `cha`.
The specified target token `posit` does not exist in the model vocabulary. Replacing with `po`.
The specified t

The specified target token `recap` does not exist in the model vocabulary. Replacing with `rec`.
The specified target token `scribble` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `corroborate` does not exist in the model vocabulary. Replacing with `co`.
The specified target token `prejudge` does not exist in the model vocabulary. Replacing with `pre`.
The specified target token `irritate` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `attest` does not exist in the model vocabulary. Replacing with `at`.
The specified target token `delude` does not exist in the model vocabulary. Replacing with `del`.
The specified target token `frustrate` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `omit` does not exist in the model vocabulary. Replacing with `om`.
The specified target token `deduce` does not exist in the model vocabulary. Replacing with `de`.
The specified ta

The specified target token `conceive` does not exist in the model vocabulary. Replacing with `con`.
The specified target token `miff` does not exist in the model vocabulary. Replacing with `mi`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `disapprove` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `insinuate` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `ascertain` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `nonplus` does not exist in the model vocabulary. Replacing with `non`.
The specified target token `agonize` does not exist in the model vocabulary. Replacing with `ago`.
The specified target token `overhear` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `sadden` does not exist in the model vocabulary. Replacing with `sad`.
The specif

The specified target token `rationalize` does not exist in the model vocabulary. Replacing with `rational`.
The specified target token `rant` does not exist in the model vocabulary. Replacing with `ran`.
The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `baffle` does not exist in the model vocabulary. Replacing with `ba`.
The specified target token `deplore` does not exist in the model vocabulary. Replacing with `de`.
The specified target token `enthrall` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `publicize` does not exist in the model vocabulary. Replacing with `public`.
The specified target token `underscore` does not exist in the model vocabulary. Replacing with `un

The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `infuriate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `signify` does not exist in the model vocabulary. Replacing with `sign`.
The specified target token `squeal` does not exist in the model vocabulary. Replacing with `sq`.
The specified target token `cringe` does not exist in the model vocabulary. Replacing with `cr`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `envision` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `disgruntle` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `startle` does not exist in the model vocabulary. Replacing with `start`.
The specified target token `gladden` does not exist in the model vocabulary. Replacing with `glad`.

The specified target token `annoy` does not exist in the model vocabulary. Replacing with `ann`.
The specified target token `astonish` does not exist in the model vocabulary. Replacing with `aston`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `allege` does not exist in the model vocabulary. Replacing with `all`.
The specified target token `irk` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `whine` does not exist in the model vocabulary. Replacing with `w`.
The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `obsess` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `allege` does not exist in the model vocabulary. Replacing with `all`.

The specified target token `baffle` does not exist in the model vocabulary. Replacing with `ba`.
The specified target token `yearn` does not exist in the model vocabulary. Replacing with `year`.
The specified target token `mistrust` does not exist in the model vocabulary. Replacing with `mist`.
The specified target token `compel` does not exist in the model vocabulary. Replacing with `com`.
The specified target token `ordain` does not exist in the model vocabulary. Replacing with `or`.
The specified target token `screech` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `foresee` does not exist in the model vocabulary. Replacing with `fore`.
The specified target token `disappoint` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `taunt` does not exist in the model vocabulary. Replacing with `tau`.
The specified target token `mumble` does not exist in the model vocabulary. Replacing with `mum`.

The specified target token `yearn` does not exist in the model vocabulary. Replacing with `year`.
The specified target token `excite` does not exist in the model vocabulary. Replacing with `ex`.
The specified target token `mistrust` does not exist in the model vocabulary. Replacing with `mist`.
The specified target token `ridicule` does not exist in the model vocabulary. Replacing with `rid`.
The specified target token `gloat` does not exist in the model vocabulary. Replacing with `g`.
The specified target token `enthrall` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `foresee` does not exist in the model vocabulary. Replacing with `fore`.
The specified target token `energize` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `confide` does not exist in the model vocabulary. Replacing with `con`.

The specified target token `unsettle` does not exist in the model vocabulary. Replacing with `un`.
The specified target token `fret` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `tweet` does not exist in the model vocabulary. Replacing with `t`.
The specified target token `entice` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `terrorize` does not exist in the model vocabulary. Replacing with `terror`.
The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `hearten` does not exist in the model vocabulary. Replacing with `heart`.
The specified target token `tempt` does not exist in the model vocabulary. Replacing with `te`.
The specified target token `infuriate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `ridicule` does not exist in the model vocabulary. Replacing with `rid`.

The specified target token `misjudge` does not exist in the model vocabulary. Replacing with `mis`.
The specified target token `irk` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `inquire` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `deem` does not exist in the model vocabulary. Replacing with `dee`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `stun` does not exist in the model vocabulary. Replacing with `stu`.
The specified target token `infer` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `reaffirm` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `wager` does not exist in the model vocabulary. Replacing with `wage`.

The specified target token `invigorate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `chastise` does not exist in the model vocabulary. Replacing with `cha`.
The specified target token `bicker` does not exist in the model vocabulary. Replacing with `bi`.
The specified target token `hustle` does not exist in the model vocabulary. Replacing with `hu`.
The specified target token `deem` does not exist in the model vocabulary. Replacing with `dee`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `pester` does not exist in the model vocabulary. Replacing with `pest`.
The specified target token `insure` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `ordain` does not exist in the model vocabulary. Replacing with `or`.

The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `coerce` does not exist in the model vocabulary. Replacing with `coe`.
The specified target token `diagnose` does not exist in the model vocabulary. Replacing with `dia`.
The specified target token `hanker` does not exist in the model vocabulary. Replacing with `hank`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `categorize` does not exist in the model vocabulary. Replacing with `cat`.
The specified target token `tickle` does not exist in the model vocabulary. Replacing with `tick`.
The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `presume` does not exist in the model vocabulary. Replacing with `pre`.

The specified target token `annoy` does not exist in the model vocabulary. Replacing with `ann`.
The specified target token `rant` does not exist in the model vocabulary. Replacing with `ran`.
The specified target token `tantalize` does not exist in the model vocabulary. Replacing with `tan`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `appease` does not exist in the model vocabulary. Replacing with `app`.
The specified target token `insinuate` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `omit` does not exist in the model vocabulary. Replacing with `om`.
The specified target token `deceive` does not exist in the model vocabulary. Replacing with `dec`.
The specified target token `compel` does not exist in the model vocabulary. Replacing with `com`.
The specified target token `amuse` does not exist in the model vocabulary. Replacing with `am`.

The specified target token `mystify` does not exist in the model vocabulary. Replacing with `my`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.
The specified target token `signify` does not exist in the model vocabulary. Replacing with `sign`.
The specified target token `notify` does not exist in the model vocabulary. Replacing with `not`.
The specified target token `dissatisfy` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `mortify` does not exist in the model vocabulary. Replacing with `mort`.
The specified target token `terrify` does not exist in the model vocabulary. Replacing with `terri`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.
The specified target token `certify` does not exist in the model vocabulary. Replacing with `ce`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.

The specified target token `reason_out` does not exist in the model vocabulary. Replacing with `reason`.
The specified target token `flip_out` does not exist in the model vocabulary. Replacing with `flip`.
The specified target token `piece_together` does not exist in the model vocabulary. Replacing with `piece`.
The specified target token `come_out` does not exist in the model vocabulary. Replacing with `come`.
The specified target token `freak_out` does not exist in the model vocabulary. Replacing with `freak`.
The specified target token `point_out` does not exist in the model vocabulary. Replacing with `point`.
The specified target token `flip_out` does not exist in the model vocabulary. Replacing with `flip`.
The specified target token `figure_out` does not exist in the model vocabulary. Replacing with `figure`.
The specified target token `freak_out` does not exist in the model vocabulary. Replacing with `freak`.
The specified target token `piece_together` does not exist in the mode
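
Each warning above says that the requested target was swapped for a shorter word piece (for example `ascertain` became `as`), so the score for such a row presumably describes that word piece rather than the full verb. A minimal sketch of how the affected verbs could be collected from the warning text, so that those rows can be flagged before analysis. The names `WARNING_RE`, `parse_replacements`, and `sample_log` are hypothetical, and the sketch assumes the warning text has been copied into a string; it does not capture the output from the pipeline itself.

```python
import re

# Pattern for the fill-mask warning format seen in the output above.
WARNING_RE = re.compile(
    r"The specified target token `(?P<target>[^`]+)` does not exist in the "
    r"model vocabulary\. Replacing with `(?P<piece>[^`]+)`\."
)


def parse_replacements(log_text):
    """Map each out-of-vocabulary target to the word piece that replaced it."""
    replaced = {}
    for match in WARNING_RE.finditer(log_text):
        replaced[match.group("target")] = match.group("piece")
    return replaced


# Three complete lines copied from the warnings above (assumed to be pasted in by hand).
sample_log = """\
The specified target token `ascertain` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `yearn` does not exist in the model vocabulary. Replacing with `year`.
The specified target token `reason_out` does not exist in the model vocabulary. Replacing with `reason`.
"""

replacements = parse_replacements(sample_log)
print(replacements)
```

Truncated warning lines do not match the pattern and are skipped, so the resulting dictionary only covers warnings that were printed in full.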

In [799]:
# Store the BERT scores from the mask-fill run without the complement
single["BertScoreNoCC"] = scores_noCC
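
The introduction describes turning each predicted probability into a surprisal value. A minimal sketch of that conversion, assuming the stored scores are probabilities in (0, 1] and that surprisal is measured in bits. The function name `surprisal_bits` and the example values are hypothetical and are not taken from the model output.

```python
import math


def surprisal_bits(probability):
    """Return surprisal in bits, -log2(p), for a probability in (0, 1]."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Made-up probabilities for illustration only, not model output.
example_probabilities = [0.5, 0.25, 0.125]
example_surprisals = [surprisal_bits(p) for p in example_probabilities]
print(example_surprisals)
```

Applied to a column of probabilities, the same function could be mapped over the values, for example with `Series.apply`, provided the column really holds probabilities rather than logits or log-probabilities.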

# Save to CSV

In [801]:
single.to_csv("../data/bert_scores.csv")