# Estimating surprisal from language models
Take sentences from CommitmentBank, MegaAttitudes, and the experimental stimuli, mask the attitude predicate, and get the model's predicted probability for the target verb in the masked position. Then compute the verb's surprisal from that probability.

In [631]:
from transformers import pipeline
import pandas as pd
import numpy as np
import re
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

In [632]:
# This makes the display show more info
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

# Contents
1. [Read in the three datasets](#Read-in-the-three-datasets)
2. [Masking out the correct verb](#Masking-out-the-correct-verb)
    1. [Remaining cases](#Remaining-cases)
    2. [Proposed Solution](#Proposed-Solution)
        1. [Step 1. Create a new column with list of pos tagged verbs from Sentence](#Step-1.-Create-a-new-column-with-list-of-pos-tagged-verbs-from-Sentence)
        2. [Step 2. Lemmatize VerbList](#Step-2.-Lemmatize-VerbList)
        3. [TROUBLESHOOT NEEDED](#TROUBLESHOOT-NEEDED)
    3. [Combine the dataframes together again](#Combine-the-dataframes-together-again)
    4. [Mask out the VerbToken from Sentence](#Mask-out-the-VerbToken-from-Sentence)
3. [Masked language modeling to estimate surprisal](#Masked-language-modeling-to-estimate-surprisal)

# Read in the three datasets
- Subset the dfs to just the relevant columns: ID, Verb, Sentence
- Make sure that the column names are consistent across the three dfs

In [633]:
# CommitmentBank
# raw url: https://raw.githubusercontent.com/khuyen-le/projectivity-factors/master/data/CommitmentBank-All.csv
cb = pd.read_csv("../data/CommitmentBank-ALL.csv")[["uID","Verb","Target"]].drop_duplicates()
cb = cb.rename(columns={"Target": "Sentence","uID":"ID"})
len(cb)

1200

In [634]:
# MegaVeridicality
# raw URL: https://raw.githubusercontent.com/khuyen-le/projectivity-factors/master/data/mega-veridicality-v2.csv
mv = pd.read_csv("../data/mega-veridicality-v2.csv")[["verb","frame","voice","sentence"]].drop_duplicates()
mv = mv.rename(columns={"verb": "Verb", "sentence":"Sentence"})
mv["ID"] = mv[['frame', 'voice']].apply(lambda x: '_'.join(x), axis=1)
mv = mv.drop(columns=["frame","voice"])
len(mv)

5026

In [635]:
# Arousal/Valence Study
# raw URL: https://raw.githubusercontent.com/khuyen-le/projectivity-factors/master/data/1_sliderprojection/exp1_test-trials.csv
vs = pd.read_csv("../data/1_sliderprojection/exp1_test-trials.csv")[["Word","utterance","exp"]]
vs = vs[vs["exp"]=="stim"].drop_duplicates().drop(columns={"exp"})
vs = vs.rename(columns={"Word": "Verb","utterance":"Sentence"})
vs["ID"] = "projection"
len(vs)

54

In [636]:
# Combine them together into one df
df = pd.concat([cb,mv,vs])

In [637]:
1200 + 5026 + 54

6280

In [638]:
len(df)

6280

# Getting the correct verb token
What we need to do is mask out the correct verb in each sentence. The correct verb is in the Verb column, so we can use apply() with str.replace() to swap the verb for [MASK]. The problem is that the verbs in the sentences are inflected tokens, while the verbs in Verb are lemmas.

For some of the verbs this is not a problem, because the verb token starts with the verb lemma (for example, 'knows' begins with 'know').

Solution:
1. Create a new verb token column
2. Use a regex with string interpolation to find the token; this works when the token begins with the lemma in Verb

In [639]:
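# Frustratingly, this isn't working: the f-string interpolates the whole
# "Verb" Series into a single pattern instead of one pattern per row
# df["VerbToken"] = df['Sentence'].str.extract(fr'({df["Verb"]}\w*)')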
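A minimal sketch of why the commented-out line above does not do what was intended, using a made-up two-row frame (`toy` and its contents are invented for illustration). It shows that formatting a Series into an f-string yields one string containing the Series' printed representation, and that pairing each verb with its own sentence gives one pattern per row.

```python
import re
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Verb": ["know", "think"],
    "Sentence": ["Dana knows that Mars is red.", "Lee thinks it is late."],
})

# The f-string formats the whole Series (index, values, dtype) into ONE string
pattern = fr'({toy["Verb"]}\w*)'
print("dtype" in pattern)  # the Series' printed form leaked into the pattern

# Building one pattern per (verb, sentence) pair instead
tokens = []
for verb, sentence in zip(toy["Verb"], toy["Sentence"]):
    match = re.search(fr'({re.escape(verb)}\w*)', sentence)
    tokens.append(match.group() if match is not None else None)
print(tokens)
```

This is the same row-wise idea as the `apply(..., axis=1)` approach, just written as an explicit loop.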

Find a match in the Sentence column for the verb from the Verb column using a regex. re.search() returns a match object, so you have to call .group() to get the matched string. When there is no match, re.search() returns None, and you can't call .group() on that.

In [640]:
df["Token"] = df.apply(lambda x: re.search(fr'({x["Verb"]}\w*)',x['Sentence']), axis=1)

# In some cases there is nothing captured, it returns a NoneType and causes the code to fail
# because NoneType has no method .group()
df["Token"] = df["Token"].apply(lambda x: x.group() if x is not None else x)
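One caveat with the pattern above is that it is not anchored at a word boundary, so a lemma can match in the middle of a longer word. The sketch below shows a possible stricter variant; `find_token` is a hypothetical helper name, and adding `\b` and `re.escape` is an assumption about what is wanted, not something the notebook does.

```python
import re

def find_token(verb, sentence):
    """Return the first word in `sentence` that starts with `verb`, else None."""
    # \b anchors the match at a word boundary; re.escape guards against
    # regex metacharacters in the verb string
    match = re.search(rf'\b{re.escape(verb)}\w*', sentence)
    return match.group() if match is not None else None

anchored_hit = find_token("know", "Dana knows that Mars has no water.")
anchored_miss = find_token("know", "Dana acknowledged the result.")

# The unanchored pattern finds a match inside "acknowledged"
unanchored = re.search(r'(know\w*)', "Dana acknowledged the result.").group()

print(anchored_hit, anchored_miss, unanchored)
```

Switching to the anchored version would likely change how many rows end up in the matched and unmatched groups, so the counts reported in this notebook would need to be re-checked.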

## Mask out the VerbToken

In [641]:
nonempty = df[~df["Token"].isnull()].copy()

In [642]:
nonempty["Masked"] = nonempty.apply(lambda x: x['Sentence'].replace(x["Token"],"[MASK]"),axis=1)


In [643]:
len(nonempty)

5711

In [644]:
5711+569

6280

In [645]:
len(df)

6280
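The masking step above relies on `str.replace`, which replaces every occurrence of the token string, including occurrences inside other words. A minimal sketch of a stricter alternative follows; `mask_once` is a hypothetical helper, and treating the first whole-word occurrence as the target predicate is an assumption that may not hold for every sentence.

```python
import re

def mask_once(sentence, token, mask="[MASK]"):
    """Replace only the first whole-word occurrence of `token` with `mask`."""
    pattern = rf'\b{re.escape(token)}\b'
    return re.sub(pattern, mask, sentence, count=1)

# str.replace masks every occurrence, even inside "seesaw"
plain = "I see the seesaw".replace("see", "[MASK]")

# The regex version masks one whole word only
strict = mask_once("I see the seesaw", "see")
repeated = mask_once("He said she said so.", "said")

print(plain)
print(strict)
print(repeated)
```

With this variant, sentences that repeat the verb would contain a single mask instead of several, which affects any later filter on the number of masks.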

# Remaining cases

In [646]:
# cases where the above solution did not work
empty = df[df["Token"].isnull()].copy()
len(empty)

569

In [647]:
len(empty)/len(df)*100

9.060509554140127

## Proposed Solution
Overarching idea: lemmatize the verbs in Sentence and find the lemma that matches the Verb column. We still need the actual verb token, not just the lemma, because replacing the verb in Sentence with [MASK] via str.replace() requires the exact string that appears in the sentence.

More concrete:
1. Make a new column with POS tag verbs from Sentence
2. Lemmatize the verbs from the new column
3. Here there be dragons

### Step 1. Create a new column with list of pos tagged verbs from Sentence

In [648]:
def get_verb(sentence):
    nltk_tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    verbs = []
    for i in nltk_tagged:
        if 'VB' in i[1]:
            verbs.append(i)
    return verbs

In [649]:
empty["VerbList"] = empty["Sentence"].apply(lambda x: get_verb(x))


### Step 2. Lemmatize VerbList

In [650]:
# code from: https://gaurav5430.medium.com/using-nltk-for-lemmatizing-sentences-c1bfff963258

# initialize the lemmatizer
lemmatizer = WordNetLemmatizer()

# function to convert nltk tag to wordnet tag
def nltk_tag_to_wordnet_tag(nltk_tag):
    if nltk_tag.startswith('J'):
        return wordnet.ADJ
    elif nltk_tag.startswith('V'):
        return wordnet.VERB
    elif nltk_tag.startswith('N'):
        return wordnet.NOUN
    elif nltk_tag.startswith('R'):
        return wordnet.ADV
    else:          
        return None

def lemmatize_from_nltk_tagged_list(nltk_tagged):
    #tuple of (token, wordnet_tag)
    wordnet_tagged = map(lambda x: (x[0], nltk_tag_to_wordnet_tag(x[1])), nltk_tagged)
    lemmatized_sentence = []
    for word, tag in wordnet_tagged:
        if tag is None:
            #if there is no available tag, append the token as is
            lemmatized_sentence.append(word)
        else:        
            #else use the tag to lemmatize the token
            lemmatized_sentence.append(lemmatizer.lemmatize(word, tag))
    return lemmatized_sentence

empty["VerbListLemmatized"] = empty["VerbList"].apply(lambda x: lemmatize_from_nltk_tagged_list(x))


### pull just the Verb Token and Lemma from the VerbListLemmatized

In [651]:
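# solution by brandon papineau
# note: check_list accumulates every Verb seen so far, so a lemma is matched
# against all earlier rows' verbs as well as this row's Verb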
check_list = []
iter_list = []
for index,row in empty.iterrows():
    inner_list = []
    if row["Verb"] not in check_list:
        check_list.append(row["Verb"])
    for i in row["VerbListLemmatized"]:
        if i in check_list:
            lemma = i
            locator = row["VerbListLemmatized"].index(i)
            tagged = row["VerbList"][locator]
            inner_list.append([lemma,tagged])
    iter_list.append(inner_list)
empty["LemmaTokenPair"] = iter_list


In [652]:
len(empty)

569
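The lemma-matching loop above tests each lemma against `check_list`, which grows as rows are processed, so a lemma can match a verb that belongs to an earlier row. If the intent is to match only the current row's Verb, a per-row comparison could look like the sketch below. `match_row` is a hypothetical helper and the tagged tokens are invented; whether the stricter behaviour is what was intended is an assumption.

```python
def match_row(verb, tagged_tokens, lemmas):
    """Return [lemma, (token, tag)] pairs whose lemma equals this row's verb."""
    return [
        [lemma, tagged]
        for lemma, tagged in zip(lemmas, tagged_tokens)
        if lemma == verb
    ]

# Invented example in the shape of VerbList / VerbListLemmatized
tagged = [("was", "VBD"), ("felt", "VBD"), ("told", "VBN")]
lemmas = ["be", "felt", "tell"]

hit = match_row("tell", tagged, lemmas)
miss = match_row("feel", tagged, lemmas)  # 'felt' was not lemmatized to 'feel'
print(hit)
print(miss)
```

Pairing lemmas and tokens with `zip` also avoids `list.index`, which returns the first position of a lemma even when it occurs more than once.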

# Separate out the successful cases

In [653]:
good = empty[empty.astype(str)["LemmaTokenPair"] != "[]"].copy()
len(good)

425

In [654]:
good["Token"] = good['LemmaTokenPair'].apply(lambda x: x[0][1][0])
len(good)

425

### Mask out the verb token

In [655]:
good["Masked"] = good.apply(lambda x: x['Sentence'].replace(x["Token"],"[MASK]"),axis=1)
len(good)

425

# Separate out the unsuccessful cases

Several cases aren't getting caught because:
1. The word isn't correctly tagged as a verb, so not ending up in VerbList in the first place
    - example: 'thought' in item 901
2. The word isn't lemmatized correctly, so the match with Verb isn't happening
    - examples: 'felt'
3. Orthographic differences/errors
    - examples: 'realize'/'realise', 'facinate' / 'fascinate'
4. Cases where the Verb has a particle ---> the majority of cases
    - 'flip_out' vs. 'flip'

In [656]:
missing = empty[empty.astype(str)["LemmaTokenPair"] == "[]"]
len(missing)

144

## Look at cases of Type 4
Solution: same as before, but pick these cases out with str.contains and match on the part of the verb before the underscore.

In [657]:
type_4 = missing.loc[missing["Verb"].str.contains("_")].drop(columns={"LemmaTokenPair"})
len(type_4)

101

In [658]:
type_4a = type_4["Verb"].str.split("_",expand=True)
type_4 = type_4a.merge(type_4, left_index = True, right_index = True).rename(columns={0:"VerbSplit"})
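A small sketch of the split step above on made-up particle verbs. It assumes each verb has the form `head_particle`; passing `n=1` and naming the columns are choices made here for illustration, not what the notebook does.

```python
import pandas as pd

# Invented examples in the same shape as the Verb column
verbs = pd.Series(["flip_out", "find_out", "freak_out"])

# n=1 splits on the first underscore only, so there are always two columns
parts = verbs.str.split("_", n=1, expand=True)
parts.columns = ["VerbSplit", "Particle"]
print(parts)
```

Naming the columns up front avoids integer column labels such as `0` and `1`, which pick up suffixes like `0_x` when frames are merged later.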

In [659]:
# run BPap's code again
check_list = []
iter_list = []
for index,row in type_4.iterrows():
    inner_list = []
    if row["VerbSplit"] not in check_list: # search for check on the result of splitting the Verb column
        check_list.append(row["VerbSplit"])
    for i in row["VerbListLemmatized"]:
        if i in check_list:
            lemma = i
            locator = row["VerbListLemmatized"].index(i)
            tagged = row["VerbList"][locator]
            inner_list.append([lemma,tagged])
    iter_list.append(inner_list)
type_4["LemmaTokenPair"] = iter_list

In [660]:
# Split up the LemmaTokenPair column into two columns
type_4 = type_4.LemmaTokenPair.apply(pd.Series).merge(type_4, left_index = True, right_index = True)
len(type_4)

101

## Cases where that worked

In [661]:
t4_nonnull = type_4.loc[~type_4[0].isnull()]
len(t4_nonnull)

82

In [662]:
t4_nonnull = t4_nonnull[0].apply(pd.Series).merge(t4_nonnull, left_index = True, right_index = True)
t4_nonnull = t4_nonnull.rename(columns={"0_x":"Lemma"})
t4_nonnull["Token"] = t4_nonnull["1_x"].apply(lambda x: x[0])

### Mask out 

In [663]:
t4_nonnull["Masked"] = t4_nonnull.apply(lambda x: x['Sentence'].replace(x["Token"],"[MASK]"),axis=1)

In [664]:
len(t4_nonnull)

82

## Missed cases

In [665]:
t4_null = type_4.loc[type_4[0].isnull()]
len(t4_null)

19

In [666]:
t4_null = t4_null.drop(columns=[0])
len(t4_null)

19

In [667]:
t4_null["Token"] = t4_null.apply(lambda x: re.search(fr'({x["VerbSplit"]}\w*)',x['Sentence']), axis=1)
t4_null["Token"] = t4_null["Token"].apply(lambda x: x.group() if x is not None else x)

### Mask Out

In [668]:
t4_null["Masked"] = t4_null.apply(lambda x: x['Sentence'].replace(x["Token"],"[MASK]"),axis=1)

In [669]:
len(t4_null)

19

## Everything else

In [670]:
every_else = missing.loc[~missing["Verb"].str.contains("_")].drop(columns={"LemmaTokenPair"})
len(every_else)

43

In [671]:
every_else = every_else[["ID","Verb","Sentence"]]
every_else["Token"] = ""

In [672]:
# orthographic differences/errors
every_else.loc[every_else["Verb"]=="facinate", "Verb"] = "fascinate"
every_else.loc[every_else["Verb"]=="realize", "Verb"] = "realise"

every_else["Token"] = every_else.apply(lambda x: re.search(fr'({x["Verb"]}\w*)',x['Sentence']), axis=1)
every_else["Token"] = every_else["Token"].apply(lambda x: x.group() if x is not None else x)

In [673]:
# Manually assign the tokens for the irregular (and other unmatched) forms
irregular_tokens = {
    "understand": "understood",
    "feel": "felt",
    "think": "thought",
    "hope": "hoping",
    "see": "saw",
    "spellbind": "spellbound",
    "sing": "sung",
    "swear": "swore",
    "bear": "borne",
    "choose": "chosen",
    "undertake": "undertook",
    "uphold": "upheld",
    "satisfy": "satisfied",
    "teach": "taught",
    "foretell": "foretold",
    "curse": "curst",
    "send": "sent",
    "weep": "wept",
    "fight": "fought",
    "forbid": "forbade",
}
# .loc with a row mask and a column label avoids chained assignment
for verb, token in irregular_tokens.items():
    every_else.loc[every_else["Verb"] == verb, "Token"] = token
len(every_else)

43

### MASK OUT

In [674]:
every_else["Masked"] = every_else.apply(lambda x: x['Sentence'].replace(x["Token"],"[MASK]"),axis=1)

In [675]:
len(every_else)

43

# Combine the dataframes together again

In [676]:
print(f"nonempty: {len(nonempty)}")
print(f"good: {len(good)}")
print(f"t4_nonnull: {len(t4_nonnull)}")
print(f"t4_null: {len(t4_null)}")
print(f"every_else: {len(every_else)}")
print(f"total: {len(nonempty) + len(good) + len(t4_nonnull) + len(t4_null) + len(every_else)}")

nonempty: 5711
good: 425
t4_nonnull: 82
t4_null: 19
every_else: 43
total: 6280


In [677]:
nonempty = nonempty[["ID","Verb","Sentence","Masked","Token"]]
good = good[["ID","Verb","Sentence","Masked","Token"]]
t4_nonnull = t4_nonnull[["ID","Verb","Sentence","Masked","Token"]]
t4_null = t4_null[["ID","Verb","Sentence","Masked","Token"]]
every_else = every_else[["ID","Verb","Sentence","Masked","Token"]]

In [678]:
len(nonempty) + len(good) + len(t4_nonnull) + len(t4_null) + len(every_else)

6280

In [679]:
d = pd.concat([nonempty,good,t4_nonnull,t4_null,every_else])
len(d)

6280

# Masked language modeling to estimate surprisal

- Info on fill-mask pipeline: https://huggingface.co/transformers/main_classes/pipelines.html#transformers.FillMaskPipeline
- Info on particular models: https://huggingface.co/models


- Once the issues above are worked out, then the rest of this should be pretty straightforward.

- For discussion about which model is best, check out the following twitter thread: https://twitter.com/bruno_nicenboim/status/1379168059311656963

- Probably we should use GPT3, not BERT.

In [680]:
unmasker = pipeline('fill-mask', model='bert-large-uncased')


Some weights of the model checkpoint at bert-large-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [742]:
# using indexing and key to get the relevant output score
unmasker("Dana was [MASK] that Mars has no water.",targets="surprised")[0]['score']

0.0471670962870121
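To turn a fill-mask score like the one above into surprisal, take the negative log of the probability. The sketch below uses base 2, giving surprisal in bits; the choice of base is an assumption, since the notebook does not state one, and `surprisal_bits` is a hypothetical helper name.

```python
import math

def surprisal_bits(p):
    """Surprisal in bits: -log2(p). Requires 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

score = 0.0471670962870121  # the fill-mask score shown above
example_surprisal = surprisal_bits(score)
print(round(example_surprisal, 2))
```

Lower probabilities give higher surprisal, and a probability of 1 gives a surprisal of 0. For a whole column of scores, `-np.log2(scores)` does the same thing element-wise.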

In [718]:
# count the masks in each sentence so we can keep only the cases with exactly one
d["match"] = d.apply(lambda x: re.findall(fr'MASK',x['Masked']), axis=1)

In [728]:
d["len"] = d.match.apply(lambda x: len(x))
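The mask counting in the two cells above can also be done in one step with the vectorized `str.count`. This is a sketch on invented sentences, not a drop-in replacement: it matches the literal string `[MASK]` (brackets escaped) where the cell above matches `MASK`, and `n_masks` is a hypothetical column name.

```python
import pandas as pd

# Invented sentences with one, two, and zero masks
toy = pd.DataFrame({"Masked": [
    "Dana [MASK] that Mars has no water.",
    "Dana [MASK] that Lee [MASK] it.",
    "Dana left.",
]})

# Count literal "[MASK]" occurrences per row
toy["n_masks"] = toy["Masked"].str.count(r"\[MASK\]")

# Keep rows with exactly one mask; .copy() makes an independent frame
single_toy = toy.loc[toy["n_masks"] == 1].copy()
print(toy["n_masks"].tolist(), len(single_toy))
```

Filtering on exactly one mask drops both the multi-mask sentences and any sentence where masking failed and no mask was inserted.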

In [759]:
single = d.loc[d.len == 1].copy()

In [737]:
single["BERT_score_CC"] = ""


In [745]:
scores = [unmasker(x,targets=y)[0]['score'] for x,y in zip(single["Masked"],single["Verb"])]

The specified target token `hypothesize` does not exist in the model vocabulary. Replacing with `h`.
The specified target token `surmise` does not exist in the model vocabulary. Replacing with `sur`.
The specified target token `surmise` does not exist in the model vocabulary. Replacing with `sur`.
The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `ascertain` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `depress` does not exist in the model vocabulary. Replacing with `de`.
The specified target token `disapprove` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `flatter` does not exist in the model vocabulary. Replacing with `flat`.
The specified target token `hearten` does not exist in the model vocabulary. Replacing with `heart`.
The specified target token `ruminate` does not exist in the model vocabulary. Replacing with `rum`.

The specified target token `diagnose` does not exist in the model vocabulary. Replacing with `dia`.
The specified target token `stun` does not exist in the model vocabulary. Replacing with `stu`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `fantasize` does not exist in the model vocabulary. Replacing with `fan`.
The specified target token `reaffirm` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `feign` does not exist in the model vocabulary. Replacing with `fei`.
The specified target token `jest` does not exist in the model vocabulary. Replacing with `je`.
The specified target token `fret` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `certify` does not exist in the model vocabulary. Replacing with `ce`.
The specified target token `stutter` does not exist in the model vocabulary. Replacing with `stu`.

The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `unsettle` does not exist in the model vocabulary. Replacing with `un`.
The specified target token `agitate` does not exist in the model vocabulary. Replacing with `ag`.
The specified target token `scoff` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `invigorate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `presume` does not exist in the model vocabulary. Replacing with `pre`.
The specified target token `irritate` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `articulate` does not exist in the model vocabulary. Replacing with `art`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `overhear` does not exist in the model vocabulary. Replacing with `over`.

The specified target token `pinpoint` does not exist in the model vocabulary. Replacing with `pin`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `excite` does not exist in the model vocabulary. Replacing with `ex`.
The specified target token `perplex` does not exist in the model vocabulary. Replacing with `per`.
The specified target token `hoot` does not exist in the model vocabulary. Replacing with `ho`.
The specified target token `flatter` does not exist in the model vocabulary. Replacing with `flat`.
The specified target token `despise` does not exist in the model vocabulary. Replacing with `des`.
The specified target token `relish` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `repress` does not exist in the model vocabulary. Replacing with `rep`.
The specified target token `congratulate` does not exist in the model vocabulary. Replacing with `cong`.

The specified target token `bicker` does not exist in the model vocabulary. Replacing with `bi`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `bellow` does not exist in the model vocabulary. Replacing with `bell`.
The specified target token `astound` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `reiterate` does not exist in the model vocabulary. Replacing with `rei`.
The specified target token `tweet` does not exist in the model vocabulary. Replacing with `t`.
The specified target token `fabricate` does not exist in the model vocabulary. Replacing with `fabric`.
The specified target token `conceive` does not exist in the model vocabulary. Replacing with `con`.
The specified target token `chastise` does not exist in the model vocabulary. Replacing with `cha`.
The specified target token `posit` does not exist in the model vocabulary. Replacing with `po`.

The specified target token `recap` does not exist in the model vocabulary. Replacing with `rec`.
The specified target token `scribble` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `corroborate` does not exist in the model vocabulary. Replacing with `co`.
The specified target token `prejudge` does not exist in the model vocabulary. Replacing with `pre`.
The specified target token `irritate` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `attest` does not exist in the model vocabulary. Replacing with `at`.
The specified target token `delude` does not exist in the model vocabulary. Replacing with `del`.
The specified target token `frustrate` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `omit` does not exist in the model vocabulary. Replacing with `om`.
The specified target token `deduce` does not exist in the model vocabulary. Replacing with `de`.

The specified target token `conceive` does not exist in the model vocabulary. Replacing with `con`.
The specified target token `miff` does not exist in the model vocabulary. Replacing with `mi`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `disapprove` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `insinuate` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `ascertain` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `nonplus` does not exist in the model vocabulary. Replacing with `non`.
The specified target token `agonize` does not exist in the model vocabulary. Replacing with `ago`.
The specified target token `overhear` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `sadden` does not exist in the model vocabulary. Replacing with `sad`.

The specified target token `rationalize` does not exist in the model vocabulary. Replacing with `rational`.
The specified target token `rant` does not exist in the model vocabulary. Replacing with `ran`.
The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `baffle` does not exist in the model vocabulary. Replacing with `ba`.
The specified target token `deplore` does not exist in the model vocabulary. Replacing with `de`.
The specified target token `enthrall` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `publicize` does not exist in the model vocabulary. Replacing with `public`.
The specified target token `underscore` does not exist in the model vocabulary. Replacing with `un

The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `infuriate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `signify` does not exist in the model vocabulary. Replacing with `sign`.
The specified target token `squeal` does not exist in the model vocabulary. Replacing with `sq`.
The specified target token `cringe` does not exist in the model vocabulary. Replacing with `cr`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `envision` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `disgruntle` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `startle` does not exist in the model vocabulary. Replacing with `start`.
The specified target token `gladden` does not exist in the model vocabulary. Replacing with `glad`.

The specified target token `annoy` does not exist in the model vocabulary. Replacing with `ann`.
The specified target token `astonish` does not exist in the model vocabulary. Replacing with `aston`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `allege` does not exist in the model vocabulary. Replacing with `all`.
The specified target token `irk` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `whine` does not exist in the model vocabulary. Replacing with `w`.
The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `obsess` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `allege` does not exist in the model vocabulary. Replacing with `all`.

The specified target token `baffle` does not exist in the model vocabulary. Replacing with `ba`.
The specified target token `yearn` does not exist in the model vocabulary. Replacing with `year`.
The specified target token `mistrust` does not exist in the model vocabulary. Replacing with `mist`.
The specified target token `compel` does not exist in the model vocabulary. Replacing with `com`.
The specified target token `ordain` does not exist in the model vocabulary. Replacing with `or`.
The specified target token `screech` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `foresee` does not exist in the model vocabulary. Replacing with `fore`.
The specified target token `disappoint` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `taunt` does not exist in the model vocabulary. Replacing with `tau`.
The specified target token `mumble` does not exist in the model vocabulary. Replacing with `mum`.

The specified target token `yearn` does not exist in the model vocabulary. Replacing with `year`.
The specified target token `excite` does not exist in the model vocabulary. Replacing with `ex`.
The specified target token `mistrust` does not exist in the model vocabulary. Replacing with `mist`.
The specified target token `ridicule` does not exist in the model vocabulary. Replacing with `rid`.
The specified target token `gloat` does not exist in the model vocabulary. Replacing with `g`.
The specified target token `enthrall` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `foresee` does not exist in the model vocabulary. Replacing with `fore`.
The specified target token `energize` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `confide` does not exist in the model vocabulary. Replacing with `con`.

The specified target token `unsettle` does not exist in the model vocabulary. Replacing with `un`.
The specified target token `fret` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `tweet` does not exist in the model vocabulary. Replacing with `t`.
The specified target token `entice` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `terrorize` does not exist in the model vocabulary. Replacing with `terror`.
The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `hearten` does not exist in the model vocabulary. Replacing with `heart`.
The specified target token `tempt` does not exist in the model vocabulary. Replacing with `te`.
The specified target token `infuriate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `ridicule` does not exist in the model vocabulary. Replacing with `rid`.

The specified target token `misjudge` does not exist in the model vocabulary. Replacing with `mis`.
The specified target token `irk` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `inquire` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `deem` does not exist in the model vocabulary. Replacing with `dee`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `stun` does not exist in the model vocabulary. Replacing with `stu`.
The specified target token `infer` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `reaffirm` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `wager` does not exist in the model vocabulary. Replacing with `wage`.

The specified target token `invigorate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `chastise` does not exist in the model vocabulary. Replacing with `cha`.
The specified target token `bicker` does not exist in the model vocabulary. Replacing with `bi`.
The specified target token `hustle` does not exist in the model vocabulary. Replacing with `hu`.
The specified target token `deem` does not exist in the model vocabulary. Replacing with `dee`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `pester` does not exist in the model vocabulary. Replacing with `pest`.
The specified target token `insure` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `ordain` does not exist in the model vocabulary. Replacing with `or`.

The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `coerce` does not exist in the model vocabulary. Replacing with `coe`.
The specified target token `diagnose` does not exist in the model vocabulary. Replacing with `dia`.
The specified target token `hanker` does not exist in the model vocabulary. Replacing with `hank`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `categorize` does not exist in the model vocabulary. Replacing with `cat`.
The specified target token `tickle` does not exist in the model vocabulary. Replacing with `tick`.
The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `presume` does not exist in the model vocabulary. Replacing with `pre`.

The specified target token `annoy` does not exist in the model vocabulary. Replacing with `ann`.
The specified target token `rant` does not exist in the model vocabulary. Replacing with `ran`.
The specified target token `tantalize` does not exist in the model vocabulary. Replacing with `tan`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `appease` does not exist in the model vocabulary. Replacing with `app`.
The specified target token `insinuate` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `omit` does not exist in the model vocabulary. Replacing with `om`.
The specified target token `deceive` does not exist in the model vocabulary. Replacing with `dec`.
The specified target token `compel` does not exist in the model vocabulary. Replacing with `com`.
The specified target token `amuse` does not exist in the model vocabulary. Replacing with `am`.

The specified target token `mystify` does not exist in the model vocabulary. Replacing with `my`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.
The specified target token `signify` does not exist in the model vocabulary. Replacing with `sign`.
The specified target token `notify` does not exist in the model vocabulary. Replacing with `not`.
The specified target token `dissatisfy` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `mortify` does not exist in the model vocabulary. Replacing with `mort`.
The specified target token `terrify` does not exist in the model vocabulary. Replacing with `terri`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.
The specified target token `certify` does not exist in the model vocabulary. Replacing with `ce`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.

The specified target token `reason_out` does not exist in the model vocabulary. Replacing with `reason`.
The specified target token `flip_out` does not exist in the model vocabulary. Replacing with `flip`.
The specified target token `piece_together` does not exist in the model vocabulary. Replacing with `piece`.
The specified target token `come_out` does not exist in the model vocabulary. Replacing with `come`.
The specified target token `freak_out` does not exist in the model vocabulary. Replacing with `freak`.
The specified target token `point_out` does not exist in the model vocabulary. Replacing with `point`.
The specified target token `flip_out` does not exist in the model vocabulary. Replacing with `flip`.
The specified target token `figure_out` does not exist in the model vocabulary. Replacing with `figure`.
The specified target token `freak_out` does not exist in the model vocabulary. Replacing with `freak`.
The specified target token `piece_together` does not exist in the mode

In [748]:
len(scores)

6252

In [770]:
single["BertScoreCC"] = scores

In [768]:
# Drop the old BERT_score column
single = single.drop(columns=["BERT_score"])

In [763]:
# Drop the helper columns and rename bert2 to BertScore
single = single.drop(columns=["match", "len"]).rename(columns={"bert2": "BertScore"})

# LM without the complement (CC)

In [None]:
# Scratch example (not executed): keep the part of each text before '::'
df['text_new1'] = [x.split('::')[0] for x in df['text']]

In [795]:
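# Remove the complement (CC) from Masked: keep only the text up to and including [MASK]
single["Masked_noCC"] = [x.split('[MASK]')[0] + " [MASK]" for x in single['Masked']]
# Roughly equivalent regex version (pandas needs .str.replace with regex=True):
# single["Masked_noCC"] = single["Masked"].str.replace(r'(?<=\[MASK\]).*$', '', regex=True)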
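To make the truncation concrete, here is a minimal sketch on a made-up sentence. The helper name `truncate_after_mask` and the example sentence are hypothetical, not part of the notebook or the datasets; the sketch assumes each masked sentence contains exactly one `[MASK]`.

```python
def truncate_after_mask(masked_sentence, mask_token="[MASK]"):
    """Keep the text before the first mask token and re-append the mask."""
    # If the mask token is absent, split() returns the whole sentence,
    # so the mask would simply be appended at the end.
    prefix = masked_sentence.split(mask_token)[0]
    # rstrip() avoids a double space before the re-appended mask
    return prefix.rstrip() + " " + mask_token


# Hypothetical example sentence, for illustration only
example = "Someone [MASK] that a particular thing happened."
print(truncate_after_mask(example))
```

Unlike the cell above, this version strips trailing whitespace from the prefix, so the result has a single space before `[MASK]`.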

In [797]:
# Placeholder column; filled with scores_noCC below
single["BertScoreNoCC"] = ""

In [798]:
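# Score each target verb at [MASK] in the truncated sentence.
# Note: for verbs missing from the model vocabulary, the pipeline scores a substitute
# wordpiece instead (see the warnings below), so those scores are not whole-verb probabilities.
scores_noCC = [unmasker(x, targets=y)[0]['score'] for x, y in zip(single["Masked_noCC"], single["Verb"])]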

The specified target token `hypothesize` does not exist in the model vocabulary. Replacing with `h`.
The specified target token `surmise` does not exist in the model vocabulary. Replacing with `sur`.
The specified target token `surmise` does not exist in the model vocabulary. Replacing with `sur`.
The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `ascertain` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `depress` does not exist in the model vocabulary. Replacing with `de`.
The specified target token `disapprove` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `flatter` does not exist in the model vocabulary. Replacing with `flat`.
The specified target token `hearten` does not exist in the model vocabulary. Replacing with `heart`.
The specified target token `ruminate` does not exist in the model vocabulary. Replacing with `rum`.

The specified target token `diagnose` does not exist in the model vocabulary. Replacing with `dia`.
The specified target token `stun` does not exist in the model vocabulary. Replacing with `stu`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `fantasize` does not exist in the model vocabulary. Replacing with `fan`.
The specified target token `reaffirm` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `feign` does not exist in the model vocabulary. Replacing with `fei`.
The specified target token `jest` does not exist in the model vocabulary. Replacing with `je`.
The specified target token `fret` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `certify` does not exist in the model vocabulary. Replacing with `ce`.
The specified target token `stutter` does not exist in the model vocabulary. Replacing with `stu`.

The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `unsettle` does not exist in the model vocabulary. Replacing with `un`.
The specified target token `agitate` does not exist in the model vocabulary. Replacing with `ag`.
The specified target token `scoff` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `invigorate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `presume` does not exist in the model vocabulary. Replacing with `pre`.
The specified target token `irritate` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `articulate` does not exist in the model vocabulary. Replacing with `art`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `overhear` does not exist in the model vocabulary. Replacing with `over`.

The specified target token `pinpoint` does not exist in the model vocabulary. Replacing with `pin`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `excite` does not exist in the model vocabulary. Replacing with `ex`.
The specified target token `perplex` does not exist in the model vocabulary. Replacing with `per`.
The specified target token `hoot` does not exist in the model vocabulary. Replacing with `ho`.
The specified target token `flatter` does not exist in the model vocabulary. Replacing with `flat`.
The specified target token `despise` does not exist in the model vocabulary. Replacing with `des`.
The specified target token `relish` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `repress` does not exist in the model vocabulary. Replacing with `rep`.
The specified target token `congratulate` does not exist in the model vocabulary. Replacing with `cong`.

The specified target token `bicker` does not exist in the model vocabulary. Replacing with `bi`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `bellow` does not exist in the model vocabulary. Replacing with `bell`.
The specified target token `astound` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `reiterate` does not exist in the model vocabulary. Replacing with `rei`.
The specified target token `tweet` does not exist in the model vocabulary. Replacing with `t`.
The specified target token `fabricate` does not exist in the model vocabulary. Replacing with `fabric`.
The specified target token `conceive` does not exist in the model vocabulary. Replacing with `con`.
The specified target token `chastise` does not exist in the model vocabulary. Replacing with `cha`.
The specified target token `posit` does not exist in the model vocabulary. Replacing with `po`.

The specified target token `recap` does not exist in the model vocabulary. Replacing with `rec`.
The specified target token `scribble` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `corroborate` does not exist in the model vocabulary. Replacing with `co`.
The specified target token `prejudge` does not exist in the model vocabulary. Replacing with `pre`.
The specified target token `irritate` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `attest` does not exist in the model vocabulary. Replacing with `at`.
The specified target token `delude` does not exist in the model vocabulary. Replacing with `del`.
The specified target token `frustrate` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `omit` does not exist in the model vocabulary. Replacing with `om`.
The specified target token `deduce` does not exist in the model vocabulary. Replacing with `de`.

The specified target token `conceive` does not exist in the model vocabulary. Replacing with `con`.
The specified target token `miff` does not exist in the model vocabulary. Replacing with `mi`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `disapprove` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `insinuate` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `ascertain` does not exist in the model vocabulary. Replacing with `as`.
The specified target token `nonplus` does not exist in the model vocabulary. Replacing with `non`.
The specified target token `agonize` does not exist in the model vocabulary. Replacing with `ago`.
The specified target token `overhear` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `sadden` does not exist in the model vocabulary. Replacing with `sad`.

The specified target token `rationalize` does not exist in the model vocabulary. Replacing with `rational`.
The specified target token `rant` does not exist in the model vocabulary. Replacing with `ran`.
The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `baffle` does not exist in the model vocabulary. Replacing with `ba`.
The specified target token `deplore` does not exist in the model vocabulary. Replacing with `de`.
The specified target token `enthrall` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `publicize` does not exist in the model vocabulary. Replacing with `public`.
The specified target token `underscore` does not exist in the model vocabulary. Replacing with `un

The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `infuriate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `signify` does not exist in the model vocabulary. Replacing with `sign`.
The specified target token `squeal` does not exist in the model vocabulary. Replacing with `sq`.
The specified target token `cringe` does not exist in the model vocabulary. Replacing with `cr`.
The specified target token `snitch` does not exist in the model vocabulary. Replacing with `s`.
The specified target token `envision` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `disgruntle` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `startle` does not exist in the model vocabulary. Replacing with `start`.
The specified target token `gladden` does not exist in the model vocabulary. Replacing with `glad`.

The specified target token `annoy` does not exist in the model vocabulary. Replacing with `ann`.
The specified target token `astonish` does not exist in the model vocabulary. Replacing with `aston`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `allege` does not exist in the model vocabulary. Replacing with `all`.
The specified target token `irk` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `whine` does not exist in the model vocabulary. Replacing with `w`.
The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `obsess` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `rediscover` does not exist in the model vocabulary. Replacing with `red`.
The specified target token `allege` does not exist in the model vocabulary. Replacing with `all`.

The specified target token `baffle` does not exist in the model vocabulary. Replacing with `ba`.
The specified target token `yearn` does not exist in the model vocabulary. Replacing with `year`.
The specified target token `mistrust` does not exist in the model vocabulary. Replacing with `mist`.
The specified target token `compel` does not exist in the model vocabulary. Replacing with `com`.
The specified target token `ordain` does not exist in the model vocabulary. Replacing with `or`.
The specified target token `screech` does not exist in the model vocabulary. Replacing with `sc`.
The specified target token `foresee` does not exist in the model vocabulary. Replacing with `fore`.
The specified target token `disappoint` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `taunt` does not exist in the model vocabulary. Replacing with `tau`.
The specified target token `mumble` does not exist in the model vocabulary. Replacing with `mum`.

The specified target token `yearn` does not exist in the model vocabulary. Replacing with `year`.
The specified target token `excite` does not exist in the model vocabulary. Replacing with `ex`.
The specified target token `mistrust` does not exist in the model vocabulary. Replacing with `mist`.
The specified target token `ridicule` does not exist in the model vocabulary. Replacing with `rid`.
The specified target token `gloat` does not exist in the model vocabulary. Replacing with `g`.
The specified target token `enthrall` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `dupe` does not exist in the model vocabulary. Replacing with `du`.
The specified target token `foresee` does not exist in the model vocabulary. Replacing with `fore`.
The specified target token `energize` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `confide` does not exist in the model vocabulary. Replacing with `con`.

The specified target token `unsettle` does not exist in the model vocabulary. Replacing with `un`.
The specified target token `fret` does not exist in the model vocabulary. Replacing with `fr`.
The specified target token `tweet` does not exist in the model vocabulary. Replacing with `t`.
The specified target token `entice` does not exist in the model vocabulary. Replacing with `en`.
The specified target token `terrorize` does not exist in the model vocabulary. Replacing with `terror`.
The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `hearten` does not exist in the model vocabulary. Replacing with `heart`.
The specified target token `tempt` does not exist in the model vocabulary. Replacing with `te`.
The specified target token `infuriate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `ridicule` does not exist in the model vocabulary. Replacing with `rid`.

The specified target token `misjudge` does not exist in the model vocabulary. Replacing with `mis`.
The specified target token `irk` does not exist in the model vocabulary. Replacing with `ir`.
The specified target token `inquire` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `deem` does not exist in the model vocabulary. Replacing with `dee`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `stun` does not exist in the model vocabulary. Replacing with `stu`.
The specified target token `infer` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `reaffirm` does not exist in the model vocabulary. Replacing with `re`.
The specified target token `wager` does not exist in the model vocabulary. Replacing with `wage`.
The specified target token `o

The specified target token `invigorate` does not exist in the model vocabulary. Replacing with `in`.
The specified target token `chastise` does not exist in the model vocabulary. Replacing with `cha`.
The specified target token `bicker` does not exist in the model vocabulary. Replacing with `bi`.
The specified target token `hustle` does not exist in the model vocabulary. Replacing with `hu`.
The specified target token `deem` does not exist in the model vocabulary. Replacing with `dee`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `pester` does not exist in the model vocabulary. Replacing with `pest`.
The specified target token `insure` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `ordain` does not exist in the model vocabulary. Replacing with `or`.

The specified target token `instruct` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `coerce` does not exist in the model vocabulary. Replacing with `coe`.
The specified target token `diagnose` does not exist in the model vocabulary. Replacing with `dia`.
The specified target token `hanker` does not exist in the model vocabulary. Replacing with `hank`.
The specified target token `oblige` does not exist in the model vocabulary. Replacing with `ob`.
The specified target token `categorize` does not exist in the model vocabulary. Replacing with `cat`.
The specified target token `tickle` does not exist in the model vocabulary. Replacing with `tick`.
The specified target token `frighten` does not exist in the model vocabulary. Replacing with `fright`.
The specified target token `overwhelm` does not exist in the model vocabulary. Replacing with `over`.
The specified target token `presume` does not exist in the model vocabulary. Replacing with `pre`.

The specified target token `annoy` does not exist in the model vocabulary. Replacing with `ann`.
The specified target token `rant` does not exist in the model vocabulary. Replacing with `ran`.
The specified target token `tantalize` does not exist in the model vocabulary. Replacing with `tan`.
The specified target token `sicken` does not exist in the model vocabulary. Replacing with `sick`.
The specified target token `appease` does not exist in the model vocabulary. Replacing with `app`.
The specified target token `insinuate` does not exist in the model vocabulary. Replacing with `ins`.
The specified target token `omit` does not exist in the model vocabulary. Replacing with `om`.
The specified target token `deceive` does not exist in the model vocabulary. Replacing with `dec`.
The specified target token `compel` does not exist in the model vocabulary. Replacing with `com`.
The specified target token `amuse` does not exist in the model vocabulary. Replacing with `am`.

The specified target token `mystify` does not exist in the model vocabulary. Replacing with `my`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.
The specified target token `signify` does not exist in the model vocabulary. Replacing with `sign`.
The specified target token `notify` does not exist in the model vocabulary. Replacing with `not`.
The specified target token `dissatisfy` does not exist in the model vocabulary. Replacing with `di`.
The specified target token `mortify` does not exist in the model vocabulary. Replacing with `mort`.
The specified target token `terrify` does not exist in the model vocabulary. Replacing with `terri`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.
The specified target token `certify` does not exist in the model vocabulary. Replacing with `ce`.
The specified target token `gratify` does not exist in the model vocabulary. Replacing with `gr`.

The specified target token `reason_out` does not exist in the model vocabulary. Replacing with `reason`.
The specified target token `flip_out` does not exist in the model vocabulary. Replacing with `flip`.
The specified target token `piece_together` does not exist in the model vocabulary. Replacing with `piece`.
The specified target token `come_out` does not exist in the model vocabulary. Replacing with `come`.
The specified target token `freak_out` does not exist in the model vocabulary. Replacing with `freak`.
The specified target token `point_out` does not exist in the model vocabulary. Replacing with `point`.
The specified target token `flip_out` does not exist in the model vocabulary. Replacing with `flip`.
The specified target token `figure_out` does not exist in the model vocabulary. Replacing with `figure`.
The specified target token `freak_out` does not exist in the model vocabulary. Replacing with `freak`.
The specified target token `piece_together` does not exist in the mode

In [799]:
single["BertScoreNoCC"] = scores_noCC
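The warnings above show that many verbs are scored through a substitute wordpiece (for example `surmise` is replaced with `sur`), so it may be useful to flag which scores correspond to a whole verb. The sketch below uses a toy vocabulary; the names `is_single_token` and `toy_vocab` are hypothetical. With the real model, the vocabulary could presumably be obtained from `unmasker.tokenizer.get_vocab()`, which is an assumption about how `unmasker` was built.

```python
def is_single_token(verb, vocab):
    """True if the verb itself is an entry in the model vocabulary."""
    return verb in vocab


# Toy vocabulary for illustration only; not the real BERT vocabulary
toy_vocab = {"know", "think", "say", "sur", "reason"}

verbs = ["know", "surmise", "reason_out"]
in_vocab = [is_single_token(v, toy_vocab) for v in verbs]
print(list(zip(verbs, in_vocab)))
```

A flag column built this way could be used to restrict later analyses to verbs whose score reflects the full verb rather than its first wordpiece.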

# Save to CSV

In [801]:
single.to_csv("../data/bert_scores.csv")
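
The notebook's stated goal is to turn each predicted probability into a surprisal value. A minimal sketch of that conversion follows, assuming surprisal is measured in bits (base-2 logarithm); the function name `surprisal_bits` and the example probabilities are hypothetical.

```python
import math


def surprisal_bits(prob):
    """Surprisal in bits: the negative base-2 log of a probability."""
    if not 0 < prob <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


# Made-up probabilities, for illustration only
example_probs = [0.5, 0.25, 0.001]
example_surprisals = [surprisal_bits(p) for p in example_probs]
print(example_surprisals)
```

Applied to the saved scores, this could look like `[surprisal_bits(p) for p in single["BertScoreNoCC"]]`. Lower probabilities give higher surprisal, and a probability of exactly 0 is rejected because its surprisal is undefined.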