# Validate predictions against external example sentences

Several studies report example sentences that prototypically illustrate populist statements (Ernst et al. 2017, Schürmann and Gründl 2022, Bonikowski et al. 2022, Dai and Kustov 2022). In addition, Castanho Silva et al. (2022) published an extensive list of populist statements designed as rating scales for use in surveys. The authors assign each sentence either to a specific subdimension of populism or to populism in general. We use these sentences for out-of-sample predictions to assess construct validity. Figure X shows, for each externally defined label, the relative frequency of sentences predicted to belong to each of our label classes.
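
The relative frequencies described here could be computed roughly as follows. This is a minimal sketch on invented data, not the code behind the figure: the toy rows and the 0/1 prediction columns are assumptions for illustration, with 1 meaning the sentence was predicted to belong to that class.

```python
import pandas as pd

# Hypothetical toy data: one row per external sentence, with its external
# label and invented binary predictions for two of the label classes.
toy = pd.DataFrame(
    {
        "Label": ["anti-elitism", "anti-elitism", "people-centrism", "populism"],
        "Anti-Elite": [1, 1, 0, 1],
        "People-Centric": [0, 1, 1, 1],
    }
)

# The mean of a 0/1 column within a group is the share of sentences with that
# external label that were predicted to belong to the class.
rel_freq = toy.groupby("Label").mean()
print(rel_freq)
```

Because a sentence can be assigned to several classes at once, the shares in a row do not need to sum to one.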


In [None]:
from pathlib import Path

import pandas as pd
import torch
from transformers import AutoModel
from transformers import AutoTokenizer

import src

In [None]:
DEVICE = "cpu"
pd.set_option("display.max_colwidth", 2048)

In [None]:
out_path = Path("/home/lukas/overleaf/bert_populism/tables")

## Import and predict external sentences


In [None]:
df = pd.read_csv(src.PATH / "data/Populist_examples_from_other_studies.csv", sep=",")
df = df.rename({"Snippet": "German", "Domain": "Label"}, axis=1)

In [None]:
commit_hash = "f0fdc3be891596f4cd5a7c3896995c36a7d0ae9c"
tokenizer = AutoTokenizer.from_pretrained("luerhard/PopBERT")
model = AutoModel.from_pretrained("luerhard/PopBERT", trust_remote_code=True, revision=commit_hash)

Some weights of the model checkpoint at deepset/gbert-large were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initia

In [None]:
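model = model.to(DEVICE)
with torch.inference_mode():
    # Tokenize all sentences as one padded batch on the same device as the model.
    encodings = tokenizer(list(df["German"]), padding=True, return_tensors="pt").to(DEVICE)
    # The second element of the model output holds the per-class probabilities.
    _, probabilities = model(**encodings)
    probabilities = probabilities.cpu().numpy().round(3)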

## Create table for appendix


In [None]:
cols = ["Anti-Elite", "People-Centric", "Host-Left", "Host-Right"]
probabilities_df = pd.DataFrame(probabilities, columns=cols)
table = pd.concat([df, probabilities_df], axis=1)

In [None]:
def bold_formatter_thresh(num, thresh):
    num_str = str(round(num, 2))
    if num < thresh:
        return num_str
    else:
        return r"\textbf{" + num_str + "}"


def add_textit(text):
    return r"\textit{" + text + "}"


def add_bold_font(text):
    return r"\textbf{" + text + "}"


def add_parbox(text, size):
    return r"\parbox[t]{" + size + "}{" + text + "}"
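
To make the output of these helpers concrete, the sketch below re-declares two of them so that it runs on its own and applies them to invented inputs. The values `0.731`, `0.2`, and `"Beispiel"` are illustrative only.

```python
def bold_formatter_thresh(num, thresh):
    # Same logic as the helper above: bold the value once it reaches the threshold.
    num_str = str(round(num, 2))
    if num < thresh:
        return num_str
    return r"\textbf{" + num_str + "}"


def add_parbox(text, size):
    # Same logic as the helper above: wrap text in a top-aligned \parbox.
    return r"\parbox[t]{" + size + "}{" + text + "}"


print(bold_formatter_thresh(0.731, 0.5))  # \textbf{0.73}
print(bold_formatter_thresh(0.2, 0.5))  # 0.2
print(add_parbox("Beispiel", r".45\textwidth"))  # \parbox[t]{.45\textwidth}{Beispiel}
```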

In [None]:
thresh = {
    "Anti-Elite": 0.5013018,
    "People-Centric": 0.5017193,
    "Host-Left": 0.42243505,
    "Host-Right": 0.38281676,
}
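
These per-class thresholds can also be used to turn probabilities into binary predictions. The following is a minimal sketch under that assumption: the two probability rows are invented, `toy_probs` and `predicted` are hypothetical names, and the comparison mirrors the `num < thresh` check in `bold_formatter_thresh`, so a value equal to the threshold counts as predicted.

```python
import numpy as np
import pandas as pd

# Thresholds copied from the cell above.
thresh = {
    "Anti-Elite": 0.5013018,
    "People-Centric": 0.5017193,
    "Host-Left": 0.42243505,
    "Host-Right": 0.38281676,
}

# Hypothetical probabilities for two sentences (not model output).
toy_probs = pd.DataFrame(
    np.array([[0.91, 0.12, 0.05, 0.02], [0.30, 0.75, 0.45, 0.10]]),
    columns=list(thresh),
)

# Compare each column with its own threshold; the Series aligns on column names.
predicted = toy_probs.ge(pd.Series(thresh), axis=1)
print(predicted)
```

In this toy example the second sentence is flagged as both People-Centric and Host-Left, since each class is thresholded independently.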

In [None]:
table = pd.concat([df, probabilities_df], axis=1)

table["Label"] = table["Label"].replace(
    {
        "anti elitism": "anti-elitism",
        "people centrism": "people-centrism",
        "anti elitism & people centrism": "populism",
    }
)

for col in cols:
    table[col] = table[col].apply(lambda x: bold_formatter_thresh(x, thresh[col]))

table["English"] = table["English"].apply(add_textit)
table["Text"] = table["German"] + r"\\" + table["English"]
# table["Label"] = table["Label"].str.replace(r"&", r"\&")

table["Text"] = table["Text"].apply(lambda x: add_parbox(x, r".45\textwidth"))
table["Source"] = table["Source"].apply(lambda x: add_parbox(x, r".1\textwidth"))
table["Label"] = table["Label"].apply(add_bold_font)
table["Label"] = table["Label"].apply(lambda x: add_parbox(x, r".07\textwidth"))

table = table.set_index(["Source", "Label", "Text"])
tex = (
    table[["Anti-Elite", "People-Centric", "Host-Left", "Host-Right"]]
    .style.set_table_styles(
        [
            {"selector": "toprule", "props": ":toprule;"},
            {"selector": "bottomrule", "props": ":bottomrule;"},
        ]
    )
    .to_latex()
)

lines = tex.splitlines()
new = []
for i, line in enumerate(lines, 1):
    if i == 1:
        line = r"\begin{longtable}{p{.1\textwidth}p{.07\textwidth}p{.45\textwidth}p{.04\textwidth}p{.04\textwidth}p{.04\textwidth}p{.04\textwidth}}\\"
    if i == len(lines):
        line = r"\end{longtable}"
    if i == 3:
        line = r""" Source & Label & Text & Anti-Elite & People-Centric & Host-Left & Host-Right \\
\midrule
\endhead
"""
    if i == 4:
        continue

    # Raw strings keep the LaTeX backslashes literal and avoid invalid-escape warnings.
    line = line.replace(r"\multirow[c]", r"\multirow[t]")

    if 4 < i < len(lines) - 2:
        line = line + r"\midrule"
    new.append(line)

tex = "\n".join(new)

(out_path / "external_sents.tex").write_text(tex)

print(tex)

\begin{longtable}{p{.1\textwidth}p{.07\textwidth}p{.45\textwidth}p{.04\textwidth}p{.04\textwidth}p{.04\textwidth}p{.04\textwidth}}\\
\toprule
 Source & Label & Text & Anti-Elite & People-Centric & Host-Left & Host-Right \\
\midrule
\endhead

\multirow[t]{7}{*}{\parbox[t]{.1\textwidth}{Schürmann and Gründl 2022}} & \multirow[t]{7}{*}{\parbox[t]{.07\textwidth}{\textbf{populism}}} & \parbox[t]{.45\textwidth}{"Die da oben" bestimmen über "uns hier unten"? Dieses Gefühl der Ohnmacht vieler Bürger will die AfD aufheben. Einer der mir wichtigsten Punkte in unserem Wahlprogramm ist deshalb dieser: Wir wollen dem Volk das Recht geben, den Abgeordneten auf die Finger zu schauen und vom Parlament beschlossene Gesetze zu ändern oder abzulehnen. Das Volk soll auch die Möglichkeit erhalten, eigene Gesetzesinitiativen einzubringen und per Volksabstimmung zu beschließen. \\\textit{The one up there” controls “us down here”? This feeling of disempowerment for many citizens is what the AfD wants to overc