# Exploring the Manifesto Project

The [Manifesto Project](https://manifesto-project.wzb.eu/) analyses parties’ election manifestos to study their policy preferences.

This notebook explores the [`manifestoberta` model](https://manifesto-project.wzb.eu/information/documents/manifestoberta) for classifying political text.

In [1]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer



In [2]:
model = AutoModelForSequenceClassification.from_pretrained("manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2023-1-1")



In [4]:
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

In [22]:
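import torch

def manifestoberta_predict_class(sentence: str, model, tokenizer, top_class: int = 1) -> str:
    """Return the label of the `top_class`-th most likely category (1 = most likely)."""
    inputs = tokenizer(sentence,
                       return_tensors="pt",
                       max_length=200,  # input was limited to 200 tokens during fine-tuning
                       padding="max_length",
                       truncation=True)

    # Inference only, so skip gradient tracking.
    with torch.no_grad():
        logits = model(**inputs).logits

    # To inspect all class probabilities instead of a single label:
    # probabilities = torch.softmax(logits, dim=1).tolist()[0]
    # probabilities = {model.config.id2label[index]: round(probability * 100, 2) for index, probability in enumerate(probabilities)}
    # probabilities = dict(sorted(probabilities.items(), key=lambda item: item[1], reverse=True))

    # topk returns indices sorted by descending logit, so rank k sits at position k - 1.
    ranked_indices = logits.topk(top_class).indices[0]
    predicted_class = model.config.id2label[ranked_indices[top_class - 1].item()]

    return predicted_class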
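
The commented-out lines in the function hint at how the full probability distribution could be inspected instead of a single label. The following is a minimal sketch of that computation in plain Python, so it runs without downloading the model. The `rank_labels` helper, the logits, and the three-entry `toy_id2label` mapping are made-up stand-ins for illustration, not real model output.

```python
import math

def rank_labels(logits, id2label):
    """Convert raw logits to percentages and sort labels by descending probability."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probabilities = {id2label[i]: round(100 * e / total, 2) for i, e in enumerate(exps)}
    return dict(sorted(probabilities.items(), key=lambda item: item[1], reverse=True))

# Hypothetical values for illustration only.
toy_id2label = {0: "411 - Technology and Infrastructure",
                1: "501 - Environmental Protection: Positive",
                2: "202 - Democracy"}
toy_logits = [2.0, 1.0, -1.0]

ranked = rank_labels(toy_logits, toy_id2label)
for label, percent in ranked.items():
    print(f"{percent:6.2f}%  {label}")
```

With the real model, the logits would presumably come from `model(**inputs).logits[0].tolist()` and the mapping from `model.config.id2label`.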

In [23]:
# English: "A general speed limit should apply on all motorways."
sentence = "Auf allen Autobahnen soll ein generelles Tempolimit gelten."
manifestoberta_predict_class(sentence, model, tokenizer, top_class=1)

'411 - Technology and Infrastructure'

In [24]:
import pandas as pd

wahlomat_all = pd.read_csv("data/wahlomat_responses_2021/wahlomat_2021.csv")

wahlomat_statements = pd.DataFrame({"statements": wahlomat_all["statement"].unique(),
                         "predicted_class_1" : None,
                         "predicted_class_2" : None})

In [25]:
wahlomat_statements["predicted_class_1"] = wahlomat_statements["statements"].apply(lambda statement: manifestoberta_predict_class(statement, model, tokenizer, top_class=1))
wahlomat_statements["predicted_class_2"] = wahlomat_statements["statements"].apply(lambda statement: manifestoberta_predict_class(statement, model, tokenizer, top_class=2))
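
Each of the two assignments runs the model once per statement, so every statement is classified twice. Since the first and second choice come from the same logits, ranking once per statement would be enough. A minimal sketch of that idea follows; `top_k_labels` is a hypothetical helper, and the label mapping and logits are made-up stand-ins for `model.config.id2label` and real model output.

```python
def top_k_labels(logits, id2label, k=2):
    """Return the labels of the k largest logits, most likely first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [id2label[i] for i in order[:k]]

# Hypothetical stand-ins for illustration only.
toy_id2label = {0: "104 - Military: Positive",
                1: "105 - Military: Negative",
                2: "202 - Democracy"}
toy_logits_per_statement = {
    "statement A": [3.1, 1.4, -0.5],
    "statement B": [-1.0, 0.2, 2.7],
}

rows = []
for statement, logits in toy_logits_per_statement.items():
    # One ranking per statement yields both columns.
    first, second = top_k_labels(logits, toy_id2label, k=2)
    rows.append({"statements": statement,
                 "predicted_class_1": first,
                 "predicted_class_2": second})

for row in rows:
    print(row)
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)`.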

In [26]:
wahlomat_statements

| | statements | predicted_class_1 | predicted_class_2 |
|---|---|---|---|
| 0 | Auf allen Autobahnen soll ein generelles Tempo... | 411 - Technology and Infrastructure | 501 - Environmental Protection: Positive |
| 1 | Deutschland soll seine Verteidigungsausgaben e... | 104 - Military: Positive | 105 - Military: Negative |
| 2 | Bei Bundestagswahlen sollen auch Jugendliche a... | 202 - Democracy | 706 - Non-economic Demographic Groups |
| 3 | Die Förderung von Windenergie soll beendet wer... | 501 - Environmental Protection: Positive | 416 - Anti-Growth Economy: Positive |
| 4 | Die Möglichkeiten der Vermieterinnen und Vermi... | 412 - Controlled Economy | 403 - Market Regulation |
| 5 | Impfstoffe gegen Covid-19 sollen weiterhin dur... | 403 - Market Regulation | 411 - Technology and Infrastructure |
| 6 | Der für das Jahr 2038 geplante Ausstieg aus de... | 501 - Environmental Protection: Positive | 416 - Anti-Growth Economy: Positive |
| 7 | Alle Erwerbstätigen sollen in der gesetzlichen... | 504 - Welfare State Expansion | 503 - Equality: Positive |
| 8 | Das Recht anerkannter Flüchtlinge auf Familien... | 608 - Multiculturalism: Negative | 601 - National Way of Life: Positive |
| 9 | Auf den Umsatz, der in Deutschland mit digital... | 403 - Market Regulation | 412 - Controlled Economy |


In [27]:
wahlomat_statements.to_csv("data/wahlomat_responses_2021/wahlomat_2021_predicted_classes.csv", index=False)
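
Each predicted label combines a category code and a name separated by ` - `, which is awkward to group or count on. A minimal sketch of splitting the two parts and tallying first-choice categories follows, assuming every label follows the `code - name` pattern seen in the output above. `split_label` is a hypothetical helper, and the list reuses the ten first-choice labels from the table.

```python
from collections import Counter

def split_label(label):
    """Split 'code - name' into its two parts; the code stays a string."""
    code, name = label.split(" - ", 1)
    return code, name

# First-choice labels copied from the prediction table.
first_choices = [
    "411 - Technology and Infrastructure",
    "104 - Military: Positive",
    "202 - Democracy",
    "501 - Environmental Protection: Positive",
    "412 - Controlled Economy",
    "403 - Market Regulation",
    "501 - Environmental Protection: Positive",
    "504 - Welfare State Expansion",
    "608 - Multiculturalism: Negative",
    "403 - Market Regulation",
]

# Count how often each category code was the top prediction.
code_counts = Counter(split_label(label)[0] for label in first_choices)
for code, count in code_counts.most_common():
    print(code, count)
```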