**APPENDIX**

**CODEBOOK ON POLARIZATION SCORE**

This notebook was not used in the current project in this final form. Its purpose was to arrive at a better prompt for the GPT coder, to generate a codebook for a human annotator, or to generate code that annotates the file directly using Python libraries such as re. Although the rule-based code was not used for the final annotation, it was useful for building the GPT prompt and was added as context for the task.

**Structure:**

Codebook intro: Explains the purpose of the codebook as a guide for human or AI annotators, based on a scale used by an international research group on political polarization.

Codebook Table: A detailed markdown table lists each polar_score from -7 to +7 (plus "INV"). For each score, it provides:
	•	The corresponding pole_label (left, right, neutral, inv).
	•	A description of the type of content that falls into that category.
	•	Concrete examples of tweets in the U.S. political context.

Coding Criteria: Outlines the key elements annotators should consider: Tone, Target, Intent, Lexicon, and References.

Rule-Based Python Function: Includes a Python function, classify_polarization, that uses regular expressions (re) to attempt a rule-based classification based on keywords and phrases from the codebook. While not used for the final classification, this serves as a practical implementation of the coding logic and was used to help formulate the LLM prompts.

**Codebook**
**Appendix: Polarization Coding Codebook (U.S. Context)**

**Overview**

This codebook presents the criteria used to manually label tweets and messages along a 15-point political polarization scale, ranging from –7 (extreme left pole) to +7 (extreme right pole). Each entry is assigned two labels:
	•	polar_score: a numerical value from –7 to +7 or 'INV' for irrelevant content.
	•	pole_label: a categorical tag: 'left', 'right', 'neutral', or 'inv'.
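
The relationship between the two labels can be made explicit. The sketch below rests on an assumption read off the Guidelines table: negative scores carry 'left', zero carries 'neutral', positive scores carry 'right', and 'INV' carries 'inv'. The helper name `pole_label_for` is hypothetical and is not part of the original notebook.

```python
def pole_label_for(polar_score):
    """Derive pole_label from polar_score (hypothetical helper, not from the notebook)."""
    if polar_score == 'INV':
        return 'inv'
    score = int(polar_score)  # raises ValueError for non-numeric input
    if not -7 <= score <= 7:
        raise ValueError(f"polar_score out of range: {polar_score!r}")
    if score < 0:
        return 'left'
    if score > 0:
        return 'right'
    return 'neutral'
```

A check like this could be run over a hand-labelled file to catch rows where the two labels disagree, for example a score of +3 tagged 'left'.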

The coding was based on a combination of ideological stance, affiliation cues, and sentiment direction toward political actors (Democratic vs. Republican), policies, or broader partisan coalitions. This coding scheme is part of a broader exercise in labelling political polarization across different countries, led by the Universidad Autónoma de Madrid in Spain and followed by research groups around the world.

**Guidelines**

| Score | Label | Description | Examples in U.S. context |
|---|---|---|---|
| –7 | left | Strongly negative toward both Democrats and Republicans from a leftist perspective | “Both parties are corporate puppets — real change comes from the grassroots left.” |
| –6 | left | Strongly negative toward Republicans, highly emotional tone, explicitly favoring far-left | “MAGA fascists are ruining democracy — time for radical progressive reform.” |
| –5 | left | Strongly negative toward Republicans only | “Trump and the GOP are a threat to civil rights and the environment.” |
| –4 | left | Negative toward Democrats (moderates) from the left | “Biden is just a neoliberal — no different from Reagan economically.” |
| –3 | left | Moderate criticism of Democratic figures from the left | “Why is Biden siding with corporations again? Not very progressive.” |
| –2 | left | Positive/energizing tone for the left, without attacking the right | “We’re building a better future with progressive values!” |
| –1 | left | Cooperative or unifying tone that slightly favors the left | “Democrats and Republicans must work together — but Dems are more trustworthy.” |
| 0 | neutral | Neutral, factual, non-partisan, or purely promotional content | “Tune in at 9 PM for tonight’s presidential debate.” |
| +1 | right | Cooperative/unifying tone slightly favoring the right | “It’s time for bipartisan solutions — Republicans have a stronger economic plan.” |
| +2 | right | Positive message for right-wing identity or political movement | “We stand for conservative values that made this country great.” |
| +3 | right | Moderate criticism of Republicans (e.g., RINOs) from a right-wing stance | “The GOP is losing its spine — we need real conservatives like Trump.” |
| +4 | right | Negative toward moderate Republicans from the right | “Establishment Republicans are just Democrats in disguise.” |
| +5 | right | Strongly negative toward Democrats only | “The radical left wants to destroy traditional America.” |
| +6 | right | Highly emotional, polarized attack on Democrats and progressives | “Biden and his socialist cronies are turning this into Venezuela.” |
| +7 | right | Strongly negative toward both Democrats and Republicans from a far-right perspective | “The entire political class — Dems and RINOs alike — are corrupt globalist elites.” |
| INV | inv | Message is off-topic, irrelevant to the debate, spam, or unintelligible | “Thanks, Angel! Do you have it in a file?” |
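
Since the codebook was supplied to GPT as context, it may help to see how rows like these could be turned into prompt text. The following is a minimal sketch under stated assumptions: `CODEBOOK_ROWS`, `render_codebook`, and `build_prompt` are hypothetical names, only four rows are copied from the table for brevity, and the prompt wording is illustrative rather than the prompt actually used in the project.

```python
# Four rows copied from the codebook table; the remaining rows would follow the same shape.
CODEBOOK_ROWS = [
    (-5, 'left', 'Strongly negative toward Republicans only'),
    (0, 'neutral', 'Neutral, factual, non-partisan, or purely promotional content'),
    (5, 'right', 'Strongly negative toward Democrats only'),
    ('INV', 'inv', 'Message is off-topic, irrelevant to the debate, spam, or unintelligible'),
]

def render_codebook(rows):
    """Render codebook rows as plain-text lines suitable for pasting into a prompt."""
    lines = []
    for score, label, description in rows:
        # Show an explicit sign on non-zero numeric scores; leave 0 and 'INV' as they are.
        shown = f"{score:+d}" if isinstance(score, int) and score != 0 else str(score)
        lines.append(f"{shown} ({label}): {description}")
    return "\n".join(lines)

def build_prompt(rows, message):
    """Assemble a hypothetical annotation prompt; the wording is illustrative only."""
    return (
        "Label the message with a polar_score and a pole_label "
        "using this codebook:\n"
        f"{render_codebook(rows)}\n\n"
        f"Message: {message}"
    )
```

Keeping the rows as data rather than as a hard-coded string means the same source could feed both a human-readable table and the model prompt.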


**Coding Criteria**

Coders should evaluate messages based on:
	•	Tone: Is the message aggressive, satirical, or uplifting?
	•	Target: Who is being praised or criticized (left, right, both, neither)?
	•	Intent: Is the goal to promote, criticize, unite, or mock?
	•	Lexicon: Does it use partisan labels (e.g., MAGA, woke, neoliberal)?
	•	References: Mentions of parties, figures (e.g., Trump, Biden), or ideological frames.

**Additional Notes**
	•	Messages may express ideological criticism across axes (e.g., anti-system, anti-elite, libertarian). Use –7 or +7 only when this critique targets both major parties from a distinct pole.
	•	If unclear whether a message supports or attacks either side, assign 0 or 'INV' depending on clarity and relevance.
	•	Humor, irony, or sarcasm must be interpreted in context. When ambiguous, err on the side of caution (e.g., 0 or INV).


In [None]:
import re

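def classify_polarization(text):
    """Return a (polar_score, pole_label) pair from keyword and regex rules.

    Rules are checked from the left pole to the right pole; the first match wins.
    """
    # The phrase lists below use the typographic apostrophe (’), so straight
    # apostrophes are mapped to it to let both spellings match.
    text = str(text).lower().replace("'", "’")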

    # Invalid: No clear connection to debate or core political actors
    if not any(word in text for word in ['trump', 'biden', 'kamala', 'harris', 'debate', 'president', 'maga']):
        return 'INV', 'inv'

    # Extreme Left (-7 to -6)
    if any(phrase in text for phrase in [
        'trump is dumb', 'fascist', 'maga cult', 'nazi propaganda', 'dictator trump',
        'felon standing next to him', 'convicted felon', 'trump has lost it',
        'blood on trump’s hands', 'trump = war', 'hebephrenic', 'destroying america',
        'trump just lies', 'trump just admitted', 'trump summoned the mob',
        'abc lets trump lie', 'trump cremates americans', 'trump is a danger'
    ]):
        return -6, 'left'
    if re.search(r"(trump).+?(liar|dangerous|insane|traitor|nazi|fascist|threat)", text):
        return -6, 'left'

    # Strong Left (-5 to -4)
    if any(phrase in text for phrase in [
        'kamala destroyed trump', 'kamala dismantled your candidate',
        'kamala crushed', 'kamala wins', 'kamala annihilated', 'president harris',
        'trump handed his ass', 'debate disaster for trump', 'kamala drone show',
        'kamala owns trump', 'kamala should point out', 'kamala on offense',
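        # 'kamala with antifa' is kept only in the Strong Right list below,
        # where its framing belongs; listed here it made that entry unreachable.
        'retweet so all americans see this', 'trump swallowing bait',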
        'trump admitted', 'moderator calls out trump', 'putin is shaking', 'kamala leads'
    ]):
        return -5, 'left'

    # Moderate Left (-3 to -2)
    if any(phrase in text for phrase in [
        'vote biden', 'kamala for president', 'biden saves america',
        'retweet if you support kamala', 'debunked', 'trump unhinged', 'biden better than',
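        # A bare 'retweet if' also matched pro-Trump calls to retweet, so only the
        # specific phrase 'retweet if you support kamala' above is kept.
        'quiet biden over lying trump', 'biden speaks slower with feeling',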
        'joe has a stutter', 'biden speaks with genuine feeling'
    ]):
        return -3, 'left'
    # Whole-word match, so that "wonder" or "won't" does not count as "won".
    if 'biden' in text and re.search(r"\b(?:won|better|stronger)\b(?!['’])", text):
        return -2, 'left'

    # Mild Left (-1)
    if any(phrase in text for phrase in [
        'moderators go easier on trump', 'trump got help from moderators',
        'abc lets trump talk nonsense', 'trump lies', 'biden trying to tell the truth',
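        # 'kamala should point out' is already matched by the Strong Left list
        # above, so it never reached this rule and is not repeated here.
        'trump doesn’t care about truth'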
    ]):
        return -1, 'left'

    # Neutral (0)
    if any(phrase in text for phrase in [
        'debate coverage', 'breaking', 'reported', 'cnn debate', 'live coverage',
        'presidential debate', 'kamala and trump debate', 'debate night',
        'debate2024', 'live', 'spin room', 'town hall', 'debate tips', 'post debate'
    ]):
        return 0, 'neutral'

    # Mild Right (+1)
    # Whole-word match, so that "wonder" or "won't" does not count as "won".
    if 'trump' in text and re.search(r"\b(?:won|better|stronger)\b(?!['’])", text):
        return 1, 'right'

    # Moderate Right (+2 to +3)
    if any(phrase in text for phrase in [
        'vote trump', 'make america great', 'maga rally', 'trump wins debate',
        'kamala clown', 'open borders', 'project 2025 is not real', 'trump on fire',
        'debate beast', 'pack it up dude belongs in a bed', 'kamala hasn’t done anything',
        'kamala clueless', 'kamala annoying', 'kamala failed america',
        'trump leads with numbers', 'debate unfair to trump', 'abc moderators biased',
        'kamala = communism', 'biden crime family'
    ]):
        return 3, 'right'

    # Strong Right (+4 to +5)
    if any(phrase in text for phrase in [
        'hunter laptop', 'biden inflation', 'trump crushed it', 'trump should cede time to biden',
        'kamala lying', 'fake news', 'deep state', 'charlottesville hoax',
        'stand back and stand by lie', 'trump said what we’re all thinking',
        'abortion lies', 'kamala with antifa', 'trump is debate king', 'kamala confused',
        'kamala communist', 'vp harris puppet'
    ]):
        return 5, 'right'
    if re.search(r"(biden|kamala).+?(communist|liar|hoax|traitor|fake)", text):
        return 5, 'right'

    # Extreme Right (+6 to +7)
    if any(phrase in text for phrase in [
        'godless commies', 'pedo joe', 'biden has alzheimers', 'bloodbath',
        'biden = war', 'biden = puppet', 'biden is a disgrace', 'biden is not mentally fit',
        'biden destroys america', 'biden kills economy', 'kamala = puppet',
        'biden created inflation', 'bring back trump', 'bidenomics is crushing us'
    ]):
        return 6, 'right'

    # Default fallback: neutral if content is ambiguous but political
    return 0, 'neutral'