# Gendered Reception of Politicians in Online Political Discourse

### Computational Social Sciences

**Authors:** BELLAIS Salome, GONZALEZ DARDIK Micaela Natali, MARCULESCU Tudor, RODRIGUEZ Miguel, VIELLARD Mathilde.
**Course:** Computational Social Sciences (2025–2026)

---

## Abstract

Online social media platforms are key arenas for political debate, shaping public perception of political figures. Prior research indicates that female politicians often face different forms of evaluation and criticism than their male counterparts, frequently involving gendered language and personal attacks. However, limited data availability can restrict how broadly such findings generalize.

In this project, we focus on the online reception of selected French political figures, primarily Marine Le Pen and Emmanuel Macron, with additional data on a male politician from a similar ideological background. Using computational methods, we analyze textual content and interaction patterns on Twitter to examine differences in sentiment, toxicity, thematic focus, and network dynamics.

By combining natural language processing techniques with graph-based analysis, this study investigates how gender and political alignment intersect in shaping online discourse, while explicitly acknowledging the methodological and ethical limitations inherent in computational social science approaches.

## 1. Introduction

Social media platforms play a central role in contemporary political communication, allowing direct interaction between political figures and the public. While these platforms can foster political engagement, they also expose public figures to large volumes of unmoderated commentary, including harassment and hate speech.

Gender bias in political communication has been documented in traditional media, where women are often evaluated based on personal attributes rather than political positions. Online platforms introduce additional dynamics such as anonymity, virality, and network effects, which may amplify these biases.

In this project, we focus on the online reception of selected French political figures, primarily Marine Le Pen and Emmanuel Macron, with additional data on a male politician from a similar ideological background. Using computational tools, we aim to systematically analyze large-scale online discourse and examine how gender and political alignment intersect in shaping public perception and online commentary.

## 2. Research Question and Hypotheses

### Research Question

How does the online reception of Marine Le Pen differ from that of Emmanuel Macron on Twitter, and to what extent can observed differences be associated with gender versus political alignment?

### Sub-questions

- Are tweets referring to Marine Le Pen more negative or toxic than those referring to Emmanuel Macron?  
- Do the dominant topics differ between discussions about these politicians?  
- Are gendered or personal themes (e.g. appearance, legitimacy, personal life) more prevalent in tweets about Marine Le Pen?  
- How does the reception of Marine Le Pen compare to that of a male politician from a similar ideological background?

### Hypotheses

- **H1:** Tweets referring to Marine Le Pen exhibit higher levels of toxicity and personal attacks than those referring to Emmanuel Macron.  
- **H2:** Topic modeling reveals gender-specific themes, with tweets about Marine Le Pen more frequently referencing appearance or personal attributes.  
- **H3:** Differences in toxicity and interaction patterns persist, though they are partially reduced, when Marine Le Pen is compared with a male politician from a similar political orientation.
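
H2 and the third sub-question could be complemented by a simple dictionary-based measure: the share of tweets about each politician that contain at least one term from a "personal/appearance" lexicon. This is a swapped-in technique, not the topic modeling named in H2, and it is only a sketch. The lexicon `PERSONAL_TERMS`, the function `personal_theme_share` and the toy input are hypothetical placeholders; a real lexicon would have to be built and validated by the team.

```python
from collections import defaultdict

# Hypothetical placeholder lexicon of "personal" terms (French); a real lexicon
# would need to be curated and validated before use.
PERSONAL_TERMS = {"physique", "apparence", "robe", "coiffure", "voix", "famille", "père"}


def personal_theme_share(tweets):
    """Fraction of tweets per target politician containing at least one lexicon term.

    `tweets` is an iterable of (target, tokens) pairs with already-preprocessed tokens.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for target, tokens in tweets:
        totals[target] += 1
        if PERSONAL_TERMS.intersection(tokens):
            hits[target] += 1
    return {target: hits[target] / totals[target] for target in totals}


# Made-up toy input with placeholder targets, not data from this project
sample = [
    ("politician_A", ["programme", "robe"]),
    ("politician_A", ["voix", "immigration"]),
    ("politician_B", ["réforme", "retraites"]),
    ("politician_B", ["coiffure"]),
]
print(personal_theme_share(sample))  # {'politician_A': 1.0, 'politician_B': 0.5}
```

Exact token matching misses inflected forms and context such as irony, so a measure like this would only complement, not replace, the topic-modeling analysis.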

## 3. Methodological Overview

To address our research question, we adopt a computational approach that combines text analysis and network analysis of Twitter data, centered on Marine Le Pen and Emmanuel Macron, with a male politician from a similar ideological background as an additional point of comparison.

### Methods

- **Natural Language Processing (NLP):**
  - Text preprocessing (tokenization, cleaning, and normalization)
  - Sentiment and toxicity classification using pre-trained and lexicon-based models (e.g. Detoxify, VADER)
  - Topic modeling to identify dominant themes in tweets

- **Graph Analysis:**
  - Construction of reply and interaction networks
  - Analysis of centrality, clustering, and coordination patterns among users
  - Comparison of interaction patterns across politicians to detect potential gendered or ideological clustering
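
As a rough illustration of the preprocessing step (cleaning, normalization and tokenization), the sketch below uses only the Python standard library. The stopword set and the function name `preprocess_tweet` are hypothetical; the notebook itself relies on NLTK's stopword corpus and tokenizer, and the collaborator's actual cleaning rules may differ.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny French stopword list for illustration only;
# the notebook downloads NLTK's full stopword corpus instead.
STOPWORDS = {"le", "la", "les", "de", "des", "du", "et", "un", "une", "est", "en", "pour"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def preprocess_tweet(text: str) -> list[str]:
    """Clean, normalize and tokenize one tweet into a list of lowercase word tokens."""
    text = unicodedata.normalize("NFC", text).lower()  # unify accent encodings and case
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop @user handles
    tokens = re.findall(r"\w+", text)  # '#' is not a word character, so hashtags keep their word
    # Keep alphabetic tokens longer than one character (removes numbers and elided "l", "d")
    return [t for t in tokens if t.isalpha() and len(t) > 1 and t not in STOPWORDS]


print(preprocess_tweet("@user Le programme de #Macron pour l'économie https://t.co/xyz"))
# ['programme', 'macron', 'économie']
```

Removing handles rather than keeping them also avoids carrying user identifiers into the topic models; the reply structure is handled separately in the network analysis.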

### Computational Social Science Perspective

While computational methods allow large-scale analysis, they are not neutral. Pre-trained NLP models may encode social biases, and network structures may reflect platform-specific affordances. Our analysis explicitly considers these limitations, especially given the restricted scope of the dataset and the overlap between gender and political alignment.
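
One way to make the concern about encoded bias concrete is a template-based probe: score sentences that differ only in a gendered word and compare the averages. The sketch below is hypothetical and model-agnostic. `score_fn` stands in for any toxicity scorer (for example a wrapper that returns the `'toxicity'` value from a Detoxify model's `predict` output), and the templates, word pairs and `toy_score` function are illustrative placeholders, not part of the project.

```python
from statistics import mean
from typing import Callable, Iterable


def gender_gap(score_fn: Callable[[str], float],
               templates: Iterable[str],
               pairs: Iterable[tuple[str, str]]) -> float:
    """Mean score difference (female minus male) over all template/word-pair combinations.

    Each template holds one '{}' placeholder. A positive value means the scorer rates
    the female variant as more toxic on average. Raises StatisticsError if empty.
    """
    pairs = list(pairs)  # reused for every template
    diffs = [score_fn(t.format(f)) - score_fn(t.format(m))
             for t in templates for f, m in pairs]
    return mean(diffs)


def toy_score(sentence: str) -> float:
    # Toy scorer for illustration only: "rates" any sentence with a female term as toxic
    return 0.8 if {"Elle", "femme"} & set(sentence.split()) else 0.2


templates = ["{} a menti pendant le débat.", "{} ne comprend rien à l'économie."]
pairs = [("Elle", "Il"), ("Cette femme", "Cet homme")]
print(f"{gender_gap(toy_score, templates, pairs):.2f}")  # prints 0.60
```

With a real model plugged in, a gap clearly different from zero on neutral templates would suggest that part of any observed difference between politicians could come from the classifier rather than from the tweets.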


```python
import sys
sys.executable  # check which Python environment the kernel is running in

# Install all of these libraries before running the code:
# %pip install pandas numpy matplotlib seaborn nltk scikit-learn networkx transformers torch wordcloud textblob kagglehub detoxify
```

```text
'c:\\Users\\micag\\anaconda3\\envs\\css_full\\python.exe'
```

```python
# Install detoxify into the active kernel environment
# (the full dependency list is given in the previous cell)
%pip install detoxify

# Basic libraries
import pandas as pd
import numpy as np
import sqlite3
import os

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# NLP
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer

# Topic modeling
from sklearn.decomposition import LatentDirichletAllocation

# Graphs
import networkx as nx

# Utils
from collections import Counter

# Kagglehub for dataset download
import kagglehub

# Pre-trained models (install torch first, then transformers)
import torch
from transformers import pipeline
from detoxify import Detoxify

print("Environment ready")
```

```text
Environment ready
```


```python
nltk.download('stopwords')
nltk.download('punkt')
# Recent NLTK releases (3.9 and later) load word_tokenize's models from 'punkt_tab'
nltk.download('punkt_tab')
```

## 4. Data Sources

Due to access restrictions and recent changes in the Twitter/X API, this project relies on **publicly available datasets** from Kaggle. The data consists of tweets referring to selected French political figures, primarily Marine Le Pen and Emmanuel Macron, with additional tweets about a male politician from a similar ideological background.

Each observation includes textual content and basic interaction metadata, enabling both linguistic analysis (sentiment, toxicity, topics) and network-based analysis (reply and interaction structures). 
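
To illustrate how the interaction metadata could feed the network analysis, the sketch below builds a directed reply graph from (replier, replied-to) pairs and computes each account's in-degree centrality and the local clustering coefficient of the undirected version. The pair format and function names are assumptions, since the exact columns of the Kaggle data are not described here. In the notebook, networkx (`nx.DiGraph`, `nx.in_degree_centrality`, `nx.clustering`) provides these measures directly; this standard-library version only spells out the definitions.

```python
from collections import defaultdict
from itertools import combinations


def build_reply_graph(replies):
    """Directed reply graph as {user: set of users they replied to}; self-replies are dropped."""
    graph = defaultdict(set)
    for source, target in replies:
        if source == target:
            continue
        graph[source].add(target)
        graph.setdefault(target, set())  # users who only receive replies are still nodes
    return dict(graph)


def in_degree_centrality(graph):
    """Number of distinct repliers per user, normalized by (n - 1)."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    scale = 1 / (len(graph) - 1) if len(graph) > 1 else 0.0
    return {node: d * scale for node, d in indegree.items()}


def local_clustering(graph):
    """Local clustering coefficient of each node in the undirected version of the graph."""
    neighbours = {node: set() for node in graph}
    for source, targets in graph.items():
        for target in targets:
            neighbours[source].add(target)
            neighbours[target].add(source)
    coeffs = {}
    for node, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0
            continue
        # Count links among the node's neighbours (closed triangles through the node)
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbours[a])
        coeffs[node] = 2 * links / (k * (k - 1))
    return coeffs


# Made-up (replier, replied-to) pairs; real pairs would come from the reply metadata
replies = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a"), ("a", "a")]
graph = build_reply_graph(replies)
print(in_degree_centrality(graph))
print(local_clustering(graph))
```

Accounts with high in-degree centrality attract replies from many distinct users, while high local clustering indicates tightly knit reply groups, which is where gendered or ideological clustering would show up.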

Using open datasets supports reproducibility, transparency, and ethical compliance, while still allowing us to study real-world political discourse at scale.

```python
# Data collection (scraping) and sentiment-analysis code would go here.
# It is omitted for now because its outputs are already saved as CSV files.
```

## 5. Data Analysis Overview

We use the preprocessed dataset created by our collaborator, which contains tweets mentioning Marine Le Pen, Emmanuel Macron, and a male politician from a similar ideological background.  

The dataset already includes:  
- Cleaned text  
- Sentiment labels and scores (VADER)  
- Toxicity metrics (Detoxify)  
- Metadata such as timestamp and target politician  

This allows us to focus on comparing online reception between politicians without redoing the initial NLP preprocessing or scraping.
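
A minimal way to compare reception across politicians (H1) is to contrast mean toxicity between two targets and assess the difference with a permutation test. The sketch assumes per-tweet toxicity scores have already been split by target politician. The column names of the collaborator's CSV files are not given here, so `permutation_test` and the toy score lists are hypothetical placeholders, not results. A permutation test is used because it makes no normality assumption about the score distribution.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean scores.

    Returns (observed mean(a) - mean(b), p-value).
    """
    rng = random.Random(seed)
    a, b = list(group_a), list(group_b)
    observed = _mean(a) - _mean(b)
    pooled = a + b
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        diff = _mean(pooled[:len(a)]) - _mean(pooled[len(a):])
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up toxicity scores for two placeholder targets, not project results
scores_target_a = [0.9, 0.8, 0.7, 0.85, 0.95]
scores_target_b = [0.1, 0.2, 0.15, 0.05, 0.3]
diff, p = permutation_test(scores_target_a, scores_target_b)
print(f"mean difference = {diff:.2f}, p = {p:.4f}")
```

In the notebook, the score lists would typically come from a pandas `groupby` on the target-politician column; `scipy.stats.mannwhitneyu` is a common rank-based alternative to the permutation test.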
