# Social Representations and Boundaries of Humor: A Focus on Gender Roles

## Research questions: 

1) How are men and women depicted in New Yorker cartoons and captions, and do these depictions reflect traditional gender roles or stereotypes?

2) How does audience response (e.g., votes or winning captions) relate to gendered content—do captions about one gender receive more positive attention, and does this reinforce or challenge stereotypes?

## Structure: How do we answer these questions? (To complete)

**Step 1:** Detect gendered references in each sentence and assign the sentence a gender label (male, female, both, neutral).

*Method*: I started from two existing lists of gendered words. Because I wanted a longer list, I manually augmented them with universal gendered words and contextual gender markers. I then added further words based on the vocabulary that actually appears in the dataset.
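
The labelling in Step 1 can be sketched as a simple lexicon lookup. This is only a minimal illustration, not the notebook's actual implementation: the two word sets below are tiny hypothetical stand-ins for the real (much longer) lists, and `tokenize` and `detect_gender` are names invented for this example.

```python
import re

# Hypothetical, deliberately tiny lexicons; the real lists are much longer.
MALE_TERMS = {"he", "him", "his", "man", "men", "husband", "father", "boy", "king"}
FEMALE_TERMS = {"she", "her", "hers", "woman", "women", "wife", "mother", "girl", "queen"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def detect_gender(text):
    """Return 'male', 'female', 'both' or 'neutral' for one sentence."""
    tokens = set(tokenize(text))
    has_male = bool(tokens & MALE_TERMS)
    has_female = bool(tokens & FEMALE_TERMS)
    if has_male and has_female:
        return "both"
    if has_male:
        return "male"
    if has_female:
        return "female"
    return "neutral"


print(detect_gender("My husband said no."))     # male
print(detect_gender("She told him to leave."))  # both
print(detect_gender("This has 'Alice in Wonderland' beat by a mile."))  # neutral
```

Matching whole tokens rather than substrings avoids false positives such as finding "his" inside "this". Proper names like "Alice" stay neutral here because they are not in the toy lexicons.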

## Initialisation of the absolute GitHub repository path

In [54]:
from pathlib import Path
import sys

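# Start from this file's folder when run as a script, else from the working directory
root = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()

# Walk up until a folder holding all three repository markers is found.
# If no folder matches, the loop ends with root at the filesystem (drive) root.
while root.parent != root:
    if (
        (root / ".git").exists()
        and (root / "README.txt").exists()
        and (root / "results.ipynb").exists()
    ):
        break
    root = root.parent

# Make modules in the root folder importable
if str(root) not in sys.path:
    sys.path.insert(0, str(root))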

print("Root folder at: ", root)

Root folder at:  d:\
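
The recorded output `d:\` may indicate that no folder contained all three markers, so the search walked up to the drive root. A hedged variant is sketched below: it returns `None` instead of silently falling back to the filesystem root. The function name `find_repo_root` and its `markers` parameter are assumptions made for this illustration, not part of the project.

```python
from pathlib import Path


def find_repo_root(start, markers=(".git",)):
    """Return the nearest folder (start or an ancestor) containing every
    marker, or None when no such folder exists."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if all((candidate / marker).exists() for marker in markers):
            return candidate
    return None


# Example call; the result depends on where this code is run.
repo_root = find_repo_root(Path.cwd())
if repo_root is None:
    print("No repository root found; check the marker names.")
else:
    print("Root folder at:", repo_root)
```

Returning `None` makes a failed search explicit, so a wrong path is not silently added to `sys.path`.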


## Imports

In [42]:
# working libraries
import os
import pickle
import csv

# basics
import pandas as pd
import numpy as np

# plots
import matplotlib.pyplot as plt
import seaborn as sns

# text processing libraries
import nltk
import spacy

## Loading the data



In [None]:
stored_dataprep_pkl_path = r'D:\GitHub\ada-2025-project-adacore42\data\data_prepared.pkl'

with open(stored_dataprep_pkl_path, 'rb') as f:
    data = pickle.load(f)

In [44]:
# Extract the objects in the pickle

# dataA is a list of pandas DataFrames (or a similar object, such as a dictionary of DataFrames).
# Each element of the list is a DataFrame with 7 columns and a variable number of rows.
dataA = data['dataA']
# dataC is a DataFrame holding the metadata of all cartoon contests.
dataC = data['dataC']
dataA_startID = data['dataA_startID']
dataA_endID = data['dataA_endID']
dataC_lastGoodID = data['dataC_lastGoodID']

In [50]:
def drop_NaN(dataA, dataC):
    """
    Find the contests that have no metadata and drop them from both dataA
    and dataC.
    Input: dataA, dataC
    Return: dataA_removed, dataC_removed
    """
    dataC_copy = dataC.copy(deep=True)

    # Find the rows whose metadata (image_descriptions) is NaN
    NaN_in_rows = dataC_copy[dataC_copy['image_descriptions'].isna()].index
    # Remove them from dataC
    dataC_copy.dropna(subset=['image_descriptions'], inplace=True)
    # Remove the corresponding contests from dataA
    # (assumes dataC uses the default 0..n-1 index, so labels match list positions)
    dataA_removed = [x for i, x in enumerate(dataA) if i not in NaN_in_rows]

    return dataA_removed, dataC_copy

def get_absolute_index():
    return None

In [51]:
dataA_removed, dataC_removed = drop_NaN(dataA, dataC)

In [52]:
print(f"Length dataA: {len(dataA_removed)}\nShape dataC: {dataC_removed.shape}")

Length dataA: 240
Shape dataC: (240, 12)


In [53]:
dataC_removed

Unnamed: 0,num_captions,num_votes,image_locations,image_descriptions,image_uncanny_descriptions,entities,questions,date,cleaned_image_locations,cleaned_questions,cleaned_image_uncanny_descriptions,cleaned_image_descriptions
0,3905.0,41185.0,[the street],[A man is relaxing on a city street. Others ar...,[A man is just laying in the middle of the sid...,[https://en.wikipedia.org/wiki/Bystander_effec...,[Why is he laying there?],NaT,street,laying,man laying middle sidewalk,man relaxing city street others going business...
1,3325.0,28205.0,"[the front hard, a residential walkway]",[A man in a winter coat and cap is looking at ...,[It's unusual to see someone holding a snow sh...,"[https://en.wikipedia.org/wiki/Snowball_fight,...",[Is the man overly small or the shovel overly ...,NaT,front hard residential halfway,man overlay small shovel overlay big boy huge ...,unusual see someone holding snow shovel way ma...,man winter coat cap looking small bearded man ...
2,4399.0,21574.0,"[yoga place, a yoga studio]",[A man and woman are standing facing one anoth...,[Nothing is really out of place in this image....,"[https://en.wikipedia.org/wiki/Rug, https://en...","[Why is the man carrying a huge rug?, Why is t...",2016-03-21,place studio,man carrying huge rug man trying use living ro...,nothing really place image man huge rug big st...,man woman standing facing one another mirror i...
3,4141.0,16894.0,"[a workplace, an elevator]",[Three business men are walking down a hall. T...,[A suit case is usually carried by one person ...,[https://en.wikipedia.org/wiki/Worker_cooperat...,[Why is the briefcase big enough for three peo...,2016-03-27,workplace elevator,briefcase big enough three people carrying car...,suit case usually carried one person three sup...,three business men walking hall carrying brief...
4,3951.0,95790.0,[plains],[Some cowboys are riding through the desert. T...,[There are rocking horses in place of real hor...,"[https://en.wikipedia.org/wiki/Rocking_horse, ...",[Why is this chase taking place?],2016-04-03,plain,chase taking place,rocking horse place real horse,cowboy riding desert rocking horse
...,...,...,...,...,...,...,...,...,...,...,...,...
247,9411.0,779033.0,"[the ocean, the sea]",[Two fish are in the ocean and they seem to be...,[One of the fish that is mounted on the wall l...,[https://en.wikipedia.org/wiki/Big_Mouth_Billy...,[How is the mounted fish still alive and able ...,2021-06-28,ocean sea,mounted fish still alive able swim fish alive ...,one fish mounted wall like trophy still alive ...,two fish ocean seem discussion one fish normal...
248,7393.0,807956.0,[a restaurant],[Two identical waiters approach a couple sitti...,[A waiter is carrying a garbage can to a table...,"[https://en.wikipedia.org/wiki/Waiting_staff, ...",[Why is the waiter bringing the couple garbage...,2021-07-05,restaurant,waiter bringing couple garage waiter bringing ...,waiter carrying garage table restaurant water ...,two identical water approach couple sitting on...
249,10712.0,926127.0,[a field],[Death is giving some food to a man. He is say...,[Death doesn't usually give things to people.],[https://en.wikipedia.org/wiki/Death_(personif...,[Are they neighbors?],2021-07-19,field,neighbor,death usually give thing people,death giving food man saying something woman side
250,7076.0,901611.0,"[experimental facility, hamster cage]",[A mouse is conducting a hamster wheel experim...,[The rat is holding a clipboard and standing u...,"[https://en.wikipedia.org/wiki/Mouse, https://...","[Why is the mouse conducting the experiment?, ...",2021-07-26,experimental facility master cage,mouse conducting experiment saying turtle turt...,rat holding cupboard standing upright mouse co...,mouse conducting master wheel experiment mouse...


## Step 0: Augment the gendered lists

## Step 1: Detect gender

Use gender lexicons, but we need to define them first. 

For P2, I used two small lists of common gendered terms. For P3, I want to extend them based on the terms that are actually used in the contests!
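
One possible way to ground the extension in the data is to count the words that occur in the captions and keep only the candidate gendered terms that really appear. The sketch below is an assumption-laden illustration: `vocabulary_counts`, `candidates_in_data`, the toy captions and the candidate set are all hypothetical, and in the notebook the captions would presumably come from the `caption` column of each DataFrame in `dataA`.

```python
import re
from collections import Counter


def vocabulary_counts(captions):
    """Count how often each lowercase word occurs across all captions."""
    counts = Counter()
    for caption in captions:
        counts.update(re.findall(r"[a-z]+", caption.lower()))
    return counts


def candidates_in_data(candidates, counts, min_count=1):
    """Keep only the candidate terms seen at least min_count times."""
    return {word for word in candidates if counts[word] >= min_count}


# Toy captions and candidate terms, invented for the example
captions = [
    "The waitress smiled.",
    "My uncle and the waitress left.",
    "No comment.",
]
candidates = {"waitress", "uncle", "duchess"}

counts = vocabulary_counts(captions)
print(sorted(candidates_in_data(candidates, counts)))  # ['uncle', 'waitress']
```

Raising `min_count` would filter out rare terms, which keeps the extended lexicon focused on words that can actually affect the analysis.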

In [27]:
# load nlp from spacy
nlp = spacy.load("en_core_web_sm")

## Other code

In [21]:
dataA[0]['caption'][3901]

"This has 'Alice in Wonderland' beat by a mile."

In [22]:
example = dataA[0]['caption'][3901]
doc = nlp(example)

In [23]:
doc

This has 'Alice in Wonderland' beat by a mile.

In [24]:
tokens = [token.text for token in doc]
print(tokens)

['This', 'has', "'", 'Alice', 'in', 'Wonderland', "'", 'beat', 'by', 'a', 'mile', '.']


In [25]:
pos_tagged = [(token.text, token.pos_) for token in doc]
print(pos_tagged)

[('This', 'PRON'), ('has', 'VERB'), ("'", 'PUNCT'), ('Alice', 'PROPN'), ('in', 'ADP'), ('Wonderland', 'PROPN'), ("'", 'PUNCT'), ('beat', 'NOUN'), ('by', 'ADP'), ('a', 'DET'), ('mile', 'NOUN'), ('.', 'PUNCT')]


In [26]:
for ent in doc.ents:
    print(ent.text, ent.label_)

Alice PERSON
Wonderland GPE
a mile QUANTITY
