# Exploring Narrative Agency in Women’s Fiction (1800–1950)

This notebook analyzes how women and gendered subjects are represented in fiction written by women (including a few works published under aliases) between 1800 and 1950.

Using Natural Language Processing (NLP), we:

- Extract subject–verb structures to capture **who does what** in the narrative.
- Analyze how often women characters are described as **agentic**, **passive**, or **neutral** in their actions.
- Visualize patterns across books, characters, and (eventually) authors or decades.

Our goal is to explore **literary agency** and feminist themes in historical fiction — from _Pride and Prejudice_ to _The Awakening_ — and uncover how these texts portrayed women taking (or losing) control of their lives.


# Step 1

## Import Required Libraries
We start by importing the necessary Python libraries for text processing, natural language parsing, counting, and visualization.

In [1]:
import pandas as pd
import spacy
from collections import defaultdict, Counter
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm

## Load spaCy Language Model

We begin by loading the `en_core_web_sm` model from [spaCy](https://spacy.io/), a popular NLP library.

This small English model is fast and lightweight, yet still powerful enough for:
- **Tokenization**
- **Part-of-speech tagging**
- **Dependency parsing**
- **Named Entity Recognition (NER)**

It's the core engine that allows us to analyze sentence structure and extract subject–verb–object relationships in the texts.

In [2]:
# The small model is much faster than the larger ones and accurate enough for dependency parsing.
nlp = spacy.load("en_core_web_sm")

print("✅ spaCy model loaded!")

✅ spaCy model loaded!
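
As a rough illustration of how a dependency parse exposes who does what, the sketch below collects subject–verb pairs from a parsed document. It relies only on the token attributes `dep_`, `head`, `text`, and `lemma_`, and on the `nsubj` and `nsubjpass` labels used by spaCy's English models. The function name `subject_verb_pairs` and the "active"/"passive" tags are illustrative assumptions, not part of this notebook's pipeline.

```python
def subject_verb_pairs(doc):
    """Return (subject, head_lemma, voice) triples from a parsed document.

    `doc` is any iterable of tokens exposing `dep_`, `head`, `text` and
    `lemma_`, such as a spaCy Doc. Hypothetical helper for illustration.
    """
    # Dependency labels that mark grammatical subjects in spaCy's English models.
    voice_by_dep = {"nsubj": "active", "nsubjpass": "passive"}
    pairs = []
    for token in doc:
        voice = voice_by_dep.get(token.dep_)
        if voice is not None:
            # The subject's syntactic head is the predicate it attaches to
            # (usually a verb, sometimes an auxiliary).
            pairs.append((token.text, token.head.lemma_, voice))
    return pairs
```

With the model loaded above, `subject_verb_pairs(nlp("Elizabeth refused the offer."))` would be expected to pair `Elizabeth` with the lemma `refuse`, although the exact parse can vary with the model version.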


## Load Processed Text Data

Next, we load a pre-processed CSV file containing text chunks from books written by or associated with women between 1800 and 1950. This dataset includes:

- **`text`**: The actual excerpt from the book.
- **`sentiment`** *(optional)*: Placeholder for sentiment analysis scores (currently empty).
- **`entities`**: Named entities extracted using a spaCy model, especially focused on identifying character names and roles.
- **`verbs`**: The verbs found in the given text chunk, stored as a comma-separated string (e.g., actions performed by or affecting characters).
- **`title`**: The book title from which the text chunk is taken.

This table provides the foundation for subsequent analysis steps like character agency tracking, thematic verb clustering, and feminist pattern visualization across books, characters, and time.


In [None]:
df = pd.read_csv(r"/content/drive/MyDrive/AI project/processed_book_chunks.csv")
df.head()

| | text | sentiment | entities | verbs | title |
|---|---|---|---|---|---|
| 0 | [Illustration: G... | | [('GEORGE ALLEN ... | read | Pride and Prejudice |
| 1 | [Illustration: _To ... | | [('J. Comyns Carr', 'PERSON'), ('Hugh Thomson'... | owe, inscribe, have, love, love, apply, bring,... | Pride and Prejudice |
| 2 | critical facts that its scale is small, and it... | | [('first', 'ORDINAL'), ('Mansfield Park', 'FAC... | reach, have, exalt, destroy, admit, take, shoc... | Pride and Prejudice |
| 3 | larger, the more varied, the more popular; the... | | [('Bates', 'PERSON'), ('Eltons', 'PERSON'), ('... | see, improve, unite, declare, seem, permit, pr... | Pride and Prejudice |
| 4 | not elaborate, is almost regular enough for Fi... | | [('Fielding', 'PERSON'), ('Lydia', 'PERSON'), ... | retrench, connect, bring, fit, be, hide, seek,... | Pride and Prejudice |
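
The preview suggests that `entities` is stored as the string form of a Python list of `(text, label)` tuples and `verbs` as a comma-separated string, so both would need parsing before further use. A minimal sketch, assuming that serialization holds for every row (the helper names `parse_entities` and `parse_verbs` are hypothetical):

```python
import ast


def parse_entities(raw):
    """Turn "[('Lydia', 'PERSON'), ...]" into a list of (text, label) tuples."""
    if not isinstance(raw, str) or not raw.strip():
        return []  # missing values (e.g. NaN) become an empty list
    try:
        return list(ast.literal_eval(raw))
    except (ValueError, SyntaxError, TypeError):
        return []  # malformed cell: skip it rather than crash


def parse_verbs(raw):
    """Split "owe, inscribe, have" into ["owe", "inscribe", "have"]."""
    if not isinstance(raw, str):
        return []
    return [verb.strip() for verb in raw.split(",") if verb.strip()]
```

These could be applied column-wise, for example `df["entities"].apply(parse_entities)`. Returning an empty list for unparseable cells is a design choice that keeps later counting code simple, at the cost of silently dropping bad rows.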


## Define a function to extract the PERSON entities from the data

In [None]:
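def extract_person_entities(text):
    """Return the unique PERSON entity strings that spaCy finds in `text`."""
    doc = nlp(text)
    # A set comprehension removes duplicate names within the chunk.
    return list({ent.text for ent in doc.ents if ent.label_ == "PERSON"})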


In [None]:
df["person_entities_spacy"] = df["text"].apply(extract_person_entities)

In [None]:
# Flatten the per-chunk name lists and count how many chunks mention each name.
all_names = [name for sublist in df["person_entities_spacy"] for name in sublist]
name_counts = Counter(all_names)

# Rank names from most to least frequent.
df_entity_counts = pd.DataFrame(name_counts.most_common(), columns=["Name", "Count"])
df_entity_counts.head(30)


| | Name | Count |
|---|---|---|
| 0 | Jo | 548 |
| 1 | Maggie | 517 |
| 2 | Jane | 498 |
| 3 | Emma | 476 |
| 4 | Elizabeth | 464 |
| 5 | Tom | 446 |
| 6 | Dorothea | 427 |
| 7 | Amy | 353 |
| 8 | Casaubon | 342 |
| 9 | Laurie | 331 |
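
Note that `extract_person_entities` deduplicates names within each chunk, so the counts above measure how many chunks mention a name rather than how many times the name occurs. The toy example below, using made-up chunk data, contrasts the two ways of counting:

```python
from collections import Counter

# Hypothetical per-chunk name lists, before any deduplication.
chunks = [
    ["Jo", "Jo", "Amy"],
    ["Jo", "Laurie"],
    ["Amy", "Amy", "Amy"],
]

# Chunk frequency: each name counts at most once per chunk.
chunk_counts = Counter(name for chunk in chunks for name in set(chunk))

# Mention frequency: every occurrence counts.
mention_counts = Counter(name for chunk in chunks for name in chunk)

print(chunk_counts["Jo"], mention_counts["Jo"])    # 2 3
print(chunk_counts["Amy"], mention_counts["Amy"])  # 2 4
```

Chunk frequency is less sensitive to passages that repeat one name many times, while mention frequency better reflects a character's overall presence in the text.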
