# Chapter 3 – Fiction: Growing Down in the Novels of Maria Edgeworth and Amelia Opie

According to most scholars, the *Bildungsroman* is the central genre of Romantic fiction. The *Bildungsroman*, at least in its ‘classical’ or Romantic form, is the novel of successful self-formation: its protagonist goes out into the world, develops their personality, and finally settles down. Scholars have constructed a canon around a core of optimistic novels, such as *Wilhelm Meister’s Apprenticeship* (1795–96), *Pride and Prejudice* (1813) and *Waverley* (1814), whose protagonists successfully find a place in the world. In such *Bildungsromane*, the protagonist is both free and constrained. They pursue their inclinations while fulfilling their duties. They express their individuality while conforming to the social order. They achieve a ‘balance of harmony with freedom’, as Karl Morgenstern put it in a classic essay from 1820.

In this chapter, I consider two novels that contradict the concept of the *Bildungsroman*. In Amelia Opie’s *Adeline Mowbray* (1804) and Maria Edgeworth’s *Vivian* (1812), there is no balance between individuality and the social order, between freedom and harmony, or between autonomy and authority. To show that these novels were not mere outliers, I analyse them as part of a corpus of 40 novels from the period. The corpus mixes classic *Bildungsromane* such as *Camilla* (1796), *Marriage* (1818) and *The Old Manor House* (1793), American network novels such as *Arthur Mervyn* (1799), *The Coquette* (1797) and *Hobomok* (1824), and a number of Gothic, Jacobin, Anti-Jacobin, historical and national novels. The [files](data/novel-corpus/) and [metadata](data/novel-corpus/manifest.json) can be found in this repository, except for 5 files that are under copyright; these are identified in the metadata.

In this notebook, I generate the tables and figures that appear in Chapter 3 of *Contingent Selves: Romanticism and the Challenge of Representation*.

In [1]:
from collections import Counter
from itertools import zip_longest, chain
import re

from romanticself import NovelCorpus, TargetedBigramAssocFinder, RobustBigramAssocMeasures, stopword_filter
from nltk.tokenize import wordpunct_tokenize, word_tokenize
from nltk.corpus import stopwords
from tqdm.notebook import tqdm

import pandas as pd
import numpy as np

In [2]:
corpus = NovelCorpus("data/novel-corpus", tokenizer=word_tokenize)

40 novels imported from data/novel-corpus.


## 3.1: Character: Unnecessary Beings

### Figure 3.1 Self-control was a central theme in Romantic fiction

In [6]:
def count_compound_words(corpus, stem, relative=True):
    """Count relative frequencies of words beginning with given stem in corpus.
    
    Arguments:
    - corpus (NovelCorpus): tokenised texts; must also provide yield_metadata("short_title")
    - stem (str): words must begin with this stem to be counted (case-sensitive)
    - relative (bool): if True, convert counts to occurrences per 1,000 tokens of each text"""

    compound_words = []

    for novel in tqdm(corpus, total=len(corpus)):
        counts = Counter()
        for word in novel:
            if word.startswith(stem):
                counts[word] += 1

        if relative:
            wc = len(novel)
            counts = {word:(count/wc * 1000) for word,count in counts.items()}

        compound_words.append(counts)
    
    data_frame = pd.DataFrame.from_dict(compound_words)
    data_frame["title"] = [title[0] for title in corpus.yield_metadata("short_title")]
    data_frame.set_index("title", inplace=True)
    data_frame.fillna(0, inplace=True)

    return data_frame
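The counting step can be tried without the `romanticself` corpus. The sketch below is a stand-in under stated assumptions: it takes a plain list of tokens in place of one novel from a `NovelCorpus`, and `rate_per_thousand` is a hypothetical helper name. Like `count_compound_words`, it matches the stem case-sensitively and scales counts to occurrences per 1,000 tokens.

```python
from collections import Counter


def rate_per_thousand(tokens, stem):
    """Return {word: occurrences per 1,000 tokens} for tokens starting with `stem`.

    Hypothetical helper mirroring count_compound_words for a single text.
    Matching is case-sensitive, so 'Self-love' does not match 'self-'.
    """
    if not tokens:
        return {}
    counts = Counter(token for token in tokens if token.startswith(stem))
    total = len(tokens)
    return {word: count / total * 1000 for word, count in counts.items()}


# A toy "novel" of 8 tokens (illustrative only, not drawn from the corpus).
toy = ["her", "self-love", "and", "self-command", "failed", "her", "self-love", "."]
rates = rate_per_thousand(toy, "self-")
print(rates)  # {'self-love': 250.0, 'self-command': 125.0}
```

Scaling to a fixed number of tokens is what makes novels of very different lengths comparable in Figures 3.1 and 3.2.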

In [7]:
self_compounds = count_compound_words(corpus, stem="self-")

In [8]:
def compound_words_output(compound_words, topn=20, tocsv=False, novels=()):
    """Extracts readable output of count_compound_words function.
    
    Arguments:
    - compound_words (pd.DataFrame): output of count_compound_words
    - topn (int): maximum number of words to show in the corpus-wide ranking
    - tocsv (str or False): path of a CSV file to save results to; pass False to skip saving
    - novels (iterable): titles of the novels to be included in output"""

    results = dict()
    
    # Results for each novel
    for novel in novels:
        # Get frequencies for that novel
        novel_result = compound_words.loc[novel].sort_values(ascending=False)
        # Filter out compounds that don't appear
        novel_result = novel_result[novel_result > 0]
        # Convert to list and append
        results[novel] = list(zip(novel_result.index, novel_result))

    # Corpus statistics
    corpus_means = compound_words.mean().sort_values(ascending=False)[:topn]
    results["Corpus"] = list(zip(corpus_means.index,corpus_means))
    
    # Save to csv
    if tocsv:
        with open(tocsv, "wt") as file:
            first_row = ",values,".join(results) + ",values\n"
            file.write(first_row)
            # Pad shorter lists with an empty (word, value) pair so the columns stay aligned
            for row in zip_longest(*results.values(), fillvalue=("", "")):
                row_str = ",".join(str(val) for val in chain(*row)) + "\n"
                file.write(row_str)

    return results
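The CSV written by `compound_words_output` gives each novel a pair of columns (word, value), padding shorter lists with blanks. A sketch of the same layout using the standard `csv` module, which also quotes any title that contains a comma; `write_side_by_side` is a hypothetical name, and the in-memory buffer stands in for a real file.

```python
import csv
import io
from itertools import chain, zip_longest


def write_side_by_side(results, file):
    """Write {title: [(word, value), ...]} as paired columns, padding with blanks."""
    writer = csv.writer(file)
    # Header: each title followed by a "values" column, as in the notebook's CSV.
    writer.writerow(chain.from_iterable((title, "values") for title in results))
    # zip_longest pads shorter lists with an empty (word, value) pair.
    for row in zip_longest(*results.values(), fillvalue=("", "")):
        writer.writerow(chain.from_iterable(row))


buffer = io.StringIO()
write_side_by_side({"A": [("self-love", 0.5), ("self-will", 0.1)],
                    "B": [("self-denial", 0.2)]}, buffer)
print(buffer.getvalue())
```

Letting `csv.writer` handle quoting avoids malformed rows if a short title ever contains a comma.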

In [9]:
figure_3_1_data = compound_words_output(compound_words=self_compounds, topn=20, tocsv="figures/figure_3_1.csv", 
                                        novels=["Vivian", "Adeline Mowbray"])

In [10]:
figure_3_1_data["Vivian"]

[('self-love', 0.034636832808008035),
 ('self-willed', 0.023091221872005355),
 ('self-reproach', 0.023091221872005355),
 ('self-esteem', 0.023091221872005355),
 ('self-possession', 0.023091221872005355),
 ('self-confidence', 0.011545610936002678),
 ('self-complacency', 0.011545610936002678),
 ('self-interest', 0.011545610936002678),
 ('self-condemned', 0.011545610936002678),
 ('self-delusion', 0.011545610936002678),
 ('self-command', 0.011545610936002678)]

In [11]:
figure_3_1_data["Adeline Mowbray"]

[('self-love', 0.11361484567316796),
 ('self-command', 0.07574323044877863),
 ('self-denial', 0.04733951903048665),
 ('self-condemned', 0.02840371141829199),
 ('self-reproach', 0.02840371141829199),
 ('self-upbraidings', 0.02840371141829199),
 ('self-reproaches', 0.02840371141829199),
 ('self-congratulations', 0.01893580761219466),
 ('self-condemnation', 0.01893580761219466),
 ('self-possession', 0.00946790380609733),
 ('self-denials', 0.00946790380609733),
 ('self-conceit', 0.00946790380609733),
 ('self-upbraiding', 0.00946790380609733),
 ('self-abasement', 0.00946790380609733)]

In [12]:
figure_3_1_data["Corpus"]

[('self-love', 0.010329789412902843),
 ('self-command', 0.009571516255879124),
 ('self-denial', 0.009155094997213653),
 ('self-reproach', 0.007150806279352373),
 ('self-possession', 0.0060560685199104),
 ('self-complacency', 0.0025994641109693),
 ('self-will', 0.0025958789056707),
 ('self-willed', 0.0025867935941388956),
 ('self-approbation', 0.0025493430118947785),
 ('self-denying', 0.0025321934468357766),
 ('self-respect', 0.0024264348571764946),
 ('self-interest', 0.0024257873477357673),
 ('self-conceit', 0.002210696894631833),
 ('self-condemnation', 0.0021953124027325314),
 ('self-preservation', 0.0019320455573985592),
 ('self-reproaches', 0.0018302123893270989),
 ('self-importance', 0.001824905094586321),
 ('self-satisfied', 0.0017983250699762422),
 ('self-examination', 0.0017818770037734074),
 ('self-satisfaction', 0.0015203004908251408)]

In [13]:
# In how many novels does each 'self-' compound appear? ('self-denial' is the most widespread)
self_compounds.astype(bool).sum(axis=0).sort_values(ascending=False)[:20]

self-denial          18
self-reproach        15
self-love            15
self-command         15
self-approbation     11
self-possession      11
self-interest        10
self-delusion         8
self-willed           8
self-satisfied        8
self-denying          8
self-complacency      8
self-conceit          7
self-preservation     7
self-defence          7
self-importance       7
self-condemnation     7
self-satisfaction     7
self-reproaches       6
self-examination      6
dtype: int64
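`astype(bool)` turns every non-zero frequency into `True`, so summing down each column gives the number of novels in which a compound appears at least once (its document frequency). The same idea in plain Python, assuming each novel is simply a list of tokens; `document_frequency` is a hypothetical helper, not part of `romanticself`.

```python
from collections import Counter


def document_frequency(texts, stem):
    """Count in how many texts each word beginning with `stem` occurs at least once."""
    df = Counter()
    for tokens in texts:
        # A set ensures each word is counted once per text, however often it recurs.
        df.update({token for token in tokens if token.startswith(stem)})
    return df


novels = [["self-denial", "self-love", "self-denial"],
          ["self-denial"],
          ["no", "compounds", "here"]]
print(document_frequency(novels, "self-").most_common())
# [('self-denial', 2), ('self-love', 1)]
```

Document frequency complements the per-1,000-token rates: a compound can be frequent overall yet concentrated in a few novels.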

### Figure 3.2 *Vivian* and *Adeline Mowbray* are particularly focussed on the self

In [14]:
per_novel_totals = self_compounds.sum(axis=1).sort_values(ascending=False)
# Export csv for Figure 3.2
per_novel_totals.to_csv("figures/figure_3_2.csv")
# And print the data to the screen
per_novel_totals

title
Self-Control                                                 0.496172
Adeline Mowbray                                              0.435524
Camilla                                                      0.268019
Walsingham                                                   0.216167
Marriage                                                     0.211327
The Cottagers of Glenburnie                                  0.204400
Vivian                                                       0.196275
The Last of the Mohicans                                     0.188307
Memoirs of Emma Courtney                                     0.184533
Anna St. Ives                                                0.169346
Zofloya                                                      0.162507
Caleb Williams                                               0.157361
Persuasion                                                   0.156019
Maria                                                        0.155649
The Recess    

In [15]:
# What is the average use of 'self-' compounds across the corpus?
per_novel_totals.mean()

0.12883125426688008

In [16]:
# Do network novels use 'self-' compounds much?
network = [net[0] for net in corpus.yield_metadata("network")]

network_mean = self_compounds.sum(axis=1)[network].mean()
non_network_mean = self_compounds.sum(axis=1)[[not val for val in network]].mean()

print(f"Network novels' usage of compounds with 'self': {network_mean:.3f}")
print(f"Other novels' usage of compounds with 'self': {non_network_mean:.3f}")

Network novels' usage of compounds with 'self': 0.047
Other novels' usage of compounds with 'self': 0.141


## 3.2: Plot: Marriage Disrupted

### Figure 3.3 Unsanctioned 'connexions'

In [25]:
def count_words(corpus, words):
    """Count words in corpus and return a data frame of relative frequencies.
    
    Arguments:
    - corpus (NovelCorpus): A romanticself.NovelCorpus
    - words (iterable): Iterable of words to count
    
    Returns:
    - counts (pd.DataFrame): occurrences per 1,000 tokens of each word, one row per novel"""
    
    words = list(words)  # allow one-shot iterables, which are reused for every novel
    titles = [val[0] for val in corpus.yield_metadata("short_title")]
    data = {}
    
    for novel, title in zip(corpus, titles):
        wc = len(novel)  # length of this novel in tokens
        data[title] = {word: novel.count(word) / wc * 1000 for word in words}
    
    return pd.DataFrame.from_dict(data, orient="index")
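`count_words` calls `list.count` once per target word, which rescans the whole novel each time. For two spellings this is harmless; for longer word lists, a single pass with `collections.Counter` yields the same per-1,000-token rates. A sketch, again assuming a novel is a plain list of tokens (`relative_frequencies` is a hypothetical name):

```python
from collections import Counter


def relative_frequencies(tokens, words):
    """Occurrences per 1,000 tokens for each word in `words`, in one pass over `tokens`."""
    targets = set(words)
    counts = Counter(token for token in tokens if token in targets)
    total = len(tokens)
    # Words that never occur get 0.0 rather than being left out.
    return {word: counts[word] / total * 1000 for word in words}


toy = ["a", "connexion", "was", "formed", ";", "the", "connexion", "failed"]
print(relative_frequencies(toy, ["connection", "connexion"]))
# {'connection': 0.0, 'connexion': 250.0}
```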

In [50]:
connection = count_words(corpus, ["connection","connexion"])

# Sum the frequencies of the two spellings, 'connection' and 'connexion'
connection["Total"] = connection.sum(axis=1)
connection_sorted = connection.sort_values(by="Total", ascending=False)

# Export
connection_sorted.to_csv("figures/figure_3_3.csv")

# Display
connection_sorted

| Novel | connection | connexion | Total |
| --- | --- | --- | --- |
| The Coquette | 0.6425 | 0.0 | 0.6425 |
| Vivian | 0.0 | 0.254003 | 0.254003 |
| Persuasion | 0.060007 | 0.120015 | 0.180022 |
| Adeline Mowbray | 0.0 | 0.151486 | 0.151486 |
| Arthur Mervyn | 0.147129 | 0.0 | 0.147129 |
| Emma | 0.0 | 0.12592 | 0.12592 |
| Self-Control | 0.103596 | 0.016357 | 0.119954 |
| Henry | 0.111547 | 0.0 | 0.111547 |
| Camilla | 0.059246 | 0.047961 | 0.107208 |
| The Wild Irish Girl | 0.0 | 0.106162 | 0.106162 |


In [48]:
# Do the five 'network novels' use the terms more?
connection["Network"] = [net[0] for net in corpus.yield_metadata("network")]
connection.pivot_table(index="Network", aggfunc="mean")

| Network | Total | connection | connexion |
| --- | --- | --- | --- |
| False | 0.060457 | 0.027174 | 0.033283 |
| True | 0.186686 | 0.157926 | 0.02876 |


### Figure 3.4 The meaning of 'connection'

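Using collocation networks to study the meaning of 'connection' in the corpus. The first tier consists of the 20 words most strongly associated with 'connection' (in either spelling); the second tier adds the strongest collocates of each of those words.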
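A collocate is a word that occurs near the target within a fixed window of tokens. The window semantics of `TargetedBigramAssocFinder` are not shown in this notebook, so the sketch below assumes a symmetric window (5 tokens either side gives 11 tokens including the target) and uses a small hypothetical stopword list in place of `stopword_filter`. It only counts co-occurrences; scoring them is a separate step.

```python
from collections import Counter

# Hypothetical stand-in stopword list; the notebook's stopword_filter is not reproduced here.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "her", "his", "with", "was"}


def window_collocates(tokens, target, span=5):
    """Count words within `span` tokens either side of each occurrence of `target`.

    Assumes a symmetric window (span=5 gives an 11-token window including the
    target); this may differ from TargetedBigramAssocFinder's window_size.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        # Drop punctuation, stopwords and other occurrences of the target itself.
        counts.update(w for w in window
                      if w.isalpha() and w.lower() not in STOPWORDS and w != target)
    return counts


text = "she feared forming a connection with the family and the connection was formed".split()
print(window_collocates(text, "connection", span=3))
```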

In [3]:
# Create a new version of the corpus where 'connexion' is swapped for 'connection'
conn_tokens = []
for novel in corpus:
    conn_tokens.append(['connection' if tkn == 'connexion' else tkn for tkn in novel])

In [4]:
conn_bigrams_all = TargetedBigramAssocFinder.from_corpus(conn_tokens, target="connection", window_size=11)
conn_bigrams_all.apply_ngram_filter(stopword_filter)

In [8]:
# Rank collocates by likelihood ratio; [1:21] drops the highest-ranked pair and keeps the next 20
top_20 = conn_bigrams_all.score_ngrams(RobustBigramAssocMeasures.likelihood_ratio)[1:21]
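The collocates are ranked by a likelihood-ratio score. The internals of `RobustBigramAssocMeasures` are not shown here; assuming it follows NLTK's `BigramAssocMeasures.likelihood_ratio`, the score is Dunning's log-likelihood statistic G² = 2 Σ O·ln(O/E), summed over the 2×2 table of how often the two words do and do not co-occur. A minimal sketch: this version simply skips empty cells, whereas NLTK adds a tiny constant to avoid taking the log of zero.

```python
import math


def log_likelihood(n_ii, n_ix, n_xi, n_xx):
    """Dunning's log-likelihood (G-squared) for a 2x2 contingency table.

    n_ii: joint count of the pair; n_ix, n_xi: marginal counts of each word;
    n_xx: total number of pairs. Cells with zero observations contribute 0.
    """
    observed = [
        n_ii,                       # word 1 with word 2
        n_ix - n_ii,                # word 1 without word 2
        n_xi - n_ii,                # word 2 without word 1
        n_xx - n_ix - n_xi + n_ii,  # neither word
    ]
    row_totals = [n_ix, n_xx - n_ix]
    col_totals = [n_xi, n_xx - n_xi]
    # Expected counts under independence, in the same cell order as `observed`.
    expected = [row_totals[r] * col_totals[c] / n_xx for r in (0, 1) for c in (0, 1)]
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


print(log_likelihood(10, 10, 10, 1000))  # strongly associated pair: large positive score
```

When the joint count equals what independence predicts, every O/E is 1 and the score is 0; the score grows as co-occurrence departs from chance.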

In [10]:
first_tier = {}
for bigram in tqdm(top_20):
    target = bigram[0][0]
    bigrams = TargetedBigramAssocFinder.from_corpus(conn_tokens, target=target, window_size=11)
    bigrams.apply_ngram_filter(stopword_filter)
    first_tier[target] = bigrams.score_ngrams(RobustBigramAssocMeasures.likelihood_ratio)[1:21]
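`first_tier` holds, for each of the 20 strongest collocates of 'connection', that word's own top collocates. To draw both tiers as one network (for example in a graph tool), they need merging into a single edge list. A sketch, assuming the `((collocate, target), score)` format shown in the output below; in the notebook the input might be `{"connection": top_20, **first_tier}`, a hypothetical combination rather than a cell from the original. Because a pair can be found from both ends, each unordered pair keeps only its highest score; that deduplication rule is also an assumption.

```python
def collocation_edges(tiers):
    """Merge {target: [((collocate, target), score), ...]} into undirected edges.

    Each unordered pair is kept once, with the highest score it received, so a
    link found from both ends (e.g. connection/forming) is not duplicated.
    """
    edges = {}
    for scored in tiers.values():
        for (collocate, target), score in scored:
            key = tuple(sorted((collocate, target)))
            edges[key] = max(score, edges.get(key, float("-inf")))
    # Return (word_a, word_b, score) triples, strongest first.
    return sorted(((a, b, s) for (a, b), s in edges.items()),
                  key=lambda edge: edge[2], reverse=True)


# Toy scores for illustration only, not corpus results.
tiers = {
    "connection": [(("forming", "connection"), 2.0), (("formed", "connection"), 1.5)],
    "forming": [(("connection", "forming"), 3.0), (("plans", "forming"), 1.0)],
}
for edge in collocation_edges(tiers):
    print(edge)
```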

In [12]:
first_tier

{'forming': [(('connection', 'forming'), 160.4711235855931),
  (('plans', 'forming'), 58.24762163314566),
  (('schemes', 'forming'), 48.12883206158964),
  (('establishment', 'forming'), 34.42070034233766),
  (('resolution', 'forming'), 33.064381097632406),
  (('contrast', 'forming'), 31.613442815634315),
  (('strong', 'forming'), 31.225640342106633),
  (('friendships', 'forming'), 29.7057929518642),
  (('rock', 'forming'), 29.607299905637884),
  (('long', 'forming'), 28.081369214717647),
  (('stratagem', 'forming'), 27.466226457316726),
  (('link', 'forming'), 27.19025416556202),
  (('pines', 'forming'), 26.039745982961218),
  (('matrimonial', 'forming'), 25.750570510004977),
  (('pillars', 'forming'), 25.394939530104033),
  (('plan', 'forming'), 25.22832518517775),
  (('conclusions', 'forming'), 24.83990160343791),
  (('root', 'forming'), 24.35267391211222),
  (('irregular', 'forming'), 23.580559227452635),
  (('unconsciously', 'forming'), 23.580559227452635)],
 'formed': [(('resoluti