# Collecting Data Assignment 4

This entire process is worked through with the assistance of "Corpus Analysis with spaCy" by Megan S. Kane. Novels were obtained through purchase or through my educational institution.

## Installing, Importing, and Preprocessing

The corpus contains 8 novels: 4 by classic female romance writers and 4 by modern female romance writers. It is meant to support comparison of semantic and thematic elements between romance literature by female authors from the early-to-mid 19th century and from the 21st century. To make the dataset usable, we first install and import the libraries we will need. SpaCy is central here because it performs text processing tasks efficiently.

In [3]:
# Install and import spacy and plotly.
!pip install spaCy
!pip install plotly
!pip install nbformat==5.1.2

Collecting nbformat==5.1.2
  Using cached nbformat-5.1.2-py3-none-any.whl (113 kB)
Installing collected packages: nbformat
  Attempting uninstall: nbformat
    Found existing installation: nbformat 5.9.2
    Uninstalling nbformat-5.9.2:
      Successfully uninstalled nbformat-5.9.2
Successfully installed nbformat-5.1.2


Reason for being yanked: Name generation process created inappropriate id values
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
conda-repo-cli 1.0.41 requires requests_mock, which is not installed.
conda-repo-cli 1.0.41 requires clyent==1.2.1, but you have clyent 1.2.2 which is incompatible.
conda-repo-cli 1.0.41 requires nbformat==5.4.0, but you have nbformat 5.1.2 which is incompatible.
conda-repo-cli 1.0.41 requires requests==2.28.1, but you have requests 2.31.0 which is incompatible.
jupyter-server 1.23.4 requires nbformat>=5.2.0, but you have nbformat 5.1.2 which is incompatible.


In [144]:
# Import spacy
import spacy

# Install English language model
!spacy download en_core_web_sm

# Import os to upload documents and metadata
import os

# Load spaCy visualizer
from spacy import displacy

# Import pandas DataFrame packages
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'

# Import graphing package
import plotly.graph_objects as go
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)

Next, we read in the data from a folder named 'txt_files'. This folder holds the full text of every novel, with each file named in this format: (FirstInitial)Lastname_NovelKeywordOrTitle.

In [115]:
import os

# Create empty lists for file names and contents
texts = []
file_names = []

# Specify the full path to the 'txt_files' folder on the C: drive
folder_path = r'C:/txt_files'

if os.path.exists(folder_path) and os.path.isdir(folder_path):
    # Iterate through each file in the folder
    for _file_name in os.listdir(folder_path):
        # Look for only text files
        if _file_name.endswith('.txt'):
            # Append contents of each text file to text list, closing the file afterwards
            file_path = os.path.join(folder_path, _file_name)
            with open(file_path, 'r', encoding='utf-8') as f:
                texts.append(f.read())
            # Append name of each file to file name list
            file_names.append(_file_name)
else:
    print(f"The folder '{folder_path}' does not exist.")
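
The loading loop above can also be written with pathlib, which closes each file after reading and returns the names in a predictable order. This is only a minimal sketch, not the notebook's own method: `load_corpus` is a hypothetical helper name, and the folder in the usage comment is an assumption.

```python
from pathlib import Path


def load_corpus(folder):
    """Return (file_names, texts) for every .txt file in folder, sorted by name."""
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"The folder '{folder}' does not exist.")
    file_names, texts = [], []
    for path in sorted(folder.glob("*.txt")):
        # read_text opens, reads, and closes the file in one call
        file_names.append(path.name)
        texts.append(path.read_text(encoding="utf-8"))
    return file_names, texts


# Hypothetical usage: file_names, texts = load_corpus("C:/txt_files")
```

Raising an error instead of printing a message means a missing folder stops the notebook early, rather than producing an empty dataframe later on.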

We now build a dictionary from the two lists and use it to create our first dataframe, with one column for the file names and one for the full text of each novel, which we'll simply call "Novel". The file-name column keeps the name file_name for now so that we can later merge it with the csv file.

In [116]:
d = {'file_name':file_names,'Novel':texts}

In [117]:
novels_df = pd.DataFrame(d)

Now, we can ensure that the corpus has successfully been converted into a dataframe by using the .head() command to give us the first 5 rows.

In [118]:
novels_df.head()

Unnamed: 0,file_name,Novel
0,Austen_Pride.txt,Pride and Prejudice\nby Jane Austen\nChapter ...
1,Austen_Sense.txt,SENSE AND SENSIBILITY\nby Jane Austen\n(1811)\...
2,Bailey_FixHerUp.txt,No freaking way.\nGeorgette Castle tucked the ...
3,Bailey_OneSummer.txt,The unthinkable was happening.\nHer longest re...
4,CBronte_Jane.txt,Jane Eyre\nby Charlotte Bronte\nPREFACE\nA pr...


We can now clean the text by collapsing every run of whitespace, including the line breaks \n, into a single space.

In [98]:
# Collapse line breaks and extra spaces into single spaces
novels_df['Novel'] = novels_df['Novel'].str.replace(r'\s+', ' ', regex=True).str.strip()
novels_df.head()

Unnamed: 0,file_name,Novel
0,Austen_Pride.txt,Pride and Prejudice by Jane Austen Chapter 1 I...
1,Austen_Sense.txt,SENSE AND SENSIBILITY by Jane Austen (1811) CH...
2,Bailey_FixHerUp.txt,No freaking way. Georgette Castle tucked the s...
3,Bailey_OneSummer.txt,The unthinkable was happening. Her longest rel...
4,CBronte_Jane.txt,Jane Eyre by Charlotte Bronte PREFACE A prefac...
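
To see what the `\s+` pattern does on its own, the same substitution can be tried on a short string with the standard `re` module. This is an illustrative sketch: `normalize_whitespace` and the sample string are made up for the example.

```python
import re


def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, line breaks) into one space."""
    return re.sub(r"\s+", " ", text).strip()


sample = "Pride and Prejudice\nby Jane Austen\n\nChapter  1\n"
cleaned = normalize_whitespace(sample)
print(cleaned)  # Pride and Prejudice by Jane Austen Chapter 1
```

Note that this also removes paragraph and chapter breaks, so any later analysis that depends on them would need to run on the original text.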


In order to merge the corpus dataframe with the metadata in the next step, the values in the file_name column must match those in the metadata exactly. Therefore, we must clean the file names up by removing the extension .txt.

In [99]:
# Remove the .txt extension from each file name (escaped so that '.' is a literal dot)
novels_df['file_name'] = novels_df['file_name'].str.replace(r'\.txt$', '', regex=True)
novels_df.head()

Unnamed: 0,file_name,Novel
0,Austen_Pride,Pride and Prejudice by Jane Austen Chapter 1 I...
1,Austen_Sense,SENSE AND SENSIBILITY by Jane Austen (1811) CH...
2,Bailey_FixHerUp,No freaking way. Georgette Castle tucked the s...
3,Bailey_OneSummer,The unthinkable was happening. Her longest rel...
4,CBronte_Jane,Jane Eyre by Charlotte Bronte PREFACE A prefac...
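
When removing an extension with a regular expression, an unescaped `.` matches any character, so a pattern like `.txt` can delete more than intended. The sketch below shows the difference on a hypothetical file name (not one from this corpus) and uses `Path.stem` from the standard library as one alternative.

```python
import re
from pathlib import Path

name = "Austen_txtNotes.txt"  # hypothetical file name, chosen to expose the pitfall

# As a regex, '.' matches any character, so '_txt' is removed as well as '.txt'
regex_result = re.sub(".txt", "", name)
print(regex_result)  # AustenNotes

# Path.stem removes only the final extension
stem_result = Path(name).stem
print(stem_result)  # Austen_txtNotes
```

None of the file names shown above contain "txt" elsewhere, so the results here are unaffected, but the stricter form is safer if the corpus grows.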


We then load the metadata csv file from its file path, telling pandas that the columns are separated by a comma followed by a space.

In [100]:
# Read the CSV file into a DataFrame; a multi-character separator requires the python engine
metadata_df = pd.read_csv('C:/Users/josie/OneDrive/Metadata_NovelsFINAL.csv', sep=', ', engine='python')

# Remove stray double quotes from the cell values
metadata_df.replace('"', '', regex=True, inplace=True)

# Remove double quotes from column names
metadata_df.columns = metadata_df.columns.str.replace('"', '')

# Display the modified DataFrame
print(metadata_df)

           file_name       author_name  year_published author_type
0       Austen_Pride       Jane Austen            1813     Classic
1       Austen_Sense       Jane Austen            1811     Classic
2    Bailey_FixHerUp      Tessa Bailey            2019      Modern
3   Bailey_OneSummer      Tessa Bailey            2021      Modern
4  CBronte_Professor  Charlotte Bronte            1857     Classic
5       CBronte_Jane  Charlotte Bronte            1847     Classic
6   Hoover_November9    Colleen Hoover            2015      Modern
7    Hoover_UglyLove    Colleen Hoover            2014      Modern



Finally, we merge the metadata csv file and the corpus .txt files to create a dataframe and assign it to a new variable: novels_authors_df.

In [102]:
# Will only keep rows where both novel text and metadata are present
novels_authors_df = metadata_df.merge(novels_df,on='file_name')
novels_authors_df.head()

Unnamed: 0,file_name,author_name,year_published,author_type,Novel
0,Austen_Pride,Jane Austen,1813,Classic,Pride and Prejudice by Jane Austen Chapter 1 I...
1,Austen_Sense,Jane Austen,1811,Classic,SENSE AND SENSIBILITY by Jane Austen (1811) CH...
2,Bailey_FixHerUp,Tessa Bailey,2019,Modern,No freaking way. Georgette Castle tucked the s...
3,Bailey_OneSummer,Tessa Bailey,2021,Modern,The unthinkable was happening. Her longest rel...
4,CBronte_Professor,Charlotte Bronte,1857,Classic,THE PROFESSOR by (AKA Charlotte Bronte) Currer...
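
Because the merge keeps only rows whose file_name appears in both dataframes, a misspelled name silently drops a novel. One way to check for this, sketched below on made-up miniature dataframes, is an outer merge with `indicator=True`, which adds a `_merge` column saying where each row came from. The dataframe contents here are illustrative assumptions, not the real corpus.

```python
import pandas as pd

# Miniature stand-ins for metadata_df and novels_df (illustrative values only)
metadata = pd.DataFrame({
    "file_name": ["Austen_Pride", "CBronte_Jane", "Hoover_UglyLove"],
    "author_type": ["Classic", "Classic", "Modern"],
})
texts = pd.DataFrame({
    "file_name": ["Austen_Pride", "CBronte_Jane", "Bailey_FixHerUp"],
    "Novel": ["...", "...", "..."],
})

# An outer merge keeps every row; _merge is 'both', 'left_only', or 'right_only'
check = metadata.merge(texts, on="file_name", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["file_name", "_merge"]]
print(unmatched)
```

An empty `unmatched` table would mean every novel has metadata and every metadata row has a text file.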


The cells below check how the metadata file is read and strip a byte order mark (BOM) if one is present, so that the utf-8 encoding is uniform and does not cause any issues.

In [146]:
import csv

file_path = r'C:/Users/josie/OneDrive/Metadata_NovelsFINAL.csv'

# Open the file and print the content
with open(file_path, 'r', encoding='utf-8') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

['file_name, author_name, year_published, author_type']
['Austen_Pride, Jane Austen, 1813, Classic']
['Austen_Sense, Jane Austen, 1811, Classic']
['Bailey_FixHerUp, Tessa Bailey, 2019, Modern']
['Bailey_OneSummer, Tessa Bailey, 2021, Modern']
['CBronte_Professor, Charlotte Bronte, 1857, Classic']
['CBronte_Jane, Charlotte Bronte, 1847, Classic']
['Hoover_November9, Colleen Hoover, 2015, Modern']
['Hoover_UglyLove, Colleen Hoover, 2014, Modern']
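
In the output above, every row arrives as a list with a single item rather than four. One possible explanation (an assumption, since the raw file is not shown here) is that each line of the file is wrapped in double quotes, which makes `csv.reader` treat the whole line as one field. The sketch below reproduces that behaviour on a made-up sample and splits the single field by hand.

```python
import csv
import io

# Hypothetical sample imitating a file whose lines are each wrapped in double quotes
sample = (
    '"file_name, author_name, year_published, author_type"\n'
    '"Austen_Pride, Jane Austen, 1813, Classic"\n'
)

rows = []
for row in csv.reader(io.StringIO(sample)):
    # Each row is a one-item list, so split that item on commas and trim spaces
    rows.append([cell.strip() for cell in row[0].split(",")])

print(rows)
```

This simple split would break if a value itself contained a comma, so it is only suitable for metadata as plain as the table shown earlier.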


In [106]:
file_path = r'C:/Users/josie/OneDrive/Documents/Metadata_Novels.csv'

# Read the file content and remove the BOM
with open(file_path, 'r', encoding='utf-8-sig') as file:
    content = file.read()

# Write the content back to the file without the BOM
with open(file_path, 'w', encoding='utf-8') as file:
    file.write(content)

# Now read the CSV into a DataFrame
metadata_df = pd.read_csv(file_path)

## Text Enrichment

SpaCy contains tools that allow us to enrich the text by breaking it into pieces and assigning tags and labels that serve a range of purposes. Its trained models and pipelines handle several of these tasks. First, we load the pipeline and check which components it contains.

In [107]:
# Load nlp pipeline
nlp = spacy.load('en_core_web_sm')

# Check what functions it performs
print(nlp.pipe_names)

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']


In [108]:
def process_text(text):
    return nlp(text)  

Now we can run a simple test with a sentence to see how it tags the parts of speech, one of the pipeline components loaded in the previous step.

In [109]:
sentence = "Novels just aren't as romantic as they used to be!"

# Call the nlp model on the sentence
doc = nlp(sentence)

In [110]:
# Loop through each token in doc object
for token in doc:
    # Print text and part of speech for each
    print(token.text, token.pos_)

Novels NOUN
just ADV
are AUX
n't PART
as ADV
romantic ADJ
as SCONJ
they PRON
used VERB
to PART
be AUX
! PUNCT


Everything works properly. SpaCy's functions should work for the next tasks in the project. Because we are working with a large corpus, we'll increase the max_length limit.

In [119]:
# Increase max_length limit so that novel-length texts can be processed
nlp.max_length = 1500000  # or any value that suits your text length

In [120]:
# Process each novel and store the resulting Doc object in a new column
novels_authors_df['Doc'] = novels_authors_df['Novel'].apply(process_text)
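
Before choosing a new limit, it can help to see which texts actually exceed spaCy's default `max_length` of 1,000,000 characters. The helper below is a hypothetical sketch (the name `find_long_texts` and the toy inputs are assumptions) that works on plain lists of names and texts.

```python
DEFAULT_MAX_LENGTH = 1_000_000  # spaCy's default nlp.max_length, in characters


def find_long_texts(names, texts, limit=DEFAULT_MAX_LENGTH):
    """Return {name: character count} for every text longer than limit."""
    return {name: len(text) for name, text in zip(names, texts) if len(text) > limit}


# Toy demonstration with a small limit so the result is easy to read
names = ["short_novel", "long_novel"]
texts = ["a" * 10, "b" * 50]
too_long = find_long_texts(names, texts, limit=20)
print(too_long)  # {'long_novel': 50}
```

With the real dataframe, the same call could take `novels_authors_df['file_name']` and `novels_authors_df['Novel']` as its two inputs.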

## Tokenization

Tokenization separates a body of text into its individual words and punctuation marks. This allows the system to loop through these tokens and treat each one as its own entity to be classified and organized.

In [121]:
def get_token(doc):
    return [(token.text) for token in doc]

In [123]:
novels_authors_df['Tokens'] = novels_authors_df['Doc'].apply(get_token)
novels_authors_df.head()

Unnamed: 0,file_name,author_name,year_published,author_type,Novel,Doc,Tokens
0,Austen_Pride,Jane Austen,1813,Classic,Pride and Prejudice by Jane Austen Chapter 1 I...,"(Pride, and, Prejudice, by, Jane, Austen, Chap...","[Pride, and, Prejudice, by, Jane, Austen, Chap..."
1,Austen_Sense,Jane Austen,1811,Classic,SENSE AND SENSIBILITY by Jane Austen (1811) CH...,"(SENSE, AND, SENSIBILITY, by, Jane, Austen, (,...","[SENSE, AND, SENSIBILITY, by, Jane, Austen, (,..."
2,Bailey_FixHerUp,Tessa Bailey,2019,Modern,No freaking way. Georgette Castle tucked the s...,"(No, freaking, way, ., Georgette, Castle, tuck...","[No, freaking, way, ., Georgette, Castle, tuck..."
3,Bailey_OneSummer,Tessa Bailey,2021,Modern,The unthinkable was happening. Her longest rel...,"(The, unthinkable, was, happening, ., Her, lon...","[The, unthinkable, was, happening, ., Her, lon..."
4,CBronte_Professor,Charlotte Bronte,1857,Classic,THE PROFESSOR by (AKA Charlotte Bronte) Currer...,"(THE, PROFESSOR, by, (, AKA, Charlotte, Bronte...","[THE, PROFESSOR, by, (, AKA, Charlotte, Bronte..."


Now that we can see that the tokens are stored in their own column, let's create a variable that stores a simpler table with only the original text and the tokens, so that we can compare the two.

In [125]:
tokens = novels_authors_df[['Novel', 'Tokens']].copy()
tokens.head()

Unnamed: 0,Novel,Tokens
0,Pride and Prejudice by Jane Austen Chapter 1 I...,"[Pride, and, Prejudice, by, Jane, Austen, Chap..."
1,SENSE AND SENSIBILITY by Jane Austen (1811) CH...,"[SENSE, AND, SENSIBILITY, by, Jane, Austen, (,..."
2,No freaking way. Georgette Castle tucked the s...,"[No, freaking, way, ., Georgette, Castle, tuck..."
3,The unthinkable was happening. Her longest rel...,"[The, unthinkable, was, happening, ., Her, lon..."
4,THE PROFESSOR by (AKA Charlotte Bronte) Currer...,"[THE, PROFESSOR, by, (, AKA, Charlotte, Bronte..."


## Lemmatization

Lemmatization reduces each word to its "root word", or lemma (in this example, loving, loves, and loved would all be counted as love). This gives us a more meaningful word count, which can help greatly in thematic, stylistic, and semantic analysis. As before, we define a function and apply it to the Doc column.

In [126]:
# Define a function to retrieve lemmas from a doc object
def get_lemma(doc):
    return [(token.lemma_) for token in doc]

# Run the lemma retrieval function on the doc objects in the dataframe
novels_authors_df['Lemmas'] = novels_authors_df['Doc'].apply(get_lemma)

In [128]:
print('"love" appears in the Tokens column ' + str(novels_authors_df['Tokens'].apply(lambda x: x.count('love')).sum()) + ' times.')
print('"love" appears in the Lemmas column ' + str(novels_authors_df['Lemmas'].apply(lambda x: x.count('love')).sum()) + ' times.')

"love" appears in the Tokens column 837 times.
"love" appears in the Lemmas column 1076 times.
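
One detail worth keeping in mind is that `list.count` only matches exact strings, so a capitalized "Love" at the start of a sentence is not counted among the tokens. The sketch below uses a hand-written token list (not taken from the novels) and a hypothetical helper, `count_word`, to show a case-insensitive count.

```python
def count_word(tokens, word):
    """Count how often word appears in tokens, ignoring capitalization."""
    target = word.lower()
    return sum(1 for token in tokens if token.lower() == target)


# Hand-written example tokens
tokens = ["Love", "is", "love", ",", "and", "she", "loved", "him"]

exact = tokens.count("love")          # matches only the lowercase form
any_case = count_word(tokens, "love") # also matches "Love"
print(exact, any_case)  # 1 2
```

Applied to the dataframe, this could look like `novels_authors_df['Tokens'].apply(lambda x: count_word(x, 'love')).sum()`.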


The test worked: the lemma count is higher because inflected forms such as loves and loved are counted under their root word rather than by their exact spelling, which can be immensely useful for analysis.

## Part-of-Speech Tagging

This spaCy function tags each token with its part of speech, using both coarse-grained Universal POS tags (pronoun, adjective, verb, noun, adverb, etc.) and finer-grained tags. Let's first define a function, run through the tokens, and create a list of the part-of-speech tags from our merged dataframe novels_authors_df.

In [129]:
# Define a function to retrieve parts of speech from a doc object
def get_pos(doc):
    # Return the coarse- and fine-grained part of speech text for each token in the doc
    return [(token.pos_, token.tag_) for token in doc]

# Run the part-of-speech retrieval function on the doc objects in the dataframe
novels_authors_df['POS'] = novels_authors_df['Doc'].apply(get_pos)

In [130]:
# Create a list of part of speech tags
list(novels_authors_df['POS'])

[[('NOUN', 'NN'),
  ('CCONJ', 'CC'),
  ('PROPN', 'NNP'),
  ('ADP', 'IN'),
  ('PROPN', 'NNP'),
  ('PROPN', 'NNP'),
  ('NOUN', 'NN'),
  ('NUM', 'CD'),
  ('PRON', 'PRP'),
  ('AUX', 'VBZ'),
  ('DET', 'DT'),
  ('NOUN', 'NN'),
  ('ADV', 'RB'),
  ('VERB', 'VBN'),
  ('PUNCT', ','),
  ('SCONJ', 'IN'),
  ('DET', 'DT'),
  ('ADJ', 'JJ'),
  ('NOUN', 'NN'),
  ('ADP', 'IN'),
  ('NOUN', 'NN'),
  ('ADP', 'IN'),
  ('DET', 'DT'),
  ('ADJ', 'JJ'),
  ('NOUN', 'NN'),
  ('PUNCT', ','),
  ('AUX', 'MD'),
  ('AUX', 'VB'),
  ('ADP', 'IN'),
  ('NOUN', 'NN'),
  ('ADP', 'IN'),
  ('DET', 'DT'),
  ('NOUN', 'NN'),
  ('PUNCT', '.'),
  ('ADV', 'RB'),
  ('ADJ', 'JJ'),
  ('VERB', 'VBN'),
  ('DET', 'DT'),
  ('NOUN', 'NNS'),
  ('CCONJ', 'CC'),
  ('NOUN', 'NNS'),
  ('ADP', 'IN'),
  ('DET', 'PDT'),
  ('DET', 'DT'),
  ('NOUN', 'NN'),
  ('AUX', 'MD'),
  ('AUX', 'VB'),
  ('ADP', 'IN'),
  ('PRON', 'PRP$'),
  ('ADJ', 'JJ'),
  ('VERB', 'VBG'),
  ('DET', 'DT'),
  ('NOUN', 'NN'),
  ('PUNCT', ','),
  ('DET', 'DT'),
  ('NOUN', 'NN'),
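
A long list of tag pairs is easier to read once it is tallied. As a rough sketch, `collections.Counter` can count the coarse tags; the ten pairs below are copied from the start of the output above, and the variable names are illustrative.

```python
from collections import Counter

# First ten (coarse, fine) tag pairs copied from the output above
pos_pairs = [
    ('NOUN', 'NN'), ('CCONJ', 'CC'), ('PROPN', 'NNP'), ('ADP', 'IN'),
    ('PROPN', 'NNP'), ('PROPN', 'NNP'), ('NOUN', 'NN'), ('NUM', 'CD'),
    ('PRON', 'PRP'), ('AUX', 'VBZ'),
]

# Count only the coarse-grained tag in each pair
coarse_counts = Counter(coarse for coarse, _fine in pos_pairs)
print(coarse_counts.most_common(2))  # [('PROPN', 3), ('NOUN', 2)]
```

The same tally could be applied to each row of the POS column to compare, for example, how often adjectives appear in classic and modern novels.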
 

We can also check what a specific part-of-speech tag or abbreviation means...

In [131]:
spacy.explain("PROPN")

'proper noun'

Then, we can define a function that pulls all tokens of this specific type. Let's test that now.

In [132]:
# Define function to extract proper nouns from Doc object
def extract_proper_nouns(doc):
    return [token.text for token in doc if token.pos_ == 'PROPN']

# Apply function to Doc column and store resulting proper nouns in new column
novels_authors_df['Proper_Nouns'] = novels_authors_df['Doc'].apply(extract_proper_nouns)

In [134]:
list(novels_authors_df.loc[[0, 7], 'Proper_Nouns'])

[['Prejudice',
  'Jane',
  'Austen',
  'Mr.',
  'Bennet',
  'Netherfield',
  'Park',
  'Mr.',
  'Bennet',
  'Mrs.',
  'Long',
  'Mr.',
  'Bennet',
  'Mrs.',
  'Long',
  'Netherfield',
  'England',
  'Monday',
  'Mr.',
  'Morris',
  'Michaelmas',
  'Bingley',
  'Single',
  'Mr.',
  'Bennet',
  'Design',
  'Nonsense',
  'Mr.',
  'Bingley',
  'Mr.',
  'Bingley',
  'Sir',
  'William',
  'Lady',
  'Lucas',
  'Mr.',
  'Bingley',
  'Lizzy',
  'Lizzy',
  'Jane',
  'Lydia',
  'Lizzy',
  'Mr.',
  'Bennet',
  'Mr.',
  'Bennet',
  'Mr.',
  'Bennet',
  'Mr.',
  'Bingley',
  'Mr.',
  'Bingley',
  'Lizzy',
  'Mr.',
  'Bingley',
  'Elizabeth',
  'Mrs.',
  'Long',
  'Mrs.',
  'Long',
  'Mr.',
  'Bennet',
  'Mrs.',
  'Bennet',
  'Kitty',
  'Heaven',
  'Kitty',
  'Kitty',
  'Lizzy',
  'Aye',
  'Mrs.',
  'Long',
  'Mr.',
  'Bingley',
  'Mr.',
  'Bennet',
  'Mrs.',
  'Long',
  'Mrs.',
  'Bennet',
  'Mary',
  'Mary',
  'Mary',
  'Mr.',
  'Bingley',
  'Mr.',
  'Bingley',
  'Mrs.',
  'Bennet',
  'Mr.',
  'Ben

## Named Entity Recognition

SpaCy can tag 'entities' like names, organizations, places, and dates. Let's get a description of all these labels and what they mean.

In [135]:
labels = nlp.get_pipe("ner").labels

# Print each label and its description
for label in labels:
    print(label + ' : ' + spacy.explain(label))

CARDINAL : Numerals that do not fall under another type
DATE : Absolute or relative dates or periods
EVENT : Named hurricanes, battles, wars, sports events, etc.
FAC : Buildings, airports, highways, bridges, etc.
GPE : Countries, cities, states
LANGUAGE : Any named language
LAW : Named documents made into laws.
LOC : Non-GPE locations, mountain ranges, bodies of water
MONEY : Monetary values, including unit
NORP : Nationalities or religious or political groups
ORDINAL : "first", "second", etc.
ORG : Companies, agencies, institutions, etc.
PERCENT : Percentage, including "%"
PERSON : People, including fictional
PRODUCT : Objects, vehicles, foods, etc. (not services)
QUANTITY : Measurements, as of weight or distance
TIME : Times smaller than a day
WORK_OF_ART : Titles of books, songs, etc.


Now let's define a function that will grab the labels for all the relevant words.

In [137]:
# Define function to extract named entities from doc objects
def extract_named_entities(doc):
    return [ent.label_ for ent in doc.ents]

# Apply function to Doc column and store resulting named entities in new column
novels_authors_df['Named_Entities'] = novels_authors_df['Doc'].apply(extract_named_entities)
novels_authors_df['Named_Entities']

0    [PERSON, ORDINAL, CARDINAL, PERSON, FAC, PERSO...
1    [PERSON, DATE, LAW, GPE, GPE, FAC, DATE, DATE,...
2    [PERSON, PERSON, GPE, DATE, PERSON, EVENT, ORG...
3    [DATE, PERSON, CARDINAL, PERSON, GPE, NORP, PE...
4    [PERSON, WORK_OF_ART, PERSON, ORDINAL, ORDINAL...
5    [PERSON, PERSON, ORDINAL, WORK_OF_ART, ORDINAL...
6    [ORDINAL, DATE, ORG, ORG, GPE, DATE, TIME, PER...
7    [GPE, DATE, DATE, CARDINAL, CARDINAL, DATE, OR...
Name: Named_Entities, dtype: object

In [139]:
# Define function to extract text tagged with named entities from doc objects
# (named differently from the label function above so that neither overwrites the other)
def extract_named_entity_words(doc):
    return [ent for ent in doc.ents]

# Apply function to Doc column and store resulting text in new column
novels_authors_df['NE_Words'] = novels_authors_df['Doc'].apply(extract_named_entity_words)
novels_authors_df['NE_Words']

0    [(Jane, Austen, Chapter), (first), (some, one)...
1    [(Jane, Austen), (1811), (CHAPTER, 1), (Dashwo...
2    [(Georgette, Castle), (Georgie), (Stephen), (J...
3    [(Three, weeks), (Piper, Bellinger), (one), (V...
4    [(Charlotte, Bronte), (Jane, Eyre), (Shirley),...
5    [(Jane, Eyre), (Charlotte, Bronte), (first), (...
6    [(First), (November, 9th), (BENTON, JAMES, KES...
7    [(n’t), (a, day), (less, than, eighty, years, ...
Name: NE_Words, dtype: object

Finally, we can make the corpus easier to explore by visualizing the named entity tags within a novel. Let's first do that with a classic novel! [0] refers to the first row of the dataframe, which follows the order of the csv file.

In [140]:
# Extract the first Doc object
doc = novels_authors_df['Doc'][0]

# Visualize named entity tagging in a single novel
displacy.render(doc, style='ent', jupyter=True)

Let's also try it with one of our modern novels.

In [142]:
doc2 = novels_authors_df['Doc'][3]

displacy.render(doc2, style='ent', jupyter=True)

The final step is to save our file with an easy to reference filename!

In [141]:
# Use this step only to save csv to your computer's working directory
novels_authors_df.to_csv('Tagged_Romantic_Novels.csv')
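
When a dataframe is written to csv, list columns such as Tokens are stored as their string form, so they come back as plain strings unless they are converted on reading. The sketch below, which uses an in-memory buffer and a one-row stand-in dataframe (both assumptions for illustration), shows one way to restore the lists with `ast.literal_eval`.

```python
import ast
import io

import pandas as pd

# One-row stand-in for the tagged dataframe
df = pd.DataFrame({
    "file_name": ["Austen_Pride"],
    "Tokens": [["Pride", "and", "Prejudice"]],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read back without conversion: the list is now a string
buffer.seek(0)
raw = pd.read_csv(buffer)

# Read back with a converter: the string is parsed into a list again
buffer.seek(0)
restored = pd.read_csv(buffer, converters={"Tokens": ast.literal_eval})

print(type(raw["Tokens"][0]).__name__, type(restored["Tokens"][0]).__name__)  # str list
```

This approach only works for columns holding simple Python values such as lists of strings or tuples; spaCy Doc objects are not recovered this way and would need to be recreated by running the pipeline again.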