# Novel Quotation Toolkit

## Introduction

The code below offers an approach to normalizing and parsing quotations in novels.


## Dependencies

1. *Counter* from *collections* is used to turn lists with redundant entries into count dictionaries (e.g. ["house", "dog", "house"] --> {"house": 2, "dog": 1}).

2. *csv* is used to read and write CSV files.

3. *math* is used for the square root function.

4. *requests* is used to download files from Project Gutenberg.

5. *spacy* is used for dependency parsing and POS tagging.
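As a quick illustration of the first dependency, the sketch below shows how Counter collapses a list with repeated entries. The list is only the toy example from item 1, not data from any of the novels.

```python
from collections import Counter

# Toy list with a repeated entry (illustrative data only)
words = ["house", "dog", "house"]
counts = Counter(words)

print(dict(counts))           # {'house': 2, 'dog': 1}
print(counts.most_common(1))  # [('house', 2)]
```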

In [None]:
from collections import Counter
import csv
import math
import requests
import spacy
nlp = spacy.load("en_core_web_sm")

In [None]:
#Takes txt files from Project Gutenberg, extracts text, and then splits text by newline characters
def response2Text(response):
    return(response.text.splitlines())

In [None]:
#Middlemarch
middlemarch = response2Text(requests.get("https://gutenberg.org/cache/epub/145/pg145.txt"))

#Dracula
dracula = response2Text(requests.get("https://gutenberg.org/cache/epub/345/pg345.txt"))

#Frankenstein
frankenstein = response2Text(requests.get("https://gutenberg.org/cache/epub/84/pg84.txt"))

#Jane Eyre
janeEyre = response2Text(requests.get("https://gutenberg.org/cache/epub/1260/pg1260.txt"))

#Mary Barton
maryBarton = response2Text(requests.get("https://gutenberg.org/cache/epub/2153/pg2153.txt"))

## Checking Quote Convention

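The function below scans the lines of a text and counts how many begin with each quote character style (slant or straight). It returns a tuple holding the opening and closing characters of the more prevalent style, which is then assigned to the global variable **quotes** that later functions reference.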

If comparing multiple texts, make sure to check the quote convention for each text.
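The counting idea can be tried on a handful of lines before running it on a whole novel. This is a minimal sketch: the helper name `count_quote_styles` and the sample lines are hypothetical and only illustrate the line-start check.

```python
def count_quote_styles(lines):
    """Count lines that open with a slant quote versus a straight quote."""
    slant = sum(1 for line in lines if line.startswith(("“", "‘")))
    straight = sum(1 for line in lines if line.startswith(('"', "'")))
    return slant, straight

# Made-up sample lines (not taken from any of the novels)
sample_lines = [
    "“Are you there?” she asked.",
    "He did not answer.",
    "“No,” he said at last.",
    "'Tis late.",
]
slant, straight = count_quote_styles(sample_lines)
print(slant, straight)  # 2 1
```

Only lines that begin with a quote character are counted, so the result is a rough signal of the convention rather than a full tally of quotation marks.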

In [None]:
def checkQuoteConvention(text):
  quoteCountSlant = 0
  quoteCountStraight = 0
  for i in text:
    if i.startswith("“") or i.startswith("‘"):
      quoteCountSlant += 1
    elif i.startswith("\"") or i.startswith("\'"):
      quoteCountStraight += 1
  if quoteCountSlant > quoteCountStraight:
    print("Slant quotes are more prevalent with {0} hits compared to {1} hits".format(str(quoteCountSlant),str(quoteCountStraight)))
    return(("“", "”"))
  else:
    print("Straight quotes are more prevalent with {0} hits compared to {1} hits".format(str(quoteCountStraight),str(quoteCountSlant)))
    return(("\"", "\""))

In [None]:
checkQuoteConvention(janeEyre)

In [None]:
checkQuoteConvention(frankenstein)

In [None]:
checkQuoteConvention(middlemarch)

In [None]:
#To set global variable
quotes = checkQuoteConvention(middlemarch)

## Finding Books, Chapters, and More



### checkDiv Function

The checkDiv function looks for headings in a text. If it's successful, it returns a list of line indices, one for each appearance of a search term (called **checkTerm**): e.g. "chapter" --> [614, 800...,4326,4599], where each index corresponds to a heading line: e.g. ["Chapter 1", "Chapter 2"..."Chapter 56", "Chapter The Last"]. If no line qualifies, it prints a message and returns None.

The checkDiv function works by trying to match the checkTerm with the first word in every line. Since it's possible for a term like "chapter" or "volume" to start a line and not be a header, the firstPassFirstTerms list only takes instances where the first character is uppercase. As a secondary precaution, a termCount is used to find the most common formatting of a term across the text. For example, if in firstPassFirstTerms we had ["LETTER", "LETTER", "Letter", "LETTER", "LETTER", "LETTER", "Letter"] it would return "LETTER".
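The two precautions described above (uppercase first character, then most common formatting) can be seen on a toy list of lines. This is a sketch under stated assumptions: `find_headings` and the sample lines are hypothetical stand-ins, not the notebook's own function or data.

```python
from collections import Counter

def find_headings(term, lines):
    """Return indices of lines starting with the most common capitalized form of term."""
    candidates = [i for i, line in enumerate(lines) if line.lower().startswith(term)]
    # Keep only first words that begin with an uppercase letter
    first_words = [lines[i].split()[0] for i in candidates if lines[i][0].isupper()]
    if not first_words:
        return None
    common = Counter(first_words).most_common(1)[0][0]
    return [i for i in candidates if lines[i].startswith(common)]

# Made-up lines: three real headings and two ordinary sentences
sample = [
    "LETTER 1",
    "letter writing was her habit.",
    "LETTER 2",
    "Letter by letter, she read it.",
    "LETTER 3",
]
print(find_headings("letter", sample))  # [0, 2, 4]
print(find_headings("volume", sample))  # None
```

The lowercase line is dropped by the uppercase check, and the "Letter" line is dropped because "LETTER" is the more common formatting.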

In [None]:
def checkDiv(checkTerm, text):
  firstPass = [i for i in range(len(text)) if text[i].lower().startswith(checkTerm)]
  splitTerms = [text[i].split() for i in firstPass]
  firstPassFirstTerms =[i[0] for i in splitTerms if i[0][0].isupper()]
  if len(firstPassFirstTerms) > 0:
      termCount = Counter(firstPassFirstTerms)
      commonTerm =max(termCount.keys(), key=lambda key: termCount[key]) 
      result = [i for i in firstPass if text[i].startswith(commonTerm)]
      return(result)
  else:
      print(checkTerm + " is not a divider")
      return(None)

In [None]:
#Example 1
checkDiv("chapter", frankenstein)

In [None]:
#Example 2
checkDiv("book", frankenstein)

In [None]:
#Example 3
checkDiv("chapter", frankenstein)

### compareDiv Function

*Real Life Goal of Function:* Find the most prevalent structuring of a text by comparing two candidate terms (each called a **checkTerm**), here "volume" and "book". If neither is prevalent in the text, the function should skip this category entirely and work solely on dividing chapters.


**getDivDist** Transforms a list of term indices [e.g. "chapter" --> [614, 800...,4326,4599]] into a list of tuples that mark the start and end of each division [e.g. [614, 800...,4326,4599] --> [(614,799)...(4326,4598),(4599,end)]], where end is supplied by the caller. In other words, if text[614] == "Chapter 1" then the tuple (614,799) covers the entirety of Chapter 1 (text[614:800]).

**checkDivDist** Returns the standard deviation of the division lengths in a list of (start, end) tuples. Typically chapters and volumes are similar enough in length that they should deviate by fewer than 1000 lines (the indices count lines, since the text is split by newline). (NOTE: this standard deviation isn't normalized by text length, so keep this in mind when comparing different novels/authors.)

**smallestDivDist** If both candidate terms are valid, the function returns the one with the smallest standard deviation. The intuition here: e.g. when "book" --> [300, 800, 1300,1800,2199] and "vol" --> [949,1120,1837], "book" will be preferred because its divisions are more evenly spaced.

**sanityCheckDiv** Prints the results of **getDivDist** and lets user determine if the convention looks right.

**falsify** Returns checkTerm if **sanityCheckDiv** is passed.

**compareDiv** This is a "switch" function that evaluates both candidate terms and returns either the most prevalent structural term or "N/A" if none of the candidates seem promising.
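The range-building and spread comparison can be sketched on toy indices. The helper names `to_ranges` and `length_spread`, the index lists, and the end value 2300 are all assumptions made for illustration; `statistics.pstdev` is the population standard deviation, which is the same formula computed by hand in **checkDivDist**.

```python
from statistics import pstdev

def to_ranges(starts, end):
    """Turn heading indices into (start, stop) ranges; the last range runs to end."""
    ranges = [(a, b - 1) for a, b in zip(starts, starts[1:])]
    ranges.append((starts[-1], end))
    return ranges

def length_spread(ranges):
    """Population standard deviation of the range lengths."""
    return pstdev([stop - start for start, stop in ranges])

# Toy heading positions: "book" is evenly spaced, "vol" is not
book = to_ranges([300, 800, 1300, 1800], 2300)
vol = to_ranges([949, 1120, 1837], 2300)

print(book)  # [(300, 799), (800, 1299), (1300, 1799), (1800, 2300)]
print(length_spread(book) < length_spread(vol))  # True
```

Because the evenly spaced term has the smaller spread, it would be the one preferred as the structural divider.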



In [None]:
def getDivDist(lst,end):
  result= []
  for i in range(len(lst)-1):
    result.append((lst[i], lst[i+1]-1))
  result.append((lst[-1],end))
  return(result)

In [None]:
def checkDivDist(lst):
  dist = [i[1]-i[0] for i in lst]
  average_dist = sum(dist)/len(dist)
  diff_dist = math.sqrt(sum([(i - average_dist) **2 for i in dist]) / len(dist))
  return(diff_dist)

In [None]:
def smallestDivDist(checkTerm1, checkTerm2, text):
  checkTerm1DivDist = checkDivDist(getDivDist(checkDiv(checkTerm1, text),len(text)))
  checkTerm2DivDist = checkDivDist(getDivDist(checkDiv(checkTerm2, text),len(text)))
  if checkTerm1DivDist < checkTerm2DivDist:
    return(checkTerm1)
  else:
    return(checkTerm2)

In [None]:
def sanityCheckDiv(checkTerm, text):
    for i in checkDiv(checkTerm,text):
        print(text[i])
    result = input("Does this look right?(Y/N)")
    if result.upper() == "Y":
        return True
    else:
        return False

In [None]:
def falsify(checkTerm,_bool):
    if _bool == True:
        return checkTerm
    else:
        return("N/A")

In [None]:
def compareDiv(checkTerm1, checkTerm2, text):
  t1 = checkDiv(checkTerm1, text)
  t2 = checkDiv(checkTerm2, text)
  #Both terms found: keep the one whose divisions are most even in length
  if t1 is not None and t2 is not None:
    result = smallestDivDist(checkTerm1, checkTerm2, text)
    checker = sanityCheckDiv(result, text)
    return(falsify(result,checker))
  #Neither term found: there is no outer structure
  elif t1 is None and t2 is None:
    return("N/A")
  #Only one term found: confirm it with the user
  elif t1 is None:
    checker = sanityCheckDiv(checkTerm2, text)
    return(falsify(checkTerm2,checker))
  else:
    checker = sanityCheckDiv(checkTerm1, text)
    return(falsify(checkTerm1,checker))

In [None]:
#Example 1
compareDiv("book","volume", middlemarch)

In [None]:
#Example 2
compareDiv("book", "vol", janeEyre)

In [None]:
#Example 3
compareDiv("book", "volume", janeEyre)

## Building Paragraph Structure

This section builds to the **findDiv** function, which takes text as input and attempts to output a nested dictionary of *Structure*, *Chapter*, and *Paragraph* tuples.

**compareDiv**, as described above, is used to determine a novel's *Structure*, which here means whether it's divided into volumes, books, or neither. This yields *strucTerm*, the string that represents the predominant, outermost division of the text. [Note: if **compareDiv** returns "N/A", the whole text is treated as a single structure, and the first line of the text stands in for its heading later on.]

**checkDiv** is used on both *strucTerm* and "chap" (a prefix of "chapter") to yield *strucDiv* and *chapDiv*. These are both lists of line indices that show where each *Structure* and *Chapter* begins; **getDivDist** then turns them into (start, end) tuples.

**combDiv** takes the information from *strucDiv* and *chapDiv* to create a nested dictionary {structureTuple: {chapterTuple: [paragraphTuple, ...]}}.

**makePara** starts a new paragraph at every blank line in a chapter and returns a list of tuples with index information.
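Splitting a chapter into paragraphs on blank lines can be sketched independently of the rest of the pipeline. The function name `split_paragraphs` and the sample lines are hypothetical; this version treats a run of consecutive blank lines as a single break, which is one possible design choice.

```python
def split_paragraphs(lines, start, stop):
    """Return inclusive (first, last) line ranges for each paragraph in lines[start:stop+1]."""
    paragraphs, first = [], None
    for i in range(start, stop + 1):
        if lines[i].strip() == "":
            # A blank line closes the paragraph that is currently open, if any
            if first is not None:
                paragraphs.append((first, i - 1))
                first = None
        elif first is None:
            first = i
    # Close the last paragraph when the range does not end on a blank line
    if first is not None:
        paragraphs.append((first, stop))
    return paragraphs

# Made-up chapter: a heading, a two-line paragraph, and a one-line paragraph
sample = ["CHAPTER I", "", "It was a dark", "and stormy night.", "", "", "The rain fell."]
print(split_paragraphs(sample, 0, len(sample) - 1))
# [(0, 0), (2, 3), (6, 6)]
```

Note that the chapter heading itself comes back as the first "paragraph", since it is a non-blank line followed by a blank one.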


In [None]:
def combDiv(strucDiv, chapDiv, text):
  result ={}
  if strucDiv == None:
    chaps = getDivDist(chapDiv, len(text))
    paraChaps = {chap: makePara(chap,text) for chap in chaps}
    result[(0,len(text))] = paraChaps
    return(result)
  else:
    strucRange = getDivDist(strucDiv, len(text))
    for i in strucRange:
        chaps = getDivDist([x for x in chapDiv if x > i[0] and x < i[1]], i[1])
        paraChaps = {chap: makePara(chap,text) for chap in chaps}
        result[i] = paraChaps
  return(result)

In [None]:
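def makePara(chap,text):
  #Splits a chapter's (start, end) line range into a list of paragraph (start, end) tuples
  result =[]
  start = None
  #Treat chap[1] as inclusive, but never index past the end of the text
  last = min(chap[1], len(text)-1)
  for i in range(chap[0], last+1):
    if text[i].strip() == "":
      #A blank line closes the open paragraph; repeated blank lines are skipped
      if start is not None:
        result.append((start,i-1))
        start = None
    elif start is None:
      start = i
  #Close the final paragraph when the chapter does not end on a blank line
  if start is not None:
    result.append((start,last))
  return(result)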

In [None]:
def findDiv(text):
  strucTerm = compareDiv("book", "volume", text)
  if strucTerm != "N/A":
      strucDiv = checkDiv(strucTerm, text)
  else:
      strucDiv = None
  chapDiv = checkDiv("chap", text)
  strucChapParaNest = combDiv(strucDiv, chapDiv, text)
  return(strucChapParaNest)

In [None]:
#Example 1
findDiv(janeEyre)

In [None]:
#Example 2
findDiv(middlemarch)

In [None]:
#Example 3
findDiv(frankenstein)

## prepCSV
This section shows how to take the nested structure-chapter-paragraph dictionary from the **findDiv** function and create a list of lists that can easily be converted to a CSV.

**strucRow, chapRow,** and **paraRow** are identical functions apart from **paraRow** not having a children count; each function creates a row that describes 1.) the number of the structure, chapter, or paragraph 2.) the index where that structure, chapter, or paragraph starts 3.) the index where it ends and 4.) the number of children a structure or chapter has. Each function returns a list.

**allText** uses the indices from the **findDiv** function to look up the corresponding text. E.g. if **findDiv** returns {(213,2000): {(215,316): [(215,217), ...]}}, **allText** reads the first paragraph as ["Volume I.", "Chapter I.", "It was the best of times, it was the worst of times"].

**makeHeader** returns the list of column names, which **prepCSV** inserts as the first row of the list.

**prepCSV** goes paragraph by paragraph to add a list/row to the list of lists. Each list contains results from **strucRow, chapRow, paraRow,** and **allText**.
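The flattening step can be sketched on a tiny nested dictionary. The function name `flatten_nest`, the column names, and the toy index ranges are hypothetical and only cover the numbering and index columns; as in the description above, chapters are numbered across the whole text while paragraphs restart in each chapter.

```python
def flatten_nest(nest):
    """Return one row per paragraph: structure, chapter, and paragraph numbers plus its range."""
    rows = [["sNumber", "cNumber", "paraNum", "paraStart", "paraStop"]]
    chap_no = 0
    for struc_no, chapters in enumerate(nest.values(), start=1):
        for paragraphs in chapters.values():
            chap_no += 1  # chapters keep counting across structures
            for para_no, (first, last) in enumerate(paragraphs, start=1):
                rows.append([struc_no, chap_no, para_no, first, last])
    return rows

# Toy nest: one structure, two chapters, two paragraphs each
nest = {(0, 9): {(0, 4): [(0, 0), (2, 4)], (5, 9): [(5, 5), (7, 9)]}}
for row in flatten_nest(nest):
    print(row)
```

Each paragraph becomes exactly one row, so the length of the output (minus the header) equals the total number of paragraphs in the nest.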



In [None]:
def strucRow(struc, strucCount,stucChapParaNext):
  row = []
  row.append(strucCount)
  row.append(struc[0])
  row.append(struc[1]) 
  row.append(len(stucChapParaNext[struc]))
  return(row)

In [None]:
def chapRow(chap, struc, chapCount,stucChapParaNext):
  row = []
  row.append(chapCount)
  row.append(chap[0])
  row.append(chap[1])
  row.append(len(stucChapParaNext[struc][chap]))
  return(row)

In [None]:
def paraRow(para,paraCount):
  row = []
  row.append(paraCount)
  row.append(para[0])
  row.append(para[1])
  return(row)

In [None]:
def allText(struc, chap, para, text):
  row = []
  row.append(text[struc[0]])
  row.append(text[chap[0]])
  row.append(" ".join(text[para[0]:para[1]+1]).strip())
  return(row)

In [None]:
def makeHeader():
    return(["sNumber","sStart", "sStop", "sChildrenCt", "cNumber", "cStart", "cStop", "cChildrenCt", 
           "paraNum" ,"paraStart", "paraStop", "strucText", "chapText", "paraText"])

In [None]:
def prepCSV(stucChapParaNext,text):
  result = []
  strucCount,chapCount,paraCount = 0,0,0
  for struc in stucChapParaNext.keys():
    strucCount += 1
    for chap in stucChapParaNext[struc].keys():
      chapCount += 1
      for para in stucChapParaNext[struc][chap]:
        paraCount += 1
        row = []
        row += strucRow(struc, strucCount, stucChapParaNext)
        row += chapRow(chap,struc, chapCount, stucChapParaNext)
        row += paraRow(para, paraCount)
        row += allText(struc, chap, para, text)
        result.append(row)
      paraCount = 0
  result.insert(0, makeHeader())
  return(result)

In [None]:
#Example 1
prepCSV(findDiv(middlemarch),middlemarch)

In [None]:
#Example 2
prepCSV(findDiv(frankenstein),frankenstein)

In [None]:
#Example 3
prepCSV(findDiv(janeEyre),janeEyre)

## Dialogue Metrics (Basic)

This function provides basic dialogue metrics as long as a paragraph features at least one start and one end quotation mark; all other paragraphs are filled with "N/A". It works by going through each paragraph, counting the quote characters, and determining whether a quote character starts or ends the paragraph.
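The four metrics can be computed for a single paragraph with plain string methods. This is a minimal sketch: `basic_quote_metrics` is a hypothetical helper that takes the quote characters as parameters (defaulting to slant quotes) rather than reading the global **quotes** variable.

```python
def basic_quote_metrics(paragraph, opening="“", closing="”"):
    """Return [open count, close count, starts with quote, ends with quote] or four "N/A"s."""
    opens = paragraph.count(opening)
    closes = paragraph.count(closing)
    if opens == 0 or closes == 0:
        return ["N/A"] * 4
    return [opens, closes, paragraph.startswith(opening), paragraph.endswith(closing)]

print(basic_quote_metrics("“I am hungry,” announced Hester."))  # [1, 1, True, False]
print(basic_quote_metrics("Hester left the room."))  # ['N/A', 'N/A', 'N/A', 'N/A']
```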


In [None]:
def addDialogueMetricsBasic(makeCSVL):
    result = makeCSVL
    result[0] += ["quoteCountStart", "quoteCountEnd", "quoteStart", "quoteEnd"]
    for x in result[1:]:
        paraText = x[-1]
        row = []
        row.append(len([i for i in paraText if i == quotes[0]]))
        row.append(len([i for i in paraText if i == quotes[1]]))
        row.append(paraText.startswith(quotes[0]))
        row.append(paraText.endswith(quotes[1]))
        if row[0] > 0 and row[1]>0:
            x += row
        else:
            x += ["N/A"]*4
    return(result)

In [None]:
#Example 1
#Reset the global quote convention for each text before computing metrics
quotes = checkQuoteConvention(janeEyre)
janeEyreCSV = prepCSV(findDiv(janeEyre),janeEyre)
janeEyreCSV_D = addDialogueMetricsBasic(janeEyreCSV)
print(janeEyreCSV_D)

In [None]:
#Example 2
quotes = checkQuoteConvention(maryBarton)
maryBartonCSV = prepCSV(findDiv(maryBarton),maryBarton)
maryBartonCSV_D = addDialogueMetricsBasic(maryBartonCSV)
print(maryBartonCSV_D)

In [None]:
#Example 3
quotes = checkQuoteConvention(middlemarch)
middlemarchCSV = prepCSV(findDiv(middlemarch),middlemarch)
middlemarchCSV_D = addDialogueMetricsBasic(middlemarchCSV)
print(middlemarchCSV_D)

In [None]:
middlemarchCSV_D[0][-5:]

## Dialogue Metrics (Advanced)

This set of functions builds on the previous Dialogue Metrics to parse dialogue in quotes from nearby descriptions.

The functions are divided in two parts: Processors and Labelers.

### Processors

**processPair** and **processAntiPairs** return lists of lists that provide indices for where a quote starts and stops and where description starts and stops, respectively. For example, processPair run on "'I am hungry for a sandwich,' announced Hester." would return [[0, 28]], while processAntiPairs, given those pairs and the same paragraph, would return [[29, 46]].

**processQuote** takes the space-joined quote segments of a paragraph like "'I am hungry for a sandwich,' announced Hester, 'and you will make it for me'", removes the inner quote characters, and returns "'I am hungry for a sandwich, and you will make it for me'".

**processInterjection** uses all three functions above to return three strings and one integer: a whitespace-concatenated quote returned from **processQuote**, a "\<SEP>"-concatenated quote built from **processPair**, a "\<SEP>"-concatenated description built from **processAntiPairs**, and a count of the description segments in the previous string.
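The index bookkeeping is easier to follow on one short paragraph. The sketch below is an assumption-laden stand-in, not the notebook's implementation: `quote_spans` and `description_spans` are hypothetical names, they take the quote characters as parameters, and they skip whitespace-only gaps instead of applying a minimum-length rule.

```python
def quote_spans(paragraph, opening="“", closing="”"):
    """Pair each opening quote with the next closing quote as inclusive [start, stop] spans."""
    spans, start = [], None
    for i, ch in enumerate(paragraph):
        if ch == opening and start is None:
            start = i
        elif ch == closing and start is not None:
            spans.append([start, i])
            start = None
    return spans

def description_spans(spans, paragraph):
    """Return inclusive spans of the non-blank text that falls outside the quote spans."""
    gaps, cursor = [], 0
    for start, stop in spans:
        if paragraph[cursor:start].strip():
            gaps.append([cursor, start - 1])
        cursor = stop + 1
    if paragraph[cursor:].strip():
        gaps.append([cursor, len(paragraph) - 1])
    return gaps

para = "“I am hungry for a sandwich,” announced Hester."
spans = quote_spans(para)
print(spans)                           # [[0, 28]]
print(description_spans(spans, para))  # [[29, 46]]
```

Slicing the paragraph with `para[start:stop + 1]` for each span recovers the quoted text and the description text, which is what the "\<SEP>" joins operate on.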

### Labelers

The labeler functions are used to apply rules and labels to specific combinations of quote counts and positions. 

**advancedMetricsSwitcher** looks at whether a paragraph with a quote starts with quote characters, ends with them, does both, or does neither. It then routes the paragraph to the appropriate function.

**advancedMetricsStart, advancedMetricsEnd, advancedMetricsBoth,** and **advancedMetricsNeither** all use the **processInterjection** function if a paragraph contains more than one pair of start and end quote characters. They apply unique rules, however, if the paragraph has exactly one pair of quotes. For instance, the paragraph "'She raced to the car'" would be an example of **advancedMetricsBoth** and would not require the quote to be split from description. Conversely, the paragraph "'She raced to the car,' I told the officer." would be an example of **advancedMetricsStart** and would need a rule to split after the end quotation character.
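The routing logic reduces to two questions: where do the quote characters sit, and is there exactly one pair? The sketch below shows only the labelling step; `label_paragraph` is a hypothetical helper, while the label strings are the ones used by the labeler functions in this notebook.

```python
def label_paragraph(opens, closes, starts_with_quote, ends_with_quote):
    """Pick a label from the quote counts and the start/end flags of a paragraph."""
    single = opens == 1 and closes == 1  # exactly one pair of quotes
    if starts_with_quote and ends_with_quote:
        return "bothVolley" if single else "bothMidInterjection"
    if starts_with_quote:
        return "startEngine" if single else "startInterjection"
    if ends_with_quote:
        return "endCaboose" if single else "endInterjection"
    return "neitherSolo" if single else "neitherInterjection"

# “She raced to the car.” -> one pair, quote at both ends
print(label_paragraph(1, 1, True, True))   # bothVolley
# “She raced to the car,” I told the officer. -> one pair, quote at start only
print(label_paragraph(1, 1, True, False))  # startEngine
```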

### Other

**addDialogueMetricsAdvanced** is used exclusively for adding these columns onto the list of lists produced by **prepCSV** and **addDialogueMetricsBasic**.

[Note: the Labeler functions need to be modified to work with text that uses straight quotes]


### Processors

In [None]:
def processPair(paraText):
  result =[]
  resultPair=[]
  count = 0
  for i in range(len(paraText)):
    if paraText[i] == quotes[0]:
      count += 1
      resultPair.append(i)
    if paraText[i] == quotes[1]:
      count += 1
      resultPair.append(i)
    if count == 2:
      result.append(resultPair)
      resultPair = []
      count = 0
  return(result)

In [None]:
def processAntiPairs(pairs, paraText):
  result = []
  start = 0
  for i in pairs:
    antiPair = [start, i[0]-1]
    result.append(antiPair)
    start = i[1]+1
  result.append([start, len(paraText)-1])
  if pairs[0][0] == result[0][0]:
      result = result[1:]
  if result[-1][1] - result[-1][0] <3:
      result = result[:-1]
  return(result)

In [None]:
def processQuote(quoteString):
    result =quoteString.replace(quotes[0], "").replace(quotes[1], "")
    result = result.strip()
    result = quotes[0] + result + quotes[1]
    return(result)

In [None]:
def processInterjection(row, paraText):
    quotePairs = processPair(paraText)
    descriptionPairs = processAntiPairs(quotePairs, paraText)
    quoteJoined = processQuote(" ".join([paraText[pair[0]:pair[1]+1] for pair in quotePairs]))
    quoteSep = " <SEP> ".join([paraText[pair[0]:pair[1]+1].strip() for pair in quotePairs])
    descriptionSep = " <SEP> ".join([paraText[pair[0]:pair[1]+1].strip() for pair in descriptionPairs])
    return([quoteJoined, quoteSep, descriptionSep, len(descriptionPairs)])

### Labelers

In [None]:
def advancedMetricsBoth(row, paraText):
  if row[0] == 1 and row[1] == 1:
    return(["bothVolley", paraText, "N/A", "N/A", 0])
  else:
    dialogueDescription = processInterjection(row, paraText)
    return(["bothMidInterjection", dialogueDescription[0], dialogueDescription[1],dialogueDescription[2], dialogueDescription[3]])

In [None]:
def advancedMetricsStart(row, paraText):
  if row[0] == 1 and row[1] == 1:
    sep = paraText.index(quotes[1])
    return(["startEngine", paraText[:sep+1].strip(), "N/A", paraText[sep+1:].strip(), 0])
  else:
    dialogueDescription = processInterjection(row, paraText)
    return(["startInterjection", dialogueDescription[0], dialogueDescription[1],dialogueDescription[2], dialogueDescription[3]])

In [None]:
def advancedMetricsEnd(row, paraText):
  if row[0] == 1 and row[1] == 1:
    sep = paraText.index(quotes[0])
    return(["endCaboose", paraText[sep:], "N/A", paraText[:sep].strip(), 0])
  else:
    dialogueDescription = processInterjection(row, paraText)
    return(["endInterjection", dialogueDescription[0], dialogueDescription[1],dialogueDescription[2], dialogueDescription[3]])

In [None]:
def advancedMetricsNeither(row, paraText):
  if row[0] == 1 and row[1] == 1:
    start = paraText.index(quotes[0])
    end = paraText.index(quotes[1])
    return(["neitherSolo", paraText[start:end+1], "N/A", paraText[:start].strip() + " " + paraText[end+1:].strip(), 0])
  else:
    dialogueDescription = processInterjection(row, paraText)
    return(["neitherInterjection", dialogueDescription[0], dialogueDescription[1],dialogueDescription[2], dialogueDescription[3]])

In [None]:
def advancedMetricsSwitcher(row, paraText):
    
  if row[2] == True and row[3] == True:
    return(advancedMetricsBoth(row, paraText))
  elif row[2] == True and row[3] == False:
    return(advancedMetricsStart(row, paraText))
  elif row[2] == False and row[3] == True:
    return(advancedMetricsEnd(row, paraText)) 
  else:
    return(advancedMetricsNeither(row, paraText))

In [None]:
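def addDialogueMetricsAdvanced(makeCSVL_D):
    result = makeCSVL_D
    result[0] += ["dLabel","dQuoteConcat", "dQuoteSep", "dDescriptionSep", "dDescriptionCt"]
    for x in result[1:]:
        #The last four entries are the basic dialogue metrics; the paragraph text sits just before them
        row = x[-4:]
        if row[0]!="N/A":
            paraText = x[-5]
            x += advancedMetricsSwitcher(row,paraText)
        else:
            #One placeholder per new column so every row stays aligned with the header
            x+= ["N/A"]*5
    return(result)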

In [None]:
#Example 1
quotes = checkQuoteConvention(middlemarch)
middlemarchCSV = prepCSV(findDiv(middlemarch),middlemarch)
middlemarchCSV_D = addDialogueMetricsBasic(middlemarchCSV)
middlemarchCSV_DA = addDialogueMetricsAdvanced(middlemarchCSV_D)
print(middlemarchCSV_DA)

In [None]:
#Example 2
quotes = checkQuoteConvention(frankenstein)
frankensteinCSV = prepCSV(findDiv(frankenstein),frankenstein)
frankensteinCSV_D = addDialogueMetricsBasic(frankensteinCSV)
frankensteinCSV_DA = addDialogueMetricsAdvanced(frankensteinCSV_D)
print(frankensteinCSV_DA)


In [None]:
#Example 3
quotes = checkQuoteConvention(janeEyre)
janeEyreCSV = prepCSV(findDiv(janeEyre),janeEyre)
janeEyreCSV_D = addDialogueMetricsBasic(janeEyreCSV)
janeEyreCSV_DA = addDialogueMetricsAdvanced(janeEyreCSV_D)
print(janeEyreCSV_DA)

## Dialogue Metrics (spaCy)

This set of functions applies POS tagging and dependency parsing to each set of descriptions.

**prepareRows** creates a list of 3-item large lists to house NounSubject-Root-DirectObject triads.

**checkSubject,checkRoot,** and **checkObject** use spaCy's parsers to check if a token qualifies as a subject, root, or object, respectively.

**spacyMetrics** splits each description by its separator token and then checks each portion for a NounSubject-Root-DirectObject triad.

**makeSpacyHeader** looks at the max number of triads in a document and creates enough column titles to accommodate that max.

**remediateCSV** adds "N/A" placeholders for all rows that do not meet the max number of triads in the document.

**addDialogueMetricsSpacy** puts it all together!
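The header and padding steps do not depend on spaCy and can be sketched on their own. In this sketch `triad_header` and `pad_rows` are hypothetical helpers, and the rows are made-up triad columns rather than real parser output.

```python
def triad_header(max_triads):
    """Build Subj/Root/Obj column names for the given number of triads."""
    header = []
    for i in range(max_triads):
        header += [f"Subj{i}", f"Root{i}", f"Obj{i}"]
    return header

def pad_rows(rows, width, filler="N/A"):
    """Right-pad every row with filler values until it has width entries."""
    return [row + [filler] * (width - len(row)) for row in rows]

# Made-up triad columns: one triad, two triads, and none
rows = [["she", "said", "N/A"], ["he", "told", "her", "I", "asked", "him"], []]
max_triads = max(len(row) for row in rows) // 3
header = triad_header(max_triads)
padded = pad_rows(rows, len(header))

print(header)     # ['Subj0', 'Root0', 'Obj0', 'Subj1', 'Root1', 'Obj1']
print(padded[0])  # ['she', 'said', 'N/A', 'N/A', 'N/A', 'N/A']
```

Padding every row to the header width keeps the output rectangular, which matters once the rows are written out as a TSV.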




In [None]:
def prepareRows(lenSep):
  result = [["N/A"]*3]
  for i in range(lenSep):
    result.append(["N/A"]*3)
  return(result)

In [None]:
def checkSubject(nlpToken):
  #Only proper nouns and pronouns count as subjects; every other token returns "N/A"
  if nlpToken.dep_ == "nsubj" and nlpToken.pos_ in ("PROPN", "PRON"):
    return(nlpToken.text)
  return("N/A")

In [None]:
def checkRoot(nlpToken):
  if nlpToken.dep_ == "ROOT":
    return(nlpToken.text)
  else:
    return("N/A")

In [None]:
def checkObject(nlpToken):
  #Only proper nouns and pronouns count as direct objects; every other token returns "N/A"
  if nlpToken.dep_ == "dobj" and nlpToken.pos_ in ("PROPN", "PRON"):
    return(nlpToken.text)
  return("N/A")

In [None]:
def spacyMetrics(description):
  splitDesc = description.split("<SEP>")
  #One placeholder triad per description segment
  result = prepareRows(len(splitDesc))
  count = 0
  for desc in splitDesc:
    doc = nlp(desc)
    for i in doc:
      #Keep the first subject, root, and object found in each segment
      if result[count][0] == "N/A":
        result[count][0] = checkSubject(i)
      if result[count][1] == "N/A":
        result[count][1] = checkRoot(i)
      if result[count][2] == "N/A":
        result[count][2] = checkObject(i)
    count+=1
  #Drop triads where nothing was found
  result = [i for i in result if i != ["N/A"]*3]
  return(result)

In [None]:
def addDialogueMetricsSpacy(makeCSVL_DA):
    result = makeCSVL_DA
    maxCount = 0
    for x in result[1:]:
        if x[-2]!="N/A":
            description = x[-2]
            sDescription = spacyMetrics(description)
            if len(sDescription) > maxCount:
                maxCount =len(sDescription)
            for dobSet in sDescription:
                x += dobSet
    result[0]+=makeSpacyHeader(maxCount)
    result = remediateCSV(result)  
    return(result)

In [None]:
def makeSpacyHeader(maxCount):
    result = []
    for i in range(maxCount):
        result.append("Subj{0}".format(str(i)))
        result.append("Root{0}".format(str(i)))
        result.append("Obj{0}".format(str(i)))
    return(result)

In [None]:
def remediateCSV(result):
    maxLen = len(result[0])
    for x in result[1:]:
        x += ["N/A"] * (maxLen - len(x))
    return(result)

In [None]:
#Example1
quotes = checkQuoteConvention(middlemarch)
middlemarchCSV = prepCSV(findDiv(middlemarch),middlemarch)
middlemarchCSV_D = addDialogueMetricsBasic(middlemarchCSV)
middlemarchCSV_DA = addDialogueMetricsAdvanced(middlemarchCSV_D)
middlemarchCSV_DAS = addDialogueMetricsSpacy(middlemarchCSV_DA)
with open("middlemarchLabels.tsv", 'w', newline='', encoding='utf-8') as f:
    write = csv.writer(f,delimiter='\t')
    write.writerows(middlemarchCSV_DAS)

In [None]:
#Example2
quotes = checkQuoteConvention(janeEyre)
janeEyreCSV = prepCSV(findDiv(janeEyre),janeEyre)
janeEyreCSV_D = addDialogueMetricsBasic(janeEyreCSV)
janeEyreCSV_DA = addDialogueMetricsAdvanced(janeEyreCSV_D)
janeEyreCSV_DAS = addDialogueMetricsSpacy(janeEyreCSV_DA)
with open("janeEyreLabels.tsv", 'w', newline='', encoding='utf-8') as f:
    write = csv.writer(f,delimiter='\t')
    write.writerows(janeEyreCSV_DAS)

In [None]:
#Example3
quotes = checkQuoteConvention(maryBarton)
maryBartonCSV = prepCSV(findDiv(maryBarton),maryBarton)
maryBartonCSV_D = addDialogueMetricsBasic(maryBartonCSV)
maryBartonCSV_DA = addDialogueMetricsAdvanced(maryBartonCSV_D)
maryBartonCSV_DAS = addDialogueMetricsSpacy(maryBartonCSV_DA)
with open("maryBartonLabels.tsv", 'w', newline='', encoding='utf-8') as f:
    write = csv.writer(f,delimiter='\t')
    write.writerows(maryBartonCSV_DAS)

In [None]:
#Example4
quotes = checkQuoteConvention(dracula)
draculaCSV = prepCSV(findDiv(dracula),dracula)
draculaCSV_D = addDialogueMetricsBasic(draculaCSV)
draculaCSV_DA = addDialogueMetricsAdvanced(draculaCSV_D)
draculaCSV_DAS = addDialogueMetricsSpacy(draculaCSV_DA)
with open("draculaLabels.tsv", 'w', newline='', encoding='utf-8') as f:
    write = csv.writer(f,delimiter='\t')
    write.writerows(draculaCSV_DAS)

In [None]:
#Example5
quotes = checkQuoteConvention(frankenstein)
frankensteinCSV = prepCSV(findDiv(frankenstein),frankenstein)
frankensteinCSV_D = addDialogueMetricsBasic(frankensteinCSV)
frankensteinCSV_DA = addDialogueMetricsAdvanced(frankensteinCSV_D)
frankensteinCSV_DAS = addDialogueMetricsSpacy(frankensteinCSV_DA)
with open("frankensteinLabels.tsv", 'w', newline='', encoding='utf-8') as f:
    write = csv.writer(f,delimiter='\t')
    write.writerows(frankensteinCSV_DAS)