### <A HREF="https://www.poynter.org/reporting-editing/2017/why-it-worked-a-rhetorical-analysis-of-obamas-speech-on-race-2/">Poynter article on the linguistic devices used in Obama's March 18, 2008 speech on American race relations, often referred to as A More Perfect Union</A>

### <A HREF="https://www.americanrhetoric.com/speeches/barackobamaperfectunion.htm">A More Perfect Union speech</A>

<B>Parallelism</B>
<P>
At the risk of calling to mind the worst memories of grammar class, I invoke the wisdom that parallel constructions help authors and orators make meaning memorable. To remember how parallelism works, think of equal terms to express equal ideas. So Dr. King dreamed that one day his four children "will not be judged by the color of their skin but by the content of their character." (By the content of their character is parallel to by the color of their skin.)</P>
<P>
Back to Obama: "This was one of the tasks we set forth at the beginning of this campaign — to continue the long march of those who came before us, a march for a more just, more equal, more free, more caring and more prosperous America." If you are counting, that's five parallel phrases among 43 words. </P>
<P>

And there are many more:</P>
<P>

 
"…we may not have come from the same place, but we all want to move in the same direction."</P>
<P>

 
"So when they are told to bus their children to a school across town; when they hear that an African American is getting an advantage in landing a good job or a spot in a good college because of an injustice that they themselves never committed; when they're told that their fears about crime in urban neighborhoods are somehow prejudiced, resentment builds over time."</P>
<P>

 
"…embracing the burdens of our past without becoming victims of our past."</P>
<P>
<I>Roy Peter Clark, October 20, 2017, Poynter</I></P>
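
One rough way to make the "equal terms for equal ideas" rule concrete is to compare the part-of-speech patterns of the two halves of Dr. King's phrase. The sketch below is only an illustration: the Penn Treebank tags are assigned by hand (an assumption; a real tagger may label some words differently), and `pos_pattern` is a hypothetical helper name.

```python
# Hand-assigned Penn Treebank tags (an assumption, not tagger output).
color_phrase = [("by", "IN"), ("the", "DT"), ("color", "NN"),
                ("of", "IN"), ("their", "PRP$"), ("skin", "NN")]
content_phrase = [("by", "IN"), ("the", "DT"), ("content", "NN"),
                  ("of", "IN"), ("their", "PRP$"), ("character", "NN")]


def pos_pattern(tagged_phrase):
    """Return only the tag sequence of a list of (word, tag) pairs."""
    return [tag for _, tag in tagged_phrase]


# Two phrases are treated as parallel here when their tag sequences match exactly.
is_parallel = pos_pattern(color_phrase) == pos_pattern(content_phrase)
print(is_parallel)
```

Exact tag equality is a strict criterion; it misses looser parallels in which one phrase carries an extra adjective or adverb.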

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import os
import unicodedata
import re
import stanza
from collections import defaultdict
from textblob import TextBlob
from graphviz import Source
import nltk
from nltk.parse.corenlp import CoreNLPParser
from nltk.parse.corenlp import CoreNLPDependencyParser
from nltk.tokenize import TreebankWordTokenizer
from nltk.tokenize import sent_tokenize
nltk.download('punkt')
word_token = TreebankWordTokenizer()

[nltk_data] Downloading package punkt to /home/muddy/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


### <A HREF="https://stanfordnlp.github.io/stanza/getting_started.html">Stanza quickstart guide</A>

In [None]:
# Using Stanford's CoreNLP parser with NLTK
# 1. Download CoreNLP from https://stanfordnlp.github.io/CoreNLP/download.html
# 2. Make sure Java is installed; otherwise download and install it from https://www.java.com/en/download/windows_manual.jsp
# 3. Unzip/extract the CoreNLP zip file to a directory
#    (on my laptop it's in C:\Users\peter\stanford-corenlp-4.5.2)
# 4. Open a command terminal in that directory and start the server with:
#    java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
# 5. To view the parse trees, download Graphviz from https://graphviz.org/download/ and install it
# 6. Now the following Python code can be run
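
Before running the parser cells it can help to confirm that something is actually listening on the port chosen in the `java` command. The following is a minimal sketch using only the standard library; `corenlp_server_is_up` is a hypothetical helper, and it checks only that the TCP port accepts connections, not that the process behind it is CoreNLP.

```python
import socket


def corenlp_server_is_up(host="localhost", port=9000, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable
        return False


# Port 9000 matches the -port argument in the setup steps.
print(corenlp_server_is_up())
```

If this prints `False`, the NLTK `CoreNLPParser` and `CoreNLPDependencyParser` calls will most likely fail with a connection error.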

In [None]:
nlp = stanza.Pipeline(lang="en") # Initialize the default English pipeline

In [None]:
phrase1 = "will not be judged by the color of their skin but by the content of their character."
phrase2 = "we may not have come from the same place, but we all want to move in the same direction."
phrase3 = "embracing the burdens of our past without becoming victims of our past."
phrase4 = "That's one small step for man, one giant leap for mankind."
phrase5 = "We may not have come from the same place but we want to move in the same direction."

In [None]:
doc = nlp(phrase3)
print(*[f'id: {word.id}\tword: {word.text}\thead id: {word.head}\thead: {sent.words[word.head-1].text if word.head > 0 else "root"}\tdeprel: {word.deprel}' for sent in doc.sentences for word in sent.words], sep='\n')

In [None]:
os.environ["PATH"] += os.pathsep + 'C:/Program Files/Graphviz/bin/'

In [None]:
sdp = CoreNLPDependencyParser()
sentence = phrase4
result = list(sdp.raw_parse(sentence))
dep_tree_dot_repr = result[0].to_dot()  # DOT source for the first (only) parse
source = Source(dep_tree_dot_repr, filename="dep_tree_p4_nopunc", format='png')
source.view()
# Opens in pop-under window... well isn't that nice!

In [None]:
# Graph image doesn't get saved, need to re-run the code
source

In [None]:
parse, = sdp.raw_parse(phrase4)
print(parse.tree())

In [None]:
# Each triple is (governor, relation, dependent)
for governor, relation, dependent in parse.triples():
    print(governor, relation, dependent)

In [None]:
parser = CoreNLPParser()
sent, = parser.parse_text(phrase4)
sent.pretty_print()

In [None]:
sent

In [None]:
#sent.pos()
#sent.productions()
#sent.pformat_latex_qtree() #compatible with LaTeX qtree package
#sent.height()
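# Indexing sent by level number raises IndexError (sent[k] is the k-th child, not the k-th level),
# so walk the subtrees instead, from the full sentence down to the tagged words
for subtree in sent.subtrees():
    print(subtree)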

In [None]:
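listr = []  # collected subtrees; filled in place by iterate()
def iterate(tree):
    """Walk the tree and append to listr every subtree that has a single child."""
    if len(tree) > 1:
        # Branching node: descend into each child
        for i in range(len(tree)):
            iterate(tree[i])
    else:
        listr.append(tree)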

In [None]:
iterate(sent[0])
listr
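
As an alternative to filling a global list, the tagged words can be pulled out with NLTK's own tree traversal. This is a sketch on a hand-written toy tree, not on CoreNLP output; `preterminals` is a hypothetical helper, and unlike `iterate` it returns only height-2 subtrees (a tag directly above a word), not longer single-child chains.

```python
from nltk.tree import Tree


def preterminals(tree):
    """Return (tag, word) pairs for every height-2 subtree, left to right."""
    return [(sub.label(), sub[0])
            for sub in tree.subtrees(lambda t: t.height() == 2)]


# Toy bracketed parse written by hand for illustration.
toy = Tree.fromstring("(ROOT (S (NP (DT the) (NN cat)) (VP (VBD sat))))")
print(preterminals(toy))
```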

In [None]:
with open('./Data/barackobamaperfectunion.txt') as f:
    text = f.read()
sents = sent_tokenize(text)

In [None]:
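def count_parallels(text):
    """Return (number of sentences, number of sentences containing a repeated POS-tag run)."""
    count = 0
    sents = sent_tokenize(text)
    for phrase in sents:
        try:
            sent, = parser.parse_text(phrase)
        except Exception:
            # Skip sentences that do not come back from the server as exactly one tree
            #print('----- PARSE ERROR -----')
            #print(phrase)
            continue
        poss = []
        words = []
        for word in sent.pos():
            poss.append(word[1])
            words.append(word[0])
        #print(words)
        stop = False
        results = []
        # Try the longest runs first: 7 tags down to 4
        for length in range(7,3,-1):
            length = min(length, len(words))
            for i in range(len(poss)-length+1):
                for j in range(len(poss)-length+1):
                    # The two runs must be separated by at least one token
                    if abs(i-j) > length:
                        if poss[i:i+length]==poss[j:j+length]:
                            # Runs of only 4 tags must not contain a comma or an opening quote
                            if length > 4 or (',' not in poss[i:i+length] and '``' not in poss[i:i+length]):
                                results.append([i,j,length])
                                count += 1
                                stop = True  # count at most one match per sentence
                                break
                if stop: break
            if stop: break
        # Uncomment to inspect the matched word runs
        #for result in results:
        #    print(words[result[0]:result[0]+result[2]])
        #    print(words[result[1]:result[1]+result[2]])
        #    print()
    return len(sents), count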
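
The search at the core of `count_parallels` can be tried without a running CoreNLP server by feeding it a list of tags directly. The sketch below mirrors that loop under the same thresholds (runs of 7 down to 4 tags, a gap of at least one token); `find_repeated_pos_run` is a hypothetical name, and the tags in the example are assigned by hand rather than produced by the parser.

```python
def find_repeated_pos_run(tags, max_len=7, min_len=4):
    """Return (i, j, length) for the first repeated tag run, or None."""
    for length in range(max_len, min_len - 1, -1):
        last_start = len(tags) - length
        for i in range(last_start + 1):
            for j in range(last_start + 1):
                # The two runs must be separated by at least one token
                if abs(i - j) <= length:
                    continue
                run = tags[i:i + length]
                if run != tags[j:j + length]:
                    continue
                # Minimum-length runs must not contain a comma or opening quote
                if length > 4 or ("," not in run and "``" not in run):
                    return i, j, length
    return None


# "by the color of their skin but by the content of their character"
tags = ["IN", "DT", "NN", "IN", "PRP$", "NN", "CC",
        "IN", "DT", "NN", "IN", "PRP$", "NN"]
print(find_repeated_pos_run(tags))  # (0, 7, 6)
```

The result says that the six tags starting at position 0 repeat at position 7, which corresponds to the two "by the ... of their ..." phrases.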

In [None]:
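oba = pd.read_csv('text_sentences_words.csv')
gwb = pd.read_csv('text_sentences_words_gwb.csv')
oba = oba.query('source == "oba"')
# Work on independent frames with a 0..n-1 index so that new columns can be
# assigned without SettingWithCopyWarning and rows can be addressed by position
oba_text = oba[['text', 'date']].reset_index(drop=True)
oba_text['sent_count'] = 0
oba_text['parallel_count'] = 0
gwb_text = gwb[['text', 'date']].reset_index(drop=True)
gwb_text['sent_count'] = 0
gwb_text['parallel_count'] = 0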

In [None]:
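# Count once per speech, then assign whole columns (avoids chained .iloc assignment)
gwb_counts = [count_parallels(speech) for speech in gwb_text['text']]
gwb_text['sent_count'] = [sent_count for sent_count, _ in gwb_counts]
gwb_text['parallel_count'] = [parallel_count for _, parallel_count in gwb_counts]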

In [None]:
gwb_text['parallel_per_sent']=gwb_text.parallel_count/gwb_text.sent_count

In [None]:
#gwb_text.to_csv('parallelism_gwb.csv', index=False)

In [None]:
# Count once per speech, then assign whole columns (avoids chained .iloc assignment)
oba_counts = [count_parallels(speech) for speech in oba_text['text']]
oba_text['sent_count'] = [sent_count for sent_count, _ in oba_counts]
oba_text['parallel_count'] = [parallel_count for _, parallel_count in oba_counts]
oba_text['parallel_per_sent']=oba_text.parallel_count/oba_text.sent_count

In [None]:
oba_text.to_csv('parallelism_oba.csv', index=False)

In [None]:
text=gwb_text.text[4]
sents = sent_tokenize(text)
count = 0
for phrase in sents:
    try:
        sent, = parser.parse_text(phrase)
    except Exception:
        print('----- PARSE ERROR -----')
        print(phrase)
        continue
    poss = []
    words = []
    for word in sent.pos():
        poss.append(word[1])
        words.append(word[0])

    stop = False
    results = []
    for length in range(7,3,-1):
        length = min(length, len(words))
        for i in range(len(poss)-length+1):
            for j in range(len(poss)-length+1):
                if abs(i-j) > length:
                    if poss[i:i+length]==poss[j:j+length]:
                        if length > 4 or (',' not in poss[i:i+length] and '``' not in poss[i:i+length]):
                            results.append([i,j,length])
                            count += 1
                            stop = True
                            break
            if stop: break
        if stop: break
    
    for result in results:
        print(words[result[0]:result[0]+result[2]])
        print(words[result[1]:result[1]+result[2]])
        print()

In [None]:
### Add parallelism data to the tidy data; maybe create an Obama-only tidy data set

In [2]:
par_oba = pd.read_csv('parallelism_oba.csv')
par_gwb = pd.read_csv('parallelism_gwb.csv')
par_oba.drop(['text', 'sent_count'], axis=1, inplace=True)
par_gwb.drop(['text', 'sent_count'], axis=1, inplace=True)
tidy = pd.read_csv('tidy_data.csv')
tidy_gwb = pd.read_csv('tidy_data_gwb.csv')
tidy_oba = tidy.query('source == "oba"')
tidy_oba = pd.merge(tidy_oba, par_oba, how='left', on='date')
tidy_gwb = pd.merge(tidy_gwb, par_gwb, how='left', on='date')
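
The left merges on `date` silently multiply rows if a date occurs more than once in the parallelism table, for example two speeches on the same day. As a hedged sketch, the `validate` argument of `pd.merge` can guard against that. The frames and values below are made-up toy data, not the contents of the CSV files, and the column `word` is hypothetical.

```python
import pandas as pd

# Toy stand-ins for the tidy data and the parallelism table (invented values).
tidy_toy = pd.DataFrame({"date": ["2008-03-18", "2008-03-18", "2009-01-20"],
                         "word": ["union", "perfect", "oath"]})
par_toy = pd.DataFrame({"date": ["2008-03-18", "2009-01-20"],
                        "parallel_count": [12, 5],
                        "parallel_per_sent": [0.25, 0.10]})

# validate="many_to_one" raises MergeError if a date is duplicated on the right.
merged = pd.merge(tidy_toy, par_toy, how="left", on="date",
                  validate="many_to_one")
print(merged.shape)
```

With the check in place, the row count of the merged frame should equal that of the left frame.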

In [3]:
set(tidy_gwb.columns) == set(tidy_oba.columns)

True

In [4]:
#tidy_oba.to_csv('tidy_data_oba.csv', index=False)
#tidy_gwb.to_csv('tidy_data_gwb.csv', index=False)