# Assessing Political Directives of the Republican and Democratic Parties

### What are the primary differences between the Democratic and Republican parties' national platforms? Have these policy priorities endured over time?

Katherine Poole   
COMM 313 Final Project

<img src="img/FIGHT.png" width="600"/>

#### Background

American politics is an ideological battlefield, and its divisions have only grown more bitter over time. Since the partisan realignment that followed the Civil War and Reconstruction, the Republicans have come to be known as the party that champions traditional values, freedom, and American identity, whereas the Democrats have built a message focused primarily on tolerance, equality, and care for one another. These values appear to have endured as central tenets of the American two-party system, underscoring the differences between the parties as they approach nearly every policy issue. As observers, we are inundated daily with Republican and Democratic differences through fervent media reporting and analysis, which tends to focus on the same current talking points. Much of the public, however, does not look for information beyond that coverage, which has arguably flattened each party's message into the media's chosen portrayal. This analysis aims to extend our understanding of inter-party differences beyond what the media feeds us and to ask whether these differences have stood the test of time. What exactly is it that makes the Democrats and Republicans so different?

#### Guiding Question
*What policy opinions and directives have resulted in the hostile partisan divide between the Democrats and the Republicans? Are these differences consistent and enduring?*

#### The Corpus

The corpus I collected for this analysis comprises the nomination acceptance speeches delivered by the Democratic and Republican presidential nominees at their parties' national conventions. All speeches were found in the archives of The American Presidency Project, hosted by the University of California, Santa Barbara (https://www.presidency.ucsb.edu/documents/presidential-documents-archive-guidebook/nomination-acceptance-speeches). The earliest Democratic speech is from 1916 (Woodrow Wilson) and the earliest Republican speech is from 1864 (Abraham Lincoln). I collected and analyzed only the speeches from 1916 onward in order to align the two timelines. Additionally, the Democratic and Republican parties realigned between Reconstruction and the early 1900s, so 1916 is a better starting point for gauging the parties' modern beliefs.

This analysis will be split into four distinct parts:   


**1. Descriptive analysis**  
**2. n-Gram Frequency Analyses**  
**3. Keyness Analysis**  
**4. KWIC and Sentiment Analysis**

These analyses were chosen to give a broad understanding of the corpuses first and then to dig deeper into what they contain. Let's begin!

## Setup & Load Data

In [4]:
import os
import re
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from nltk import tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [5]:
%%capture

%run descriptive_analysis.ipynb
%run keyness_analysis.ipynb
%run KWIC&sentiment_analysis.ipynb
%run n-gram_frequency_analysis.ipynb
%run functions.ipynb


In [6]:
sid = SentimentIntensityAnalyzer()

## Democrats v. Republicans on a National Platform

### Descriptive Analysis

The two corpuses compared in the following analysis each contain 23 speeches spanning 1916 to 2016. All of the speeches were delivered on a national stage to the American people at large, though each was geared more specifically toward its own party's members. Let us start with some basics about the two corpuses.

In [7]:
print(sorted(repub_all))
print('\n')
print('{}\t{}\t{}'.format("repub_all", len(repub_all), "speeches"))
print('\n')
print('{}\t{}\t{}'.format("all_speeches_r", len(all_speeches_r), "words"))

['1916_Hughes_R.txt', '1920_Harding_R.txt', '1932_Hoover_R.txt', '1940_Willkie_R.txt', '1944_Dewey_R.txt', '1948_Dewey_R.txt', '1952_Eisenhower_R.txt', '1956_Eisenhower_R.txt', '1960_Nixon_R.txt', '1964_Goldwater_R.txt', '1968_Nixon_R.txt', '1972_Nixon_R.txt', '1976_Ford_R.txt', '1980_Reagan_R.txt', '1984_Reagan_R.txt', '1988_Bush_R.txt', '1992_Bush_R.txt', '1996_Dole_R.txt', '2000_W_Bush_R.txt', '2004_W_Bush_R.txt', '2008_Mccain_R.txt', '2012_Romney_R.txt', '2016_Trump_R.txt']


repub_all	23	speeches


all_speeches_r	560637	words


In [8]:
print(sorted(dem_all))
print('\n')
print('{}\t{}\t{}'.format("dem_all", len(dem_all), "speeches"))
print('\n')
print('{}\t{}\t{}'.format("all_speeches_d", len(all_speeches_d), "words"))

['1916_Wilson_D.txt', '1928_Smith_D.txt', '1932_Roosevelt_D.txt', '1936_Roosevelt_D.txt', '1944_Roosevelt_D.txt', '1948_Truman_D.txt', '1952_Stevenson_D.txt', '1956_Stevenson_D.txt', '1960_Kennedy_D.txt', '1964_Johnson_D.txt', '1968_Humphrey_D.txt', '1972_Mcgovern_D.txt', '1976_Carter_D.txt', '1980_Carter_D.txt', '1984_Mondale_D.txt', '1988_Dukakis_D.txt', '1992_B_Clinton_D.txt', '1996_B_Clinton_D.txt', '2000_Gore_D.txt', '2004_Kerry_D.txt', '2008_Obama_D.txt', '2012_Obama_D.txt', '2016_H_Clinton_D.txt']


dem_all	23	speeches


all_speeches_d	512144	words


Despite covering the same period and the same number of speeches, the Republican corpus comes out roughly 9% longer than the Democratic one (560,637 versus 512,144). Are the Republicans more eager to share their policy agendas? Or are the Democrats politicians of few words? Let us take a deeper look into the corpuses to understand what is happening under the surface.
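One caveat when reading length figures like these: Python's `len()` returns the number of characters when applied to a string and the number of items when applied to a list, so the unit depends on how a corpus is stored. The sketch below assumes, purely for illustration, that a corpus is held as one long string; `simple_tokenize` is a hypothetical helper and not the notebook's own tokenizer.

```python
import re

def simple_tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative only)."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())

# A tiny stand-in for a corpus stored as a single string.
sample = "We the people. We've got work to do."

n_chars = len(sample)                   # length of the string: characters
n_words = len(simple_tokenize(sample))  # length of the token list: words

print(n_chars, "characters")
print(n_words, "words")
```

Counting tokens rather than characters keeps corpus sizes on the same footing as the word frequencies reported in the next section.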

### n-Gram Frequency Distributions

In [9]:
word_freq_r.most_common(30), word_freq_d.most_common(30)

([('the', 5671),
  ('and', 3705),
  ('of', 3595),
  ('to', 3145),
  ('in', 2077),
  ('a', 1787),
  ('we', 1535),
  ('that', 1530),
  ('i', 1472),
  ('our', 1315),
  ('is', 1157),
  ('for', 1128),
  ('it', 899),
  ('have', 846),
  ('this', 809),
  ('will', 802),
  ('be', 640),
  ('not', 638),
  ('are', 595),
  ('you', 545),
  ('with', 538),
  ('but', 513),
  ('on', 502),
  ('my', 488),
  ('as', 465),
  ('america', 457),
  ('their', 424),
  ('all', 418),
  ('by', 416),
  ('people', 396)],
 [('the', 5284),
  ('and', 3622),
  ('of', 3144),
  ('to', 2881),
  ('a', 1721),
  ('in', 1703),
  ('that', 1497),
  ('we', 1365),
  ('i', 1311),
  ('for', 1128),
  ('our', 1128),
  ('is', 972),
  ('it', 770),
  ('have', 764),
  ('this', 651),
  ('you', 634),
  ('will', 591),
  ('not', 575),
  ('are', 569),
  ('be', 545),
  ('with', 509),
  ('as', 501),
  ('but', 488),
  ('on', 474),
  ('all', 466),
  ('they', 453),
  ('who', 423),
  ('my', 418),
  ('by', 418),
  ('people', 390)])

The Republicans and the Democrats share most of their most commonly used words: 28 of the 30 words appear in both lists, and nearly all of them are function words. Let us look at bigrams and trigrams to see whether a more noticeable difference between the corpuses emerges there.
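For readers unfamiliar with n-gram counting, here is a minimal sketch of how bigram and trigram frequencies can be built from a token list with `collections.Counter`. The helper name `ngram_counts` and the sample sentence are assumptions made for illustration; the notebook's own frequency objects are built in `n-gram_frequency_analysis.ipynb`, which is not shown here.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens, joined with spaces."""
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)

# Hypothetical sample text, already lowercased and split into tokens.
tokens = "we will win and we will lead".split()

bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

print(bigrams.most_common(2))
print(trigrams.most_common(1))
```

Because the result is a `Counter`, `most_common(30)` works on it in the same way as the calls in the cells around this point.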

In [10]:
bigram_freq_r.most_common(30), bigram_freq_d.most_common(30)

([('of the', 728),
  ('in the', 507),
  ('to the', 346),
  ('and the', 239),
  ('of our', 218),
  ('for the', 217),
  ('the world', 216),
  ('it is', 215),
  ('we have', 192),
  ('and i', 183),
  ('we will', 172),
  ('we are', 163),
  ('on the', 148),
  ('the american', 147),
  ('i have', 146),
  ('we must', 146),
  ('is the', 140),
  ('will be', 136),
  ('the united', 135),
  ('united states', 133),
  ('to be', 130),
  ('in this', 128),
  ('in our', 119),
  ('that we', 116),
  ('i am', 115),
  ('and we', 112),
  ('i will', 111),
  ('that the', 109),
  ('in a', 107),
  ('is a', 102)],
 [('of the', 660),
  ('in the', 457),
  ('to the', 286),
  ('and the', 242),
  ('for the', 220),
  ('we have', 206),
  ('of our', 205),
  ('and i', 181),
  ('it is', 155),
  ('on the', 154),
  ('is the', 136),
  ('to be', 134),
  ('we are', 130),
  ('we will', 129),
  ('the people', 126),
  ('the world', 119),
  ('we must', 117),
  ('that the', 115),
  ('we can', 113),
  ('by the', 112),
  ('i will', 112)

In [11]:
trigram_freq_r.most_common(30), trigram_freq_d.most_common(30)

([('the united states', 131),
  ('in the world', 77),
  ('the american people', 76),
  ('of the united', 70),
  ('of the world', 43),
  ('president of the', 39),
  ('the republican party', 37),
  ('of the american', 34),
  ('my fellow americans', 32),
  ('are going to', 32),
  ('men and women', 31),
  ('it is the', 28),
  ('we are going', 27),
  ('i believe in', 26),
  ('there is no', 26),
  ('one of the', 26),
  ('united states of', 23),
  ('states of america', 23),
  ('i want to', 23),
  ('it is a', 22),
  ('in this country', 21),
  ('and i am', 21),
  ('and we will', 21),
  ('the federal government', 21),
  ('that we have', 21),
  ('the party of', 21),
  ('not going to', 21),
  ('and in the', 20),
  ('i believe that', 20),
  ("we're going to", 20)],
 [('the united states', 90),
  ('of the united', 64),
  ('the american people', 63),
  ('the democratic party', 47),
  ('i want to', 46),
  ("we're going to", 41),
  ('the people of', 40),
  ('in the world', 31),
  ('my fellow americans'

The bigram and trigram lists are remarkably similar across the two corpuses. Given the size of the corpuses and the speeches' shared purpose and setting, this makes sense. Regardless of party, politicians tend to employ similar speaking patterns and repeat age-old phrases (e.g. "the united states", "my fellow americans", "the american people"), which is why the trigram distributions look so alike. Democratic and Republican nominees share the same goal, to become President of the United States, and both must appeal to a broad audience (not only their own party but the entire American populace), so they rely on similar rhetorical tactics. Their differences emerge in their policy platforms and directives. Let's explore some keywords via keyness analysis to better understand them.
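One way to put a number on how similar two frequency lists are is the Jaccard overlap of their top-k items. The sketch below is an illustration under assumptions, not part of the original analysis: `top_k_overlap` is a hypothetical helper, and the two small counters are made-up stand-ins rather than the real trigram counts.

```python
from collections import Counter

def top_k_overlap(freq_a, freq_b, k=30):
    """Return the Jaccard overlap between the k most common items of two Counters."""
    top_a = {item for item, _ in freq_a.most_common(k)}
    top_b = {item for item, _ in freq_b.most_common(k)}
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0

# Made-up counts for illustration only.
a = Counter({"the united states": 5, "in the world": 3, "the republican party": 2})
b = Counter({"the united states": 4, "the american people": 3, "the democratic party": 2})

print(top_k_overlap(a, b, k=3))
```

A value near 1 means the two lists are nearly identical, while a value near 0 means they share almost nothing.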

### Keyness Analysis

Although the n-gram frequencies offered little evidence of differences between the Republican and Democratic corpuses, I hypothesize that a keyness analysis will offer more insight into the parties' differences, because keyness identifies words (and bigrams) that are unusually frequent in one corpus relative to the other rather than words that are simply frequent in both. Let's compare.
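The `calculate_keyness` function used in the next cell is defined in `functions.ipynb` and is not shown here. A common keyness statistic is the log-likelihood (G2) score, so the sketch below assumes that measure purely for illustration; the function name `log_likelihood` and the toy numbers are hypothetical, and the notebook's implementation may differ.

```python
import math

def log_likelihood(freq_a, freq_b, total_a, total_b):
    """Log-likelihood (G2) keyness of one item, given its frequency in each
    corpus and the total size of each corpus."""
    combined = freq_a + freq_b
    # Expected frequencies if the item were spread evenly by corpus size.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    g2 = 0.0
    if freq_a > 0:
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2 * g2

# Toy numbers: an item used 30 times versus 10 times in equally sized corpora,
# then an item used equally often in both.
print(round(log_likelihood(30, 10, 1000, 1000), 3))
print(round(log_likelihood(20, 20, 1000, 1000), 3))
```

The score grows as the observed frequencies move away from what corpus size alone would predict, which is why function words that are frequent in both corpuses tend to score low.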

In [12]:
calculate_keyness(word_freq_d, word_freq_r)

WORD                     Corpus A Freq.Corpus B Freq.Keyness
we've                    66        17        34.642
democratic               96        36        32.925
john                     42        8         27.935
family                   112       52        27.139
that's                   94        40        26.576
can                      368       268       24.069
—                        51        16        21.907
platform                 58        23        18.337
college                  40        12        18.052
working                  69        31        17.761
families                 87        44        17.745
go                       106       60        16.551
bridge                   27        6         16.055
kennedy                  27        6         16.055
care                     72        36        15.048
al                       26        6         15.002
afford                   24        5         14.988
you                      634       545       14.935
bil

The word-level keyness analysis offers several interesting results. In particular, it reports high keyness for words traditionally aligned with the Democratic Party and its policy aims, including **education**, **teachers**, **rights**, and **environment**.

Additionally, the more frequent use of family-related words, such as **"mother"** and **"families"**, is an interesting observation. Perhaps the Democrats are more inclined to discuss their own families and family values?


In [13]:
calculate_keyness(bigram_freq_d,  bigram_freq_r)

WORD                     Corpus A Freq.Corpus B Freq.Keyness
democratic party         49        9         33.386
the democratic           59        16        29.510
if you                   51        16        21.907
an america               32        7         19.263
can do                   44        14        18.598
we can                   113       63        18.348
i want                   83        42        16.910
by the                   112       65        16.357
you can                  37        12        15.295
we should                36        12        14.402
do it                    36        12        14.402
here tonight             26        8         11.414
for all                  70        39        11.386
health care              37        15        11.309
the old                  29        10        11.124
this country             77        45        11.031
people of                52        26        10.868
to make                  85        52        10.661
wha

As with the word-level keyness analysis, the bigram keyness analysis heavily features word pairs that have become associated with the Democratic Party and its values and that, for many, serve as the rallying cry of the party's appeal to marginalized segments of our society. These bigrams are **health care** and **middle class**.

Let us move on to KWIC analyses to delve deeper into the context surrounding the usage of these words.

### KWIC Analysis & Sentiment Scoring

The keyness analyses highlighted numerous key terms which were of interest and related to policy platforms and directives. Some of these that particularly sparked my interest include: 

    • "america"
    • "democrats"
    • "environment"
    
Additionally, although the words **"abortion"** and **"immigrants"** were not specifically isolated by the keyness analysis, both are hot-button, polarizing words in politics, so I include them in the following analyses.
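The KWIC (keyword in context) outputs below are lists of `[left context, keyword, right context]` entries with ten tokens on either side. A minimal sketch of how such lines can be produced follows; `make_kwic` is a hypothetical name chosen for illustration, and the notebook's own function lives in a separate notebook that is not shown.

```python
def make_kwic(tokens, keyword, window=10):
    """Return [left_context, keyword, right_context] for every occurrence of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]    # up to `window` tokens before
            right = tokens[i + 1:i + 1 + window]   # up to `window` tokens after
            lines.append([left, token, right])
    return lines

# Hypothetical sample sentence, already lowercased and tokenized.
tokens = "we are a nation of immigrants and we are proud of it".split()

kwic = make_kwic(tokens, "immigrants", window=3)
print(kwic)
print('Found {} instances of immigrants'.format(len(kwic)))
```

Matching on exact tokens means that variants such as "immigrant" or "america's" are not counted unless they are searched for separately.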
    

#### "america"

In [14]:
print(america_kwic_r[:15])
print('\n')
print('Found {} instances of america'.format(len(america_kwic_r)))

[[['world', 'and', 'the', 'tragedy', 'of', 'disappointment', 'and', 'europes', 'misunderstanding', 'of'], 'america', ['easily', 'might', 'have', 'been', 'avoided', 'the', 'republicans', 'of', 'the', 'senate']], [['peace', 'maintained', 'one', 'may', 'readily', 'sense', 'the', 'conscience', 'of', 'our'], 'america', ['i', 'am', 'sure', 'i', 'understand', 'the', 'purpose', 'of', 'the', 'dominant']], [['to', 'defeat', 'a', 'world', 'aspiration', 'we', 'were', 'resolved', 'to', 'safeguard'], 'america', ['we', 'were', 'resolved', 'then', 'even', 'as', 'we', 'are', 'today', 'and']], [['the', 'referendum', 'to', 'the', 'american', 'people', 'on', 'the', 'preservation', 'of'], 'america', ['and', 'the', 'republican', 'party', 'pledges', 'its', 'defense', 'of', 'the', 'preserved']], [['of', 'national', 'freedom', 'in', 'the', 'call', 'of', 'the', 'conscience', 'of'], 'america', ['is', 'peace', 'peace', 'that', 'closes', 'the', 'gaping', 'wound', 'of', 'world']], [['of', 'its', 'example', 'possess

In [15]:
print(america_kwic_d[:15])
print('\n')
print('Found {} instances of america'.format(len(america_kwic_d)))

[[['and', 'not', 'enough', 'to', 'the', 'navy', 'the', 'other', 'republics', 'of'], 'america', ['distrusted', 'us', 'because', 'they', 'found', 'that', 'we', 'thought', 'first', 'of']], [['are', 'and', 'how', 'to', 'get', 'at', 'them', 'the', 'workingmen', 'of'], 'america', ['have', 'been', 'given', 'a', 'veritable', 'emancipation', 'by', 'the', 'legal', 'recognition']], [['the', 'shame', 'of', 'divisions', 'of', 'sentiment', 'and', 'purpose', 'in', 'which'], 'america', ['was', 'contemned', 'and', 'forgotten', 'it', 'is', 'part', 'of', 'the', 'business']], [['the', 'united', 'states', 'with', 'a', 'distressed', 'and', 'distracted', 'people', 'all'], 'america', ['looks', 'on', 'test', 'is', 'now', 'being', 'made', 'of', 'us', 'whether']], [['ours', 'depends', 'every', 'relationship', 'of', 'the', 'united', 'states', 'with', 'latin'], 'america', ['whether', 'in', 'politics', 'or', 'in', 'commerce', 'and', 'enterprise', 'these', 'are']], [['issues', 'of', 'the', 'politics', 'of', 'the', '

In [16]:
sum(dem_score)/len(dem_score), sum(rep_score)/len(rep_score)

(0.929173583806612, 1.0337364394198925)

Matching my intuition, "america" appears more often in the Republican corpus (468 instances) than in the Democratic corpus (373 instances). The Republicans have typically branded themselves as the party of the **AMERICAN** people, bald eagles and guns included, so it fits their brand to use the term more often in their speeches. The gap is slight, but Republican uses of "america" also carry a mildly higher average sentiment score (1.03) than Democratic uses (0.93).
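The averages in this section come from score lists such as `dem_score` and `rep_score`, which are built in a notebook that is not shown here. As a hedged sketch of the general idea, the code below averages a per-line score over KWIC lines; `mean_kwic_score` and `toy_scorer` are hypothetical names, and the toy scorer only stands in for a real sentiment model.

```python
def mean_kwic_score(kwic_lines, scorer):
    """Average scorer(text) over KWIC lines, where text is each line's
    left context, keyword, and right context joined into one string."""
    scores = []
    for left, keyword, right in kwic_lines:
        text = " ".join(left + [keyword] + right)
        scores.append(scorer(text))
    # Guard against an empty concordance, where sum(...) / len(...) would
    # raise ZeroDivisionError.
    return sum(scores) / len(scores) if scores else None

def toy_scorer(text):
    """Hypothetical stand-in scorer: +1 per 'great', -1 per 'dangerous'."""
    words = text.split()
    return words.count("great") - words.count("dangerous")

# Made-up KWIC lines for illustration only.
lines = [
    [["a", "great"], "america", ["for", "all"]],
    [["a", "more", "dangerous"], "america", ["than", "before"]],
    [["the", "promise", "of"], "america", ["is", "great"]],
]

print(mean_kwic_score(lines, toy_scorer))
```

With NLTK's VADER, a scorer such as `lambda text: sid.polarity_scores(text)["compound"]` could be passed in instead. That compound score is bounded between -1 and 1, so the averages reported in this section, one of which exceeds 1, appear to rest on a different scale.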

#### "democrats"

In [17]:
print(democrats_kwic_r[:15])
print('\n')
print('Found {} instances of democrats'.format(len(democrats_kwic_r)))

[[['writers', 'not', 'to', 'mention', 'reformers', 'of', 'all', 'kinds', 'freesoilers', 'independent'], 'democrats', ['conscience', 'whigs', 'barnburners', 'soft', 'hunkers', 'teetotallers', 'vegetarians', 'and', 'transcendentalists', 'now']], [['infinitesimal', 'compared', 'to', 'the', 'gulf', 'between', 'us', 'and', 'what', 'the'], 'democrats', ['would', 'put', 'upon', 'us', 'from', 'what', 'they', 'did', 'in', 'los']], [['that', 'just', 'as', 'in', '1952', 'and', 'in', '1956', 'millions', 'of'], 'democrats', ['will', 'join', 'usnot', 'because', 'they', 'are', 'deserting', 'their', 'party', 'but']], [['world', 'is', 'involved', 'we', 'are', 'not', 'republicans', 'we', 'are', 'not'], 'democrats', ['we', 'are', 'americans', 'first', 'last', 'and', 'always', 'these', 'five', 'presidents']], [['convention', 'i', 'cant', 'tell', 'which', 'faces', 'are', 'republicans', 'which', 'are'], 'democrats', ['and', 'which', 'are', 'independents', 'i', 'cannot', 'see', 'their', 'color', 'or']], [['p

In [18]:
print(democrats_kwic_d[:15])
print('\n')
print('Found {} instances of democrats'.format(len(democrats_kwic_d)))

[[['action', 'must', 'be', 'weighed', 'against', 'destructive', 'comment', 'and', 'reaction', 'the'], 'democrats', ['either', 'have', 'or', 'have', 'not', 'understood', 'the', 'varied', 'interests', 'of']], [['in', 'the', 'record', 'what', 'is', 'that', 'record', 'what', 'were', 'the'], 'democrats', ['called', 'into', 'power', 'to', 'do', 'what', 'things', 'had', 'long', 'waited']], [['had', 'long', 'waited', 'to', 'be', 'done', 'and', 'how', 'did', 'the'], 'democrats', ['do', 'them', 'it', 'is', 'a', 'record', 'of', 'extraordinary', 'length', 'and']], [['here', 'and', 'now', 'in', 'equal', 'measure', 'i', 'warn', 'those', 'nominal'], 'democrats', ['who', 'squint', 'at', 'the', 'future', 'with', 'their', 'faces', 'turned', 'toward']], [['the', '18th', 'amendment', 'is', 'doomed', 'when', 'that', 'happens', 'we', 'as'], 'democrats', ['must', 'and', 'will', 'rightly', 'and', 'morally', 'enable', 'the', 'states', 'to']], [['of', 'the', 'united', 'states', 'have', 'transcended', 'party', '

In [19]:
sum(dem_score_dems)/len(dem_score_dems), sum(rep_score_dems)/len(rep_score_dems)

(0.9266304347826086, 0.15196078431372548)

"democrats" not only occurs much more frequently in the Democratic corpus, but it also appears in a much more positive context there (0.93 versus 0.15). While the Democrats use their own party's name to cast themselves in a positive light, the Republicans use it far less favorably. This aligns with the "playing politics" that presidential nominees presumably engage in at their conventions as they move past the intra-party competition of the primaries and broaden their focus to the inter-party battle.

#### "environment"

In [20]:
print(environment_kwic_r[:15])
print('\n')
print('Found {} instances of environment'.format(len(environment_kwic_r)))

[[['strive', 'to', 'cure', 'disease', 'subdue', 'and', 'make', 'fruitful', 'our', 'natural'], 'environment', ['and', 'produce', 'the', 'inventive', 'engines', 'of', 'production', 'science', 'and', 'technology']], [['knows', 'itself', 'a', 'generation', 'determined', 'to', 'preserve', 'its', 'ideals', 'its'], 'environment', ['our', 'nation', 'and', 'the', 'world', 'my', 'fellow', 'americans', 'i', 'like']], [['will', 'not', 'permit', 'the', 'safety', 'of', 'our', 'people', 'or', 'our'], 'environment', ['heritage', 'to', 'be', 'jeopardized', 'but', 'we', 'are', 'going', 'to', 'reaffirm']], [['prosperity', 'of', 'our', 'people', 'is', 'a', 'fundamental', 'part', 'of', 'our'], 'environment', ['our', 'problems', 'are', 'both', 'acute', 'and', 'chronic', 'yet', 'all', 'we']], [['by', 'race', 'and', 'color', 'has', 'made', 'america', 'a', 'more', 'dangerous'], 'environment', ['for', 'everyone', 'than', 'frankly', 'i', 'have', 'ever', 'seen', 'and', 'anybody']]]


Found 5 instances of environment

In [21]:
print(environment_kwic_d[:15])
print('\n')
print('Found {} instances of environment'.format(len(environment_kwic_d)))

[[['has', 'reconciled', 'its', 'economic', 'needs', 'with', 'its', 'desire', 'for', 'an'], 'environment', ['that', 'we', 'can', 'pass', 'on', 'with', 'pride', 'to', 'the', 'next']], [['employment', 'laws', 'safety', 'in', 'the', 'work', 'place', 'and', 'a', 'healthy'], 'environment', ['lately', 'as', 'you', 'know', 'the', 'republicans', 'have', 'been', 'quoting', 'democratic']], [['civil', 'rights', 'laws', 'you', 'did', 'not', 'vote', 'to', 'poison', 'the'], 'environment', ['you', 'did', 'not', 'vote', 'to', 'assault', 'the', 'poor', 'the', 'sick']], [['and', 'a', 'dedicated', 'staff', 'of', 'teachers', 'and', 'counselors', 'create', 'an'], 'environment', ['for', 'learning', 'at', 'the', 'george', 'washington', 'preparatory', 'high', 'school', 'in']], [['i', 'will', 'he', 'wont', 'take', 'the', 'lead', 'in', 'protecting', 'the'], 'environment', ['and', 'creating', 'new', 'jobs', 'in', 'environmental', 'technologies', 'for', 'the', '21st']], [['of', 'the', 'global', 'effort', 'to', 'pr

In [22]:
sum(dem_score_env)/len(dem_score_env), sum(rep_score_env)/len(rep_score_env)

(0.8511764705882353, 0.14666666666666658)

The Democratic nominees use the word "environment" much more frequently than the Republicans, and in a much more positive context (0.85 versus 0.15). Republican uses often refer to one's surroundings or circumstances, whereas Democrats tend to use the word for the natural, outdoor environment and its resources. This difference aligns with the Democrats' traditional image as the party of environmental awareness and protection, which leads them to use the word more frequently and more positively in an outdoor sense.

#### "abortion"

In [23]:
print(abortion_kwic_r[:15])
print('\n')
print('Found {} instances of abortion'.format(len(abortion_kwic_r)))

[[['you', 'see', 'we', 'must', 'change', 'weve', 'got', 'to', 'change', 'from'], 'abortion', ['to', 'adoption', 'and', 'let', 'me', 'tell', 'you', 'this', 'barbara', 'and']], [['opinions', 'of', 'millions', 'of', 'americans', 'is', 'crime', 'and', 'drugs', 'illegitimacy'], 'abortion', ['the', 'abdication', 'of', 'duty', 'and', 'the', 'abandonment', 'of', 'children', 'and']], [['notification', 'and', 'when', 'congress', 'sends', 'me', 'a', 'bill', 'against', 'partialbirth'], 'abortion', ['i', 'will', 'sign', 'it', 'into', 'law', 'applause', 'behind', 'every', 'goal']]]


Found 3 instances of abortion


In [24]:
print(abortion_kwic_d[:15])
print('\n')
print('Found {} instances of abortion'.format(len(abortion_kwic_d)))

[[['individual', 'conscience', 'of', 'every', 'american', 'on', 'the', 'painful', 'issue', 'of'], 'abortion', ['but', 'believe', 'as', 'a', 'matter', 'of', 'law', 'that', 'this', 'decision']], [['a', 'woman', 'her', 'conscience', 'her', 'doctor', 'and', 'her', 'god', 'but'], 'abortion', ['should', 'not', 'only', 'be', 'safe', 'and', 'legal', 'it', 'should', 'be']], [['what', 'we', 'have', 'to', 'restore', 'we', 'may', 'not', 'agree', 'on'], 'abortion', ['but', 'surely', 'we', 'can', 'agree', 'on', 'reducing', 'the', 'number', 'of']]]


Found 3 instances of abortion


In [25]:
sum(dem_score_abort)/len(dem_score_abort), sum(rep_score_abort)/len(rep_score_abort)

(0.5155555555555555, -0.32500000000000007)

The word "abortion" is an incredibly divisive one in American politics. Perhaps surprisingly, it is used very rarely in each corpus: only 3 instances each. I presume this results from the intensely polarizing nature of the word; even some die-hard Democrats remain uncomfortable endorsing the pro-choice movement despite the Democratic Party's alignment with it. Given the national audience of the nomination speeches, the speakers are likely playing politics, mentioning the word no more than necessary and focusing instead on policy platforms that can appeal to the masses. It is important to note, however, the vast difference in the contexts in which Democrats and Republicans use the word: the Democratic uses score positively (0.52) and read as progressive, while the Republican uses score negatively (-0.33) and read as prohibitive. With only three instances per corpus, these averages should be read with caution.

#### "immigrants"

In [26]:
print(immigrants_kwic_r[:15])
print('\n')
print('Found {} instances of immigrants'.format(len(immigrants_kwic_r)))

[[['that', 'burned', 'with', 'zeal', 'in', 'the', 'hearts', 'of', 'millions', 'of'], 'immigrants', ['from', 'every', 'corner', 'of', 'the', 'earth', 'who', 'came', 'here', 'in']], [['immigration', 'and', 'yet', 'when', 'the', 'blood', 'of', 'the', 'sons', 'of'], 'immigrants', ['and', 'the', 'grandsons', 'of', 'slaves', 'fell', 'on', 'foreign', 'fields', 'it']], [['moms', 'struggling', 'to', 'feed', 'the', 'kids', 'and', 'pay', 'the', 'rent'], 'immigrants', ['starting', 'a', 'hard', 'life', 'in', 'a', 'new', 'world', 'children', 'without']], [['what', 'brought', 'us', 'to', 'america', 'we', 'are', 'a', 'nation', 'of'], 'immigrants', ['we', 'are', 'the', 'children', 'and', 'grandchildren', 'and', 'greatgrandchildren', 'of', 'the']], [['special', 'kinship', 'with', 'the', 'future', 'when', 'every', 'new', 'wave', 'of'], 'immigrants', ['looked', 'up', 'and', 'saw', 'the', 'statue', 'of', 'liberty', 'or', 'knelt']], [['percent', 'compared', 'to', 'this', 'point', 'last', 'year', 'nearly', '

In [27]:
print(immigrants_kwic_d[:15])
print('\n')
print('Found {} instances of immigrants'.format(len(immigrants_kwic_d)))

[[['the', 'opportunities', 'offered', 'by', 'america', 'the', 'rugged', 'qualities', 'of', 'our'], 'immigrants', ['have', 'helped', 'to', 'develop', 'our', 'country', 'and', 'their', 'children', 'have']], [['i', 'can', 'assure', 'you', '—', 'tonight', 'as', 'a', 'son', 'of'], 'immigrants', ['with', 'a', 'wonderful', 'wife', 'and', 'now', 'with', 'lisa', 'our', 'lovely']], [['that', 'you', 'make', 'to', 'yours', 'a', 'promise', 'that', 'has', 'led'], 'immigrants', ['to', 'cross', 'oceans', 'and', 'pioneers', 'to', 'travel', 'west', 'a', 'promise']], [['more', 'than', 'are', 'welfare', 'recipients', 'or', 'corporations', 'or', 'unions', 'or'], 'immigrants', ['or', 'gays', 'or', 'any', 'other', 'group', 'were', 'told', 'to', 'blame']], [['and', 'well', 'build', 'a', 'path', 'to', 'citizenship', 'for', 'millions', 'of'], 'immigrants', ['who', 'are', 'already', 'contributing', 'to', 'our', 'economy', 'we', 'will', 'not']], [['jobs', 'i', 'believe', 'that', 'when', 'we', 'have', 'millions', 

In [28]:
sum(dem_score_imm)/len(dem_score_imm), sum(rep_score_imm)/len(rep_score_imm)

(0.9800000000000001, -0.6833333333333333)

Democrats have traditionally been the party more sympathetic to the plight of immigrants, which is consistent with their positive sentiment score (0.98). The Republican score for "immigrants" (-0.68) is strikingly low and underscores the party's perceived aversion to immigration and immigrants. Although this might seem like a recent development, the score suggests that anti-immigrant sentiment may be more deeply rooted in the party.

## Conclusion: Democrats v. Republicans

The Republican and Democratic parties often represent polar opposite views and priorities on the problems facing our nation. Initially, I expected these ideological differences to be immediately visible in the descriptive make-up of the corpuses; however, the analysis showed no broad differences of that kind. In the frequency lists, the most common words, bigrams, and trigrams are nearly identical across the two parties.

Only when we delved deeper into the corpuses with the keyness and KWIC analyses did meaningful differences come to light. Specifically, the keyness analyses isolated words that distinguish the two parties, and the subsequent KWIC analyses identified the context and sentiment with which those words are used. The Democrats' traditional policy concerns (environment, abortion, immigrants) were, as expected, discussed more positively in the Democrats' speeches, and in the case of "environment" more frequently, while the Republicans referred to their political counterparts in a far less positive context and leaned on their patriotic identity, referencing "america" more often. The KWIC analyses were a crucial tool for highlighting the finer differences between the corpuses that could not emerge in the broader analyses.

Despite this analysis's good intentions, the nomination acceptance speeches of the Democrats and the Republicans turn out to differ less than expected. This may be a result of the speeches' intended audience, which is essentially the entire United States: speakers likely tailor their remarks to a broad audience with diverse interests. To move past this barrier and better understand party differences, a further analysis might use more targeted speeches in order to reveal politicians' actual directives and motivations rather than those designed to gratify a national audience.