# U.S.A. Presidential Vocabulary

Whenever a United States of America president is elected or re-elected, an inauguration ceremony takes place to mark the beginning of the president’s term. During the ceremony, the president gives an [inaugural address](https://en.wikipedia.org/wiki/United_States_presidential_inauguration) to the nation, dictating the tone and focus of the next four years of leadership.

In this project you will have the chance to analyze the inaugural addresses of the presidents of the United States of America, as collected by the [Natural Language Toolkit](https://www.nltk.org/book/ch02.html), using word embeddings.

By training sets of word embeddings on subsets of inaugural address versus the collection of presidents as a whole, we can learn about the different ways in which the presidents use language to convey their agenda.

Let’s get started!




Theme:

Create word embeddings on the corpus of all the presidents’ speeches, we need to read the text data from each file, separate the files into sentences on a word by word basis, and then merge all the sentences across the speeches into one big list of lists.

process_speeches() function takes a list of strings as an argument and returns a list of lists. Each inner list represents one inaugural address and is a list of lists as well. Each inner list of the inaugural address list represents a sentence of that address, and each item in the sentence list is a word token in that sentence (see the hint for further explanation of this structure).

In order to build a custom set of word embeddings using gensim, we need to convert our data into a list of lists, where each inner list is a sentence and each item in the inner list is a word token. So we use merge_speeches

merge_speeches() takes a list of all our processed speeches and returns a list of lists where each inner list is a sentence and each item in the inner list is a word token.

In [2]:
# President Speech Analytics

import os
import gensim
import spacy
from president_helper import read_file, process_speeches, merge_speeches, get_president_sentences, get_presidents_sentences, most_frequent_words

# get list of all speech files
files = sorted([file for file in os.listdir() if file[-4:] == '.txt'])
print(files)


# read each speech file using list comprehension
speeches = [read_file(file) for file in files]

# preprocess each speech
processed_speeches = process_speeches(speeches)

# merge speeches
all_sentences = [merge_speeches(speech) for speech in processed_speeches]

# view most frequently used words
most_freq_words = most_frequent_words(all_sentences)
print(most_freq_words)

# create gensim model of all speeches
all_prez_embeddings = gensim.models.Word2Vec(all_sentences, size=96, window=5, min_count=1, workers=2, sg=1)


# view words similar to freedom
similar_to_freedom = all_prez_embeddings.most_similar("freedom", topn=20)
print("Total Similar to Freedom is\n", similar_to_freedom)

# get President Roosevelt sentences
roosevelt_sentences = get_president_sentences("franklin-d-roosevelt")

# view most frequently used words of Roosevelt
roosevelt_most_freq_words = most_frequent_words(roosevelt_sentences)
print(roosevelt_most_freq_words)

# create gensim model for Roosevelt
roosevelt_embeddings = gensim.models.Word2Vec(roosevelt_sentences, size=96, window=5, min_count=1, workers=2, sg=1)

# view words similar to freedom for Roosevelt
roosevelt_similar_to_freedom = roosevelt_embeddings.most_similar("freedom", topn=20)
print("roosevelt_similar_to_freedom is\n", roosevelt_similar_to_freedom)

# get sentences of multiple presidents
rushmore_prez_sentences = get_presidents_sentences(["washington","jefferson","lincoln","theodore-roosevelt"])

# view most frequently used words of presidents
rushmore_most_freq_words = most_frequent_words(rushmore_prez_sentences)
print(rushmore_most_freq_words)

# create gensim model for the presidents
rushmore_embeddings = gensim.models.Word2Vec(rushmore_prez_sentences, size=96, window=5, min_count=1, workers=2, sg=1)

# view words similar to freedom for presidents
rushmore_similar_to_freedom = rushmore_embeddings.wv.most_similar("freedom", topn=20)
print("rushmore_similar_to_freedom is\n", rushmore_similar_to_freedom)


['1789-Washington.txt', '1793-Washington.txt', '1797-John-Adams.txt', '1801-Jefferson.txt', '1805-Jefferson.txt', '1809-Madison.txt', '1813-Madison.txt', '1817-Monroe.txt', '1821-Monroe.txt', '1825-John-Q-Adams.txt', '1829-Jackson.txt', '1833-Jackson.txt', '1837-VanBuren.txt', '1841-William-Harrison.txt', '1845-Polk.txt', '1849-Taylor.txt', '1853-Pierce.txt', '1857-Buchanan.txt', '1861-Lincoln.txt', '1865-Lincoln.txt', '1869-Grant.txt', '1873-Grant.txt', '1877-Hayes.txt', '1881-Garfield.txt', '1885-Cleveland.txt', '1889-Benjamin-Harrison.txt', '1893-Cleveland.txt', '1897-McKinley.txt', '1901-McKinley.txt', '1905-Theodore-Roosevelt.txt', '1909-Taft.txt', '1913-Wilson.txt', '1917-Wilson.txt', '1921-Harding.txt', '1925-Coolidge.txt', '1929-Hoover.txt', '1933-Franklin-D-Roosevelt.txt', '1937-Franklin-D-Roosevelt.txt', '1941-Franklin-D-Roosevelt.txt', '1945-Franklin-D-Roosevelt.txt', '1949-Truman.txt', '1953-Eisenhower.txt', '1957-Eisenhower.txt', '1961-Kennedy.txt', '1965-Lyndon-Johnson.tx



Total Similar to Freedom is
 [('purposes', 0.9826847314834595), ('influence', 0.9814714193344116), ('institutions', 0.9814691543579102), ('matters', 0.9782785177230835), ('human', 0.9779976606369019), ('south', 0.9779378175735474), ('respect', 0.9775665998458862), ('dignity', 0.9771591424942017), ('officers', 0.9770955443382263), ('equality', 0.977080225944519), ('welfare', 0.9768416881561279), ('individual', 0.9767642617225647), ('domestic', 0.9765135049819946), ('defense', 0.9762694239616394), ('independence', 0.9762485027313232), ('benefits', 0.9762150049209595), ('peoples', 0.9761053323745728), ('maintenance', 0.9758332967758179), ('order', 0.9756816625595093), ('faith', 0.9753823280334473)]
[('the', 375), ('of', 321), ('and', 179), ('to', 158), ('we', 131), ('a', 121), ('in', 119), ('that', 102), ('our', 90), ('it', 71), ('is', 67), ('have', 56), ('for', 47), ('be', 41), ('i', 40), ('this', 40), ('not', 40), ('by', 38), ('will', 35), ('as', 33), ('all', 33), ('are', 32), ('which',



roosevelt_similar_to_freedom is
 [('is', 0.9989413619041443), ('that', 0.998931884765625), ('and', 0.9989100694656372), ('the', 0.9989092946052551), ('in', 0.9988806843757629), ('to', 0.9988800287246704), ('will', 0.9988728761672974), ('with', 0.9988665580749512), ('an', 0.9988554120063782), ('a', 0.9988448619842529), ('on', 0.9988362789154053), ('all', 0.9988216757774353), ('by', 0.998821496963501), ('of', 0.9988157749176025), ('must', 0.9988040924072266), ('they', 0.99880051612854), ('for', 0.9987898468971252), ('shall', 0.9987871646881104), ('we', 0.9987812042236328), ('who', 0.9987763166427612)]
[('the', 779), ('of', 500), ('and', 391), ('to', 385), ('in', 202), ('that', 163), ('be', 155), ('a', 138), ('which', 128), ('it', 124), ('by', 115), ('i', 113), ('is', 109), ('with', 99), ('as', 87), ('all', 85), ('our', 85), ('have', 84), ('not', 84), ('we', 72), ('this', 70), ('for', 68), ('will', 67), ('on', 59), ('no', 57), ('or', 57), ('from', 56), ('their', 55), ('but', 53), ('them',

rushmore_similar_to_freedom is
 [('first', 0.9995087385177612), ('is', 0.9994827508926392), ('will', 0.9994697570800781), ('without', 0.9994688630104065), ('than', 0.9994671940803528), ('such', 0.999462902545929), ('most', 0.9994590282440186), ('he', 0.9994571805000305), ('his', 0.9994518756866455), ('they', 0.9994438290596008), ('law', 0.9994420409202576), ('administration', 0.9994399547576904), ('some', 0.9994398355484009), ('other', 0.9994380474090576), ('should', 0.9994351863861084), ('that', 0.9994345307350159), ('under', 0.9994276762008667), ('with', 0.99942547082901), ('shall', 0.9994221925735474), ('war', 0.9994159936904907)]
