# Analysis of US Presidential Inaugural Speeches

<img src = "https://upload.wikimedia.org/wikipedia/commons/8/83/Barack_Obama%27s_2013_inaugural_address_at_the_U.S._Capitol.jpg">

The inauguration of the president of the United States is a ceremony to mark the commencement of a new four-year term of the president of the United States. This ceremony takes place for each new presidential term, even if the president is continuing in office for another term.

# Importing Libraries

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.style as style
import re
import string
import itertools
import collections
from bs4 import BeautifulSoup
from wordcloud import WordCloud
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, RegexpTokenizer
style.use(['fivethirtyeight'])

# Importing Dataset

In [None]:
speech_DF = pd.read_csv('../input/presidentialaddress/inaug_speeches.csv', encoding= 'latin1')

In [None]:
speech_DF.head()

In [None]:
speech_DF = speech_DF.drop(columns = 'Unnamed: 0')

In [None]:
speech_DF.head()

In [None]:
speech_DF.isnull().sum()

There are no missing values in the dataset.

# Inaugural Addresses

In [None]:
plt.figure(figsize = (7,7))
sns.countplot(data = speech_DF, x = 'Inaugural Address')
plt.xticks(rotation = 90)
plt.show()

In [None]:
speech_DF['Inaugural Address'].value_counts()

In [None]:
print("Total Number of Presidential Inaugural Addresses in US History: ", speech_DF.shape[0])

In [None]:
print("US Presidents Who Have Given Inaugural Speeches:\n\n", speech_DF['Name'].unique())

In [None]:
print("Total Number of US Presidents Who Have Given Inaugural Speeches: ", speech_DF['Name'].unique().size)

# Presidents With No Inaugural Addresses

As of July 2020, there have been 45 presidencies (held by 44 individuals, since Grover Cleveland served two non-consecutive terms) and 58 inaugural addresses. However, only 39 presidents have delivered an inaugural address: five vice presidents ascended to the presidency after the death or resignation of their predecessor and were never elected to a term of their own, so the following presidents do not appear in this dataset:

* <img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/John_Tyler_%28cropped_3x4%29.png/800px-John_Tyler_%28cropped_3x4%29.png" height = 160px width = 160px> **John Tyler**: He succeeded William Henry Harrison, the 9th president of the United States, who passed away 31 days into his term due to pneumonia. Harrison became the first president to die in office and had the shortest tenure.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/7/7e/Millard_Fillmore.jpg/220px-Millard_Fillmore.jpg" height = 160px width = 160px> **Millard Fillmore**: He was the 13th president of the United States, replacing Zachary Taylor who died while in office due to stomach disease on July 9, 1850.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Andrew_Johnson_photo_portrait_head_and_shoulders%2C_c1870-1880-Edit1.jpg/220px-Andrew_Johnson_photo_portrait_head_and_shoulders%2C_c1870-1880-Edit1.jpg"  height = 160px width = 160px> **Andrew Johnson**: He was the 17th president of the United States. He replaced Abraham Lincoln, who was assassinated by John Wilkes Booth while attending a play at Ford's Theatre.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/7/79/Chester_Alan_Arthur.jpg/220px-Chester_Alan_Arthur.jpg"  height = 160px width = 160px> **Chester A. Arthur**: He was the 21st president of the United States. He replaced James A. Garfield, who was shot by Charles J. Guiteau on July 2, 1881 and died 79 days later.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/3/36/Gerald_Ford_presidential_portrait_%28cropped%29.jpg/220px-Gerald_Ford_presidential_portrait_%28cropped%29.jpg"  height = 160px width = 160px> **Gerald Ford**: He was the 38th president of the United States. He replaced Richard M. Nixon, who resigned from office on August 9, 1974 due to his involvement in the Watergate scandal. Ford's 895-day presidency is the shortest in US history for any president who did not die in office. He is the only person to have served as both vice president and president without being elected to either office by the Electoral College.

# Presidents With One Inaugural Address

In [None]:
speech_DF.Name[speech_DF['Inaugural Address'] == 'Inaugural Address']

## The following presidents were elected into office but did not complete their term:

* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/c/c5/William_Henry_Harrison_daguerreotype_edit.jpg/220px-William_Henry_Harrison_daguerreotype_edit.jpg" height = 160px width = 160px> **William Henry Harrison**: Harrison, the 9th president of the United States, died 31 days into his presidency due to pneumonia.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/5/51/Zachary_Taylor_restored_and_cropped.jpg/220px-Zachary_Taylor_restored_and_cropped.jpg" height = 160px width = 160px> **Zachary Taylor**: He was the 12th president of the United States. He died 1 year, 4 months and 5 days into his term due to a stomach disease on July 9, 1850.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/James_Abram_Garfield%2C_photo_portrait_seated.jpg/220px-James_Abram_Garfield%2C_photo_portrait_seated.jpg" height = 160px width = 160px> **James A. Garfield**: He was the 20th president of the United States. He was shot on July 2, 1881 and died 79 days later, 6 months and 15 days into his term.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/Warren_G_Harding-Harris_%26_Ewing.jpg/220px-Warren_G_Harding-Harris_%26_Ewing.jpg" height = 160px width = 160px> **Warren G. Harding**: He was the 29th president of the United States. He died 2 years, 4 months and 29 days into his term due to a heart attack.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/John_F._Kennedy%2C_White_House_color_photo_portrait.jpg/220px-John_F._Kennedy%2C_White_House_color_photo_portrait.jpg" height = 160px width = 160px> **John F. Kennedy**: He was the 35th president of the United States. He was assassinated on November 22, 1963, 2 years, 10 months and 2 days into his term.

## The following presidents served a partial term and were then elected to a full term:

* <img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1c/President_Roosevelt_-_Pach_Bros.jpg/220px-President_Roosevelt_-_Pach_Bros.jpg" height = 160px width = 160px> **Theodore Roosevelt**: He was the 26th president of the United States and served a partial term of 3 years, 5 months and 18 days followed by a full term. He succeeded William McKinley, who was shot by Leon Czolgosz on September 6, 1901 and died eight days later.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a3/Calvin_Coolidge_cph.3g10777_%28cropped%29.jpg/220px-Calvin_Coolidge_cph.3g10777_%28cropped%29.jpg" height = 160px width = 160px> **Calvin Coolidge**: He was the 30th president of the United States and served a partial term of 1 year, 7 months and 2 days followed by a full term. He succeeded Warren G. Harding, who died of a sudden heart attack.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/TRUMAN_58-766-06_%28cropped%29.jpg/220px-TRUMAN_58-766-06_%28cropped%29.jpg" height = 160px width = 160px> **Harry S. Truman**: He was the 33rd president of the United States who served a partial term of 3 years, 9 months and 8 days, after the death of the previous president Franklin D. Roosevelt, and continued on for a full term.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/37_Lyndon_Johnson_3x4.jpg/220px-37_Lyndon_Johnson_3x4.jpg" height = 160px width = 160px> **Lyndon Baines Johnson**: He was the 36th president of the United States and served a partial term of 1 year, 1 month and 29 days after the assassination of the previous president, John F. Kennedy, on November 22, 1963. He continued on for another term.

# Presidents With Two (Or More) Inaugural Addresses

In [None]:
speech_DF.Name[speech_DF['Inaugural Address'] == 'Second Inaugural Address']

<img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Grover_Cleveland_-_NARA_-_518139_%28cropped%29.jpg/220px-Grover_Cleveland_-_NARA_-_518139_%28cropped%29.jpg" height = 300px width = 250px>

Out of all these presidents, Grover Cleveland (picture above) is the only one to have served 2 non-consecutive terms (1885–1889 and 1893–1897).

## The following presidents did not complete their second term:

* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/Abraham_Lincoln_O-77_matte_collodion_print.jpg/220px-Abraham_Lincoln_O-77_matte_collodion_print.jpg" height = 160px width = 160px> **Abraham Lincoln**: He was the 16th president of the United States who was shot and died 1 month and 11 days into his second term.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Mckinley.jpg/220px-Mckinley.jpg" height = 160px width = 160px> **William McKinley**: He was the 25th president of the United States who was shot and died 6 months and 10 days into his second term.


* <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Richard_Nixon_presidential_portrait.jpg/220px-Richard_Nixon_presidential_portrait.jpg" height = 160px width = 160px> **Richard M. Nixon**: He was the 37th president of the United States who resigned 1 year, 6 months, and 20 days into his second term, due to his involvement in the Watergate scandal.

In [None]:
speech_DF.Name[speech_DF['Inaugural Address'] == 'Third Inaugural Address']

<img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/42/FDR_1944_Color_Portrait.jpg/800px-FDR_1944_Color_Portrait.jpg" height = 300px width = 250px>

Franklin D. Roosevelt (picture above) is the only US president to date to have served more than 2 terms. He passed away less than three months into his fourth term in office due to his ailing health. No president served more than 2 terms before him, and none has since: the 22nd Amendment to the United States Constitution, ratified after his death, states that no individual can be elected president more than twice.

# Speech Length (With Stop Words)

Stop words usually refer to the most common words in a language, such as *the, a, an, at, is, are, on, in, I, we,* and so on.
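
As a small illustration of the idea, the sketch below filters a sentence against a hand-picked stop-word set. The set `toy_stop_words` and the sample sentence are made up for this example; the notebook itself relies on NLTK's English list further down.

```python
# Hypothetical, hand-picked stop words for illustration only
toy_stop_words = {"the", "a", "an", "at", "is", "are", "on", "in", "i", "we"}

sentence = "We are gathered at the Capitol on a cold morning"

# Lower-case each word before comparing it against the stop-word set
content_words = [w for w in sentence.split() if w.lower() not in toy_stop_words]

print(content_words)
```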

In [None]:
speech_DF_1 = speech_DF.copy()

## Counting the Number of Words in Each Speech

In [None]:
speech_DF_1["Speech Length"] = speech_DF_1["text"].apply(lambda w : len(re.findall(r'\w+', w)))
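
It may help to see what the `\w+` pattern actually counts as a word. The sample sentence below is invented for illustration; the point is that apostrophes and hyphens split a word into several matches, so the counts in this notebook are approximate.

```python
import re

sample = "We don't rest; self-government endures."

# \w+ matches runs of letters, digits and underscores, so apostrophes
# and hyphens break a word into separate tokens
tokens = re.findall(r'\w+', sample)

print(tokens)
print(len(tokens))
```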

## Extracting Year from the Date

In [None]:
speech_DF_1["Speech Year"] = pd.DatetimeIndex(speech_DF_1["Date"]).year

In [None]:
speech_DF_1.head()

In [None]:
plt.figure(figsize = (10,4))
sns.boxplot(data = speech_DF_1, x = "Speech Length", color = '#d43131')
plt.xlabel("Number of Words")
plt.show()

In [None]:
speech_DF_1['Speech Length'].describe()

An average inaugural speech has around 2362 words, which can run for around 18 minutes (assuming a speech delivery rate of 130 words per minute). The shortest speech is 135 words, which can be clocked at 1 minute. The longest speech is 8507 words, which can go on for 1 hour and 5 minutes.
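
The durations quoted above follow from dividing the word count by the assumed delivery rate. A minimal sketch of that arithmetic is shown here; `speaking_time` is a hypothetical helper name, and rounding to the nearest minute is an assumption.

```python
WORDS_PER_MINUTE = 130  # delivery rate assumed in the text above

def speaking_time(word_count, rate=WORDS_PER_MINUTE):
    """Return the estimated delivery time as (hours, minutes), rounded to the nearest minute."""
    total_minutes = round(word_count / rate)
    return divmod(total_minutes, 60)

# Mean, shortest and longest speech lengths quoted in the text
for words in (2362, 135, 8507):
    hours, minutes = speaking_time(words)
    print(f"{words} words -> {hours} h {minutes} min")
```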

## Speech Length Over The Years

In [None]:
plt.figure(figsize = (13, 7))
sns.lineplot(data = speech_DF_1, x = "Speech Year", y = "Speech Length")
plt.ylabel("Number of Words")
plt.ylim(0,9000)
plt.title("Length of Inaugural Addresses Over The Years")
plt.show()

There has been a significant reduction in the length of inaugural addresses over the years, with no speech exceeding 3000 words since 1950.

## Longest Speech

In [None]:
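# A Series has only one axis, so idxmax() is called without axis=1;
# with the default RangeIndex, the returned label also works as an iloc position
print("Longest Speech:\n\n", speech_DF_1.iloc[speech_DF_1["Speech Length"].idxmax(), [0,1,2,4]])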

<img src = "https://upload.wikimedia.org/wikipedia/commons/c/c5/William_Henry_Harrison_daguerreotype_edit.jpg" height = 300px width = 250px>

The 9th president of the United States, William Henry Harrison, has the longest US inaugural speech of all time, adding up to 8507 words including stop words. The length of the speech, combined with the cold temperature that day and his decision not to wear winter clothes, is popularly believed to have led to him contracting pneumonia, which led to his demise 31 days into his presidency.

## Shortest Speech

In [None]:
# As above, idxmin() is called without axis=1 because a Series has a single axis
print("Shortest Speech:\n\n", speech_DF_1.iloc[speech_DF_1["Speech Length"].idxmin(), [0,1,2,4]])

<img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/George_Washington_1795.jpg/800px-George_Washington_1795.jpg" height = 300px width = 250px>

The first president of the United States, George Washington, gave the shortest ever speech of 135 words at his 2nd inaugural address.

# Text Cleaning

In [None]:
speech_DF_clean = speech_DF_1.copy()
speech_DF_clean = speech_DF_clean.drop(columns = "Speech Length")

## Converting Each Speech to String

In [None]:
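# Cast the whole column to str; reassigning a loop variable would leave the DataFrame unchanged
speech_DF_clean['text'] = speech_DF_clean['text'].astype(str)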

## Removal of Unicode Characters

In [None]:
def remove_u(s):
    # Replace anything enclosed in angle brackets (escaped code points or tags) with a space
    no_u = re.sub(r'<.*?>', ' ', s)
    return no_u

speech_DF_clean['text'] = speech_DF_clean['text'].apply(remove_u)

## Removal of Punctuation

In [None]:
def remove_punc(s):
    no_punc = "".join([i for i in s if i not in string.punctuation])
    return no_punc

speech_DF_clean['text'] = speech_DF_clean['text'].apply(lambda x: remove_punc(x))
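
One side effect of dropping punctuation outright is that hyphenated words are fused into a single token. The sketch below contrasts that approach with an alternative that maps punctuation to spaces using `str.translate`; the sample sentence is invented, and the alternative is offered as an option rather than as what this notebook does.

```python
import string

sample = "Fellow-citizens, we meet to-day; let us begin!"

# Approach used above: drop punctuation characters entirely
dropped = "".join(ch for ch in sample if ch not in string.punctuation)

# Alternative: map every punctuation character to a space,
# then collapse the resulting runs of whitespace
to_space = str.maketrans(string.punctuation, " " * len(string.punctuation))
spaced = " ".join(sample.translate(to_space).split())

print(dropped)
print(spaced)
```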

## Removal of Extra Spaces

In [None]:
def remove_space(s):
    soup = BeautifulSoup(s, 'lxml')
    no_space = soup.get_text(strip = True)
    no_space = no_space.replace(u'\xa0', u'')
    return no_space

speech_DF_clean['text'] = speech_DF_clean['text'].apply(lambda x: remove_space(x))
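
For comparison, whitespace can also be normalised with a regular expression alone. This is a minimal sketch under the assumption that the goal is single spaces between words; `collapse_spaces` is a hypothetical helper and is not part of the pipeline above.

```python
import re

def collapse_spaces(s):
    # Treat non-breaking spaces as ordinary spaces so adjacent words stay separate
    s = s.replace('\xa0', ' ')
    # Replace any run of whitespace (spaces, tabs, newlines) with one space
    return re.sub(r'\s+', ' ', s).strip()

print(collapse_spaces("  Fellow   citizens,\n\tI\xa0am here.  "))
```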

## Tokenizing and Conversion to Lower Case

In [None]:
tokenizer = RegexpTokenizer(r'\w+')
speech_DF_clean['text'] = speech_DF_clean['text'].apply(lambda x: tokenizer.tokenize(x.lower()))

## Removal of Stop Words

In [None]:
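# Build the stop-word set once; calling stopwords.words() for every token is very slow
stop_words = set(stopwords.words('english'))

def remove_stop_words(s):
    no_stop = [i for i in s if i not in stop_words]
    return no_stop

speech_DF_clean['text'] = speech_DF_clean['text'].apply(remove_stop_words)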

In [None]:
def joining(s):
    # Join the remaining tokens back into a single space-separated string
    joined_words = " ".join(s)
    return joined_words

speech_DF_clean['text'] = speech_DF_clean['text'].apply(joining)

After all the alterations made above, the following is an example of how the speech text looks.

In [None]:
speech_DF_clean.text[1]

# Wordcloud

In [None]:
speeches = pd.Series(speech_DF_clean['text'].tolist()).astype(str)
plt.figure(figsize = (9, 9))
# Join with a space so the last word of one speech is not fused with the first word of the next
wcloud_all = WordCloud(width = 900, height = 900, colormap = 'magma', max_words = 150).generate(' '.join(speeches))
plt.imshow(wcloud_all)
plt.tight_layout(pad = 0.2)
plt.axis('off')
plt.show()

In [None]:
speech_DF_token = speech_DF_clean.copy()

In [None]:
speech_DF_token['text'] = speech_DF_token['text'].apply(lambda x: tokenizer.tokenize(x.lower()))

In [None]:
speech_DF_token.head()

In [None]:
speech_list = list(itertools.chain.from_iterable(speech_DF_token['text']))

In [None]:
word_freq = collections.Counter(speech_list)

In [None]:
word_freq_DF = pd.DataFrame(word_freq.most_common(15), columns=['Words', 'Count'])

word_freq_DF.head(15)
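
The flatten-then-count steps above can be seen on a tiny scale with made-up data. In this sketch `toy_speeches` is a hypothetical stand-in for the tokenized `text` column.

```python
import collections
import itertools

# Hypothetical token lists standing in for three cleaned speeches
toy_speeches = [
    ["people", "government", "nation"],
    ["nation", "peace", "people"],
    ["people", "world"],
]

# Flatten the per-speech lists into one sequence of tokens
all_tokens = list(itertools.chain.from_iterable(toy_speeches))

# Counter tallies occurrences; most_common(n) returns the n highest counts
toy_freq = collections.Counter(all_tokens)
print(toy_freq.most_common(2))
```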

In [None]:
plt.figure(figsize = (12, 7))
sns.barplot(data = word_freq_DF, x = "Words", y = "Count")
plt.ylabel("Frequency")
plt.ylim(0,600)
plt.xticks(rotation = 90)
plt.title("15 Most Frequent Words in Inaugural Speeches Overall")
plt.show()

# Inaugural Speeches - 18th Century

In [None]:
speech_DF_token_1700 = speech_DF_token[speech_DF_token['Speech Year'] <= 1799]
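
The century sections in this notebook repeat the same filter with different year bounds. One way to express the bucketing rule is a small helper. `century_of` is a hypothetical name, and it follows this notebook's convention that the years 1700 to 1799 count as the 18th century.

```python
def century_of(year):
    """Map a year to its century number under the convention 1700-1799 -> 18."""
    return year // 100 + 1

# Example years chosen for illustration
for year in (1789, 1800, 1999, 2017):
    print(year, "->", century_of(year))
```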

In [None]:
print("Number of Inaugural Addresses in the 18th Century: ", speech_DF_token_1700.shape[0])
print("Number of Presidents in the 18th Century: ", speech_DF_token_1700['Name'].unique().size)

In [None]:
speech_list_1700 = list(itertools.chain.from_iterable(speech_DF_token_1700['text']))

In [None]:
word_freq_1700 = collections.Counter(speech_list_1700)
word_freq_DF_1700 = pd.DataFrame(word_freq_1700.most_common(10), columns=['Words', 'Count'])
word_freq_DF_1700.head(10)

In [None]:
plt.figure(figsize = (12, 7))
sns.barplot(data = word_freq_DF_1700, x = "Words", y = "Count")
plt.ylabel("Frequency")
plt.xticks(rotation = 90)
plt.title("10 Most Frequent Words in Inaugural Speeches in the 18th Century")
plt.show()

# Inaugural Speeches - 19th Century

In [None]:
# between() includes both endpoints by default; newer pandas versions reject inclusive=True
speech_DF_token_1800 = speech_DF_token[speech_DF_token['Speech Year'].between(1800, 1899)]

In [None]:
print("Number of Inaugural Addresses in the 19th Century: ", speech_DF_token_1800.shape[0])
print("Number of Presidents in the 19th Century: ", speech_DF_token_1800['Name'].unique().size)

In [None]:
speech_list_1800 = list(itertools.chain.from_iterable(speech_DF_token_1800['text']))

In [None]:
word_freq_1800 = collections.Counter(speech_list_1800)
word_freq_DF_1800 = pd.DataFrame(word_freq_1800.most_common(10), columns=['Words', 'Count'])
word_freq_DF_1800.head(10)

In [None]:
plt.figure(figsize = (12, 7))
sns.barplot(data = word_freq_DF_1800, x = "Words", y = "Count")
plt.ylabel("Frequency")
plt.xticks(rotation = 90)
plt.title("10 Most Frequent Words in Inaugural Speeches in the 19th Century")
plt.show()

# Inaugural Speeches - 20th Century

In [None]:
# between() includes both endpoints by default; newer pandas versions reject inclusive=True
speech_DF_token_1900 = speech_DF_token[speech_DF_token['Speech Year'].between(1900, 1999)]

In [None]:
print("Number of Inaugural Addresses in the 20th Century: ", speech_DF_token_1900.shape[0])
print("Number of Presidents in the 20th Century: ", speech_DF_token_1900['Name'].unique().size)

In [None]:
speech_list_1900 = list(itertools.chain.from_iterable(speech_DF_token_1900['text']))

In [None]:
word_freq_1900 = collections.Counter(speech_list_1900)
word_freq_DF_1900 = pd.DataFrame(word_freq_1900.most_common(10), columns=['Words', 'Count'])
word_freq_DF_1900.head(10)

In [None]:
plt.figure(figsize = (12, 7))
sns.barplot(data = word_freq_DF_1900, x = "Words", y = "Count")
plt.ylabel("Frequency")
plt.xticks(rotation = 90)
plt.title("10 Most Frequent Words in Inaugural Speeches in the 20th Century")
plt.show()

# Inaugural Speeches - 21st Century

In [None]:
speech_DF_token_2000 = speech_DF_token[speech_DF_token['Speech Year'] >= 2000]

In [None]:
print("Number of Inaugural Addresses in the 21st Century: ", speech_DF_token_2000.shape[0])
print("Number of Presidents in the 21st Century: ", speech_DF_token_2000['Name'].unique().size)

In [None]:
speech_list_2000 = list(itertools.chain.from_iterable(speech_DF_token_2000['text']))

In [None]:
word_freq_2000 = collections.Counter(speech_list_2000)
word_freq_DF_2000 = pd.DataFrame(word_freq_2000.most_common(10), columns=['Words', 'Count'])
word_freq_DF_2000.head(10)

In [None]:
plt.figure(figsize = (12, 7))
sns.barplot(data = word_freq_DF_2000, x = "Words", y = "Count")
plt.ylabel("Frequency")
plt.xticks(rotation = 90)
plt.title("10 Most Frequent Words in Inaugural Speeches in the 21st Century")
plt.show()

# Conclusion From Inaugural Speeches

* The word 'government' is not mentioned as frequently in the 2000s as it used to be in previous centuries.
* The word 'peace' is mentioned most frequently in the 20th century. One possible explanation is the growing involvement of the United States in international conflicts such as World Wars I and II, the Korean War, the Laotian Civil War and the Vietnam War, along with internal conflicts and rising tensions with the Soviet Union from the late 1940s to the early 1990s. Incumbent presidents may therefore have repeated the word as a promise to avoid further conflict.
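
The comparisons above rest on raw counts, and each century contains a different number of speeches, so a rate per 1,000 words may be a fairer basis. The sketch below shows the idea on invented token lists; `per_thousand` and the toy data are hypothetical and are not results from this dataset.

```python
import collections

def per_thousand(tokens, word):
    """Occurrences of `word` per 1,000 tokens."""
    counts = collections.Counter(tokens)
    return 1000 * counts[word] / len(tokens)

# Hypothetical token lists standing in for two centuries of cleaned speeches
toy_1800s = ["government"] * 30 + ["people"] * 970
toy_2000s = ["government"] * 6 + ["people"] * 194

# Raw counts differ (30 vs 6) even though the relative rate is identical
print(per_thousand(toy_1800s, "government"))
print(per_thousand(toy_2000s, "government"))
```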

# Future Developments

* Stemming and Lemmatization
* Sentiment Analysis