**Analysis of Presidential speech and election data**

This notebook scrapes [The American Presidency Project](http://www.presidency.ucsb.edu) and downloads the campaign speeches of all 2016 presidential candidates. It then builds a Markov chain from each candidate's speeches that can generate sentences in the style of those speeches.
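
To make the idea concrete, here is a minimal standard-library sketch of a word-level Markov chain. It is not how `markovify` is implemented internally. It only illustrates the technique: record which word follows each run of `state_size` words, then walk those transitions at random. The names `BEGIN`, `END`, `build_chain` and `generate` are hypothetical and exist only for this illustration.

```python
import random
from collections import defaultdict

# Hypothetical sentinel tokens marking sentence boundaries (illustration only).
BEGIN = "<s>"
END = "</s>"


def build_chain(sentences, state_size=2):
    """Map each tuple of `state_size` consecutive words to the words seen after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        padded = [BEGIN] * state_size + words + [END]
        for i in range(len(padded) - state_size):
            state = tuple(padded[i:i + state_size])
            chain[state].append(padded[i + state_size])
    return dict(chain)


def generate(chain, state_size=2, rng=None, max_words=50):
    """Walk the chain from the start state until END or max_words is reached."""
    rng = rng or random.Random()
    state = (BEGIN,) * state_size
    out = []
    while len(out) < max_words:
        nxt = rng.choice(chain[state])  # repeated followers are proportionally more likely
        if nxt == END:
            break
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)


toy_corpus = ["we love America", "we love freedom", "I love America"]
toy_chain = build_chain(toy_corpus)
print(generate(toy_chain, rng=random.Random(0)))
```

With a corpus this small the chain can only reproduce its training sentences. Novel sentences appear once many sentences share the same word pairs, which is the case for a full file of campaign speeches.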

In [1]:
import pandas as pd
import numpy as np
import requests
from lxml import html
from bs4 import BeautifulSoup
import markovify
import os.path

In [2]:

def getCandidateSpeechLinks(url):
    """Return a dict mapping each candidate's name to the href of their campaign speech page."""
    allCandidatePage = requests.get(url)
    allCandidatePageSoup = BeautifulSoup(allCandidatePage.text, 'lxml')
    links={}
    table = allCandidatePageSoup.find('table', width=680)
    for area in table.findAll('td', class_='doctext'):
        for a in area.findAll('a'):
            if ('campaign' in a.text.lower()):
                links[area.find('span', class_='roman').text] = a['href']
    return links

def scrapeCampaignSpeechesToFile(url, path):
    """Download every non-interview speech linked from url and write the speech texts to path."""
    allSpeechPages = requests.get(url)
    allSpeechSoup=BeautifulSoup(allSpeechPages.text, 'lxml')
    root = 'http://www.presidency.ucsb.edu/'
    table = allSpeechSoup.find('table', width=700)
    links = []
    for link in table.findAll('a'):
        if('interview' not in link.text.lower()):
            links.append(root+(link['href'])[3:])

    # requests.get takes query params as its second argument, so the parser name does not belong here
    speechPages = [requests.get(link) for link in links]
    speechesSoup = [BeautifulSoup(speechPage.text, 'lxml') for speechPage in speechPages]

    # Write one speech after another, each followed by a newline
    with open(path, "w", encoding='utf-8') as outFile:
        for speech in speechesSoup:
            outFile.write(speech.find('span', class_='displaytext').text + '\n')

def trainMarkov(path):

    # Get raw text as string.
    with open(path, encoding='utf-8') as f:
        text = f.read()

    # Build the model.
    text_model = markovify.Text(text)
    return text_model

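def campaignLinkToBots(url, year):
    """Scrape the speeches linked from an election page and return a dict of candidate name -> Markov model."""
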
    dataFolder = './Campaign Speeches/'+ str(year) +'/'
    
    if not os.path.exists(dataFolder):
        os.makedirs(dataFolder)
    
    #Create the dictionary of each candidate's name and link to their campaign speech page    
    campaignSpeechLinkDict = getCandidateSpeechLinks(url)
    
    root = 'http://www.presidency.ucsb.edu/'
    
    #Loop through the campaign speech links and put each candidate's campaign speeches into an individual file
    #(files that already exist are not downloaded again)
    for name, link in campaignSpeechLinkDict.items():
        path = dataFolder + name.replace(' ', '-') + '.txt'
        if not os.path.isfile(path):
            scrapeCampaignSpeechesToFile(root + link, path)
    
    #Train the bots and store them in a dictionary
    bots = {}
    for pres in campaignSpeechLinkDict.keys():
        bots[pres] = trainMarkov(dataFolder + pres.replace(' ', '-') + '.txt')
    
    #return the bot dictionary
    return bots
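
The expression `root+(link['href'])[3:]` in `scrapeCampaignSpeechesToFile` appears to assume that every href starts with `../`. A more general way to resolve relative links is `urllib.parse.urljoin` from the standard library, sketched below. The page URL and href in the example are made up for illustration and are not taken from the site; `resolveLink` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def resolveLink(pageUrl, href):
    """Resolve an href found on pageUrl into an absolute URL (hypothetical helper)."""
    return urljoin(pageUrl, href)


# Made-up example values, for illustration only.
examplePage = 'http://www.presidency.ucsb.edu/some_listing_page.php?candidate=1'
exampleHref = '../ws/index.php?pid=123'

print(resolveLink(examplePage, exampleHref))
```

Unlike slicing off the first three characters, this also handles hrefs that are already absolute or that do not begin with `../`.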

In [3]:
bots = campaignLinkToBots('http://www.presidency.ucsb.edu/2016_election.php', 2016)

for name,bot in bots.items():
    print('\n' + name + ': ')
    for i in range(10):
        print(bot.make_short_sentence(max_chars=140))


Scott Walker: 
This is fundamentally important to the people is usually the best.
I love America.As kids, my brother David and I like to shop at Kohl's.
And that ladies and gentlemen is why we should move power and money out of Washington and send it to be leaders in unusual ways.
It was then my honor to be for real reform in Washington.Our big, bold reforms in Wisconsin than ever before.
Because in America, we celebrate our independence from the mighty hand of the United States.
Then, I take out the flyer that we were tired of high unemployment, budget deficits, stifling taxes and rising college tuition.
Most of all I want to be leaders in unusual ways.
Now, more than 100,000 protesters who occupied our state and local government.
Tonette and I when we were making it harder to get the job done.
My grandparents were farmers who didn't have indoor plumbing until my mom worked as a part-time secretary and bookkeeper.

John Kasich: 
And ironically, I met him at the core Judeo-Christian W

In [4]:
bots = campaignLinkToBots('http://www.presidency.ucsb.edu/2008_election.php', 2008)

for name,bot in bots.items():
    print('\n' + name + ': ')
    for i in range(10):
        print(bot.make_short_sentence(max_chars=140))


Hillary Clinton : 
Part of the people in Washington today.
It felt like the doors of higher education.
Our economy is not with us because I remember very well thrust us into war in Iraq and once you are invisible.
In fact, the DNC rules.
And just like what you said, and, obviously, I turn to her during this campaign is not enough to cover employees.
And we need to do to make sure it is part of my projects up in these next primaries are another test.
This is totally affordable, everything I can not be here, and millions of Americans to do that.
I was proud to say about what I was raised.
We have thirty thousand young Americans are actually pro-labor.
I have been to Gilo and seen stacks of papers that have unfortunately shifted so much at stake in our history.

Joseph Biden : 
Or she might raise Abu Ghraib to recruit additional terrorists.
Consider all this talk of war is not the dollar-a-day opium farmers.
This war in Iraq every three weeks.Last year Afghanistan produced 92% of the war

In [9]:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(min_df=1)
# Iterating over the open file yields one line at a time, so each line becomes one row of X
with open('./Campaign Speeches/2016/Donald-Trump.txt', "r", encoding='utf-8') as file:
    X = vectorizer.fit_transform(file)
X

<74x8085 sparse matrix of type '<class 'numpy.int64'>'
	with 61650 stored elements in Compressed Sparse Row format>

In [15]:
X.toarray().shape

(74, 8085)
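
The shape can be read as (documents, vocabulary size): because the file object is passed directly, each of its 74 lines is one document, and 8085 distinct tokens were found across them. The sketch below rebuilds that kind of count matrix with the standard library only, to show what the rows and columns mean. It assumes a simplified tokenizer (lowercasing, tokens of two or more word characters), and `countMatrix` is a hypothetical name. It is not scikit-learn's implementation.

```python
import re
from collections import Counter


def countMatrix(lines):
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in lines[i]."""
    # Simplified tokenizer (an assumption): lowercase, tokens of 2+ word characters.
    tokenized = [re.findall(r"\b\w\w+\b", line.lower()) for line in lines]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


toyLines = ["We love America", "We love freedom and America"]
toyVocabulary, toyMatrix = countMatrix(toyLines)
print(len(toyMatrix), len(toyVocabulary))  # rows = lines, columns = distinct tokens
```

The real matrix is stored in sparse form because most speeches use only a small fraction of the full vocabulary, which is why `X` reports 61650 stored elements rather than 74 × 8085.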

In [16]:
import tensorflow as tf