**Analysis of Presidential speech and election data**

This notebook scrapes [The American Presidency Project](http://www.presidency.ucsb.edu) and downloads the campaign speeches of all 2016 presidential candidates (and, further down, of the 2008 candidates). Using the `markovify` library, it then builds a Markov chain out of each candidate's speeches that can generate new sentences in the style of that candidate's campaign speeches.
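If the technique is new to you, here is a minimal word-level sketch of the idea in plain Python, without `markovify`. It assumes a state of two consecutive words, which matches the default `state_size` of `markovify.Text`. It leaves out markovify's sentence splitting and its check that generated text does not copy the source too closely. The names `build_chain` and `generate` are hypothetical and exist only for this illustration.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=2):
    """Map each run of `state_size` consecutive words to every word seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain


def generate(chain, start, max_words=20, rng=None):
    """Random walk: repeatedly pick a follower of the current state until none exists."""
    rng = rng or random.Random()
    state_size = len(start)
    out = list(start)
    while len(out) < max_words:
        # .get() does not insert missing keys into the defaultdict
        followers = chain.get(tuple(out[-state_size:]))
        if not followers:
            break  # dead end: this state never had a successor in the training text
        # Duplicates in the list make frequent continuations proportionally more likely
        out.append(rng.choice(followers))
    return " ".join(out)


chain = build_chain("we want a better world and we want a fair economy")
print(chain[("want", "a")])  # ['better', 'fair']
print(generate(chain, ("we", "want"), rng=random.Random(0)))
```

Replace the toy string with one candidate's speech file and you get a crude version of what `trainMarkov` does with markovify.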

In [1]:
import pandas as pd
import numpy as np
import requests
from lxml import html
from bs4 import BeautifulSoup
import markovify
import os.path

In [2]:

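def getCandidateSpeechLinks(url):
    # Fetch and parse the election overview page
    allCandidatePage = requests.get(url)
    allCandidatePageSoup = BeautifulSoup(allCandidatePage.text, 'lxml')
    links = {}
    # The candidate list sits in the table with width=680
    table = allCandidatePageSoup.find('table', width=680)
    for area in table.findAll('td', class_='doctext'):
        for a in area.findAll('a'):
            # Keep only each candidate's campaign-speech link, keyed by the
            # name shown in the cell's <span class="roman">
            if 'campaign' in a.text.lower():
                links[area.find('span', class_='roman').text] = a['href']
    return links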

def scrapeCampaignSpeechesToFile(url, path):
    allSpeechPages = requests.get(url)
    allSpeechSoup = BeautifulSoup(allSpeechPages.text, 'lxml')
    root = 'http://www.presidency.ucsb.edu/'
    table = allSpeechSoup.find('table', width=700)
    links = []
    for link in table.findAll('a'):
        # Skip interviews; keep speeches only
        if 'interview' not in link.text.lower():
            # The slice assumes each relative href starts with '../'
            links.append(root + link['href'][3:])

    # requests.get's second positional argument is `params`, so no parser name belongs here
    speechPages = [requests.get(link) for link in links]
    speechesSoup = [BeautifulSoup(speechPage.text, 'lxml') for speechPage in speechPages]

    # Mode "w" truncates any existing file, so no seek(0) is needed
    with open(path, "w", encoding='utf-8') as outFile:
        for speech in speechesSoup:
            outFile.write(speech.find('span', class_='displaytext').text + '\n')

def trainMarkov(path):

    # Get raw text as string.
    with open(path, encoding='utf-8') as f:
        text = f.read()

    # Build the model.
    text_model = markovify.Text(text)
    return text_model

def campaignLinkToBots(url, year):
    
    dataFolder = './Campaign Speeches/'+ str(year) +'/'
    
    if not os.path.exists(dataFolder):
        os.makedirs(dataFolder)
    
    # Map each candidate's name to the link for their campaign speech page
    campaignSpeechLinkDict = getCandidateSpeechLinks(url)
    
    root = 'http://www.presidency.ucsb.edu/'
    
    # Save each candidate's campaign speeches to their own file, skipping
    # candidates whose file already exists (a simple on-disk cache)
    for name, speechUrl in campaignSpeechLinkDict.items():
        path = dataFolder + name.replace(' ', '-') + '.txt'
        if not os.path.isfile(path):
            scrapeCampaignSpeechesToFile(root + speechUrl, path)
    
    #Train the bots and store them in a dictionary
    bots = {}
    for pres in campaignSpeechLinkDict.keys():
        bots[pres] = trainMarkov(dataFolder + pres.replace(' ', '-') + '.txt')
    
    #return the bot dictionary
    return bots
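`scrapeCampaignSpeechesToFile` converts each relative link to an absolute URL by slicing off its first three characters. That only works if every `href` starts with `../`. The standard library's `urllib.parse.urljoin` handles this more generally by resolving a link against the page it came from. This is a sketch: the page URL and `href` values below are hypothetical placeholders, not values taken from the site, and `resolve_speech_link` is an illustrative name.

```python
from urllib.parse import urljoin


def resolve_speech_link(page_url, href):
    """Resolve an href (relative or absolute) against the URL of the page it appeared on."""
    return urljoin(page_url, href)


# Hypothetical page URL and href, for illustration only
page = "http://www.presidency.ucsb.edu/2016_election_speeches.php"
print(resolve_speech_link(page, "../ws/index.php?pid=1"))
# http://www.presidency.ucsb.edu/ws/index.php?pid=1
```

Inside `scrapeCampaignSpeechesToFile`, this would amount to `links.append(urljoin(url, link['href']))`. Absolute links pass through unchanged, and links without a `../` prefix still resolve correctly.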

In [3]:
bots = campaignLinkToBots('http://www.presidency.ucsb.edu/2016_election.php', 2016)

for name,bot in bots.items():
    print('\n' + name + ': ')
    for i in range(10):
        print(bot.make_short_sentence(max_chars=140))


Bernie Sanders: 
That has led to the doctor and not just the start of a dialogue about the solvency of itself, and therefore, they're not in U.S. territory?
They want a better world.
What is Donald Trump, we are not standing on the table, no, we are Latino or poor or working class — from Wall Street.
Unemployment and underemployment is at 55 percent.
That situation is not easy.
Now it's Wall Street's time to bring about major criminal justice system.What does that mean?
It costs about $70 billion a year ago, 63 percent of all of the reasons they would have given up looking for work.
We have got to undo the damage that it did not vote.
Many of the percentage may be lower but the people below, you are doing.
Talk to us, if you have a Super PAC.And by the rules, work hard...UNKNOWN: Yes, sir.

Donald Trump: 
This will be used to be ruling?
What have we seen such a short time ago, you'll see everything I said — thank you very proud of those places?
And number two, he doesn't build them he
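Several generated sentences above contain run-together text such as `system.What` and `PAC.And`. A plausible cause is that `.text` joins a speech's paragraphs with no separator, so the last sentence of one paragraph gets glued to the first word of the next, and markovify then treats the pair as a single sentence. At scrape time, BeautifulSoup's `get_text(separator=...)` could avoid this. For files already on disk, a small regex pass before training is another option. The sketch below is an assumption-based heuristic, not part of the original notebook, and `split_run_together_sentences` is a hypothetical name. It inserts a space only when `.`, `?` or `!` is immediately followed by a capital letter and then a lowercase letter, so abbreviations like `U.S.` are left alone.

```python
import re

# A sentence-ending mark glued directly to a capitalized word, e.g. "system.What"
_RUN_TOGETHER = re.compile(r"([.?!])(?=[A-Z][a-z])")


def split_run_together_sentences(text):
    """Insert a space between a sentence-ending mark and a capitalized word that follows it directly."""
    return _RUN_TOGETHER.sub(r"\1 ", text)


print(split_run_together_sentences("criminal justice system.What does that mean?"))
# criminal justice system. What does that mean?
```

In `trainMarkov`, this would be applied to `text` just before `markovify.Text(text)`. It is only a heuristic. It misses cases like `U.S.Army`, where the next word is all capitals.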

In [4]:
bots = campaignLinkToBots('http://www.presidency.ucsb.edu/2008_election.php', 2008)

for name,bot in bots.items():
    print('\n' + name + ': ')
    for i in range(10):
        print(bot.make_short_sentence(max_chars=140))


Hillary Clinton : 
And I am so proud to co-sponsor Senator Dodd and Senator Dodd agrees with me again.
They have followed a policy that took the call, which was the bank lobby's dream.
Now, I applaud you for helping those who seek to serve.
And that's what our competitors are doing.
And I would like to do that?
I am running for president back in the White House will welcome you.
Because the conversation going.
But I always tell them they can fulfill those dreams.
Or the mom who can't take it back, when it acts on behalf of peace and tolerance.
Because, you know, I have concrete, detailed plans exist.

Mitt Romney: 
Well some people who don't have to speak Farsi or Arabic or Chinese.
Look how that changes as projected by the culture that surrounds them today.
Get up here, Jim Merrill.
And if you want to talk about cutting spending by government.
But it means that two of the threat to world civilization – these challenges will define our generation.
My guess is so would enable the very 
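The 2008 output labels one candidate `Hillary Clinton :`, with a space before the colon, which suggests the scraped name ends in whitespace. `campaignLinkToBots` builds file names with `name.replace(' ', '-')`, so such a name would be saved as `Hillary-Clinton-.txt`. A hypothetical helper that trims and collapses whitespace first could replace both places where that path is built, keeping the write and read paths in sync:

```python
def candidate_filename(name):
    """Turn a scraped candidate name into a file name, ignoring stray or repeated whitespace."""
    # str.split() with no argument drops leading/trailing whitespace and collapses runs
    return "-".join(name.split()) + ".txt"


print("Hillary Clinton ".replace(" ", "-") + ".txt")  # Hillary-Clinton-.txt (original rule)
print(candidate_filename("Hillary Clinton "))         # Hillary-Clinton.txt
```

Note that the dictionary keys, and therefore the printed labels, would still carry the trailing space unless the name is also stripped in `getCandidateSpeechLinks`.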