**Analysis of Presidential speech and election data**

This notebook scrapes [The American Presidency Project](http://www.presidency.ucsb.edu) and downloads the campaign speeches of all 2016 presidential candidates. It then builds a Markov chain out of each candidate's speeches, capable of generating sentences in the style of their campaign speeches.
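
To make the idea of a Markov chain text generator concrete, here is a minimal sketch in plain Python. It is only an illustration under simplifying assumptions: the state is a single word, the corpus is three made-up sentences, and the names `build_chain` and `generate` are hypothetical. The notebook itself relies on the `markovify` package, whose model is more involved than this.

```python
import random
from collections import defaultdict

def build_chain(sentences):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        # Sentinel tokens mark where sentences begin and end
        words = ["<START>"] + sentence.split() + ["<END>"]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate(chain, rng, max_words=30):
    """Walk the chain from <START> until <END> or max_words is reached."""
    word, output = "<START>", []
    while len(output) < max_words:
        # Picking from the raw list makes frequent followers more likely
        word = rng.choice(chain[word])
        if word == "<END>":
            break
        output.append(word)
    return " ".join(output)

# Made-up corpus, for illustration only
toy_corpus = [
    "we will win this election",
    "we will build a better future",
    "this election is about the future",
]
chain = build_chain(toy_corpus)
print(generate(chain, random.Random(0)))
```

Because every step only looks at the current word, the output tends to be locally plausible but globally incoherent, which matches the flavour of the generated sentences further down.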

In [2]:
import pandas as pd
import numpy as np
import requests
from lxml import html
from bs4 import BeautifulSoup
import markovify
import os.path

In [44]:

def getCandidateSpeechLinks(url):
    allCandidatePage = requests.get(url)
    allCandidatePageSoup = BeautifulSoup(allCandidatePage.text, 'lxml')
    links={}
    table = allCandidatePageSoup.find('table', width=680)
    for area in table.findAll('td', class_='doctext'):
        for a in area.findAll('a'):
            if ('campaign' in a.text.lower()):
                links[area.find('span', class_='roman').text] = a['href']
    return links

def scrapeCampaignSpeechesToFile(url, path):
    allSpeechPages = requests.get(url)
    allSpeechSoup=BeautifulSoup(allSpeechPages.text, 'lxml')
    root = 'http://www.presidency.ucsb.edu/'
    table = allSpeechSoup.find('table', width=700)
    links = []
    for link in table.findAll('a'):
        if('interview' not in link.text.lower()):
            links.append(root+(link['href'])[3:])

    speechPages = [requests.get(link) for link in links]
    speechesSoup = [BeautifulSoup(speechPage.text, 'lxml') for speechPage in speechPages]

    with open(path, "w", encoding='utf-8') as outFile:
        # Write the text of each speech on its own line
        for speech in speechesSoup:
            outFile.write(speech.find('span', class_='displaytext').text + '\n')

def trainMarkov(path):

    # Get raw text as string.
    with open(path, encoding='utf-8') as f:
        text = f.read()

    # Build the model.
    text_model = markovify.Text(text)
    return text_model

def campaignLinkToBots(url, year):
    
    dataFolder = './Campaign Speeches/'+ str(year) +'/'
    
    if not os.path.exists(dataFolder):
        os.makedirs(dataFolder)
    
    #Create the dictionary of each candidate's name and link to their campaign speech page    
    campaignSpeechLinkDict = getCandidateSpeechLinks(url)
    
    root = 'http://www.presidency.ucsb.edu/'
    
    #Loop through the campaign speech links and write each candidate's campaign speeches to its own file
    for name, speechUrl in campaignSpeechLinkDict.items():
        path = dataFolder + name.replace(' ', '-') + '.txt'
        if not os.path.isfile(path):
            scrapeCampaignSpeechesToFile(root + speechUrl, path)
    
    #Train the bots and store them in a dictionary
    bots = {}
    for pres in campaignSpeechLinkDict.keys():
        bots[pres] = trainMarkov(dataFolder + pres.replace(' ', '-') + '.txt')
    
    #return the bot dictionary
    return bots
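
The scraped candidate names are used directly as file names, and the printed output shows a trailing space in at least one of them (`Hillary Clinton :`), which `replace(' ', '-')` turns into a trailing hyphen. One possible way to normalise the names is sketched here. `candidateFileName` is a hypothetical helper, not part of the notebook, and it would have to be used everywhere a file name is built so that the scraping and reading steps agree.

```python
import re

def candidateFileName(name, extension='.txt'):
    """Hypothetical helper: turn a scraped candidate name into a tidy file name."""
    # Collapse any run of whitespace to a single space and trim both ends
    cleaned = re.sub(r'\s+', ' ', name).strip()
    return cleaned.replace(' ', '-') + extension

print(candidateFileName('Hillary Clinton '))   # prints Hillary-Clinton.txt
```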

In [45]:
bots = campaignLinkToBots('http://www.presidency.ucsb.edu/2016_election.php', 2016)

for name,bot in bots.items():
    print('\n' + name + ': ')
    for i in range(10):
        print(bot.make_short_sentence(max_chars=140))


Hillary Clinton : 
I think your story over the course of events, but also to respect each other.
You have to do well.
At that point, they won't get anything done in Washington.
And healthcare providers who wish us ill, but we can't simply recycle the policies that discourage sprawl and congestion.
We are a hardship for their children, doing their jobs because employers won't be easy.
Now, will this actually work?
And today, as we can.
Who will welcome the president of the Beijing government's actions.
And I said what do you deal with some new ones.
I have lost some industry.

Bernie Sanders: 
I want to focus on issues of real conversions like industrial policies reforming a central challenge.
Problem is, it has imposed massive spending cuts that have caused devastating pain to some of that happening is essentially zero.
Yes, I was your age, the challenge of my state by 10 votes, I think when we have received 2 1/2 million individual contributions.
But I know Mitch McConnell.
And what 

In [43]:
bots = campaignLinkToBots('http://www.presidency.ucsb.edu/2008_election.php', 2008)

for name,bot in bots.items():
    print('\n' + name + ': ')
    for i in range(10):
        print(bot.make_short_sentence(max_chars=140))


Barack Obama: 
And to all of us must summon that spirit as well.
A streamlined system will provide as President.
That's why we're here to ask you to knock on some workers.
John McCain and the pay bills; to give their mothers and fathers, their sisters and brothers.
If you already know the outrage of the 21st century.
WARREN: Your wife's like that, simply talk about curbing our use of fossil fuels in State of the month.
No matter what happens in Iraq, means that we're here, every American to watch.
You can take is embracing the same politicians in Washington in charge of training Iraq's Security Forces if the oil we import over half.
All of those who serve, and for the presidency to make a good thing - you feel it in the Americas.
You're tired of the great project of my positions.

Hillary Clinton : 
Would you stand up for itself, to provide more rental housing.
The contrasts between me and I voted against it.
Well, today I want to receive a B.A.
My daughter didn't need to look for way

In [51]:
import os
from PIL import Image
import matplotlib.pyplot as plt

from wordcloud import WordCloud, STOPWORDS

# The mask image is the same for every candidate, so load it once
america_mask = np.array(Image.open("./us-map-silhouette-vector.png"))

# Make sure the output folder exists before writing any images
os.makedirs('./clouds/', exist_ok=True)

campaignSpeechLinkDict = getCandidateSpeechLinks('http://www.presidency.ucsb.edu/2016_election.php')
for candidate in campaignSpeechLinkDict.keys():
    candidate_file = candidate.replace(' ', '-')
    with open('./Campaign Speeches/2016/' + candidate_file + '.txt', encoding='utf8') as f:
        text = f.read()

    # exclude transcript noise and the candidate's own name from the cloud
    stopwords = set(STOPWORDS)
    stopwords.add("applause")
    stopwords.update(candidate.split())

    wc = WordCloud(background_color="white", max_words=2000, mask=america_mask,
                   stopwords=stopwords)
    # generate word cloud
    wc.generate(text)

    # store to file
    wc.to_file("./clouds/" + candidate_file + ".png")
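
A word cloud is essentially a picture of word frequencies with stopwords removed. The sketch below shows that counting step in plain Python so the effect of the stopword set can be inspected as numbers. It is a rough illustration, not how the `wordcloud` package is implemented; `topWords` and the sample text are made up for this example.

```python
import re
from collections import Counter

def topWords(text, stopwords, n=5):
    """Count words case-insensitively, skipping anything in stopwords."""
    # Lowercase first so 'Jobs' and 'jobs' are counted together
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Made-up sample text, for illustration only
sample = "Jobs, jobs and more jobs. (Applause) We will create jobs and raise wages."
print(topWords(sample, {"and", "we", "will", "more", "applause"}, n=3))
```

Running something like this on a candidate's speech file is a quick way to check whether extra stopwords (such as "applause" above) are worth adding before rendering the image.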