## Table of Contents

1. Introduction
2. Install & Import Packages
3. Access HTML and Create Selector Object
4. Scrape Data and Create Dataframes for States, Polls Closing, and Races to Watch
5. Concatenate & Style

## 1. Introduction

It's hard to believe that it's finally Election Day. To reduce anxiety while watching results, it would be great to have poll closing times and key races to watch for every state in one place. I've scraped this information from FiveThirtyEight (https://projects.fivethirtyeight.com/election-results-timing/) using Scrapy's CSS selectors, converted the extracted lists of strings to dataframes, cleaned them where necessary (using the pandas `.str` accessor's `split` and `get` methods on string columns), and concatenated them into a single dataframe. Hopefully, it will help me avoid stress-binging 8-packs of Ferrero Rocher.

## 2. Install & Import Packages

In [1]:
```python
# Install Scrapy from inside the notebook (IPython shell command)
!pip install scrapy

import pandas as pd
import requests
from scrapy import Selector
```



## 3. Access HTML and Create Selector Object

In [2]:
```python
# URL of the page containing the HTML
url = "https://projects.fivethirtyeight.com/election-results-timing/"

# Download the page; .content holds the raw bytes of the HTML source
response = requests.get(url, timeout=30)
response.raise_for_status()
html = response.content

# Create the Selector object sel from html (Scrapy's Selector accepts bytes here and decodes them as UTF-8).
# Its .css() and .xpath() methods return SelectorList objects, which behave like lists
sel = Selector(text=html)

# Count the HTML elements. This is a CSS selector, not XPath: '*' is the universal selector, matching every element
print("Number of elements in html document: ", len(sel.css('*')))
```

Number of elements in html document:  1775


## 4. Scrape Data and Create Dataframes for States, Polls Closing, and Races to Watch

### States

In [3]:
```python
# Scrape the text of the state names (inspecting the HTML shows they carry class "state-name");
# .getall() returns a list of strings (.extract() is an older alias)
states = sel.css('.state-name::text').getall()

# Convert to a one-column dataframe
states = pd.DataFrame(states, columns=['States'])
```

### Poll Closing

In [4]:
```python
# Scrape the poll-closing text: the <p class="info-text"> that is a direct child (>) of an element
# carrying BOTH classes "info-box" and "last_polls_close" (chaining .class selectors with no space means "has both")
polls_close = sel.css('.info-box.last_polls_close > p.info-text::text').getall()

# Convert to dataframe
polls_close = pd.DataFrame(polls_close, columns=['Polls Close (EST)'])

# Let's check
polls_close.head()
```

|   | Polls Close (EST) |
|---|---|
| 0 | 8 p.m. Eastern |
| 1 | 1 a.m. Eastern |
| 2 | 9 p.m. Eastern |
| 3 | 8:30 p.m. Eastern |
| 4 | 11 p.m. Eastern |


In [5]:
```python
# Remove the redundant 'Eastern' by splitting each string on ' Eastern' and keeping the first piece with .str.get(0)
polls_close['Polls Close (EST)'] = polls_close['Polls Close (EST)'].str.split(' Eastern').str.get(0)
polls_close.head()
```

|   | Polls Close (EST) |
|---|---|
| 0 | 8 p.m. |
| 1 | 1 a.m. |
| 2 | 9 p.m. |
| 3 | 8:30 p.m. |
| 4 | 11 p.m. |
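The split-and-get approach is one of several ways to drop the suffix. A quick sketch comparing it with `.str[0]` indexing and a literal `.str.replace`, on made-up values (the last one, with no suffix, is hypothetical and only shows that all three leave such strings unchanged):

```python
import pandas as pd

# Made-up values; the last one has no ' Eastern' suffix
s = pd.Series(['8 p.m. Eastern', '8:30 p.m. Eastern', '1 a.m. Eastern', 'No time listed'])

# The notebook's approach: split on ' Eastern', keep piece 0
via_get = s.str.split(' Eastern').str.get(0)

# .str[0] is equivalent indexing into each list
via_index = s.str.split(' Eastern').str[0]

# Or remove the literal substring directly (regex=False treats the pattern as plain text)
via_replace = s.str.replace(' Eastern', '', regex=False)

print(via_get.tolist())  # ['8 p.m.', '8:30 p.m.', '1 a.m.', 'No time listed']
```

`.str.replace` removes ' Eastern' wherever it appears, while split-and-get also discards anything after it; for these values both give the same result.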


### Races to Watch

In [6]:
```python
# Scrape the races-to-watch text: the <p class="info-text"> that is a direct child of an element
# carrying BOTH classes "info-box" and "races_to_watch"; .getall() returns a list of strings
races_to_watch = sel.css('.info-box.races_to_watch > p.info-text::text').getall()

# Convert to dataframe
races_to_watch = pd.DataFrame(races_to_watch, columns=['Races to Watch'])
```

## 5. Concatenate & Style

In [7]:
```python
# The Races to Watch column holds long strings, so don't truncate column contents when a dataframe is displayed
pd.set_option('display.max_colwidth', None)

# Concatenate the 3 dataframes side by side (axis=1); rows are aligned on their shared 0..n-1 index
state_polls = pd.concat([states, polls_close, races_to_watch], axis=1)

# Left-justify the cell text with .style.set_properties, and the column headers (th) with .set_table_styles
state_polls.style.set_properties(**{'text-align': 'left'}).set_table_styles(
    [dict(selector='th', props=[('text-align', 'left')])]
)
```

|   | States | Polls Close (EST) | Races to Watch |
|---|---|---|---|
| 0 | Alabama | 8 p.m. | Senate |
| 1 | Alaska | 1 a.m. | President, Senate, House |
| 2 | Arizona | 9 p.m. | President; Senate; 1st and 6th congressional districts |
| 3 | Arkansas | 8:30 p.m. | 2nd Congressional District |
| 4 | California | 11 p.m. | 1st, 4th, 10th, 21st, 25th, 39th, 42nd, 45th, 48th and 50th congressional districts |
| 5 | Colorado | 9 p.m. | Senate, 3rd Congressional District |
| 6 | Connecticut | 8 p.m. | No major races |
| 7 | Delaware | 8 p.m. | No major races |
| 8 | District of Columbia | 8 p.m. | No major races |
| 9 | Florida | 8 p.m. | President; 15th, 16th, 18th, 26th and 27th congressional districts |
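`pd.concat(..., axis=1)` pairs rows by index label, not by meaning, so it never complains when the lists have different lengths. The toy frames below (made-up lengths) show the symptom: the shorter frame is padded with `NaN` at the end. On the real page, a single missing value in the middle would silently shift every later row, so comparing lengths before concatenating is a cheap sanity check.

```python
import pandas as pd

# Toy frames with deliberately mismatched lengths
states = pd.DataFrame(['Alabama', 'Alaska', 'Arizona'], columns=['States'])
polls_close = pd.DataFrame(['8 p.m.', '1 a.m.'], columns=['Polls Close (EST)'])

# Cheap sanity check before joining by position
lengths = {name: len(df) for name, df in [('states', states), ('polls_close', polls_close)]}
if len(set(lengths.values())) != 1:
    print("Warning: column lengths differ:", lengths)

# axis=1 places the frames side by side, aligning rows on index labels 0, 1, 2
combined = pd.concat([states, polls_close], axis=1)
print(combined)
```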
