In [1]:
import pandas as pd

# Instructions

1. Head back to the [data sets page](https://github.com/michaelschung/bc-data-processing/tree/master/datasets), and pick one data set that you find particularly interesting.
2. Download the file and use Pandas to load that data set. Before moving on, make sure that it downloaded as a `.csv` file! Check the assignment posting on Classroom if you're having issues.
3. Explore the data by querying the DataFrame. Use this document to record your explorations according to the specifications below.

_Note: There are tons of free data sets out there for you to explore, and in the future you'll be able to use any that you like. For this assignment, however, pick one of the data sets that I've provided for you._

---

### Recording Explorations

Make at least three _thoughtful queries_ into the DataFrame. A thoughtful query consists of:

- A concrete question (in English) that you're trying to answer
- The query itself (in Python, using Pandas)
- 2-3 sentences (in English) that use the query results to answer the initial question

_Note: these instructions are intentionally vague. It's up to you to take this as seriously as you want, but I strongly encourage you to dig deep and learn something from the data you choose to look at. If you think you've asked all possible interesting questions, ask one more._

# Your "Thoughtful Queries"

In [2]:
# Load your CSV file here, and preview the first 10-15 rows
data = pd.read_csv('house_state.csv', index_col=0)
data.head(10)

district,current_votes,total_votes,percent
Delaware at large,488270,488270,99
Florida’s 1st district,438389,468000,94
Florida’s 2nd district,328203,355000,92
Florida’s 3rd district,390349,433000,90
Florida’s 4th district,504915,519000,97
Florida’s 5th district,336912,384000,88
Florida’s 6th district,438231,468000,94
Florida’s 7th district,406331,447000,91
Florida’s 8th district,459871,495000,93
Florida’s 9th district,429208,439000,98


## Thoughtful Query #1

Question: Do the districts with the lowest turnout have something in common, like population or state?

In [3]:
# Query 1 here
print(data[data['percent'] < 60])
print('\n\n')
print(data[(data['percent'] >= 60) & (data['percent'] < 70)])

                              current_votes  total_votes  percent
district                                                         
Florida’s 25th district                   0            0        0
Massachusetts’s 7th district         150204       302000       50
New York’s 12th district             194520       338000       58
Alabama’s 5th district               122516       310000       40



                                current_votes  total_votes  percent
district                                                           
Massachusetts’s 8th district           261515       383000       68
New Jersey’s 8th district              172114       262000       66
New York’s 5th district                182481       270000       68
New York’s 10th district               175271       289000       61
New York’s 16th district               195338       309000       63
New York’s 25th district               251039       363000       69
North Carolina’s 13th district         314114       45500

Insight: Most of these districts are in the Northeast (Massachusetts, New York, and New Jersey), and the rest (Florida, Alabama, and North Carolina) are in the Southeast. However, I don't see a correlation with total voting population: NJ-8 has 262k voters and NC-13 has 455k, almost twice as many.
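One way to check the "no correlation" impression numerically is to compute the correlation between district size and the `percent` column. The sketch below is only an illustration: it uses the complete rows from the two outputs above (skipping the all-zero Florida row and the cut-off North Carolina row), and the short labels such as `MA-7` are my own shorthand. On the real data, the same call would run on the full `data` DataFrame.

```python
import pandas as pd

# Rows copied from the two printed outputs above (complete rows only).
# With the real file, use the full `data` DataFrame instead of this sample.
sample = pd.DataFrame(
    {
        "total_votes": [302000, 338000, 310000, 383000, 262000,
                        270000, 289000, 309000, 363000],
        "percent": [50, 58, 40, 68, 66, 68, 61, 63, 69],
    },
    index=pd.Index(
        ["MA-7", "NY-12", "AL-5", "MA-8", "NJ-8",
         "NY-5", "NY-10", "NY-16", "NY-25"],
        name="district",
    ),
)

# Pearson correlation: values near 0 suggest no linear relationship,
# values near +1 or -1 suggest a strong one.
size_vs_turnout = sample["total_votes"].corr(sample["percent"])
print(f"correlation between total_votes and percent: {size_vs_turnout:.2f}")
```

Because this sample only contains low-percent districts, the number it prints says little about the whole data set; the point is the shape of the check, not the result.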

## Thoughtful Query #2

Question: Do the *most populous* districts have anything in common, like high turnout, location, or political leaning?

In [4]:
# Query 2 here
print(data[data['total_votes'] > 470000])

                               current_votes  total_votes  percent
district                                                          
Delaware at large                     488270       488270       99
Florida’s 4th district                504915       519000       97
Florida’s 8th district                459871       495000       93
Florida’s 11th district               475023       506000       94
Florida’s 16th district               484121       492000       98
Florida’s 18th district               449616       475000       95
Montana at large                      598549       590000       99
North Carolina’s 2nd district         493825       497000       99
North Carolina’s 4th district         489685       516000       95
Texas’s 21st district                 445349       471000       95
Virginia’s 10th district              474990       481000       99
Washington’s 7th district             464037       474000       98
Wisconsin’s 2nd district              456789       500000     

Insight: Surprisingly, most of these heavily populated districts are in swing states like Florida, Wisconsin, and Colorado, which makes the winners of those races that much more important. This is reflected in their high turnout. They *are* spread all over the country, though, so my expectation that they'd mostly be clustered around the populous Northeast corridor or southern California turned out to be wrong.
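To see how these large districts are spread across states without reading the list by eye, the state name can be pulled out of each district label and counted. This is a rough sketch using the complete rows printed above; the regular expression assumes every label ends in either "’s Nth district" or " at large", which holds for the rows shown but is not guaranteed for the whole file.

```python
import pandas as pd

# District labels and total_votes copied from the complete rows in the output above.
big = pd.DataFrame(
    {
        "total_votes": [488270, 519000, 495000, 506000, 492000, 475000,
                        590000, 497000, 516000, 471000, 481000, 474000],
    },
    index=pd.Index(
        [
            "Delaware at large",
            "Florida’s 4th district",
            "Florida’s 8th district",
            "Florida’s 11th district",
            "Florida’s 16th district",
            "Florida’s 18th district",
            "Montana at large",
            "North Carolina’s 2nd district",
            "North Carolina’s 4th district",
            "Texas’s 21st district",
            "Virginia’s 10th district",
            "Washington’s 7th district",
        ],
        name="district",
    ),
)

# Strip the trailing "’s 4th district" or " at large" to leave the state name.
# Both straight and curly apostrophes are accepted.
state = big.index.to_series().str.replace(
    r"(['’]s \d+\w+ district| at large)$", "", regex=True
)

# Count how many of the large districts fall in each state.
per_state = state.value_counts()
print(per_state)
```

With the real file, the same two steps would run on `data[data['total_votes'] > 470000]` instead of the hand-copied sample.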

## Thoughtful Query #3

Question: I've heard that states like Ohio and Virginia are "bellwether" states, meaning the party they favor is generally favored nationwide in that year's elections. Are there any "turnout bellwether" districts? (I'll use "United States's 0th district" as the baseline for comparison: 93% turnout.) If so, what do they have in common?

In [5]:
# Query 3 here
bellwethers = data[data['percent'] == 93]
print(bellwethers)

                              current_votes  total_votes  percent
district                                                         
Florida’s 8th district               459871       495000       93
Florida’s 21st district              403171       433000       93
Florida’s 27th district              343276       369000       93
Georgia’s 6th district               397042       428000       93
Georgia’s 13th district              359703       386000       93
Maryland’s 5th district              378366       408000       93
Minnesota’s 5th district             396776       427000       93
Minnesota’s 8th district             393058       421000       93
Nevada’s 2nd district                375414       405000       93
New Hampshire’s 1st district         400691       432000       93
Ohio’s 16th district                 389721       421000       93
Pennsylvania’s 16th district         349704       375000       93
Texas’s 5th district                 279622       301000       93
Texas’s 30

Insight: As I expected, there are several turnout bellwether districts, from Maryland to California. However, I don't see anything major they've got in common, like size, which ranges from TX-30 with 285k to FL-8 with 495k.
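Rather than hard-coding 93, the benchmark could also be computed from the data and compared against each district with a small tolerance. The sketch below is a hedged illustration using the complete rows printed above; the names `benchmark`, `tolerance`, and `percent_exact` are my own, and the one-point tolerance is an arbitrary choice. Because the sample contains only districts already at 93%, the computed benchmark here will not match a true nationwide figure.

```python
import pandas as pd

# (current_votes, total_votes) copied from the complete rows in the output above.
rows = {
    "Florida’s 8th district": [459871, 495000],
    "Florida’s 21st district": [403171, 433000],
    "Florida’s 27th district": [343276, 369000],
    "Georgia’s 6th district": [397042, 428000],
    "Georgia’s 13th district": [359703, 386000],
    "Maryland’s 5th district": [378366, 408000],
    "Minnesota’s 5th district": [396776, 427000],
    "Minnesota’s 8th district": [393058, 421000],
    "Nevada’s 2nd district": [375414, 405000],
    "New Hampshire’s 1st district": [400691, 432000],
    "Ohio’s 16th district": [389721, 421000],
    "Pennsylvania’s 16th district": [349704, 375000],
    "Texas’s 5th district": [279622, 301000],
}
sample = pd.DataFrame.from_dict(
    rows, orient="index", columns=["current_votes", "total_votes"]
)
sample.index.name = "district"

# Overall percent across all rows, weighted by district size.
benchmark = 100 * sample["current_votes"].sum() / sample["total_votes"].sum()

# Unrounded percent per district, then keep those within the tolerance.
tolerance = 1.0  # hypothetical cutoff, in percentage points
sample["percent_exact"] = 100 * sample["current_votes"] / sample["total_votes"]
near = sample[(sample["percent_exact"] - benchmark).abs() <= tolerance]

# Smallest and largest district among the rows in this sample.
size_range = sample["total_votes"].agg(["min", "max"])

print(f"benchmark: {benchmark:.1f}%")
print(near[["total_votes", "percent_exact"]])
print(size_range)
```

On the full `data` DataFrame, the same steps would replace the exact-match filter `data['percent'] == 93` with a range around a computed value.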