This notebook seeks to replicate some of the results in this paper by Duchin and others:

<b>Moon Duchin et al. “Locating the representational baseline: Republicans in Massachusetts”. Oct. 2018.</b>

They focused on the state of Massachusetts, and I want to see how their methods work on the state of Oregon.

This notebook is organized into sections.

## Oregon Data

Here are the citations for the main datasets I used:

<b>MIT Election Data and Science Lab. U.S. President Precinct-Level Returns 2016. Version V11. 2018. doi: 10.7910/DVN/LYWX3D. url: https://doi.org/10.7910/DVN/LYWX3D.</b>

<b>MIT Election Data and Science Lab. U.S. Senate Precinct-Level Returns 2016. Version V11. 2018. doi: 10.7910/DVN/NLTQAD. url: https://doi.org/10.7910/DVN/NLTQAD.</b>

Note that these datasets are very large and that is why they are not included in this repository.

### Section 1: Extracting Only the Data we Need

In [1]:
# pandas for working with csv files
import pandas as pd

In [2]:
# Location of the dataset on your machine, downloaded from the sources above
file_path_2016Pres = "C:/Users/Keith/Desktop/Election Data/2016-precinct-president.csv"
file_path_2016Sen  = "C:/Users/Keith/Desktop/Election Data/2016-precinct-senate.csv"

In [3]:
# These are the original datasets downloaded from the sources above
df_2016Pres_original = pd.read_csv(file_path_2016Pres, encoding = "ISO-8859-1", low_memory=False)
df_2016Sen_original = pd.read_csv(file_path_2016Sen, encoding = "ISO-8859-1", low_memory=False)

# Dropping columns that we won't need (the same list applies to both datasets)
columns_to_drop = ["state_fips", "state_icpsr", "county_fips", "county_ansi", "candidate_opensecrets", "candidate_wikidata",
                   "candidate_party", "candidate_last", "candidate_first", "candidate_middle", "candidate_full",
                   "candidate_suffix", "candidate_nickname", "candidate_fec", "candidate_fec_name", "candidate_google",
                   "candidate_govtrack", "candidate_icpsr", "candidate_maplight", "writein", "county_lat", "county_long",
                   "candidate"]
df_2016Pres = df_2016Pres_original.drop(columns=columns_to_drop)
df_2016Sen = df_2016Sen_original.drop(columns=columns_to_drop)

# Pulling out only Oregon data
ore_2016Pres = df_2016Pres.loc[df_2016Pres["state"] == "Oregon"]
ore_2016Sen  = df_2016Sen.loc[df_2016Sen["state"] == "Oregon"]
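
Because the source files are large, one alternative to loading everything and then dropping columns is to read only the columns that are needed. The sketch below is a minimal illustration on a tiny made-up CSV, not the approach used in this notebook. The helper name `load_state_returns` and the list `NEEDED_COLUMNS` are hypothetical, and the column list is an assumption based on the columns the code in this notebook reads.

```python
import io
import pandas as pd

# Columns read by the code in this notebook (an assumption; adjust as needed)
NEEDED_COLUMNS = ["state", "county_name", "precinct", "party", "votes"]

def load_state_returns(csv_source, state_name, **read_csv_kwargs):
    """Hypothetical helper: read only NEEDED_COLUMNS, then keep one state's rows."""
    df = pd.read_csv(csv_source, usecols=NEEDED_COLUMNS, **read_csv_kwargs)
    return df.loc[df["state"] == state_name].reset_index(drop=True)

# Tiny made-up CSV standing in for the real precinct-level file
toy_csv = io.StringIO(
    "state,county_name,precinct,party,votes,writein\n"
    "Oregon,Multnomah County,1001,democrat,300,False\n"
    "Oregon,Multnomah County,1001,republican,100,False\n"
    "Washington,King County,2001,democrat,500,False\n"
)
toy_oregon = load_state_returns(toy_csv, "Oregon")
print(toy_oregon)

# With the real file, the same options used above could be passed through, e.g.
# load_state_returns(file_path_2016Pres, "Oregon", encoding="ISO-8859-1", low_memory=False)
```

Note that this keeps fewer columns than the `drop` call above, so it only fits if nothing else from the original files is needed.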

In the paper by Duchin and others cited above the authors provide a link to the GitHub repository with the code they used for the analyses. Here is a citation for that repository:

<b>Metric Geometry and Gerrymandering Group, Massachusetts election data repository,
https://github.com/gerrymandr/Massachusetts_underperformance.</b>

The code in the cell below takes the data from the MIT Election Data and Science Lab and extracts the information necessary to replicate the results of Duchin and others.

In [37]:
def extract_important_data(df):
    precincts = []
    rVotes = []
    dVotes = []
    totalVotes = []

    # Loop through each precinct in the DataFrame
    for precinct in set(df["precinct"].values):
        # Extracting the data associated with this precinct only
        dfPrecinct = df.loc[df["precinct"] == precinct]

        # Saving the information we will need later
        precincts.append(precinct)
        rVotes.append(sum(dfPrecinct.loc[dfPrecinct["party"] == "republican"]["votes"].values))
        dVotes.append(sum(dfPrecinct.loc[dfPrecinct["party"] == "democrat"]["votes"].values))
        totalVotes.append(sum(dfPrecinct["votes"].values))

    # Check to make sure the lengths aren't off before creating the new DataFrame
    print(len(precincts) == len(rVotes) == len(dVotes) == len(totalVotes))

    # Creating a new DataFrame with this information we have extracted
    df_cleaned = pd.DataFrame({"Precinct": precincts, "Republican_Votes": rVotes, "Democrat_Votes": dVotes, "Total_Votes": totalVotes})

    return df_cleaned
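
The per-precinct loop above can also be written with `groupby`. The sketch below is a hypothetical variant (the name `extract_important_data_grouped` and its `keys` parameter are not part of the original code), shown on made-up data. It groups by county and precinct together, on the assumption that a precinct label might repeat across counties. If labels do repeat, grouping by `precinct` alone would merge those precincts into one row, which the last two lines illustrate.

```python
import pandas as pd

def extract_important_data_grouped(df, keys=("county_name", "precinct")):
    """Hypothetical groupby-based variant of extract_important_data."""
    keys = list(keys)
    # Total votes per group, across every party
    total = df.groupby(keys)["votes"].sum()
    # Party totals per group; groups with no rows for a party get 0
    rep = df.loc[df["party"] == "republican"].groupby(keys)["votes"].sum()
    dem = df.loc[df["party"] == "democrat"].groupby(keys)["votes"].sum()
    out = pd.DataFrame({
        "Republican_Votes": rep.reindex(total.index, fill_value=0),
        "Democrat_Votes": dem.reindex(total.index, fill_value=0),
        "Total_Votes": total,
    })
    return out.reset_index()

# Made-up rows: two counties that both have a precinct labeled "1"
toy = pd.DataFrame({
    "county_name": ["A County", "A County", "A County", "B County", "B County"],
    "precinct":    ["1", "1", "1", "1", "1"],
    "party":       ["republican", "democrat", "libertarian", "republican", "democrat"],
    "votes":       [10, 20, 5, 30, 10],
})

by_county_and_precinct = extract_important_data_grouped(toy)
by_precinct_only = extract_important_data_grouped(toy, keys=("precinct",))
print(by_county_and_precinct)  # two rows, one per (county, precinct) pair
print(by_precinct_only)        # one row: the two precincts are merged
```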

In [43]:
orePres16 = extract_important_data(ore_2016Pres)

True


In [44]:
oreSen16 = extract_important_data(ore_2016Sen)

True


Uncomment the lines in the cell below to save these DataFrames to `csv` files in your current working directory.

In [45]:
# orePres16.to_csv("orePres16.csv")
# oreSen16.to_csv("oreSen16.csv")

### Section 2: Is there a "Portland Effect"?

As Duchin and others did with Boston, we want to see whether the precinct-level Republican two-way vote share (Democrats being the other party) in Portland is reliably lower than the average for precincts in the state of Oregon.

Note that I did not find a way to determine which precincts are specifically within Portland city limits, so I look at Multnomah County as a whole.
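
For reference, a two-way vote share is usually taken to mean Republican votes divided by the sum of Republican and Democratic votes, which leaves third-party and write-in votes out of the denominator. The short sketch below uses made-up numbers and a hypothetical function name to show how that differs from a share of all votes cast.

```python
def two_way_republican_share(rep_votes, dem_votes):
    """Republican share of the two-party vote, or None if neither party received votes."""
    two_party_total = rep_votes + dem_votes
    if two_party_total == 0:
        return None
    return rep_votes / two_party_total

# Made-up precinct: 30 Republican, 60 Democratic, 10 other votes
rep, dem, other = 30, 60, 10
two_way = two_way_republican_share(rep, dem)   # 30 / 90
share_of_all = rep / (rep + dem + other)       # 30 / 100
print(two_way, share_of_all)
```

The two measures agree only when no votes go to other parties, so it is worth checking which denominator a given function uses.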

In [6]:
# Extracting data relating only to Multnomah County
multnomah_county_2016Pres = ore_2016Pres.loc[ore_2016Pres["county_name"] == 'Multnomah County']
multnomah_county_2016Sen  = ore_2016Sen.loc[ore_2016Sen["county_name"] == 'Multnomah County']

# Precincts that make up Multnomah County
multnomah_county_precincts_2016Pres = set(multnomah_county_2016Pres["precinct"].values)
multnomah_county_precincts_2016Sen = set(multnomah_county_2016Sen["precinct"].values)

In [7]:
# Quick check to make sure the two races cover the same set of precincts
multnomah_county_precincts_2016Pres == multnomah_county_precincts_2016Sen

True

In [31]:
def mean_share_precinct_republican_state(state, feedback = False):
    """Calculates and returns the mean precinct-level Republican vote share
    with respect to an entire state.

    Note: each precinct's share is Republican votes divided by all votes cast
    in that precinct (every party), not only Republican plus Democratic votes.

    Keyword arguments:
    state -- DataFrame with data pertaining to some single state
    feedback -- boolean indicating whether or not you want to print feedback (default False)
    """
    
    republican_state = state.loc[state["party"] == "republican"]
    precincts = set(republican_state["precinct"].values)
    rep_vote_shares = []
    
    for precinct in precincts:
        total_votes = sum((state.loc[state["precinct"] == precinct])["votes"].values)
        rep_votes = sum((republican_state.loc[republican_state["precinct"] == precinct])["votes"].values)
        
        try:
            rep_vote_shares.append(int(rep_votes) / int(total_votes))
        except ZeroDivisionError:
            if feedback:
                print("Precinct: {} \t\t Total Votes: {}\t Republican Votes: {}".format(precinct, total_votes, rep_votes))
        
    return (sum(rep_vote_shares) / len(rep_vote_shares))

def mean_share_precinct_republican_county(county, feedback = False):
    """Calculates and returns the mean precinct-level Republican vote share
    with respect to an entire county.

    Note: each precinct's share is Republican votes divided by all votes cast
    in that precinct (every party), not only Republican plus Democratic votes.

    Keyword arguments:
    county -- DataFrame with data pertaining to some single county
    feedback -- boolean indicating whether or not you want to print feedback (default False)
    """
    
    republican_county = county.loc[county["party"] == "republican"]
    precincts = set(county["precinct"].values)
    rep_vote_shares = []
    
    for precinct in precincts:
        total_votes = sum((county.loc[county["precinct"] == precinct])["votes"].values)
        rep_votes = sum((republican_county.loc[republican_county["precinct"] == precinct])["votes"].values)
        
        try:
            rep_vote_shares.append(int(rep_votes) / int(total_votes))
        except ZeroDivisionError:
            if feedback:
                print("Precinct: {} \t\t Total Votes: {}\t Republican Votes: {}".format(precinct, total_votes, rep_votes))
        
    return (sum(rep_vote_shares) / len(rep_vote_shares))

def rep_state_share(state):
    """Calculates and returns the statewide Republican vote share
    (Republican votes divided by all votes cast in the state).

    Keyword arguments:
    state -- DataFrame with data pertaining to some single state
    """
    republican_state = state.loc[state["party"] == "republican"]
    
    return sum(republican_state["votes"].values) / sum(state["votes"].values)

def mean_population_info(unit, precincts):
    """Calculates and returns a String with information about the total voting population
    of the given geographical unit and the average number of voters per precinct.

    Keyword arguments:
    unit -- DataFrame with data pertaining to some single geographical unit
    precincts -- collection (list or set) of precinct names in that unit
    """
    populations = []

    for precinct in precincts:
        populations.append(sum((unit.loc[unit["precinct"] == precinct])["votes"].values))

    return("Total (Voting) Population: {}\tAverage Population per Precinct: {}".format(sum(populations), sum(populations) / len(populations)))
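
The state and county functions above share the same logic, so one function could serve any geographical unit. The sketch below is a hypothetical `groupby`-based version (the name `mean_precinct_republican_share` and the `two_way` flag are assumptions, not part of the original code), run on made-up data. With `two_way=False` it divides by all votes cast in each precinct; with `two_way=True` it divides by Republican plus Democratic votes only. Precincts whose denominator is zero are skipped.

```python
import pandas as pd

def mean_precinct_republican_share(unit, two_way=False):
    """Hypothetical helper: mean over precincts of the Republican vote share."""
    rep = unit.loc[unit["party"] == "republican"].groupby("precinct")["votes"].sum()
    if two_way:
        # Denominator: Republican + Democratic votes only
        denom_rows = unit.loc[unit["party"].isin(["republican", "democrat"])]
    else:
        # Denominator: every vote cast in the precinct
        denom_rows = unit
    denom = denom_rows.groupby("precinct")["votes"].sum()
    denom = denom[denom > 0]  # skip precincts with no votes in the denominator
    shares = rep.reindex(denom.index, fill_value=0) / denom
    return shares.mean()

# Made-up unit with three precincts; P3 has no votes and is skipped
toy_unit = pd.DataFrame({
    "precinct": ["P1", "P1", "P1", "P2", "P2", "P3"],
    "party":    ["republican", "democrat", "green", "republican", "democrat", "republican"],
    "votes":    [30, 60, 10, 50, 50, 0],
})

share_all_votes = mean_precinct_republican_share(toy_unit)
share_two_way = mean_precinct_republican_share(toy_unit, two_way=True)
print(share_all_votes, share_two_way)
```

Like the functions above, this sketch groups by the `precinct` label alone, so it inherits the assumption that labels are unique within the unit.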

#### Portland Effect: 2016 Presidential Election

In [9]:
mean_share_precinct_republican_state(ore_2016Pres)

0.42679386818679305

In [10]:
mean_share_precinct_republican_county(multnomah_county_2016Pres)

0.22452520034360654

In [11]:
rep_state_share(ore_2016Pres)

0.38767353407923505

County-Level Population Information (Multnomah County)

In [32]:
print(mean_population_info(multnomah_county_2016Pres, multnomah_county_precincts_2016Pres))

Total (Voting) Population: 399103	Average Population per Precinct: 3531.8849557522126


State Level Population Information

In [33]:
print(mean_population_info(ore_2016Pres, set(ore_2016Pres["precinct"].values)))

Total (Voting) Population: 2001336	Average Population per Precinct: 2348.9859154929577


#### Portland Effect: 2016 Senate Election

In [12]:
mean_share_precinct_republican_state(ore_2016Sen)

0.36471269975015114

In [13]:
mean_share_precinct_republican_county(multnomah_county_2016Sen)

0.1927538624440809

In [14]:
rep_state_share(ore_2016Sen)

0.33347691163583487

County-Level Population Information (Multnomah County)

In [34]:
print(mean_population_info(multnomah_county_2016Sen, multnomah_county_precincts_2016Sen))

Total (Voting) Population: 386684	Average Population per Precinct: 3421.9823008849557


State Level Population Information

In [35]:
print(mean_population_info(ore_2016Sen, set(ore_2016Sen["precinct"].values)))

Total (Voting) Population: 1952477	Average Population per Precinct: 2291.639671361502
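
To put the numbers above side by side, the sketch below computes the gap between the statewide mean precinct share and the Multnomah County mean precinct share for each race, using values copied from the cell outputs. The gaps come to roughly 20 percentage points for the presidential race and roughly 17 for the Senate race. This is only a difference of means; it does not by itself show that the county's precincts are reliably lower in a statistical sense.

```python
# Mean precinct-level Republican shares, copied from the cell outputs above
mean_precinct_share = {
    "2016 President": {"Oregon": 0.42679386818679305, "Multnomah County": 0.22452520034360654},
    "2016 Senate": {"Oregon": 0.36471269975015114, "Multnomah County": 0.1927538624440809},
}

gaps = {}
for race, shares in mean_precinct_share.items():
    # Positive gap means Multnomah County precincts average lower than the state
    gaps[race] = shares["Oregon"] - shares["Multnomah County"]
    print("{}: Multnomah County is {:.1f} percentage points below the state precinct mean".format(
        race, 100 * gaps[race]))
```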
