# 2011 VA House of Delegates Election

## Election Results

These precinct-level election results come directly from the Virginia Department of Elections, and they require significant cleaning. 

```r
library(sf)
library(ggplot2)
library(dplyr)
library(tibble)
library(magrittr)

# Precinct-level results for every contest on the 2011 general-election ballot
df <- read.csv(file = "../data/official-VA-2005-2019/2011-general.csv")
head(df, 1)
print(nrow(df))
```

| | CandidateUid | FirstName | MiddleName | LastName | Suffix | TOTAL_VOTES | Party | WriteInVote | LocalityUid | LocalityCode | ... | PrecinctName | DistrictUid | DistrictType | DistrictName | OfficeUid | OfficeTitle | ElectionUid | ElectionType | ElectionDate | ElectionName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<int>` | `<chr>` | `<int>` | ... | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | | | | WRITE IN VOTES | | 1 | | 1 | {15B7E141-2D1D-44C2-A50A-AAE021BC9B7D} | 1 | ... | # AB - Central Absentee Precinct | | | | {60E9BA28-D184-4DAD-9A44-1F405822F9F4} | Commissioner of Revenue | {EB178FD6-875D-4B0D-A295-900A0482F523} | General | 2011-11-08 00:00:00 | 2011 November General |


[1] 57228


This CSV file contains precinct-level results for every contest on Virginia's November 2011 general-election ballot, from House of Delegates races down to local offices such as Commissioner of Revenue, so I have some serious filtering to do. Right now there are 57,228 records.

Filters (each count is the number of records remaining after that filter and the ones above it):
- `DistrictType` = "House of Delegates" -> 7002 records
- `Party` = Democratic or Republican (sorry, third parties) -> 3868 records
- `PrecinctName` is neither "# AB - Central Absentee Precinct" nor "## Provisional" -> 3376 records
    - Provisional and absentee ballots aren't assigned to a precinct, so I can't use them to measure precinct-level results
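The record counts above come from applying the filters one after another. A minimal base-R sketch of that bookkeeping, run on a hypothetical five-row data frame (the rows, including the "State Senate" value, are made up for illustration and are not taken from the export), looks like this:

```r
# Hypothetical stand-in for the Department of Elections export; only the
# three columns the filters touch are included
results <- data.frame(
  DistrictType = c("House of Delegates", "House of Delegates", "House of Delegates",
                   "House of Delegates", "State Senate"),
  Party        = c("Democratic", "Republican", "Independent",
                   "Democratic", "Republican"),
  PrecinctName = c("001 - CENTRAL", "001 - CENTRAL", "001 - CENTRAL",
                   "## Provisional", "001 - CENTRAL"),
  stringsAsFactors = FALSE
)

# Apply the filters one at a time, keeping each intermediate result
step1 <- results[results$DistrictType == "House of Delegates", ]
step2 <- step1[step1$Party %in% c("Democratic", "Republican"), ]
step3 <- step2[!(step2$PrecinctName %in%
                   c("# AB - Central Absentee Precinct", "## Provisional")), ]

# Row count at each stage, analogous to the cumulative counts in the list
counts <- c(start = nrow(results), district = nrow(step1),
            party = nrow(step2), precinct = nrow(step3))
print(counts)
```

One difference from `dplyr::filter()`: base-R row indexing turns an `NA` condition into a row of `NA`s, whereas `filter()` drops those rows.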

```r
# Keep only House of Delegates races, major-party candidates, and votes
# that are assigned to an actual precinct
df <- df %>%
    filter(DistrictType == "House of Delegates",
           Party %in% c("Democratic", "Republican"),
           !(PrecinctName %in% c("# AB - Central Absentee Precinct", "## Provisional")))
print(nrow(df))
```

[1] 3376


Now I have 3376 records, each one a single candidate's result in a single precinct. Next, I want to produce a pivot table where:
- index = `PrecinctName`
- columns
    - `G11DHOD` = all votes for Democratic candidates in that precinct
    - `G11RHOD` = all votes for Republican candidates in that precinct
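The same reshaping can be sketched in base R with `xtabs()`, which sums a count variable over every combination of two factors. The toy rows below are hypothetical, although the vote totals echo a few precincts in this notebook's printed output; a precinct with no candidate from one party simply gets 0 in that column.

```r
# Hypothetical candidate-by-precinct rows in the same long format as `df`
long <- data.frame(
  PrecinctName = c("001 - CENTRAL", "001 - CHESAPEAKE", "001 - CHESAPEAKE", "001 - DEAN"),
  Party        = c("Republican", "Democratic", "Republican", "Republican"),
  TOTAL_VOTES  = c(585L, 157L, 630L, 1014L),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per precinct, one column per party, cells are vote sums
wide <- xtabs(TOTAL_VOTES ~ PrecinctName + Party, data = long)

# Flatten the table into a data frame with the G11DHOD / G11RHOD naming
votes <- data.frame(
  PrecinctName = rownames(wide),
  G11DHOD = as.integer(wide[, "Democratic"]),
  G11RHOD = as.integer(wide[, "Republican"]),
  stringsAsFactors = FALSE
)
print(votes)
```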

```r
# One row per precinct: sum each party's candidate votes.
# A party with no candidate in a precinct sums to 0.
df_votes <- df %>%
    group_by(PrecinctName) %>%
    summarise(G11DHOD = sum(TOTAL_VOTES[Party == "Democratic"]),
              G11RHOD = sum(TOTAL_VOTES[Party == "Republican"]),
              .groups = "drop")
write.csv(df_votes, "../mcmc/va-official-2011/2011-precinct-results.csv")
print(df_votes)
```

```
# A tibble: 2,359 x 3
   PrecinctName                     G11DHOD G11RHOD
   <chr>                              <int>   <int>
 1 "001 - ARLINGTON"                    345       0
 2 "001 - CENTRAL"                        0     585
 3 "001 - CHESAPEAKE"                   157     630
 4 "001 - DEAN"                           0    1014
 5 "001 - EAST"                           0     307
 6 "001 - EAST PRECINCT"                344     231
 7 "001 - EAST WARD"                      0     514
 8 "001 - EMANUEL A. M. E. CHURCH "     296       0
 9 "001 - FIRST"                          0     210
10 "001 - FIRST WARD"
```

OK, so now I have the votes for each party's candidates by precinct. The next step will be to add total population and voting-age population by precinct, using IPUMS data.

## Matching to shapefiles and demographic data

I'm going to pause on acquiring the most accurate demographic data for each year, since that's more of a "nice to have," and focus instead on adding the shapefiles. For those, I'm using the "VA_precincts" file prepared by MGGG, since it already matches demographic data to precincts.

To pair the two datasets, I'll rely on the fact that the field `precinct` in "VA_precincts" and the text component of `PrecinctName` appear to match up one-to-one.
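Because the join will rest on the name alone, it seems worth confirming the one-to-one match before merging. Here is a minimal base-R sketch that checks each side for duplicate names and lists names found on only one side. The names are hypothetical; the repeated "CENTRAL" stands in for a name that might be reused by two localities, which I haven't verified against the 2011 file.

```r
# Hypothetical precinct names from each source; "CENTRAL" is repeated on the
# election side purely to illustrate a name shared by two localities
election_names  <- c("ARLINGTON", "CENTRAL", "CENTRAL", "DEAN")
shapefile_names <- c("ARLINGTON", "CENTRAL", "DEAN", "EAST")

# A one-to-one join needs the key to be unique on both sides
dup_election  <- unique(election_names[duplicated(election_names)])
dup_shapefile <- unique(shapefile_names[duplicated(shapefile_names)])

# Names present on only one side will not join at all
only_election  <- setdiff(election_names, shapefile_names)
only_shapefile <- setdiff(shapefile_names, election_names)

print(dup_election)
print(only_shapefile)
```

If duplicates do turn up, a name-only key can't be one-to-one, and the join would need a second field (for example, a locality identifier, if both files carry one).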

The first thing I need to do is split the current `PrecinctName` field into `precinctID` (the number before " - ") and `precinct` (the name after it).
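By default, tidyr's `separate()` warns when a value splits into more pieces than `into` has names (and drops the extras) or into fewer (and fills with `NA`). A base-R sketch of the intended rule, splitting on the first " - " only and returning `NA` when there is no separator, makes those two edge cases explicit. The input names are hypothetical.

```r
# Hypothetical PrecinctName values: a normal name, a name that itself
# contains " - ", and a name with no separator at all
names_in <- c("001 - ARLINGTON", "101 - NORTH - SOUTH", "CENTRAL")

# Position of the first " - " (-1 when absent); as.integer() drops the
# match.length attributes that regexpr() attaches
pos <- as.integer(regexpr(" - ", names_in, fixed = TRUE))
has_sep <- pos > 0

# Everything before the first separator is the ID; everything after it,
# including any later " - ", is the name. No separator gives NA for both.
precinct_id   <- ifelse(has_sep, substr(names_in, 1, pos - 1), NA_character_)
precinct_name <- ifelse(has_sep, substring(names_in, pos + 3), NA_character_)

print(data.frame(names_in, precinct_id, precinct_name))
```

Keeping everything after the first separator corresponds to `separate(..., extra = "merge")`; the `NA` case corresponds to the "missing pieces" warning.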

```r
library(tidyverse)

# Sort by the full name, then split "001 - ARLINGTON" into an ID and a name;
# extra = "merge" keeps any later " - " as part of the name instead of dropping it
df_votes <- df_votes[order(df_votes$PrecinctName), ] %>%
    separate(col = PrecinctName, sep = " - ", into = c("precinctID", "precinct"),
             extra = "merge")
head(df_votes)
```

"Expected 2 pieces. Missing pieces filled with `NA` in 7 rows [94, 240, 245, 810, 949, 1428, 2097]."


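| precinctID | precinct | G11DHOD | G11RHOD |
|---|---|---|---|
| `<chr>` | `<chr>` | `<int>` | `<int>` |
| 1 | ARLINGTON | 345 | 0 |
| 1 | CENTRAL | 0 | 585 |
| 1 | CHESAPEAKE | 157 | 630 |
| 1 | DEAN | 0 | 1014 |
| 1 | EAST | 0 | 307 |
| 1 | EAST PRECINCT | 344 | 231 |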


```r
# Re-sort by the name component alone
df_votes <- df_votes[order(df_votes$precinct), ]
head(df_votes)
```

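| precinctID | precinct | G11DHOD | G11RHOD |
|---|---|---|---|
| `<chr>` | `<chr>` | `<int>` | `<int>` |
| 601 | EAST LEBANON | 480 | 0 |
| 203 | WEST | 0 | 376 |
| 101 | 1A | 194 | 544 |
| 201 | 2A | 228 | 653 |
| 301 | 3A | 189 | 196 |
| 302 | 3B | 89 | 68 |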
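The two names sorting ahead of "1A" look like a whitespace problem (one name in the earlier output, "001 - EMANUEL A. M. E. CHURCH ", visibly ends in a space), though I haven't confirmed that's the cause here. A quick base-R sketch for flagging padded names and trimming them, using hypothetical values:

```r
# Hypothetical precinct names; the first two carry a leading space
names_raw <- c(" EAST LEBANON", " WEST", "1A", "2A")

# Flag names that start or end with whitespace
padded <- grepl("^[[:space:]]|[[:space:]]$", names_raw)
print(names_raw[padded])

# trimws() strips ordinary whitespace (space, tab, CR, LF) from both ends by
# default; after trimming, the names sort in the expected order
cleaned <- sort(trimws(names_raw))
print(cleaned)
```

If the culprit turns out to be something else, such as a non-breaking space, `trimws()` with its default settings would leave it in place.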


```r
# Assign to the column; df_votes[df_votes$precinct] tried to use each name (some NA) as a column index
df_votes$precinct <- str_trim(df_votes$precinct, side = "both")
```
