# 2011 VA House of Delegates Election

## Election Results

These precinct-level election results come directly from the Virginia Department of Elections, and they require significant cleaning. 

In [1]:
library(sf)
library(ggplot2)
library(dplyr)
library(tibble)
library(magrittr)

df <- read.csv(file = "C:/Users/madie/OneDrive/data/official-VA-2005-2019/2011-general.csv")
head(df, 1)
print(nrow(df))

Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1


Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union




Unnamed: 0_level_0,CandidateUid,FirstName,MiddleName,LastName,Suffix,TOTAL_VOTES,Party,WriteInVote,LocalityUid,LocalityCode,...,PrecinctName,DistrictUid,DistrictType,DistrictName,OfficeUid,OfficeTitle,ElectionUid,ElectionType,ElectionDate,ElectionName
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,<int>,<chr>,<int>,...,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
1,,,,WRITE IN VOTES,,1,,1,{15B7E141-2D1D-44C2-A50A-AAE021BC9B7D},1,...,# AB - Central Absentee Precinct,,,,{60E9BA28-D184-4DAD-9A44-1F405822F9F4},Commissioner of Revenue,{EB178FD6-875D-4B0D-A295-900A0482F523},General,2011-11-08 00:00:00,2011 November General


[1] 57228


Now, this CSV file includes the precinct-level results for every election held in Virginia in 2011, so I have some serious filtering to do. Currently, there are 57,228 records.

Filters:
- `DistrictType` = "House of Delegates" -> 7002 records
- `Party` = Democratic or Republican (sorry third parties) -> 3868 records
- `PrecinctName` is neither "# AB - Central Absentee Precinct" nor "## Provisional" -> 3376 records
    - Provisional ballots and absentee ballots aren't assigned a precinct, so I can't use them to measure precinct-level election results
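
Because these filters rely on exact string matches, it can help to tabulate the distinct values before hard-coding them. The sketch below uses a small made-up data frame (`toy` and all of its rows are illustrative, not taken from the real file) and assumes, as the two names above suggest, that non-geographic precincts start with `#`.

```r
# Toy stand-in for the raw results; every value here is made up.
toy <- data.frame(
  DistrictType = c("House of Delegates", "House of Delegates",
                   "House of Delegates", "State Senate"),
  Party        = c("Democratic", "Republican", "Independent", "Democratic"),
  PrecinctName = c("001 - ARLINGTON", "# AB - Central Absentee Precinct",
                   "## Provisional", "002 - BALLSTON"),
  stringsAsFactors = FALSE
)

# Distinct values and their frequencies, to confirm the exact spellings.
party_counts <- table(toy$Party)
print(party_counts)

# Flag precinct names beginning with "#" (assumed to mark non-geographic rows).
non_geographic <- grepl("^#", toy$PrecinctName)
print(unique(toy$PrecinctName[non_geographic]))
```

Running the same two checks on the real `df` would show whether any other spellings or `#`-prefixed names need to be handled.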

In [2]:
#df <- df[df$DistrictType == "House of Delegates",]
df <- df %>% 
    filter(DistrictType == "House of Delegates") %>%
    filter(Party %in% c("Democratic", "Republican")) %>%
    filter(!(PrecinctName %in% c("# AB - Central Absentee Precinct", "## Provisional")))
print(nrow(df))

[1] 3376


Now I have 3376 records, where each record is one candidate running in one precinct. What I would like to do is produce a pivot table, where:
- index = `PrecinctName`
- columns
    - `G11DHOD` = all votes for Democratic candidates in that precinct
    - `G11RHOD` = all votes for Republican candidates in that precinct
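
As a toy illustration of this pivot (the rows below are invented), the sketch groups by `LocalityUid` as well as `PrecinctName`. That extra key is an assumption: it guards against the case where two localities use the same precinct name, which the real data may or may not contain.

```r
library(dplyr)

# Invented rows: two localities that both have a precinct called "001 - CENTRAL".
toy <- data.frame(
  LocalityUid  = c("loc-A", "loc-A", "loc-B"),
  PrecinctName = c("001 - CENTRAL", "001 - CENTRAL", "001 - CENTRAL"),
  Party        = c("Democratic", "Republican", "Republican"),
  TOTAL_VOTES  = c(120L, 80L, 300L),
  stringsAsFactors = FALSE
)

# One row per (locality, precinct), with one vote column per party.
toy_votes <- toy %>%
  group_by(LocalityUid, PrecinctName) %>%
  summarise(
    G11DHOD = sum(TOTAL_VOTES[Party == "Democratic"]),
    G11RHOD = sum(TOTAL_VOTES[Party == "Republican"]),
    .groups = "drop"
  )

print(toy_votes)
```

Grouping by `PrecinctName` alone would fold these two precincts into a single row with 120 Democratic and 380 Republican votes.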

In [3]:
df_votes <- df %>%
    group_by(PrecinctName) %>%
    summarise(G11DHOD = sum(TOTAL_VOTES[Party == "Democratic"]),
              G11RHOD = sum(TOTAL_VOTES[Party == "Republican"])) %>%
    distinct()
write.csv(df_votes, "2011-precinct-results.csv")
print(df_votes)

`summarise()` ungrouping output (override with `.groups` argument)



# A tibble: 2,359 x 3
   PrecinctName                     G11DHOD G11RHOD
   <chr>                              <int>   <int>
 1 "001 - ARLINGTON"                    345       0
 2 "001 - CENTRAL"                        0     585
 3 "001 - CHESAPEAKE"                   157     630
 4 "001 - DEAN"                           0    1014
 5 "001 - EAST"                           0     307
 6 "001 - EAST PRECINCT"                344     231
 7 "001 - EAST WARD"                      0     514
 8 "001 - EMANUEL A. M. E. CHURCH "     296       0
 9 "001 - FIRST"                          0     210
10 "001 - FIRST WARD"         

Ok, so now I've calculated each party's votes by precinct. The next step will be to add the population and voting-age population by precinct, using IPUMS data.

## Matching to shapefiles and demographic data

I'm going to put a pause on acquiring the most accurate demographic data for each year, since that is more of a "nice to have," and instead focus on adding in the shapefiles. For that, I'm using the "VA_precincts" file prepared by MGGG, since it already matches demographic data to precincts.

To pair them together, I've noticed that the field `precinct` in "VA_precincts" and the text component of `PrecinctName` seem to match up one-to-one. 
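
One way to test that impression before joining is to compare the two sets of names directly. This is a minimal sketch on invented vectors (`vote_names` and `shape_names` are hypothetical stand-ins for the two columns), using only base R.

```r
# Hypothetical stand-ins for the two name columns, already upper-cased.
vote_names  <- c("ABINGDON", "ACCOMAC", "CENTRAL", "CENTRAL", "DEAN")
shape_names <- c("ABINGDON", "ACCOMAC", "CENTRAL", "GLENKIRK")

# Names present on one side only.
only_in_votes  <- setdiff(vote_names, shape_names)
only_in_shapes <- setdiff(shape_names, vote_names)

# Names that occur more than once, which would rule out a one-to-one match.
repeated_in_votes <- unique(vote_names[duplicated(vote_names)])

print(only_in_votes)
print(only_in_shapes)
print(repeated_in_votes)
```

If all three results come back empty on the real columns, the one-to-one assumption holds; otherwise the listed names show where it breaks.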

The first thing I need to do is split up the current field called `PrecinctName` into `precinctID` and `precinct`. 

In [4]:
library(tidyverse)

df_votes <- df_votes[order(df_votes$PrecinctName),] %>%
    separate(col = PrecinctName, sep = " - ", into = c("precinctID", "precinct"))
head(df_votes)

-- Attaching packages ------------------------------------------------------------------------------- tidyverse 1.3.0 --

v tidyr   1.1.2     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.0
v purrr   0.3.4     

-- Conflicts ---------------------------------------------------------------------------------- tidyverse_conflicts() --
x tidyr::extract()   masks magrittr::extract()
x dplyr::filter()    masks stats::filter()
x dplyr::lag()       masks stats::lag()
x purrr::set_names() masks magrittr::set_names()

"Expected 2 pieces. Additional pieces discarded in 9 rows [427, 786, 910, 1136, 1486, 1550, 1563, 2229, 2297]."
"Expected 2 pieces. Missing pieces filled with `NA` in 7 rows [94, 240, 245, 810, 949, 1428, 

precinctID,precinct,G11DHOD,G11RHOD
<chr>,<chr>,<int>,<int>
1,ARLINGTON,345,0
1,CENTRAL,0,585
1,CHESAPEAKE,157,630
1,DEAN,0,1014
1,EAST,0,307
1,EAST PRECINCT,344,231
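
The two warnings above mean that some names contain more than one ` - ` and some contain none. A possible way to keep the extra pieces, sketched here on invented names, is the `extra = "merge"` option of `separate()`, which splits only at the first separator. Setting `fill = "right"` makes explicit that a name with no separator leaves `precinct` as `NA`. Whether this is the right treatment for the affected rows is an assumption that would need checking against the real data.

```r
library(tidyr)

# Invented names covering the three cases: one separator, two, and none.
toy <- data.frame(
  PrecinctName = c("001 - ARLINGTON", "104 - ST. PAUL - NORTH", "NO SEPARATOR"),
  stringsAsFactors = FALSE
)

split_toy <- separate(
  toy,
  col   = PrecinctName,
  into  = c("precinctID", "precinct"),
  sep   = " - ",
  extra = "merge",  # keep everything after the first " - " together
  fill  = "right"   # a name without " - " gets NA in `precinct`
)

print(split_toy)
```

With these options the second name keeps its full text, "ST. PAUL - NORTH", instead of losing the trailing piece.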


In [5]:
# Sort by precinct name to inspect the result of the split
df_votes <- df_votes[order(df_votes$precinct),]
head(df_votes)

precinctID,precinct,G11DHOD,G11RHOD
<chr>,<chr>,<int>,<int>
601,EAST LEBANON,480,0
203,WEST,0,376
101,1A,194,544
201,2A,228,653
301,3A,189,196
302,3B,89,68


In [6]:
# Strip leading and trailing whitespace from the precinct names
df_votes$precinct <- str_trim(df_votes$precinct, side = "both")
head(df_votes)

precinctID,precinct,G11DHOD,G11RHOD
<chr>,<chr>,<int>,<int>
601,EAST LEBANON,480,0
203,WEST,0,376
101,1A,194,544
201,2A,228,653
301,3A,189,196
302,3B,89,68


In [7]:
# Re-sort now that the names have been trimmed
df_votes <- df_votes[order(df_votes$precinct),]
head(df_votes, 10)

precinctID,precinct,G11DHOD,G11RHOD
<chr>,<chr>,<int>,<int>
101,1A,194,544
201,2A,228,653
301,3A,189,196
302,3B,89,68
401,4A,253,711
501,5A,216,753
101,ABBS VALLEY,119,68
101,ABERDEEN,574,0
22,ABINGDON,567,0
601,ACCOMAC,640,0


In [8]:
df_shp <- st_read("C:/Users/madie/OneDrive/data/VA-2017/VA_precincts/VA_precincts.shp")
head(df_shp)

Reading layer `VA_precincts' from data source `C:\Users\madie\OneDrive\data\VA-2017\VA_precincts\VA_precincts.shp' using driver `ESRI Shapefile'
Simple feature collection with 2439 features and 56 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -373531.2 ymin: 60026.37 xmax: 380257.5 ymax: 385298.6
projected CRS:  Lambert_Conformal_Conic


Unnamed: 0_level_0,precinct,locality,loc_prec,district,G18DHOR,G18DSEN,G18OHOR,G18OSEN,G18RHOR,G18RSEN,geometry,...,ASIANVAP,NHPIVAP,OTHERVAP,X2MOREVAP,CD_12,CD_16,HDIST_11,HDIST_REM,SENDIST,geometry
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,...,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<chr>,<MULTIPOLYGON [m]>,Unnamed: 22_level_1
1,Glenkirk,Prince William County,Prince William County Glenkirk,Congressional District 1,980.0,1044.0,0.0,32.0,950.0,856.0,MULTIPOLYGON (((163199.1 30...,...,401,1.0,4,41,1,1,13,13,13,MULTIPOLYGON (((163199.1 30...
2,Buckland Mills,Prince William County,Prince William County Buckland Mills,Congressional District 1,938.0,978.0,0.0,31.0,766.0,701.0,MULTIPOLYGON (((162078.8 30...,...,449,1.190457e-07,6,73,1,1,13,13,13,MULTIPOLYGON (((162078.8 30...
3,Limestone,Prince William County,Prince William County Limestone,Congressional District 1,1471.0,1562.0,0.0,55.0,1232.0,1079.0,MULTIPOLYGON (((163554 3082...,...,369,1.828088e-08,9,59,1,1,13,13,13,MULTIPOLYGON (((163554 3082...
4,Mullen,Prince William County,Prince William County Mullen,Congressional District 1,1312.0,1333.0,0.0,40.0,344.0,293.0,MULTIPOLYGON (((171765.7 31...,...,343,1.0,15,124,1,1,13,13,13,MULTIPOLYGON (((171765.7 31...
5,Sudley,Prince William County,Prince William County Sudley,Congressional District 1,727.0,737.0,0.0,44.0,520.0,467.0,MULTIPOLYGON (((174200.2 31...,...,134,2.0,8,49,1,1,13,13,29,MULTIPOLYGON (((174200.2 31...
6,Ben Lomond,Prince William County,Prince William County Ben Lomond,Congressional District 1,1131.0,1156.0,0.0,45.0,449.0,386.0,MULTIPOLYGON (((171651 3097...,...,479,2.0,7,114,1,1,13,13,29,MULTIPOLYGON (((171651 3097...


Since I'm planning on matching the fields called `precinct` in the two different data frames, I need to make the column in `df_shp` match `df_votes` by making it all caps. 
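
Upper-casing handles capitalization; stray or repeated whitespace is another common source of mismatches. The helper below, `normalize_name()`, is hypothetical (it is not part of the original notebook) and shows one way to apply the same normalization to both columns.

```r
# Hypothetical helper: trim, collapse runs of whitespace, then upper-case.
normalize_name <- function(x) {
  x <- trimws(x)
  x <- gsub("\\s+", " ", x)
  toupper(x)
}

# Invented examples with inconsistent case and spacing.
shape_names <- c("Ben Lomond", " Buckland  Mills ", "Glenkirk")
print(normalize_name(shape_names))
```

Applying one function to both `precinct` columns keeps the two sides from drifting apart as more cleaning rules are added.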

In [9]:
# Upper-case the shapefile's precinct names so they match df_votes
df_shp$precinct <- toupper(df_shp$precinct)
head(df_shp)

Unnamed: 0_level_0,precinct,locality,loc_prec,district,G18DHOR,G18DSEN,G18OHOR,G18OSEN,G18RHOR,G18RSEN,geometry,...,ASIANVAP,NHPIVAP,OTHERVAP,X2MOREVAP,CD_12,CD_16,HDIST_11,HDIST_REM,SENDIST,geometry
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,...,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<chr>,<MULTIPOLYGON [m]>,Unnamed: 22_level_1
1,GLENKIRK,Prince William County,Prince William County Glenkirk,Congressional District 1,980.0,1044.0,0.0,32.0,950.0,856.0,MULTIPOLYGON (((163199.1 30...,...,401,1.0,4,41,1,1,13,13,13,MULTIPOLYGON (((163199.1 30...
2,BUCKLAND MILLS,Prince William County,Prince William County Buckland Mills,Congressional District 1,938.0,978.0,0.0,31.0,766.0,701.0,MULTIPOLYGON (((162078.8 30...,...,449,1.190457e-07,6,73,1,1,13,13,13,MULTIPOLYGON (((162078.8 30...
3,LIMESTONE,Prince William County,Prince William County Limestone,Congressional District 1,1471.0,1562.0,0.0,55.0,1232.0,1079.0,MULTIPOLYGON (((163554 3082...,...,369,1.828088e-08,9,59,1,1,13,13,13,MULTIPOLYGON (((163554 3082...
4,MULLEN,Prince William County,Prince William County Mullen,Congressional District 1,1312.0,1333.0,0.0,40.0,344.0,293.0,MULTIPOLYGON (((171765.7 31...,...,343,1.0,15,124,1,1,13,13,13,MULTIPOLYGON (((171765.7 31...
5,SUDLEY,Prince William County,Prince William County Sudley,Congressional District 1,727.0,737.0,0.0,44.0,520.0,467.0,MULTIPOLYGON (((174200.2 31...,...,134,2.0,8,49,1,1,13,13,29,MULTIPOLYGON (((174200.2 31...
6,BEN LOMOND,Prince William County,Prince William County Ben Lomond,Congressional District 1,1131.0,1156.0,0.0,45.0,449.0,386.0,MULTIPOLYGON (((171651 3097...,...,479,2.0,7,114,1,1,13,13,29,MULTIPOLYGON (((171651 3097...


In [10]:
# remove precinctID since it's actually not a unique identifier
df_votes <- df_votes %>% subset(select = -precinctID)
head(df_votes)

precinct,G11DHOD,G11RHOD
<chr>,<int>,<int>
1A,194,544
2A,228,653
3A,189,196
3B,89,68
4A,253,711
5A,216,753


In [11]:
# remove all the unnecessary election results cols from df_shp
rem_cols <- names(df_shp) %in% c('G18DHOR','G18DSEN','G18OHOR','G18OSEN','G18RHOR','G18RSEN','G17DGOV','G17DLTG','G17DATG',
              'G17RGOV','G17RLTG','G17RATG','G17OGOV','G16DPRS','G16RPRS','G16OPRS','G16DHOR','G16RHOR',
              'G16OHOR')
df_shp <- df_shp[!rem_cols]
colnames(df_shp)

In [12]:
df_net <- inner_join(df_votes, df_shp, by = "precinct", suffix = c(".v", ".s"))
head(df_net)

precinct,G11DHOD,G11RHOD,locality,loc_prec,district,G17DHOD,G17RHOD,G17OHOD,TOTPOP,...,ASIANVAP,NHPIVAP,OTHERVAP,X2MOREVAP,CD_12,CD_16,HDIST_11,HDIST_REM,SENDIST,geometry
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,...,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<chr>,<MULTIPOLYGON [m]>
1A,194,544,Northumberland County,Northumberland County 1A,Congressional District 1,276.0,564.0,0.0,2519,...,4.0,0,0,25,1,1,99,99,4,MULTIPOLYGON (((261705.9 22...
2A,228,653,Northumberland County,Northumberland County 2A,Congressional District 1,313.0,724.0,0.0,2521,...,3.0,0,1,16,1,1,99,99,4,MULTIPOLYGON (((258909.1 21...
3A,189,196,Northumberland County,Northumberland County 3A,Congressional District 1,337.0,198.0,0.0,1544,...,8.0,0,1,9,1,1,99,99,4,MULTIPOLYGON (((271256.2 20...
3B,89,68,Northumberland County,Northumberland County 3B,Congressional District 1,191.0,58.0,0.0,836,...,2.324068e-08,0,0,9,1,1,99,99,4,MULTIPOLYGON (((271763.6 20...
4A,253,711,Northumberland County,Northumberland County 4A,Congressional District 1,418.0,947.0,0.0,2454,...,6.0,0,0,18,1,1,99,99,4,MULTIPOLYGON (((281272.6 19...
5A,216,753,Northumberland County,Northumberland County 5A,Congressional District 1,375.0,886.0,0.0,2456,...,8.0,2,2,6,1,1,99,99,4,MULTIPOLYGON (((284502.4 20...


In [20]:
st_write(df_net, "./2011_shp/VA_precincts_2011.shp")

Writing layer `VA_precincts_2011' to data source `./2011_shp/VA_precincts_2011.shp' using driver `ESRI Shapefile'
Writing 2906 features with 39 fields and geometry type Multi Polygon.
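
The 2906 features written exceed both the 2,359 rows of vote totals and the 2439 features in the shapefile. One explanation consistent with that, though the output does not confirm it, is that some `precinct` names appear more than once on each side, so the join pairs every copy with every other copy. The sketch below reproduces the effect on invented data and shows a duplicate-key check that could be run before joining.

```r
library(dplyr)

# Invented data: "CENTRAL" appears twice on each side of the join.
votes <- data.frame(
  precinct = c("CENTRAL", "CENTRAL", "DEAN"),
  G11DHOD  = c(10L, 20L, 30L),
  stringsAsFactors = FALSE
)
shapes <- data.frame(
  precinct = c("CENTRAL", "CENTRAL", "DEAN"),
  locality = c("County A", "County B", "County C"),
  stringsAsFactors = FALSE
)

# Keys that occur more than once on either side.
dup_votes  <- votes  %>% count(precinct) %>% filter(n > 1)
dup_shapes <- shapes %>% count(precinct) %>% filter(n > 1)
print(dup_votes)
print(dup_shapes)

# Newer dplyr versions warn about many-to-many matches; silenced for the demo.
joined <- suppressWarnings(inner_join(votes, shapes, by = "precinct"))
print(nrow(joined))
```

Three rows joined to three rows yield five here, because the two CENTRAL rows on each side produce four combinations. Joining on locality as well as precinct name would avoid this, provided both tables carry a comparable locality field.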
