# Voter Enrollment

The purpose of this notebook is to figure out registered voter totals for each City Council district in New York City for the Get to Know Your City Council District news app, available here: https://projects.thecity.nyc/new-york-city-council-district/district-1/
<br>
<br>
For each district, the app shows the number of registered voters, the percentage of voters enrolled in each of the state's major parties, and turnout in the 2021 mayoral primary and general elections. 
<br>
<br>
The data that you'll need:
- November 2023 voter enrollment counts. 

### Files included in 2023 Know Your District:
- Voter enrollments by borough as of February 2023: https://elections.ny.gov/enrollment-county?f%5B0%5D=filter_term%3A36
    - Columns: 'county', 'status', 'dem', 'rep', 'con', 'wor', 'oth', 'blank', 'total', 'pct_dem', 'pct_rep', 'pct_con', 'pct_wor', 'pct_oth', 'pct_blank'
- Total voter enrollments (active + inactive) by election district as of February 2023: https://elections.ny.gov/enrollment-election-district?f%5B0%5D=filter_term%3A126
    - Columns: 'ed', 'cd', 'dem', 'rep', 'con', 'wor', 'oth', 'blank', 'total'
- Total active voter enrollments by election district as of February 2023, in GeoJSON format
    - Columns: 'ed', 'cd', 'dem', 'rep', 'con', 'wor', 'oth', 'blank', 'total', 'geometry'
### Files to include in 2025 Know Your District:
- DONE: Voter enrollments by borough as of February 2025: https://elections.ny.gov/enrollment-county?f%5B0%5D=filter_term%3A596
- DONE: Total voter enrollments (active + inactive) by election district as of February 2025: https://elections.ny.gov/enrollment-election-district?q=/enrollment-election-district%3Fq%3D/enrollment-election-district%3Ff%5B0%5D%3Dfilter_term%3A126&f%5B0%5D=filter_term%3A601
- DONE: Total active voter enrollments by election district as of February 2025, in GeoJSON format


### Voter enrollments by borough as of February 2025:

In [62]:
# import libraries

import pandas as pd
import csv
import json
import geopandas as gpd

In [2]:
## set display settings

pd.options.display.max_rows = 500

In [3]:
## start with borough enrollment, pull in excel file
## this was already cleaned

boro_enrollment = pd.read_excel("../input/voter_enrollment/voters_boro_feb25.xlsx")

In [4]:
## take a peek at the data

boro_enrollment.head()

| | county | status | dem | rep | con | wor | oth | blank | total |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Richmond | Active | 119569 | 101396 | 4111 | 1136 | 7522 | 78998 | 312732 |
| 1 | Richmond | InActive | 7883 | 6006 | 228 | 64 | 575 | 4485 | 19241 |
| 2 | Richmond | Total | 127452 | 107402 | 4339 | 1200 | 8097 | 83483 | 331973 |
| 3 | Bronx | Active | 509008 | 53768 | 3352 | 3375 | 10232 | 141854 | 721589 |
| 4 | Bronx | InActive | 40166 | 3062 | 228 | 255 | 967 | 10075 | 54753 |


In [5]:
## find the share of each borough's voters enrolled in each party
## and store it in a new pct_<party> column (e.g. pct_dem)

for party in ["dem", "rep", "con", "wor", "oth", "blank"]:
    boro_enrollment[f"pct_{party}"] = (boro_enrollment[party]/boro_enrollment["total"])*100

In [6]:
## take a peek

boro_enrollment.head(10)

| | county | status | dem | rep | con | wor | oth | blank | total | pct_dem | pct_rep | pct_con | pct_wor | pct_oth | pct_blank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Richmond | Active | 119569 | 101396 | 4111 | 1136 | 7522 | 78998 | 312732 | 38.233695 | 32.422649 | 1.314544 | 0.36325 | 2.405254 | 25.260607 |
| 1 | Richmond | InActive | 7883 | 6006 | 228 | 64 | 575 | 4485 | 19241 | 40.969804 | 31.214594 | 1.18497 | 0.332623 | 2.98841 | 23.309599 |
| 2 | Richmond | Total | 127452 | 107402 | 4339 | 1200 | 8097 | 83483 | 331973 | 38.392279 | 32.352631 | 1.307034 | 0.361475 | 2.439054 | 25.147527 |
| 3 | Bronx | Active | 509008 | 53768 | 3352 | 3375 | 10232 | 141854 | 721589 | 70.539878 | 7.451333 | 0.46453 | 0.467718 | 1.417982 | 19.658559 |
| 4 | Bronx | InActive | 40166 | 3062 | 228 | 255 | 967 | 10075 | 54753 | 73.358537 | 5.592388 | 0.416416 | 0.465728 | 1.766113 | 18.400818 |
| 5 | Bronx | Total | 549174 | 56830 | 3580 | 3630 | 11199 | 151929 | 776342 | 70.738669 | 7.320227 | 0.461137 | 0.467577 | 1.442534 | 19.569855 |
| 6 | Kings | Active | 1014572 | 140687 | 4722 | 7687 | 20234 | 294585 | 1482487 | 68.43716 | 9.489931 | 0.318519 | 0.518521 | 1.364869 | 19.871001 |
| 7 | Kings | InActive | 83138 | 8867 | 318 | 543 | 2049 | 21342 | 116257 | 71.512253 | 7.627068 | 0.273532 | 0.467069 | 1.762475 | 18.357604 |
| 8 | Kings | Total | 1097710 | 149554 | 5040 | 8230 | 22283 | 315927 | 1598744 | 68.660774 | 9.354468 | 0.315247 | 0.514779 | 1.393782 | 19.76095 |
| 9 | New York | Active | 691185 | 74910 | 2019 | 3080 | 14339 | 200382 | 985915 | 70.105942 | 7.598018 | 0.204784 | 0.3124 | 1.454385 | 20.32447 |
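
As a sanity check on the percentages above: the six party columns should add up to `total`, so each row's `pct_*` values should sum to 100. The sketch below assumes that property holds for this data (it does for the two rows copied in from the output above); `check_shares` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

PARTIES = ["dem", "rep", "con", "wor", "oth", "blank"]

def check_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Add pct_<party> columns and confirm the parties account for every voter."""
    out = df.copy()
    # the party columns should partition the total exactly
    assert (out[PARTIES].sum(axis=1) == out["total"]).all(), "party counts do not add up to total"
    for party in PARTIES:
        out[f"pct_{party}"] = out[party] / out["total"] * 100
    # so the shares should sum to 100, up to floating-point error
    pct_sum = out[[f"pct_{p}" for p in PARTIES]].sum(axis=1)
    assert ((pct_sum - 100).abs() < 1e-9).all(), "shares do not sum to 100"
    return out

# Richmond and Bronx "Active" rows copied from the output above
sample = pd.DataFrame({
    "county": ["Richmond", "Bronx"],
    "dem": [119569, 509008],
    "rep": [101396, 53768],
    "con": [4111, 3352],
    "wor": [1136, 3375],
    "oth": [7522, 10232],
    "blank": [78998, 141854],
    "total": [312732, 721589],
})
shares = check_shares(sample)
print(shares[["county", "pct_dem", "pct_rep"]].round(2))
```

Calling `check_shares(boro_enrollment)` would run the same checks on every borough row; an assertion error would point to a row where the party counts and `total` disagree.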


In [7]:
## save as a csv file in the output folder for this project

boro_enrollment.to_csv("../output/voters/boro_enrollment_25.csv")
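
By default `DataFrame.to_csv` also writes the row index as an unlabeled first column, which comes back as `Unnamed: 0` when the file is read with `pd.read_csv`. Whether the app expects that column isn't stated here, so this is only an option: pass `index=False` when writing, or `index_col=0` when reading. A minimal round trip on a toy frame (the two totals are copied from the table above):

```python
import io
import pandas as pd

df = pd.DataFrame({"county": ["Richmond", "Bronx"], "total": [331973, 776342]})

# default to_csv: the row index is written as an unlabeled first column
buf_default = io.StringIO()
df.to_csv(buf_default)
buf_default.seek(0)
cols_default = pd.read_csv(buf_default).columns.tolist()
print(cols_default)  # ['Unnamed: 0', 'county', 'total']

# index=False: only the data columns are written
buf_no_index = io.StringIO()
df.to_csv(buf_no_index, index=False)
buf_no_index.seek(0)
cols_no_index = pd.read_csv(buf_no_index).columns.tolist()
print(cols_no_index)  # ['county', 'total']
```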

### Total voter enrollments (active + inactive) by election district as of February 2025:

In [8]:
## now work on the election district level data, pull in the excel sheets

richmond = pd.read_excel("../input/voter_enrollment/richmonded_feb25.xlsx")
bronx = pd.read_excel("../input/voter_enrollment/bronxed_feb25.xlsx")
kings = pd.read_excel("../input/voter_enrollment/kingsed_feb25.xlsx")
ny = pd.read_excel("../input/voter_enrollment/new-yorked_feb25.xlsx")
queens = pd.read_excel("../input/voter_enrollment/queensed_feb25.xlsx")

In [9]:
## add those variables to a list, name it boros

boros = [richmond, bronx, kings, ny, queens]

In [10]:
## concat them together

combined_boros = pd.concat(boros, ignore_index = True)

In [11]:
## take a peek

combined_boros.head()

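| | county | ed | status | dem | rep | con | wor | oth | blank | total |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Richmond | 61001 | Active | 971 | 146 | 20 | 10 | 27 | 420 | 1594 |
| 1 | Richmond | 61001 | Inactive | 47 | 10 | 2 | 0 | 3 | 27 | 89 |
| 2 | Richmond | 61001 | Total | 1018 | 156 | 22 | 10 | 30 | 447 | 1683 |
| 3 | Richmond | 61002 | Active | 930 | 116 | 2 | 7 | 29 | 419 | 1503 |
| 4 | Richmond | 61002 | Inactive | 74 | 2 | 0 | 1 | 4 | 27 | 108 |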
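
Each election district appears once per `status`, and the `Active` and `Inactive` rows should add up to the `Total` row (for ED 61001 above: 1,594 + 89 = 1,683). A sketch of that consistency check, run here on the ED 61001 and 61002 rows shown in this notebook; `check_status_rows` is a hypothetical helper, and it assumes the ED-level status labels are exactly `Active`, `Inactive` and `Total`.

```python
import pandas as pd

COUNTS = ["dem", "rep", "con", "wor", "oth", "blank", "total"]

def check_status_rows(df: pd.DataFrame) -> list:
    """Return EDs whose Active + Inactive counts differ from their Total row."""
    # add up the Active and Inactive rows for each ED
    parts = df[df["status"].isin(["Active", "Inactive"])].groupby("ed")[COUNTS].sum()
    # the reported Total row for each ED
    totals = df[df["status"] == "Total"].set_index("ed")[COUNTS]
    # subtraction aligns on ed; an ED missing a status yields NaN, which also counts as a mismatch
    diff = parts.sub(totals)
    mismatched = diff.ne(0).any(axis=1)
    return sorted(mismatched[mismatched].index)

rows = [
    ("61001", "Active",    971, 146, 20, 10, 27, 420, 1594),
    ("61001", "Inactive",   47,  10,  2,  0,  3,  27,   89),
    ("61001", "Total",    1018, 156, 22, 10, 30, 447, 1683),
    ("61002", "Active",    930, 116,  2,  7, 29, 419, 1503),
    ("61002", "Inactive",   74,   2,  0,  1,  4,  27,  108),
    ("61002", "Total",    1004, 118,  2,  8, 33, 446, 1611),
]
sample = pd.DataFrame(rows, columns=["ed", "status"] + COUNTS)
print(check_status_rows(sample))  # []
```

On the full data the call would be `check_status_rows(combined_boros)`; an empty list means every ED's parts match its total.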


In [12]:
## change dtype of the ed column

combined_boros['ed'] = combined_boros['ed'].astype('object')

In [49]:
## read in the crosswalk that assigns each 2025 election district (ed) to a 2023 council district (cd)

ed_cd_crosswalk = gpd.read_file("../input/crosswalks/cd23_ed25_crosswalk.geojson")

In [50]:
## take a peek

ed_cd_crosswalk.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 4345 entries, 0 to 4344
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   ed        4345 non-null   object  
 1   cd        4345 non-null   object  
 2   geometry  4345 non-null   geometry
dtypes: geometry(1), object(2)
memory usage: 102.0+ KB


In [51]:
ed_cd_crosswalk.tail()

| | ed | cd | geometry |
|---|---|---|---|
| 4340 | 53020 | 34 | POLYGON ((1003515.745 193865.686, 1003587.174 ... |
| 4341 | 56045 | 36 | POLYGON ((1001545.493 187375.761, 1001613.711 ... |
| 4342 | 56056 | 36 | POLYGON ((1004060.701 187208.051, 1003249.209 ... |
| 4343 | 56046 | 36 | POLYGON ((1000597.559 188323.768, 1000637.815 ... |
| 4344 | 56047 | 36 | POLYGON ((1002357.519 187498.176, 1001598.965 ... |


In [52]:
## normalize the ed keys on both sides: cast to string, trim whitespace, and remove non-breaking spaces (\xa0)
combined_boros['ed'] = combined_boros['ed'].astype(str).str.strip().str.replace('\xa0', '', regex=True)
ed_cd_crosswalk['ed'] = ed_cd_crosswalk['ed'].astype(str).str.strip().str.replace('\xa0', '', regex=True)
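
Why the normalization matters: `astype('object')` on its own leaves the ED numbers as Python integers, and the integer `61001` never equals the string `'61001'` (or `'61002\xa0'` with a trailing non-breaking space), so a merge on `ed` silently finds no matches. A small illustration on toy frames; `normalize_ed` is a hypothetical helper that performs the same steps as the cell above, using `regex=False` since `'\xa0'` is a literal character.

```python
import pandas as pd

def normalize_ed(s: pd.Series) -> pd.Series:
    """Cast ED keys to text, trim surrounding whitespace and drop non-breaking spaces."""
    return s.astype(str).str.strip().str.replace("\xa0", "", regex=False)

# enrollment side: integer ED numbers held in an object column, as after astype('object')
voters = pd.DataFrame({"ed": pd.Series([61001, 61002], dtype="object"),
                       "total": [1683, 1611]})
# crosswalk side: ED numbers as text, one with a stray trailing non-breaking space
crosswalk = pd.DataFrame({"ed": pd.Series(["61001", "61002\xa0"], dtype="object"),
                          "cd": ["49", "49"]})

before = voters.merge(crosswalk, on="ed", how="inner")  # int 61001 != str "61001"
voters["ed"] = normalize_ed(voters["ed"])
crosswalk["ed"] = normalize_ed(crosswalk["ed"])
after = voters.merge(crosswalk, on="ed", how="inner")
print(len(before), len(after))  # 0 2
```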

In [53]:
## merge the dfs together on the election district column,
## keeping every enrollment row even when it has no crosswalk match

merged = combined_boros.merge(ed_cd_crosswalk[['ed', 'cd', 'geometry']],
                              on = 'ed',
                              how = 'left')

In [54]:
merged.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12168 entries, 0 to 12167
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   county    12168 non-null  object  
 1   ed        12168 non-null  object  
 2   status    12168 non-null  object  
 3   dem       12168 non-null  int64   
 4   rep       12168 non-null  int64   
 5   con       12168 non-null  int64   
 6   wor       12168 non-null  int64   
 7   oth       12168 non-null  int64   
 8   blank     12168 non-null  int64   
 9   total     12168 non-null  int64   
 10  cd        12142 non-null  object  
 11  geometry  12142 non-null  geometry
dtypes: geometry(1), int64(7), object(4)
memory usage: 1.1+ MB
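
The `info()` output shows 12,168 rows but only 12,142 non-null `cd` values, so 26 enrollment rows found no match in the crosswalk. Passing `indicator=True` to `merge` adds a `_merge` column recording whether each key was found on both sides, which makes the unmatched EDs easy to list. A sketch on toy data; ED `99999` is made up for illustration.

```python
import pandas as pd

voters = pd.DataFrame({
    "ed": ["61001", "61002", "99999"],  # 99999 is a made-up ED with no crosswalk entry
    "status": ["Total", "Total", "Total"],
    "total": [1683, 1611, 10],
})
crosswalk = pd.DataFrame({"ed": ["61001", "61002"], "cd": ["49", "49"]})

# indicator=True adds a _merge column: "both", "left_only" or "right_only"
checked = voters.merge(crosswalk, on="ed", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "ed"].unique()
print(list(unmatched))  # ['99999']
```

Applied to the notebook, the same call would be `combined_boros.merge(ed_cd_crosswalk[["ed", "cd"]], on="ed", how="left", indicator=True)`, and the listed EDs could then be checked against the crosswalk by hand.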


In [55]:
## keep only the Total rows (active + inactive voters) for each ED

voters_feb25 = merged[merged["status"] == "Total"]
voters_feb25

| | county | ed | status | dem | rep | con | wor | oth | blank | total | cd | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Richmond | 61001 | Total | 1018 | 156 | 22 | 10 | 30 | 447 | 1683 | 49 | POLYGON ((961044.549 162903.268, 961299.053 16... |
| 5 | Richmond | 61002 | Total | 1004 | 118 | 2 | 8 | 33 | 446 | 1611 | 49 | POLYGON ((962506.514 163551.747, 962540.681 16... |
| 8 | Richmond | 61003 | Total | 1301 | 78 | 7 | 8 | 31 | 352 | 1777 | 49 | POLYGON ((962072.995 164016.225, 962021.691 16... |
| 11 | Richmond | 61004 | Total | 1232 | 102 | 12 | 7 | 29 | 342 | 1724 | 49 | POLYGON ((961047.583 165071.59, 961318.847 165... |
| 14 | Richmond | 61005 | Total | 628 | 262 | 11 | 2 | 32 | 344 | 1279 | 49 | POLYGON ((961202.635 168348.543, 961058.02 168... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12155 | Queens | 40043 | Total | 585 | 229 | 10 | 1 | 19 | 317 | 1161 | 19 | POLYGON ((1037021.517 219311.494, 1037282.045 ... |
| 12158 | Queens | 40044 | Total | 282 | 82 | 4 | 0 | 5 | 164 | 537 | 19 | POLYGON ((1035290.536 219952.178, 1035332.232 ... |
| 12161 | Queens | 40045 | Total | 498 | 205 | 7 | 4 | 26 | 257 | 997 | 19 | POLYGON ((1037685.564 220698.673, 1037649.803 ... |
| 12164 | Queens | 40046 | Total | 234 | 116 | 5 | 2 | 12 | 112 | 481 | 19 | POLYGON ((1039401.04 219576.486, 1039349.901 2... |


In [None]:
## group by city council district
grouped_enrollment = voters_feb25.groupby("cd")[["dem","rep","con","wor","oth","blank"]].sum().reset_index()

In [59]:
## take a peek
grouped_enrollment.head()

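| | cd | dem | rep | con | wor | oth | blank |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 69308 | 9839 | 221 | 275 | 1511 | 26604 |
| 1 | 10 | 88529 | 6761 | 291 | 450 | 1417 | 19651 |
| 2 | 11 | 65211 | 8008 | 408 | 386 | 1608 | 18934 |
| 3 | 12 | 76135 | 4440 | 295 | 414 | 1518 | 16822 |
| 4 | 13 | 57601 | 14970 | 979 | 378 | 1777 | 21791 |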
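
Two details of the grouping above are worth knowing. `groupby` drops rows whose key is missing by default (`dropna=True`), so voters in EDs without a council district fall out of the sums without a warning; and because `cd` holds strings, districts sort as `1, 10, 11, ...` rather than numerically. The cell also sums only the party columns, not `total`. A sketch on toy data that keeps the unmatched voters visible, includes `total`, and sorts districts numerically; it assumes every matched `cd` value is an integer string.

```python
import pandas as pd

COLS = ["dem", "rep", "con", "wor", "oth", "blank", "total"]

voters = pd.DataFrame({
    "cd":    ["1", "10", "2", None],  # None = an ED with no council district match
    "dem":   [10, 20, 30, 5],
    "rep":   [1, 2, 3, 1],
    "con":   [0, 0, 0, 0],
    "wor":   [0, 0, 0, 0],
    "oth":   [0, 0, 0, 0],
    "blank": [4, 3, 2, 1],
    "total": [15, 25, 35, 7],
})

# dropna=False keeps the unmatched rows as their own group instead of discarding them
by_cd = voters.groupby("cd", dropna=False)[COLS].sum().reset_index()
unmatched_total = by_cd.loc[by_cd["cd"].isna(), "total"].sum()

# drop the unmatched group, then order districts 1, 2, 10 rather than "1", "10", "2"
by_cd = by_cd.dropna(subset=["cd"])
by_cd = by_cd.assign(cd=by_cd["cd"].astype(int)).sort_values("cd").reset_index(drop=True)
print(by_cd[["cd", "total"]])
print("voters without a district:", unmatched_total)
```

Reporting `unmatched_total` alongside the per-district file makes it clear how many registered voters the crosswalk could not place.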


In [60]:
## write to a csv
grouped_enrollment.to_csv("../output/voters/voters_by_cd_25.csv")

### Total active voter enrollments by election district as of February 2025, in GeoJSON format:

In [None]:
## now move on to working with only active voter data
active_voters = merged[merged["status"] == "Active"]
active_voters

| | county | ed | status | dem | rep | con | wor | oth | blank | total | cd | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Richmond | 61001 | Active | 971 | 146 | 20 | 10 | 27 | 420 | 1594 | 49 | POLYGON ((961044.549 162903.268, 961299.053 16... |
| 3 | Richmond | 61002 | Active | 930 | 116 | 2 | 7 | 29 | 419 | 1503 | 49 | POLYGON ((962506.514 163551.747, 962540.681 16... |
| 6 | Richmond | 61003 | Active | 1252 | 78 | 5 | 8 | 31 | 343 | 1717 | 49 | POLYGON ((962072.995 164016.225, 962021.691 16... |
| 9 | Richmond | 61004 | Active | 1166 | 97 | 11 | 7 | 28 | 327 | 1636 | 49 | POLYGON ((961047.583 165071.59, 961318.847 165... |
| 12 | Richmond | 61005 | Active | 583 | 243 | 11 | 2 | 32 | 328 | 1199 | 49 | POLYGON ((961202.635 168348.543, 961058.02 168... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12153 | Queens | 40043 | Active | 560 | 217 | 10 | 1 | 16 | 302 | 1106 | 19 | POLYGON ((1037021.517 219311.494, 1037282.045 ... |
| 12156 | Queens | 40044 | Active | 276 | 81 | 4 | 0 | 5 | 156 | 522 | 19 | POLYGON ((1035290.536 219952.178, 1035332.232 ... |
| 12159 | Queens | 40045 | Active | 469 | 199 | 6 | 4 | 25 | 251 | 954 | 19 | POLYGON ((1037685.564 220698.673, 1037649.803 ... |
| 12162 | Queens | 40046 | Active | 219 | 109 | 5 | 2 | 12 | 106 | 453 | 19 | POLYGON ((1039401.04 219576.486, 1039349.901 2... |


In [None]:
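## now save as a GeoDataFrame
## the coordinates are projected (State Plane-like values, not lon/lat), so keep the
## crosswalk's CRS and reproject to WGS84 (EPSG:4326) for the GeoJSON output
## (assumes ed_cd_crosswalk.crs is set; to_crs raises if the geometries have no CRS)
active_gdf = gpd.GeoDataFrame(active_voters,
                              geometry = "geometry",
                              crs = ed_cd_crosswalk.crs).to_crs(4326)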
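
The polygon coordinates printed above (for example `961044.549 162903.268`) are far outside longitude/latitude ranges, which suggests a projected CRS; values like these are typical of New York State Plane Long Island (EPSG:2263, feet), though that is an inference from the numbers, not something the source states. Passing `crs=4326` to `GeoDataFrame` only labels coordinates; it does not convert them, while `to_crs(4326)` actually reprojects. A dependency-free sketch of a check that could be run on `active_gdf.total_bounds` before writing the file; `looks_like_lonlat` is a hypothetical helper.

```python
def looks_like_lonlat(bounds):
    """Return True if (minx, miny, maxx, maxy) fits inside longitude/latitude ranges."""
    minx, miny, maxx, maxy = bounds
    return -180 <= minx <= maxx <= 180 and -90 <= miny <= maxy <= 90

# extreme coordinates taken from the polygons printed above
projected_bounds = (961044.549, 162903.268, 1039401.04, 220698.673)
# rough extent of New York City in degrees (approximate)
nyc_degrees = (-74.26, 40.49, -73.70, 40.92)

print(looks_like_lonlat(projected_bounds))  # False
print(looks_like_lonlat(nyc_degrees))       # True
```

If the check fails on data that is labeled EPSG:4326, the label is wrong, and the fix is to attach the correct source CRS and then call `to_crs(4326)`.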

In [None]:
## write to file
active_gdf.to_file('../output/voters/active_voters_by_district_25.geojson', driver = 'GeoJSON')

### 2023 voters by city council district

In [None]:
## read in voter files
council_richmond = pd.read_excel("../input/voter_enrollment/richmonded_nov23.xlsx", dtype = {"ed": "str"})
council_bronx = pd.read_excel("../input/voter_enrollment/bronxed_nov23.xlsx", dtype = {"ed": "str"})
council_kings = pd.read_excel("../input/voter_enrollment/kingsed_nov23.xlsx", dtype = {"ed": "str"})
council_ny = pd.read_excel("../input/voter_enrollment/newyorked_nov23.xlsx", dtype = {"ed": "str"})
council_queens = pd.read_excel("../input/voter_enrollment/queensed_nov23.xlsx", dtype = {"ed": "str"})

In [77]:
## read in the 2023 election district to council district crosswalk
ed23_cd23_crosswalk = pd.read_csv("../input/crosswalks/ed23-to-cd23-crosswalk.csv", dtype = {'ed':'str'})

In [78]:
## add to a list
nov23_voters = [council_richmond, council_bronx, council_kings, council_ny, council_queens]

In [79]:
## create a single df

all_voters_nov23 = pd.concat(nov23_voters, ignore_index = True)
all_voters_nov23.head()

| | county | ed | status | dem | rep | con | wor | oth | blank | total | NYSVoter Enrollment by Election District, Party Affiliation and Status | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Richmond | 61001 | Active | 947.0 | 128.0 | 17.0 | 9.0 | 30.0 | 396.0 | 1527.0 | | | | | | | | | | |
| 1 | Richmond | 61002 | Active | 879.0 | 87.0 | 2.0 | 6.0 | 33.0 | 351.0 | 1358.0 | | | | | | | | | | |
| 2 | Richmond | 61003 | Active | 1246.0 | 61.0 | 6.0 | 8.0 | 33.0 | 307.0 | 1661.0 | | | | | | | | | | |
| 3 | Richmond | 61004 | Active | 1298.0 | 95.0 | 10.0 | 10.0 | 35.0 | 368.0 | 1816.0 | | | | | | | | | | |
| 4 | Richmond | 61005 | Active | 722.0 | 277.0 | 12.0 | 4.0 | 42.0 | 354.0 | 1411.0 | | | | | | | | | | |
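
The combined November 2023 frame has picked up extra columns, a title column (`NYSVoter Enrollment by Election District, Party Affiliation and Status`) plus `Unnamed: 1` through `Unnamed: 9`, and the party counts have become floats. That pattern is consistent with at least one of the five workbooks still carrying the state's title row above the real header, in which case its rows would have no `ed` and drop out of the merge below. Which file it is, and how many rows to skip, would have to be confirmed against the spreadsheets. A sketch of a per-file check; `unexpected_columns` is a hypothetical helper and the two sheets are toy stand-ins.

```python
import pandas as pd

EXPECTED = ["county", "ed", "status", "dem", "rep", "con", "wor", "oth", "blank", "total"]

def unexpected_columns(frames: dict) -> dict:
    """Map each file label to any columns it has beyond EXPECTED (clean files are omitted)."""
    report = {}
    for name, df in frames.items():
        extra = [c for c in df.columns if c not in EXPECTED]
        if extra:
            report[name] = extra
    return report

# toy stand-ins: one sheet with the expected header, one whose header came from a title row
sheet_a = pd.DataFrame(columns=EXPECTED)
sheet_b = pd.DataFrame(columns=["NYSVoter Enrollment by Election District, Party Affiliation and Status"]
                       + [f"Unnamed: {i}" for i in range(1, 10)])
report = unexpected_columns({"sheet_a": sheet_a, "sheet_b": sheet_b})
print(report)
```

On the notebook's frames this would be `unexpected_columns({"richmond": council_richmond, "bronx": council_bronx, "kings": council_kings, "ny": council_ny, "queens": council_queens})`. A flagged file could then be re-read with `pd.read_excel(..., header=n)`, where the right `n` has to be read off the spreadsheet itself.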


In [80]:
## merge the crosswalk with the ED-level voter data
## (inner join by default, so EDs missing from either side are dropped)

voters_by_council_district = pd.merge(ed23_cd23_crosswalk,
                                      all_voters_nov23,
                                      on = "ed")

In [None]:
voters_by_council_district.loc[:,
                               ["ed",
                                "cd",
                                "status",
                                "total"]]

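| | ed | cd | status | total |
|---|---|---|---|---|
| 0 | 23001 | 32 | Active | 1620.0 |
| 1 | 23002 | 32 | Active | 1627.0 |
| 2 | 23003 | 32 | Active | 541.0 |
| 3 | 23004 | 32 | Active | 1656.0 |
| 4 | 23005 | 32 | Active | 1609.0 |
| ... | ... | ... | ... | ... |
| 3459 | 26037 | 19 | Active | 1193.0 |
| 3460 | 62058 | 50 | Active | 655.0 |
| 3461 | 62059 | 50 | Active | 1373.0 |
| 3462 | 61019 | 49 | Active | 1011.0 |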


In [81]:
voters_by_council_district.to_csv("../output/voters/voter_by_district_23.csv")