# 2020 EAC Survey
2020 Election Assistance Commission survey: UOCAVA ballots, email return etc.

See EAC's comprehensive report at https://www.eac.gov/sites/default/files/document_library/files/2020_EAVS_Report_Final_508c.pdf

See column definitions etc at https://www.eac.gov/sites/default/files/Research/2020EAVS.pdf

TODO:
* Look for data on non-UOCAVA non-paper ballot return, e.g. for disabilities, emergencies
* Look at returned vs counted UOCAVA ballots, by method of return
* replace -88.0 with 0 and re-run the analysis
* eliminate UserWarning

E.g. for what do -99, -88 ,-77 etc mean in B11a:

* If the question is not applicable to your state/jurisdiction—for example, if your state does
not have permanent by-mail voters—please enter -88 (negative 88) as the response to
question C2a.
* If the question is applicable to your state but your jurisdiction does not have the data
necessary to answer the question—for example, if your state does have permanent bymail voting but your jurisdiction does not track those data—please enter -99 (negative 99)
as the response to the question.
* *[But what does -77 mean?]*

And for 2nd-to-last row, what is NaN in B12a? Comes from "valid skip" or "data not available" or "Does not apply"? Why don't they clean that out or validate it up-front??

See also 
* https://www.fvap.gov/uploads/FVAP/Reports/FVAP-2020-Report-to-Congress_20210916_FINAL.pdf
* [2019 Annual Report of Washington State Elections](https://www.sos.wa.gov/_assets/elections/research/2019%20elections%20report%20final.pdf)

In [1]:
import pandas as pd
import numpy as np

In [2]:
flag_values = {-99.0, -88.0, -77.0}

In [3]:
def invalid_data(vals):
    """Look for flag_values in the given array. Return True if any array value is flagged as invalid

    >>> invalid_data([0, 1, 2.2])
    False
    >>> invalid_data([-77])
    True
    >>> invalid_data([-3])
    True
    >>> invalid_data([float('nan')])
    True
    """

    return (np.isnan(sum(vals)) or
                any([val < 0 for val in vals]) or
                (set(vals) & flag_values) != set() )

In [4]:
import doctest
doctest.testmod()

TestResults(failed=0, attempted=4)

In [5]:
id_cols = ["FIPSCode", "Jurisdiction_Name", "State_Abbr"]

In [6]:
uocava_cols = ["B9a", "B10a", "B11a", "B12a", "checkB9_12"]

In [7]:
show_cols = id_cols + ["F1a"] + uocava_cols + ["B9_B12Comments"]

## Read in EAC survey data

In [8]:
try:
    df = pd.read_csv("2020_EAVS_for_Public_Release_nolabel_V2.csv")
except OSError:
    df = pd.read_csv("https://www.eac.gov/sites/default/files/EAVS%202020/2020_EAVS_for_Public_Release_nolabel_V2.csv")

Do a quick-and-dirty job of replacing some of the data to make more data validation checks work.

TODO: come back and do this in a more principled and careful way.

In [9]:
df = df.replace(-88.0, 0.0)

In [10]:
df = df.replace(-99.0, 0.0)

In [11]:
df = df.replace(-77.0, 0.0)

In [12]:
df = df.replace(float('nan'), 0.0)

In [13]:
# Total minus sum of return methonds should equal zero
df["checkB9_12"] = (df.B9a - df.B10a - df.B11a - df.B12a)

Random fixes for some bad sums

In [14]:
df.loc[df['Jurisdiction_Name'] == "BERNALILLO COUNTY"][show_cols]

Unnamed: 0,FIPSCode,Jurisdiction_Name,State_Abbr,F1a,B9a,B10a,B11a,B12a,checkB9_12,B9_B12Comments
3065,3500100000,BERNALILLO COUNTY,NM,319199.0,2437.0,264.0,2172.0,0.0,1.0,0.0


In [15]:
df.loc[df['Jurisdiction_Name'] == "BERNALILLO COUNTY", "B12a"] += 1.0

In [16]:
df.loc[df['Jurisdiction_Name'] == "PIMA COUNTY"][show_cols]

Unnamed: 0,FIPSCode,Jurisdiction_Name,State_Abbr,F1a,B9a,B10a,B11a,B12a,checkB9_12,B9_B12Comments
79,401900000,PIMA COUNTY,AZ,526214.0,4472.0,459.0,3930.0,67.0,16.0,B9 TOTALS WILL NOT ADD UP TO B9A DUE TO 16 BAL...


In [17]:
df.iloc[79].B9_B12Comments

'B9 TOTALS WILL NOT ADD UP TO B9A DUE TO 16 BALLOTS RECEIVED AFTER THE DEADLINE CANNOT BE BROKEN DOWN BY B9B OR B9C'

In [18]:
df.loc[df['Jurisdiction_Name'] == "PIMA COUNTY", "B12a"] += 16.0

In [19]:
df.loc[df['FIPSCode'] == 2005900000][show_cols]

Unnamed: 0,FIPSCode,Jurisdiction_Name,State_Abbr,F1a,B9a,B10a,B11a,B12a,checkB9_12,B9_B12Comments
1058,2005900000,FRANKLIN COUNTY,KS,12561.0,1509.0,43.0,1468.0,0.0,-2.0,0.0


In [20]:
# FRANKLIN COUNTY, KS
df.loc[df['FIPSCode'] == 2005900000, "B10a"] += 2

In [21]:
# recalculate after above modifications
df["checkB9_12"] = (df.B9a - df.B10a - df.B11a - df.B12a)

In [22]:
df

Unnamed: 0,FIPSCode,Jurisdiction_Name,State_Full,State_Abbr,A1a,A1b,A1c,A1Comments,A2a,A2b,...,F11d_5,F5_F11Comments,F12a,F12b,F12c,F12d,F12e,F12Comments,F13,checkB9_12
0,100100000,AUTAUGA COUNTY,ALABAMA,AL,43695,41088,2607,0.0,0,0.0,...,0.0,0.0,2.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0
1,100300000,BALDWIN COUNTY,ALABAMA,AL,176668,165925,10743,0.0,0,0.0,...,0.0,0.0,2.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0
2,100500000,BARBOUR COUNTY,ALABAMA,AL,17850,16827,1023,0.0,0,0.0,...,0.0,0.0,2.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0
3,100700000,BIBB COUNTY,ALABAMA,AL,15014,14370,644,0.0,0,0.0,...,0.0,0.0,2.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0
4,100900000,BLOUNT COUNTY,ALABAMA,AL,41927,40432,1495,0.0,0,0.0,...,0.0,0.0,2.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6455,5603700000,SWEETWATER COUNTY,WYOMING,WY,18738,18738,0,0.0,3382,2650.0,...,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0
6456,5603900000,TETON COUNTY,WYOMING,WY,16419,16419,0,0.0,1273,564.0,...,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0
6457,5604100000,UINTA COUNTY,WYOMING,WY,10112,10112,0,0.0,2115,1620.0,...,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0
6458,5604300000,WASHAKIE COUNTY,WYOMING,WY,4311,4311,0,0.0,441,441.0,...,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,26.0


In [23]:
df[show_cols]

Unnamed: 0,FIPSCode,Jurisdiction_Name,State_Abbr,F1a,B9a,B10a,B11a,B12a,checkB9_12,B9_B12Comments
0,100100000,AUTAUGA COUNTY,AL,27813.0,57.0,40.0,17.0,0.0,0.0,0.0
1,100300000,BALDWIN COUNTY,AL,110214.0,311.0,173.0,138.0,0.0,0.0,0.0
2,100500000,BARBOUR COUNTY,AL,10560.0,20.0,14.0,6.0,0.0,0.0,0.0
3,100700000,BIBB COUNTY,AL,9630.0,14.0,10.0,3.0,1.0,0.0,0.0
4,100900000,BLOUNT COUNTY,AL,27665.0,33.0,23.0,7.0,3.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...
6455,5603700000,SWEETWATER COUNTY,WY,16698.0,67.0,37.0,27.0,3.0,0.0,FAX
6456,5603900000,TETON COUNTY,WY,14787.0,200.0,16.0,184.0,0.0,0.0,FAX
6457,5604100000,UINTA COUNTY,WY,9459.0,20.0,17.0,3.0,0.0,0.0,FAX
6458,5604300000,WASHAKIE COUNTY,WY,4032.0,26.0,0.0,0.0,0.0,26.0,0.0


In [24]:
print(list(df.columns))

['FIPSCode', 'Jurisdiction_Name', 'State_Full', 'State_Abbr', 'A1a', 'A1b', 'A1c', 'A1Comments', 'A2a', 'A2b', 'A2c', 'A2Comments', 'A3a', 'A3b', 'A3c', 'A3d', 'A3e', 'A3f', 'A3g', 'A3h_Other', 'A3h', 'A3i_Other', 'A3i', 'A3j_Other', 'A3j', 'A3Comments', 'A4a', 'A4b', 'A4c', 'A4d', 'A4e', 'A4f', 'A4g', 'A4h', 'A4i', 'A4j_Other', 'A4j', 'A4k_Other', 'A4k', 'A4l_Other', 'A4l', 'A5a', 'A5b', 'A5c', 'A5d', 'A5e', 'A5f', 'A5g', 'A5h', 'A5i', 'A5j_Other', 'A5j', 'A5k_Other', 'A5k', 'A5l_Other', 'A5l', 'A6a', 'A6b', 'A6c', 'A6d', 'A6e', 'A6f', 'A6g', 'A6h', 'A6i', 'A6j_Other', 'A6j', 'A6k_Other', 'A6k', 'A6l_Other', 'A6l', 'A7a', 'A7b', 'A7c', 'A7d', 'A7e', 'A7f', 'A7g', 'A7h', 'A7i', 'A7j_Other', 'A7j', 'A7k_Other', 'A7k', 'A7l_Other', 'A7l', 'A4_A7Comments', 'A8a', 'A8b', 'A8c', 'A8d', 'A8e', 'A8f_Other', 'A8f', 'A8g_Other', 'A8g', 'A8h_Other', 'A8h', 'A8Comments', 'A9a', 'A9b', 'A9c', 'A9d', 'A9e', 'A9f', 'A9g', 'A9h_Other', 'A9h', 'A9i_Other', 'A9i', 'A9j_Other', 'A9j', 'A9Comments', 'B1a

# Clean data

Validate some of the values were're interested in.

Question B9a: total number of UOCAVA ballots returned

Question B10a: total number of UOCAVA ballots returned via postal service

Question B11a: total number of UOCAVA ballots returned via email

Question B12a: total number of UOCAVA ballots returned via other methods

For B9 thru B12, "a" is used for totals, "b" for uniformed, and "c" for civilian

p.17 of [2020 Election Administration Policy Survey Instrument] (https://www.eac.gov/sites/default/files/Research/2020EAVS.pdf):

For questions B10–B12, divide the total number of UOCAVA
absentee ballots received (as reported in B9) into the following categories of types of voters and
modes of transmission. The amounts should sum to the total provided in B9.
```
Thus:
  The sum of B11b–c should equal B11a
  The sum of B10a, B11a, and B12a should equal B9a
  The sum of B10b, B11b, and B12b should equal B9b
  The sum of B10c, B11c, and B12c should equal B9c
  B16a cannot exceed B11a [rejections?] etc.
```

Note also Question F1a: total number of cast and counted ballots

## Look thru rows of data for invalid data

add *flagged* column to identify rows with any individual "NA" values: not applicable or available or invalid

After rewriting -99 and -88 values, there were still 629 flagged rows here.

Most of those were resolved by rewriting -77.0, whatever that is supposed to mean.

The remainder were resolved by rewriting NaN with 0, assuming most of those are comments.

In [25]:
df["flagged"] = df[["B9a", "B10a", "B11a", "B12a"]].apply(invalid_data, axis=1, raw=True)

In [26]:
df[df.flagged][show_cols]

Unnamed: 0,FIPSCode,Jurisdiction_Name,State_Abbr,F1a,B9a,B10a,B11a,B12a,checkB9_12,B9_B12Comments


## Find rows which are not flagged, but do fail a multi-column validation check.

Total number of returned UOCAVA ballot should equal sum of return methods

In [27]:
badtot = ~df.flagged & ((df.B9a - df.B10a - df.B11a - df.B12a) != 0.0)

Top counties with email or other returns we're still missing in most of the analysis

In [28]:
df[badtot][show_cols].sort_values("B11a", ascending=False).head(10)

Unnamed: 0,FIPSCode,Jurisdiction_Name,State_Abbr,F1a,B9a,B10a,B11a,B12a,checkB9_12,B9_B12Comments
1058,2005900000,FRANKLIN COUNTY,KS,12561.0,1509.0,45.0,1468.0,0.0,-4.0,0.0
1101,2014500000,PAWNEE COUNTY,KS,1460.0,305.0,30.0,273.0,1.0,1.0,0.0
1043,2002900000,CLOUD COUNTY,KS,4279.0,159.0,37.0,117.0,0.0,5.0,0.0
100,503300000,CRAWFORD COUNTY,AR,23920.0,49.0,49.0,49.0,49.0,-98.0,0.0
1042,2002700000,CLAY COUNTY,KS,4221.0,25.0,3.0,24.0,0.0,-2.0,0.0
1056,2005500000,FINNEY COUNTY,KS,11893.0,21.0,2.0,18.0,0.0,1.0,0.0
1088,2011900000,MEADE COUNTY,KS,1847.0,0.0,2.0,12.0,0.0,-14.0,0.0
4103,4903900000,SANPETE COUNTY,UT,13072.0,25.0,9.0,7.0,8.0,1.0,0.0
1038,2001900000,CHAUTAUQUA COUNTY,KS,1653.0,23.0,8.0,5.0,9.0,1.0,0.0
1045,2003300000,COMANCHE COUNTY,KS,932.0,0.0,0.0,3.0,0.0,-3.0,0.0


In [29]:
df[badtot][show_cols].sort_values("B12a", ascending=False).head(10)

Unnamed: 0,FIPSCode,Jurisdiction_Name,State_Abbr,F1a,B9a,B10a,B11a,B12a,checkB9_12,B9_B12Comments
1829,2403100000,MONTGOMERY COUNTY,MD,536455.0,7470.0,5187.0,0.0,2293.0,-10.0,RE B11A-B11C: MD STATE LAW DOES NOT PERMIT UOC...
199,608100000,SAN MATEO COUNTY,CA,380203.0,3425.0,1616.0,0.0,1808.0,1.0,DIFFERENT DATA SETS USED FOR UOCAVA DATA CAUSI...
1816,2400300000,ANNE ARUNDEL COUNTY,MD,311658.0,3134.0,2357.0,0.0,786.0,-9.0,RE B11A-B11C: MD STATE LAW DOES NOT PERMIT UOC...
468,1203100000,DUVAL COUNTY,FL,495840.0,7870.0,7870.0,0.0,727.0,-727.0,0.0
1838,2451000000,BALTIMORE CITY,MD,241424.0,1473.0,868.0,0.0,619.0,-14.0,RE B11A-B11C: MD STATE LAW DOES NOT PERMIT UOC...
1827,2402700000,HOWARD COUNTY,MD,184340.0,1355.0,1006.0,0.0,351.0,-2.0,RE B11A-B11C: MD STATE LAW DOES NOT PERMIT UOC...
1817,2400500000,BALTIMORE COUNTY,MD,418366.0,1820.0,1554.0,0.0,268.0,-2.0,RE B11A-B11C: MD STATE LAW DOES NOT PERMIT UOC...
3473,4014300000,TULSA COUNTY,OK,267804.0,1082.0,863.0,0.0,218.0,1.0,0.0
181,604500000,MENDOCINO COUNTY,CA,44272.0,257.0,80.0,0.0,176.0,1.0,0.0
1826,2402500000,HARFORD COUNTY,MD,149619.0,652.0,478.0,0.0,176.0,-2.0,RE B11A-B11C: MD STATE LAW DOES NOT PERMIT UOC...


Total number of UOCAVA email and other ballots not included in rest of analysis

In [30]:
df[badtot][show_cols].sort_values("B11a", ascending=False).B11a.sum()

1979.0

In [31]:
df[badtot][show_cols].sort_values("B11a", ascending=False).B12a.sum()

8398.0

In [32]:
df[badtot][uocava_cols].sum()

B9a           71431.0
B10a          30845.0
B11a           1979.0
B12a           8398.0
checkB9_12    30209.0
dtype: float64

In [33]:
goodtot = ~df.flagged & ((df.B9a - df.B10a - df.B11a - df.B12a) == 0.0)

In [34]:
df[goodtot][show_cols]

Unnamed: 0,FIPSCode,Jurisdiction_Name,State_Abbr,F1a,B9a,B10a,B11a,B12a,checkB9_12,B9_B12Comments
0,100100000,AUTAUGA COUNTY,AL,27813.0,57.0,40.0,17.0,0.0,0.0,0.0
1,100300000,BALDWIN COUNTY,AL,110214.0,311.0,173.0,138.0,0.0,0.0,0.0
2,100500000,BARBOUR COUNTY,AL,10560.0,20.0,14.0,6.0,0.0,0.0,0.0
3,100700000,BIBB COUNTY,AL,9630.0,14.0,10.0,3.0,1.0,0.0,0.0
4,100900000,BLOUNT COUNTY,AL,27665.0,33.0,23.0,7.0,3.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...
6454,5603500000,SUBLETTE COUNTY,WY,5000.0,15.0,12.0,3.0,0.0,0.0,FAX
6455,5603700000,SWEETWATER COUNTY,WY,16698.0,67.0,37.0,27.0,3.0,0.0,FAX
6456,5603900000,TETON COUNTY,WY,14787.0,200.0,16.0,184.0,0.0,0.0,FAX
6457,5604100000,UINTA COUNTY,WY,9459.0,20.0,17.0,3.0,0.0,0.0,FAX


In [35]:
uocava_email_good = df[goodtot].B11a.sum(); uocava_email_good

208937.0

In [36]:
uocava_email_notgood = df[df.B11a>0].B11a.sum(); uocava_email_notgood

210916.0

In [37]:
uocava_email_notgood - uocava_email_good

1979.0

Show rows filterd out of goodtot, sorted by email returns. Oftne because the checkB9_12 column is not zero.

In [38]:
df[~goodtot][df.B11a>0].sort_values("B11a", ascending=False)[show_cols].head(20)

  df[~goodtot][df.B11a>0].sort_values("B11a", ascending=False)[show_cols].head(20)


Unnamed: 0,FIPSCode,Jurisdiction_Name,State_Abbr,F1a,B9a,B10a,B11a,B12a,checkB9_12,B9_B12Comments
1058,2005900000,FRANKLIN COUNTY,KS,12561.0,1509.0,45.0,1468.0,0.0,-4.0,0.0
1101,2014500000,PAWNEE COUNTY,KS,1460.0,305.0,30.0,273.0,1.0,1.0,0.0
1043,2002900000,CLOUD COUNTY,KS,4279.0,159.0,37.0,117.0,0.0,5.0,0.0
100,503300000,CRAWFORD COUNTY,AR,23920.0,49.0,49.0,49.0,49.0,-98.0,0.0
1042,2002700000,CLAY COUNTY,KS,4221.0,25.0,3.0,24.0,0.0,-2.0,0.0
1056,2005500000,FINNEY COUNTY,KS,11893.0,21.0,2.0,18.0,0.0,1.0,0.0
1088,2011900000,MEADE COUNTY,KS,1847.0,0.0,2.0,12.0,0.0,-14.0,0.0
4103,4903900000,SANPETE COUNTY,UT,13072.0,25.0,9.0,7.0,8.0,1.0,0.0
1038,2001900000,CHAUTAUQUA COUNTY,KS,1653.0,23.0,8.0,5.0,9.0,1.0,0.0
1045,2003300000,COMANCHE COUNTY,KS,932.0,0.0,0.0,3.0,0.0,-3.0,0.0


How many UOCAVA email returns are we missing?

In [39]:
df[~goodtot][df.B11a>0].B11a.sum()

  df[~goodtot][df.B11a>0].B11a.sum()


1979.0

# Analyze counties with ballots returned by email

I'm having trouble understanding where EAC got the numbers in their PDF report.

Here are some excerpts from EAC's reported analysis.

[2020 EAC comprehensive report](https://www.eac.gov/sites/default/files/document_library/files/2020_EAVS_Report_Final_508c.pdf)

p. 177: UOCAVA Ballots Transmitted
In 2020, election offices in the 50 states, five U.S. territories, and the District of Columbia reported
transmitting 1,249,601 ballots to UOCAVA voters.19 Figure 3 shows the number of ballots sent out
from election offices or transmitted for each state. The states colored in dark blue represent the
states that distributed the most ballots to UOCAVA voters. The states colored in light blue are the
states that distributed the fewest ballots to UOCAVA voters.

p. 182
States reported 911,614 regular absentee ballots: 73% of those transmitted to voters (through any
mode) were returned and submitted for counting by UOCAVA voters for the 2020 general election.24
This is a 39% increase over 2016, when 655,844 regular absentee ballots were returned by UOCAVA
voters.25 Figure 7 shows the UOCAVA ballot return totals by state in 2020.

p. 198 Table 3 - UOCAVA ballots transmitted and returned by state.
State
UOCAVA
Ballots
Transmitted
UOCAVA
Ballots
Returned
UOCAVA Ballots Counted UOCAVA Ballots Rejected
Total % of Returned Total % of Returned

U.S. Total 1,249,601 911,614 889,837 97.6 19,060 2.1

p. iv: States reported transmitting more than 1.2 million ballots to UOCAVA voters—a population that
includes members of the uniformed services absent from their voting residence, their eligible family
members, and U.S. citizens living overseas who receive special protections under the federal
UOCAVA law. Of those transmitted ballots, more than 900,000 were returned by voters and nearly
890,000 were counted in the election. 

And on p. 183 we see some numbers which should add up to 100%, but don't:

"Overall, 63.8% of absentee ballots returned and submitted for counting by UOCAVA voters were returned to the election office via postal mail, 37.7% were returned by email, and 16.5% were returned through some other mode (e.g., fax or an online system)."

=> 63.8 + 37.7 + 16.5 = 118.0 but should be 100%, as required by p.17 of [2020 Election Administration Policy Survey Instrument](https://www.eac.gov/sites/default/files/Research/2020EAVS.pdf)] 

The percentages of UOCAVA ballots returned by mode overall were calculated as B10a/B9a x 100 for postal mail and B11a/B9a x 100 for email.

In [40]:
def uocava_methods(df):
    "Return and print UOCAVA ballot data by method of return"

    cast = df.F1a.sum()
    uocava_returns = df.B9a.sum()
    postal_returns = df.B10a.sum()
    email_returns = df.B11a.sum()
    other_returns = df.B12a.sum()
    postal_returns_pct = postal_returns * 100 / uocava_returns
    email_returns_pct = email_returns * 100 / uocava_returns
    other_returns_pct = other_returns * 100 / uocava_returns
    print(f'{postal_returns=:.0f}: {postal_returns_pct:.1f}%  '
             f'{email_returns=:.0f}: {email_returns_pct:.1f}%  '
             f'{other_returns=:.0f}: {other_returns_pct:.1f}%  '
             f'Total: {postal_returns+email_returns+other_returns:.1f}:'
             f' {postal_returns_pct+email_returns_pct+other_returns_pct:.0f}% (should be 100)')
    return [postal_returns, email_returns, other_returns, postal_returns_pct, email_returns_pct, other_returns_pct,
            postal_returns_pct+email_returns_pct+other_returns_pct,
            (email_returns + other_returns) * 100 / cast]
                

Overall, out of those counties with good data, we see these percentages of return by postal, email and other methods:

In [41]:
allr = uocava_methods(df[goodtot]); allr

postal_returns=532624: 63.4%  email_returns=208937: 24.9%  other_returns=98622: 11.7%  Total: 840183.0: 100% (should be 100)


[532624.0,
 208937.0,
 98622.0,
 63.39380825367807,
 24.86803470196374,
 11.738157044358193,
 100.0,
 0.2025071328424642]

Note that that we're ignoring a bunch of UOCAVA ballot returns, mostly those where the numbers don't add up properly.

There are 911,614 total UOCAVA returns, but the "goodtot" set has more like 840183 UOCAVA returns.

But it has nearly all the counties which had some email returns.

In [42]:
print(f'Total analyzed here: {sum(allr[0:3])}, vs {df[df.B9a >0].B9a.sum()}')

Total analyzed here: 840183.0, vs 911614.0


In [43]:
# States plus the territories represented here
states = [ 'AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA',
           'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME',
           'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM',
           'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
           'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY',
           'PR', 'AS', 'GU', 'MP', 'VI']

Confirm that we're not missing any


In [44]:
statesset = df["State_Abbr"].unique()

In [45]:
set(statesset).difference(states)

set()

In [46]:
r = {state: uocava_methods(df[goodtot][df.State_Abbr==state]) for state in states}

  r = {state: uocava_methods(df[goodtot][df.State_Abbr==state]) for state in states}
  postal_returns_pct = postal_returns * 100 / uocava_returns
  email_returns_pct = email_returns * 100 / uocava_returns
  other_returns_pct = other_returns * 100 / uocava_returns
  (email_returns + other_returns) * 100 / cast]


postal_returns=0: nan%  email_returns=0: nan%  other_returns=0: nan%  Total: 0.0: nan% (should be 100)
postal_returns=2713: 52.7%  email_returns=2058: 40.0%  other_returns=373: 7.3%  Total: 5144.0: 100% (should be 100)
postal_returns=1700: 84.1%  email_returns=277: 13.7%  other_returns=45: 2.2%  Total: 2022.0: 100% (should be 100)
postal_returns=2984: 16.1%  email_returns=5661: 30.6%  other_returns=9838: 53.2%  Total: 18483.0: 100% (should be 100)
postal_returns=53931: 57.8%  email_returns=0: 0.0%  other_returns=39432: 42.2%  Total: 93363.0: 100% (should be 100)
postal_returns=8866: 29.9%  email_returns=20286: 68.5%  other_returns=479: 1.6%  Total: 29631.0: 100% (should be 100)
postal_returns=0: nan%  email_returns=0: nan%  other_returns=0: nan%  Total: 0.0: nan% (should be 100)
postal_returns=157: 3.1%  email_returns=4833: 96.9%  other_returns=0: 0.0%  Total: 4990.0: 100% (should be 100)
postal_returns=1045: 43.0%  email_returns=1200: 49.4%  other_returns=184: 7.6%  Total: 2429.0: 100

In [47]:
dfr = df.from_records(r).T

Here we analyze state-by-state results, for "good" (validated or adjusted) results only.

Note that there are 17 states with values for the "email" column over 3,000.

The "other" column (2 and 5) is high (over 3000) only in CA, FL, WA and AZ. 

More work needs to be done on cleaning up the data. Here is a possibly outdated sense of the work done so far.

We were missing about 21711 total email returns, often because one of the other columns is labeled invalid.
After blindly zeroing out all -88 and -99 values, that was down to 8259. It was further reduced by zeroing out all -77 values, despite there being no explanation for that code.

There are three counties with "bad" results which have
over a thousand email UOCAVA returns: Pima County AZ (3930) and Franklin County KS (1468) and Bernalillo County	NM (2172). After making small adjustments to those 3 counties to make the the totals match (but not be quite correct), the missing values are down to 689.

But that all presumably introduces other inaccuracies. TODO: figure out a better route.

Also TODO: bring in more "bad" values when we can make good guesses.


In [48]:
dfr.columns='postal,email,other,postal_pct,email_pct,other_pct,tot_pct,non_paper_pct'.split(',')

In [49]:
dfr.sum()

postal           532624.000000
email            208937.000000
other             98622.000000
postal_pct         2919.709706
email_pct          1800.509870
other_pct           379.780424
tot_pct            5100.000000
non_paper_pct        11.619715
dtype: float64

This should match the postal total above.

In [50]:
df[goodtot].B10a.sum()

532624.0

# Table of states sorted by percentage of UOCAVA email or other (non-paper) returns, out of all ballots cast.

As noted above, this excludes a few thousand , but only about 1979 UOCAVA email returns, and 8398 UOCAVA "other" returns.

In [51]:
dfr.sort_values('non_paper_pct', ascending=False)

Unnamed: 0,postal,email,other,postal_pct,email_pct,other_pct,tot_pct,non_paper_pct
DC,157.0,4833.0,0.0,3.146293,96.853707,0.0,100.0,1.394841
WA,27904.0,19861.0,16867.0,43.17366,30.72936,26.09698,100.0,0.892311
MA,1486.0,23363.0,41.0,5.970269,93.865006,0.164725,100.0,0.639802
CO,8866.0,20286.0,479.0,29.921366,68.462084,1.61655,100.0,0.625337
NM,980.0,5310.0,2.0,15.575334,84.39288,0.031786,100.0,0.572272
ME,1124.0,4577.0,0.0,19.715839,80.284161,0.0,100.0,0.556451
VA,9007.0,24032.0,6.0,27.256771,72.725072,0.018157,100.0,0.535685
HI,629.0,2867.0,0.0,17.991991,82.008009,0.0,100.0,0.525075
NV,844.0,5126.0,1288.0,11.628548,70.625517,17.745936,100.0,0.455617
AZ,2984.0,5661.0,9838.0,16.144565,30.628145,53.22729,100.0,0.453123


States sorted by email returns

In [52]:
dfr.sort_values('email', ascending=False)

Unnamed: 0,postal,email,other,postal_pct,email_pct,other_pct,tot_pct,non_paper_pct
VA,9007.0,24032.0,6.0,27.256771,72.725072,0.018157,100.0,0.535685
MA,1486.0,23363.0,41.0,5.970269,93.865006,0.164725,100.0,0.639802
NC,5273.0,21478.0,51.0,19.673905,80.135811,0.190284,100.0,0.388371
CO,8866.0,20286.0,479.0,29.921366,68.462084,1.61655,100.0,0.625337
WA,27904.0,19861.0,16867.0,43.17366,30.72936,26.09698,100.0,0.892311
MI,4837.0,17375.0,280.0,21.505424,77.249689,1.244887,100.0,0.316437
NJ,2277.0,9212.0,243.0,19.408456,78.520286,2.071258,100.0,0.210361
OR,7042.0,7675.0,2034.0,42.039281,45.81816,12.142559,100.0,0.405196
SC,4711.0,6817.0,1435.0,36.341896,52.588135,11.069968,100.0,0.32696
IN,2227.0,6583.0,4.0,25.266621,74.687996,0.045382,100.0,0.212259


States sorted by other non-paper returns

In [53]:
dfr.sort_values('other', ascending=False)

Unnamed: 0,postal,email,other,postal_pct,email_pct,other_pct,tot_pct,non_paper_pct
CA,53931.0,0.0,39432.0,57.764853,0.0,42.235147,100.0,0.229107
FL,89769.0,46.0,17610.0,83.564347,0.042821,16.392832,100.0,0.167763
WA,27904.0,19861.0,16867.0,43.17366,30.72936,26.09698,100.0,0.892311
AZ,2984.0,5661.0,9838.0,16.144565,30.628145,53.22729,100.0,0.453123
MO,4804.0,3027.0,2990.0,44.395158,27.973385,27.631457,100.0,0.187946
OR,7042.0,7675.0,2034.0,42.039281,45.81816,12.142559,100.0,0.405196
UT,1377.0,2615.0,1643.0,24.436557,46.406389,29.157054,100.0,0.290389
SC,4711.0,6817.0,1435.0,36.341896,52.588135,11.069968,100.0,0.32696
MD,3539.0,0.0,1402.0,71.625177,0.0,28.374823,100.0,0.133073
NV,844.0,5126.0,1288.0,11.628548,70.625517,17.745936,100.0,0.455617
