# VBM

The data come from the American National Election Studies (ANES) 2020 Time Series Study and can be found [here](https://electionstudies.org/data-center/2020-time-series-study/).

In [23]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import statsmodels.formula.api as sm

In [24]:
data = pd.read_stata('../data_raw/anes_timeseries/anes_timeseries.dta')

### Urban vs. rural

In [25]:
data['V202355'].value_counts()

3. Suburb                                                         2237
4. City                                                           2192
2. Small town                                                     1843
1. Rural area                                                     1115
-6. No post-election interview                                     750
-7. No post-election data, deleted due to incomplete interview      77
-5. Interview breakoff (sufficient partial IW)                      53
-9. Refused                                                         11
-8. Don't know                                                       2
Name: V202355, dtype: int64

Create dummy variables:

In [30]:
# One dummy per substantive answer; every other code (refused, no interview, ...) becomes NaN
area = data['V202355'].astype(str)
area_dummies = {'rural': '1. Rural area', 'town': '2. Small town', 'suburb': '3. Suburb', 'city': '4. City'}
valid = area.isin(list(area_dummies.values()))

for col, label in area_dummies.items():
    data[col] = (area == label).astype(float).where(valid)

In [31]:
data[['rural','town','suburb','city']]

,rural,town,suburb,city
0,0.0,0.0,0.0,1.0
1,0.0,0.0,1.0,0.0
2,0.0,0.0,1.0,0.0
3,0.0,1.0,0.0,0.0
4,0.0,0.0,0.0,1.0
...,...,...,...,...
8275,0.0,1.0,0.0,0.0
8276,0.0,0.0,0.0,1.0
8277,0.0,0.0,1.0,0.0
8278,1.0,0.0,0.0,0.0
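
The same recoding (one 0/1 column per substantive answer, NaN for refusals and missing interviews) can also be sketched with `pd.get_dummies`. The toy series below is made up for illustration and only borrows the labels of `V202355`; `area_names` is a hypothetical name, not part of the ANES data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for data['V202355'] (labels copied from the counts above)
area = pd.Series(['4. City', '3. Suburb', '-9. Refused', '1. Rural area', '2. Small town'])

# Hypothetical mapping from survey label to dummy column name
area_names = {'1. Rural area': 'rural', '2. Small town': 'town',
              '3. Suburb': 'suburb', '4. City': 'city'}

short = area.map(area_names)            # unmapped labels become NaN
dummies = pd.get_dummies(short, dtype=float)
dummies = dummies.reindex(columns=list(area_names.values()), fill_value=0.0)
dummies.loc[short.isna(), :] = np.nan   # non-substantive answers are missing, not 0
print(dummies)
```

Setting the non-substantive rows to NaN after the fact matters because `pd.get_dummies` would otherwise leave them as all-zero rows, which reads as "none of the four areas" rather than "no answer".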


### Time to get to the polling place

In [32]:
data['V202122'].value_counts()

-1. Inapplicable                                                  4100
1. 0-15 minutes                                                   2961
-6. No post-election interview                                     750
2. 16-30 minutes                                                   325
-7. No post-election data, deleted due to incomplete interview      77
3. 31-59 minutes                                                    53
4. 1-2 hours                                                         8
5. More than 2 hours                                                 4
-5. Interview breakoff (sufficient partial IW)                       2
Name: V202122, dtype: int64

Create dummy variables:

In [33]:
# One dummy per travel-time bracket; inapplicable and missing answers become NaN
travel = data['V202122'].astype(str)
dist_dummies = {
    'dist_0-15': '1. 0-15 minutes',
    'dist_16-30': '2. 16-30 minutes',
    'dist_31-59': '3. 31-59 minutes',
    'dist_60-120': '4. 1-2 hours',
    'dist_120+': '5. More than 2 hours',
}
valid = travel.isin(list(dist_dummies.values()))

for col, label in dist_dummies.items():
    data[col] = (travel == label).astype(float).where(valid)

### How they voted

In [42]:
data['V202117'].value_counts()

-1. Inapplicable                                                  7078
-6. No post-election interview                                     750
2. By mail or absentee ballot                                      242
1. In person, at the polling place                                 101
-7. No post-election data, deleted due to incomplete interview      77
3. Do not know the answer                                           28
4. Rather not answer                                                 3
-9. Refused                                                          1
Name: V202117, dtype: int64

In [44]:
# 1 = by mail or absentee, 0 = in person or "do not know"
# '4. Rather not answer' and the negative codes stay NaN
how_voted = data['V202117'].astype(str)
data['vbm'] = np.nan
data.loc[how_voted == '2. By mail or absentee ballot', 'vbm'] = 1
data.loc[how_voted.isin(['1. In person, at the polling place', '3. Do not know the answer']), 'vbm'] = 0


In [46]:
data['vbm'].value_counts()

1.0    242
0.0    129
Name: vbm, dtype: int64

In [47]:
data['V202118'].value_counts()

1. In-person, at the polling place                                4897
2. By mail or absentee ballot                                     1233
-1. Inapplicable                                                  1045
-6. No post-election interview                                     750
3. This is my first time voting                                    274
-7. No post-election data, deleted due to incomplete interview      77
-5. Interview breakoff (sufficient partial IW)                       2
-8. Don't know                                                       1
-9. Refused                                                          1
Name: V202118, dtype: int64
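
Rather than typing counts by hand, a share like the "before" figure can be computed from the counts themselves. This is a minimal sketch: the two counts are copied from the `V202118` output above, and the series name `before` and its index labels are made up for illustration.

```python
import pandas as pd

# Counts copied from the V202118 output above (substantive answers only)
before = pd.Series({'in_person': 4897, 'by_mail': 1233})

# Share of each answer among the two substantive categories
share_before = before / before.sum()
print(f"Before - around {share_before['by_mail']:.1%} voted by mail")
```

On the real data, `value_counts(normalize=True)` on a filtered column would give the same kind of shares without copying numbers.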

In [205]:
# share of mail ballots among the answers coded in `vbm` (242 by mail, 129 other)
print(f'2020   - around {242/(129+242):.1%} voted by mail')
print(f'Before - around {1233/(1233+4897):.1%} voted by mail')

2020   - around 65.2% voted by mail
Before - around 20.1% voted by mail


### Sex

In [52]:
data['V201600'].value_counts()

2. Female      4450
1. Male        3763
-9. Refused      67
Name: V201600, dtype: int64

In [54]:
# 1 = male, 0 = female; refusals stay NaN
data['sex'] = data['V201600'].astype(str).map({'1. Male': 1.0, '2. Female': 0.0})


In [55]:
data['sex'].value_counts()

0.0    4450
1.0    3763
Name: sex, dtype: int64

### Sexual orientation

In [56]:
data['V201601'].value_counts()

1. Heterosexual or straight                       7609
3. Bisexual                                        235
2. Homosexual or gay (or lesbian)                  199
-9. Refused                                        143
4. Something else {SPECIFY}                         92
-5. Interview breakoff (sufficient partial IW)       2
Name: V201601, dtype: int64

In [59]:
# for simplicity I created a dummy variable with 0 = straight, 1 = LGBTQ+
# refusals and interview breakoffs stay NaN
data['sex_orientation'] = data['V201601'].astype(str).map({
    '1. Heterosexual or straight': 0.0,
    '2. Homosexual or gay (or lesbian)': 1.0,
    '3. Bisexual': 1.0,
    '4. Something else {SPECIFY}': 1.0,
})


In [60]:
data['sex_orientation'].value_counts()

0.0    7609
1.0     526
Name: sex_orientation, dtype: int64

### Ethnicity

In [62]:
data['V201549x'].value_counts()

1. White, non-Hispanic                                                    5963
3. Hispanic                                                                762
2. Black, non-Hispanic                                                     726
4. Asian or Native Hawaiian/other Pacific Islander, non-Hispanic alone     284
6. Multiple races, non-Hispanic                                            271
5. Native American/Alaska Native or other race, non-Hispanic alone         172
-9. Refused                                                                 96
-8. Don't know                                                               6
Name: V201549x, dtype: int64

Create dummy variables:

In [63]:
# One dummy per group; multiple races, refusals and don't-knows become NaN
race = data['V201549x'].astype(str)
race_dummies = {
    'white': '1. White, non-Hispanic',
    'hisp': '3. Hispanic',
    'black': '2. Black, non-Hispanic',
    'asian': '4. Asian or Native Hawaiian/other Pacific Islander, non-Hispanic alone',
    'native': '5. Native American/Alaska Native or other race, non-Hispanic alone',
}
valid = race.isin(list(race_dummies.values()))

for col, label in race_dummies.items():
    data[col] = (race == label).astype(float).where(valid)

In [65]:
data[['white','black','hisp','asian','native']]

,white,black,hisp,asian,native
0,0.0,0.0,1.0,0.0,0.0
1,0.0,0.0,0.0,1.0,0.0
2,1.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0
4,0.0,0.0,0.0,0.0,1.0
...,...,...,...,...,...
8275,1.0,0.0,0.0,0.0,0.0
8276,1.0,0.0,0.0,0.0,0.0
8277,0.0,0.0,1.0,0.0,0.0
8278,1.0,0.0,0.0,0.0,0.0


### Education

In [70]:
data['V201510'].value_counts()

6. Bachelor's degree (e.g. BA, AB, BS)                                              2055
3. Some college but no degree                                                       1684
2.  High school graduate - High school diploma or equivalent (e.g. GED)             1336
7. Master's degree (e.g. MA, MS, MEng, MEd, MSW, MBA)                               1185
4. Associate degree in college - occupational/vocational                             615
5. Associate degree in college - academic                                            491
8. Professional school degree (e.g. MD, DDS, DVM, LLB, JD)/Doctoral degree (e.g.     407
1. Less than high school                                                             376
95. Other {SPECIFY}                                                                   97
-9. Refused                                                                           33
-8. Don't know                                                                         1
Name: V201510, dtype:

Use highest level of education as proxy for years of education: \
Less than HS          9 yrs \
HS                   13 yrs \
College, no degree   15 yrs \
Associate            15 yrs \
Bachelor's           17 yrs \
Master's             18 yrs \
Doctoral             22 yrs

In [73]:
# Map the leading answer code to years of education; 95 (other) and negative codes stay NaN
edu_code = data['V201510'].astype(str).str.split('.').str[0]
years = {'1': 9, '2': 13, '3': 15, '4': 15, '5': 15, '6': 17, '7': 18, '8': 22}
data['education'] = edu_code.map(years).astype(float)

In [74]:
data['education'].value_counts()

15.0    2790
17.0    2055
13.0    1336
18.0    1185
22.0     407
9.0      376
Name: education, dtype: int64

### Religion

In [79]:
data['V201435'].value_counts()

1. Protestant                                                2113
2. Roman Catholic                                            1640
11. Something else                                           1555
12. Nothing in particular                                    1475
10. Agnostic                                                  456
9. Atheist                                                    340
5. Jewish                                                     188
3. Orthodox Christian (such as Greek or Russian Orthodox)     154
4. Latter-Day Saints (LDS)                                    113
-9. Refused                                                    81
7. Buddhist                                                    74
6. Muslim                                                      48
8. Hindu                                                       41
-8. Don't know                                                  2
Name: V201435, dtype: int64

In [92]:
# Group answers by their leading code; all other answers become NaN
rel_code = data['V201435'].astype(str).str.split('.').str[0]
rel_groups = {
    'rel_chr': ['1', '2', '3'],   # Protestant, Roman Catholic, Orthodox Christian
    'rel_jew': ['5'],
    'rel_mus': ['6'],
    'rel_ath': ['9', '10'],       # Atheist, Agnostic
}
valid = rel_code.isin([code for codes in rel_groups.values() for code in codes])

for col, codes in rel_groups.items():
    data[col] = rel_code.isin(codes).astype(float).where(valid)

In [101]:
# make sure it worked
sum(data['rel_jew'].dropna())

188.0

### Income

In [122]:
data['V201617x'].value_counts()

1. Under $9,999                                   720
-9. Refused                                       583
10. $50,000-59,999                                546
17. $100,000-109,999                              506
22. $250,000 or more                              449
15. $80,000-89,999                                426
21. $175,000-249,999                              415
20. $150,000-174,999                              404
19. $125,000-149,999                              347
18. $110,000-124,999                              343
6. $30,000-34,999                                 327
11. $60,000-64,999                                325
4. $20,000-24,999                                 324
8. $40,000-44,999                                 321
16. $90,000-99,999                                304
2. $10,000-14,999                                 282
13. $70,000-74,999                                264
5. $25,000-29,999                                 263
7. $35,000-39,999           

To avoid having too many dummy variables, I use the lower bound of each income bracket in the survey as the respondent's income. Exact incomes would be better (especially for the very rich), but this is the closest we can get from the data at hand. The one big problem is the lowest bracket (under USD 9,999), whose label states no lower bound; I code these respondents as USD 0.

In [158]:
income_label = data['V201617x'].astype(str)

# First dollar amount in the label = lower bound of the bracket; labels without '$' stay NaN
lower = income_label.str.extract(r'\$([\d,]+)')[0].str.replace(',', '', regex=False)
data['income'] = pd.to_numeric(lower)

# "Under $9,999" only states an upper bound, so it is coded as 0
data.loc[income_label.str.contains('Under', regex=False), 'income'] = 0


In [159]:
data['income'].value_counts()

0.0         720
50000.0     546
100000.0    506
250000.0    449
80000.0     426
175000.0    415
150000.0    404
125000.0    347
110000.0    343
30000.0     327
60000.0     325
20000.0     324
40000.0     321
90000.0     304
10000.0     282
70000.0     264
25000.0     263
35000.0     243
75000.0     233
45000.0     230
15000.0     211
65000.0     182
Name: income, dtype: int64
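
To make the lower-bound rule concrete, here is a small sketch on a made-up sample of labels in the `V201617x` format. It covers the three special cases discussed above: the open lowest bracket, the open highest bracket, and a non-answer. The names `labels`, `lower` and `income` are local to this example.

```python
import pandas as pd

# Made-up sample of bracket labels in the V201617x format
labels = pd.Series(['1. Under $9,999', '10. $50,000-59,999',
                    '22. $250,000 or more', '-9. Refused'])

# First dollar amount in the label = lower bound of the bracket
lower = labels.str.extract(r'\$([\d,]+)')[0].str.replace(',', '', regex=False)
income = pd.to_numeric(lower)

# The lowest bracket only states its upper bound, so code it as 0
income[labels.str.contains('Under', regex=False)] = 0
print(income.tolist())
```

The refusal has no dollar amount to extract, so it stays missing, while "$250,000 or more" is coded at 250000 and therefore understates the incomes in the top bracket.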

### Candidate voted for

In [163]:
# only looking at the people that actually voted
# 0 = Trump, 1 = Biden (labels starting with '2' and '1'); everything else stays NaN
vote_label = data['V202073'].astype(str)
data['vote'] = np.nan
data.loc[vote_label.str.startswith('1'), 'vote'] = 1
data.loc[vote_label.str.startswith('2'), 'vote'] = 0


In [165]:
data['vote'].value_counts()

1.0    3270
0.0    2463
Name: vote, dtype: int64

However, apparently none of the respondents who indicated how they voted (`vbm`) also reported who they voted for, so a regression using both variables isn't possible. Hence I also took the variable indicating which candidate they prefer.

### Participant ID

In [160]:
data['id'] = data['V200001']

## Create data set with the new variables

In [169]:
vbm = data[['id','vote','sex','sex_orientation','vbm','income','education','white','black','hisp','asian','native','rural','town','suburb','city','dist_0-15','dist_16-30','dist_31-59','dist_60-120','dist_120+','rel_chr','rel_jew','rel_mus','rel_ath']]
vbm.to_csv('../data_clean/vbm.csv')
vbm

,id,vote,sex,sex_orientation,vbm,income,education,white,black,hisp,...,city,dist_0-15,dist_16-30,dist_31-59,dist_60-120,dist_120+,rel_chr,rel_jew,rel_mus,rel_ath
0,200015.0,,1.0,0.0,,175000.0,17.0,0.0,0.0,1.0,...,1.0,,,,,,,,,
1,200022.0,,0.0,0.0,,70000.0,15.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,,,,
2,200039.0,1.0,0.0,0.0,,100000.0,13.0,1.0,0.0,0.0,...,0.0,,,,,,,,,
3,200046.0,1.0,1.0,0.0,,35000.0,15.0,0.0,0.0,0.0,...,0.0,,,,,,1.0,0.0,0.0,0.0
4,200053.0,0.0,1.0,0.0,,250000.0,22.0,0.0,0.0,0.0,...,1.0,1.0,0.0,0.0,0.0,0.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8275,535315.0,,0.0,0.0,,40000.0,15.0,1.0,0.0,0.0,...,0.0,,,,,,,,,
8276,535360.0,0.0,0.0,0.0,,125000.0,17.0,1.0,0.0,0.0,...,1.0,1.0,0.0,0.0,0.0,0.0,,,,
8277,535414.0,1.0,1.0,0.0,,90000.0,13.0,0.0,0.0,1.0,...,0.0,,,,,,1.0,0.0,0.0,0.0
8278,535421.0,,0.0,0.0,0.0,75000.0,15.0,1.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0


## Regression analysis

The analysis is done in the form of a linear probability model (LPM) in a separate R script (`vbm_reg.R`). 
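
For orientation, an LPM is ordinary least squares with a 0/1 outcome, so each coefficient reads as a change in probability. The sketch below is not the contents of `vbm_reg.R`; it fits an LPM with NumPy on simulated data, and the regressors and "true" coefficients are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated regressors (hypothetical): a city dummy and years of education
city = rng.integers(0, 2, n)
education = rng.choice([9, 13, 15, 17, 18, 22], n)

# Simulated binary outcome whose probability is linear in the regressors
p = 0.10 + 0.15 * city + 0.02 * (education - 9)
y = (rng.random(n) < p).astype(float)

# LPM = OLS of the 0/1 outcome on the regressors (with an intercept)
X = np.column_stack([np.ones(n), city, education - 9])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(['const', 'city', 'education'], beta.round(3))))
```

The estimates should land near the simulated values (0.15 for `city`, 0.02 per year of education). One known limitation of an LPM is that fitted values can fall outside the [0, 1] interval.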

## Work schedule

In [40]:
data['V202120g'].value_counts()

0. Not mentioned                                                  6229
-1. Inapplicable                                                  1045
-6. No post-election interview                                     750
1. Mentioned                                                       175
-7. No post-election data, deleted due to incomplete interview      77
-5. Interview breakoff (sufficient partial IW)                       2
-9. Refused                                                          2
Name: V202120g, dtype: int64

In [211]:
s = 175/(6229+175)
print(f'Share of people that encountered problems with their work schedule: {s:.2%}')

Share of people that encountered problems with their work schedule: 2.73%


6404

## Analyzing the people that voted by mail

Unfortunately, none of the respondents who voted by mail indicated who they voted for:

In [176]:
vbm[vbm['vbm'] == 1]['vote'].unique()

array([nan])
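
A more general version of this check counts the rows where both variables are observed. The frame below is made up and only mimics the missingness pattern; on the real data the same two lines would be applied to `vbm`.

```python
import numpy as np
import pandas as pd

# Made-up frame with the same pattern of missingness as vbm
toy = pd.DataFrame({'vbm':  [1.0, 0.0, np.nan, np.nan],
                    'vote': [np.nan, np.nan, 1.0, 0.0]})

# Rows where both variables are observed
both = toy['vbm'].notna() & toy['vote'].notna()
print(both.sum())  # 0 here: no row has both
```

A count of zero means no regression sample exists for a model that needs both columns.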

In [206]:
# RACE

x = vbm[vbm['vbm'] == 1]

tot = len(x) # 242 (ONLY!!!)
s_w = len(x[x['white'] == 1])
s_b = len(x[x['black'] == 1])
s_a = len(x[x['asian'] == 1])
s_h = len(x[x['hisp'] == 1])
s_n = len(x[x['native'] == 1])

print('Share of people voting by mail by race:')
print('White:    ',round(s_w/tot,4))
print('Black:    ',round(s_b/tot,4))
print('Asian:    ',round(s_a/tot,4))
print('Hispanic: ',round(s_h/tot,4))
print('Native:   ',round(s_n/tot,4))
print('Other:    ',round(1-(s_w+s_b+s_h+s_a+s_n)/tot,4))

Share of people voting by mail by race:
White:     0.7066
Black:     0.1116
Asian:     0.0207
Hispanic:  0.0992
Native:    0.0083
Other:     0.0537


In [196]:
# RELIGION

s_chr = len(x[x['rel_chr'] == 1])
s_mus = len(x[x['rel_mus'] == 1])
s_jew = len(x[x['rel_jew'] == 1])
s_ath = len(x[x['rel_ath'] == 1])

tot = s_chr+s_mus+s_jew+s_ath

print('Share of people voting by mail by religion:')
print('Christian: ',round(s_chr/tot,2))
print('Muslim:    ',round(s_mus/tot,2))
print('Jewish:    ',round(s_jew/tot,2))
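print('Atheist:   ',round(s_ath/tot,2))
# tot is the sum of the four groups, so "Other" is 0 by construction
print('Other:     ',round(1-(s_chr+s_mus+s_jew+s_ath)/tot,2))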

Share of people voting by mail by religion:
Christian:  0.78
Muslim:     0.01
Jewish:     0.06
Atheist:    0.14
Other:      0.0


In [207]:
# SEX

tot = len(x)  # all mail voters; do not reuse tot from the religion cell
s_m = len(x[x['sex'] == 1])
s_f = len(x[x['sex'] == 0])

print('Share of people voting by mail by sex:')
print('Men:   ',round(s_m/tot,4))
print('Women: ',round(s_f/tot,4))
print('Other: ',round(1-(s_m+s_f)/tot,4))

Share of people voting by mail by sex:
Men:    0.4545
Women:  0.5289
Other:  0.0165


In [208]:
# AREA

s_r = len(x[x['rural'] == 1])
s_t = len(x[x['town'] == 1])
s_s = len(x[x['suburb'] == 1])
s_c = len(x[x['city'] == 1])

tot = s_r+s_t+s_s+s_c

print('Share of people voting by mail by area:')
print('Rural:  ',round(s_r/tot,4))
print('Town:   ',round(s_t/tot,4))
print('Suburb: ',round(s_s/tot,4))
print('City:   ',round(s_c/tot,4))

Share of people voting by mail by area:
Rural:   0.1464
Town:    0.2678
Suburb:  0.2594
City:    0.3264


In [209]:
# DISTANCE

s_1 = len(x[x['dist_0-15'] == 1])
s_2 = len(x[x['dist_16-30'] == 1])
s_3 = len(x[x['dist_31-59'] == 1])
s_4 = len(x[x['dist_60-120'] == 1])
s_5 = len(x[x['dist_120+'] == 1])

tot = s_1+s_2+s_3+s_4+s_5

print('Share of people voting by mail by distance to closest polling station:')
print('0-15 min:  ',round(s_1/tot,4))
print('16-30 min: ',round(s_2/tot,4))
print('31-59 min: ',round(s_3/tot,4))
print('1-2 hrs:   ',round(s_4/tot,4))
print('2+ hrs:    ',round(s_5/tot,4))

Share of people voting by mail by distance to closest polling station:
0-15 min:   0.9444
16-30 min:  0.0556
31-59 min:  0.0
1-2 hrs:    0.0
2+ hrs:     0.0
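
The area and distance cells above follow the same pattern: count the ones in each dummy, then divide by the sum of those counts. A minimal sketch of that pattern as one function, on made-up data (`dummy_shares` and `toy` are hypothetical names, not part of the notebook):

```python
import numpy as np
import pandas as pd

def dummy_shares(df, cols):
    """Share of rows with each dummy equal to 1, among rows where any of them is 1."""
    counts = df[cols].eq(1).sum()
    return (counts / counts.sum()).round(4)

# Made-up mail voters; NaN marks a missing answer
toy = pd.DataFrame({'rural': [1, 0, 0, 0, np.nan],
                    'town':  [0, 1, 0, 0, np.nan],
                    'city':  [0, 0, 1, 1, np.nan]})
shares = dummy_shares(toy, ['rural', 'town', 'city'])
print(shares)
```

Because the denominator is the sum of the counts, rows with missing answers drop out, as in the area and distance cells. The race and sex cells instead divide by `len(x)`, which is what produces their "Other" row.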
