## Notes

* Should examine some of the LEAST agricultural districts, in contrast to the ~60 MOST agricultural ones (which I already have).
* To install Geopandas (finally...), you probably need to create a new conda environment and install geopandas there.
    * I used the following commands in an Anaconda Prompt running as admin, on Windows:
        * "conda install -n base nb_conda_kernels"
        * "conda create -n geopandas_env geopandas ipykernel"
            * "geopandas_env" (in the line above) is just what I named my new environment; you can name it whatever you want
        * Finally, switch the kernel in your Jupyter notebook

# Summary:


This work follows Brückner and Ciccone (2011), who found that the negative exogenous economic shocks that followed droughts in sub-Saharan Africa were strong predictors of (and contributors to) democratic revolutions.

I attempt to establish a relationship between droughts (again serving as negative exogenous economic shocks) and political change, which might take any of the following forms:
* Increased votes for a specific pro-farmer party (will have to investigate whether Democrats or Republicans are viewed as pro-farmer)
* Increased probability of an incumbent losing
* Increased votes for the Democratic Party, which generally favors greater redistribution (see: Acemoglu & Robinson)

  
  

The equation of interest takes the form \begin{equation} Y_{i,t} = \alpha \, Drought_{i,t} + \beta' X_{i,t} + \phi_i + \psi_t + u_{i,t} \end{equation}

NOTE: will need to include (and interact) a dummy term for whether or not that year is a presidential election year.

where 
* $Y_{i,t} = $ our outcome of interest for district *i* at time *t*: either vote share for a particular party, probability of an incumbent losing, etc.
* $\alpha = $ the effect of drought on the outcome of interest
* $X_{i,t} = $ characteristics of the district at time *t*, such as income, farming as a share of GDP, average farm size, etc., with coefficient vector $\beta$
* $\phi_i = $ district fixed effects
* $\psi_t = $ time fixed effects
* $u_{i,t} = $ the error term
* state fixed effects?

include willingness to elect other party? years since change of power? avg years for incumbent to stay in power?
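
A minimal sketch of how the specification above could be estimated, assuming a balanced district-by-year panel and using the within (two-way demeaning) transformation instead of explicit dummies. The data are simulated, and every name here (`simulate_panel`, `two_way_demean`, the true coefficient of -0.05) is hypothetical rather than taken from the project's datasets.

```python
import numpy as np
import pandas as pd

def simulate_panel(n_districts=40, n_years=10, alpha=-0.05, seed=0):
    """Simulate a balanced panel with district and year fixed effects (all values made up)."""
    rng = np.random.default_rng(seed)
    idx = pd.MultiIndex.from_product(
        [range(n_districts), range(n_years)], names=["district", "year"]
    )
    df = pd.DataFrame(index=idx).reset_index()
    district_fe = rng.normal(0, 0.1, n_districts)[df["district"].to_numpy()]
    year_fe = rng.normal(0, 0.1, n_years)[df["year"].to_numpy()]
    df["drought"] = rng.uniform(0, 1, len(df))
    noise = rng.normal(0, 0.01, len(df))
    df["y"] = alpha * df["drought"] + district_fe + year_fe + noise
    return df

def two_way_demean(df, col):
    """Subtract district and year means, add back the grand mean (exact for a balanced panel)."""
    return (
        df[col]
        - df.groupby("district")[col].transform("mean")
        - df.groupby("year")[col].transform("mean")
        + df[col].mean()
    )

panel = simulate_panel()
y_tilde = two_way_demean(panel, "y")
d_tilde = two_way_demean(panel, "drought")

# OLS slope on the demeaned data is the fixed-effects estimate of alpha
alpha_hat = float((d_tilde * y_tilde).sum() / (d_tilde ** 2).sum())
print(f"estimated alpha: {alpha_hat:.3f}")
```

With real data the panel will likely be unbalanced and include controls, so a regression routine with explicit district and year dummies would probably be the safer choice.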

# Non-Geo Data Import and Cleaning

## Import and Setup

In [3]:
import pandas as pd
import numpy as np

In [4]:
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_colwidth', 199)
pd.options.display.float_format = '{:.2f}'.format

from IPython.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))

## Create "BASE_YEAR" variable

This is the first year where we have data available for all datasets in the analysis (with a few years for margin of error)

In [25]:
BASE_YEAR = 1998

# currently strictest year limitation comes from the drought dataset 

## Market Value by District Data

Note: Due to data/time limitations, I'm assuming that congressional districts that are currently highly agricultural were also highly agricultural over the period I examine. This is one place to expand my analysis when I have more time.

In [5]:
# from 
# https://www.nass.usda.gov/Publications/AgCensus/2012/Online_Resources/Congressional_District_Rankings/ , and
# https://www.nass.usda.gov/Publications/AgCensus/2017/Online_Resources/Congressional_District_Rankings/
farm_census_2012 = pd.read_csv(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Mkt Value of Agr Products Sold\2012.csv")
farm_census_2017 = pd.read_csv(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Mkt Value of Agr Products Sold\2017.csv")

farm_census_2012['year'] = 2012
farm_census_2017['year'] = 2017

# merge 2012 and 2017
farm_census = pd.concat([farm_census_2012, farm_census_2017])

del farm_census_2012, farm_census_2017

### Some Text Cleaning

In [6]:
# upper-case
farm_census['state'] = farm_census['state'].str.upper()

# remove non-alphabet characters (digits and slashes)
farm_census['state'] = farm_census['state'].str.replace(r'\d+', '', regex=True)
farm_census['state'] = farm_census['state'].str.replace('/', '', regex=False)


In [7]:
# narrow down districts we can focus the analysis on
usable_districts = {state:set() for state in farm_census['state']}

In [8]:
for state_dist in list(zip(farm_census['state'], farm_census['district'])):
    if state_dist[1] != 'At Large':
        usable_districts[state_dist[0]].add(int(state_dist[1]))
    else:
        usable_districts[state_dist[0]].add(state_dist[1])

## Election Data (House of Rep's)

Include Senate data (same source)? Only catch is it isn't as specific to the agricultural districts. Could still be useful as a baseline.

**TO DO**
* Denote who won each election
    * Create "years in seat" variable
* Figure out whether to use pre-1998 data for incumbency
* Figure out "At Large" stuff
    * Note: "At Large" districts are denoted with zeros
* There are "NaNs" in the data (under candidate name)

In [9]:
# from https://electionlab.mit.edu/data 
elec_df = pd.read_csv(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\1976-2018 US House Election Data.csv", engine='python')

### Drop Unnecessary Data Features and Observations

In [10]:
# #drop if before 1998
# elec_df = elec_df[elec_df['year'] >= 1998]

#drop useless columns (one unique value and/or irrelevant info)
elec_df.drop(['office', 'mode', 'version'], axis=1, inplace=True)

#drop if not general election (other possibilities are primary or blank). This only drops ~90/30000 obs
elec_df = elec_df[elec_df['stage']=='gen']

# drop runoffs - only 8 obs. Note that ~9k / 30k obs are "NA" under "runoff"... data description doesn't say what this means
# compare as upper-case strings so this works whether pandas read the column as booleans or as text
elec_df = elec_df[elec_df['runoff'].astype(str).str.upper() != 'TRUE']

### Denote presidential election years

In [11]:
elec_df['DPres'] = elec_df['year'] % 4 == 0

### Denote election observations corresponding to *agricultural* districts

In [12]:
# DENOTE WHETHER OR NOT IS A BIG AGRICULTURAL DISTRICT (that we have economic/agricultural data for)

# this loop just appends 0 or 1 to the vector "usable_obs" if we have economic/agricultural data for that district
usable_obs = []

for i, row in elec_df.iterrows():
    if row['state'] in usable_districts and row['district'] in usable_districts[row['state']]:
        usable_obs.append(1)
    else:
        usable_obs.append(0)

# here we bring this vector into the df        
elec_df['usable'] = usable_obs

del usable_obs


# COULD drop all observations that don't correspond to "usable" districts. but probably best to keep them in the df for now
# This would leave us with 2974 election observations, before narrowing it down to winners
# elec_df = elec_df[elec_df['usable']==1]
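
The flagging loop above can also be written without `iterrows`. This is a sketch on toy data, assuming that "At Large" in the farm census corresponds to district 0 in the election data; that mapping and the toy frames `farm_toy` and `elec_toy` are assumptions, not part of the real datasets.

```python
import pandas as pd

# Toy stand-ins for farm_census and elec_df (values are made up)
farm_toy = pd.DataFrame({"state": ["IOWA", "IOWA", "MONTANA"],
                         "district": ["4", "1", "At Large"]})
elec_toy = pd.DataFrame({"state": ["IOWA", "IOWA", "MONTANA", "OHIO"],
                         "district": [4, 2, 0, 4]})

# Assumption: "At Large" is coded as district 0 in the election data
farm_keys = farm_toy.assign(
    district=farm_toy["district"].replace("At Large", "0").astype(int)
)
usable_pairs = pd.MultiIndex.from_frame(
    farm_keys[["state", "district"]].drop_duplicates()
)

# Flag election rows whose (state, district) pair appears in the farm census
elec_keys = pd.MultiIndex.from_frame(elec_toy[["state", "district"]])
elec_toy["usable"] = elec_keys.isin(usable_pairs).astype(int)
print(elec_toy)
```

Matching on (state, district) pairs keeps the integer and "At Large" cases in one code path, which may help with the "At Large" to-do item.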

### Who won, and by how much? Incumbent? Year on year change?

end goal - create elec_df variable for time between drought and election or something like that

In [13]:
#voting dict is structured as voting_dict[state][year][district][candidate][votes]

def div_by_zero(x,y):
    if y!=0:
        return x/y
    else:
        return 0

voting_dict = {}
for state in set(elec_df['state']):
    voting_dict[state] = {}
    for year in set(elec_df['year']):
        voting_dict[state][year] = {}



#do NOT combine following two loops. Otherwise, dict entries are overwritten for every new line that's iterated over
for i, row in elec_df.iterrows():
    voting_dict[ row['state'] ][ row['year'] ][ row['district'] ] = {}

for i, row in elec_df.iterrows():
    voting_dict[ row['state'] ][ row['year'] ][ row['district'] ] [row['candidate']] = {'candidatevotes':row['candidatevotes'], 'totalvotes':row['totalvotes'], 'voteshare':div_by_zero(row['candidatevotes'],row['totalvotes'])}

    
    
#find winner of each election
for state in voting_dict:
    for year in voting_dict[state]:
        for district in voting_dict[state][year]:
            for candidate in voting_dict[state][year][district]:
                if voting_dict[state][year][district][candidate]['voteshare'] == max([voting_dict[state][year][district][can]['voteshare'] for can in voting_dict[state][year][district]]):
                    voting_dict[state][year][district][candidate]['winner'] = 1
                else:
                    voting_dict[state][year][district][candidate]['winner'] = 0
                    
                    
# find change in voting share from last election. also create "incumbent" field
for state in voting_dict:
    for year in voting_dict[state]:
        for district in voting_dict[state][year]:
            for candidate in voting_dict[state][year][district]:
                try: #here I subtract voteshare in the last election (2 years prior) from voteshare this year. if this is their first election, var set to nan
                    voting_dict[state][year][district][candidate]['votesharediff'] = voting_dict[state][year][district][candidate]['voteshare'] - voting_dict[state][year - 2][district][candidate]['voteshare']
                    # NOTE: this flags anyone who ran in the same district 2 years prior, whether or not they won
                    voting_dict[state][year][district][candidate]['incumbent'] = 1
                except KeyError: # no entry for this candidate in this district 2 years prior
                    voting_dict[state][year][district][candidate]['votesharediff'] = np.nan
                    voting_dict[state][year][district][candidate]['incumbent'] = 0
                    
# find total election turnout (sum of votes over all candidates in the race)
for state in voting_dict:
    for year in voting_dict[state]:
        for district in voting_dict[state][year]:
            for candidate in voting_dict[state][year][district]:
                voting_dict[state][year][district][candidate]['total turnout'] = sum([voting_dict[state][year][district][can]['candidatevotes'] for can in voting_dict[state][year][district]])



### Bring "voting_dict" info into elec_df

In [14]:
winner = []
incumbent = []
share_diff = []
voteshare = []
totalturnout = []

for i, row in elec_df.iterrows():
    winner.append(voting_dict[row['state']][row['year']][row['district']][row['candidate']]['winner'])
    incumbent.append(voting_dict[row['state']][row['year']][row['district']][row['candidate']]['incumbent'])
    share_diff.append(voting_dict[row['state']][row['year']][row['district']][row['candidate']]['votesharediff'])
    voteshare.append(voting_dict[row['state']][row['year']][row['district']][row['candidate']]['voteshare'])
    totalturnout.append(voting_dict[row['state']][row['year']][row['district']][row['candidate']]['totalvotes'])

    
elec_df['winner'] = winner
elec_df['incumbent'] = incumbent
elec_df['share_diff'] = share_diff
elec_df['voteshare'] = voteshare
elec_df['totalturnout'] = totalturnout
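
The same fields (vote share, winner, change in share) can be sketched with `groupby` and a self-merge instead of nested dictionaries. The toy frame below is made up, and the column names only mirror those used in `elec_df`; treat it as an illustration rather than a drop-in replacement.

```python
import pandas as pd

# Toy election results (made-up numbers); totalvotes is the district-wide total
toy = pd.DataFrame({
    "state": ["IOWA"] * 4,
    "year": [2000, 2000, 2002, 2002],
    "district": [4, 4, 4, 4],
    "candidate": ["A", "B", "A", "C"],
    "candidatevotes": [60, 40, 45, 55],
    "totalvotes": [100, 100, 100, 100],
})

race = ["state", "year", "district"]

# Vote share, set to 0 where totalvotes is 0 (mirrors div_by_zero)
safe_total = toy["totalvotes"].where(toy["totalvotes"] != 0)
toy["voteshare"] = (toy["candidatevotes"] / safe_total).fillna(0)

# Winner = candidate(s) with the highest share in the race
top_share = toy.groupby(race)["voteshare"].transform("max")
toy["winner"] = (toy["voteshare"] == top_share).astype(int)

# Change in share versus the same candidate in the same district two years earlier
prev = toy[["state", "district", "candidate", "year", "voteshare"]].copy()
prev["year"] += 2
prev = prev.rename(columns={"voteshare": "prev_share"})
toy = toy.merge(prev, on=["state", "district", "candidate", "year"], how="left")
toy["share_diff"] = toy["voteshare"] - toy["prev_share"]

print(toy[["year", "candidate", "voteshare", "winner", "share_diff"]])
```

Because this works on the full frame, it would need to run before minor parties are dropped, for the same reason given in the next section.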

### Crucial to drop non-major parties *after* creating "voting_dict"

Otherwise vote share will be incorrectly totalled

In [15]:
#drop if not Democratic, Republican, or Libertarian candidate
elec_df = elec_df[elec_df['party'].isin(['DEMOCRAT','REPUBLICAN', 'LIBERTARIAN'])]

## Drought Data

In [16]:
# from https://droughtmonitor.unl.edu/Data/DataDownload/ComprehensiveStatistics.aspx
# data description here (under "Excel") https://droughtmonitor.unl.edu/Data/Metadata.aspx
drought_df = pd.read_csv(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Drought Data.csv")

### Create "year" var

In [17]:
drought_df['year'] = drought_df['ValidStart'].astype(str).str[:4].astype(int)

### Drop some redundant fields

In [18]:
drought_df.drop(['MapDate', 'FIPS'], axis=1, inplace=True)

### Take yearly averages per county

In [19]:
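# numeric_only=True skips the remaining text/date columns instead of raising an error
drought_df = drought_df.groupby(['State', 'County', 'year']).mean(numeric_only=True)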
drought_df

| State | County | year | None | D0 | D1 | D2 | D3 | D4 | StatisticFormatID |
|---|---|---|---|---|---|---|---|---|---|
| AK | Aleutians East Borough | 2000 | 99.40 | 0.60 | 0.00 | 0.00 | 0.00 | 0.00 | 1 |
| AK | Aleutians East Borough | 2001 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1 |
| AK | Aleutians East Borough | 2002 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1 |
| AK | Aleutians East Borough | 2003 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1 |
| AK | Aleutians East Borough | 2004 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| WY | Weston County | 2017 | 30.26 | 69.74 | 32.44 | 1.41 | 0.00 | 0.00 | 1 |
| WY | Weston County | 2018 | 88.22 | 11.78 | 0.00 | 0.00 | 0.00 | 0.00 | 1 |
| WY | Weston County | 2019 | 99.98 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 1 |
| WY | Weston County | 2020 | 40.38 | 59.62 | 48.60 | 1.99 | 0.00 | 0.00 | 1 |
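
To make the yearly-averaging step concrete, here is a small sketch on made-up weekly records. The `ValidStart` date format and all values are assumptions; only the column names follow the Drought Monitor file used above.

```python
import pandas as pd

# Toy weekly records in the Drought Monitor layout (values are made up)
weekly = pd.DataFrame({
    "State": ["WY", "WY", "WY", "WY"],
    "County": ["Weston County"] * 4,
    "ValidStart": ["2019-01-01", "2019-01-08", "2020-01-07", "2020-01-14"],
    "None": [100.0, 90.0, 50.0, 30.0],
    "D0": [0.0, 10.0, 50.0, 70.0],
})

# Same trick as above: the first four characters of the date are the year
weekly["year"] = weekly["ValidStart"].astype(str).str[:4].astype(int)

# numeric_only=True skips text columns such as ValidStart
yearly = weekly.groupby(["State", "County", "year"]).mean(numeric_only=True)
print(yearly)
```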


# Geo Data Import and Cleaning

In [None]:
import geopandas as gpd
import shapely
from shapely.geometry import Polygon
from descartes import PolygonPatch

## US County Shapefiles

from https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html  
under "County"  
filename "cb_2018_us_county_500k.zip"  

In [51]:
county_shp = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\County Shapefiles (2018)\cb_2018_us_county_500k.shp")

In [71]:
county_shp['STATEFP'] = county_shp['STATEFP'].astype('int64')
county_shp['COUNTYFP'] = county_shp['COUNTYFP'].astype('int64')
county_shp['COUNTYNS'] = county_shp['COUNTYNS'].astype('int64')
county_shp['GEOID'] = county_shp['GEOID'].astype('int64')
county_shp['ALAND'] = county_shp['ALAND'].astype('int64')
county_shp['AWATER'] = county_shp['AWATER'].astype('int64')

#drop DC - it doesn't have any voting congressional rep's. .copy() so later column assignments don't act on a slice
county_shp = county_shp[county_shp['STATEFP'] != 11].copy()

In [72]:
county_shp

Unnamed: 0,STATEFP,COUNTYFP,COUNTYNS,AFFGEOID,GEOID,NAME,LSAD,ALAND,AWATER,geometry
0,21,7,516850,0500000US21007,21007,Ballard,06,639387454,69473325,"POLYGON ((-89.18137 37.04630, -89.17938 37.05301, -89.17572 37.06207, -89.17188 37.06818, -89.16809 37.07422, -89.16703 37.07536, -89.15450 37.08891, -89.15431 37.08900, -89.15129 37.09049, -89.1..."
1,21,17,516855,0500000US21017,21017,Bourbon,06,750439351,4829777,"POLYGON ((-84.44266 38.28324, -84.44114 38.28373, -84.43738 38.28361, -84.43305 38.28030, -84.43015 38.28046, -84.42900 38.27967, -84.42469 38.28112, -84.42159 38.28388, -84.42168 38.28556, -84.4..."
2,21,31,516862,0500000US21031,21031,Butler,06,1103571974,13943044,"POLYGON ((-86.94486 37.07341, -86.94346 37.07484, -86.94291 37.07675, -86.94109 37.07725, -86.94077 37.07851, -86.93844 37.07768, -86.93714 37.07959, -86.93458 37.07855, -86.92776 37.07898, -86.9..."
3,21,65,516879,0500000US21065,21065,Estill,06,655509930,6516335,"POLYGON ((-84.12662 37.64540, -84.12483 37.64613, -84.11904 37.64717, -84.12064 37.64887, -84.11896 37.64888, -84.11787 37.64807, -84.11652 37.64890, -84.11582 37.65082, -84.11300 37.65081, -84.1..."
4,21,69,516881,0500000US21069,21069,Fleming,06,902727151,7182793,"POLYGON ((-83.98428 38.44549, -83.98246 38.45003, -83.98282 38.45182, -83.98139 38.45317, -83.97681 38.45433, -83.97610 38.45484, -83.97310 38.45913, -83.97203 38.46154, -83.97026 38.46382, -83.9..."
...,...,...,...,...,...,...,...,...,...,...
3228,31,73,835858,0500000US31073,31073,Gosper,06,1186616237,11831826,"POLYGON ((-100.09510 40.43866, -100.08937 40.43870, -100.08602 40.43895, -100.07574 40.43850, -100.04412 40.43866, -100.03776 40.43854, -100.02843 40.43849, -100.01885 40.43843, -99.98119 40.4382..."
3229,39,75,1074050,0500000US39075,39075,Holmes,06,1094405866,3695230,"POLYGON ((-82.22066 40.66758, -82.19327 40.66751, -82.16155 40.66799, -82.15637 40.66794, -82.15476 40.66804, -82.14784 40.66809, -82.12620 40.66823, -82.10673 40.66812, -82.10677 40.66791, -82.0..."
3230,48,171,1383871,0500000US48171,48171,Gillespie,06,2740719114,9012764,"POLYGON ((-99.30400 30.49983, -99.28234 30.49967, -99.28158 30.49939, -99.07866 30.49849, -99.02748 30.49822, -99.01117 30.49813, -99.00033 30.49808, -98.99575 30.49839, -98.96423 30.49848, -98.8..."
3231,55,79,1581100,0500000US55079,55079,Milwaukee,06,625440563,2455383635,"POLYGON ((-88.06959 42.86726, -88.06959 42.87288, -88.06956 42.89826, -88.06959 42.92195, -88.06959 42.92381, -88.06950 42.92995, -88.06938 42.94453, -88.06929 42.95237, -88.06926 42.95766, -88.0..."


### Convert county_shp['STATEFP'] to state

In [69]:
state_to_fips = pd.read_csv(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\State to FIPS Conversion.csv")

In [58]:
# create a dictionary mapping FIPS codes to US States

fips_to_state_dict = {}

for i,row in state_to_fips.iterrows():
    fips_to_state_dict[row['FIPS']] = row['State']

In [73]:
county_shp['state'] = county_shp['STATEFP'].apply(lambda x: fips_to_state_dict[x])



## Congressional District Shapefiles

source: http://cdmaps.polisci.ucla.edu/

In [36]:
congress_2015_2017 = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Congressional District Shapefiles\114th - J 2015 - J 2017\districtShapes\districts114.shp")
congress_2013_2014 = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Congressional District Shapefiles\113th - J 2013 to D 2014\districtShapes\districts113.shp")
congress_2011_2013 = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Congressional District Shapefiles\112th - J 2011 - J 2013\districtShapes\districts112.shp")
congress_2009_2010 = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Congressional District Shapefiles\111th - J 2009 - D 2010\districtShapes\districts111.shp")
congress_2007_2009 = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Congressional District Shapefiles\110th - J 2007 - J 2009\districtShapes\districts110.shp")
congress_2005_2006 = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Congressional District Shapefiles\109th - J 2005 - D 2006\districtShapes\districts109.shp")
congress_2003_2005 = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Congressional District Shapefiles\108th -\districts108.shp")
congress_2001_2002 = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Congressional District Shapefiles\107th - J 2001 - N 2002\districtShapes\districts107.shp")
congress_1999_2000 = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Congressional District Shapefiles\106th - J 1999 - D 2000\districtShapes\districts106.shp")
congress_1997_1998 = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Congressional District Shapefiles\105th - J 1997 - D 1998\districtShapes\districts105.shp")

In [39]:
congress_2001_2002.head()

Unnamed: 0,STATENAME,ID,DISTRICT,STARTCONG,ENDCONG,DISTRICTSI,COUNTY,PAGE,LAW,NOTE,BESTDEC,RNOTE,FROMCOUNTY,LASTCHANGE,FINALNOTE,geometry
0,California,6103107026,26,103,107,,,,,,,,F,2016-05-29 16:44:10.857626,"{""From US Census website""}","POLYGON ((-118.50746 34.33478, -118.50661 34.33528, -118.50542 34.33610, -118.50382 34.33670, -118.49345 34.32981, -118.48020 34.32984, -118.46902 34.32990, -118.46098 34.32997, -118.45822 34.330..."
1,California,6103107029,29,103,107,,,,,,,,F,2016-05-29 16:44:10.857626,"{""From US Census website""}","POLYGON ((-118.35401 34.16121, -118.35311 34.16111, -118.35221 34.15911, -118.35181 34.15821, -118.35161 34.15761, -118.34965 34.15312, -118.34821 34.14961, -118.34651 34.14561, -118.34511 34.142..."
2,California,6103107030,30,103,107,,,,,,,,F,2016-05-29 16:44:10.857626,"{""From US Census website""}","POLYGON ((-118.18398 34.14920, -118.18465 34.14599, -118.18458 34.14579, -118.18401 34.14511, -118.18344 34.14466, -118.18241 34.14383, -118.18201 34.14351, -118.18205 34.14342, -118.18209 34.143..."
3,Florida,12105107013,13,105,107,,,,,,,,F,2016-05-29 16:44:10.857626,"{""From US Census website""}","MULTIPOLYGON (((-82.42332 27.78074, -82.42316 27.78092, -82.42258 27.78147, -82.41886 27.77809, -82.41548 27.77634, -82.40935 27.77478, -82.40597 27.77446, -82.40087 27.77439, -82.40081 27.77720,..."
4,Florida,12105107014,14,105,107,,,,,,,,F,2016-05-29 16:44:10.857626,"{""From US Census website""}","MULTIPOLYGON (((-82.04014 27.03202, -81.99692 27.03157, -81.99170 27.03337, -81.99112 27.03357, -81.98613 27.03529, -81.96578 27.03485, -81.95946 27.03485, -81.95900 27.03494, -81.77186 27.03392,..."


## County boundary changes

In [43]:
county_changes_shp = gpd.read_file(r"C:\Users\mikha\OneDrive\Desktop\Dropbox\MIKHAEL NEW\mikhael school\Grad School\Master's\Term 2 Classes\544\Replication\My Addition\Historical County Shapefiles\US_AtlasHCB_Counties_Gen001\US_HistCounties_Gen001_Shapefile\US_HistCounties_Gen001.shp")

In [45]:
county_changes_shp['year'] = county_changes_shp['START_DATE'].astype(str).str[:4].astype(int)

### Drop all obs for years we don't have data (pre-BASE_YEAR) 

In [47]:
county_changes_shp = county_changes_shp[county_changes_shp['year'] >= BASE_YEAR]

In [50]:
print(f"There are {len(county_changes_shp)} changes to US Counties from {BASE_YEAR} to present")

There are 41 changes to US Counties from 1998 to present


### Still have to do something with the changes in county boundaries

Maybe just drop them? There are very few, so it should be fine as long as none of them are in one of our $\approx 60$ agricultural congressional districts.

## Merge Congressional and County data over years

make it s.t. county_shp[congress #1xx].loc[county] = list(districts that this county was part of)

OR OTHER WAY AROUND - DENOTE COUNTIES THAT MAKE UP A DISTRICT - THIS COULD MAKE MORE SENSE

End of the day, I'll need a congress-county dictionary (maybe with percentages of overlap, e.g. 90% of Montgomery County is in district 8 in 2018)

In [120]:
congress_1997_1998.head()

Unnamed: 0,STATENAME,ID,DISTRICT,STARTCONG,ENDCONG,DISTRICTSI,COUNTY,PAGE,LAW,NOTE,BESTDEC,RNOTE,FROMCOUNTY,LASTCHANGE,FINALNOTE,geometry
0,California,6103107026,26,103,107,,,,,,,,F,2016-05-29 16:44:10.857626,"{""From US Census website""}","POLYGON ((-118.50746 34.33478, -118.50661 34.33528, -118.50542 34.33610, -118.50382 34.33670, -118.49345 34.32981, -118.48020 34.32984, -118.46902 34.32990, -118.46098 34.32997, -118.45822 34.330..."
1,California,6103107029,29,103,107,,,,,,,,F,2016-05-29 16:44:10.857626,"{""From US Census website""}","POLYGON ((-118.35401 34.16121, -118.35311 34.16111, -118.35221 34.15911, -118.35181 34.15821, -118.35161 34.15761, -118.34965 34.15312, -118.34821 34.14961, -118.34651 34.14561, -118.34511 34.142..."
2,California,6103107030,30,103,107,,,,,,,,F,2016-05-29 16:44:10.857626,"{""From US Census website""}","POLYGON ((-118.18398 34.14920, -118.18465 34.14599, -118.18458 34.14579, -118.18401 34.14511, -118.18344 34.14466, -118.18241 34.14383, -118.18201 34.14351, -118.18205 34.14342, -118.18209 34.143..."
3,Florida,12105107013,13,105,107,,,,,,,,F,2016-05-29 16:44:10.857626,"{""From US Census website""}","MULTIPOLYGON (((-82.42332 27.78074, -82.42316 27.78092, -82.42258 27.78147, -82.41886 27.77809, -82.41548 27.77634, -82.40935 27.77478, -82.40597 27.77446, -82.40087 27.77439, -82.40081 27.77720,..."
4,Florida,12105107014,14,105,107,,,,,,,,F,2016-05-29 16:44:10.857626,"{""From US Census website""}","MULTIPOLYGON (((-82.04014 27.03202, -81.99692 27.03157, -81.99170 27.03337, -81.99112 27.03357, -81.98613 27.03529, -81.96578 27.03485, -81.95946 27.03485, -81.95900 27.03494, -81.77186 27.03392,..."


In [None]:
# county_in_district_dict[state][year][district] = list of (county name, share of district area) tuples
# NOTE: "merged_district_df" doesn't exist yet. will have to merge individual cong. district dfs over the years into one big one,
# denoting years operational. Assumed columns here: 'state', 'year', 'district', 'geometry'

county_in_district_dict = {}

for state in set(congress_1997_1998['STATENAME']):
    county_in_district_dict[state] = {}

    #iterate only over same state in both dfs, otherwise this would probably take forever
    state_counties = county_shp[county_shp['state']==state]

    for year in set(elec_df['year']):
        county_in_district_dict[state][year] = {}
        state_year_districts = merged_district_df[(merged_district_df['state']==state) & (merged_district_df['year']==year)]

        for _, district_row in state_year_districts.iterrows():
            # one list per district; setdefault keeps earlier entries if a district has several rows
            overlaps = county_in_district_dict[state][year].setdefault(district_row['district'], [])

            for _, county_row in state_counties.iterrows():
                if district_row['geometry'].intersects(county_row['geometry']):
                    inter = district_row['geometry'].intersection(county_row['geometry']).area

                    #add name, (county overlap area) / (total district area) as tuple
                    overlaps.append((county_row['NAME'], inter / district_row['geometry'].area))
                        

## Left off here Feb 25 - next steps:

* create "merged_district_df" (general idea is in the comment in the cell above)
* make sure "county_in_district_dict" is running smoothly
* map counties to congressional districts in the elec_df (will probably end up being final df)
     * will need to map drought *and* demographics
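
One possible way to build "merged_district_df" is sketched below. Plain DataFrames stand in for the per-Congress GeoDataFrames, and the choice to tag each Congress's map with the election year before it convenes is an assumption that would need checking. `congress_start_year` and `districts_by_congress` are hypothetical names.

```python
import pandas as pd

def congress_start_year(congress):
    """Year a numbered Congress convenes (the 105th convened in January 1997)."""
    return 1789 + 2 * (congress - 1)

# Stand-ins for the per-Congress GeoDataFrames; real ones would also carry 'geometry'
districts_by_congress = {
    105: pd.DataFrame({"STATENAME": ["Iowa", "Iowa"], "DISTRICT": [1, 2]}),
    106: pd.DataFrame({"STATENAME": ["Iowa", "Iowa"], "DISTRICT": [1, 2]}),
}

frames = []
for congress, df in districts_by_congress.items():
    # Assumption: a Congress's map is matched to the election held the year before it convenes
    election_year = congress_start_year(congress) - 1
    frames.append(df.assign(congress=congress, year=election_year))

merged_district_df = (
    pd.concat(frames, ignore_index=True)
    .rename(columns={"STATENAME": "state", "DISTRICT": "district"})
)
print(merged_district_df)
```

The state names would still need to be put in the same form as the `state` column in `county_shp` and `elec_df` before the frames can be matched.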

# Gameplan

* Get county shapefiles
* Get congressional district shapefiles
* map counties (and county-data I have) to their districts in a dataframe
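
Once county-to-district overlap shares exist, the last step could look like the area-weighted average sketched here. Both tables are hypothetical (the overlap shares, county names, and `D1` values are made up), and area weighting is only one option; population weights might suit demographic variables better.

```python
import pandas as pd

# Hypothetical overlap table: share of each district's area that lies in each county
overlap = pd.DataFrame({
    "district": [8, 8, 9],
    "county": ["Montgomery", "Howard", "Howard"],
    "area_share": [0.75, 0.25, 1.0],
})

# Hypothetical county-level drought measure for one year
county_drought = pd.DataFrame({
    "county": ["Montgomery", "Howard"],
    "D1": [40.0, 20.0],
})

merged = overlap.merge(county_drought, on="county", how="left")
merged["weighted"] = merged["area_share"] * merged["D1"]

# Area-weighted average; dividing by the summed shares guards against shares that don't total 1
sums = merged.groupby("district")[["weighted", "area_share"]].sum()
district_drought = sums["weighted"] / sums["area_share"]
print(district_drought)
```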