In [24]:
import pandas as pd
import geopandas as gp
import numpy as np

### Data import

Data from Nikolai:

In [25]:
data = pd.read_csv("data/fake_pop.csv", sep=",")
data = data.rename({"fips": "fips",
                    "winner": "winner", 
                    "population": "population",
                    "shifted": "shifted",
}, axis=1)[["fips", "winner", "shifted", "population"]]
data.head(2)

Unnamed: 0,fips,winner,shifted,population
0,2000,Harris,-6,733406.0
1,1001,Harris,14,60342.0


Geodata from https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html

In [26]:
geometry = gp.read_file("data/geom.geojson")
geometry["fips"] = pd.to_numeric(geometry.GEOID)
geometry = geometry[["fips", "NAME", "STUSPS" ,"geometry"]].rename({
    "fips": "fips",
    "NAME": "county",
    "STUSPS": "state"
}, axis=1)
geometry.head(2)

Unnamed: 0,fips,county,state,geometry
0,13027,Brooks,GA,"MULTIPOLYGON (((-83.73616 31.03768, -83.57396 ..."
1,31095,Jefferson,NE,"MULTIPOLYGON (((-97.36869 40.35039, -96.91606 ..."


### Missing counties
Difference between the two datasets:

In [27]:
d1 = (geometry
 .merge(data, left_on="fips", right_on="fips", how="outer" ,indicator=True))
d1.head(2)

Unnamed: 0,fips,county,state,geometry,winner,shifted,population,_merge
0,13027,Brooks,GA,"MULTIPOLYGON (((-83.73616 31.03768, -83.57396 ...",Trump,24.0,16245.0,both
1,31095,Jefferson,NE,"MULTIPOLYGON (((-97.36869 40.35039, -96.91606 ...",Trump,19.0,7054.0,both


In [28]:
d1.loc[d1._merge == 'left_only', 'datensatz'] = 'Geodaten'
d1.loc[d1._merge == 'right_only', 'datensatz'] = 'Nikolai'
d1[d1.datensatz.notna()][['fips', 'county', 'state', 'datensatz']].to_csv('temp/diff.csv', index=False)
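
To illustrate how the `indicator=True` merge above separates the two sources, here is a minimal sketch on toy data. The tables `left` and `right` and their rows are made up for illustration and are not the real datasets.

```python
import pandas as pd

# Hypothetical toy tables standing in for the geodata and the results data.
left = pd.DataFrame({"fips": [1001, 1003, 72001],
                     "county": ["Autauga", "Baldwin", "Adjuntas"]})
right = pd.DataFrame({"fips": [1001, 1003, 2000],
                      "winner": ["Trump", "Trump", "Harris"]})

# An outer merge keeps every key; _merge records where each row came from.
merged = left.merge(right, on="fips", how="outer", indicator=True)

# Rows present in only one of the two tables.
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["fips", "_merge"]])
```

Rows marked `left_only` exist only in the first table, rows marked `right_only` only in the second, which is what the `datensatz` column encodes.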

### Puerto Rico can go

"They have no rights anyway", so we do not show Puerto Rico. The FIPS codes for Puerto Rico lie between 72000 and 72153. See: https://www.nrcs.usda.gov/wps/portal/nrcs/detail/national/home/?cid=nrcs143_013697

In [29]:
d2 = d1[(d1.fips < 72000)|(d1.fips > 72153)].copy()
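
As a quick check of the boundary handling, the following sketch applies the same filter to a made-up `toy` frame and compares it with an alternative spelling using `Series.between`. The alternative is only a suggestion, not what the notebook uses.

```python
import pandas as pd

# Hypothetical FIPS codes around the Puerto Rico range 72000..72153.
toy = pd.DataFrame({"fips": [1001, 71999, 72000, 72001, 72153, 72154]})

# Same condition as in the cell above: keep everything outside the range.
kept = toy[(toy.fips < 72000) | (toy.fips > 72153)]

# Alternative spelling; between() includes both endpoints by default.
same = toy[~toy.fips.between(72000, 72153)]

print(kept.fips.tolist())
```

Both forms drop the two endpoints as well. They would differ only for missing `fips` values: the comparison form drops them, while the negated `between` keeps them. That should not matter here because `fips` is the merge key.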

### Validity checks
Shares cannot exceed 100 percent, so the shift values must lie between -100 and 100:

In [30]:
assert d1.winner.isin(['Harris', 'Trump', np.nan]).all()
assert d1.shifted.min() > -100
assert d1.shifted.max() < 100
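
A failing `assert` does not say which rows are at fault. The sketch below applies the same rules but returns the offending rows instead. The helper `check_results` and the `toy` data are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

def check_results(df):
    """Return rows that violate the rules above (hypothetical helper)."""
    # A winner must be Harris or Trump; missing values are allowed.
    bad_winner = ~df["winner"].isin(["Harris", "Trump"]) & df["winner"].notna()
    # A shift must lie strictly between -100 and 100; NaN compares as False.
    bad_shift = df["shifted"].abs() >= 100
    return df[bad_winner | bad_shift]

# Made-up rows: the last one breaks both rules.
toy = pd.DataFrame({
    "fips": [1, 2, 3, 4],
    "winner": ["Harris", "Trump", np.nan, "Nobody"],
    "shifted": [-6.0, 14.0, np.nan, 120.0],
})

offending = check_results(toy)
print(offending)
```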

### Generate TopoJSON

In the next step, this file is combined with the Vega spec to produce the plots.
This cannot be done directly with GeoPandas.

Requires the geo2topo binary, which you get by installing the TopoJSON package: https://github.com/topojson/topojson

In [31]:
d2[d2.geometry.notna()].to_file("temp/windofchange.geojson", driver='GeoJSON')

In [32]:
!geo2topo temp/windofchange.geojson>temp/windofchange.topo.json
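
Outside a notebook the `!` shell escape is not available. A minimal sketch of the same conversion via `subprocess` could look like the following. The helper `build_geo2topo_cmd` is hypothetical, and the sketch assumes that `geo2topo` accepts `-o` for the output file, as described in the TopoJSON command-line reference.

```python
import os
import shutil
import subprocess

def build_geo2topo_cmd(src, dst):
    """Build the geo2topo call (hypothetical helper); -o names the output file."""
    return ["geo2topo", "-o", dst, src]

src = "temp/windofchange.geojson"
dst = "temp/windofchange.topo.json"
cmd = build_geo2topo_cmd(src, dst)

if shutil.which("geo2topo") is None:
    # The binary is missing from PATH, so report it instead of failing.
    print("geo2topo not found on PATH; install the TopoJSON package first")
elif not os.path.exists(src):
    print(f"{src} does not exist; run the export cell first")
else:
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(cmd, check=True)
```

Checking for the binary first turns a missing installation into a readable message rather than an exception.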

### Save DataFrame for the state detail view

In [33]:
d2.to_pickle('temp/windofchange.pkl')