# Soil Erosion Data
This notebook uses the file "erosion_1910.txt", which is included in the replication data for [this paper](https://www.aeaweb.org/articles?id=10.1257/aer.102.4.1477). The Python code below follows what Hornbeck does in Stata almost exactly.

In [7]:
import pandas as pd

# data_dir.txt holds the path to the data directory; strip any trailing newline.
with open("data_dir.txt") as f:
    data_dir = f.read().strip()

df = pd.read_table(data_dir+"erosion/erosion_1910.txt")

df = df.rename(columns={"icpsrfip":"fips"})

In the original file, a few FIPS codes are not integers but floats ending in .1. The original code apparently either drops these counties or leaves them unhandled. Since FIPS codes do not have decimal places, I correct them here by truncating at the decimal point:

In [3]:
df['fips'] = df['fips'].apply(lambda x: int(str(x).split(".")[0]))

df = df[df["fips"]!=0]
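
To make the effect of this cell concrete, here is a minimal sketch on a made-up frame. The values in `toy` are hypothetical and are not taken from "erosion_1910.txt"; only the two cleaning steps mirror the cell above.

```python
import pandas as pd

# Hypothetical codes: two clean ones, one with a stray ".1", and one zero.
toy = pd.DataFrame({"fips": [1001.0, 1003.1, 0.0, 56045.0]})

# Same truncation as above: keep only the digits before the decimal point.
toy["fips"] = toy["fips"].apply(lambda x: int(str(x).split(".")[0]))

# Same filter as above: drop rows whose code is 0.
toy = toy[toy["fips"] != 0]

cleaned = toy["fips"].tolist()
print(cleaned)
```

Note that a truncated code ends up sharing its `fips` value with the county that has no suffix, so the `groupby("fips")` sums in the next cell pool their areas.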

In [4]:
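# Total erosion area per county.
fips_sum = df.groupby("fips")['erosion_area'].sum().reset_index()
fips_sum.columns = ['fips','area_sum']

# Total erosion area per county and erosion class (id).
id_sum = df.groupby(['fips',"id"])['erosion_area'].sum().reset_index()
id_sum.columns = ['fips','id','area_id_sum']

# Attach both totals to every row; the merges use the shared key columns.
df = df.merge(id_sum.merge(fips_sum,how='left'),how='left')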

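# Share of each county's area in class 1 (medium) and class 2 (high).
df.loc[df.id==1,"erosion_medium"] = df['area_id_sum']/df['area_sum']
df.loc[df.id==2,"erosion_high"] = df['area_id_sum']/df['area_sum']

# Collapse to one row per county. The share is constant within a county,
# so max() simply picks it up and skips the missing values.
m1_1 = df.groupby("fips")['erosion_medium'].max()

m1_2 = df.groupby("fips")['erosion_high'].max()

# Counties with no area in a class get a share of 0.
E = pd.concat([m1_1,m1_2],axis=1).fillna(0)

# The remaining share is low erosion.
E['m1_0'] = 1 - E['erosion_medium'] - E['erosion_high']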

# Give the columns more intuitive names.
E.columns = ['erosion_med',"erosion_high","erosion_low"]

# Treat tiny residuals (below .001 in absolute value) as zero.
E.loc[E.erosion_low.abs()<.001,"erosion_low"] = 0

E.sort_index()

fips,erosion_med,erosion_high,erosion_low
1001,0.912117,0.087883,0.000000
1003,0.240279,0.000000,0.759721
1005,1.000000,0.000000,0.000000
1007,0.471368,0.528632,0.000000
1009,0.936193,0.063807,0.000000
...,...,...,...
56033,0.070024,0.000000,0.929976
56037,0.588679,0.145112,0.266210
56041,0.390708,0.000000,0.609292
56045,0.300947,0.000000,0.699053
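
As a cross-check on the shares above, the same quantities can be computed more compactly with a pivot table. This is only a sketch of an alternative, not what the notebook or the original Stata code does. The frame `toy` and its values are hypothetical; the one thing carried over from the code above is that `id` 1 is medium erosion, `id` 2 is high erosion, and low erosion is the remainder.

```python
import pandas as pd

# Hypothetical polygons: county 1 has three classes, county 2 has no class 2.
toy = pd.DataFrame({
    "fips": [1, 1, 1, 1, 2, 2],
    "id": [0, 1, 1, 2, 0, 1],
    "erosion_area": [2.0, 1.0, 1.0, 4.0, 3.0, 1.0],
})

# Area by county (rows) and erosion class (columns); absent classes become 0.
area = toy.pivot_table(index="fips", columns="id", values="erosion_area",
                       aggfunc="sum", fill_value=0)

# Divide each row by the county total to turn areas into shares.
shares = area.div(area.sum(axis=1), axis=0)

# Keep classes 1 and 2 even if one of them never appears in the data.
shares = shares.reindex(columns=[1, 2], fill_value=0.0)

E_toy = pd.DataFrame({
    "erosion_med": shares[1],
    "erosion_high": shares[2],
})
# Low erosion is whatever is left over, as in the notebook.
E_toy["erosion_low"] = 1 - E_toy["erosion_med"] - E_toy["erosion_high"]
print(E_toy)
```

Running the same steps on `df` and comparing against `E` would be one way to confirm that the merge-based computation behaves as intended.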


In [9]:
E.sort_index().to_csv(data_dir + "clean_data/hornbeck12_erosion.csv")