In [1]:
import pandas as pd

## Importing MFT data

In [2]:
raw_mft_df = pd.read_excel("./data/MFT Returns by Zip 2019.xlsx", dtype={'Zip': object})

In [3]:
raw_mft_df.columns

Index(['Zip', 'City', 'State', 'County', 'Org ID', 'United Way',
       '#e-filed returns', '#EITC returns', 'Average EITC Amount',
       'Average Refund Amount', 'Average Age'],
      dtype='object')

#### Dropping columns with sensitive data

In [4]:
raw_mft_df.drop(axis='columns',
                labels=['#EITC returns', 'Average EITC Amount', 'Average Refund Amount', 'Average Age'],
                inplace=True)

#### Dropping unprocessable rows

In [5]:
# Note: .size counts cells (rows x columns), not rows
print(raw_mft_df[pd.isnull(raw_mft_df['United Way'])].size)
print(raw_mft_df.size)

3619
125545


We drop any row whose zip code contains a letter or that has no associated United Way. Because unmatched zip codes will later be filled with the closest zip code served by the same United Way, rows without a United Way can't be used. Luckily, those rows appear to be mostly the ones with nonexistent or malformed zip codes.

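The two numbers printed above are cell counts, because `.size` returns rows × columns. With the 7 columns that remain after the drop, rows without a United Way account for 517 of the 17,935 rows (3,619 of 125,545 cells).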
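
Because `DataFrame.size` returns the number of cells rather than the number of rows, row counts are easier to read with `len` or `.shape`. The following is a minimal sketch on made-up data (the `toy_df` frame and its values are illustrative assumptions, not taken from the MFT spreadsheet):

```python
import pandas as pd

# Toy data, invented for illustration only
toy_df = pd.DataFrame({
    "Zip": ["46409", "85037", "00000"],
    "City": ["Gary", "Phoenix", None],
    "United Way": ["Lake Area United Way", "Valley of the Sun United Way", None],
})

# Rows that have no United Way
missing = toy_df[toy_df["United Way"].isna()]

n_rows, n_cols = toy_df.shape

# .size is rows * columns (cells); len() is rows
print("cells:", missing.size, "of", toy_df.size)
print("rows: ", len(missing), "of", len(toy_df))
```

Dividing a `.size` value by the number of columns recovers the row count, which is how cell counts can be converted back to rows.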

In [6]:
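# Keep rows whose zip code contains no letters and that have a United Way.
# str.contains searches the whole string; str.match only tests the start.
mft_df = raw_mft_df[~raw_mft_df.Zip.str.contains("[a-zA-Z]") & raw_mft_df['United Way'].notna()].copy()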
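
The row filter described above can be sketched on a small made-up frame. This is only an illustration under the assumption that "has a letter in the zip code" means a letter anywhere in the string. The zip values and the name "Some United Way" are hypothetical. It also shows that `str.match` tests only the start of the string, while `str.contains` searches the whole string:

```python
import pandas as pd

# Toy data, invented for illustration only
toy_df = pd.DataFrame({
    "Zip": ["46409", "A1B2C3", "85037", "9021O", "13841"],
    "United Way": [
        "Lake Area United Way",
        None,
        "Valley of the Sun United Way",
        "Some United Way",  # hypothetical name
        None,
    ],
})

# Letter anywhere in the zip code
has_letter = toy_df["Zip"].str.contains("[a-zA-Z]")
# Letter in the first position only (str.match anchors at the start)
starts_with_letter = toy_df["Zip"].str.match("[a-zA-Z]")
# Row has an associated United Way
has_united_way = toy_df["United Way"].notna()

# Keep rows with an all-digit zip code and a United Way
kept = toy_df[~has_letter & has_united_way].copy()
print(kept)
```

In this toy frame, "9021O" ends in the letter O, so it is caught by `str.contains` but not by `str.match`.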

In [7]:
mft_df.to_csv("./data/mft_returns_2019.csv", index=False)

In [8]:
mft_df.sample(15)

Unnamed: 0,Zip,City,State,County,Org ID,United Way,#e-filed returns
9160,46409,Gary,IN,Lake,16290F,Lake Area United Way,1.0
15213,85037,Phoenix,AZ,Maricopa,03040F,Valley of the Sun United Way,46.0
8307,42330,Central City,KY,Muhlenberg,19115F,"United Way of the Coalfield, Inc.",1.0
2609,13841,Smithville Flats,NY,Cortland,34400F,"Chenango United Way, Inc.",1.0
5682,29568,Longs,SC,Horry,43145U,"United Way of Horry County, Inc.",5.0
7722,37748,Harriman,TN,Roane,45330F,United Way of Roane County,17.0
10099,49709,Atlanta,MI,Montmorency,24020U,United Way of Northeast Michigan,1.0
5161,27909,Elizabeth City,NC,Pasquotank,35160F,"Albemarle Area United Way, Inc.",9.0
9785,48848,Laingsburg,MI,Shiawassee,24390F,United Way of Genesee County (Shiawassee),6.0
16476,93421,Arroyo Grande,CA,San Luis Obispo,05420F,United Way of San Luis Obispo County,7.0
