In [10]:
import pandas as pd

## Importing MFT data

In [11]:
raw_mft_df = pd.read_excel("./data/MFT Returns by Zip 2019.xlsx", dtype={'Zip': object})

In [12]:
raw_mft_df.columns

Index(['Zip', 'City', 'State', 'County', 'Org ID', 'United Way',
       '#e-filed returns', '#EITC returns', 'Average EITC Amount',
       'Average Refund Amount', 'Average Age'],
      dtype='object')

#### Dropping columns with sensitive data

In [13]:
raw_mft_df.drop(axis='columns',
                labels=['#EITC returns', 'Average EITC Amount', 'Average Refund Amount', 'Average Age'],
                inplace=True)

#### Dropping unprocessable rows

In [14]:
print(raw_mft_df[raw_mft_df['United Way'].isna()].size)  # .size counts cells (rows x columns), not rows
print(raw_mft_df.size)

3619
125545


We drop any row that has a letter in its zip code or has no associated United Way. Because unmatched zip codes will later be filled with the closest zip code served by the same United Way, rows without a United Way cannot be used. Fortunately, most of the rows without a United Way appear to be the ones with nonexistent or malformed zip codes.

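Rows without a United Way account for 517 of the 17,935 rows in the data set. The figures printed above (3,619 and 125,545) are 7 times larger because `.size` counts cells across the 7 remaining columns rather than rows.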
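
When reading counts like those above, it may help to keep in mind that `DataFrame.size` reports the number of cells (rows times columns), whereas `len(df)` reports rows. A minimal sketch on a made-up frame (`toy_df` and its values are hypothetical, not taken from the MFT file):

```python
import pandas as pd

# Hypothetical toy frame standing in for the MFT data: 4 rows, 3 columns.
toy_df = pd.DataFrame({
    "Zip": ["48375", "46978", "A1B2C", "89436"],
    "City": ["Novi", "Royal Center", "Nowhere", "Sparks"],
    "United Way": ["UW Southeastern Michigan", "UW Cass County", None, "UW Northern Nevada"],
})

# Rows with no United Way.
missing_uw = toy_df[toy_df["United Way"].isna()]

cells_missing = missing_uw.size   # rows * columns
rows_missing = len(missing_uw)    # rows only
total_cells = toy_df.size
total_rows = len(toy_df)

print(cells_missing, rows_missing)
print(total_cells, total_rows)
```

Dividing a `.size` value by the number of columns recovers the row count, which is one way to cross-check figures that were produced with `.size`.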

In [15]:
# Keep rows whose zip code contains no letters and that have a United Way.
mft_df = raw_mft_df[~raw_mft_df['Zip'].str.contains("[a-zA-Z]") & raw_mft_df['United Way'].notna()].copy()
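
A filter for letters in a zip code depends on which string method is used: `Series.str.match` only tests the start of each string, while `Series.str.contains` searches anywhere in it. A small sketch on made-up values (the `zips` series is hypothetical, not drawn from the MFT data):

```python
import pandas as pd

# Hypothetical zip-like strings: clean, leading letter, embedded letter, clean.
zips = pd.Series(["48375", "A1234", "12B45", "89436"])

# str.match anchors the pattern at the start of each string.
starts_with_letter = zips.str.match("[a-zA-Z]")

# str.contains looks for the pattern anywhere in each string.
has_letter_anywhere = zips.str.contains("[a-zA-Z]", regex=True)

print(starts_with_letter.tolist())
print(has_letter_anywhere.tolist())
```

Only the `str.contains` version flags `"12B45"`, so it is the closer fit when the goal is to drop any zip code with a letter in it.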

In [17]:
mft_df.to_csv("./data/mft_returns_2019.csv", index=False)
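
If this CSV is read back later, the zip codes are probably worth loading as strings again, since `read_csv` otherwise infers an integer column and drops leading zeros. A small sketch with an in-memory CSV (the rows are made up for illustration; only the `Zip` column name comes from the data above):

```python
import io

import pandas as pd

# Hypothetical CSV content; one zip code has a leading zero.
csv_text = "Zip,City\n02134,Allston\n48375,Novi\n"

# Default type inference turns the Zip column into integers.
default_df = pd.read_csv(io.StringIO(csv_text))

# Forcing a string dtype keeps the zip codes exactly as written.
string_df = pd.read_csv(io.StringIO(csv_text), dtype={"Zip": str})

print(default_df["Zip"].tolist())
print(string_df["Zip"].tolist())
```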

In [8]:
mft_df.sample(15)

Unnamed: 0,Zip,City,State,County,Org ID,United Way,#e-filed returns
9659,48375,Novi,MI,Oakland,24125F,United Way for Southeastern Michigan,8.0
9272,46978,Royal Center,IN,Cass,16450F,"United Way of Cass County, Inc.",4.0
15717,89436,Sparks,NV,Washoe,30020F,United Way of Northern Nevada and the Sierra,4.0
3897,20019,Washington,DC,District of Columbia,10001F,United Way of the National Capital Area,29.0
11109,56501,Detroit Lakes,MN,Becker,25180F,United Way of Becker County,8.0
10641,54623,Coon Valley,WI,Vernon,53195U,"Great Rivers United Way, Inc.",5.0
2776,14533,Piffard,NY,Livingston,34485F,"United Way of Greater Rochester, Inc.",3.0
10021,49455,Shelby,MI,Oceana,24361F,United Way of the Lakeshore - Oceana County,6.0
9087,46206,Indianapolis,IN,Marion,16340F,United Way of Central Indiana,2.0
4351,23086,King William,VA,King William,50325U,United Way of the Virginia Peninsula,1.0
