Note on local government finance data:
- Current data are available here (https://www.census.gov/data/datasets/2016/econ/local/public-use-datasets.html) and historical data here (https://www.census.gov/programs-surveys/gov-finances/data/historical-data.html).
- Pre-2012 data is in "_IndFin_1967-2012/".

Note on property tax data:
- The American Housing Survey has rich items on taxes, but its sample is only 50k housing units (https://www.census.gov/programs-surveys/ahs.html). **Definitely no**.
- The American Community Survey has items on taxes too (https://www.census.gov/topics/housing/guidance/topics.html#par_textimage_967571650), although it only starts in **2005**. It has a huge (3.5 million) sample size (https://www.census.gov/programs-surveys/acs/data/data-tables.html).
- The Quarterly Summary of State & Local Tax Revenue (https://www.census.gov/programs-surveys/qtax.html), if combined with a data source on property values, would let me back out the tax rate.
- The literature uses census data.
    - For 2020, I am not sure whether there is a tax question. It is not in the Demographic and Housing Characteristics File Data Dict (https://www2.census.gov/programs-surveys/decennial/2020/technical-documentation/complete-tech-docs/demographic-and-housing-characteristics-file-and-demographic-profile/2020census-demographic-and-housing-characteristics-file-and-demographic-profile-techdoc.pdf).
    - 2020 Data (https://www.census.gov/programs-surveys/decennial-census/decade/2020/planning-management/release/about-2020-data-products.html) can be downloaded in bulk here (https://www2.census.gov/programs-surveys/decennial/2020/data/demographic-and-housing-characteristics-file/)
    - Prior data is best listed in the by-decade file (https://www.census.gov/programs-surveys/decennial-census/decade.2020.html#list-tab-693908974)
- There is a taxable property survey before 1992.
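
The back-out idea in the Quarterly Summary bullet amounts to dividing property tax revenue by the value of the taxed property. A minimal sketch, assuming toy figures and hypothetical column names (`property_tax_revenue`, `property_value`, and the `State` / `Year4` keys) that come from neither source:

```python
import pandas as pd

# Hypothetical toy inputs: the figures and column names are illustrative only.
revenue = pd.DataFrame({
    'State': ['AA', 'BB'],
    'Year4': [2016, 2016],
    'property_tax_revenue': [50.0, 120.0],   # stand-in for a tax revenue series
})
value = pd.DataFrame({
    'State': ['AA', 'BB'],
    'Year4': [2016, 2016],
    'property_value': [5000.0, 8000.0],      # stand-in for a property value source
})

# Match the two sources on geography and year, then divide revenue by value
rates = revenue.merge(value, on=['State', 'Year4'], how='inner')
rates['effective_tax_rate'] = rates['property_tax_revenue'] / rates['property_value']
print(rates[['State', 'Year4', 'effective_tax_rate']])
```

The inner merge drops any geography-year cell that is missing from either source, so the rate is only computed where both the numerator and the denominator are observed.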

In [1]:
import pandas as pd
import numpy as np
import scipy as sp
import os
import dask
import dask.dataframe as dd
import itertools
from itertools import chain
from math import sqrt, floor, ceil, isnan
import multiprocess
import importlib
from importlib import reload
from collections import Counter
from fuzzywuzzy import process, fuzz
import time
import seaborn as sns
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import warnings
warnings.filterwarnings("error")

pd.options.display.max_columns = 500
pd.options.display.max_rows = 1000
pd.options.display.max_colwidth = 400

# To do: Note that GOVS and FIPS codes are different... Need to restrict to one at the time of import
# ID Changed in 2017... from using GOVS to FIPS. Is there a master file for tracking?
# Sample size should be constant from 2012 to 2017
# Maybe use issuer name to match (based on 2012, change ID after 2017 to that of 2012). No sign of changing ID before that
# Summarize feature of data in note

Notes:
- A financial unit can be identified by name X state X county. There might be multiple cities and school districts within a county. Note that their methods of calculating population might not be comparable. The sample can be restricted by filtering on "Type Code".
- IDs changed in 2017: before that, IDs contain GOVS codes, while from 2017 on they contain FIPS codes. I use data from 2012 and 2017 to create a mapping from old to new IDs based on names, then change all new IDs to old IDs. I use these two years because they are census years, which have more observations.
    - Note that for new units whose names did not appear before, the ID is the new ID.
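
The name-based ID mapping described above can be sketched on a toy panel. This is only an illustration under assumed data: the IDs, names, and the `crosswalk` variable are hypothetical, and the actual notebook does this on the full survey data further below.

```python
import pandas as pd

# Hypothetical toy panel: two units observed in 2012 (old IDs) and 2017 (new IDs),
# plus one unit that first appears in 2017.
panel = pd.DataFrame({
    'ID': [1001, 1002, 9001, 9002, 9003],
    'Year4': [2012, 2012, 2017, 2017, 2017],
    'FIPS State Code': [1, 1, 1, 1, 1],
    'FIPS County Code': [3, 3, 3, 3, 3],
    'Name': ['ALPHA CITY', 'BETA SCHOOL DIST', 'ALPHA CITY', 'BETA SCHOOL DIST', 'GAMMA TOWN'],
})

keys = ['FIPS State Code', 'FIPS County Code', 'Name']
old = panel.loc[panel['Year4'] == 2012, ['ID'] + keys].rename(columns={'ID': 'ID Old'})
new = panel.loc[panel['Year4'] == 2017, ['ID'] + keys]

# Crosswalk from new ID to old ID, matched on state x county x name
crosswalk = new.merge(old, on=keys)[['ID', 'ID Old']]

# A left merge keeps every row; rows without a match (old-ID years, new units) keep their ID
panel = panel.merge(crosswalk, on='ID', how='left')
panel['ID'] = panel['ID Old'].fillna(panel['ID']).astype('int64')
panel = panel.drop(columns=['ID Old'])
print(panel[['ID', 'Year4', 'Name']])
```

If two units in the same county share a name, the crosswalk contains several candidate old IDs for one new ID, which is why duplicates on ID X year need to be checked after the mapping.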

# 1. Data in 2012 and Before

In [2]:
# GOVS Codes and state and county names and FIPS code

GOVS_code = pd.read_excel('../RawData/GovFinSurvey/_IndFin_1967-2012/GOVS_to_FIPS_Codes_State_&_County_2012.xls',
    sheet_name='County Codes')
GOVS_code = GOVS_code[['Unnamed: 2','Unnamed: 3','Unnamed: 5','Unnamed: 8','Unnamed: 11','Unnamed: 12']]
GOVS_code = GOVS_code[~pd.isnull(GOVS_code['Unnamed: 2'])]
GOVS_code = GOVS_code[~pd.isnull(GOVS_code['Unnamed: 3'])]
GOVS_code = GOVS_code[~pd.isnull(GOVS_code['Unnamed: 5'])]
GOVS_code = GOVS_code[~pd.isnull(GOVS_code['Unnamed: 8'])]
GOVS_code = GOVS_code[~pd.isnull(GOVS_code['Unnamed: 11'])]
GOVS_code = GOVS_code[~pd.isnull(GOVS_code['Unnamed: 12'])]
GOVS_code = GOVS_code[GOVS_code['Unnamed: 2']!='code']
GOVS_code = GOVS_code.rename(columns={
    'Unnamed: 2':'State GOVS Code',
    'Unnamed: 3':'County GOVS Code',
    'Unnamed: 5':'FIPS State Code',
    'Unnamed: 8':'FIPS County Code',
    'Unnamed: 11':'StateFull',
    'Unnamed: 12':'County'})
GOVS_code['State GOVS Code'] = GOVS_code['State GOVS Code'].astype(int)
GOVS_code['County GOVS Code'] = GOVS_code['County GOVS Code'].astype(int)
GOVS_code['FIPS State Code'] = GOVS_code['FIPS State Code'].astype(int)
GOVS_code['FIPS County Code'] = GOVS_code['FIPS County Code'].astype(int)

%run -i SCRIPT_us_states.py

us_state_to_abbrev = pd.DataFrame.from_dict(us_state_to_abbrev,orient='index').reset_index()
us_state_to_abbrev.columns = ['StateFull','State']

GOVS_code = GOVS_code.merge(us_state_to_abbrev,on=['StateFull'])

## 1.1 Import data

In [3]:
raw_dir = '../RawData/GovFinSurvey/_IndFin_1967-2012/'
# Survey years 1967, 1970-1999 and 2000-2012, written as the two-digit suffixes used in the file names
years = ['67']+[str(year) for year in range(70,100)]+['0'+str(year) for year in range(0,10)]+['10','11','12']

IndFin_by_year = []
for year in years:
    # Each year is split into three files (a, b, c) that share the merge keys below
    parts = {part: pd.read_csv(raw_dir+'IndFin'+year+part+'.Txt', low_memory=False) for part in ['a','b','c']}
    # Concatenate data horizontally for each year by ID
    IndFin_oneyear = parts['a'].merge(parts['b'],on=['Year4','ID','SortCode'])
    IndFin_oneyear = IndFin_oneyear.merge(parts['c'],on=['Year4','ID','SortCode'])
    IndFin_by_year.append(IndFin_oneyear)
# Stack all years at once instead of growing a DataFrame inside the loop
IndFin_Prior2012_raw = pd.concat(IndFin_by_year)
IndFin_Prior2012_raw = IndFin_Prior2012_raw.reset_index(drop=True)
IndFin_Prior2012_raw['Type Code'] = IndFin_Prior2012_raw['Type Code'].astype(int)

In [4]:
IndFin_Prior2012 = IndFin_Prior2012_raw.copy()

# Note that there are also expenditures, debt issuance, and debt outstandings of sub-categories.
# It is confirmed that 'Total Debt Outstanding' and 'Total Long-Term Debt Out' here are the end-of-period values
IndFin_Prior2012 = IndFin_Prior2012[[
    'ID','Name','Year4','County','State Code','Population','Type Code',
    'Total Revenue','Total IG Revenue',
    'Total Taxes','Property Tax','Total Income Taxes','Individual Income Tax','Corp Net Income Tax',
    'Total Expenditure','Total Interest on Debt',
    'Total Current Oper','Total Capital Outlays','Total Construction',
    'Total Debt Outstanding','Total Long-Term Debt Out','Total LTD Issued',
    'ST Debt-End of Year','Total LTD Out',
    'Fin Admin-Total Exp','Fin Admin-Direct Exp',
    'Total Rev-Own Sources','Total General Charges','Misc General Revenue','Total Utility Revenue',
    'Total Fed IG Revenue','Total State IG Revenue','Tot Local IG Rev',
    'IG Exp-To State Govt','IG Exp-To Local Govts','IG Exp-To Federal Govt',
    ]]
IndFin_Prior2012 = IndFin_Prior2012[~pd.isnull(IndFin_Prior2012['County'])]
IndFin_Prior2012 = IndFin_Prior2012.rename(columns={
    'Total Current Oper':'Total Current Operation',
    'Total Capital Outlays':'Total Capital Outlay'})
IndFin_Prior2012 = IndFin_Prior2012.rename(columns={
    'State Code':'State GOVS Code',
    'County':'County GOVS Code'})
IndFin_Prior2012 = IndFin_Prior2012.copy()
IndFin_Prior2012['State GOVS Code'] = IndFin_Prior2012['State GOVS Code'].astype(int)
IndFin_Prior2012['County GOVS Code'] = IndFin_Prior2012['County GOVS Code'].astype(int)

# Note that for pre-2012 data, enrollment seems to have been imputed into the "Population" field.
# There is no separate column for it

# Merge in county and state name
IndFin_Prior2012 = IndFin_Prior2012.merge(GOVS_code,on=['County GOVS Code','State GOVS Code'])

# Sometimes individual debt items exist but they are not summarized into the appropriate items
IndFin_Prior2012['*Total Debt Outstanding'] = IndFin_Prior2012['ST Debt-End of Year']+IndFin_Prior2012['Total LTD Out']
IndFin_Prior2012['*Total Long-Term Debt Out'] = IndFin_Prior2012['Total LTD Out']

IndFin_Prior2012.loc[pd.isnull(IndFin_Prior2012['Total Debt Outstanding']),'Total Debt Outstanding'] = \
    IndFin_Prior2012['*Total Debt Outstanding'][pd.isnull(IndFin_Prior2012['Total Debt Outstanding'])]
IndFin_Prior2012.loc[pd.isnull(IndFin_Prior2012['Total Long-Term Debt Out']),'Total Long-Term Debt Out'] = \
    IndFin_Prior2012['*Total Long-Term Debt Out'][pd.isnull(IndFin_Prior2012['Total Long-Term Debt Out'])]

# Note that in pre-2012 data, enrollment of school districts is recorded in the field "population", while post 2012 these 
# are separate fields
IndFin_Prior2012['Enrollment'] = None
IndFin_Prior2012.loc[IndFin_Prior2012['Type Code']==5,'Enrollment'] = \
    IndFin_Prior2012[IndFin_Prior2012['Type Code']==5]['Population']
IndFin_Prior2012.loc[IndFin_Prior2012['Type Code']==5,'Population'] = None
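
The two cleaning steps at the end of this cell (filling a missing debt total from its components, and moving school-district "Population" into "Enrollment") can be illustrated on toy rows. This is a sketch with made-up values; it uses `fillna`, `where`, and `mask` as an alternative spelling of the `.loc` assignments above, not as the notebook's own method.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; values are illustrative only.
df = pd.DataFrame({
    'Type Code': [2, 5, 2],
    'Population': [1000.0, 350.0, 800.0],
    'Total Debt Outstanding': [70.0, np.nan, np.nan],
    'ST Debt-End of Year': [10.0, 5.0, np.nan],
    'Total LTD Out': [60.0, 20.0, 40.0],
})

# Fill the reported total only where it is missing, using the sum of its components.
# If a component is itself missing, the sum is NaN and the total stays missing.
components = df['ST Debt-End of Year'] + df['Total LTD Out']
df['Total Debt Outstanding'] = df['Total Debt Outstanding'].fillna(components)

# For school districts (Type Code 5), move "Population" into "Enrollment"
is_school = df['Type Code'] == 5
df['Enrollment'] = df['Population'].where(is_school)
df['Population'] = df['Population'].mask(is_school)
print(df)
```

Reported totals are never overwritten here: the component sum is only a fallback for rows where the total is absent.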


# 2. Data Post 2012

In [5]:

input_list = [
    ['2013_Individual_Unit_file/2013FinEstDAT_10162019modp_pu.txt',
    [14,3,12,4,1],
    '2013_Individual_Unit_file/Fin_GID_2013.txt',
    [14,64,35,2,3,5,9,2,7,2,2,2,4,2],
    2013],
    \
    ['2014-individual-unit-file/2014FinEstDAT_10162019modp_pu.txt',
    [14,3,12,4,1],
    '2014-individual-unit-file/Fin_GID_2014.txt',
    [14,64,35,2,3,5,9,2,7,2,2,2,4,2],
    2014],
    \
    ['2015-individual-unit-file/2015FinEstDAT_10162019modp_pu.txt',
    [14,3,12,4,1],
    '2015-individual-unit-file/Fin_GID_2015.txt',
    [14,64,35,2,3,5,9,2,7,2,2,2,4,2],
    2015],
    \
    ['2016_Individual_Unit_file/2016FinEstDAT_10162019modp_pu.txt',
    [14,3,12,4,1],
    '2016_Individual_Unit_file/Fin_GID_2016.txt',
    [14,64,35,2,3,5,9,2,7,2,2,2,4,2],
    2016],
    \
    ['2017_Individual_Unit_file/2017FinEstDAT_06122023modp_pu.txt',
    [12,3,12,4,1],
    '2017_Individual_Unit_file/Fin_PID_2017.txt',
    [12,64,35,5,9,2,7,2,2,2,4,2],
    2017],
    \
    ['2018_Individual_Unit_file/2018FinEstDAT_06122023modp_pu.txt',
    [12,3,12,4,1],
    '2018_Individual_Unit_file/Fin_PID_2018.txt',
    [12,64,35,5,9,2,7,2,2,2,4,2],
    2018],
    \
    ['2019_Individual_Unit_file/2019FinEstDAT_06122023modp_pu.txt',
    [12,3,12,4,1],
    '2019_Individual_Unit_file/Fin_PID_2019.txt',
    [12,64,35,5,9,2,7,2,2,2,4,2],
    2019],
    \
    ['2020_Individual_Unit_file/2020FinEstDAT_06122023modp_pu.txt',
    [12,3,12,4,1],
    '2020_Individual_Unit_file/Fin_PID_2020.txt',
    [12,64,35,5,9,2,7,2,2,2,4,2],
    2020],
    \
    ['2021_Individual_Unit_file/2021FinEstDAT_06122023modp_pu.txt',
    [12,3,12,4,1],
    '2021_Individual_Unit_file/Fin_PID_2021.txt',
    [12,64,35,5,9,2,7,2,2,2,4,2],
    2021],
    ]

data_allyears = pd.DataFrame()

for inputs in input_list:
    
    amount_data = inputs[0]
    amount_format = inputs[1]
    id_data = inputs[2]
    id_format = inputs[3]
    year = inputs[4]
    
    #----------------------------#
    # Import amount by item data #
    #----------------------------#
    
    column_widths = amount_format
    column_names = ['ID','item code','amount','Year4','impute_flag']
    data = pd.read_fwf('../RawData/GovFinSurvey/'+amount_data, 
        widths=column_widths, header=None, names=column_names,dtype={'ID':'str'})
    data = data.pivot(index='ID',columns='item code',values='amount')
    for column in data.columns:
        data.loc[pd.isnull(data[column]),column] = 0
    data = data.reset_index()
    data = data.rename(columns={'ID':'*ID'})
    # Handle if items are inconsistent
    for column in ['X01','X02','X05','X08','X11','X12','Y04']:
        if column not in list(data.columns):
            data[column] = 0
    
    #-----------------#
    # Aggregate items #
    #-----------------#
    
    # Note that 
    # (1) Data post 2012 do not have aggregate items ready to use like data before 2012. To obtain comparable data entries, I do
    # the aggregation myself.
    # (2) For the "X" items (employee retirement), some are revenues, some are expenditures, and some are stocks. The same is true for
    # "Y" (unemployment benefit) terms
    
    # Revenue
    
    columns_to_sum = \
        [item for item in list(data.columns) if item[:1]=='A']+\
        [item for item in list(data.columns) if item[:1]=='B']+\
        [item for item in list(data.columns) if item[:1]=='C']+\
        [item for item in list(data.columns) if item[:1]=='D']+\
        [item for item in list(data.columns) if item[:1]=='T']+\
        [item for item in list(data.columns) if item[:1]=='U']+\
        ['X01','X02','X05','X08','Y01','Y02','Y04','Y11','Y12','Y51','Y52']
    data['*Total Revenue'] = data[columns_to_sum].sum(axis=1)
    
    columns_to_sum = \
        [item for item in list(data.columns) if item[:1]=='B']
    data['*Total Fed IG Revenue'] = data[columns_to_sum].sum(axis=1)

    columns_to_sum = \
        [item for item in list(data.columns) if item[:1]=='C']
    data['*Total State IG Revenue'] = data[columns_to_sum].sum(axis=1)

    columns_to_sum = \
        [item for item in list(data.columns) if item[:1]=='D']
    data['*Tot Local IG Rev'] = data[columns_to_sum].sum(axis=1)

    columns_to_sum = \
        [item for item in list(data.columns) if item[:1]=='B']+\
        [item for item in list(data.columns) if item[:1]=='C']+\
        [item for item in list(data.columns) if item[:1]=='D']
    data['*Total IG Revenue'] = data[columns_to_sum].sum(axis=1)
    
    # Taxes
    
    columns_to_sum = \
        [item for item in list(data.columns) if item[:1]=='T']
    data['*Total Taxes'] = data[columns_to_sum].sum(axis=1)
    
    columns_to_sum = \
        ['T01']
    data['*Property Tax'] = data[columns_to_sum].sum(axis=1)
    
    columns_to_sum = \
        ['T40','T41']
    data['*Total Income Taxes'] = data[columns_to_sum].sum(axis=1)
    
    columns_to_sum = \
        ['T40']
    data['*Individual Income Tax'] = data[columns_to_sum].sum(axis=1)
    
    columns_to_sum = \
        ['T41']
    data['*Corp Net Income Tax'] = data[columns_to_sum].sum(axis=1)
    
    # Expenditure
    
    columns_to_sum = \
        [item for item in list(data.columns) if item[:1]=='E']+\
        [item for item in list(data.columns) if item[:1]=='F']+\
        [item for item in list(data.columns) if item[:1]=='G']+\
        [item for item in list(data.columns) if item[:1]=='I']+\
        [item for item in list(data.columns) if item[:1]=='J']+\
        [item for item in list(data.columns) if item[:1]=='L']+\
        [item for item in list(data.columns) if item[:1]=='M']+\
        [item for item in list(data.columns) if item[:1]=='Q']+\
        [item for item in list(data.columns) if item[:1]=='S']+\
        ['X11','X12','Y05','Y06','Y14','Y53','Z00']
    data['*Total Expenditure'] = data[columns_to_sum].sum(axis=1)
    
    columns_to_sum = \
        [item for item in list(data.columns) if item[:1]=='I']
    data['*Total Interest on Debt'] = data[columns_to_sum].sum(axis=1)

    columns_to_sum = \
        [item for item in list(data.columns) if item[:1]=='E']
    data['*Total Current Operation'] = data[columns_to_sum].sum(axis=1)

    columns_to_sum = \
        [item for item in list(data.columns) if item[:1]=='F']
    data['*Total Construction'] = data[columns_to_sum].sum(axis=1)

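    # To verify: "G" items alone may only cover capital outlay other than construction ("F"),
    # while "Total Capital Outlays" in the pre-2012 data may include construction.
    columns_to_sum = \
        [item for item in list(data.columns) if item[:1]=='G']
    data['*Total Capital Outlay'] = data[columns_to_sum].sum(axis=1)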
    
    # Debt balance and issuance
    
    columns_to_sum = \
        ['44T','49U','64V']
    data['*Total Debt Outstanding'] = data[columns_to_sum].sum(axis=1)
    
    columns_to_sum = \
        ['44T','49U']
    data['*Total Long-Term Debt Out'] = data[columns_to_sum].sum(axis=1)
    
    columns_to_sum = \
        ['24T','29U']
    data['*Total LTD Issued'] = data[columns_to_sum].sum(axis=1)

    columns_to_sum = \
        ['64V']
    data['*ST Debt-End of Year'] = data[columns_to_sum].sum(axis=1)

    # Financial administration

    columns_to_sum = \
        ['E23','F23','G23','L23','M23']
    data['*Fin Admin-Total Exp'] = data[columns_to_sum].sum(axis=1)

    columns_to_sum = \
        ['E23','F23','G23']
    data['*Fin Admin-Direct Exp'] = data[columns_to_sum].sum(axis=1)

    
    columns = list(data.columns)
    columns = [item.replace('*','') for item in columns]
    data.columns = columns
    
    data = data[['ID',
        'Total Revenue','Total IG Revenue','Total Fed IG Revenue','Total State IG Revenue','Tot Local IG Rev',
        'Total Taxes','Property Tax','Total Income Taxes','Individual Income Tax','Corp Net Income Tax',
        'Total Expenditure','Total Interest on Debt',
        'Total Current Operation','Total Construction','Total Capital Outlay',
        'Total Debt Outstanding','Total Long-Term Debt Out','Total LTD Issued','ST Debt-End of Year',
        'Fin Admin-Total Exp','Fin Admin-Direct Exp',
        ]]
    
    #-------------------------#
    # Get name and other info #
    #-------------------------#
    
    column_widths = id_format
    column_names = None
    if np.sum(column_widths)==np.sum([14,64,35,2,3,5,9,2,7,2,2,2,4,2]):
        column_names = ['ID','Name','County','FIPS State Code','FIPS County Code','FIPS Place Code',
            'Population','Population_year','Enrollment','Enrollment_year',
            'Function Code','School Level Code','Fiscal Year Ending','Survey Year']
    if np.sum(column_widths)==np.sum([12,64,35,5,9,2,7,2,2,2,4,2]):
        column_names = ['ID','Name','County','FIPS Place Code',
            'Population','Population_year','Enrollment','Enrollment_year',
            'Function Code','School Level Code','Fiscal Year Ending','Survey Year']
    IDdata = pd.read_fwf('../RawData/GovFinSurvey/'+id_data, 
        widths=column_widths, header=None, names=column_names,dtype={'ID':'str'})

    IDdata['Type Code'] = IDdata['ID'].str.slice(2,3)

    if year>=2017:
        IDdata['FIPS State Code'] = IDdata['ID'].str.slice(0,2).astype(int)
        IDdata['FIPS County Code'] = IDdata['ID'].str.slice(3,6).astype(int)
    
    IDdata = IDdata[['ID','Type Code','Name','FIPS State Code','FIPS County Code','County',
        'Population','Population_year','Enrollment','Enrollment_year']]
    data_oneyear = data.merge(IDdata,on=['ID'])
    data_oneyear['Year4'] = year

    data_allyears = pd.concat([data_allyears,data_oneyear])
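
The import-and-aggregate pattern in this cell (fixed-width records, pivot to one column per item code, then sum by item-code prefix) can be tried on a few made-up records. The sketch below assumes the 12/3/12/4/1 layout used for 2017 onward; the IDs, amounts, and flag values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical records in the 12/3/12/4/1 layout:
# ID (12), item code (3), amount (12), year (4), imputation flag (1).
records = [
    ('011003003000', 'T01', 500, 2017, 'R'),
    ('011003003000', 'T40', 20, 2017, 'R'),
    ('011003003000', 'E23', 75, 2017, 'R'),
    ('012003001000', 'T01', 300, 2017, 'R'),
]
raw = '\n'.join(f'{uid:<12}{code:<3}{amount:>12}{year:>4}{flag:<1}'
    for uid, code, amount, year, flag in records)

data = pd.read_fwf(io.StringIO(raw), widths=[12, 3, 12, 4, 1], header=None,
    names=['ID', 'item code', 'amount', 'Year4', 'impute_flag'], dtype={'ID': 'str'})

# One row per unit, one column per item code; items a unit did not report become 0
wide = data.pivot(index='ID', columns='item code', values='amount').fillna(0).reset_index()

# Aggregate by the first letter of the item code, e.g. all "T" items are taxes
tax_columns = [item for item in wide.columns if item[:1] == 'T']
wide['Total Taxes'] = wide[tax_columns].sum(axis=1)
print(wide[['ID', 'Total Taxes']])
```

Selecting columns by first letter also picks up any non-item column that happens to start with that letter, which is presumably why the cell above renames `ID` to `*ID` before summing the "I" (interest) items.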


# 3. Combine and Export Data

In [6]:
# To make format consistent for two datasets, for data post 2012,
# (1) Remove leading zero.
# (2) Remove last five digits if all zero.
# (3) Convert to int.

data_allyears.loc[data_allyears['ID'].str[-5:]=='00000','ID'] = \
    data_allyears[data_allyears['ID'].str[-5:]=='00000']['ID'].str[:-5]
data_allyears['ID'] = data_allyears['ID'].astype(int)

# Combine two datasets
data_allyears = data_allyears.reset_index(drop=True)
IndFin_Prior2012 = IndFin_Prior2012.reset_index(drop=True)
GovFinData = pd.concat([data_allyears,IndFin_Prior2012])
GovFinData = GovFinData[~pd.isnull(GovFinData['FIPS State Code'])]
GovFinData = GovFinData[~pd.isnull(GovFinData['FIPS County Code'])]
GovFinData['FIPS State Code'] = GovFinData['FIPS State Code'].astype(int)
GovFinData['FIPS County Code'] = GovFinData['FIPS County Code'].astype(int)
GovFinData['ID'] = GovFinData['ID'].astype(int)

# Add state name to "GovFinData"
StateFIPSCode = GOVS_code[['FIPS State Code','State']].drop_duplicates()
StateFIPSCode['FIPS State Code'] = StateFIPSCode['FIPS State Code'].astype(int)
GovFinData = GovFinData.drop(columns=['State']).merge(StateFIPSCode,on=['FIPS State Code'])
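
The ID normalization in this cell could also be wrapped in a small helper, which makes it easy to test on a few values. A minimal sketch; `normalize_gov_id` and the toy IDs are hypothetical and not part of the survey files.

```python
import pandas as pd

def normalize_gov_id(ids: pd.Series) -> pd.Series:
    """Drop a trailing '00000' block, then cast to int (which drops leading zeros)."""
    ids = ids.astype(str)
    has_zero_suffix = ids.str[-5:] == '00000'
    # Keep IDs without the suffix as they are; trim the others by five characters
    trimmed = ids.where(~has_zero_suffix, ids.str[:-5])
    return trimmed.astype('int64')

# Hypothetical IDs: one with the zero suffix, two without
toy_ids = pd.Series(['01100100100000', '051001001', '12345678900001'])
normalized = normalize_gov_id(toy_ids)
```

Only the first toy ID is shortened; the other two keep all their digits and only lose a leading zero, if any.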


In [7]:
#-------------------------------#
# Handle change of ID post 2017 #
#-------------------------------#

# IDs changed completely in 2017. Construct an old-to-new mapping by matching names in the 2012 and 2017 data
GovFinData_2012 = GovFinData[GovFinData['Year4']==2012]
GovFinData_2017 = GovFinData[GovFinData['Year4']==2017]
GovFinData_2012 = GovFinData_2012[['ID','FIPS State Code','FIPS County Code','Name']]
GovFinData_2012 = GovFinData_2012.rename(columns={'ID':'ID Old'})
GovFinData_2017 = GovFinData_2017[['ID','FIPS State Code','FIPS County Code','Name']]

GovFinData_2017 = GovFinData_2017.merge(GovFinData_2012,on=['FIPS State Code','FIPS County Code','Name'])
GovFinData_2017 = GovFinData_2017.drop(columns=['Name'])
GovFinData_2017['ID New'] = GovFinData_2017['ID']

# In the following merge, "both" marks existing entities that received new IDs in 2017; "left_only" marks
# rows with no mapping, including entities that first appear in 2017 or later
GovFinData = GovFinData.merge(GovFinData_2017,on=['ID','FIPS State Code','FIPS County Code'],
    how='outer',indicator=True)

GovFinData.loc[~pd.isnull(GovFinData['ID Old']),'ID'] = \
    GovFinData[~pd.isnull(GovFinData['ID Old'])]['ID Old']

GovFinData = GovFinData.drop(columns=['_merge'])

# Such duplicates are very rare in the sample
GovFinData = GovFinData.drop_duplicates(['ID','Year4'])
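
The name-based crosswalk can be illustrated on a toy panel. This is a simplified sketch under stated assumptions: the entities and IDs are made up, the final merge uses `ID` alone rather than `ID` plus the FIPS codes, and `validate='one_to_one'` is an added safeguard that is not in the cell above.

```python
import pandas as pd

# Hypothetical panel: ALPHA CITY is renumbered in 2017, GAMMA VILLAGE is new in 2017
panel = pd.DataFrame({
    'ID': [101, 202, 900, 901],
    'Year4': [2012, 2012, 2017, 2017],
    'FIPS State Code': [1, 1, 1, 1],
    'FIPS County Code': [1, 3, 1, 5],
    'Name': ['ALPHA CITY', 'BETA TOWN', 'ALPHA CITY', 'GAMMA VILLAGE'],
})
keys = ['FIPS State Code', 'FIPS County Code', 'Name']

old = panel.loc[panel['Year4'] == 2012, ['ID'] + keys].rename(columns={'ID': 'ID Old'})
new = panel.loc[panel['Year4'] == 2017, ['ID'] + keys]

# Inner merge on state, county, and name; fail loudly if a name matches more than once
crosswalk = new.merge(old, on=keys, validate='one_to_one')[['ID', 'ID Old']]

# Replace new IDs by old IDs where a mapping exists, keep the ID otherwise
linked = panel.merge(crosswalk, on='ID', how='left')
linked['ID'] = linked['ID Old'].fillna(linked['ID']).astype('int64')
linked = linked.drop(columns=['ID Old'])
```

With `validate`, ambiguous name matches raise an error instead of silently creating duplicate rows that would later need `drop_duplicates`.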

In [8]:
# Create "short-term debt, beginning of year" based on end-of-year variable of the previous year

GovFinData_m1 = GovFinData.copy()
GovFinData_m1['Year4'] = GovFinData_m1['Year4']+1
GovFinData_m1 = GovFinData_m1[['ID','Year4','ST Debt-End of Year']]
GovFinData_m1 = GovFinData_m1.rename(columns={'ST Debt-End of Year':'ST Debt-Beginning of Year'})
GovFinData = GovFinData.merge(GovFinData_m1,on=['ID','Year4'],how='outer',indicator=True)
GovFinData = GovFinData[GovFinData['_merge']!='right_only']
GovFinData = GovFinData.drop(columns=['_merge'])
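
The lag construction above amounts to a self-merge on a copy whose year is shifted forward by one. A minimal sketch on hypothetical data, using `how='left'` in place of the outer merge followed by the `right_only` filter:

```python
import pandas as pd

# Hypothetical panel with a gap: entity 1 has no 2012 record
toy = pd.DataFrame({
    'ID': [1, 1, 1, 2],
    'Year4': [2010, 2011, 2013, 2011],
    'ST Debt-End of Year': [5.0, 7.0, 9.0, 3.0],
})

# Shift the year forward so last year's end-of-year value lines up with this year
prev = toy[['ID', 'Year4', 'ST Debt-End of Year']].copy()
prev['Year4'] = prev['Year4'] + 1
prev = prev.rename(columns={'ST Debt-End of Year': 'ST Debt-Beginning of Year'})

lagged = toy.merge(prev, on=['ID', 'Year4'], how='left')
```

Because the match is on the exact year, the 2013 row of entity 1 gets a missing beginning-of-year value instead of inheriting the 2011 figure, which a plain `groupby('ID').shift()` would assign.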

## 3.1 State Gov Data

In [9]:
# Export state level data
StateGovFinData = GovFinData[(GovFinData['Type Code']==0)|(GovFinData['Type Code']=='0')]
StateGovFinData.to_csv('../CleanData/GovFinSurvey/0G_StateGovFinData.csv')

## 3.2 Local Gov Data

In [10]:
###############
# Import CBSA #
###############

%run -i SCRIPT_us_states.py

# "CBSA" (core-based statistical area) covers both metropolitan and micropolitan areas; "CSA" (combined statistical area) groups adjacent CBSAs
CBSAData = pd.read_excel("../RawData/MSA/CBSA.xlsx",skiprows=[0,1])
CBSAData = CBSAData[~pd.isnull(CBSAData['County/County Equivalent'])]

# Add state abbreviations
us_state_to_abbrev = pd.DataFrame.from_dict(us_state_to_abbrev,orient='index').reset_index()
us_state_to_abbrev.columns = ['State Name','State']
CBSAData = CBSAData.rename(columns={'County/County Equivalent':'County'})
CBSAData = CBSAData.merge(us_state_to_abbrev,on='State Name',how='outer',indicator=True)
CBSAData = CBSAData[CBSAData['_merge']=='both'].drop(columns=['_merge'])
# Merge is perfect
CBSAData['County'] = CBSAData['County'].str.upper()
CBSAData['County'] = CBSAData['County'].str.replace(' COUNTY','',regex=False)
CBSAData['County'] = CBSAData['County'].str.replace(' AND ',' & ',regex=False)
CBSAData['County'] = CBSAData['County'].str.replace('.','',regex=False)
CBSAData['CSA Code'] = CBSAData['CSA Code'].astype(float)
CBSAData['CBSA Code'] = CBSAData['CBSA Code'].astype(float)

# Merge CBSA into Gov Fin data
GovFinData['County'] = GovFinData['County'].str.upper()
GovFinData['County'] = GovFinData['County'].str.replace(' COUNTY','',regex=False)
GovFinData['County'] = GovFinData['County'].str.replace(' AND ',' & ',regex=False)
GovFinData['County'] = GovFinData['County'].str.replace('.','',regex=False)
GovFinData = GovFinData.merge(CBSAData,on=['State','County'])
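
The same four cleaning steps are applied to the county names in both tables, so they could live in one helper to keep the two sides of the merge in sync. A minimal sketch; `normalize_county` and the sample names are hypothetical.

```python
import pandas as pd

def normalize_county(names: pd.Series) -> pd.Series:
    """Upper-case, drop the ' COUNTY' suffix, use '&' for 'AND', and strip periods."""
    return (names.str.upper()
                 .str.replace(' COUNTY', '', regex=False)
                 .str.replace(' AND ', ' & ', regex=False)
                 .str.replace('.', '', regex=False))

toy_counties = pd.Series(['St. Louis County', 'King and Queen County', 'Orleans Parish'])
cleaned = normalize_county(toy_counties)
```

Names that do not end in "County", such as parishes, pass through with only the case change, so they match only if both sources spell them the same way.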

In [11]:
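# Adjust for inflation: "scaler" is the cumulative price growth from each year
# to the latest year in the series (latest year = 1)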
FPCPITOTLZGUSA = pd.read_csv("../RawData/StLouisFed/FPCPITOTLZGUSA.csv")
FPCPITOTLZGUSA['year'] = FPCPITOTLZGUSA['DATE'].str[:4].astype(int)
FPCPITOTLZGUSA = FPCPITOTLZGUSA.sort_values('year',ascending=False).reset_index(drop=True)
scaler = 1
FPCPITOTLZGUSA['scaler'] = None
for idx,row in FPCPITOTLZGUSA.iterrows():
    if idx==0:
        FPCPITOTLZGUSA.at[idx,'scaler'] = 1
    else:
        scaler = scaler*(FPCPITOTLZGUSA.at[idx-1,'FPCPITOTLZGUSA']/100+1)
        FPCPITOTLZGUSA.at[idx,'scaler'] = scaler
FPCPITOTLZGUSA = FPCPITOTLZGUSA[['scaler','year']]
# Append 2023 by hand, using a 3.2% inflation rate for that year
FPCPITOTLZGUSA = pd.concat([FPCPITOTLZGUSA,pd.DataFrame([{'scaler':1/(1+3.2/100),'year':2023}])])

GovFinData = GovFinData.merge(FPCPITOTLZGUSA.rename(columns={'year':'Year4'}),on='Year4')

GovFinData.to_csv('../CleanData/GovFinSurvey/0G_GovFinData.csv')
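
The row-by-row loop that builds the scaler can also be written as a shifted cumulative product. A minimal sketch on made-up inflation rates (not real CPI data); the column name `inflation_pct` is hypothetical and stands in for the `FPCPITOTLZGUSA` column.

```python
import pandas as pd

# Hypothetical annual inflation rates in percent
cpi = pd.DataFrame({'year': [2020, 2021, 2022], 'inflation_pct': [1.0, 5.0, 8.0]})
cpi = cpi.sort_values('year', ascending=False).reset_index(drop=True)

# Growth factor of each year; the latest year is the base, so its scaler is 1
growth = 1 + cpi['inflation_pct'] / 100

# A year's scaler is the product of the growth factors of all later years
cpi['scaler'] = growth.shift(1, fill_value=1.0).cumprod()
```

This gives the same values as the loop and keeps the column as a float rather than the object dtype that results from initializing it with `None`.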