Skip to content
Branch: master
Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
..
Failed to load latest commit information.
.ipynb_checkpoints
inputs
outputs
README.md
zbp_to_zcta.ipynb
zbp_to_zcta_2016.ipynb

README.md

ZIP Code Business Patterns

This script is a Python / Pandas notebook that pulls data from the Census Bureau's ZIP Code Business Patterns dataset and aggregates the data from ZIP Codes to ZCTAs, so the data can be mapped using ZCTA features. It was designed to be run iteratively, so that the output of each step of the process can be evaluated. Once data is retrieved from the API it's dumped to a json file, so that if the script has to be rerun from the beginning the API doesn't have to be called again A block of quality control checks appears towards the end.

This script requires an API key from the Census Bureau and two input files: a list of ZCTAs in NYC that was generated from MCDC's MABLE Geocorr tool and modified so that each ZCTA is assigned to one county, and a list of every USPS ZIP Code in the US and the ZCTA it is associated with. These tables are merged to generate a list of every ZIP Code and its affiliated ZCTA in NYC.

Three tables are generated from data pulled from the API: a count of employers, establishments, and wages for every ZIP Code, a count of establishments by two-digit sector NAICS code by ZIP Code, and list of the names of each two-digit sector. The data is filtered to eliminate records outside of NYC and is grouped by ZCTA. Employment data is flagged when aggregated, to indicate the number of establishments for which data is not disclosed, and the industry data is pivoted so that counts of businesses by NAICS are stored in columns. The sum of the industry data is not equal to the total, so an additional field was created to count businesses that were not published for an industry category. At the end of the process the data is written out to a SQLite database, and zeros are replaced with null values; this is simpler to perform with SQL in a DB due to oddities in the way null values are handled in integer columns in Pandas data frames. Values should be regarded as nulls and not zeros as they represent missing data. Data is always written to a test database and evaluated before being inserted into the final database.

Due to changes in the API and the ZBP dataset in 2017, the notebook was revised in several places. The old notebook was renamed for 2016 and should work with older versions of the ZBP (2012 - 2016).

You can’t perform that action at this time.