Visit healthyregions.org to learn about our recent projects and team. Here on GitHub you can explore all of our open source projects and access some general data resources that we use internally.
HEROP_IDs
In some of our projects we use what we call a HEROP_ID to identify geographic boundaries defined by the US Census Bureau. The HEROP_ID is a slight variation on the commonly used standard GEOID, and its format is similar to the one American FactFinder used (now replaced by data.census.gov).
A HEROP_ID consists of three parts:
- The 3-digit Summary Level Code for this geography. Common summary level codes are:
  - 040: State
  - 050: County
  - 140: Census Tract
  - 150: Census Block Group
  - 860: Zip Code Tabulation Area (ZCTA)
- The 2-letter string "US"
- The standard GEOID for the given unit (its length depends on the summary level)
  - GEOIDs are, in turn, hierarchical concatenations of FIPS codes
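As a minimal sketch of how the three parts fit together, the Python below joins a summary level code, the string "US", and a GEOID. The SUMMARY_LEVELS keys and the make_herop_id name are hypothetical labels chosen for this example, not part of any HEROP library. The GEOID must be passed as a string so that leading zeros survive.

```python
# Summary level codes from the list above; the short keys are hypothetical labels.
SUMMARY_LEVELS = {
    "state": "040",
    "county": "050",
    "tract": "140",
    "block_group": "150",
    "zcta": "860",
}


def make_herop_id(level: str, geoid: str) -> str:
    """Build a HEROP_ID: summary level code + "US" + GEOID.

    The GEOID must be a string of digits so leading zeros are kept.
    """
    if not isinstance(geoid, str) or not geoid.isdigit():
        raise ValueError(f"GEOID must be a string of digits, got {geoid!r}")
    return SUMMARY_LEVELS[level] + "US" + geoid


print(make_herop_id("state", "17"))           # 040US17
print(make_herop_id("county", "17019"))       # 050US17019
print(make_herop_id("tract", "17019005900"))  # 140US17019005900
print(make_herop_id("zcta", "61801"))         # 860US61801
```

Passing an integer such as 17019 raises a ValueError instead of silently producing an ID that may have lost a leading zero.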
Expanding out the FIPS codes for the five summary levels shown above, the full IDs would look like:
| summary level | format | length (characters) | example |
|---|---|---|---|
| State | 040US + STATE (2) | 7 | 040US17 (Illinois) |
| County | 050US + STATE (2) + COUNTY (3) | 10 | 050US17019 (Champaign County) |
| Tract | 140US + STATE (2) + COUNTY (3) + TRACT (6) | 16 | 140US17019005900 |
| Block Group | 150US + STATE (2) + COUNTY (3) + TRACT (6) + BLOCK GROUP (1) | 17 | 150US170190059002 |
| ZCTA | 860US + ZIP CODE (5) | 10 | 860US61801 |
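The table can also be read in reverse. This sketch slices a HEROP_ID into its FIPS components and checks the GEOID length against the widths in the table. The LAYOUTS mapping, the split_herop_id name, and the component keys are illustrative labels of our own, and only the five summary levels above are covered.

```python
# FIPS component widths per summary level, taken from the table above.
LAYOUTS = {
    "040": [("state", 2)],
    "050": [("state", 2), ("county", 3)],
    "140": [("state", 2), ("county", 3), ("tract", 6)],
    "150": [("state", 2), ("county", 3), ("tract", 6), ("block_group", 1)],
    "860": [("zcta", 5)],
}


def split_herop_id(herop_id: str) -> dict:
    """Break a HEROP_ID into its summary level and FIPS components."""
    level, prefix, geoid = herop_id[:3], herop_id[3:5], herop_id[5:]
    if prefix != "US" or level not in LAYOUTS:
        raise ValueError(f"not a recognized HEROP_ID: {herop_id!r}")
    layout = LAYOUTS[level]
    expected = sum(width for _, width in layout)
    if len(geoid) != expected:
        raise ValueError(
            f"{herop_id!r}: expected a {expected}-digit GEOID for level {level}"
        )
    parts = {"summary_level": level}
    pos = 0
    for name, width in layout:
        # Each component is a fixed-width slice of the GEOID.
        parts[name] = geoid[pos:pos + width]
        pos += width
    return parts


print(split_herop_id("150US170190059002"))
# {'summary_level': '150', 'state': '17', 'county': '019', 'tract': '005900', 'block_group': '2'}
```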
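The advantages of this composite ID are:
- Unique across all geographic areas in the US, whereas a plain GEOID can repeat across summary levels (counties and ZCTAs both use 5 digits)
- Always treated as text, because the "US" characters prevent spreadsheets and other tools from converting it to a number and dropping leading zeros
- Easy to convert back into the more standard GEOIDs programmatically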
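To illustrate the second point, the hypothetical infer_type function below imitates the kind of type guessing that spreadsheets and CSV readers (for example pandas.read_csv without an explicit dtype) apply to text values. It is a simplification for demonstration, not any tool's actual logic.

```python
def infer_type(value: str):
    """Hypothetical stand-in for automatic type detection: digits become ints."""
    try:
        return int(value)
    except ValueError:
        return value


geoid = "01001"          # a county GEOID with a leading zero
herop_id = "050US01001"  # the same county as a HEROP_ID

print(infer_type(geoid))     # 1001 (leading zero lost)
print(infer_type(herop_id))  # 050US01001 (stays a string)

# Recovering the plain GEOID requires knowing its width in advance.
print(str(infer_type(geoid)).zfill(5))  # 01001
```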
Convert to GEOID (integers)
The HEROP_ID can be converted back to a standard GEOID by removing the first 5 characters, or by taking everything after the substring "US". Here are some examples of what this looks like in different software:
- Excel (with the HEROP_ID in cell A1): =REPLACE(A1, 1, 5, "")
- R (requires the stringr package, version 1.5.0 or later): geoid <- stringr::str_split_i(HEROP_ID, "US", -1)
- Python: geoid = HEROP_ID.split("US")[1]
- JavaScript: const geoid = HEROP_ID.split("US")[1]
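For bulk conversion, a small Python sketch along these lines checks for the "US" separator before slicing and keeps the result as a string. The to_geoid name and the sample CSV columns are hypothetical. If you really need integer GEOIDs, int() can be applied afterward, but any leading zeros (such as in state FIPS codes 01 through 09) will be dropped.

```python
import csv
import io


def to_geoid(herop_id: str) -> str:
    """Return the standard GEOID portion of a HEROP_ID, as a string."""
    if herop_id[3:5] != "US":
        raise ValueError(f"not a HEROP_ID: {herop_id!r}")
    # For the formats above, dropping the first 5 characters and splitting
    # on "US" give the same result, since the GEOID itself is all digits.
    return herop_id[5:]


# Hypothetical sample data; every value read by csv stays a string.
data = io.StringIO("HEROP_ID,value\n050US17019,1\n050US01001,2\n")
rows = list(csv.DictReader(data))
for row in rows:
    row["GEOID"] = to_geoid(row["HEROP_ID"])

print([row["GEOID"] for row in rows])  # ['17019', '01001']
```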
Download Census geography files
Within the backend of our OEPS project we have an ETL pipeline that merges, transforms, and exports data files from the US Census Bureau into several geospatial data formats. There are two categories of files:
- Cartographic Boundary files have simplified geometries, which makes them ideal for mapping applications
  - We typically use the 500k (1:500,000) scale files, though the Census Bureau publishes other scales as well
- TIGER/Line Shapefiles have official, unsimplified geometries and should be used for geospatial analysis
  - We don't have these in the pipeline yet, but hope to add them eventually
Feel free to download and use these files for your own projects. They are available in the following formats:
- GeoJSON: a simple plain-text format that is good for small to medium-sized datasets and can be used in a wide variety of web and desktop software
- Shapefiles: used in scripting and desktop software for performant display and analysis
  - Tip: geopandas should allow you to open remote zip files directly, with something like this:

        import geopandas as gpd
        states = gpd.read_file("/vsizip//vsicurl/https://herop-geodata.s3.us-east-2.amazonaws.com/oeps/state-2010-shp.zip")

- PMTiles: a "cloud-native" vector format that is very fast in the right web mapping environment
Note: We don't yet have ZCTA and Place geographies for 2010.