Content for README #74

Closed
jGaboardi opened this issue Jul 30, 2020 · 0 comments

See also #73 and Jonathan's note here.


Documentation for NHGIS crosswalks from block group parts to later units

NHGIS crosswalk from 1990 to 2010 census blocks with GISJOIN identifiers

 
Contents


Data Summary

 
Each NHGIS crosswalk file provides interpolation weights for allocating census counts from a specified
set of source zones to a specified set of target zones. Each record in the crosswalk represents a spatial
intersection between a single source zone and a single target zone.

File naming scheme:  nhgis_[source geog][source year]_[target geog][target year]{_state FIPS}.csv

Geographic unit codes:
      blk - Block
      bgp - Block group part (intersections between block groups, places, county subdivisions, etc.)
      bg - Block group
      tr - Census tract
      co - County
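
As a rough illustration of the naming scheme above, a file name can be split into its parts with a regular expression. This is only a sketch: it assumes four-digit years and a two-digit state FIPS suffix, and the function name `parse_crosswalk_name` and the example file names are hypothetical, not part of NHGIS.

```python
import re
from typing import NamedTuple, Optional

# Pattern for: nhgis_[source geog][source year]_[target geog][target year]{_state FIPS}.csv
# Assumption: years have four digits and the optional state FIPS has two.
_NAME_RE = re.compile(
    r"^nhgis_(?P<src_geog>blk|bgp|bg|tr|co)(?P<src_year>\d{4})"
    r"_(?P<tgt_geog>blk|bgp|bg|tr|co)(?P<tgt_year>\d{4})"
    r"(?:_(?P<state>\d{2}))?\.csv$"
)


class CrosswalkName(NamedTuple):
    source_geog: str
    source_year: int
    target_geog: str
    target_year: int
    state_fips: Optional[str]  # None when the file name has no state suffix


def parse_crosswalk_name(filename: str) -> CrosswalkName:
    """Split a crosswalk file name into its source/target parts."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a recognized crosswalk file name: {filename!r}")
    return CrosswalkName(
        match["src_geog"],
        int(match["src_year"]),
        match["tgt_geog"],
        int(match["tgt_year"]),
        match["state"],
    )


# Hypothetical file names, used only to exercise the pattern.
print(parse_crosswalk_name("nhgis_blk1990_blk2010_10.csv"))
print(parse_crosswalk_name("nhgis_bgp1990_bg2010.csv"))
```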


--> Remainder copied from the 1990 block to 2010 block GISJOIN crosswalk readme <--

--> Must edit <--


Content:

  • The top row is a header row
  • Each subsequent row represents a potential intersection between a 1990 block and a 2010 block
  • The GJOIN1990 and GJOIN2010 fields contain NHGIS-standard GISJOIN block identifiers:
    • A block GISJOIN is a concatenation of:
      • "G"
      • State NHGIS code: 3 digits (FIPS + "0")
      • County NHGIS code: 4 digits (FIPS + "0")
      • Census tract code: 4 or 6 digits in 1990; 6 digits in 2010
      • Census block code: 3 or 4 digits in 1990; 4 digits in 2010
    • The GJOIN1990 field contains numerous blank values. These represent cases where the only 1990 blocks intersecting the corresponding 2010 block are offshore, lying in coastal or Great Lakes waters, which are excluded from NHGIS's block boundary files. None of the missing 1990 blocks had any reported population or housing units. The blank values are included here to ensure that all 2010 blocks are represented in the file.
  • The WEIGHT field contains the interpolation weights NHGIS uses to allocate portions of 1990 block counts to 2010 blocks for geographically standardized time series tables
  • The PAREA_VIA_BLK00 field contains the approximate portion of the 1990 block's land* area lying in the 2010 block, based on the intersections that the 1990 and 2010 blocks each have with 2000 blocks in the 2000 and 2010 TIGER/Line files (i.e., indirect overlay via 2000 blocks).
    • If a 1990 block's area is entirely water, then this value is based on the block's total area including water
    • NHGIS uses these values to compute lower and upper bounds on 1990 estimates: for any record with a value greater than 0 and less than 1, it is assumed that either all or none of the 1990 block's characteristics could be located in the corresponding 2010 block.
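
The GISJOIN concatenation described in the list above can be sketched as a small helper. The function name `block_gisjoin` is hypothetical, and the tract and block codes in the example are made up for illustration. The length checks follow the digit counts listed above.

```python
def block_gisjoin(state_fips: str, county_fips: str, tract: str, block: str) -> str:
    """Build a block GISJOIN: "G" + state FIPS + "0" + county FIPS + "0" + tract + block."""
    if len(state_fips) != 2:
        raise ValueError("state FIPS code should have 2 digits")
    if len(county_fips) != 3:
        raise ValueError("county FIPS code should have 3 digits")
    if len(tract) not in (4, 6):
        raise ValueError("tract code should have 4 or 6 digits")
    if len(block) not in (3, 4):
        raise ValueError("block code should have 3 or 4 digits")
    return f"G{state_fips}0{county_fips}0{tract}{block}"


# Illustrative codes only: a 2010-style ID (6-digit tract, 4-digit block)
# and a 1990-style ID (4-digit tract, 3-digit block).
print(block_gisjoin("27", "053", "000101", "1000"))
print(block_gisjoin("27", "053", "0101", "101"))
```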

Notes

NHGIS uses this crosswalk to generate 1990 data standardized to 2010 census units for NHGIS time series tables. Complete documentation on the interpolation model used to generate the weights in the crosswalk is provided at https://www.nhgis.org/documentation/time-series/1990-blocks-to-2010-geog.
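
To make the use of the crosswalk concrete, here is a minimal sketch of how the WEIGHT and PAREA_VIA_BLK00 fields might be applied to a table of 1990 block counts. It is not NHGIS's production code. The block IDs and counts are invented, the function names are hypothetical, and the bounds logic reflects one reading of the field description: a 1990 block counts toward the lower bound only when it lies entirely in the 2010 block, and toward the upper bound whenever any of it does.

```python
import csv
import io


def allocate_counts(rows, counts_1990):
    """Estimate 2010-block counts as the sum of WEIGHT * 1990 count per record."""
    estimates = {}
    for row in rows:
        target = row["GJOIN2010"]
        estimates.setdefault(target, 0.0)  # keep 2010 blocks with no 1990 source
        source = row["GJOIN1990"]
        if source:  # blank GJOIN1990 means only offshore 1990 blocks intersect
            estimates[target] += float(row["WEIGHT"]) * counts_1990.get(source, 0)
    return estimates


def count_bounds(rows, counts_1990):
    """Return {2010 block: (lower, upper)} under the all-or-none assumption."""
    bounds = {}
    for row in rows:
        target = row["GJOIN2010"]
        lower, upper = bounds.get(target, (0.0, 0.0))
        source = row["GJOIN1990"]
        if source:
            share = float(row["PAREA_VIA_BLK00"])
            count = counts_1990.get(source, 0)
            if share >= 1.0:  # 1990 block entirely inside the 2010 block
                lower += count
            if share > 0.0:   # any overlap: all of the count could be here
                upper += count
        bounds[target] = (lower, upper)
    return bounds


# Toy crosswalk with made-up identifiers; the last record has a blank GJOIN1990.
SAMPLE = "\n".join([
    "GJOIN1990,GJOIN2010,WEIGHT,PAREA_VIA_BLK00",
    "A,X,1,1",
    "B,X,0.25,0.4",
    "B,Y,0.75,0.6",
    ",Z,,",
])
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = {"A": 100, "B": 40}

print(allocate_counts(rows, counts))
print(count_bounds(rows, counts))
```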

In short, the model is based on "cascading density weighting", as introduced in Chapter 3 of Jonathan Schroeder's dissertation (Visualizing Patterns in U.S. Urban Population Trends, University of Minnesota) available here: http://hdl.handle.net/11299/48076.

The general sequence of operations:

  1. Estimate 2000 population and housing unit counts for each intersection between 2000 and 2010 blocks.
    • Our basic "cascading density weighting" model does this by allocating 2000 counts among 2010 blocks in proportion to 2010 block population and housing densities (population and housing summed together).
    • We use this basic approach only for 2000 blocks that are not split by the boundaries of a 2010 target unit, where "target units" are the areas for which NHGIS plans to release standardized data: block groups, places, county subdivisions, school districts, ZCTAs, urban areas, congressional districts (111th and 113th), and any units that can be constructed from these (e.g., census tracts, counties, etc.).
    • For 2000 blocks that are split by the boundaries of a 2010 target unit, we use NHGIS's more advanced hybrid interpolation model (see https://www.nhgis.org/documentation/time-series/2000-blocks-to-2010-geog) to allocate 2000 counts among 2010 blocks.
  2. Use the estimated 2000 population and housing unit densities from step 1 to guide the allocation of 1990 counts among 1990-2000-2010 block intersections.
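
The basic density-weighted allocation in step 1 can be sketched as follows. This is a simplified illustration, not the NHGIS implementation. It assumes each piece of a 2000 block is weighted by its area times the density (population plus housing units per unit area) of the 2010 block it falls in, and it falls back to plain area weighting when every density is zero. The `Piece` type, the function name, and the fallback rule are all assumptions made for the example.

```python
from typing import Dict, List, NamedTuple


class Piece(NamedTuple):
    """One intersection between a single 2000 block and a 2010 block."""
    block2010: str
    piece_area: float      # area of the 2000-2010 intersection
    block2010_area: float  # total area of the 2010 block (assumed > 0)
    pop2010: int
    hu2010: int


def allocate_2000_count(count_2000: float, pieces: List[Piece]) -> Dict[str, float]:
    """Split one 2000 block's count among its pieces in proportion to
    piece area * 2010 density, where density = (pop + housing) / block area."""
    expected = [
        p.piece_area * (p.pop2010 + p.hu2010) / p.block2010_area for p in pieces
    ]
    total = sum(expected)
    if total == 0:
        # Assumed fallback: no 2010 population or housing, so weight by area.
        expected = [p.piece_area for p in pieces]
        total = sum(expected)
    return {p.block2010: count_2000 * e / total for p, e in zip(pieces, expected)}


# Made-up example: a 2000 block with 90 people split across two 2010 blocks.
pieces = [
    Piece("X", piece_area=2.0, block2010_area=4.0, pop2010=60, hu2010=20),
    Piece("Y", piece_area=1.0, block2010_area=1.0, pop2010=15, hu2010=5),
]
print(allocate_2000_count(90, pieces))
```

Step 2 then applies the same idea one level up, using the estimated 2000 densities to guide the allocation of 1990 counts.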

The procedure also combines two types of overlay to model intersections between 1990, 2000, and 2010 blocks:

  1. "Direct overlay" of 1990 & 2000 block polygons from 2000 TIGER/Line files with 2000 & 2010 block polygons from 2010 TIGER/Line files (with a preliminary step to georectify Hawaii's 2000 TIGER polygons to 2010 TIGER features in order to accommodate a systematic change in the coordinate system used to represent Hawaii features between the two TIGER versions)
  2. "Indirect overlay":
    a. Overlay 1990 & 2000 block polygons using the 2000 TIGER/Line basis
    b. Overlay 2000 & 2010 block polygons using the 2010 TIGER/Line basis
    c. Multiply 1990-2000 intersection proportions from step 2a with 2000-2010 proportions from step 2b to compute estimated proportions of each 1990 block within each 2010 block. (This is how the crosswalk's "PAREA_VIA_BLK00" values are derived.)
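
Step 2c can be illustrated with a short sketch. It assumes the 1990-2000 proportions are shares of each 1990 block's area and the 2000-2010 proportions are shares of each 2000 block's area, and that products are summed over every 2000 block linking a given 1990-2010 pair. The function name and block IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Pair = Tuple[str, str]


def indirect_proportions(
    p90_00: Dict[Pair, float], p00_10: Dict[Pair, float]
) -> Dict[Pair, float]:
    """Chain 1990->2000 and 2000->2010 proportions into 1990->2010 proportions."""
    # Index the 2000->2010 proportions by 2000 block for quick lookup.
    by_block00 = defaultdict(list)
    for (b00, b10), share in p00_10.items():
        by_block00[b00].append((b10, share))

    result = defaultdict(float)
    for (b90, b00), share_90_00 in p90_00.items():
        for b10, share_00_10 in by_block00.get(b00, []):
            # Multiply the two proportions; sum over intermediate 2000 blocks.
            result[(b90, b10)] += share_90_00 * share_00_10
    return dict(result)


# Made-up example: 1990 block A is split evenly between 2000 blocks m and n.
p90_00 = {("A", "m"): 0.5, ("A", "n"): 0.5}
p00_10 = {("m", "X"): 1.0, ("n", "X"): 0.5, ("n", "Y"): 0.5}
print(indirect_proportions(p90_00, p00_10))
```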

The direct overlay weights are constrained to eliminate any 1990-2010 intersections that are not valid in the indirect overlay. This prevents most "slivers" (invalid intersections caused by changes in TIGER feature representations) from being assigned any weight.
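
A minimal sketch of this constraint, under stated assumptions: direct-overlay weights are kept only for 1990-2010 pairs that also appear with a positive proportion in the indirect overlay. The rescaling of the surviving weights so that each 1990 block's weights sum to 1 is an assumption added for the example, not something the text above specifies. The function name is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (1990 block, 2010 block)


def constrain_direct(
    direct: Dict[Pair, float], indirect: Dict[Pair, float]
) -> Dict[Pair, float]:
    """Drop direct-overlay pairs that are not valid in the indirect overlay."""
    kept = {pair: w for pair, w in direct.items() if indirect.get(pair, 0.0) > 0.0}

    # Assumed step: rescale so each 1990 block's remaining weights sum to 1.
    totals = defaultdict(float)
    for (b90, _b10), w in kept.items():
        totals[b90] += w
    return {
        pair: w / totals[pair[0]] for pair, w in kept.items() if totals[pair[0]] > 0
    }


# Made-up example: the A-Z intersection is a sliver absent from the indirect overlay.
direct = {("A", "X"): 0.72, ("A", "Y"): 0.24, ("A", "Z"): 0.04}
indirect = {("A", "X"): 0.75, ("A", "Y"): 0.25}
constrained = constrain_direct(direct, indirect)
print(constrained)
print(math.isclose(sum(constrained.values()), 1.0))
```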

The final weighting blends the weights from constrained direct overlay (CDO) and indirect overlay (IO) through a weighted average. CDO receives high weight (and IO low weight) where the two TIGER/Line representations of a 2000 block align well and where the 1990-2000 block intersection and the 2000-2010 block intersection both comprise less than the entirety of the 2000 block. Where the block intersections cover the entirety of a 2000 block, or where the block intersection from one TIGER/Line version has no valid intersection with the corresponding 2000 block in the other TIGER/Line version, the weighting is based on IO alone.
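
The blending step amounts to a weighted average of the two sets of weights. The sketch below shows only that arithmetic. The `cdo_share` parameter stands in for however the model scores TIGER/Line alignment, which is not described here, so its derivation is left out and the name is hypothetical. Setting `cdo_share` to 0 reproduces the IO-only case.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (1990 block, 2010 block)


def blend_weights(
    cdo: Dict[Pair, float], io: Dict[Pair, float], cdo_share: float
) -> Dict[Pair, float]:
    """Weighted average of CDO and IO weights; cdo_share is the weight on CDO."""
    if not 0.0 <= cdo_share <= 1.0:
        raise ValueError("cdo_share must lie between 0 and 1")
    pairs = set(cdo) | set(io)
    return {
        pair: cdo_share * cdo.get(pair, 0.0) + (1.0 - cdo_share) * io.get(pair, 0.0)
        for pair in pairs
    }


# Made-up weights for one 1990 block split across two 2010 blocks.
cdo = {("A", "X"): 0.5, ("A", "Y"): 0.5}
io = {("A", "X"): 0.75, ("A", "Y"): 0.25}
print(blend_weights(cdo, io, cdo_share=0.5))
print(blend_weights(cdo, io, cdo_share=0.0))  # IO alone
```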

 

Citation and Use

 
All persons are granted a limited license to use this documentation and the
accompanying data, subject to the following conditions:

  • Publications and research reports employing NHGIS data must cite it appropriately. The citation should include the following:

    Steven Manson, Jonathan Schroeder, David Van Riper, and Steven Ruggles. 
    IPUMS National Historical Geographic Information System: Version 12.0 [Database]. 
    Minneapolis: University of Minnesota. 2017. 
    http://doi.org/10.18128/D050.V12.0

  • For policy briefs or articles in the popular press, we recommend that you cite the use of NHGIS data as follows:

    IPUMS NHGIS, University of Minnesota, www.nhgis.org.

In addition, we request that users send us a copy of any publications, research
reports, or educational material making use of the data or documentation.
Printed matter should be sent to:

    IPUMS NHGIS
    Minnesota Population Center
    University of Minnesota
    50 Willey Hall
    225 19th Ave S
    Minneapolis, MN 55455

Send electronic material to: nhgis@umn.edu

@jGaboardi jGaboardi added documentation Improvements or additions to documentation data product labels Jul 30, 2020
@jGaboardi jGaboardi added this to the v0.0.8 milestone Jul 30, 2020
@jGaboardi jGaboardi self-assigned this Jul 30, 2020
@jGaboardi jGaboardi added this to In progress in v0.0.8 Jul 30, 2020
@jGaboardi jGaboardi moved this from In progress to Done in v0.0.8 Aug 2, 2020