How does GEOID work for larger shapes (CRA, NEIGHBO, CITY)? #5

Open
famulare opened this issue Apr 27, 2019 · 11 comments
Labels
question Further information is requested

Comments

@famulare
Member

For the higher-level shapes, is GEOID a correct field? Or is it just the first census tract for each CRA_NAME, etc? I ask because I don't think CRA_NAME has a GEOID (census.gov), so I'm not sure what's going on there.

If the GEOIDs aren't official census labels, they should be removed from the files.

@famulare famulare added the question Further information is requested label Apr 27, 2019
@jameshadfield
Member

Good info! Side note: for the viz prototype, I based things on GEOID because the example results file had residence_census_tract (which matched GEOID). The census-tract-level geoJSON defined CRA_NAM and NEIGHBO for each GEOID, which conveniently matched the geoJSONs for those resolutions.

@famulare
Member Author

@jameshadfield Are you saying that the viz prototype

  1. matches residence_census_tract to GEOID in the censusTracts.geojson,
  2. uses censusTracts.geojson to look up the PUMA, CRA_NAM, or NEIGHBO as needed,
  3. aggregates data at the CRA_NAM (or NEIGHBO or PUMA) level, and
  4. uses the CRA_NAME.geojson (or etc.) to plot?
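
Steps 1 to 3 can be sketched roughly as follows. This is only an illustration, not the viz prototype's actual code: the tiny in-memory FeatureCollection stands in for censusTracts.geojson, the GEOID values and region names are made-up placeholders, and the helper names `build_lookup` and `aggregate_by_region` are hypothetical.

```python
from collections import Counter

# Made-up miniature stand-in for censusTracts.geojson. The GEOID values and
# CRA names below are placeholders, not real census identifiers.
CENSUS_TRACTS = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"GEOID": "00000000001", "CRA_NAM": "Example CRA A"}},
        {"type": "Feature", "geometry": None,
         "properties": {"GEOID": "00000000002", "CRA_NAM": "Example CRA A"}},
        {"type": "Feature", "geometry": None,
         "properties": {"GEOID": "00000000003", "CRA_NAM": "Example CRA B"}},
    ],
}


def build_lookup(tract_geojson, level="CRA_NAM"):
    """Steps 1-2: map each tract GEOID to its higher-level region name."""
    return {
        feature["properties"]["GEOID"]: feature["properties"][level]
        for feature in tract_geojson["features"]
    }


def aggregate_by_region(records, lookup):
    """Step 3: count records per region and track records with no match."""
    counts = Counter()
    unmatched = 0
    for record in records:
        region = lookup.get(record["residence_census_tract"])
        if region is None:
            unmatched += 1
        else:
            counts[region] += 1
    return counts, unmatched


# Hypothetical result rows keyed by residence_census_tract.
records = [
    {"residence_census_tract": "00000000001"},
    {"residence_census_tract": "00000000002"},
    {"residence_census_tract": "00000000003"},
    {"residence_census_tract": "99999999999"},  # not in the tract file
]

lookup = build_lookup(CENSUS_TRACTS)
counts, unmatched = aggregate_by_region(records, lookup)
```

Counting the unmatched records explicitly makes it easy to see how much data, if any, falls outside the tract file. Step 4 would then join `counts` to the higher-level shapes by region name.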

If the viz prototype just looks at the one GEOID in the CRA_NAM shapefile, then it must be losing a lot of data that doesn't match.

Either way, the long-term fix is to do the higher level geocoding in the database itself (@tsibley) as this is a deterministic mapping. So we'd have "residence_cra_name", "residence_puma" etc like in the most recent simulated data.

Short term, we either need to

  • confirm the viz is doing some form of (1-4) and remove the incorrect fields from the higher-level shapes, or
  • make GEOID an array in each higher level shape with all the census tract GEOIDs that belong to a particular higher level shape.
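
For the second option, a minimal sketch of how the GEOID arrays could be built, assuming both files are GeoJSON FeatureCollections and that each tract feature carries the higher-level name in its properties. The sample data is made up and `attach_tract_geoids` is a hypothetical helper, not something in the repo.

```python
from collections import defaultdict

# Made-up tract-level features; GEOIDs and names are placeholders.
tracts = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"GEOID": "00000000002", "CRA_NAM": "Example CRA A"}},
        {"type": "Feature", "geometry": None,
         "properties": {"GEOID": "00000000001", "CRA_NAM": "Example CRA A"}},
        {"type": "Feature", "geometry": None,
         "properties": {"GEOID": "00000000003", "CRA_NAM": "Example CRA B"}},
    ],
}

# Made-up higher-level shapes, each currently holding a single GEOID.
cra_shapes = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"CRA_NAM": "Example CRA A", "GEOID": "00000000001"}},
        {"type": "Feature", "geometry": None,
         "properties": {"CRA_NAM": "Example CRA B", "GEOID": "00000000003"}},
    ],
}


def attach_tract_geoids(tract_geojson, shape_geojson, level="CRA_NAM"):
    """Replace each shape's GEOID with the sorted list of member tract GEOIDs.

    Note: this modifies shape_geojson in place and also returns it.
    """
    members = defaultdict(list)
    for feature in tract_geojson["features"]:
        props = feature["properties"]
        members[props[level]].append(props["GEOID"])

    for feature in shape_geojson["features"]:
        name = feature["properties"][level]
        feature["properties"]["GEOID"] = sorted(members.get(name, []))
    return shape_geojson


updated = attach_tract_geoids(tracts, cra_shapes)
```

A shape with no member tracts ends up with an empty list rather than a stale single GEOID, which makes gaps in the mapping visible.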

@jameshadfield @jotasolano Any preference?

@jameshadfield
Member

@famulare steps 1-4 are pretty much right (but PUMA is never used in the viz).

> If the viz prototype just looks at the one GEOID in the CRA_NAM shapefile, then it must be losing a lot of data that doesn't match.

No, each GEOID entry in the census geoJSON provides a CRA_NAM, which is then matched to the shape file in the cra geoJSON.

Note that it was built around the raw data, which arrived with only a GEOID -- it may be sensible to revisit this for the modelling data which may specify the CRA_NAM directly (for instance).

@tsibley
Member

tsibley commented Apr 29, 2019

> Either way, the long-term fix is to do the higher level geocoding in the database itself (@tsibley) as this is a deterministic mapping. So we'd have "residence_cra_name", "residence_puma" etc like in the most recent simulated data.

Yep, this is the plan, which we'll probably realize in the middle-term future.

@famulare
Member Author

Okay, @jameshadfield. Sounds a bit Rube-Goldberg, but I'm glad it works.

I put Seattle PUMAs in the repo the other day (3e3cc88). I also added Washington State shapes at the PUMA and census tract levels, which we will eventually want to use (and subset) since there's plenty of data from outside Seattle city boundaries.

And yes, the models live at the level of aggregation for now, so those can reference the proper shapefiles directly.

It doesn't sound urgent, but someone should remove the meaningless GEOID fields from the CRA_NAM etc. shapefiles. I don't have time today and don't want to break any viz, but feel free to assign it to me if no one else feels the need to fix it.
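
A minimal sketch of that cleanup, assuming the higher-level shapes are available as GeoJSON FeatureCollections. The sample data is made up and `drop_property` is a hypothetical helper; it works on a copy so the input is left untouched.

```python
import copy

# Made-up higher-level shapes with a leftover GEOID field on each feature.
cra_shapes = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"CRA_NAM": "Example CRA A", "GEOID": "00000000001"}},
        {"type": "Feature", "geometry": None,
         "properties": {"CRA_NAM": "Example CRA B", "GEOID": "00000000003"}},
    ],
}


def drop_property(geojson, key="GEOID"):
    """Return a deep copy of the FeatureCollection without `key` in any feature."""
    cleaned = copy.deepcopy(geojson)
    for feature in cleaned["features"]:
        # GeoJSON allows "properties" to be null, so guard against None.
        props = feature.get("properties") or {}
        props.pop(key, None)
    return cleaned


cleaned_shapes = drop_property(cra_shapes)
```

Since the viz matches the higher-level shapes by CRA_NAM rather than by GEOID (per the discussion above), dropping the field should not affect that join, but it is worth checking against the viz before committing cleaned files.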

@jotasolano
Contributor

I can add this to the list of to-dos in the Informatics Trello board, and then we can assign it once we have a better understanding of the work that needs to be prioritized before May's demo.

@jameshadfield
Member

jameshadfield commented Apr 30, 2019

> Sounds a bit Rube-Goldberg but I'm glad it works.

Given a single GEOID how would you extract CRA_NAM etc?

@famulare
Member Author

> Sounds a bit Rube-Goldberg but I'm glad it works.

That was intended as a comment on the collective state of design of our interacting systems, not a claim that I have a better solution now. Given that CRA_NAM etc. aren't in the database, you have to have a live lookup table, and the ...censusTract.geojson is a good source.

@jotasolano
Contributor

Forgive me if this is already obvious, but do we still need to remove the GEOIDs from the shapefiles/geoJSONs? And, is this something we want before the May deadline? (just trying to prioritize work)

@famulare
Member Author

famulare commented May 1, 2019

Not at all urgent. Fields that don't make sense should be removed, but since we all know this and no one else is using them, I think it can happen whenever. It's probably not worth doing at all until we rethink how we're managing shapefiles anyway (#1).

@jotasolano
Contributor

Thanks @famulare, I've created a card in Trello to keep track of this task there as well.
