Additional county-level NY data #201

Merged
6 commits merged into nvkelso:master on Feb 4, 2019

Contributor

saranrapjs commented Jan 4, 2019

Hi! I've eagerly followed this project / the blog posts by Mike Migurski / Planscore, and was inspired to file a FOIL (NY State's FOIA equivalent) to get at some further/better precinct shapes in my state.

The Secretary of State here doesn't control election district data (the individual county boards of elections do), so I went to the next best thing short of phoning up every county: LATFOR, a permanent committee in NY State which "aids the Legislature by providing technical plans for meeting the requirements of legislative timetables for the reapportionment of Senate, Assembly and Congressional districts". After calling them and getting the runaround (you can google LATFOR, they are often tied to controversy!), I filed this FOIL request, which is where the data in this PR comes from:

https://www.muckrock.com/foi/new-york-16/ny-state-election-district-shapefiles-for-past-elections-60752/ (there's a downloadable ZIP there)

I haven't yet done the work to translate the data from all of these files to the out/36-new-york/state.gpkg part of the Makefile, and am happy to do so, but first wanted to confirm that:

  • having this county level data is desirable? I know that statewide data is preferred, but absent a source, this at least gets at historical data for most (though not all) of NY's counties
  • this is the right place for this data to live? Because this comes from a FOIL against a group that actually performs redistricting, rather than the Secretary of State, it's possible there are interesting bits of data in here beyond just the geodata...by way of example, when I pulled up the 001-albany fields, it looks like some of the column names might contain information that this committee collected on potential redistricting strategies

Thanks, and let me know if there are any other conventions I should hew to in terms of folder naming or how to organize the additions to the Makefile.

Owner

nvkelso commented Jan 4, 2019

Great find, thanks for all your sleuthing! This will fill in a huge gap in the map :)

  • Yes having county level data is good, especially when statewide data isn't already compiled / released by the state gov't.
  • Yes, it's okay to include the raw data here like you propose in this PR. Because the SHP, DBF, etc. components of a "shapefile" can get unwieldy (and large in file size), we sometimes compress each county into its own ZIP, and GDAL can take care of reading the SHP out of the ZIP. That's an option here / could be a followup.
  • If you add Makefile support that'd be a huge help and I'm happy to review &/or help with that.
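
A minimal sketch of the ZIP option mentioned above, assuming GDAL's `/vsizip/` virtual file system is used to read a shapefile in place. The helper name `vsizip_path` and the archive and file names in the usage comment are hypothetical, chosen only for illustration.

```shell
# Build a GDAL /vsizip/ path for a file stored inside a ZIP archive.
# $1 = path to the archive, $2 = path of the file inside the archive.
vsizip_path() {
  printf '/vsizip/%s/%s\n' "$1" "$2"
}

# Hypothetical usage (requires GDAL; both names below are assumptions):
#   ogrinfo -so -al "$(vsizip_path data/36-new-york/some-county.zip some-layer.shp)"
```

The same path form can be passed to ogr2ogr as the source dataset, so the Makefile rule would not need an unzip step.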
Collaborator

migurski commented Jan 9, 2019

Wow, this is great! I took a quick peek at these layers, and they’re all pretty different. I can help with the Makefile syntax & testing if you can provide a bit of additional information about each county: which file has the county’s newest data, the year that data is from, and the column name where the precinct name or ID can be found.

Example:

  • 001-albany, data/36-new-york/001-albany/2012/2012_EDs-Albany.shp, 2012, DistName
Contributor Author

saranrapjs commented Jan 9, 2019

Great — I've got some time this weekend and can try to cobble these together!

ogr2ogr -sql "SELECT '2012' AS year, '36' AS state, '011' AS county, CONCAT('36011', CAST(MUNI AS character(20)), CAST(DISTRICT AS character(20))) AS precinct, 'polygon' AS accuracy FROM Election_Districts" \
-s_srs EPSG:2260 -t_srs EPSG:4326 -nln state -append -f GPKG $@ \
'data/36-new-york/011-cayuga/2012/Election_Districts.shp'
# This one errors with: "Failed to reproject feature 38 (geometry probably out of source or destination SRS)."

saranrapjs Jan 14, 2019

Author Contributor

That seems to work! It still reports the failure, but presumably (I haven't yet tried the full Makefile run locally) still adds the bulk/rest of that shapefile's layer to the map

Contributor Author

saranrapjs commented Jan 14, 2019

OK, I think all of the counties should now be in that Makefile (and I've verified that the NY part of the Makefile runs OK). I also tweaked some of the NYC data specifically so that the per-borough counties get assigned correctly (+ added a related script to make this consistent into the future).

Some other notes/things that came up as I went thru these:

  • I've been snapping the files to the nearest election year where they don't fall on an even year, but I'm not sure if this is desirable. For example, if something appears to have been created in an odd year, am I correct to have assigned it to the previous election year, or should it retain whatever date it has?
  • I'm not super familiar with Makefile syntax, and wasn't sure if it's OK to not add all of the source files to the Makefile prerequisites (in part because it'd be such a long list)
  • I tried to be careful & conscientious about both dating the various county files correctly, as well as choosing what appeared to be the "correct" election district identifiers amongst the field names in the shapefiles....but there are definitely some cases where there didn't appear to be a specific field name identifier for a given election district polygon, or where it wasn't clear which of the field names corresponded to the unique district. Where multiple years were involved I tried to cross-reference identifiers across years (since I presume this is what's important) at a minimum
  • I also haven't verified if I needed to be picky about the input SRS for any of these (I've mostly been scanning them with ogrinfo on the command line, and will admit that I'm not super familiar with working with GDAL directly in this way!); it seems possible GDAL would complain if something was really wrong, but if there's a better way to assess or derive the SRS from the source file, let me know
Owner

nvkelso commented Jan 14, 2019

> OK, I think all of the counties should now be in that Makefile (and I've verified that the NY part of the Makefile runs OK). I also tweaked some of the NYC data specifically so that the per-borough counties get assigned correctly (+ added a related script to make this consistent into the future).

Nice work! I'll find time tonight or tomorrow to review the PR and verify locally.

> • I've been snapping the files to the nearest election year where they don't fall on an even year, but I'm not sure if this is desirable. For example, if something appears to have been created in an odd year, am I correct to have assigned it to the previous election year, or should it retain whatever date it has?

I haven't been snapping to the nearest election year in the rest of the data, so to be consistent please base the year on the creation date of the data instead. (In some cases, if I've verified it was for the earlier election year, then I have moved the date backwards in time.)

> • I'm not super familiar with Makefile syntax, and wasn't sure if it's OK to not add all of the source files to the Makefile prerequisites (in part because it'd be such a long list)

They should all be added to the prereqs to keep Make informed. Else we'll update one of the counties later and it won't trigger a Make update.
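
One way to keep that long prerequisite list manageable is to generate it rather than type it. The sketch below is only an illustration: the helper name `list_prereqs` and the choice of extensions are assumptions, not something this repository's Makefile is known to use.

```shell
# Print every shapefile component under a directory, one per line, in a
# stable order, so the result can be pasted into a Makefile prerequisite list.
# $1 = directory to scan. The set of extensions is an assumption.
list_prereqs() {
  find "$1" -type f \
    \( -name '*.shp' -o -name '*.shx' -o -name '*.dbf' -o -name '*.prj' \) \
    | LC_ALL=C sort
}

# Hypothetical usage:
#   list_prereqs data/36-new-york
```

Because every listed file becomes a prerequisite, touching any one county's source would make the state target out of date, which is the behavior asked for above.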

> • I tried to be careful & conscientious about both dating the various county files correctly, as well as choosing what appeared to be the "correct" election district identifiers amongst the field names in the shapefiles....but there are definitely some cases where there didn't appear to be a specific field name identifier for a given election district polygon, or where it wasn't clear which of the field names corresponded to the unique district. Where multiple years were involved I tried to cross-reference identifiers across years (since I presume this is what's important) at a minimum

I'll give these a once over, too. Some of it is guesswork, we can always fix something later if someone finds a bug ;)

> • I also haven't verified if I needed to be picky about the input SRS for any of these (I've mostly been scanning them with ogrinfo on the command line, and will admit that I'm not super familiar with working with GDAL directly in this way!); it seems possible GDAL would complain if something was really wrong, but if there's a better way to assess or derive the SRS from the source file, let me know

I'll give this a once over, too. Generally, as long as the output looks right (the whole state has a continuous fabric of coverage with no holes, and zooming to the NY state package layer extent zooms to just NY, not NY plus a Null Island with everything tiny on screen), you've done the right thing.
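
The "zooms to just NY" check can also be roughed out on the command line. This is a sketch under stated assumptions: it reads the `Extent:` line that `ogrinfo -so` prints, and the bounding box in `NY_BOX` is a deliberately loose guess at New York's longitude/latitude range, not an authoritative boundary. The function name `extent_in_ny` is hypothetical.

```shell
# Rough lon/lat box around New York State (min lon, min lat, max lon, max lat).
# These numbers are an assumption for illustration only.
NY_BOX="-80 40 -71 46"

# Reads an "Extent: (minx, miny) - (maxx, maxy)" line on stdin and succeeds
# only if that extent lies entirely inside NY_BOX.
extent_in_ny() {
  tr -d '(),' | awk -v box="$NY_BOX" '
    /^Extent:/ {
      split(box, b, " ")
      # After stripping punctuation the fields are:
      #   Extent: minx miny - maxx maxy
      ok = ($2 + 0 >= b[1] + 0 && $3 + 0 >= b[2] + 0 &&
            $5 + 0 <= b[3] + 0 && $6 + 0 <= b[4] + 0)
      found = 1
    }
    END { exit (found && ok) ? 0 : 1 }'
}

# Hypothetical usage (requires GDAL; "state" is the layer name set by -nln):
#   ogrinfo -so out/36-new-york/state.gpkg state | extent_in_ny && echo "extent looks like NY"
```

An extent that reaches 0,0 or that is expressed in feet or meters fails the check, which is the symptom of a wrong or missing source SRS.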

Thanks again!

Contributor Author

saranrapjs commented Jan 15, 2019

I updated the dates in the SQL to more closely match the dates in the files (where they exist), falling back to the files' modification dates, and also added all of the files to the prerequisites.

Owner

nvkelso commented Jan 15, 2019

Great progress! A few comments...

Some county source projections look off when viewing the New York state GPKG file in QGIS:

image

Compared to what NYSDOT says:

image

  1. Besides being in the wrong spot, these all seem to have duplicate entries on the map. You'll need to remove the duplicates by ADDING these counties to the NOT IN clause in the SQL line, which is currently:

ogr2ogr -sql "SELECT '2010' AS year, STATEFP10 AS state, COUNTYFP10 AS county, GEOID10 AS precinct, 'polygon' AS accuracy FROM tl_2012_36_vtd10 WHERE COUNTYFP10 NOT IN ('005','047','061','069','081','083','085')" \

To something more like:

ogr2ogr -sql "SELECT '2010' AS year, STATEFP10 AS state, COUNTYFP10 AS county, GEOID10 AS precinct, 'polygon' AS accuracy FROM tl_2012_36_vtd10 WHERE COUNTYFP10 NOT IN ('001','003','005','007','009','011','019','023','027','029','031','043','047','055','059','061','063','065','067','069','071','073','075','077','079','081','083','085','087','089','093','095','101','103','107','109','111','113','115','119','123')" \

To generate that list I looked at your Makefile dependencies and did some text selection magic in BBEdit, as they all had the same pattern / character alignment. If you missed a dependency, its FIPS is also missing above ;)

See screenshot at the bottom for example of what this larger exclusion looks like.
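
For anyone without BBEdit, the same exclusion list can be approximated with standard text tools. This is a sketch, not the method used above: it assumes the Makefile dependencies follow the `data/36-new-york/NNN-name/...` pattern seen in this thread, and the function name `fips_not_in` is hypothetical.

```shell
# Read Makefile text on stdin, pull the three-digit county FIPS code out of
# every data/36-new-york/NNN-name/ path, and print the unique codes as a
# quoted, comma-separated list suitable for a SQL NOT IN (...) clause.
fips_not_in() {
  grep -o 'data/36-new-york/[0-9][0-9][0-9]-' \
    | sed 's/.*\/\([0-9]*\)-$/\1/' \
    | LC_ALL=C sort -u \
    | sed "s/.*/'&'/" \
    | paste -sd, -
}

# Hypothetical usage:
#   fips_not_in < Makefile
```

Any county missing from the Makefile dependencies is also missing from the output, the same caveat as with the hand-built list.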

  2. The four farthest north oddballs are (FIPS):

To get these, I'm opening the SHP in QGIS, looking at the layer's CRS, and copying out the PROJ definition, which OGR knows how to deal with. But strip out the +proj= bit, as OGR doesn't like that bit.

  • 029 @ +proj=longlat +ellps=GRS80 +no_defs
  • 059 @ +proj=longlat +ellps=GRS80 +no_defs
  • 089 appears to be missing its PRJ file. I suggest reusing 095's, which lands it in the right spot on the map (states often standardize on one UTM zone or a NAD state plane grid, of which NY has a few, so this is an educated guess):
    +proj=utm +zone=18 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
  • 095 @ +proj=utm +zone=18 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs

image

  3. On the south side:
  • 007 @ +proj=tmerc +lat_0=40 +lon_0=-76.58333333333333 +k=0.9999375 +x_0=250000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
  • 009 @ +proj=tmerc +lat_0=40 +lon_0=-78.58333333333333 +k=0.9999375 +x_0=350000.0000000001 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
  • 011 @ +proj=tmerc +lat_0=40 +lon_0=-76.58333333333333 +k=0.9999375 +x_0=152400.3048006096 +y_0=0 +datum=NAD27 +units=us-ft +no_defs
  • 023 @ +proj=tmerc +lat_0=40 +lon_0=-76.58333333333333 +k=0.9999375 +x_0=250000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
  • 031 (middle) @ +proj=tmerc +lat_0=40 +lon_0=-74.33333333333333 +k=0.9999666666666667 +x_0=152400.3048006096 +y_0=0 +datum=NAD27 +units=us-ft +no_defs
  • 055 @ +proj=tmerc +lat_0=40 +lon_0=-78.58333333333333 +k=0.9999375 +x_0=350000.0000000001 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
  • 067 @ +proj=tmerc +lat_0=40 +lon_0=-76.58333333333333 +k=0.9999375 +x_0=250000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
  • 075 is most likely EPSG:2261
  • 101 @ +proj=tmerc +lat_0=40 +lon_0=-76.58333333333333 +k=0.9999375 +x_0=250000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
  • 107 @ +proj=tmerc +lat_0=40 +lon_0=-76.58333333333333 +k=0.9999375 +x_0=250000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
  • 109 @ +proj=tmerc +lat_0=40 +lon_0=-76.58333333333333 +k=0.9999375 +x_0=250000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
  • 111 @ EPSG:2260
  • 123 @ +proj=tmerc +lat_0=40 +lon_0=-76.58333333333333 +k=0.9999375 +x_0=250000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
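
To keep per-county overrides like these in one place, a small lookup can feed ogr2ogr's `-s_srs` option. This is a sketch only: the function name `county_srs` is hypothetical, just a handful of the counties listed above are included, and the strings are copied as written in this comment (adjust them if your GDAL build objects to any part).

```shell
# Map a county FIPS code to the source SRS suggested in this thread.
# Only a few counties are shown; 075 and 089 are educated guesses above.
county_srs() {
  case "$1" in
    029|059) echo '+proj=longlat +ellps=GRS80 +no_defs' ;;
    089|095) echo '+proj=utm +zone=18 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs' ;;
    075)     echo 'EPSG:2261' ;;
    111)     echo 'EPSG:2260' ;;
    *)       echo "no SRS recorded for county $1" >&2; return 1 ;;
  esac
}

# Hypothetical usage inside a build step (requires GDAL; SOURCE_SHP and
# OUTPUT_GPKG are placeholders, not names from the Makefile):
#   ogr2ogr -s_srs "$(county_srs 095)" -t_srs EPSG:4326 \
#     -nln state -append -f GPKG "$OUTPUT_GPKG" "$SOURCE_SHP"
```

Returning a non-zero status for unknown counties makes a missing override fail loudly instead of silently falling back to whatever the source file claims.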

image

In the middle (subset of "south"):

image

Separately...

  4. The name column should have something in it for at least one of these counties!

For example, the Broome source has a full name column that can be used for this.

An aside: NY state composite with the updated NOT SQL clause:

image

Collaborator

migurski commented Jan 15, 2019

Thanks for getting straight into the Makefile!

Contributor Author

saranrapjs commented Feb 2, 2019

OK, I just pushed some updates that should rectify (to my eye) most of these weirdly projected counties (finally figured out how to color the counties separately in QGIS).

I'm not sure I understood your note about the name field — was that just towards identifying mis-registered counties? Or something else?

I'm also not sure if the patchy shapes in one of your screenshots were just an artifact of the incorrect SRSs, or something else. I tried playing around w/ feature blending modes to see if there were counties hiding under other counties, or something, but didn't see anything obvious...

Owner

nvkelso commented Feb 4, 2019

The county filter and projections look good to me now!

image

My comment about the name column is outside the norms for the project: only a couple of states include the name property in their output at all. But if you wanted to, it could be addressed by adding a new section to the SQL for each county, sourcing the name from the name column in each SHP's DBF file. Not critical.

nvkelso approved these changes Feb 4, 2019

nvkelso merged commit eea3da5 into nvkelso:master on Feb 4, 2019

Owner

nvkelso commented Feb 4, 2019

Thanks for all your help @saranrapjs! Great addition to the project :)

I'm traveling this week but will try and push new data to the download S3 bucket as well.
