WIP: Make proj-addre one-to-many data and load into database. #497

Merged
merged 3 commits into dev on Aug 28, 2017

Conversation

NealHumphrey
Collaborator

@NealHumphrey NealHumphrey commented Aug 27, 2017

  • Adds an "apiconn" object for prescat. For now it just loads the file from disk; in the future it can become the class that checks an online folder for data newly uploaded by admins.

  • Adds a function to the base_project.py apiconn that uses the split address function to create a csv file of the one-to-many mapping from projects to addresses.

  • Moves the split address function out of ingestion and into tools, since it's not used only in ingestion

  • Fixes bug in the MarApiConn - adds a method to query api using address string instead of generic string. Helps eliminate false positives.

  • Changes references to find_location to instead be find_addr_string to eliminate false positives

  • Fixes bug in address splitting code that matched the "and" inside words such as Maryland or Randolph. Changed the split string to ' and ' (with surrounding spaces).

  • Adds proj_addre table to database

  • DHCD and DCHousing ApiConn methods also create _addre.csv files for upload

  • Entity resolution now done via the one to many mapping of addresses - if any of the addresses match, assumes the whole project matches.
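
For reviewers, a minimal sketch of the splitting fix and the resulting one-to-many rows. The function names and sample addresses here are hypothetical and chosen only for illustration; the actual split address function in tools may handle more cases than this.

```python
def naive_split(raw):
    """Old behaviour (assumed): split on the bare substring 'and'."""
    return [part.strip() for part in raw.split("and") if part.strip()]


def split_addresses(raw):
    """Split only on ' and ' surrounded by spaces, so street names stay intact.

    Note: this is case-sensitive, so ' AND ' would not be treated as a separator.
    """
    return [part.strip() for part in raw.split(" and ") if part.strip()]


def project_address_rows(project_id, raw):
    """Return one (project_id, address) row per address: the one-to-many mapping."""
    return [(project_id, address) for address in split_addresses(raw)]


if __name__ == "__main__":
    combined = "1521 Maryland Ave NE and 200 Randolph St NW"
    print(naive_split(combined))        # breaks inside "Maryland" and "Randolph"
    print(split_addresses(combined))    # two intact addresses
    print(project_address_rows("EXAMPLE_ID", combined))
```

Each returned row corresponds to one line of an _addre.csv file, so a project with several addresses appears once per address.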

To test

  • Create the prescat_addre.csv file: from the python/housinginsights/sources/ folder run python prescat.py (needs to be added to the options list in get_api_data.py, but should be ready to work there)
  • Add the table, from /python/scripts: python load_data.py docker --update-only proj_addre
  • Get DHCD and DCHousing data sets: python get_api_data.py with the code options edited to request the DCHousing and dhcd modules.

… the file from disk, but in the future this can be the class that checks for newly uploaded data in an online folder that admins can put data into

- Adds a function to the base_project.py apiconn that uses the split address function to create a csv file of the one-to-many project to addresses.
- Moves the split address function out of ingestion and into tools, since it's not used only in ingestion
- Fixes bug in the MarApiConn - adds a method to query api using address string instead of generic string. Helps eliminate false positives.
- Changes references to find_location to instead be find_addr_string to eliminate false positives

TODO
- Starts to add the one-to-many mapping to the create_project_subsidy_csv method for the dhcd and dchousing data sets. This is not complete, so that code is currently broken. The newly created prescat_addre.csv file needs to be loaded into the database before work on this method can resume.
@NealHumphrey NealHumphrey changed the title Make proj-addre one-to-many data and load into database. WIP: Make proj-addre one-to-many data and load into database. Aug 27, 2017
- The section of the DHCD and DCHousing apiconn that makes the _subsidy and _project tables now also makes an _addre table
- Entity resolution in that section, which checks for an existing NLHIC_id in the database, now uses the _addre table. For each address it first checks for an exact address match; failing that, it looks for a fuzzy match for that address in the mar and checks for an exact mar_id match. If any of the project's addresses match any address in the existing database table, it assumes the whole project matches; if neither the address nor the mar_id matches, it assumes the project is new
- Adds the optional parameters to all of the ApiConn objects so that we can consistently pass database_choice to __init__, whether or not it is used
- Small fix to how address splitting occurs, to resolve an error. @jkwening, it would be good to check the logic.
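
A rough sketch of the matching order described above, not the code in this PR. The function name, the tuple layout of the existing rows, and the `lookup_mar_id` callable (standing in for the fuzzy MAR lookup) are all assumptions made for illustration.

```python
def resolve_project(addresses, existing_rows, lookup_mar_id):
    """Return the id of an existing project that shares any address, else None.

    addresses:      address strings for the incoming project
    existing_rows:  iterable of (project_id, address, mar_id) tuples, a
                    hypothetical shape for rows of the proj_addre table
    lookup_mar_id:  callable(address) -> mar_id or None; hypothetical stand-in
                    for the fuzzy lookup against the MAR
    """
    by_address = {}
    by_mar_id = {}
    for project_id, address, mar_id in existing_rows:
        by_address.setdefault(address, project_id)
        if mar_id is not None:
            by_mar_id.setdefault(mar_id, project_id)

    for address in addresses:
        # Step 1: exact address match.
        if address in by_address:
            return by_address[address]
        # Step 2: fuzzy MAR lookup, then exact mar_id match.
        mar_id = lookup_mar_id(address)
        if mar_id is not None and mar_id in by_mar_id:
            return by_mar_id[mar_id]

    # No address or mar_id matched: treat the project as new.
    return None


if __name__ == "__main__":
    existing = [("ID_A", "100 Main St NW", "111"), ("ID_B", "5 Oak St SE", "222")]
    fake_mar = {"5 Oak Street SE": "222"}  # made-up lookup table for the demo
    print(resolve_project(["9 Elm St", "100 Main St NW"], existing, fake_mar.get))
    print(resolve_project(["5 Oak Street SE"], existing, fake_mar.get))
    print(resolve_project(["1 New Rd"], existing, fake_mar.get))
```

Returning on the first hit reflects the rule that a single matching address is enough to treat the whole project as a match.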
@NealHumphrey
Collaborator Author

Updated description to reflect progress w/ most recent commit

Collaborator

@jkwening jkwening left a comment

Looks good.

@jkwening jkwening merged commit d7f2fbc into dev Aug 28, 2017