Update project table #466

Merged
13 commits merged into dev on Aug 15, 2017

Conversation

NealHumphrey
Collaborator

This branch contains the code fixes needed to get all 3 project data sources loaded into the database.
It builds on PR #465 and includes all of that code; it also fixes a few bugs in that code.

It also switches the manifest.csv to use the new S3 bucket from the Code for DC AWS account, instead of the one from Neal's personal account that we had been using.

To test:
python load_data.py docker --remove-tables project --update-only project prescat_project prescat_subsidy dchousing_project dchousing_subsidy dhcd_dfd_properties_project dhcd_dfd_properties_subsidy

…e unique_data_id results in an empty table, delete the table before loading the data. This allows for the addition/removal of columns in a single table, as the code will recreate the table after it is deleted, using the current version of meta.json

Note that this will not work for tables that contain more than one unique_data_id, because such a table will never be empty.
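
A minimal sketch of that drop-and-recreate check, assuming a SQLAlchemy engine; the helper name and SQL here are illustrative, not the actual load_data.py code:

```python
from sqlalchemy import text

def remove_data_and_maybe_drop_table(engine, table_name, unique_data_id):
    """Delete one unique_data_id's rows; if that leaves the table empty,
    drop it so the loader can recreate it from the current meta.json."""
    # table_name is assumed to come from trusted manifest/meta.json entries.
    with engine.begin() as conn:
        conn.execute(
            text("DELETE FROM {} WHERE unique_data_id = :uid".format(table_name)),
            {"uid": unique_data_id},
        )
        remaining = conn.execute(
            text("SELECT COUNT(*) FROM {}".format(table_name))
        ).scalar()
    if remaining == 0:
        # Only safe when the table held a single unique_data_id; with more
        # than one, the table is never empty and columns cannot be changed
        # this way (the limitation noted above).
        with engine.begin() as conn:
            conn.execute(text("DROP TABLE {}".format(table_name)))
```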

Also handles errors due to missing or not-found data in the zone_facts table - if a data field is not found, we want to log the error but not terminate the code, since this will be run on a server.
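
A hedged sketch of that log-and-continue pattern for zone_facts lookups; the query, field names, and helper are assumptions rather than the project's actual code:

```python
import logging

logger = logging.getLogger(__name__)

def get_zone_fact(cursor, field_name, zone_type, zone):
    """Look up a single zone_facts value; on any missing-data problem,
    log the error and return None so the caller can keep loading."""
    try:
        query = ("SELECT {} FROM zone_facts "
                 "WHERE zone_type = %s AND zone = %s").format(field_name)
        cursor.execute(query, (zone_type, zone))
        row = cursor.fetchone()
        if row is None:
            raise ValueError("no zone_facts row for {} {}".format(zone_type, zone))
        return row[0]
    except Exception as err:
        # When this runs unattended on the server, a logged error is
        # preferable to a crashed load job.
        logger.error("zone_facts lookup failed (%s, %s, %s): %s",
                     field_name, zone_type, zone, err)
        return None
```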
…pdating the database.

This handles the situation where there are multiple unique_data_ids in a database table but we want to update the table structure without rebuilding the entire database.
Merge remote-tracking branch 'jkwening/437-building-permit-update' into merge-projects-enhancements

# Conflicts:
#	python/housinginsights/ingestion/SQLWriter.py
…rently it uses a simple string-matching implementation; TODO is to add more advanced info

Splits the DHCD data into 'project' and 'subsidy' tables per the same logic used for the DCHousing data set
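
Roughly, the split works like the sketch below; the column lists and helper name are hypothetical, since the real field mapping lives in meta.json and the DHCD apiconn:

```python
# Hypothetical column lists, for illustration only.
PROJECT_FIELDS = ["nlihc_id", "proj_name", "proj_addre", "proj_zip"]
SUBSIDY_FIELDS = ["nlihc_id", "program", "subsidy_start", "subsidy_end"]

def split_dhcd_record(raw_record):
    """Split one raw DHCD record into a 'project' row and a 'subsidy' row,
    mirroring the approach already used for the DCHousing data set."""
    project_row = {field: raw_record.get(field) for field in PROJECT_FIELDS}
    subsidy_row = {field: raw_record.get(field) for field in SUBSIDY_FIELDS}
    return project_row, subsidy_row
```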
… database, by querying the MAR API

Refactors the methods off of 'BaseApiConn' into a new 'ProjectBaseApiConn' so that the project API connection can use the MarApiConn without causing circular-import issues when MarApiConn tries to load BaseApiConn.
Changes both the DHCD and DCHousing apiconn objects to use the `ProjectBaseApiConn`.
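
A structural sketch of how that split breaks the circular import; the module layout (shown here as comments in a single file) and method bodies are illustrative, and only the class names come from this PR:

```python
# base_api_conn.py -- generic request helpers only, imports nothing project-specific
import requests

class BaseApiConn(object):
    def get(self, url, params=None):
        response = requests.get(url, params=params)
        response.raise_for_status()
        return response

# mar_api_conn.py -- safe to import BaseApiConn; no project-level code involved
class MarApiConn(BaseApiConn):
    def find_location(self, address):
        # URL and parameters are placeholders for the MAR lookup call.
        return self.get("https://example.invalid/mar/findLocation",
                        params={"address": address}).json()

# project_base_api_conn.py -- project-level helpers moved off BaseApiConn, so
# the DHCD/DCHousing apiconn classes can use MarApiConn without MarApiConn
# ever needing to import this module back.
class ProjectBaseApiConn(BaseApiConn):
    def add_mar_id(self, row):
        result = MarApiConn().find_location(row.get("address", ""))
        records = result.get("returnDataset", {}).get("Table1", [])
        row["mar_id"] = records[0]["ADDRESS_ID"] if records else None
        return row
```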
…elds, renames Proj_zip field (changed case from Proj_Zip), updates manifest

- Updates all files in the manifest to use the new S3 bucket on the Code for DC account: `housing-insights` (instead of the name without the dash, which was the bucket on Neal's personal S3 account)
- Fixes project loading bug re: incomplete refactor of fields needed for rename-census-tracts
- Fixes bugs related to new ProjectBaseApiConn - split into separate file to avoid more circular import issues.
… and projects tables due to missing if statement
…s we will want, but gets these fields usable for now.
Collaborator

@jkwening left a comment
Lines 259 - 263 in Cleaner.py were throwing a KeyError. I believe the issue is that lat/lon and x/y-coord lookups return a JSON object that has 'Table1' as the first-level key, so replacing the 'returnDataset' key value and removing the 'sourceOperation' validation fixed the error. Regarding the latter: it looks like you're trying to verify that an address lookup actually returns an address, but this will throw a KeyError if a lat/lon or x/y-coord lookup returns an empty result, so you will need an additional condition for that to work correctly.
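
For illustration, an extraction helper tolerant of both response shapes might look like the sketch below; key names other than 'Table1', 'returnDataset', and 'ADDRESS_ID' (which appear in this thread) are assumptions:

```python
def extract_mar_id(result):
    """Pull ADDRESS_ID out of a MAR lookup response, tolerating both shapes:
    address lookups nest records under 'returnDataset', while lat/lon and
    x/y-coordinate lookups put 'Table1' at the top level."""
    dataset = result.get("returnDataset", result)
    records = dataset.get("Table1") or []
    if not records:
        # Empty result (e.g. a lat/lon that matched nothing): signal the
        # caller to log and skip the row rather than raising a KeyError.
        return None
    return records[0].get("ADDRESS_ID")
```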

@NealHumphrey
Collaborator Author

Hmm, so it looks like mar_api.find_location() returns a different format than the .reverse_lat_lon() and .reverse_geocode() methods, because the MAR API returns a different result format for each. I lifted that set of conditionals from my other code, which only used .find_location().

This also means that the line `row['mar_id'] = result['Table1'][0]["ADDRESS_ID"]` would never have worked when the .find_location() method was used - I think it just never got called for the DCHousing data set since we always had lat/lon available.

For now I'll hotfix this by putting the return statement underneath each conditional. But we'll need broader solutions as part of our enhanced dedupe code, and we might want to add some wrapper methods to the MarApiConn that return a consistent format and handle things like finding intersections instead of addresses.
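
As a sketch, that hotfix could look something like the following, with each branch returning its own parsed result; the mar_api method names follow the ones mentioned above, but their signatures and the surrounding Cleaner code are assumptions:

```python
def add_mar_id(row, mar_api):
    """Assign row['mar_id'] using whichever lookup the row supports, and
    return inside each branch so each lookup parses its own response shape."""
    if row.get("latitude") and row.get("longitude"):
        result = mar_api.reverse_lat_lon(row["latitude"], row["longitude"])
        records = result.get("Table1") or []
        row["mar_id"] = records[0]["ADDRESS_ID"] if records else None
        return row
    if row.get("xcoord") and row.get("ycoord"):
        result = mar_api.reverse_geocode(row["xcoord"], row["ycoord"])
        records = result.get("Table1") or []
        row["mar_id"] = records[0]["ADDRESS_ID"] if records else None
        return row
    # find_location() responses come back in a different format, so this
    # branch parses the nested 'returnDataset' shape on its own.
    result = mar_api.find_location(row.get("address", ""))
    records = result.get("returnDataset", {}).get("Table1", [])
    row["mar_id"] = records[0]["ADDRESS_ID"] if records else None
    return row
```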

@NealHumphrey
Collaborator Author

Made the change; tested it and it works on my machine. Going to merge.

NealHumphrey merged commit 7ea5073 into dev on Aug 15, 2017