Update the zipcode database monthly #7

Open
seanpianka opened this issue Mar 28, 2020 · 7 comments

Comments

@seanpianka
Owner

seanpianka commented Mar 28, 2020

Currently, the zipcode database can be out-of-sync because no one has made manual updates to the zipcodes.json data-file (which contains the zipcode data available in this package).

Goal: When https://www.unitedstateszipcodes.org releases an updated zipcode dataset, create a new release of this package with the updated dataset.

Solution: Create a cronjob to perform the following steps monthly.

$ git clone https://github.com/seanpianka/Zipcodes
$ cd Zipcodes
$ python ci/__init__.py
$ bzip2 zips.json
$ mv zips.json.bz2 zipcodes/
$ bash scripts/get-next-patch-version "${current_version}"
$ bash scripts/create-new-python-wheel-release
$ bash scripts/add-to-git-and-publish-to-pypi
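Two of the steps above (compressing the dataset and computing the next patch version) could also be done in Python rather than shell. The following is only a rough sketch under assumptions: the helper names `next_patch_version` and `compress_dataset` are hypothetical, the version is assumed to be a plain `MAJOR.MINOR.PATCH` string, and unlike `bzip2 zips.json` the compression helper leaves the original file in place.

```python
import bz2
import shutil
from pathlib import Path


def next_patch_version(current_version: str) -> str:
    """Bump the last component of a MAJOR.MINOR.PATCH version string.

    Assumption: the version has exactly three numeric components.
    """
    major, minor, patch = (int(part) for part in current_version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def compress_dataset(source: Path, target_dir: Path) -> Path:
    """Write a bzip2-compressed copy of `source` into `target_dir`.

    Roughly equivalent to `bzip2 zips.json && mv zips.json.bz2 zipcodes/`,
    except that the uncompressed source file is kept.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / (source.name + ".bz2")
    with source.open("rb") as raw, bz2.open(target, "wb") as packed:
        shutil.copyfileobj(raw, packed)
    return target
```

For example, `compress_dataset(Path("zips.json"), Path("zipcodes"))` would produce `zipcodes/zips.json.bz2`, assuming `zips.json` exists in the working directory.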
@seanpianka changed the title from "Automate monthly updating of zipcodes.json and creating new patch release" to "Automatically update zipcode database monthly" on Apr 15, 2020
@seanpianka changed the title from "Automatically update zipcode database monthly" to "Update the zipcode database monthly" on Apr 15, 2020
@seanpianka self-assigned this on Apr 15, 2020
@kenvenner
Contributor

@seanpianka - are you looking for a volunteer to create an automated job that runs the steps described above and pushes a PR to the repo each month with a new/updated zip code database? If you are, I could most likely implement and deliver this. I just ran into an issue caused by the database being out of date: a zipcode is failing that I assume would pass if this library/tool were current. Let me know.

Ken

@seanpianka
Owner Author

Yes, I'm certainly open to pull requests that can automate this! As you know, it's important that it's updated regularly, but I don't have time to do so manually. A GitHub Actions pipeline that does this would be a great help!

@kenvenner
Contributor

Great - I assume you are pulling the source data from USPS as an individual (the free version)? I plan on doing the same.

@kenvenner
Contributor

The ci folder does not appear to be checked in to the repo, so this step cannot be run:
python ci/__init__.py

@seanpianka
Owner Author

Yes, that's the db I've used the last few times. Additionally, the script for building the dataset merges in GPS data (lat/lon) from a separate dataset focused on GPS accuracy.

This script can be found in scripts/; I think I removed the ci/ folder in a recent commit.

@kenvenner
Contributor

There are two data sources in your scripts:

The first is obtained from https://www.unitedstateszipcodes.org/zip-code-database/ and is loaded via base_zipcodes_filename = "scripts/data/zip_code_database.csv"

I am not sure what the data source is for the second file: gps_zipcodes_filename = "scripts/data/zip-codes-database-FREE.csv"

Can you tell me where this file comes from?

@seanpianka
Owner Author

I am honestly not sure where I downloaded this from, and I neglected to document this anywhere it seems.

The goal here is to have an alternate zipcode dataset that we can use to update/override the lat/lon values in the unitedstateszipcodes.org dataset. The following sources should be suitable enough for this purpose:

https://www.uszipcodeslist.com/
https://simplemaps.com/data/us-zips

In the script to generate the final dataset, it makes a best-effort attempt to update the existing zipcodes with available lat/lon data from the other dataset. If one dataset does not include a zipcode present in the other dataset, it is fine to simply skip that value and leave the lat/lon data as-is.
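The best-effort merge described above might look roughly like the sketch below. This is not the actual script in scripts/; the function name `merge_gps` and the column names `zip`, `lat`, and `lon` are assumptions made for illustration, and the real CSV headers may differ.

```python
def merge_gps(base_rows, gps_rows):
    """Return copies of base_rows with lat/lon overridden where possible.

    Assumption: every row is a dict with "zip", "lat" and "lon" keys.
    Zipcodes present in only one of the two datasets are skipped, so the
    base lat/lon values are left as-is for them.
    """
    gps_by_zip = {row["zip"]: row for row in gps_rows}
    merged = []
    for row in base_rows:
        updated = dict(row)  # copy, so the input rows are not mutated
        gps = gps_by_zip.get(row["zip"])
        # Only override when the alternate dataset has usable coordinates.
        if gps is not None and gps.get("lat") and gps.get("lon"):
            updated["lat"] = gps["lat"]
            updated["lon"] = gps["lon"]
        merged.append(updated)
    return merged
```

Rows read with `csv.DictReader` from either CSV could be passed in directly once their column names are mapped to the ones assumed here.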

2 participants