
Add missing place kana fields to office data #14

Merged
merged 8 commits into from
Nov 14, 2021

xatlasm
Contributor

@xatlasm xatlasm commented Sep 21, 2021

I noticed that the office data does not include the prefecture_kana, city_kana, and neighborhood_kana fields that are present in the normal zipcode data.

To fix this, I modified the postal_data table in the sqlite db to add columns for prefecture, city, and neighborhood. When processing the office data, I added a step that queries the postal_data table for an entry with the same prefecture, city, and neighborhood and, if one exists, copies its kana fields into the office record. See the generated officedata.json for example output.
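
A minimal sketch of the lookup described above, using Python's built-in sqlite3 module. The actual postal_data schema in prep.py isn't shown in this PR, so the column layout, the lookup_kana/add_kana helper names, and the single sample row are hypothetical and only illustrative.

```python
import sqlite3

KANA_FIELDS = ("prefecture_kana", "city_kana", "neighborhood_kana")


def open_demo_db():
    """Build an in-memory postal_data table (hypothetical schema, one illustrative row)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE postal_data (
               postcode TEXT,
               prefecture TEXT, prefecture_kana TEXT,
               city TEXT, city_kana TEXT,
               neighborhood TEXT, neighborhood_kana TEXT)"""
    )
    conn.execute(
        "INSERT INTO postal_data VALUES (?, ?, ?, ?, ?, ?, ?)",
        ("1000005", "東京都", "トウキョウト", "千代田区", "チヨダク", "丸の内", "マルノウチ"),
    )
    return conn


def lookup_kana(conn, prefecture, city, neighborhood):
    """Return kana for an exact (prefecture, city, neighborhood) match, or None."""
    row = conn.execute(
        """SELECT prefecture_kana, city_kana, neighborhood_kana
           FROM postal_data
           WHERE prefecture = ? AND city = ? AND neighborhood = ?
           LIMIT 1""",
        (prefecture, city, neighborhood),
    ).fetchone()
    return None if row is None else dict(zip(KANA_FIELDS, row))


def add_kana(conn, office):
    """Copy the kana fields onto an office record when a matching row exists."""
    kana = lookup_kana(conn, office["prefecture"], office["city"], office["neighborhood"])
    if kana is not None:
        office.update(kana)
    return office
```

An office record whose address has no exact match in postal_data comes back unchanged, with no kana keys added.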

A side effect of this change is that it is now possible to easily query the sqlite db by prefecture, city, and/or neighborhood, which could be useful if a user wanted a list of postcodes in each region or subregion.
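
As a rough illustration of that kind of query, here is a sketch that lists postcodes in a prefecture, optionally narrowed by city and neighborhood. The postcodes_in helper, the trimmed-down schema, and the sample rows are hypothetical.

```python
import sqlite3


def open_demo_db():
    """In-memory postal_data table with a hypothetical, trimmed-down schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE postal_data (postcode TEXT, prefecture TEXT, city TEXT, neighborhood TEXT)"
    )
    conn.executemany(
        "INSERT INTO postal_data VALUES (?, ?, ?, ?)",
        [
            ("1000005", "東京都", "千代田区", "丸の内"),
            ("1000001", "東京都", "千代田区", "千代田"),
            ("1040061", "東京都", "中央区", "銀座"),
        ],
    )
    return conn


def postcodes_in(conn, prefecture, city=None, neighborhood=None):
    """List distinct postcodes in a prefecture, optionally narrowed by city/neighborhood."""
    clauses, params = ["prefecture = ?"], [prefecture]
    if city is not None:
        clauses.append("city = ?")
        params.append(city)
    if neighborhood is not None:
        clauses.append("neighborhood = ?")
        params.append(neighborhood)
    # Only fixed column names are spliced into the SQL; values go through placeholders.
    sql = (
        "SELECT DISTINCT postcode FROM postal_data WHERE "
        + " AND ".join(clauses)
        + " ORDER BY postcode"
    )
    return [row[0] for row in conn.execute(sql, params)]
```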

Since the sqlite database is queried once for each postcode in the office data, prep.py now takes somewhat longer to run than the original implementation. There is probably an opportunity for optimization, but I'm by no means a Python or DB expert, so I'll punt on this for now.
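
One standard SQLite remedy for per-row lookups like this (not something this PR does) is a composite index on the three place columns, so each lookup becomes an index search rather than a full table scan. The index name idx_postal_place and the schema below are assumptions; EXPLAIN QUERY PLAN can be used to confirm that the index is picked up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE postal_data (
           postcode TEXT,
           prefecture TEXT, prefecture_kana TEXT,
           city TEXT, city_kana TEXT,
           neighborhood TEXT, neighborhood_kana TEXT)"""
)
# Hypothetical index on exactly the columns the office lookup filters on.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_postal_place "
    "ON postal_data (prefecture, city, neighborhood)"
)

LOOKUP_SQL = """SELECT prefecture_kana, city_kana, neighborhood_kana
                FROM postal_data
                WHERE prefecture = ? AND city = ? AND neighborhood = ?
                LIMIT 1"""


def query_plan(sql, params):
    """Return the 'detail' column (the last one) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
```

With the index in place, the plan for LOOKUP_SQL should mention idx_postal_place instead of a table scan.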

@polm
Owner

polm commented Sep 21, 2021

Thanks, that looks like a great addition!

About the slowdown: it should be possible to speed things up by wrapping the kana-fetching code in a function and popping functools.lru_cache on it. I'll try to take a look at it this week.
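
A minimal sketch of that idea with functools.lru_cache: many office records share the same (prefecture, city, neighborhood), so repeated lookups are answered from memory instead of hitting sqlite again. The fetch_kana name, the module-level connection, and the sample data are assumptions, not the code that was eventually merged. The cached function's arguments must be hashable, which plain strings are.

```python
import sqlite3
from functools import lru_cache

# Hypothetical module-level connection; the real prep.py may manage it differently.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postal_data (prefecture TEXT, prefecture_kana TEXT, "
    "city TEXT, city_kana TEXT, neighborhood TEXT, neighborhood_kana TEXT)"
)
conn.execute(
    "INSERT INTO postal_data VALUES (?, ?, ?, ?, ?, ?)",
    ("東京都", "トウキョウト", "千代田区", "チヨダク", "丸の内", "マルノウチ"),
)


@lru_cache(maxsize=None)
def fetch_kana(prefecture, city, neighborhood):
    """Look up kana for one place; the result (a tuple or None) is memoized per triple."""
    return conn.execute(
        """SELECT prefecture_kana, city_kana, neighborhood_kana
           FROM postal_data
           WHERE prefecture = ? AND city = ? AND neighborhood = ?
           LIMIT 1""",
        (prefecture, city, neighborhood),
    ).fetchone()


offices = [
    {"prefecture": "東京都", "city": "千代田区", "neighborhood": "丸の内"},
    {"prefecture": "東京都", "city": "千代田区", "neighborhood": "丸の内"},
]
for office in offices:
    kana = fetch_kana(office["prefecture"], office["city"], office["neighborhood"])
    if kana is not None:
        office.update(zip(("prefecture_kana", "city_kana", "neighborhood_kana"), kana))
```

The cache also remembers misses (None). That is harmless as long as postal_data isn't modified while the office data is being processed, which this sketch assumes.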

@polm
Owner

polm commented Nov 14, 2021

Thanks again for this PR, and sorry it took me a while to get to it!

It should be working now, and I'll incorporate the results in the next release. It's still not perfect: the main issue is that it only adds readings if there is a match for all three address levels (prefecture, city, neighborhood). I expected this to always be true, but it seems that sometimes the neighborhood (at least) of an office code isn't used in normal postal codes at all, in which case the prefecture and city readings are left out too. It should be easy to add backoff logic for this; I may get around to it this month.
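
The backoff logic mentioned above could look roughly like this: try the full (prefecture, city, neighborhood) match first, then drop the most specific level and retry, keeping whatever readings the broader match provides. This is a hypothetical sketch (the lookup_with_backoff name, the schema, and the sample row are assumptions), not the code in the repository.

```python
import sqlite3

LEVELS = ("prefecture", "city", "neighborhood")


def open_demo_db():
    """In-memory postal_data table (hypothetical schema, one illustrative row)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE postal_data (prefecture TEXT, prefecture_kana TEXT, "
        "city TEXT, city_kana TEXT, neighborhood TEXT, neighborhood_kana TEXT)"
    )
    conn.execute(
        "INSERT INTO postal_data VALUES (?, ?, ?, ?, ?, ?)",
        ("東京都", "トウキョウト", "千代田区", "チヨダク", "丸の内", "マルノウチ"),
    )
    return conn


def lookup_with_backoff(conn, place):
    """Return kana for the most specific address prefix that matches, or {} if none does."""
    for depth in range(len(LEVELS), 0, -1):
        cols = LEVELS[:depth]  # all three levels first, then (prefecture, city), then prefecture
        # Column names come from the fixed LEVELS tuple; values are bound as parameters.
        where = " AND ".join(f"{c} = ?" for c in cols)
        select = ", ".join(f"{c}_kana" for c in cols)
        row = conn.execute(
            f"SELECT {select} FROM postal_data WHERE {where} LIMIT 1",
            [place[c] for c in cols],
        ).fetchone()
        if row is not None:
            return {f"{c}_kana": value for c, value in zip(cols, row)}
    return {}
```

Prefecture stays in every query because the same city name can occur in more than one prefecture (府中市, for example, exists in both Tokyo and Hiroshima).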

@polm polm merged commit c8d69ea into polm:master Nov 14, 2021
@xatlasm xatlasm deleted the feature-office-place-kana branch November 17, 2021 02:12