Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supports for multiple databases #289

Merged
merged 1 commit into from
Mar 10, 2020
Merged

Supports for multiple databases #289

merged 1 commit into from
Mar 10, 2020

Conversation

Joxit
Copy link
Member

@Joxit Joxit commented Mar 9, 2020

Background

WOF database will be hosted on geocode.earth, per country will be available, so the code should support per country databases too.

What's new ?

Now, wof admin lookup will combine all databases like whosonfirst-data-[a-z0-9-]+\.db that means both global and per country databases.

related: pelias/whosonfirst#460 pelias/whosonfirst#469 pelias/whosonfirst#477

@missinglink
Copy link
Member

missinglink commented Mar 9, 2020

Okay, so I am (hopefully) finished for now generating the databases/bundles, here's what we have available:
https://geocode.earth/data

@orangejulius
Copy link
Member

orangejulius commented Mar 9, 2020

Nice! This will definitely be needed.

I tried testing this out on our Portland Metro docker project:

I changed pelias.json so that sqlite: true was set, and importPlace was unset. Then I downloaded whosonfirst-admin-us-latest.db manually, and tried an import.

Strangely, there was no USA country record:

$ pelias import oa
info: [wof-pip-service:master] starting with layers neighbourhood,borough,locality,localadmin,county,macrocounty,macroregion,region,dependency,country,empire,
continent
info: [openaddresses] Importing 8 files.
info: [openaddresses] Creating read stream for: /data/openaddresses/us/or/portland_metro.csv
info: [wof-pip-service:master] macrocounty worker loaded 0 features in 1.494 seconds
info: [wof-pip-service:master] dependency worker loaded 0 features in 1.51 seconds
info: [wof-pip-service:master] country worker loaded 0 features in 1.539 seconds
info: [wof-pip-service:master] macroregion worker loaded 0 features in 1.547 seconds
info: [wof-pip-service:master] region worker loaded 36 features in 1.566 seconds
info: [wof-pip-service:master] borough worker loaded 4 features in 1.564 seconds
info: [wof-pip-service:master] continent worker loaded 3 features in 1.598 seconds
info: [wof-pip-service:master] empire worker loaded 11 features in 1.867 seconds

The record is in there, so I wonder if there's something else preventing it from being used?

sqlite> select * from spr where id = '85633793';
85633793|-1||unknown|XX|whosonfirst-data-admin-us|45.1593097445001|0.318715854000203|18.9061171430001|-179.143503384|71.4125023460001|179.780935092|1|0|0|0|0|
||-1

Could this be an issue with the downloads from geocode.earth? Or is it more likely that something in our code to read from a SQLite DB is incompatible with the per-country DBs?

@missinglink
Copy link
Member

missinglink commented Mar 10, 2020

This looks to me to be a bug in the bundling code.

I recently added some code to include alt geoms in the SQLite files but it looks like while the geojson table allows multiple entries with the same WOFID, the spr table does not.

I think in this case it imported an alt geometry first and then INSERT OR IGNORE meant that row stuck when the primary feature for that WOFID was imported.

You can see in the SQL output that there is no parent, no name and no placetype, which seems to confirm this.

I'll have a look, unfortunately there may be more of these bugs but we can work through them and cover them with unit tests and they'll go away ;)

FYI I always run .headers on in the sqlite3 CLI, it makes debugging a bit easier!

@missinglink
Copy link
Member

missinglink commented Mar 10, 2020

Yep, so that's exactly what happened:

sqlite> select rowid, id, is_alt from geojson where id = '85633793';
rowid|id|is_alt
265744|85633793|0
265743|85633793|1

I've updated the code, published a new npm version and invalidated the build cache, a full data rebuild is scheduled to kick off in 30 mins.

@orangejulius
Copy link
Member

Okay great. So I think it's safe to say this code works well :)

If a new bundle is coming soon, I'll wait to test that out just in case, but that's as much for my own enjoyment of seeing it work as for any actual confidence in this code.

@missinglink
Copy link
Member

missinglink commented Mar 10, 2020

$ cat tmp/sqlite/whosonfirst-data-admin-us-latest.db.bz2.json
{
  "name_compressed": "whosonfirst-data-admin-us-latest.db.bz2",
  "size_compressed": 706543390,
  "sha256_compressed": "bea05ca8377b04aa966b3ef97fb341873cc76f8c23410389cd418f8dbe177a8c",
  "last_modified": "2020-03-10T13:23:49.225Z",
  "name": "whosonfirst-data-admin-us-latest.db",
  "size": 3387883520,
  "sha256": "b231a5cf8c73f9929f3eddcc10b910c38c24f9b5135548eadf41f12cc9b05718",
  "repo": "whosonfirst-data-admin-us",
  "vintage": "latest",
  "commit": "9d2abe154a3d66524ec53a5a44ee15ca986effa6",
  "last_updated": "2020-03-05T22:14:00.000Z"
}
sqlite> select * from spr where id = '85633793';
id|parent_id|name|placetype|country|repo|latitude|longitude|min_latitude|min_longitude|max_latitude|max_longitude|is_current|is_deprecated|is_ceased|is_superseded|is_superseding|superseded_by|supersedes|lastmodified
85633793|102191575|United States|country|US|whosonfirst-data-admin-us|45.1532595|0.314297500000009|18.86546|-179.231086|71.441059|179.859681|1|0|0|0|0|||1583217946

@orangejulius the file is uploaded but the GE data page is still caching the old inventory somehow, you should be able to downlod the new version from here (just subtract the .json):
https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-us-latest.db.bz2.json

@orangejulius
Copy link
Member

Works great now. Thanks @Joxit!

@orangejulius orangejulius merged commit e07f770 into master Mar 10, 2020
@orangejulius orangejulius deleted the joxit/multi-sqlite branch March 10, 2020 16:15
@Joxit
Copy link
Member Author

Joxit commented Mar 10, 2020

Yeah! You are welcome 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants