Create a standard structure for importers #255

Open
orangejulius opened this Issue Feb 4, 2016 · 6 comments

5 participants
orangejulius (Member) commented Feb 4, 2016

Each of our five importers (openstreetmap, openaddresses, geonames, quattroshapes, and whosonfirst) has a somewhat different structure, calling convention, and interface. I'd like to see them all unify on an identical interface that also adds a few enhancements over how they behave now. All importers should...

Have configuration driven through pelias-config

All importers at least partially use pelias-config for settings like adminLookup, and parts of each importer's pipeline use pelias-config for Elasticsearch settings and similar. However, many of the importers also take mandatory parameters on the command line. For example, the Geonames importer requires a two-letter ISO 3166 country code to specify which country's data to import. All these command-line parameters should be moved to pelias-config so that we can...

Provide an npm script that starts the importer

Currently, every importer has a different command-line interface, so our Chef scripts have five sections, each subtly different. It's not important that every importer is called in exactly the same way, but it should be possible to hide that interface behind an npm script. This script should be npm start (see the npm docs); if there is also an npm run import script that npm start calls, that's great too.
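Moving parameters such as the Geonames country code out of the command line is what lets npm start run with no arguments. The sketch below shows the idea in TypeScript under stated assumptions: the config shape (imports.geonames.countryCode) is hypothetical, resolveCountryCode is an illustrative name rather than an existing function, and in a real importer the object would come from pelias-config instead of being passed in by hand.

```typescript
// Hypothetical shape of the "imports" section of a Pelias config.
// Only the field used here is modeled; real configs contain much more.
interface GeonamesImportConfig {
  countryCode?: string;
}

interface PeliasConfig {
  imports?: {
    geonames?: GeonamesImportConfig;
  };
}

// Reads the setting that used to be a mandatory command-line argument and
// validates it up front, so a bad value fails immediately with a message
// naming the config key instead of partway through an import.
function resolveCountryCode(config: PeliasConfig): string {
  const raw = config.imports?.geonames?.countryCode;
  if (raw === undefined) {
    throw new Error("imports.geonames.countryCode is not set in pelias-config");
  }
  const code = raw.trim().toUpperCase();
  // Two-letter ISO 3166 codes are exactly two ASCII letters.
  if (!/^[A-Z]{2}$/.test(code)) {
    throw new Error(
      `imports.geonames.countryCode must be a two-letter ISO 3166 code, got "${raw}"`
    );
  }
  return code;
}
```

With this in place, the npm start script needs no arguments at all: every run reads the same config file, which is exactly what lets the Chef scripts stop special-casing each importer.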

Be able to download the data required for the importer

This should be an optional step that can be run separately from an import. The location the data is downloaded to should be configurable.

A nice-to-have here would be a smart downloader that checks the sizes and checksums of local files so that only incomplete, invalid, or out-of-date files are downloaded. Faster downloads mean faster builds.
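One way a smart downloader could decide what to fetch is sketched below in TypeScript. The manifest of expected sizes and SHA-256 checksums is an assumption (the issue doesn't say where that information would come from), and planDownloads, ExpectedFile, and LocalFile are hypothetical names. Local checksums are computed lazily through a callback, so a file whose size is already wrong is never hashed.

```typescript
// Hypothetical manifest entry describing a complete, current file.
interface ExpectedFile {
  name: string;
  sizeBytes: number;
  sha256: string;
}

// What we know about a file already on local disk.
interface LocalFile {
  sizeBytes: number;
  // Called only when the size already matches, because hashing a large
  // data file can take a long time.
  computeSha256: () => string;
}

type Reason = "missing" | "incomplete" | "size-mismatch" | "checksum-mismatch";

interface DownloadDecision {
  name: string;
  reason: Reason;
}

// Returns only the files that need (re)downloading, in manifest order.
function planDownloads(
  expected: ExpectedFile[],
  local: Record<string, LocalFile>
): DownloadDecision[] {
  const plan: DownloadDecision[] = [];
  for (const file of expected) {
    const onDisk = Object.prototype.hasOwnProperty.call(local, file.name)
      ? local[file.name]
      : undefined;
    if (onDisk === undefined) {
      plan.push({ name: file.name, reason: "missing" });
    } else if (onDisk.sizeBytes < file.sizeBytes) {
      // Smaller than expected: most likely an interrupted download.
      plan.push({ name: file.name, reason: "incomplete" });
    } else if (onDisk.sizeBytes !== file.sizeBytes) {
      plan.push({ name: file.name, reason: "size-mismatch" });
    } else if (onDisk.computeSha256().toLowerCase() !== file.sha256.toLowerCase()) {
      // Right size but wrong content: corrupt or out of date.
      plan.push({ name: file.name, reason: "checksum-mismatch" });
    }
  }
  return plan;
}
```

Checking size before checksum keeps repeat runs cheap: on a warm data directory only files that pass the size check are hashed, and nothing that matches is downloaded again.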

Print a message on successful or unsuccessful finish of the importer

It is often confusing for users when our importers simply stop after working for a long time without printing any message saying whether everything finished successfully. We should ensure all our importers print something. Ideally the message would include whether the import succeeded or failed, how long it took, the number of records imported, and the average import speed.
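A minimal TypeScript sketch of such a summary follows, assuming the importer records its start time and keeps a running count of imported records. ImportSummary and formatSummary are hypothetical names, not part of any existing Pelias package.

```typescript
// Hypothetical data an importer would collect while running.
interface ImportSummary {
  importer: string;
  succeeded: boolean;
  startedAtMs: number;
  finishedAtMs: number;
  recordsImported: number;
  error?: string;
}

// Renders a duration as "1h 2m 3s", "2m 5s", or "7s".
function formatDuration(ms: number): string {
  const totalSeconds = Math.round(ms / 1000);
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  if (h > 0) return `${h}h ${m}m ${s}s`;
  if (m > 0) return `${m}m ${s}s`;
  return `${s}s`;
}

// Builds the final message covering status, elapsed time, record count,
// and average speed.
function formatSummary(summary: ImportSummary): string {
  const elapsedMs = Math.max(0, summary.finishedAtMs - summary.startedAtMs);
  const seconds = elapsedMs / 1000;
  // Avoid dividing by zero for imports that finish (or fail) instantly.
  const rate = seconds > 0 ? summary.recordsImported / seconds : 0;
  const status = summary.succeeded ? "finished successfully" : "FAILED";
  const lines = [
    `${summary.importer} import ${status}`,
    `  elapsed: ${formatDuration(elapsedMs)}`,
    `  records imported: ${summary.recordsImported}`,
    `  average speed: ${rate.toFixed(1)} records/second`,
  ];
  if (!summary.succeeded && summary.error) {
    lines.push(`  error: ${summary.error}`);
  }
  return lines.join("\n");
}
```

For example, a Geonames run that imports 250000 records in 125 seconds would report an elapsed time of "2m 5s" and an average speed of "2000.0 records/second". An importer would print this once, for example from its pipeline's finish and error handlers, so that failed runs end with a clear message too.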

Live in a GitHub repo called pelias/[datasource]-importer?

This would be nice for clarity but would be a lot of work to change.

trescube (Contributor) commented Feb 5, 2016

👍

riordan (Contributor) commented Feb 8, 2016

Couldn't agree more.

It'll make our philosophy of being dataset-agnostic a bit more approachable for more folks.

missinglink (Member) commented Feb 17, 2016

👍

dianashk added the processed label Feb 25, 2016

dianashk referenced this issue in pelias-deprecated/admin-lookup Mar 7, 2016: Migrate admin lookup config to a global setting #19 (Closed)

dianashk added this to the Dependency Upgrades milestone Mar 7, 2016

orangejulius referenced this issue in pelias/openstreetmap Apr 19, 2016: Move execution out of index.js #37 (Closed)

avulfson17 referenced this issue in pelias/openaddresses Jul 8, 2016: Config imports #137 (Merged)

orangejulius added a commit to pelias/geonames that referenced this issue Dec 13, 2016

Split single executable into individual tasks
These tasks are run via NPM scripts just like our other importers.

Connects pelias/pelias#255

orangejulius self-assigned this Dec 13, 2016

orangejulius added the in progress label and removed the processed label Dec 13, 2016

orangejulius added a commit to pelias/geonames that referenced this issue Jan 31, 2017

Split single executable into individual tasks
These tasks are run via NPM scripts just like our other importers.

Connects pelias/pelias#255

dianashk (Contributor) commented Jun 15, 2017

This might already be really close to being done. @orangejulius will confirm and update the issue.

dianashk added the on-deck label and removed the in progress label Jun 15, 2017

orangejulius (Member) commented Jun 20, 2017

I've just added another item to the list of things we should do. Not counting the renaming of all the repositories, which is a lot of work without too much benefit right now, here's our current status:

Importer | Uses pelias-config | Has npm start script | Has downloader script | Prints message on exit
OSM
OA (has an old PR)
Geonames
WOF
Polylines

orangejulius (Member) commented Jun 20, 2017

I'd also like to point out that working on the exit messages would be a great starter task for anyone looking to contribute. The code in the openaddresses importer is a good starting point and should work with few modifications in our other importers. We'd be happy to help anyone get started.
