Data pipeline for EMS vector layers #4

Closed
thomasneirynck opened this issue Mar 13, 2018 · 6 comments
@thomasneirynck
Contributor

thomasneirynck commented Mar 13, 2018

Context

EMS publishes vector data. This data comes from various sources, including Wikidata and Natural Earth.

See https://vector.maps.elastic.co/v2/manifest for the full list of current offerings.

The current files were created ad hoc. There is not much in the way of code/process/docs to regenerate these files.

There is a big risk that these files will grow stale. We need a way to keep these fresh, and improve continuously when issues pop up.

Goal

  • We need the ability to generate these files from source, and have their schema comply with our current files.

  • Files need to be geojson, except the zip-file, which is topojson. Future files can be one or the other. Geojson is preferred whenever reasonable (it is more human-readable/hackable), with topojson for larger files.

  • Each file needs to be accompanied by a separate metadata file (including info about attribution, fields (human-readable label/description), ...). This information is important because Kibana requires it to bootstrap the UI. We will likely also use this info in the EMS landing page.

  • Some files may require corrections/additions. E.g. misspellings in the world file/missing countries. Ideally this can be rolled up in this "generation" process.

  • We need backwards compatibility. Adding new properties and features is fine, but we cannot take away properties or features compared to previous versions of the file.
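The backwards-compatibility goal could be checked mechanically before a regenerated file is uploaded. The following is a minimal sketch, not an existing EMS tool: the function name `find_regressions` and the choice of `iso2` as the identifying property are assumptions made for illustration.

```python
def find_regressions(previous, candidate, id_property="iso2"):
    """List everything `candidate` drops compared to `previous`.

    Both arguments are GeoJSON FeatureCollections already parsed into dicts.
    `id_property` (an assumption here) is the property that identifies a
    feature across versions.
    """
    def index(collection):
        # Map feature id -> properties dict for quick lookups.
        return {
            feature["properties"].get(id_property): feature["properties"]
            for feature in collection["features"]
        }

    old, new = index(previous), index(candidate)
    problems = []
    for feature_id, old_properties in old.items():
        if feature_id not in new:
            problems.append(f"feature {feature_id!r} was removed")
            continue
        for key in old_properties:
            if key not in new[feature_id]:
                problems.append(f"feature {feature_id!r} lost property {key!r}")
    return problems
```

An empty list means the new file only adds properties or features. Anything else should block the release.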

Integration with EMS (??)

How to integrate with EMS is not entirely clear yet, e.g. how will these new files be "picked up" by EMS? Right now, we have an upload form. I'd suggest we work with that for now, until we have a clearer grasp of where we want to go, technology-wise.

Example

Suppose we have the countries file, what we need is a program that spits out:

  1. geojson featurecollection

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "name": "Antigua and Barbuda",
        "iso2": "AG",
        "iso3": "ATG"
      },
      "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
          [
            [
              [-61.686668, 17.024441000000152],
              [-61.887222, 17.105274],
              [-61.794449, 17.1633300000001],
              [-61.686668, 17.024441000000152]
            ]
          ],
          [
            [
              [-61.72917199999989, 17.608608],
              [-61.853058, 17.583054000000104],
              [-61.873062, 17.703888],
              [-61.72917199999989, 17.608608]
            ]
          ]
        ]
      }
    },
    {
      "type": "Feature",
      "properties": {
        "name": "Algeria",
        "iso2": "DZ",
        "iso3": "DZA"
      },
      "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
          [
            [
              [2.96361, 36.802216],
              [4.785832, 36.894722],
              [5.328055, 36.640274],
              [6.398333, 37.086388],
              [
              ......
  2. Metadata json object (separate file)
{
  "attribution": "[Made with NaturalEarth](http://www.naturalearthdata.com/about/terms-of-use) | [Elastic Maps Service](https://www.elastic.co/elastic-maps-service) | WHATEVER_SOURCES",
  "name": "World Countries",
  "format": "geojson",
  "fields": [
    {
      "name": "iso2",
      "description": "Two letter abbreviation"
    },
    {
      "name": "name",
      "description": "Country name"
    },
    {
      "name": "iso3",
      "description": "Three letter abbreviation"
    }
  ],
  "created_at": "WHATEVER_DATE_OF_GENERATION",
  "tags": []
}
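
To keep the metadata file from drifting away from the data, it could be derived from the FeatureCollection at generation time. This is a rough sketch under assumptions: `build_metadata` is a hypothetical helper, and the ISO 8601 UTC timestamp for `created_at` is a guess, since the format of that field is not specified above.

```python
from datetime import datetime, timezone


def build_metadata(collection, name, attribution, field_descriptions,
                   fmt="geojson", created_at=None):
    """Build the metadata object for a parsed GeoJSON FeatureCollection.

    `field_descriptions` maps property name -> human-readable description.
    Raises ValueError if any property in the data has no description, so a
    new property cannot be published without documentation for the UI.
    """
    property_names = set()
    for feature in collection["features"]:
        property_names.update(feature["properties"])

    undocumented = property_names - set(field_descriptions)
    if undocumented:
        raise ValueError(f"no description for fields: {sorted(undocumented)}")

    if created_at is None:
        # Assumption: ISO 8601 in UTC; the issue does not fix a date format.
        created_at = datetime.now(timezone.utc).isoformat()

    return {
        "attribution": attribution,
        "name": name,
        "format": fmt,
        "fields": [
            {"name": field, "description": description}
            for field, description in field_descriptions.items()
        ],
        "created_at": created_at,
        "tags": [],
    }
```

Serializing the result with `json.dump` next to the data file would give the "separate file" described in item 2.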

@alexfrancoeur

Are we doing anything with these at this point in time? https://github.com/elastic/infra/issues/3475#issuecomment-362031047

@nickpeihl
Member

Yes, I am reviewing the results and will upload the datasets to the staging map service for testing.

@thomasneirynck
Contributor Author

Also note that we were blocked on releasing more layers because we could not enforce ordering in our layers. Without ordering we get some ping-ponging, which is already an issue (e.g. elastic/kibana#17197). We have this functionality now in staging (thanks @mentat!), but we need to move this to production.

@alexfrancoeur

+++ thanks @thomasneirynck @nickpeihl!

@nickpeihl
Member

Issue submitted for infra. https://github.com/elastic/infra/issues/5256

@thomasneirynck
Contributor Author

After discussion, we should close this as it's scoped out here.

  • data is reasonably static. That is, there really is no pressing need to be alerted of "changes", as they virtually do not occur.
  • production of the data can be automated by running the SPARQL query, and/or applying the manual edits that have been outlined in the accompanying documentation files.

We could use some additional documentation so new hires can figure out how to generate the data from scratch.
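
As a starting point for that documentation, the two production steps (run the SPARQL query, then apply the manual edits) might look roughly like the sketch below. It assumes the query response uses the standard SPARQL 1.1 JSON results layout (`results.bindings`). The helper names, the `iso2` key, and the sample values are made up for illustration and do not come from the actual documentation files.

```python
def rows_from_sparql_json(response):
    """Flatten a SPARQL 1.1 JSON result into dicts of variable -> value."""
    return [
        {variable: cell["value"] for variable, cell in binding.items()}
        for binding in response["results"]["bindings"]
    ]


def apply_manual_edits(rows, edits, id_field="iso2"):
    """Return new rows with per-id property overrides applied.

    `edits` maps an id (hypothetically the iso2 code) to the properties
    that should be overwritten. Unknown ids raise KeyError so that a stale
    edit is noticed instead of silently ignored.
    """
    by_id = {row[id_field]: dict(row) for row in rows}
    for row_id, overrides in edits.items():
        if row_id not in by_id:
            raise KeyError(f"manual edit targets unknown id {row_id!r}")
        by_id[row_id].update(overrides)
    return list(by_id.values())


# Made-up sample response containing a misspelled name to correct.
sample_response = {
    "head": {"vars": ["iso2", "name"]},
    "results": {"bindings": [
        {"iso2": {"type": "literal", "value": "DZ"},
         "name": {"type": "literal", "value": "Algerai"}},
    ]},
}
rows = rows_from_sparql_json(sample_response)
fixed = apply_manual_edits(rows, {"DZ": {"name": "Algeria"}})
```

Keeping the edits as data (a small JSON or YAML file per layer) rather than as hand-applied changes would make the regeneration repeatable.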


3 participants