
Elastic Map Service Data Sources

Machine readable and standardized data sources for use in Elastic Map Service.

Usage

Create a new JSON or Hjson file in the appropriate folder in sources. The source file must match the schema in schema/source_schema.json.
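
As a sketch, a new source file might start like this. The directory, file name, and field names are illustrative (drawn from the steps described later in this README); schema/source_schema.json is the authoritative reference for the required properties:

```shell
# Sketch: create a minimal Hjson source file skeleton.
# Field names are illustrative, not a complete schema-valid source;
# see schema/source_schema.json for the full set of required properties.
mkdir -p sources/fi
cat > sources/fi/regions.hjson <<'EOF'
{
  # free-form note describing the layer
  note: Regions of Finland
  # machine-readable layer name
  name: finland_regions
}
EOF
```

Running yarn test afterwards will report any properties the real schema still requires.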

To validate data sources against the schema, run:

yarn test

Setting the environment variable EMS_STRICT_TEST performs an additional check to ensure all field definitions are present in all features:

EMS_STRICT_TEST=ok yarn test

To build manifests and vector data files for all versions, run:

yarn build

Continuous Integration and Deployment

  • New feature layers can be developed on the feature-layers branch (git checkout --track upstream/feature-layers). Buildkite will build and deploy all commits to this branch into a testing bucket on GCP. Test feature layers on this branch in Kibana by adding map.manifestServiceUrl: http://storage.googleapis.com/elastic-bekitzur-emsfiles-catalogue-dev/v7.2/manifest to config/kibana.yml.
  • Pull requests for new feature layers should be made from the feature-layers branch against the master branch. Pull requests for any other changes should be made on a new branch in your fork, e.g. git checkout -b my-bugfix.
  • Once merged, Buildkite will run .buildkite/deploy.sh script, which will place the contents of the dist directory into the staging bucket.
  • Deploying to production requires pushing a tag (git push --tags) and accepting the Deploy to production? block step. This will execute the deploy.sh script against the production GCP buckets, also creating an archive in a separate bucket for future reference.
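
The tag-driven production deploy can be sketched as follows. The tag name is hypothetical, and the commands are demonstrated in a throwaway repository so they are self-contained:

```shell
# Hypothetical release flow; v8.13.0 is an illustrative tag name.
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
git -c user.email=release@example.com -c user.name=demo \
    commit -q --allow-empty -m "release commit"
git tag v8.13.0      # tag the commit to be released
git tag              # lists: v8.13.0
# In the real repository, pushing the tag triggers the Buildkite pipeline
# and its "Deploy to production?" block step:
# git push --tags
```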

Versioning

  • The file scripts/constants.js contains the versions for which manifests are generated. It is OK to generate a few versions ahead of the current Elastic Stack release.
  • In addition to the Elastic Stack semantic versions, tailored for cloud and self-managed environments, date-based version manifests are also generated and mapped to semver in the DATE_VERSIONS array. See PR #287 for details.
  • A final /latest/manifest is generated from the last entry in the DATE_VERSIONS array.
  • If the manifest contract needs to change, a new date version entry should be added. Otherwise it is OK to update an entry in DATE_VERSIONS in place to point to a newer semantic version of the Stack. For example, PR #306 promotes the 2023-10-31 version from 8.10 to 8.13, since only new data had been added to the repository.

Adding a new country subdivision vector layer

Whenever possible new vector layers should be created using a SPARQL query in Sophox.

  1. Checkout the upstream feature-layers branch.
  2. If necessary, create a new folder in the sources directory with the corresponding two-letter country code (ex. ru for Russia).
  3. Copy and paste the template source file (templates/source_template.hjson) into the new directory you created in step 2. Give it a useful name (ex. states.hjson, provinces.hjson, etc.).
  4. Complete the note and name fields in the new source file.
  5. Copy and paste the query.sparql value into the query box on http://sophox.org.
  6. Change the Q33 in the VALUES ?entity { wd:Q33 } to the corresponding Wikidata ID for the country for which you are adding subdivisions (ex. Q33 is the Wikidata ID for Finland).
  7. Run the SPARQL query and compare the iso_3166_2 results with the corresponding country's subdivision list on the ISO website, looking for missing iso_3166_2 codes.
  8. The most common reason for missing iso_3166_2 codes in the query results is an incomplete "contains administrative territorial entity" property in the immediate parent region of the subdivision in Wikidata (usually, but not always, the country). You may need to add the subdivision Wikidata item to this property (ex. https://www.wikidata.org/wiki/Q33#P150).
  9. Add label_* fields for each official language of the country to the SPARQL query similar to the label_en field.
  10. Optionally, add unique subdivision code fields from other sources (ex. logianm in Ireland) to the query.
  11. Run the SPARQL query and check the map output.
  12. Optionally, click the "Simplify" link and drag the slider to reduce the number of vertices (smaller file size).
  13. Click the "Export" link on the top right of the map. Choose GeoJSON or TopoJSON as the File Format.
  14. Type rfc7946 in the "command line options" box to reduce the precision of the coordinates, then click "Export" to download the vector file.
  15. Rename the downloaded file with the first supported EMS version number (ex. _v1, _v2, _v6.6) and the vector type (geo for GeoJSON, topo for TopoJSON) (ex. russia_states_v1.geo.json). Copy this file to the data directory.
  16. Complete the emsFormats properties: type is either geojson or topojson; file is the filename specified above; default is true when there is only one format. Additional formats can be added, but only one item in the array can have default: true; the other items must have default: false or omit default entirely.
  17. Copy and paste the SPARQL query from Sophox to the query.sparql field in the source file.
  18. Use the scripts/wikidata-labels.js script to list the humanReadableName languages from Wikidata (e.g. node scripts/wikidata-labels.js Q33). You should spot check these translations as some languages might lack specificity (e.g. Provins rather than Kinas provinser).
  19. Maintain the current precedent for title-casing legacyIds and the English labels of humanReadableName. These may need to be edited manually in the source file (e.g. Paraguay Departments).
  20. All fields used by sources that do not follow the label_<language_code> schema must have translations in schema/fields.hjson. If necessary, use the scripts/wikidata-labels.js script to list translations and copy them to schema/fields.hjson (e.g. node scripts/wikidata-labels.js P5097).
  21. Use the following bash command to generate the timestamp for the createdAt field (use gdate on macOS): date -u +"%Y-%m-%dT%H:%M:%S.%6N"
  22. Generate a 17 digit number for the id field; a timestamp from the following bash command is suitable (use gdate on macOS): date +%s%6N
  23. The filename field in the source file should match the name of the file you added to the data directory.
  24. Run yarn test to test for errors.
  25. Invalid or non-simple geometry errors that occur during testing can usually be fixed by running the clean-geom.js script against the GeoJSON file (e.g. node scripts/clean-geom.js data/usa_states_v1.geo.json).
  26. Run ./build.sh to build the manifest and blob files locally.
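
The command-line portion of the steps above can be collected into one short sketch. The data file name is illustrative, and the repository commands are shown commented out because they require the checkout:

```shell
# Step 21: createdAt timestamp (use gdate on macOS)
date -u +"%Y-%m-%dT%H:%M:%S.%6N"

# Step 22: numeric id from a microsecond timestamp (use gdate on macOS)
date +%s%6N

# Step 24: validate the sources against the schema
# yarn test

# Step 25: fix invalid or non-simple geometries reported by the tests
# node scripts/clean-geom.js data/russia_states_v1.geo.json

# Step 26: build the manifest and blob files locally
# ./build.sh
```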