Upgrade to Elasticsearch 2 #325

Closed
orangejulius opened this Issue Apr 26, 2016 · 9 comments

@orangejulius
Member

orangejulius commented Apr 26, 2016

Elasticsearch 2 brings a lot of new features we want, such as faster Geo tools, an improved FST, and better tools for monitoring performance. We have a branch for experimental support, but there's actually quite a bit to the process. There are also some backwards incompatible changes in Elasticsearch 2 that we'll have to work around.

The process for migration might look something like this:

1.) Make whatever backwards-compatible changes for ES 2.0 support are possible before doing anything else. This ensures the number of changes in play during the actual migration is minimal. 324f1fd and 0b35e6a from the experimental branch mostly cover this part already.

2.) Modify the API to support querying against multiple, configurable indices. pelias/api#334 describes part of this work. We will need this for...

3.) Modify the importers and schema to store admin regions and addresses/venues in two separate indices with different n-gram settings (1-grams for admin regions, 2-grams for addresses/venues). Elasticsearch 2 doesn't support our current setup, where different types in the same index have different analysis settings. We want to make this change before, not alongside, the upgrade, so that we can see exactly what performance and result-quality effects the upgrade has. This change by itself should lead to equivalent results.

4.) Actually upgrade to ES 2.0
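For step 3, the split might look roughly like this: two indices, each with its own edge-ngram analysis settings. This is just a sketch — the index names, analyzer names, and the max_gram value here are hypothetical, not our actual schema:

```json
{
  "pelias_admin": {
    "settings": {
      "analysis": {
        "filter": {
          "prefix_gram": { "type": "edge_ngram", "min_gram": 1, "max_gram": 18 }
        },
        "analyzer": {
          "peliasPrefix": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["lowercase", "prefix_gram"]
          }
        }
      }
    }
  },
  "pelias_address": {
    "settings": {
      "analysis": {
        "filter": {
          "prefix_gram": { "type": "edge_ngram", "min_gram": 2, "max_gram": 18 }
        },
        "analyzer": {
          "peliasPrefix": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["lowercase", "prefix_gram"]
          }
        }
      }
    }
  }
}
```

Since analysis settings live at the index level, splitting by index is the only way to give the two document classes different gram sizes.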

Update: we are moving along with the upgrade. Here's a list of tasks we still have to tackle:

  • Merge ES2 supporting code to all repos
  • Start a build on prod_build
  • Have a build finish on prod_build
  • Update all documentation (pelias-doc and all other readmes/wiki pages) to note that Elasticsearch 2 is required, and 1.7 is no longer supported
  • Restore correct functionality in the dashboards
  • Restore Marvel for diagnostics of the prod_build and dev 1/2 clusters
  • Figure out how to test pelias/schema against ES2 on TravisCI
  • Gracefully handle lat/lon wrapping (support was disabled in Elasticsearch 2) (pelias/api#570)
  • Make any changes required so we can create new builds on prod_build (none were required)
  • Update our ops scripts so that we can rollover builds with ES2 (they have moved to a different stack so this is now more complicated)
  • big final task! Roll over ES2 to production with no downtime (will require some research)
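On the lat/lon wrapping task: the normalization itself is straightforward. A sketch (not the actual pelias/api code) of wrapping an out-of-range longitude back into [-180, 180], since Elasticsearch 2 no longer does this for us:

```python
def wrap_longitude(lon):
    """Normalize a longitude into the [-180, 180] range.

    Elasticsearch 2 rejects out-of-range coordinates, so a value
    like 190 must be wrapped to -170 before querying.
    """
    wrapped = ((lon + 180.0) % 360.0) - 180.0
    # keep exactly 180 as 180 rather than flipping it to -180
    return 180.0 if lon != wrapped and wrapped == -180.0 else wrapped

print(wrap_longitude(190.0))   # -170.0
print(wrap_longitude(-190.0))  # 170.0
```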
@orangejulius

Member

orangejulius commented Jul 2, 2016

The dev and prod_build builds from Thursday just finished. They both failed after successfully ingesting all the data. I didn't look into the dev failure, but the prod_build one hit an error when running the acceptance tests, which I'm sure we can fix without too much trouble.

I also realized that the Elasticsearch API we use to optimize the index before rotating has changed, so we'll have to take a look at that as well.
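If I remember right, the 2.x line deprecates the old optimize endpoint in favor of a renamed force-merge endpoint, so the change may be as small as swapping the path (index name here is hypothetical):

```
# Elasticsearch 1.x: merge the index down to one segment before rotating
curl -XPOST 'localhost:9200/pelias/_optimize?max_num_segments=1'

# Elasticsearch 2.1+: the endpoint was renamed
curl -XPOST 'localhost:9200/pelias/_forcemerge?max_num_segments=1'
```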

@missinglink

Member

missinglink commented Jul 6, 2016

the major behavioral changes:

improved handling of numeric values

prior to this release, numerals were treated the same as letters when creating prefix-grams, so a token like 171 would create the tokens [ 1, 17, 171 ], which caused undesirable behavior. this is now fixed: the same number will now only create a single token [ 171 ].
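a toy sketch of the before/after gram generation (python, not our actual analyzer config — the function and flag names are made up for illustration):

```python
def prefix_grams(token, min_gram=1, treat_numbers_whole=True):
    """Generate edge prefix-grams for a single token.

    With treat_numbers_whole=True (the new behavior), purely numeric
    tokens are emitted as one gram instead of being expanded.
    """
    if treat_numbers_whole and token.isdigit():
        return [token]
    return [token[:i] for i in range(min_gram, len(token) + 1)]

print(prefix_grams('171', treat_numbers_whole=False))  # ['1', '17', '171'] (old)
print(prefix_grams('171'))                             # ['171'] (new)
```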

improved local focus

the TF/IDF scoring has been disabled for partial token matching (e.g. only the last word typed when using /v1/autocomplete).

as a result, we get better local biasing without having to change the scoring weights ✌️
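one standard way to take TF/IDF out of a clause like this is to wrap it in a constant_score query, so every matching document gets the same contribution from the partial token. a rough sketch only — the field name and input are hypothetical, not our actual query:

```json
{
  "constant_score": {
    "filter": {
      "match": {
        "name.default": "unio"
      }
    }
  }
}
```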

search using single tokens

prior to this release we ignored single-character tokens (except the very first keypress) for performance reasons; after some refactoring and performance testing we are pleased to re-enable this functionality.

@missinglink

Member

missinglink commented Jul 6, 2016

from the heff:

also some cleanup for you after all this is done and dusted... you can remove a few cookbooks from the pelias stacks. At a glance, I'd say apache2 and mapzen_elasticsearch.

before I forget, other stack cleanup when all this is done, you can pull all the elasticsearch json references:

"mapzen_elasticsearch": {
    "marvel": {
      "prune_days": "7"
    }
  },
"elasticsearch": {
    "version": "1.7.2",
    "skip_restart": true,
    "cluster_name": "mapzen-pelias-prod-us-east",
    "index": {
      "number_of_shards": 24,
      "number_of_replicas": 1
    },
    "plugin": {
      "mandatory": [
        "cloud-aws"
      ]
    },
    "plugins": {
      "marvel": {
        "url": "http://download.elasticsearch.org/elasticsearch/marvel/marvel-latest.zip"
      }
    },
    "custom_config": {
      "index.refresh_interval": "30s",
      "action.disable_shutdown": true,
      "action.destructive_requires_name": true,
      "index.search.slowlog.threshold.query.warn": "-1",
      "index.search.slowlog.threshold.query.info": "-1",
      "index.search.slowlog.threshold.query.debug": "-1",
      "index.search.slowlog.threshold.query.trace": "-1",
      "index.search.slowlog.threshold.fetch.warn": "-1",
      "index.search.slowlog.threshold.fetch.info": "-1",
      "index.search.slowlog.threshold.fetch.debug": "-1",
      "index.search.slowlog.threshold.fetch.trace": "-1",
      "index.indexing.slowlog.threshold.index.warn": "-1",
      "index.indexing.slowlog.threshold.index.info": "-1",
      "index.indexing.slowlog.threshold.index.debug": "-1",
      "index.indexing.slowlog.threshold.index.trace": "-1"
    }
  },
@orangejulius

Member

orangejulius commented Jul 7, 2016

Regarding that second-to-last task for ops scripts, I think we're good: there are in fact no backwards-incompatible changes in ES2 that we have to accommodate. I'll check back in after our next build, which should finish just fine!

@orangejulius

Member

orangejulius commented Jul 8, 2016

Our latest build did indeed pass (🎉), so we are good in the ops scripts department!

@trescube

Contributor

trescube commented Jul 8, 2016

:shipit:

@orangejulius

Member

orangejulius commented Jul 8, 2016

So, while our build finished successfully, our script for rolling builds over to production isn't set up to handle Elasticsearch being in a different stack than the rest of our infrastructure, so I'm not sure it will work very well.

@dianashk dianashk added in progress and removed in review labels Jul 20, 2016

@orangejulius

Member

orangejulius commented Jul 22, 2016

Update: the ops scripts are updated, and tested on dev2. After the current prod_build build finishes, and we roll over to prod early next week, we can check that box.

@orangejulius

Member

orangejulius commented Aug 2, 2016

Rollover script worked, we are done! 🎉
