This issue will track support for Elasticsearch 7 in Pelias.
Most Elasticsearch upgrades require two sets of changes:
Here's the list of breaking changes we'll need to adapt to (this list will be updated over time):
This is the first step in supporting Elasticsearch 7. At this time, Pelias does not work out of the box on ES7, but with a Docker image ready to go, we can begin testing changes for compatibility.

The Dockerfile and config are identical to those of the ES6 Docker image, except for the version change and one update to `elasticsearch.yml`: in ES7, the bulk thread pool is removed, and both bulk and non-bulk operations go through a single [write](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html#modules-threadpool) thread pool. For Pelias, we have found that increasing the queue size of this thread pool helps imports succeed without errors, so the configuration file has been updated accordingly.

Connects pelias/pelias#831
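To illustrate the thread pool change described above, here is a minimal sketch of how a flat settings map might be migrated from the old bulk pool to the write pool. The function name `migrate_thread_pool_settings` and the queue size of `1000` are hypothetical and chosen only for illustration; they are not taken from the actual Pelias Docker image config.

```python
def migrate_thread_pool_settings(settings: dict) -> dict:
    """Return a copy of flat Elasticsearch settings with bulk thread pool
    keys renamed to their write thread pool equivalents."""
    old_prefix = "thread_pool.bulk."
    new_prefix = "thread_pool.write."
    migrated = {}
    for key, value in settings.items():
        if key.startswith(old_prefix):
            new_key = new_prefix + key[len(old_prefix):]
            # If a write setting is also given explicitly, it wins
            # regardless of the order the keys appear in.
            migrated.setdefault(new_key, value)
        else:
            migrated[key] = value
    return migrated


# Hypothetical ES6-style settings; the queue size is an example value only.
es6_settings = {
    "cluster.name": "pelias-dev",
    "thread_pool.bulk.queue_size": 1000,
}
es7_settings = migrate_thread_pool_settings(es6_settings)
```

The input dictionary is left unmodified, so the ES6 and ES7 variants can be compared side by side while testing.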
With the changes listed above in place as of this writing, an ES7 build and an import of a few million records for the Portland Metro area work well, and querying with the latest API causes no errors.
I'm sure there's more work to do. In particular, I think at least one geo-query-related change will be required, but it looks like the core part of the ES7 upgrade is now fairly well understood!
The first error seen when trying to use our current schema with Elasticsearch 7 is:

```
[illegal_argument_exception] Token filter [word_delimiter] cannot be used to parse synonyms
```

The [word delimiter](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html) token filter is only used in one place: the `peliasAdmin` analyzer.

Looking at the documentation for `word_delimiter`, it does _a lot_: splitting words, handling punctuation, and even some basic stemming. It is an extremely broad tool, and at this point it feels like something that Elasticsearch would deprecate in the future.

Furthermore, looking at our integration tests, it seems one of the key reasons we used it was to tokenize on hyphens, which we have done using the `peliasNameTokenizer` since #375.

Considering how complicated this token filter is, and how little effect it now has, it seems like something we can remove.

Connects pelias/pelias#831
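The removal described above can be sketched as a small transformation over index settings. This is only an illustration: the helper `remove_token_filter` is hypothetical, and the filter chain shown for `peliasAdmin` is made up for the example rather than copied from the real Pelias schema.

```python
import copy


def remove_token_filter(index_settings: dict, filter_name: str) -> dict:
    """Return a deep copy of index settings in which `filter_name` has been
    dropped from every analyzer's token filter chain."""
    updated = copy.deepcopy(index_settings)
    analyzers = updated.get("analysis", {}).get("analyzer", {})
    for analyzer in analyzers.values():
        if "filter" in analyzer:
            # Preserve the order of the remaining filters.
            analyzer["filter"] = [
                name for name in analyzer["filter"] if name != filter_name
            ]
    return updated


# Illustrative settings only; not the actual peliasAdmin definition.
example_settings = {
    "analysis": {
        "analyzer": {
            "peliasAdmin": {
                "type": "custom",
                "tokenizer": "peliasNameTokenizer",
                "filter": ["lowercase", "word_delimiter", "trim"],
            }
        }
    }
}
cleaned_settings = remove_token_filter(example_settings, "word_delimiter")
```

Working on a copy keeps the original settings available, which makes it easier to run the integration tests against both variants and confirm that hyphen handling is unchanged.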
At a minimum you should ensure that you've made the following configuration changes for ES7: