Elasticsearch 6 Support #719

Open
orangejulius opened this issue Mar 24, 2018 · 5 comments

@orangejulius
Member

commented Mar 24, 2018

Hey team!

The Elasticsearch team moves very fast, and before we can even add support for ES5, ES6 has come out and ES7 is even on the horizon!

The current todo for this is:

@thucnc


commented Apr 6, 2018

It would be great if there were a good migration plan from ES 2.4.

@karussell


commented Apr 15, 2018

Not sure if it helps, but photon also did this migration, and in the end it went through:

komoot/photon#254

(and a few other follow up issues)

The most notable changes: the index got much bigger (we have not yet investigated why) and reverse geocoding got much faster.

@orangejulius

Member Author

commented Apr 15, 2018

@karussell thanks for the valuable info. Index size getting bigger would be a big deal for Pelias; our global indices are already big enough :)

But faster reverse geocoding would be very welcome.

orangejulius referenced this issue in Tuxuri/pelias-schema Apr 15, 2018

orangejulius added a commit to pelias/schema that referenced this issue May 2, 2018

Use a single Elasticsearch mapping type
Currently, we define a unique Elasticsearch mapping type for each layer.
There are 20 different layers in our standard datasets now, but all the
mapping types are identical.

This is currently suboptimal, but not really a big deal. However, in
Elasticsearch 6, multiple mapping types are no longer supported. So we
might as well get ahead of things now and do it.

One immediate benefit: this change removes 4370 duplicate mapping type
definition lines from the expected schema fixture. The time saved in
updating that file alone when we make future changes will be huge! :)

Connects pelias/pelias#719
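
For illustration only, here is a minimal Python sketch of the consolidation this commit describes: checking that every per-layer mapping type is identical and collapsing them into one. The layer names, the `doc` type name, and the field definitions are assumptions made up for the example, not the actual pelias/schema contents.

```python
import copy


def collapse_mapping_types(mappings, single_type="doc"):
    """Collapse identical per-layer mapping types into one type.

    Raises ValueError if the types are not all identical, since merging
    differing definitions would need a manual decision.
    """
    definitions = list(mappings.values())
    if not definitions:
        raise ValueError("no mapping types to collapse")
    first = definitions[0]
    for name, definition in mappings.items():
        if definition != first:
            raise ValueError(f"mapping type {name!r} differs from the others")
    return {single_type: copy.deepcopy(first)}


# Hypothetical ES 2.x style mappings: one type per layer, all identical.
shared = {
    "properties": {
        "source": {"type": "string"},
        "layer": {"type": "string"},
    }
}
per_layer = {layer: copy.deepcopy(shared) for layer in ("address", "street", "venue")}

collapsed = collapse_mapping_types(per_layer)
print(list(collapsed))
```

The identity check matters because the collapse is only safe when the types really are duplicates, which is what the commit message states for the standard datasets.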

orangejulius added a commit to pelias/model that referenced this issue May 2, 2018

Remove _type field
Currently, we create numerous Elasticsearch types, corresponding to
different layers. All the types are identical, so they don't really
provide any value.

In Elasticsearch 6 [mapping types will go away](https://www.elastic.co/guide/en/elasticsearch/reference/6.2/removal-of-types.html).
The sooner we can remove our minimal usage of types, the easier that
transition will be.

It's possible that this will give us a performance benefit right away,
although it probably won't. It _will_ simplify our code a bit though!

Connects pelias/pelias#719
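
As a rough sketch of what dropping the per-layer `_type` could look like on the indexing side, the Python below builds a bulk-index action that always uses one fixed type and carries the layer as an ordinary document field instead. The index name `pelias`, the type name `doc`, the id format, and the helper name are all hypothetical; this is not the pelias/model implementation.

```python
def to_single_type_action(index, doc_id, layer, source, type_name="doc"):
    """Build a bulk-index (action, document) pair with a fixed mapping type.

    The layer is no longer encoded in `_type`; it is stored as a regular
    field so it can still be filtered on.
    """
    action = {"index": {"_index": index, "_type": type_name, "_id": doc_id}}
    body = dict(source, layer=layer)  # copy the source and set the layer field
    return action, body


# Hypothetical document, for illustration only.
action, body = to_single_type_action(
    "pelias", "venue:1", "venue", {"name": {"default": "Example Cafe"}}
)
print(action)
print(body)
```

One thing worth checking in a change like this: ids that were previously unique only within a type now share a single namespace, so they may need a layer or source prefix to stay unique.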

orangejulius added a commit to pelias/schema that referenced this issue May 3, 2018

Use a single Elasticsearch mapping type
Connects pelias/pelias#719
@MaherBTA


commented May 12, 2018

Hi,
I think you took the first step by using a single type field. Great job!
How can I participate?

@orangejulius

Member Author

commented May 12, 2018

Hey @MaherBTA,
A big help would be a pull request, or even just a branch, with the changes you made for ES6 compared to the pelias/schema repository. With those commits I can rebase things around and start merging stuff.

orangejulius added a commit to pelias/model that referenced this issue May 16, 2018

Remove _type field
Connects pelias/pelias#719

orangejulius added a commit to pelias/model that referenced this issue May 16, 2018

feat(types): Remove _type field
Connects pelias/pelias#719

orangejulius added a commit to pelias/schema that referenced this issue May 18, 2018

Use a single Elasticsearch mapping type
Connects pelias/pelias#719

orangejulius added a commit to pelias/model that referenced this issue May 19, 2018

feat(types): Remove _type field
Connects pelias/pelias#719

orangejulius added a commit to pelias/model that referenced this issue Sep 11, 2018

feat(types): Remove _type field
Connects pelias/pelias#719

orangejulius added a commit to pelias/model that referenced this issue Sep 11, 2018

feat(types): Remove _type field
Connects pelias/pelias#719

orangejulius added a commit to pelias/model that referenced this issue Sep 11, 2018

feat(types): Remove _type field
Connects pelias/pelias#719

orangejulius added a commit to pelias/schema that referenced this issue Oct 25, 2018

feat(mapping): use "index": "not_analyzed" for literal fields
As guessed in #99, there _are_
differences between setting `"index": "not_analyzed"` for a field, and
merely setting the analyzer to `keyword`.

They are detailed in the Elasticsearch 2.4 [String datatype](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/string.html#string-params)
documentation, although it's a little bit confusing.

In Elasticsearch 5+, there are _two_ different types of string
datatypes:

- [`text`](https://www.elastic.co/guide/en/elasticsearch/reference/6.4/text.html) and
- [`keyword`](https://www.elastic.co/guide/en/elasticsearch/reference/6.4/keyword.html).

These documentation pages make the difference much clearer. In short,
in Elasticsearch 2.4, setting `"index": "not_analyzed"` gives the
following changes, all of which we'd like for these literal fields:

- Analysis is skipped altogether; the raw value is added to the index
directly (this is pretty much equivalent to setting `analyzer: keyword`)
- [norms](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/norms.html) are disabled for the field, saving some disk space
- [doc_values](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/doc-values.html) are _en_abled.
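
To make the list above concrete, the following Python sketch spells out the defaults that an Elasticsearch 2.4 `string` field picks up when it is declared `not_analyzed`. The helper name is hypothetical, and the explicit values follow the 2.4 String datatype documentation linked above rather than anything in the Pelias code.

```python
def effective_string_settings(field):
    """Return a copy of an ES 2.4 string field mapping with the
    not_analyzed defaults written out explicitly."""
    explicit = dict(field)
    if field.get("type") == "string" and field.get("index") == "not_analyzed":
        # not_analyzed strings default to norms disabled and doc_values enabled
        explicit.setdefault("norms", {"enabled": False})
        explicit.setdefault("doc_values", True)
    return explicit


literal_field = {"type": "string", "index": "not_analyzed"}
print(effective_string_settings(literal_field))
```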

The last one is most interesting. In short, doc_values take up a little
disk space but allow us to very efficiently perform aggregations. Pelias
doesn't generally perform aggregations today. However, after we begin
using a [single mapping type](#293), we will otherwise have no way for the [pelias dashboard](https://github.com/pelias/dashboard) or any of our own analysis scripts to provide document counts for different sources or layers. The dashboard currently uses an API to get the count of various mapping types, which won't be supported going forward.

While minor, we needed a solution to this, and the only other one is
fielddata, which is extremely expensive in terms of memory usage.
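
A minimal sketch of how per-source or per-layer document counts could be obtained with a terms aggregation once doc_values are available. The Python below only builds the request body and parses a response; the helper names and the sample response numbers are made up for illustration, and this is not necessarily how the dashboard will do it.

```python
import json


def count_by_field_query(field, max_buckets=100):
    """Build a search body that returns only per-value document counts."""
    return {
        "size": 0,  # skip the hits, only aggregation buckets are needed
        "aggs": {
            f"by_{field}": {"terms": {"field": field, "size": max_buckets}}
        },
    }


def bucket_counts(response, field):
    """Extract {value: doc_count} from a terms aggregation response."""
    buckets = response["aggregations"][f"by_{field}"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written response in the shape of a terms aggregation result;
# the counts are invented for this example.
sample_response = {
    "aggregations": {
        "by_layer": {
            "buckets": [
                {"key": "address", "doc_count": 120},
                {"key": "venue", "doc_count": 30},
            ]
        }
    }
}

print(json.dumps(count_by_field_query("layer")))
print(bucket_counts(sample_response, "layer"))
```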

In my testing, for the Portland metro Docker project, disk usage went
from 451MB to 473MB, or about a 5% increase.
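
For reference, the percentage quoted above can be checked from the two sizes:

```python
before_mb, after_mb = 451, 473

# Relative growth of the index on disk
increase = (after_mb - before_mb) / before_mb
print(f"{increase:.1%}")  # 4.9%
```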

If we wanted to trim that down a bit, we could consider disabling
`doc_values` for the `parent.*_id` fields. We don't have an immediate
need for `doc_values` on those fields, although it might be interesting
for analysis.

Summary
------

While not technically required for [Elasticsearch 5 support](pelias/pelias#461), this PR does bring us more in line with the best practices of ES5.

It also sets us up for [Elasticsearch 6](pelias/pelias#719) where the `string`
datatype we use now is completely removed.
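
As a rough illustration of the `string` to `text`/`keyword` split mentioned above, this Python sketch translates an Elasticsearch 2.x style field definition into the ES 5+ types. The helper name is hypothetical, only the analyzed and `not_analyzed` cases are covered, and other parameters are copied unchanged even though some of them may need their own review when upgrading.

```python
def upgrade_string_field(field):
    """Translate an ES 2.x `string` field mapping to ES 5+ `text`/`keyword`.

    Only analyzed and not_analyzed strings are handled in this sketch.
    """
    if field.get("type") != "string":
        return dict(field)  # non-string fields are returned as a copy
    index = field.get("index", "analyzed")
    if index not in ("analyzed", "not_analyzed"):
        raise ValueError(f"unsupported index setting: {index!r}")
    upgraded = {k: v for k, v in field.items() if k not in ("type", "index")}
    upgraded["type"] = "keyword" if index == "not_analyzed" else "text"
    return upgraded


print(upgrade_string_field({"type": "string", "index": "not_analyzed"}))
print(upgrade_string_field({"type": "string", "analyzer": "standard"}))
```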

Fixes #99

JWileczek added a commit to JWileczek/schema that referenced this issue Oct 26, 2018

feat(mapping): use "index": "not_analyzed" for literal fields
Fixes pelias#99

orangejulius added a commit to pelias/schema that referenced this issue Nov 2, 2018

feat(mapping): use "index": "not_analyzed" for literal fields
As guessed in #99, there _are_
differences between setting `"index": "not_analyzed"` for a field, and
merely setting the analyzer to `keyword`.

They are detailed in the Elasticsearch 2.4 [String datatype](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/string.html#string-params)
documentation, although it's a little bit confusing.

In Elasticsearch 5+, there are _two_ different types of string
datatypes:

- [`text`](https://www.elastic.co/guide/en/elasticsearch/reference/6.4/text.html) and
- [`keyword`](https://www.elastic.co/guide/en/elasticsearch/reference/6.4/keyword.html).

These documentation pages make the difference much clearer. In short,
in Elasticsearch 2.4, setting `"index": "not_analyzed"` gives the
following changes, all of which we'd like for these literal fields:

- Analysis is skipped altogether; the raw value is added to the index
directly (this is pretty much equivalent to setting `analyzer: keyword`)
- [norms](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/norms.html) are disabled for the field, saving some disk space
- [doc_values](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/doc-values.html) are _en_abled.

The last one is most interesting. In short, doc_values take up a little
disk space but allow us to very efficiently perform aggregations. Pelias
doesn't generally perform aggregations today. However, after we begin
using a [single mapping type](#293), we will otherwise have no way for the [pelias dashboard](https://github.com/pelias/dashboard) or any of our own analysis scripts to provide document counts for different sources or layers. The dashboard currently uses an API to get the count of various mapping types, which won't be supported going forward.

While minor, we needed a solution to this, and the only other one is
fielddata, which is extremely expensive in terms of memory usage.

This PR disables doc_values for all fields except `source` and `layer`,
which gives us about a 4% disk space savings. Merely changing the literal
fields to use `not_analyzed` _increases_ disk usage by around 3%, so
this is roughly a 7% win!
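
A minimal sketch of the idea in the paragraph above, assuming a flat `properties` dictionary of Elasticsearch 2.4 field mappings: doc_values stay at their default for `source` and `layer` and are switched off on the other `not_analyzed` fields. The helper name and the sample field names are hypothetical, and nested objects such as `parent.*` are not handled here.

```python
def restrict_doc_values(properties, keep=("source", "layer")):
    """Return a copy of `properties` with doc_values disabled on every
    not_analyzed field whose name is not listed in `keep`."""
    result = {}
    for name, field in properties.items():
        field = dict(field)  # do not mutate the caller's mapping
        if field.get("index") == "not_analyzed" and name not in keep:
            field["doc_values"] = False
        result[name] = field
    return result


# Hypothetical literal fields, for illustration only.
literal = {"type": "string", "index": "not_analyzed"}
properties = {
    "source": dict(literal),
    "layer": dict(literal),
    "source_id": dict(literal),
}

for name, field in restrict_doc_values(properties).items():
    print(name, field)
```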

Summary
------

While not technically required for [Elasticsearch 5 support](pelias/pelias#461), this PR does bring us more in line with the best practices of ES5.

It also sets us up for [Elasticsearch 6](pelias/pelias#719) where the `string`
datatype we use now is completely removed.

Fixes #99

orangejulius added a commit to pelias/schema that referenced this issue Nov 2, 2018

feat(mapping): use "index": "not_analyzed" for literal fields
Fixes #99

orangejulius added a commit to pelias/schema that referenced this issue Nov 3, 2018

feat(mapping): use "index": "not_analyzed" for literal fields
Fixes #99

orangejulius added a commit to pelias/schema that referenced this issue Nov 3, 2018

feat(mapping): use "index": "not_analyzed" for literal fields
Fixes #99

orangejulius added a commit to pelias/documentation that referenced this issue May 7, 2019

Update Elasticsearch requirements
Makes ES5 the default, recommends against ES2.4, announces support for
ES6 coming soon.

Connects pelias/pelias#461
Connects pelias/pelias#719