Compress geo-point field data #4386

Closed
jpountz opened this Issue Dec 9, 2013 · 3 comments

Contributor

jpountz commented Dec 9, 2013

Today we use doubles to encode latitudes and longitudes when loading geo-point field data into memory. This costs 16 bytes per geo point (two 8-byte doubles).

However, we could take advantage of the fact that latitudes and longitudes lie in fixed ranges and trade some precision for memory. In particular, I've been thinking about a fixed-length encoding whose precision could be configured in the mappings:

PUT /test
{
    "mappings": {
        "test": {
            "properties": {
                "pin": {
                    "type": "geo_point",
                    "fielddata": {
                      "format": "compressed",
                      "precision": "1cm"
                   }
                }
            }
        }
    }
}

Here is the number of bytes needed per geo point for several precision levels:

| Precision | Bytes per point | Size reduction |
|-----------|-----------------|----------------|
| 1km       | 4               | 75%            |
| 3m        | 6               | 62.5%          |
| 1cm       | 8               | 50%            |
| 1mm       | 10              | 37.5%          |
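
These figures are consistent with byte-aligning `ceil(log2(circumference / precision))` bits per coordinate. Here is a minimal sketch of that arithmetic, assuming the Earth's equatorial circumference as the worst case; the class and method names are mine for illustration, not the actual implementation:

```java
// Derive the bytes-per-point figures in the table above.
// Assumption: a coordinate needs enough byte-aligned bits to distinguish
// cells of the requested size along the equator, where longitude cells
// are physically widest.
public class GeoPointSizing {

    private static final double EARTH_CIRCUMFERENCE_METERS = 40_075_017.0;

    static int bytesPerGeoPoint(double precisionMeters) {
        double cells = EARTH_CIRCUMFERENCE_METERS / precisionMeters;
        int bitsPerCoordinate = (int) Math.ceil(Math.log(cells) / Math.log(2));
        int bytesPerCoordinate = (bitsPerCoordinate + 7) / 8; // round up to whole bytes
        return 2 * bytesPerCoordinate;                        // latitude + longitude
    }

    public static void main(String[] args) {
        System.out.println(bytesPerGeoPoint(1000));  // 1km -> 4
        System.out.println(bytesPerGeoPoint(3));     // 3m  -> 6
        System.out.println(bytesPerGeoPoint(0.01));  // 1cm -> 8
        System.out.println(bytesPerGeoPoint(0.001)); // 1mm -> 10
    }
}
```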

I plan to use 1cm as the default. I think this is a good choice since it would be accurate enough for most use cases and would require 4 bytes per latitude and longitude, which can be stored efficiently in an int[] array, for best speed.
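
For illustration, a minimal sketch of what such a fixed-length encoding could look like at 32 bits per coordinate, i.e. the proposed 1cm default; the scale factor and names are my assumptions, not the actual implementation:

```java
// Quantize a coordinate onto a uniform grid of 2^32 cells over 360 degrees.
// A cell is then ~9.3mm wide at the equator, so decoding to the cell midpoint
// keeps the worst-case error under 5mm, within the 1cm target.
public class GeoPointCodec {

    private static final double SCALE = (1L << 32) / 360.0; // cells per degree

    static int encode(double degrees) {       // degrees in [-180, 180)
        return (int) Math.floor(degrees * SCALE);
    }

    static double decode(int encoded) {
        return (encoded + 0.5) / SCALE;       // midpoint of the quantization cell
    }

    public static void main(String[] args) {
        double lon = 2.2945; // hypothetical longitude
        System.out.println(decode(encode(lon))); // prints ~2.2945, off by <5mm
    }
}
```

Longitude cells physically shrink toward the poles, so the equator is the worst case; latitude only uses half the encoded range.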

The same encoding could be used to implement doc values support (#4207).

For now, the default format is going to remain exact and based on two double[] arrays, so you need to explicitly opt in to this format by configuring the field data format in the mappings.

Contributor

s1monw commented Dec 10, 2013

I really like it :) I just wonder what the perf hit is here. In general I am not too worried about geo perf hits, since the most common use case is calculating distances, which dominates anyway and should only be done on the top N anyway. Maybe we should still allow the use of float or double arrays?

Contributor

jpountz commented Dec 10, 2013

David had concerns about the potential performance hit too, so I'm thinking about keeping the old format around so that users can switch back to it if they want to. This is easy to do; I'll just try to share the code for uninverting the data, since the logic is essentially the same.

Contributor

s1monw commented Dec 10, 2013

+1 I guess that makes it a no-brainer! Good stuff man, I really like it.

jpountz added a commit to jpountz/elasticsearch that referenced this issue Dec 12, 2013

Compressed geo-point field data.
This commit makes it possible to trade precision for memory when storing geo points. The new field data implementation accepts a `precision` parameter that controls the maximum expected error when storing coordinates. This option can be updated on a live index with the PUT mapping API.

Default precision is 1cm, which requires 8 bytes per geo point (a 50% memory saving compared to using two doubles).

Close #4386
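
As the commit message notes, the precision can be updated on a live index through the PUT mapping API. A hypothetical request in the same style as the example above (index and type names are assumed, mirroring the earlier mapping):

```
PUT /test/test/_mapping
{
    "test": {
        "properties": {
            "pin": {
                "type": "geo_point",
                "fielddata": {
                    "format": "compressed",
                    "precision": "3m"
                }
            }
        }
    }
}
```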

@jpountz jpountz closed this in 33599d9 Dec 17, 2013

jpountz added a commit that referenced this issue Dec 18, 2013

Compressed geo-point field data.

brusic added a commit to brusic/elasticsearch that referenced this issue Jan 19, 2014

Compressed geo-point field data.

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Compressed geo-point field data.