Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce 64-bit unsigned long field type #60050

Merged
merged 18 commits into from
Sep 23, 2020

Conversation

mayya-sharipova
Copy link
Contributor

@mayya-sharipova mayya-sharipova commented Jul 22, 2020

This field type supports

  • indexing of integer values from [0, 18446744073709551615]
  • precise queries (term, range)
  • precise sorting and terms aggs.
  • other aggs are based on double values.

Sort values, term aggs keys, script values, doc values fields,
source fields return long (for values <=2^63-1)
and BigInteger (for values > 2^63-1).

Closes #32434

@mayya-sharipova mayya-sharipova added >feature :Search Foundations/Mapping Index mappings, including merging and defining field types v8.0.0 v7.10.0 labels Jul 22, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jul 22, 2020
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorting and aggregations is based on conversion of long values
to double and can be imprecise for large values.

I did not check the pr yet but I wonder why sorting cannot be done on the unsigned value, We can use Long#compareUnsigned in a custom comparator ?

@mayya-sharipova
Copy link
Contributor Author

@jimczi Thanks, Jim, I will look into this.

@mayya-sharipova mayya-sharipova force-pushed the unsigned64bits_integer branch 2 times, most recently from 8861d6d to c276ffd Compare July 22, 2020 20:14
This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- sorting and aggregations is based on conversion of long values
  to double and can be imprecise for large values.

Closes elastic#32434
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great start. I agree with Jim's comment that it would be great if sorting was accurate, like ranges.

@mayya-sharipova
Copy link
Contributor Author

@jimczi @jpountz Thanks for the initial review. I have addressed your comments, please continue to review when you have time.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks great @mayya-sharipova! I left some comments but nothing major.

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed some small points while getting up to speed with the change !

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Oct 16, 2020
Collapse was not working on unsigned_long field,
as collapsing was enabled only on KeywordFieldType and NumberFieldType.

This introduces a new method `collapseType` to MappedFieldType,
that is checked to decide if collapsing should be enabled.

Relates to elastic#60050
mayya-sharipova added a commit that referenced this pull request Oct 16, 2020
Collapse was not working on unsigned_long field,
as collapsing was enabled only on KeywordFieldType and NumberFieldType.

This introduces a new method `collapseType` to MappedFieldType,
that is checked to decide if collapsing should be enabled.

Relates to #60050
mayya-sharipova added a commit that referenced this pull request Oct 16, 2020
UnsignedLongTests for the range agg was  using very specific intervals
that double type can not distinguish due to lack of precision:

9.223372036854776000E18 == 9.223372036854775807E18 returns true

If we add the corresponding range query test, it will return different
number of hits than the range agg, as range query unlike range agg
doesn't convert valued to double type, and hence more precise.

This patch make broader ranges for the range agg test (so values
converted to doubles don't loose precision), and hence corresponding
range query will return the same number of hits.

Relates to #60050
mayya-sharipova added a commit that referenced this pull request Oct 16, 2020
UnsignedLongTests for the range agg was  using very specific intervals
that double type can not distinguish due to lack of precision:

9.223372036854776000E18 == 9.223372036854775807E18 returns true

If we add the corresponding range query test, it will return different
number of hits than the range agg, as range query unlike range agg
doesn't convert valued to double type, and hence more precise.

This patch make broader ranges for the range agg test (so values
converted to doubles don't loose precision), and hence corresponding
range query will return the same number of hits.

Relates to #60050
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Oct 19, 2020
Max and min aggs were producing wrong results for unsigned_long field
if field was indexed. If field is indexed for max/min aggs instead of
field data, we use values from indexed Points, values of which
are derived using method pointReaderIfPossible. Before
UnsignedLongFieldType#pointReaderIfPossible was incorrectly
producing values, as it failed to shift them back to original
values.

This patch fixes method pointReaderIfPossible to produce
correct original values.

Relates to elastic#60050
mayya-sharipova added a commit that referenced this pull request Oct 19, 2020
Max and min aggs were producing wrong results for unsigned_long field
if field was indexed. If field is indexed for max/min aggs instead of
field data, we use values from indexed Points, values of which
are derived using method pointReaderIfPossible. Before
UnsignedLongFieldType#pointReaderIfPossible was incorrectly
producing values, as it failed to shift them back to original
values.

This patch fixes method pointReaderIfPossible to produce
correct original values.

Relates to #60050
mayya-sharipova added a commit that referenced this pull request Oct 19, 2020
Max and min aggs were producing wrong results for unsigned_long field
if field was indexed. If field is indexed for max/min aggs instead of
field data, we use values from indexed Points, values of which
are derived using method pointReaderIfPossible. Before
UnsignedLongFieldType#pointReaderIfPossible was incorrectly
producing values, as it failed to shift them back to original
values.

This patch fixes method pointReaderIfPossible to produce
correct original values.

Relates to #60050
mayya-sharipova added a commit that referenced this pull request Oct 19, 2020
Max and min aggs were producing wrong results for unsigned_long field
if field was indexed. If field is indexed for max/min aggs instead of
field data, we use values from indexed Points, values of which
are derived using method pointReaderIfPossible. Before
UnsignedLongFieldType#pointReaderIfPossible was incorrectly
producing values, as it failed to shift them back to original
values.

This patch fixes method pointReaderIfPossible to produce
correct original values.

Relates to #60050
pugnascotia pushed a commit to pugnascotia/elasticsearch that referenced this pull request Oct 21, 2020
Max and min aggs were producing wrong results for unsigned_long field
if field was indexed. If field is indexed for max/min aggs instead of
field data, we use values from indexed Points, values of which
are derived using method pointReaderIfPossible. Before
UnsignedLongFieldType#pointReaderIfPossible was incorrectly
producing values, as it failed to shift them back to original
values.

This patch fixes method pointReaderIfPossible to produce
correct original values.

Relates to elastic#60050
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Oct 22, 2020
Adds support for the unsigned_long type to data frame analytics.

This type is handled in the same way as the long type.  Values
sent to the ML native processes are converted to floats and
hence will lose accuracy when outside the range where a float
can uniquely represent long values.

Relates elastic#60050
droberts195 added a commit that referenced this pull request Oct 22, 2020
Adds support for the unsigned_long type to data frame analytics.

This type is handled in the same way as the long type.  Values
sent to the ML native processes are converted to floats and
hence will lose accuracy when outside the range where a float
can uniquely represent long values.

Relates #60050
bpintea added a commit that referenced this pull request Feb 3, 2022
This introduces the UNSIGNED_LONG type to QL following its availability in ES (#60050).

The type is mapped to a BigInteger whose value is checked against the UL bounds.

The SQL will now support the type as literal and in the arithmetic functions; the non-arithmetic functions however are unchanged (i.e. they still require a long / int parameter where that is the case).

The type is version-gated: for the driver SQL clients (only) the server checks their version and in case this is lower than the one introducing the UL support, it fails the request, for queries, or simply hidden in catalog functions (similar to how UNSUPPORTED is currently treated in the similar case)

The JDBC tests are adjusted to read the (bwc) version of the driver they are run against and selectively disable part of the tests accordingly.

Closes #63312
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v7.10.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for 64-bit unsigned integers
6 participants