Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support unsigned long field type in painless #64361

Closed
stu-elastic opened this issue Oct 29, 2020 · 13 comments · Fixed by #76519
Closed

Support unsigned long field type in painless #64361

stu-elastic opened this issue Oct 29, 2020 · 13 comments · Fixed by #76519
Labels
>bug :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache Team:Core/Infra Meta label for core/infra team

Comments

@stu-elastic
Copy link
Contributor

The casting model needs to be overhauled to support the unsigned long field type, added in 60050, as this type can return a BigInteger.

Type promotion in the casting model works well for primitive types, String and def.

The def type is tricky to handle with boxed types and type promotion, the current model has limited support for boxed types.

As this is a major rework on a subtle subsystem, we should look toward interim support for unsigned long.

@jimczi @mayya-sharipova what do you think about having unsigned long field return doubles for ScriptDocValues?

Relates: #64347

@stu-elastic stu-elastic added >bug :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache labels Oct 29, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Scripting)

@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Oct 29, 2020
@jdconrad
Copy link
Contributor

There are two major issues with unsigned long fields in scripting as it stands.

  1. As pointed out here, our casting model does not have good support for Big* types. We discussed this in the weekly core-infra meeting, and it's something we do want to add to Painless, but it's a large change to the casting model.
  2. unsigned long can return two different types - long or BigInteger. This works because we happen to return def form doc access, but it makes any currently possible work around more difficult because this means a user has to do an instanceof check to see if the field returned BigInteger to get a useful value out of it to do operations on.

@mayya-sharipova
Copy link
Contributor

mayya-sharipova commented Oct 30, 2020

what do you think about having unsigned long field return doubles for ScriptDocValues?

It depends what precision a user expects. Conversion to double leads to loss of precision on last digits. For aggregations it doesn't much matter, as they already work on doubles, but for something like script_fields I think this loss of precision can be significant.

it's something we do want to add to Painless, but it's a large change to the casting model

I would suggest before this is ready, to leave things as they are and just to add a warning in the documentation for unsigned_long about implicit casts. PR #64422 already provides examples how unsigned_long should be used in scripts, we can enhance the documentation even further.

WDYT?

@jdconrad
Copy link
Contributor

jdconrad commented Oct 30, 2020

@mayya-sharipova I took a look at #64422 and the documentation is good. My two main concerns noted above remain, however.

Here's what I would propose instead:

  1. For any conversions to other types use doubleValue or longValue instead off of casting here. In Painless, when a method is called on a primitive type, we actually implicitly box the primitive type on behalf of the user. This means an unsigned long field can return either a BigInteger or a long and this method can be called on either one with the same result.

Now a script looks like double x = doc['my_ul_field'].value.doubleValue();

(I know this is how you've written the docs 👍)

  1. We can add bigIntegerValue and bigDecimalValue as augmentations of each of the boxed numeric types so if a user wants a BigInteger out of the unsigned long field, it's returned in a consistent way.

Now a script looks like BigInteger x = doc['my_ul_field'].value.bigIntegerValue();

  1. We remove the implicit/explicit casting for BigInteger from defTodoubleExplicit and defTodoubleImplicit since this is inconsistent with casting for all other types.

These changes allow for strong consistency in how unsigned long fields are accessed and remove any potential casting inconsistencies.

@rjernst
Copy link
Member

rjernst commented Oct 30, 2020

I have a different proposal, not better or worse, just another alternative. Unsigned longs fit into a normal long in Java, it is just how one interprets the bytes. Thanks to two's complement, basic operations like addition, subtraction and multiplication work. And for other operations, Java has intrinsic methods like Long.compareUnsigned. So instead of creating an object (BigInteger) or having complexity with sometimes returning an object or a primitive, we could return a long and tell the user to use the special intrinsics for certain operations.

@jdconrad
Copy link
Contributor

@rjernst proposal will have the best performance.

@mayya-sharipova
Copy link
Contributor

mayya-sharipova commented Nov 2, 2020

Current problems with unsigned_long in scripts:

  • incorrect implicit casting to Long when running some math operations.
    For example, a script doc['my_ul_field'].value+1 for a value 18446744073709551614 incorrectly returns 0 instead of the correct answer: BigInteger 18446744073709551615.

  • for proper math operations a need to use instanceof in script

  • poor performance when running script on all documents as a lot of BigInteger objects need to be created:

    • sort on _script (BigIntegers are created only to be converted to double)
    • script_score query (BigIntegers are created only to be converted to double)
    • scripted aggs

Solution options:

  1. Do nothing. Only update documentation with a warning about the implicit cast and examples of proper usage with instanceOf

  2. ScriptDocValues returns double

    • doc['my_ul_field'].value is double
    • what if a user needs a precise BigInteger? Can we add an extra function: getBigIntegerOrLong() or getBigInteger()?
  3. Disable implicit and explicit casts.

    • As it is currently, ScriptDocValues returns Long or BigInteger
    • A user needs to explicitly specify in script what type they need: doc['my_ul_field'].value.doubleValue(), doc['my_ul_field'].value.longValue(), doc['my_ul_field'].value.bigIntegerValue()
    • no need for instaceOf method, as these methods work both Long and BigInteger types.
  4. ScriptDocValues returns long in the unsigned_long format, where the first bit represents not a sign, but the most important digit.

    • It is user's job to interpret bytes properly
    • Inconsistencies between _script and field queries: sort with _script doc['my_ul_field'].value returns different results than sort on field; terms agg with script will returns different results from term agg on field
    • What about internal handling of this format? How we can get implicit doubleValue() from this format necessary for script_score and scripted sort?

@mayya-sharipova
Copy link
Contributor

mayya-sharipova commented Nov 3, 2020

We had a team discussion and decided we would like to return ScriptDocValues for unsigned_long always return the same type. This type could be:

  • BigInteger
    • pluses: no loss of precision
    • minuses: potentially creating a lot of BigInteger objects that could discarded immediately (for example for double conversation)
  • long in unsigned_long format with introduction of some helper functions (doubleValue(), bigIntegerValue())
    • pluses: performance
    • minuses:
      • inconsistency with other search requests (e.g. sort fields, docvalue fields)
      • potential difficulties handling this type for a user. A user would need to explicitly convert to doubleValue(), otherwise most scripts (sort, score, aggs) will return incorrect results.

Since we have not made yet a final decision how to handle unsigned_long in scripts, we have decided to disable using unsigned_long in scripts in 7.10.

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Nov 3, 2020
mayya-sharipova added a commit that referenced this issue Nov 3, 2020
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Nov 3, 2020
mayya-sharipova added a commit that referenced this issue Nov 3, 2020
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Nov 3, 2020
mayya-sharipova added a commit that referenced this issue Nov 3, 2020
@mayya-sharipova
Copy link
Contributor

mayya-sharipova commented Nov 5, 2020

I have an idea of a slightly different proposal:

  1. make value (getValue, get(index)) unsupported operation. Or alternatively it returns a long in unsigned_long format.
  2. introduce two operations: getDoubleValue() and getBigIntegerValue(). We can also support corresponding functions for multiple values : double[]: getDoubleValues() and BigInteger[]: getBigIntegerValues()

I can see several pluses for this proposal:

  • As value function is not supported, we don't expose our internal format by presenting long unsigned values to a user.
  • We don't create unnecessary BigInteger objects if a user needs double values
  • All documents will return consistent types (either BigInteger or Double)

One minus for this proposal:

  • We can't use the same script across long and unsigned_long types. But even with previous proposals we either could not do that or the script would need to use instanceOf

@jimczi @rjernst @jdconrad @stu-elastic What do you think of this proposal?

@rjernst rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020
@stu-elastic stu-elastic removed the needs:triage Requires assignment of a team area label label Dec 9, 2020
@bpintea
Copy link
Contributor

bpintea commented Jan 15, 2021

I wanted to briefly ask if we possibly see this added in 7.12, for scheduling the SQL support for it. Cheers!

@stu-elastic
Copy link
Contributor Author

Leaking the internal representation will be a really sharp edge for users. We'll block this work pending the universal api work's as{T}Field() type coercion. #61388

@mayya-sharipova
Copy link
Contributor

mayya-sharipova commented Jan 21, 2021

Just to add besides type coercions, we've made a decision that:

  • getValue itself will return a long that is representing unsigned long where all 64 bits are used for a value.

And as @stu-elastic said the exposure of unsigned_long in scripts will be postponed until we are ready for field centric API in scripts

@jonathanl-telenav
Copy link

I don't know if this would help to address this problem, but I might be tempted to add a new class UnsignedLong and hide all the representation details of two's complement etc... that would be fast and it wouldn't violate encapsulation. Btw, java.lang.Long does provide divideUnsigned(long, long).

stu-elastic added a commit to stu-elastic/elasticsearch that referenced this issue Aug 13, 2021
Exposes unsigned long via the fields API.

Unsigned longs default to java signed longs.  That means the upper range
appears negative.  Consumers should use `Long.compareUnsigned(long, long)`
`Long.divideUnsigned(long, long)` and `Long.remainderUnsigned(long, long)`
to correctly work with values known to be unsigned long.

Alternatively, users may treat the unsigned long type as `BigInteger` using
the field API, `field('ul').as(Field.BigInteger).getValue(BigInteger.ZERO)`.
```
field('ul').as(Field.BigInteger).getValue(BigInteger.valueOf(1000))
field('ul').getValue(1000L)
```

This change also implements the beginning of the converters for the fields
API.  The following conversions have been added:
```
ulong <-> BigInteger
long <-> BigInteger
double -> BigInteger
String (parsed as long or double) -> BigInteger
double -> long
String (parsed as long or double) -> long
Date (epoch milliseconds) -> long
Nano Date (epoch nanoseconds) -> long
boolean (1L for true, 0L for false) -> long
```

Fixes: elastic#64361
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache Team:Core/Infra Meta label for core/infra team
Projects
None yet
7 participants