Unsigned long 64bits (#62892)
Introduce 64-bit unsigned long field type

This field type supports:
- indexing of integer values in the range [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations, which convert long values to double
  and can therefore be imprecise for large values

Backport for #60050
Closes #32434
mayya-sharipova committed Sep 24, 2020
1 parent a43f29c commit 54064a1
Showing 33 changed files with 2,603 additions and 30 deletions.
5 changes: 4 additions & 1 deletion docs/reference/mapping/types/numeric.asciidoc
@@ -15,6 +15,7 @@ The following numeric types are supported:
`float`:: A single-precision 32-bit IEEE 754 floating point number, restricted to finite values.
`half_float`:: A half-precision 16-bit IEEE 754 floating point number, restricted to finite values.
`scaled_float`:: A floating point number that is backed by a `long`, scaled by a fixed `double` scaling factor.
`unsigned_long`:: An unsigned 64-bit integer with a minimum value of 0 and a maximum value of +2^64^-1+.

Below is an example of configuring a mapping with numeric fields:

@@ -115,7 +116,7 @@ The following parameters are accepted by numeric types:
<<coerce,`coerce`>>::

Try to convert strings to numbers and truncate fractions for integers.
Accepts `true` (default) and `false`.
Accepts `true` (default) and `false`. Not applicable for `unsigned_long`.

<<mapping-boost,`boost`>>::

@@ -169,3 +170,5 @@ The following parameters are accepted by numeric types:
sorting) will behave as if the document had a value of +2.3+. High values
of `scaling_factor` improve accuracy but also increase space requirements.
This parameter is required.

include::unsigned_long.asciidoc[]
116 changes: 116 additions & 0 deletions docs/reference/mapping/types/unsigned_long.asciidoc
@@ -0,0 +1,116 @@
[role="xpack"]
[testenv="basic"]

[[unsigned-long]]
=== Unsigned long data type
Unsigned long is a numeric field type that represents an unsigned 64-bit
integer with a minimum value of 0 and a maximum value of +2^64^-1+
(from 0 to 18446744073709551615 inclusive).

[source,console]
--------------------------------------------------
PUT my_index
{
"mappings": {
"properties": {
"my_counter": {
"type": "unsigned_long"
}
}
}
}
--------------------------------------------------

Unsigned long values can be indexed in numeric or string form
and must be integers in the range [0, 18446744073709551615].
They can't have a decimal part.

[source,console]
--------------------------------
POST /my_index/_bulk?refresh
{"index":{"_id":1}}
{"my_counter": 0}
{"index":{"_id":2}}
{"my_counter": 9223372036854775808}
{"index":{"_id":3}}
{"my_counter": 18446744073709551614}
{"index":{"_id":4}}
{"my_counter": 18446744073709551615}
--------------------------------
//TEST[continued]
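
To illustrate the accepted range on the client side, the following Java sketch validates a decimal string before it is sent for indexing. The class `UnsignedLongParseSketch` and its `parse` method are hypothetical names used only for this illustration; they are not part of {es}.

```java
import java.math.BigInteger;

final class UnsignedLongParseSketch {
    // Largest value an unsigned_long field accepts: 2^64 - 1.
    static final BigInteger MAX = new BigInteger("18446744073709551615");

    // Hypothetical helper: parses a decimal string and rejects values
    // outside [0, 2^64 - 1]. A string with a decimal part such as "1.5"
    // makes the BigInteger constructor throw NumberFormatException.
    static BigInteger parse(String text) {
        BigInteger value = new BigInteger(text);
        if (value.signum() < 0 || value.compareTo(MAX) > 0) {
            throw new IllegalArgumentException("out of range for unsigned_long: " + text);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("9223372036854775808"));
        System.out.println(parse("18446744073709551615"));
    }
}
```

Using `BigInteger` rather than `long` keeps values above `Long.MAX_VALUE` representable without relying on two's-complement tricks.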

Term queries accept any number in numeric or string form.

[source,console]
--------------------------------
GET /my_index/_search
{
"query": {
"term" : {
"my_counter" : 18446744073709551615
}
}
}
--------------------------------
//TEST[continued]

Range query terms can contain values with decimal parts.
In this case {es} converts them to integer values:
`gte` and `gt` terms are rounded up to the nearest integer, inclusive,
and `lt` and `lte` terms are rounded down to the nearest integer, inclusive.

It is recommended to pass range terms as strings to ensure they are parsed
without any loss of precision.

[source,console]
--------------------------------
GET /my_index/_search
{
"query": {
"range" : {
"my_counter" : {
"gte" : "9223372036854775808.5",
"lte" : "18446744073709551615"
}
}
}
}
--------------------------------
//TEST[continued]
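
The rounding rule for range terms with decimal parts can be sketched in Java with `BigDecimal`. This is only an illustration of the documented behavior, not the {es} implementation, and it does not cover how integer-valued `gt` or `lt` terms are handled. `RangeBoundSketch` and its methods are hypothetical names. The last line of `main` shows why a string is safer than a number that a client may parse as a `double`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

final class RangeBoundSketch {
    // Lower bounds (gt, gte) with a decimal part move up to the next integer.
    static BigInteger lowerBound(String text) {
        return new BigDecimal(text).setScale(0, RoundingMode.CEILING).toBigInteger();
    }

    // Upper bounds (lt, lte) with a decimal part move down to the previous integer.
    static BigInteger upperBound(String text) {
        return new BigDecimal(text).setScale(0, RoundingMode.FLOOR).toBigInteger();
    }

    public static void main(String[] args) {
        System.out.println(lowerBound("9223372036854775808.5")); // 9223372036854775809
        System.out.println(upperBound("18446744073709551615"));  // unchanged
        // The same bound written as a double literal has already lost its
        // decimal part: the nearest double is exactly 2^63.
        System.out.println(new BigDecimal(9223372036854775808.5).toBigInteger());
    }
}
```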


When a query sorts on an `unsigned_long` field,
{es} returns each document's sort value as a `long`
if the value is within the range of long values,
or as a `BigInteger` if the value exceeds this range.

NOTE: REST clients need to be able to handle big integer values
in JSON to support this field type correctly.

[source,console]
--------------------------------
GET /my_index/_search
{
"query": {
"match_all" : {}
},
"sort" : {"my_counter" : "desc"}
}
--------------------------------
//TEST[continued]
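
A client that consumes these sort values might normalize them along the following lines. This is a minimal sketch under the assumption that the client receives the value as a decimal string; `SortValueSketch` and `toSortValue` are hypothetical names, not part of any {es} client.

```java
import java.math.BigInteger;

final class SortValueSketch {
    // Returns a Long when the value fits in a signed 64-bit long,
    // otherwise keeps the full-precision BigInteger.
    static Number toSortValue(String decimal) {
        BigInteger value = new BigInteger(decimal);
        if (value.bitLength() <= 63) {
            return Long.valueOf(value.longValueExact());
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toSortValue("9223372036854775807").getClass().getSimpleName());
        System.out.println(toSortValue("18446744073709551615").getClass().getSimpleName());
    }
}
```

The first call yields a `Long` and the second a `BigInteger`, mirroring the two cases described for sort values.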

As with sort values, script values of an `unsigned_long` field
are returned as a `Number` that is either a `Long` or a `BigInteger`.
The same `Long` or `BigInteger` values are used for `terms` aggregations.

==== Queries with mixed numeric types

Searches with mixed numeric types, one of which is `unsigned_long`, are
supported, except for queries with sort. A sort query across two indices
where the same field name has an `unsigned_long` type in one index
and a `long` type in another doesn't produce correct results and must
be avoided. If this kind of sorting is needed, script-based sorting
can be used instead.

Aggregations across several numeric types, one of which is `unsigned_long`, are
supported. In this case, values are converted to the `double` type.
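
The conversion to `double` is where precision can be lost for large values. The following Java sketch, which uses the hypothetical class name `DoublePrecisionSketch`, shows two distinct `unsigned_long` values that become indistinguishable once converted.

```java
import java.math.BigInteger;

final class DoublePrecisionSketch {
    // Converts a decimal string to double via BigInteger, as a stand-in
    // for the long-to-double conversion applied in aggregations.
    static double asDouble(String decimal) {
        return new BigInteger(decimal).doubleValue();
    }

    public static void main(String[] args) {
        double a = asDouble("18446744073709551614");
        double b = asDouble("18446744073709551615");
        // A double has a 53-bit significand, so both values round to 2^64.
        System.out.println(a == b); // prints "true"
    }
}
```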
@@ -29,6 +29,7 @@
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.math.BigInteger;
import java.time.ZonedDateTime;
import java.util.BitSet;
import java.util.Collections;
@@ -734,6 +735,8 @@ public static double defTodoubleImplicit(final Object value) {
return (float)value;
} else if (value instanceof Double) {
return (double)value;
} else if (value instanceof BigInteger) {
return ((BigInteger)value).doubleValue();
} else {
throw new ClassCastException("cannot implicitly cast " +
"def [" + PainlessLookupUtility.typeToUnboxedType(value.getClass()).getCanonicalName() + "] to " +
@@ -866,7 +869,8 @@ public static double defTodoubleExplicit(final Object value) {
value instanceof Integer ||
value instanceof Long ||
value instanceof Float ||
value instanceof Double
value instanceof Double ||
value instanceof BigInteger
) {
return ((Number)value).doubleValue();
} else {
@@ -1004,7 +1008,9 @@ public static Double defToDoubleImplicit(final Object value) {
} else if (value instanceof Float) {
return (double)(float)value;
} else if (value instanceof Double) {
return (Double)value;
return (Double) value;
} else if (value instanceof BigInteger) {
return ((BigInteger)value).doubleValue();
} else {
throw new ClassCastException("cannot implicitly cast " +
"def [" + PainlessLookupUtility.typeToUnboxedType(value.getClass()).getCanonicalName() + "] to " +
@@ -1151,7 +1157,8 @@ public static Double defToDoubleExplicit(final Object value) {
value instanceof Integer ||
value instanceof Long ||
value instanceof Float ||
value instanceof Double
value instanceof Double ||
value instanceof BigInteger
) {
return ((Number)value).doubleValue();
} else {
@@ -166,6 +166,7 @@ public void testdefTodoubleImplicit() {
assertEquals((double)0, exec("def d = Long.valueOf(0); double b = d; b"));
assertEquals((double)0, exec("def d = Float.valueOf(0); double b = d; b"));
assertEquals((double)0, exec("def d = Double.valueOf(0); double b = d; b"));
assertEquals((double)0, exec("def d = BigInteger.valueOf(0); double b = d; b"));
expectScriptThrows(ClassCastException.class, () -> exec("def d = new ArrayList(); double b = d;"));
}

@@ -427,6 +427,7 @@ ReducedQueryPhase reducedQueryPhase(Collection<? extends SearchPhaseResult> quer
if (queryResults.isEmpty()) {
throw new IllegalStateException(errorMsg);
}
validateMergeSortValueFormats(queryResults);
final QuerySearchResult firstResult = queryResults.stream().findFirst().get().queryResult();
final boolean hasSuggest = firstResult.suggest() != null;
final boolean hasProfileResults = firstResult.hasProfileResults();
@@ -486,6 +487,36 @@ private static InternalAggregations reduceAggs(InternalAggregation.ReduceContext
performFinalReduce ? aggReduceContextBuilder.forFinalReduction() : aggReduceContextBuilder.forPartialReduction());
}

/**
* Checks that query results from all shards have consistent unsigned_long format.
* Sort queries on a field that has long type in one index, and unsigned_long in another index
* don't work correctly. Throw an error if this kind of sorting is detected.
* //TODO: instead of throwing error, find a way to sort long and unsigned_long together
*/
private static void validateMergeSortValueFormats(Collection<? extends SearchPhaseResult> queryResults) {
boolean[] ulFormats = null;
boolean firstResult = true;
for (SearchPhaseResult entry : queryResults) {
DocValueFormat[] formats = entry.queryResult().sortValueFormats();
if (formats == null) return;
if (firstResult) {
firstResult = false;
ulFormats = new boolean[formats.length];
for (int i = 0; i < formats.length; i++) {
ulFormats[i] = formats[i] == DocValueFormat.UNSIGNED_LONG_SHIFTED;
}
} else {
for (int i = 0; i < formats.length; i++) {
// if the format is unsigned_long in one shard, and something different in another shard
if (ulFormats[i] ^ (formats[i] == DocValueFormat.UNSIGNED_LONG_SHIFTED)) {
throw new IllegalArgumentException("Can't do sort across indices, as a field has [unsigned_long] type " +
"in one index, and different type in another index!");
}
}
}
}
}

/*
* Returns the size of the requested top documents (from + size)
*/
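
The consistency check in `validateMergeSortValueFormats` above can be tried in isolation. The sketch below is a simplified stand-in, not the committed code: it replaces `DocValueFormat` with plain booleans (`true` where a shard's sort format is `unsigned_long`) and assumes every shard reports the same number of sort fields. `SortFormatCheckSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

final class SortFormatCheckSketch {
    // Each boolean[] stands in for one shard's sort formats.
    // Throws if a sort field is unsigned_long on one shard but not on another.
    static void validate(List<boolean[]> shardFormats) {
        boolean[] first = null;
        for (boolean[] formats : shardFormats) {
            if (first == null) {
                first = formats; // remember the first shard as the reference
                continue;
            }
            for (int i = 0; i < formats.length; i++) {
                // XOR is true only when exactly one side is unsigned_long.
                if (first[i] ^ formats[i]) {
                    throw new IllegalArgumentException(
                        "sort field " + i + " is unsigned_long on one shard only");
                }
            }
        }
    }

    public static void main(String[] args) {
        validate(Arrays.asList(new boolean[] {true, false}, new boolean[] {true, false}));
        try {
            validate(Arrays.asList(new boolean[] {true}, new boolean[] {false}));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```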
@@ -51,6 +51,7 @@
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.AccessDeniedException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.DirectoryNotEmptyException;
@@ -329,6 +330,11 @@ public Long readOptionalLong() throws IOException {
return null;
}

public BigInteger readBigInteger() throws IOException {
return new BigInteger(readString());
}


@Nullable
public Text readOptionalText() throws IOException {
int length = readInt();
@@ -741,6 +747,8 @@ public Object readGenericValue() throws IOException {
return readCollection(StreamInput::readGenericValue, LinkedHashSet::new, Collections.emptySet());
case 25:
return readCollection(StreamInput::readGenericValue, HashSet::new, Collections.emptySet());
case 26:
return readBigInteger();
default:
throw new IOException("Can't read unknown type [" + type + "]");
}
@@ -51,6 +51,7 @@
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;
import java.math.BigInteger;
import java.nio.file.AccessDeniedException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.DirectoryNotEmptyException;
@@ -803,6 +804,11 @@ public final void writeOptionalInstant(@Nullable Instant instant) throws IOExcep
}
o.writeCollection((Set<?>) v, StreamOutput::writeGenericValue);
});
// TODO: improve serialization of BigInteger
writers.put(BigInteger.class, (o, v) -> {
o.writeByte((byte) 26);
o.writeString(v.toString());
});
WRITERS = Collections.unmodifiableMap(writers);
}
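
The string-based wire format used by the `BigInteger` writer above (a type byte followed by the decimal string) can be exercised on its own with JDK streams. The sketch below uses `DataOutputStream.writeUTF` as a stand-in for `StreamOutput.writeString`, which is an assumption made for illustration, so the byte layout differs from the real one. `BigIntegerWireSketch` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;

final class BigIntegerWireSketch {
    // Same type tag the commit assigns to BigInteger generic values.
    static final byte TYPE_BIG_INTEGER = 26;

    // Writes the type tag, then the value as a decimal string.
    static byte[] write(BigInteger value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(TYPE_BIG_INTEGER);
            out.writeUTF(value.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the type tag back and rebuilds the BigInteger from its string form.
    static BigInteger read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte type = in.readByte();
            if (type != TYPE_BIG_INTEGER) {
                throw new IOException("Can't read unknown type [" + type + "]");
            }
            return new BigInteger(in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BigInteger max = new BigInteger("18446744073709551615");
        System.out.println(read(write(max)).equals(max));
    }
}
```

Serializing through the decimal string is simple and lossless, at the cost of more bytes than a binary encoding, which is presumably what the `TODO` in the committed code refers to.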

@@ -95,6 +95,7 @@
import org.elasticsearch.index.fielddata.IndexFieldData;

import java.io.IOException;
import java.math.BigInteger;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Arrays;
@@ -369,6 +370,8 @@ public static FieldDoc readFieldDoc(StreamInput in) throws IOException {
cFields[j] = in.readBoolean();
} else if (type == 9) {
cFields[j] = in.readBytesRef();
} else if (type == 10) {
cFields[j] = new BigInteger(in.readString());
} else {
throw new IOException("Can't match type [" + type + "]");
}
@@ -398,6 +401,8 @@ public static Comparable readSortValue(StreamInput in) throws IOException {
return in.readBoolean();
} else if (type == 9) {
return in.readBytesRef();
} else if (type == 10) {
return new BigInteger(in.readString());
} else {
throw new IOException("Can't match type [" + type + "]");
}
@@ -517,6 +522,10 @@ public static void writeSortValue(StreamOutput out, Object field) throws IOExcep
} else if (type == BytesRef.class) {
out.writeByte((byte) 9);
out.writeBytesRef((BytesRef) field);
} else if (type == BigInteger.class) {
//TODO: improve serialization of BigInteger
out.writeByte((byte) 10);
out.writeString(field.toString());
} else {
throw new IOException("Can't handle sort field value of type [" + type + "]");
}