Introduce 64-bit unsigned long field type #60050

Merged
merged 18 commits into from
Sep 23, 2020
Changes from 10 commits
dffd748
Introduce 64-bit unsigned long field type
mayya-sharipova Jul 14, 2020
7eb2d4a
Address feedback
mayya-sharipova Aug 14, 2020
612b7da
Merge remote-tracking branch 'upstream/master' into unsigned64bits_in…
mayya-sharipova Aug 18, 2020
ada3422
Modifications after master merge
mayya-sharipova Aug 18, 2020
7551cd6
Merge remote-tracking branch 'upstream/master' into unsigned64bits_in…
mayya-sharipova Aug 19, 2020
e903940
Rename methods
mayya-sharipova Aug 19, 2020
9e057c0
Address Jim's feedback
mayya-sharipova Sep 8, 2020
ab54a23
Include unsigned_long docs into numeric type
mayya-sharipova Sep 8, 2020
4de3bd0
Merge remote-tracking branch 'upstream/master' into unsigned64bits_in…
mayya-sharipova Sep 8, 2020
2b567c9
Convert UnsignedLongFieldMapper to parametrized
mayya-sharipova Sep 9, 2020
07470b5
Small edits in documentation
mayya-sharipova Sep 10, 2020
17912bc
Address Julie's comment on documentation
mayya-sharipova Sep 16, 2020
b2eef4c
Add check that unsigned_long field type can't be sorted with other types
mayya-sharipova Sep 16, 2020
b315a0f
Merge remote-tracking branch 'upstream/master' into unsigned64bits_in…
mayya-sharipova Sep 16, 2020
7652e66
Fix build and test failures
mayya-sharipova Sep 16, 2020
0508e70
Merge remote-tracking branch 'upstream/master' into unsigned64bits_in…
mayya-sharipova Sep 23, 2020
24cbe55
Rename the method for validating consistency of merge formats
mayya-sharipova Sep 23, 2020
cca6b30
Change unsigned_long mapper based on recent master changes
mayya-sharipova Sep 23, 2020
5 changes: 4 additions & 1 deletion docs/reference/mapping/types/numeric.asciidoc
Original file line number Diff line number Diff line change
@@ -15,6 +15,7 @@ The following numeric types are supported:
`float`:: A single-precision 32-bit IEEE 754 floating point number, restricted to finite values.
`half_float`:: A half-precision 16-bit IEEE 754 floating point number, restricted to finite values.
`scaled_float`:: A floating point number that is backed by a `long`, scaled by a fixed `double` scaling factor.
`unsigned_long`:: An unsigned 64-bit integer with a minimum value of 0 and a maximum value of +2^64^-1+.

Below is an example of configuring a mapping with numeric fields:

@@ -115,7 +116,7 @@ The following parameters are accepted by numeric types:
<<coerce,`coerce`>>::

Try to convert strings to numbers and truncate fractions for integers.
- Accepts `true` (default) and `false`.
+ Accepts `true` (default) and `false`. Not applicable for `unsigned_long`.

<<doc-values,`doc_values`>>::

@@ -164,3 +165,5 @@ The following parameters are accepted by numeric types:
sorting) will behave as if the document had a value of +2.3+. High values
of `scaling_factor` improve accuracy but also increase space requirements.
This parameter is required.

include::unsigned_long.asciidoc[]
115 changes: 115 additions & 0 deletions docs/reference/mapping/types/unsigned_long.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
[role="xpack"]
[testenv="basic"]

[[unsigned-long]]
=== Unsigned long data type
Unsigned long is a numeric field type that represents an unsigned 64-bit
integer with a minimum value of 0 and a maximum value of +2^64^-1+
(from 0 to 18446744073709551615 inclusive).

[source,console]
--------------------------------------------------
PUT my_index
{
"mappings": {
"properties": {
"my_counter": {
"type": "unsigned_long"
}
}
}
}
--------------------------------------------------

Unsigned long values can be indexed in numeric or string form,
representing integer values in the range [0, 18446744073709551615].
They can't have a decimal part.

[source,console]
--------------------------------
POST /my_index/_bulk?refresh
{"index":{"_id":1}}
{"my_counter": 0}
{"index":{"_id":2}}
{"my_counter": 9223372036854775808}
{"index":{"_id":3}}
{"my_counter": 18446744073709551614}
{"index":{"_id":4}}
{"my_counter": 18446744073709551615}
--------------------------------
//TEST[continued]
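
As a rough illustration of the accepted range (not part of this change), the check below treats a value as valid when it is a whole number between 0 and 2^64 - 1. The class `UnsignedLongRange` and its `isValid` method are hypothetical names introduced only for this sketch.

```java
import java.math.BigInteger;

/** Hypothetical helper: checks whether a value fits in an unsigned 64-bit integer. */
final class UnsignedLongRange {
    // 2^64 - 1, the largest value an unsigned_long field can hold
    static final BigInteger MAX = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    /** Returns true if the text is a whole number in [0, 2^64 - 1]. */
    static boolean isValid(String text) {
        try {
            BigInteger v = new BigInteger(text); // throws for "1.5" or "abc"
            return v.signum() >= 0 && v.compareTo(MAX) <= 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("18446744073709551615")); // true
        System.out.println(isValid("18446744073709551616")); // false
        System.out.println(isValid("-1"));                   // false
        System.out.println(isValid("1.5"));                  // false
    }
}
```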

Term queries accept any number in numeric or string form.

[source,console]
--------------------------------
GET /my_index/_search
{
"query": {
"term" : {
"my_counter" : 18446744073709551615
}
}
}
--------------------------------
//TEST[continued]

Range query terms can contain values with decimal parts.
In this case {es} converts them to integer values:
`gte` and `gt` terms are rounded up to the nearest integer (inclusive),
and `lt` and `lte` terms are rounded down to the nearest integer (inclusive).

It is recommended to pass ranges as strings to ensure they are parsed
without any loss of precision.

[source,console]
--------------------------------
GET /my_index/_search
{
"query": {
"range" : {
"my_counter" : {
"gte" : "9223372036854775808.5",
"lte" : "18446744073709551615"
}
}
}
}
--------------------------------
//TEST[continued]
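
The rounding of fractional range bounds described above can be sketched with `BigDecimal`. This is only an illustration under stated assumptions: `RangeBoundRounding`, `lowerBound` and `upperBound` are hypothetical names, and the sketch covers only bounds with a decimal part, not how exclusive integer bounds are treated. The last line of `main` hints at why passing bounds as strings is recommended: a value such as 2^63 + 0.5 cannot be represented exactly as a `double`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

/** Hypothetical helper illustrating the rounding of fractional range bounds. */
final class RangeBoundRounding {

    /** A fractional lower bound (gt/gte) is rounded up to the next integer. */
    static BigInteger lowerBound(String text) {
        return new BigDecimal(text).setScale(0, RoundingMode.CEILING).toBigIntegerExact();
    }

    /** A fractional upper bound (lt/lte) is rounded down to the previous integer. */
    static BigInteger upperBound(String text) {
        return new BigDecimal(text).setScale(0, RoundingMode.FLOOR).toBigIntegerExact();
    }

    public static void main(String[] args) {
        System.out.println(lowerBound("9223372036854775808.5")); // 9223372036854775809
        System.out.println(upperBound("18446744073709551615"));  // 18446744073709551615
        // Parsing as a double first drops the fraction: 2^63 + 0.5 rounds to 2^63
        System.out.println(new BigDecimal(Double.parseDouble("9223372036854775808.5")).toBigInteger());
    }
}
```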


For queries that sort on an `unsigned_long` field,
{es} returns, for each document, a sort value of type `Long`
if the document's value is within the range of long values,
or of type `BigInteger` if the value exceeds this range.

WARNING: Not all {es} clients can properly handle big integer values.

Review comment (Contributor): That's a bit scary to add as a warning ;). I would replace the `{es}` wording with something like "REST clients need to handle big integer values in JSON to support this field type correctly", or something along those lines?


[source,console]
--------------------------------
GET /my_index/_search
{
"query": {
"match_all" : {}
},
"sort" : {"my_counter" : "desc"}
}
--------------------------------
//TEST[continued]
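
On the consuming side, code that reads such sort values has to accept both types. A minimal sketch, assuming the value arrives as a plain Java `Object`; `SortValueReader` and `toBigInteger` are hypothetical names, not part of any client API:

```java
import java.math.BigInteger;

/** Hypothetical helper: normalizes a sort value that may be a Long or a BigInteger. */
final class SortValueReader {

    static BigInteger toBigInteger(Object sortValue) {
        if (sortValue instanceof BigInteger) {
            return (BigInteger) sortValue;            // value above Long.MAX_VALUE
        } else if (sortValue instanceof Long) {
            return BigInteger.valueOf((Long) sortValue); // value within the long range
        }
        throw new IllegalArgumentException("unexpected sort value: " + String.valueOf(sortValue));
    }

    public static void main(String[] args) {
        System.out.println(toBigInteger(Long.valueOf(42L)));
        System.out.println(toBigInteger(new BigInteger("18446744073709551615")));
    }
}
```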

Similarly to sort values, script values of an `unsigned_long` field
produce `BigInteger` or `Long` values. The same values, `BigInteger` or
`Long`, are used for `terms` aggregations.

Review comment (Contributor): I'd say "produce a Number that represents a long or a BigInteger"...

==== Queries with mixed numeric types

Search queries across several numeric types, one of which is `unsigned_long`, are
supported, except for queries with sort. Thus, a sort query across two indexes
where the same field name has an `unsigned_long` type in one index
and a `long` type in another doesn't produce correct results and must
be avoided. If such sorting is needed, script-based sorting
can be used instead.

Aggregations across several numeric types, one of which is `unsigned_long`, are
supported. In this case, values are converted to the `double` type.
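
One consequence of converting to `double` is worth keeping in mind: a `double` has a 53-bit significand, so distinct large `unsigned_long` values can map to the same `double`. The snippet below is a small illustration of that general floating-point fact, not of the aggregation code itself; `DoublePrecisionDemo` and `sameDouble` are hypothetical names.

```java
import java.math.BigInteger;

/** Hypothetical demo: distinct large integers can collapse to one double. */
final class DoublePrecisionDemo {

    /** Returns true if both integers convert to the same double value. */
    static boolean sameDouble(String a, String b) {
        return new BigInteger(a).doubleValue() == new BigInteger(b).doubleValue();
    }

    public static void main(String[] args) {
        // 2^64 - 1 and 2^64 - 2 both round to 2^64 as a double
        System.out.println(sameDouble("18446744073709551615", "18446744073709551614")); // true
        System.out.println(sameDouble("1", "2"));                                       // false
    }
}
```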
Original file line number Diff line number Diff line change
@@ -29,6 +29,7 @@
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.math.BigInteger;
import java.time.ZonedDateTime;
import java.util.BitSet;
import java.util.Collections;
@@ -734,6 +735,8 @@ public static double defTodoubleImplicit(final Object value) {
return (float)value;
} else if (value instanceof Double) {
return (double)value;
} else if (value instanceof BigInteger) {
return ((BigInteger)value).doubleValue();
} else {
throw new ClassCastException("cannot implicitly cast " +
"def [" + PainlessLookupUtility.typeToUnboxedType(value.getClass()).getCanonicalName() + "] to " +
@@ -866,7 +869,8 @@ public static double defTodoubleExplicit(final Object value) {
value instanceof Integer ||
value instanceof Long ||
value instanceof Float ||
- value instanceof Double
+ value instanceof Double ||
+ value instanceof BigInteger
) {
return ((Number)value).doubleValue();
} else {
@@ -1004,7 +1008,9 @@ public static Double defToDoubleImplicit(final Object value) {
} else if (value instanceof Float) {
return (double)(float)value;
} else if (value instanceof Double) {
- return (Double)value;
+ return (Double) value;
+ } else if (value instanceof BigInteger) {
+ return ((BigInteger)value).doubleValue();
} else {
throw new ClassCastException("cannot implicitly cast " +
"def [" + PainlessLookupUtility.typeToUnboxedType(value.getClass()).getCanonicalName() + "] to " +
@@ -1151,7 +1157,8 @@ public static Double defToDoubleExplicit(final Object value) {
value instanceof Integer ||
value instanceof Long ||
value instanceof Float ||
- value instanceof Double
+ value instanceof Double ||
+ value instanceof BigInteger
) {
return ((Number)value).doubleValue();
} else {
Original file line number Diff line number Diff line change
@@ -166,6 +166,7 @@ public void testdefTodoubleImplicit() {
assertEquals((double)0, exec("def d = Long.valueOf(0); double b = d; b"));
assertEquals((double)0, exec("def d = Float.valueOf(0); double b = d; b"));
assertEquals((double)0, exec("def d = Double.valueOf(0); double b = d; b"));
assertEquals((double)0, exec("def d = BigInteger.valueOf(0); double b = d; b"));
expectScriptThrows(ClassCastException.class, () -> exec("def d = new ArrayList(); double b = d;"));
}
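
The pattern in these cast hunks, accepting `BigInteger` next to the boxed numeric types and going through `Number.doubleValue()`, can be sketched in isolation as follows. `DefToDoubleSketch` is a hypothetical stand-in written for illustration; it is not the Painless class touched by the diff and it omits whatever other types the real methods handle.

```java
import java.math.BigInteger;
import java.util.ArrayList;

/** Hypothetical stand-in for a dynamic "any number to double" cast. */
final class DefToDoubleSketch {

    static double toDouble(Object value) {
        if (value instanceof Byte ||
            value instanceof Short ||
            value instanceof Integer ||
            value instanceof Long ||
            value instanceof Float ||
            value instanceof Double ||
            value instanceof BigInteger) {
            // every accepted type is a Number, so one conversion path is enough
            return ((Number) value).doubleValue();
        }
        String type = value == null ? "null" : value.getClass().getCanonicalName();
        throw new ClassCastException("cannot cast [" + type + "] to double");
    }

    public static void main(String[] args) {
        System.out.println(toDouble(BigInteger.valueOf(0))); // 0.0
        try {
            toDouble(new ArrayList<Object>());
        } catch (ClassCastException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```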

Original file line number Diff line number Diff line change
@@ -51,6 +51,7 @@
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.AccessDeniedException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.DirectoryNotEmptyException;
@@ -348,6 +349,11 @@ public Long readOptionalLong() throws IOException {
return null;
}

public BigInteger readBigInteger() throws IOException {
return new BigInteger(readString());
}


@Nullable
public Text readOptionalText() throws IOException {
int length = readInt();
@@ -760,6 +766,8 @@ public Object readGenericValue() throws IOException {
return readCollection(StreamInput::readGenericValue, LinkedHashSet::new, Collections.emptySet());
case 25:
return readCollection(StreamInput::readGenericValue, HashSet::new, Collections.emptySet());
case 26:
return readBigInteger();
default:
throw new IOException("Can't read unknown type [" + type + "]");
}
Original file line number Diff line number Diff line change
@@ -49,6 +49,7 @@
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;
import java.math.BigInteger;
import java.nio.file.AccessDeniedException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.DirectoryNotEmptyException;
@@ -826,6 +827,13 @@ public final void writeOptionalInstant(@Nullable Instant instant) throws IOExcep
o.writeByte((byte) 25);
}
o.writeCollection((Set<?>) v, StreamOutput::writeGenericValue);
}),
entry(
// TODO: improve serialization of BigInteger
BigInteger.class,
(o, v) -> {
o.writeByte((byte) 26);
o.writeString(v.toString());
}
));
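
The read and write hunks above pair a type tag (26) with the decimal string form of the `BigInteger`. A self-contained round trip of that idea is sketched below, using `DataOutputStream` and `DataInputStream` from the JDK as assumed stand-ins for the stream classes in the diff; `BigIntegerWireSketch` is a hypothetical name and the byte layout of `writeUTF` is not claimed to match `writeString`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;

/** Hypothetical round trip: type tag followed by the decimal string of the value. */
final class BigIntegerWireSketch {
    static final byte TYPE_BIG_INTEGER = 26; // tag taken from the hunks above

    static byte[] write(BigInteger value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(TYPE_BIG_INTEGER);
            out.writeUTF(value.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static BigInteger read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte type = in.readByte();
            if (type != TYPE_BIG_INTEGER) {
                throw new IOException("Can't read unknown type [" + type + "]");
            }
            return new BigInteger(in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BigInteger max = new BigInteger("18446744073709551615");
        System.out.println(read(write(max)).equals(max)); // true
    }
}
```

A string encoding is simple but not compact, which is presumably what the `TODO` in the diff refers to.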

Original file line number Diff line number Diff line change
@@ -95,6 +95,7 @@
import org.elasticsearch.index.fielddata.IndexFieldData;

import java.io.IOException;
import java.math.BigInteger;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Arrays;
@@ -366,6 +367,8 @@ public static FieldDoc readFieldDoc(StreamInput in) throws IOException {
cFields[j] = in.readBoolean();
} else if (type == 9) {
cFields[j] = in.readBytesRef();
} else if (type == 10) {
cFields[j] = new BigInteger(in.readString());
} else {
throw new IOException("Can't match type [" + type + "]");
}
@@ -510,6 +513,10 @@ public static void writeSortValue(StreamOutput out, Object field) throws IOExcep
} else if (type == BytesRef.class) {
out.writeByte((byte) 9);
out.writeBytesRef((BytesRef) field);
} else if (type == BigInteger.class) {
//TODO: improve serialization of BigInteger
out.writeByte((byte) 10);
out.writeString(field.toString());
} else {
throw new IOException("Can't handle sort field value of type [" + type + "]");
}
Original file line number Diff line number Diff line change
@@ -34,6 +34,7 @@
import org.elasticsearch.search.aggregations.bucket.geogrid.GeoTileUtils;

import java.io.IOException;
import java.math.BigInteger;
import java.net.InetAddress;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
@@ -48,6 +49,8 @@

/** A formatter for values as returned by the fielddata/doc-values APIs. */
public interface DocValueFormat extends NamedWriteable {
long MASK_2_63 = 0x8000000000000000L;
BigInteger BIGINTEGER_2_64_MINUS_ONE = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE); // 2^64 -1

/** Format a long value. This is used by terms and histogram aggregations
* to format keys for fields that use longs as a doc value representation
@@ -472,5 +475,66 @@ public boolean equals(Object o) {
public int hashCode() {
return Objects.hash(pattern);
}
- }
+ };

/**
* DocValues format for unsigned 64 bit long values,
* that are stored as shifted signed 64 bit long values.
*/
DocValueFormat UNSIGNED_LONG_SHIFTED = new DocValueFormat() {

@Override
public String getWriteableName() {
return "unsigned_long_shifted";
}

@Override
public void writeTo(StreamOutput out) {
}

@Override
public String toString() {
return "unsigned_long_shifted";
}

/**
* Formats the unsigned long to the shifted long format
*/
@Override
public long parseLong(String value, boolean roundUp, LongSupplier now) {
long parsedValue = Long.parseUnsignedLong(value);
// subtract 2^63 or 10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
// equivalent to flipping the first bit
return parsedValue ^ MASK_2_63;
}

/**
* Formats a raw docValue that is stored in the shifted long format to the unsigned long representation.
*/
@Override
public Object format(long value) {
// add 2^63 or 10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000,
// equivalent to flipping the first bit
long formattedValue = value ^ MASK_2_63;
if (formattedValue >= 0) {
return formattedValue;
} else {
return BigInteger.valueOf(formattedValue).and(BIGINTEGER_2_64_MINUS_ONE);
}
}

/**
* Double docValues of the unsigned_long field type are already in the formatted representation,
* so we don't need to do anything here
*/
@Override
public Double format(double value) {
return value;
}

@Override
public double parseDouble(String value, boolean roundUp, LongSupplier now) {
return Double.parseDouble(value);
}
};
}
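
The shifted representation used by `UNSIGNED_LONG_SHIFTED` can be tried out in isolation. The sketch below mirrors the XOR with 2^63 from the hunk above in a standalone class; `ShiftedUnsignedLong`, `encode` and `decode` are hypothetical names. Flipping the top bit maps the unsigned range [0, 2^64 - 1] onto the signed range [`Long.MIN_VALUE`, `Long.MAX_VALUE`] while preserving order, so the stored signed longs compare the same way the original unsigned values do.

```java
import java.math.BigInteger;

/** Hypothetical standalone version of the shifted unsigned-long encoding. */
final class ShiftedUnsignedLong {
    static final long MASK_2_63 = 0x8000000000000000L;
    static final BigInteger TWO_64_MINUS_ONE = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    /** Parses an unsigned decimal string and flips the top bit. */
    static long encode(String unsignedValue) {
        return Long.parseUnsignedLong(unsignedValue) ^ MASK_2_63;
    }

    /** Flips the top bit back; returns a Long when it fits, otherwise a BigInteger. */
    static Object decode(long shifted) {
        long value = shifted ^ MASK_2_63;
        if (value >= 0) {
            return Long.valueOf(value);
        }
        // negative as a signed long means >= 2^63 as an unsigned value
        return BigInteger.valueOf(value).and(TWO_64_MINUS_ONE);
    }

    public static void main(String[] args) {
        System.out.println(encode("0") == Long.MIN_VALUE);                    // true
        System.out.println(encode("18446744073709551615") == Long.MAX_VALUE); // true
        // signed order of the encoded values matches unsigned order of the inputs
        System.out.println(encode("9223372036854775807") < encode("9223372036854775808")); // true
        System.out.println(decode(Long.MAX_VALUE)); // 18446744073709551615
    }
}
```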
Original file line number Diff line number Diff line change
@@ -703,6 +703,7 @@ private void registerValueFormats() {
registerValueFormat(DocValueFormat.IP.getWriteableName(), in -> DocValueFormat.IP);
registerValueFormat(DocValueFormat.RAW.getWriteableName(), in -> DocValueFormat.RAW);
registerValueFormat(DocValueFormat.BINARY.getWriteableName(), in -> DocValueFormat.BINARY);
registerValueFormat(DocValueFormat.UNSIGNED_LONG_SHIFTED.getWriteableName(), in -> DocValueFormat.UNSIGNED_LONG_SHIFTED);
}

/**
Original file line number Diff line number Diff line change
@@ -56,10 +56,13 @@ public SearchSortValues(Object[] rawSortValues, DocValueFormat[] sortValueFormat
this.rawSortValues = rawSortValues;
this.formattedSortValues = Arrays.copyOf(rawSortValues, rawSortValues.length);
for (int i = 0; i < rawSortValues.length; ++i) {
//we currently format only BytesRef but we may want to change that in the future
Object sortValue = rawSortValues[i];
if (sortValue instanceof BytesRef) {
this.formattedSortValues[i] = sortValueFormats[i].format((BytesRef) sortValue);
} else if ((sortValue instanceof Long) && (sortValueFormats[i] == DocValueFormat.UNSIGNED_LONG_SHIFTED)) {
this.formattedSortValues[i] = sortValueFormats[i].format((Long) sortValue);
} else {
this.formattedSortValues[i] = sortValue;
}
}
}
Original file line number Diff line number Diff line change
@@ -146,7 +146,8 @@ protected Bucket[] createBucketsArray(int size) {
public InternalAggregation reduce(List<InternalAggregation> aggregations, ReduceContext reduceContext) {
boolean promoteToDouble = false;
for (InternalAggregation agg : aggregations) {
- if (agg instanceof LongTerms && ((LongTerms) agg).format == DocValueFormat.RAW) {
+ if (agg instanceof LongTerms &&
+ (((LongTerms) agg).format == DocValueFormat.RAW || ((LongTerms) agg).format == DocValueFormat.UNSIGNED_LONG_SHIFTED) ) {
/*
* this terms agg mixes longs and doubles, we must promote longs to doubles to make the internal aggs
* compatible