Introduce 64-bit unsigned long field type
This field type supports:
- indexing of integer values in the range [0, 18446744073709551615]
- precise queries (term, range)
- sorting and aggregations, which are based on converting long values
  to double and can be imprecise for large values.

Closes #32434
mayya-sharipova committed Jul 22, 2020
1 parent fc94423 commit dffd748
Showing 21 changed files with 2,009 additions and 3 deletions.
4 changes: 3 additions & 1 deletion docs/reference/mapping/types.asciidoc
Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@ document:
=== Core data types

string:: <<text,`text`>>, <<keyword,`keyword`>> and <<wildcard,`wildcard`>>
<<number>>:: `long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`
<<number>>:: `long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`, <<unsigned-long,`unsigned_long`>>
<<date>>:: `date`
<<date_nanos>>:: `date_nanos`
<<boolean>>:: `boolean`
@@ -136,3 +136,5 @@ include::types/shape.asciidoc[]
include::types/constant-keyword.asciidoc[]

include::types/wildcard.asciidoc[]

include::types/unsigned_long.asciidoc[]
3 changes: 2 additions & 1 deletion docs/reference/mapping/types/numeric.asciidoc
@@ -15,6 +15,7 @@ The following numeric types are supported:
`float`:: A single-precision 32-bit IEEE 754 floating point number, restricted to finite values.
`half_float`:: A half-precision 16-bit IEEE 754 floating point number, restricted to finite values.
`scaled_float`:: A floating point number that is backed by a `long`, scaled by a fixed `double` scaling factor.
`unsigned_long`:: An <<unsigned-long,`unsigned 64-bit integer`>> with a minimum value of 0 and a maximum value of +2^64^-1+.

Below is an example of configuring a mapping with numeric fields:

@@ -115,7 +116,7 @@ The following parameters are accepted by numeric types:
<<coerce,`coerce`>>::

Try to convert strings to numbers and truncate fractions for integers.
Accepts `true` (default) and `false`.
Accepts `true` (default) and `false`. Not applicable for `unsigned_long`.

<<doc-values,`doc_values`>>::

147 changes: 147 additions & 0 deletions docs/reference/mapping/types/unsigned_long.asciidoc
@@ -0,0 +1,147 @@
[role="xpack"]
[testenv="basic"]

[[unsigned-long]]
=== Unsigned long data type
++++
<titleabbrev>Unsigned long</titleabbrev>
++++

Unsigned long is a numeric field type that represents an unsigned 64-bit
integer with a minimum value of 0 and a maximum value of +2^64^-1+
(from 0 to 18446744073709551615).

At index-time, an indexed value is converted to the signed long range
[-9223372036854775808, 9223372036854775807] by subtracting +2^63^+ from it
and is stored as a signed long taking 8 bytes.
At query-time, the same conversion is applied to query terms.
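
The shift by +2^63^+ described above can be sketched in plain Java. This is a minimal illustration, not the code from this commit: the class `UnsignedLongShift` and its method names are hypothetical, and `BigInteger` is used only to keep the arithmetic explicit.

```java
import java.math.BigInteger;

// Hypothetical helper (not part of this commit): illustrates the 2^63 shift.
final class UnsignedLongShift {
    private static final BigInteger TWO_POW_63 = BigInteger.ONE.shiftLeft(63);
    private static final BigInteger MAX_UNSIGNED =
        BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE); // 2^64 - 1

    /** Maps an unsigned value in [0, 2^64-1] to a signed long by subtracting 2^63. */
    static long toSortableSigned(BigInteger unsigned) {
        if (unsigned.signum() < 0 || unsigned.compareTo(MAX_UNSIGNED) > 0) {
            throw new IllegalArgumentException("Value [" + unsigned + "] is out of range for unsigned long.");
        }
        return unsigned.subtract(TWO_POW_63).longValueExact();
    }

    /** Inverse mapping: adds 2^63 back to recover the unsigned value. */
    static BigInteger toUnsigned(long sortable) {
        return BigInteger.valueOf(sortable).add(TWO_POW_63);
    }

    public static void main(String[] args) {
        System.out.println(toSortableSigned(BigInteger.ZERO) == Long.MIN_VALUE); // true
        System.out.println(toSortableSigned(MAX_UNSIGNED) == Long.MAX_VALUE);    // true
        System.out.println(toUnsigned(0L));                                      // 9223372036854775808
    }
}
```

Because the same constant is subtracted from every value, the ordering of the unsigned values is preserved in the signed representation, which is what makes term and range queries precise.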

[source,console]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "my_counter": {
        "type": "unsigned_long"
      }
    }
  }
}
--------------------------------------------------

Unsigned long values can be indexed in numeric or string form
and must be integers in the range [0, 18446744073709551615].
They can't have a decimal part.

[source,console]
--------------------------------
POST /my_index/_bulk?refresh
{"index":{"_id":1}}
{"my_counter": 0}
{"index":{"_id":2}}
{"my_counter": 9223372036854775808}
{"index":{"_id":3}}
{"my_counter": 18446744073709551614}
{"index":{"_id":4}}
{"my_counter": 18446744073709551615}
--------------------------------
//TEST[continued]

Term queries accept any number in numeric or string form.

[source,console]
--------------------------------
GET /my_index/_search
{
  "query": {
    "term" : {
      "my_counter" : 18446744073709551615
    }
  }
}
--------------------------------
//TEST[continued]

Range query bounds can have decimal parts.
It is recommended to pass bounds as strings to ensure they are parsed
without any loss of precision.

[source,console]
--------------------------------
GET /my_index/_search
{
  "query": {
    "range" : {
      "my_counter" : {
        "gte" : "9223372036854775808.5",
        "lte" : "18446744073709551615"
      }
    }
  }
}
--------------------------------
//TEST[continued]
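
To see why string bounds are recommended, the following Java sketch compares reading the bound from the example above as a `double` with reading it exactly from its string form. It assumes that an unquoted JSON number with a decimal part may be read as a double somewhere along the way; the class `RangeBoundPrecision` and its methods are hypothetical and are not part of this commit.

```java
import java.math.BigDecimal;

// Hypothetical demo (not part of this commit): precision of a decimal range bound.
final class RangeBoundPrecision {
    /** Reads the bound through a double, as a lossy numeric parse would. */
    static BigDecimal viaDouble(String bound) {
        return new BigDecimal(Double.parseDouble(bound));
    }

    /** Reads the bound exactly from its string form. */
    static BigDecimal viaString(String bound) {
        return new BigDecimal(bound);
    }

    public static void main(String[] args) {
        String bound = "9223372036854775808.5";
        System.out.println(viaString(bound).toPlainString()); // 9223372036854775808.5
        System.out.println(viaDouble(bound).toPlainString()); // 9223372036854775808
    }
}
```

The double nearest to the bound is exactly 2^63, so the fractional part is lost once the value passes through a double.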

WARNING: Unlike term and range queries, sorting and aggregations on
`unsigned_long` data may return imprecise results. Sorting and aggregations
use a double representation of unsigned longs, which means that long values
are first converted to double values. During this conversion,
long values greater than +2^53^+ may lose precision in their least
significant digits. Long values up to +2^53^+
are converted accurately.

[source,console]
--------------------------------
GET /my_index/_search
{
  "query": {
    "match_all" : {}
  },
  "sort" : {"my_counter" : "desc"} <1>
}
--------------------------------
//TEST[continued]
<1> Both document values, "18446744073709551614" and "18446744073709551615",
are converted to the same double value, "1.8446744073709552E19", so this
descending sort may return imprecise results: the document with the lower
value of "18446744073709551614" may come before the document
with the higher value of "18446744073709551615".
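
The collision described in the callout can be reproduced with a few lines of Java. This is only a sketch of the double conversion itself, using the hypothetical class `UnsignedLongAsDouble`; it does not show how sorting is implemented in this commit.

```java
import java.math.BigInteger;

// Hypothetical demo (not part of this commit): precision loss when converting to double.
final class UnsignedLongAsDouble {
    /** Converts the decimal string of an unsigned long to the nearest double. */
    static double asDouble(String unsigned) {
        return new BigInteger(unsigned).doubleValue();
    }

    public static void main(String[] args) {
        double a = asDouble("18446744073709551614");
        double b = asDouble("18446744073709551615");
        System.out.println(a);      // 1.8446744073709552E19
        System.out.println(a == b); // true: both round to 2^64

        // Values up to 2^53 stay distinct: 2^53 - 1 versus 2^53.
        System.out.println(asDouble("9007199254740991") == asDouble("9007199254740992")); // false
    }
}
```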

[[unsigned-long-params]]
==== Parameters for unsigned long fields

The following parameters are accepted:

[horizontal]

<<doc-values,`doc_values`>>::

Should the field be stored on disk in a column-stride fashion, so that it
can later be used for sorting, aggregations, or scripting? Accepts `true`
(default) or `false`.

<<ignore-malformed,`ignore_malformed`>>::

If `true`, malformed numbers are ignored. If `false` (default), malformed
numbers throw an exception and reject the whole document.

<<mapping-index,`index`>>::

Should the field be searchable? Accepts `true` (default) and `false`.

<<null-value,`null_value`>>::

Accepts a numeric value of the same `type` as the field, which is
substituted for any explicit `null` values. Defaults to `null`, which
means the field is treated as missing.

<<mapping-store,`store`>>::

Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default).

<<mapping-field-meta,`meta`>>::

Metadata about the field.
Original file line number Diff line number Diff line change
@@ -206,6 +206,8 @@ <T> Map<String, T> map(

    long longValue() throws IOException;

    long unsignedLongValue() throws IOException;

    float floatValue() throws IOException;

    double doubleValue() throws IOException;
Original file line number Diff line number Diff line change
@@ -224,6 +224,11 @@ public long longValue() throws IOException {
        return parser.longValue();
    }

    @Override
    public long unsignedLongValue() throws IOException {
        return parser.unsignedLongValue();
    }

    @Override
    public float floatValue() throws IOException {
        return parser.floatValue();
Original file line number Diff line number Diff line change
@@ -30,6 +30,7 @@
import org.elasticsearch.core.internal.io.IOUtils;

import java.io.IOException;
import java.math.BigInteger;
import java.nio.CharBuffer;

public class JsonXContentParser extends AbstractXContentParser {
@@ -166,6 +167,26 @@ public long doLongValue() throws IOException {
        return parser.getLongValue();
    }

    @Override
    protected long doUnsignedLongValue() throws IOException {
        JsonParser.NumberType numberType = parser.getNumberType();
        if ((numberType == JsonParser.NumberType.INT) || (numberType == JsonParser.NumberType.LONG)) {
            long longValue = parser.getLongValue();
            if (longValue < 0) {
                throw new IllegalArgumentException("Value [" + longValue + "] is out of range for unsigned long.");
            }
            return longValue;
        } else if (numberType == JsonParser.NumberType.BIG_INTEGER) {
            BigInteger bigIntegerValue = parser.getBigIntegerValue();
            if (bigIntegerValue.compareTo(BIGINTEGER_MAX_UNSIGNED_LONG_VALUE) > 0 || bigIntegerValue.compareTo(BigInteger.ZERO) < 0) {
                throw new IllegalArgumentException("Value [" + bigIntegerValue + "] is out of range for unsigned long");
            }
            return bigIntegerValue.longValue();
        } else { // for all other value types including numbers with decimal parts
            throw new IllegalArgumentException("For input string: [" + parser.getValueAsString() + "].");
        }
    }

    @Override
    public float doFloatValue() throws IOException {
        return parser.getFloatValue();
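
The `BIG_INTEGER` branch above returns `bigIntegerValue.longValue()`, which keeps only the low-order 64 bits, so values greater than `Long.MAX_VALUE` come back as negative longs. The sketch below isolates that behaviour outside the parser; the class `BigIntegerToUnsignedLong` and its method are hypothetical names chosen for illustration.

```java
import java.math.BigInteger;

// Hypothetical demo (not part of this commit): range check plus 64-bit wrap-around.
final class BigIntegerToUnsignedLong {
    static final BigInteger MAX_UNSIGNED =
        BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE); // 2^64 - 1

    /** Returns the 64-bit pattern of a value in [0, 2^64-1], or throws if out of range. */
    static long toUnsignedLongBits(BigInteger value) {
        if (value.signum() < 0 || value.compareTo(MAX_UNSIGNED) > 0) {
            throw new IllegalArgumentException("Value [" + value + "] is out of range for unsigned long.");
        }
        return value.longValue(); // keeps the low-order 64 bits
    }

    public static void main(String[] args) {
        long bits = toUnsignedLongBits(new BigInteger("18446744073709551615"));
        System.out.println(bits);                        // -1
        System.out.println(Long.toUnsignedString(bits)); // 18446744073709551615
    }
}
```

`Long.toUnsignedString` reinterprets the same bit pattern as an unsigned value, which is why the negative long still round-trips to the original number.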
Original file line number Diff line number Diff line change
@@ -46,6 +46,8 @@ public abstract class AbstractXContentParser implements XContentParser {
    // references to this policy decision throughout the codebase and find
    // and change any code that needs to apply an alternative policy.
    public static final boolean DEFAULT_NUMBER_COERCE_POLICY = true;
    public static final BigInteger BIGINTEGER_MAX_UNSIGNED_LONG_VALUE = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE); // 2^64 - 1


    private static void checkCoerceString(boolean coerce, Class<? extends Number> clazz) {
        if (!coerce) {
@@ -208,8 +210,26 @@ public long longValue(boolean coerce) throws IOException {
        return result;
    }

    @Override
    public long unsignedLongValue() throws IOException {
        Token token = currentToken();
        if (token == Token.VALUE_STRING) {
            return Long.parseUnsignedLong(text());
        }
        long result = doUnsignedLongValue();
        return result;
    }

    protected abstract long doLongValue() throws IOException;

    /**
     * Returns an unsigned long value of the current numeric token.
     * The method must check for proper boundaries: [0; 2^64-1], and also check that it doesn't have a decimal part.
     * An exception is raised if any of the conditions is violated.
     * Numeric tokens greater than Long.MAX_VALUE must be returned as negative values.
     */
    protected abstract long doUnsignedLongValue() throws IOException;

    @Override
    public float floatValue() throws IOException {
        return floatValue(DEFAULT_NUMBER_COERCE_POLICY);
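
For string tokens, `unsignedLongValue()` above delegates to the JDK method `Long.parseUnsignedLong`. The sketch below shows how that JDK method treats a few inputs; the class `ParseUnsignedLongDemo` and its `isValid` helper are hypothetical and exist only for this illustration.

```java
// Hypothetical demo (not part of this commit): behaviour of Long.parseUnsignedLong.
final class ParseUnsignedLongDemo {
    /** Returns true if the text parses as an unsigned 64-bit integer. */
    static boolean isValid(String text) {
        try {
            Long.parseUnsignedLong(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Values above Long.MAX_VALUE are returned as negative longs.
        long bits = Long.parseUnsignedLong("9223372036854775808");
        System.out.println(bits == Long.MIN_VALUE);          // true

        System.out.println(isValid("18446744073709551615")); // true: 2^64 - 1
        System.out.println(isValid("18446744073709551616")); // false: above 2^64 - 1
        System.out.println(isValid("-1"));                   // false: negative
        System.out.println(isValid("1.5"));                  // false: decimal part
    }
}
```

The JDK method therefore already enforces the [0, 2^64-1] bounds and rejects decimal parts for string input, signalling both with a `NumberFormatException`.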
Original file line number Diff line number Diff line change
@@ -73,6 +73,26 @@ protected long doLongValue() throws IOException {
        return numberValue().longValue();
    }

    @Override
    protected long doUnsignedLongValue() throws IOException {
        Number value = numberValue();
        if ((value instanceof Integer) || (value instanceof Long) || (value instanceof Short) || (value instanceof Byte)) {
            long longValue = value.longValue();
            if (longValue < 0) {
                throw new IllegalArgumentException("Value [" + longValue + "] is out of range for unsigned long.");
            }
            return longValue;
        } else if (value instanceof BigInteger) {
            BigInteger bigIntegerValue = (BigInteger) value;
            if (bigIntegerValue.compareTo(BIGINTEGER_MAX_UNSIGNED_LONG_VALUE) > 0 || bigIntegerValue.compareTo(BigInteger.ZERO) < 0) {
                throw new IllegalArgumentException("Value [" + bigIntegerValue + "] is out of range for unsigned long.");
            }
            return bigIntegerValue.longValue();
        } else {
            throw new IllegalArgumentException("For input string: [" + value.toString() + "].");
        }
    }

    @Override
    protected float doFloatValue() throws IOException {
        return numberValue().floatValue();
Original file line number Diff line number Diff line change
@@ -87,7 +87,7 @@ public final ValuesSourceType getValuesSourceType() {
     * Values are casted to the provided <code>targetNumericType</code> type if it doesn't
     * match the field's <code>numericType</code>.
     */
    public final SortField sortField(
    public SortField sortField(
        NumericType targetNumericType,
        Object missingValue,
        MultiValueMode sortMode,
Original file line number Diff line number Diff line change
@@ -236,6 +236,11 @@ public long longValue() throws IOException {
        return parser.longValue();
    }

    @Override
    public long unsignedLongValue() throws IOException {
        return parser.unsignedLongValue();
    }

    @Override
    public float floatValue() throws IOException {
        return parser.floatValue();
24 changes: 24 additions & 0 deletions x-pack/plugin/mapper-unsigned-long/build.gradle
@@ -0,0 +1,24 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/

evaluationDependsOn(xpackModule('core'))

apply plugin: 'elasticsearch.esplugin'

esplugin {
  name 'unsigned-long'
  description 'Module for the unsigned long field type'
  classname 'org.elasticsearch.xpack.unsignedlong.UnsignedLongMapperPlugin'
  extendedPlugins = ['x-pack-core']
}
archivesBaseName = 'x-pack-unsigned-long'

dependencies {
  compileOnly project(path: xpackModule('core'), configuration: 'default')
  testImplementation project(path: xpackModule('core'), configuration: 'testArtifacts')
}

integTest.enabled = false