
support numeric bounds with decimal parts for long/integer/short/byte datatypes #21972

Merged
merged 6 commits into from Dec 22, 2016

Contributor

scampi commented Dec 5, 2016

close #21600

Contributor Author

scampi commented Dec 5, 2016

I have a question about the code. The {byte,short} rangeQuery methods rely on the INTEGER implementation of rangeQuery. Therefore, the call to parse uses the INTEGER.parse implementation, not the BYTE.parse one. Is this normal?

Contributor

jpountz left a comment

the call to parse uses the INTEGER.parse implementation, not the BYTE.parse one. Is this normal?

Yes: we do not optimize storage for bytes or shorts at the moment, so it is fine to share the same code as far as queries are concerned. The indexing code is different, however, since we want to fail when someone indexes e.g. a large integer in a byte field.

If I am not mistaken, a side effect of this pull request is that searching for a decimal value on an integer field used to raise an error, while it would now silently round the value down. I am fine with not throwing an error, but could we make sure to create a query that does not match anything if the decimal part is non-zero, rather than silently rounding down? I think both term and terms queries are affected.
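To make the requested behaviour concrete, here is a minimal sketch in plain Java. The class and the `exactIntTerm` helper are hypothetical names used only for illustration and are not part of the pull request; range checking is left out for brevity.

```java
import java.util.OptionalInt;

final class DecimalTermSketch {
    /** True when the numeric value carries a non-zero fractional part. */
    static boolean hasDecimalPart(Number value) {
        return value.doubleValue() % 1 != 0;
    }

    /**
     * Hypothetical helper: the int to look up for a term query on an integer field,
     * or empty when no integer document could ever equal the requested value.
     */
    static OptionalInt exactIntTerm(Number value) {
        if (hasDecimalPart(value)) {
            return OptionalInt.empty(); // a caller would build a match-nothing query here
        }
        return OptionalInt.of(value.intValue());
    }

    public static void main(String[] args) {
        System.out.println(exactIntTerm(3.0)); // OptionalInt[3]
        System.out.println(exactIntTerm(3.7)); // OptionalInt.empty
    }
}
```

The point of returning "empty" rather than a rounded value is that a search for 3.7 never reports documents that hold 3.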

double doubleValue = ((Number) number).doubleValue();
return doubleValue % 1 != 0;
}
return false;


jpountz Dec 5, 2016

Contributor

Is it safe to return false otherwise? Maybe return Double.parseDouble(number) % 1 != 0?


scampi Dec 7, 2016

Author Contributor

If number is a string like 1.1 then the parsing would fail with java.lang.NumberFormatException: For input string: "1.1". This is already the case in the existing parse methods. So I think it should be fine to return false here.
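The following sketch illustrates the argument, assuming a simplified integer parse method that accepts either a `Number` or a string. The `parseInt` helper and the class name are stand-ins for illustration, not the actual NumberFieldMapper code.

```java
final class StringTermParsingSketch {
    /** Mirrors the check quoted above: only Number instances are inspected. */
    static boolean hasDecimalPart(Object number) {
        if (number instanceof Number) {
            double doubleValue = ((Number) number).doubleValue();
            return doubleValue % 1 != 0;
        }
        return false; // strings fall through to the integer parser below
    }

    /** Hypothetical stand-in for an integer parse method that accepts numbers or strings. */
    static int parseInt(Object value) {
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        return Integer.parseInt(value.toString()); // throws NumberFormatException for "1.1"
    }

    public static void main(String[] args) {
        System.out.println(hasDecimalPart(1.1));   // true
        System.out.println(hasDecimalPart("1.1")); // false
        try {
            parseInt("1.1");
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions a decimal string is never silently rounded: it is rejected by `Integer.parseInt` before any query is built.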

@@ -415,9 +415,6 @@ Byte parse(Object value) {
if (doubleValue < Byte.MIN_VALUE || doubleValue > Byte.MAX_VALUE) {
throw new IllegalArgumentException("Value [" + value + "] is out of range for a byte");
}
if (doubleValue % 1 != 0) {
throw new IllegalArgumentException("Value [" + value + "] has a decimal part");
}


jpountz Dec 5, 2016

Contributor

I'd still like an exception to be thrown if this is called from parseCreateField. Maybe add a boolean coerce parameter (consistent with the rest of this class) and only perform this check if coerce is false?
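A rough sketch of what such a `coerce` flag could look like for the byte case. It assumes the value arrives as a `Number`, and the method name `parseByte` is chosen here for illustration; the real parse method also handles other input types.

```java
final class CoerceParseSketch {
    /**
     * Parses a byte. With coerce == false (the indexing path) a decimal part is an error;
     * with coerce == true (the query path) the value is truncated toward zero.
     */
    static byte parseByte(Number value, boolean coerce) {
        double doubleValue = value.doubleValue();
        if (doubleValue < Byte.MIN_VALUE || doubleValue > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("Value [" + value + "] is out of range for a byte");
        }
        if (coerce == false && doubleValue % 1 != 0) {
            throw new IllegalArgumentException("Value [" + value + "] has a decimal part");
        }
        return (byte) doubleValue;
    }

    public static void main(String[] args) {
        System.out.println(parseByte(1.5, true)); // 1
        try {
            parseByte(1.5, false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Value [1.5] has a decimal part
        }
    }
}
```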

@jpountz jpountz self-assigned this Dec 5, 2016

@scampi scampi force-pushed the scampi:range-query-decimal-part branch to 19cdc1b Dec 7, 2016

Contributor Author

scampi commented Dec 7, 2016

@jpountz Thanks for the explanation of the rangeQuery.
Ready for another look!

Contributor

jpountz left a comment

Sorry for the lag, I just had another look at your changes. I think it does not work with negative values, since your code assumes that calling longValue() on a decimal rounds down, while it actually truncates toward zero, i.e. it rounds up for negative values.
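The truncation behaviour can be checked in isolation with a few lines of plain Java. The class and method names below are illustrative only.

```java
final class LongValueTruncationSketch {
    /** The same conversion Number.longValue() performs for a Double: truncation toward zero. */
    static long truncate(double value) {
        return Double.valueOf(value).longValue();
    }

    public static void main(String[] args) {
        System.out.println(truncate(3.5));           // 3
        System.out.println(truncate(-3.5));          // -3
        System.out.println((long) Math.floor(-3.5)); // -4
    }
}
```

For positive values truncation and rounding down agree, which is why the problem only shows up with negative bounds.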

return HalfFloatPoint.newExactQuery(field, v);
}

@Override
Query termsQuery(String field, List<Object> values) {
float[] v = new float[values.size()];
for (int i = 0; i < values.size(); ++i) {
- v[i] = parse(values.get(i));
+ v[i] = (float) parse(values.get(i), false);


jpountz Dec 14, 2016

Contributor

the cast looks unnecessary?

Contributor

jpountz commented Dec 14, 2016

I have a recreation if you want to look into it:

PUT index 
{
  "mappings": {
    "type": {
      "properties": {
        "field": {
          "type": "integer"
        }
      }
    }
  }
}

PUT index/type/1
{
  "field": -3
}

GET index/_search
{
  "query": {
    "range": {
      "field": {
        "gte": -3.5,
        "lte": -2.5
      }
    }
  }
}

GET index/_search
{
  "query": {
    "range": {
      "field": {
        "gte": -4.5,
        "lte": -3.5
      }
    }
  }
}

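The document matches the second query but not the first one, whereas -3 lies within [-3.5, -2.5] and outside [-4.5, -3.5], so it should be the other way around.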
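The expected outcome of the two range queries can be worked out with a small sketch that rounds the lower bound up and the upper bound down. This floor/ceil formulation is an assumption made here for illustration; it is not the code under review.

```java
final class RecreationSketch {
    /** True if the integer doc value lies within the inclusive decimal range [gte, lte]. */
    static boolean matches(int docValue, double gte, double lte) {
        long lower = (long) Math.ceil(gte);  // smallest integer >= gte
        long upper = (long) Math.floor(lte); // largest integer <= lte
        return lower <= docValue && docValue <= upper;
    }

    public static void main(String[] args) {
        System.out.println(matches(-3, -3.5, -2.5)); // true:  integer range is [-3, -3]
        System.out.println(matches(-3, -4.5, -3.5)); // false: integer range is [-4, -4]
    }
}
```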


int[] v = new int[nonDecimalValues.size()];
for (int i = 0; i < nonDecimalValues.size(); ++i) {
v[i] = parse(nonDecimalValues.get(i), true);


jpountz Dec 14, 2016

Contributor

Maybe we could avoid generating intermediate garbage from boxed objects by directly creating the primitive array, e.g. something like:

int[] v = new int[values.size()];
int upTo = 0;
for (Object value : values) {
  if (hasDecimalPart(value) == false) {
    v[upTo++] = parse(value, true);
  }
}
if (upTo != v.length) {
  v = Arrays.copyOf(v, upTo);
}

scampi added some commits Dec 18, 2016

Contributor Author

scampi commented Dec 18, 2016

@jpountz thanks for the review! I have resolved your comments in the last two commits. Ready for another round.

Contributor

jpountz left a comment

The changes to terms queries look good, but I think there are still issues with range query generation.

MappedFieldType ftInt = new NumberFieldMapper.NumberFieldType(NumberType.INTEGER);
ftInt.setName("field");
ftInt.setIndexOptions(IndexOptions.DOCS);
assertEquals(IntPoint.newRangeQuery("field", -3, -2), ftInt.rangeQuery(-3.5, -2.5, true, true, null));


jpountz Dec 19, 2016

Contributor

Sorry I may be a bit confused, but if the range is [-3.5, -2.5] then -2 should not match?


scampi Dec 21, 2016

Author Contributor

Sorry about that, I'll correct this in the next commit.

// - if the bound is negative then we leave it as is:
// if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
if ((includeLower == false && !lowerTermHasDecimalPart) || (lowerTermHasDecimalPart && l > 0)) {


jpountz Dec 19, 2016

Contributor

I suspect this approach cannot work since -0.5 and +0.5 both parse to 0 when coerce is true, so you have no way to know whether the original value was positive or negative?
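One way around this is to read the sign from the original term rather than from the coerced value. The sketch below assumes the bounds arrive as doubles and ignores overflow at the extremes of the long range; the class and method names are illustrative, not the merged implementation.

```java
final class SignAwareBoundsSketch {
    static boolean hasDecimalPart(double value) {
        return value % 1 != 0;
    }

    /** Inclusive lower bound on an integer field for a possibly decimal lower term. */
    static long lowerBound(double lowerTerm, boolean includeLower) {
        long l = (long) lowerTerm; // truncates toward zero: -0.5 and 0.5 both give 0
        boolean decimal = hasDecimalPart(lowerTerm);
        if ((decimal == false && includeLower == false) || (decimal && Math.signum(lowerTerm) > 0)) {
            ++l; // exclusive integer bound, or a positive decimal that truncation moved below the term
        }
        return l;
    }

    /** Inclusive upper bound on an integer field for a possibly decimal upper term. */
    static long upperBound(double upperTerm, boolean includeUpper) {
        long u = (long) upperTerm;
        boolean decimal = hasDecimalPart(upperTerm);
        if ((decimal == false && includeUpper == false) || (decimal && Math.signum(upperTerm) < 0)) {
            --u; // exclusive integer bound, or a negative decimal that truncation moved above the term
        }
        return u;
    }

    public static void main(String[] args) {
        System.out.println(lowerBound(-0.5, true)); // 0
        System.out.println(lowerBound(0.5, true));  // 1
        System.out.println(upperBound(-0.5, true)); // -1
        System.out.println(upperBound(0.5, true));  // 0
    }
}
```

Because the sign is taken from the term before truncation, -0.5 and +0.5 produce different bounds even though both coerce to 0.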

Contributor Author

scampi commented Dec 21, 2016

@jpountz I corrected those two issues you pointed out.

Contributor

jpountz left a comment

It looks good to me. I left some comments about formatting if you don't mind applying them and then I'll merge. Thanks @scampi!

// - if the bound is negative then we leave it as is:
// if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
if ((!lowerTermHasDecimalPart && includeLower == false) ||


jpountz Dec 21, 2016

Contributor

can you either use ! in both cases or == false?


jpountz Dec 21, 2016

Contributor

(others tend to have a preference for == false so I'd recommend using that)

// if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
if ((!lowerTermHasDecimalPart && includeLower == false) ||
(lowerTermHasDecimalPart && signum(lowerTerm) > 0)) {


jpountz Dec 21, 2016

Contributor

can you indent one level more so that it is easier to figure out what is part of the if statement and what is part of the inner block?

u = parse(upperTerm, true);
boolean upperTermHasDecimalPart = hasDecimalPart(upperTerm);
if ((!upperTermHasDecimalPart && includeUpper == false) ||
(upperTermHasDecimalPart && signum(upperTerm) < 0)) {


jpountz Dec 21, 2016

Contributor

same here

// if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
if ((!lowerTermHasDecimalPart && includeLower == false) ||
(lowerTermHasDecimalPart && signum(lowerTerm) > 0)) {


jpountz Dec 21, 2016

Contributor

same here

u = parse(upperTerm, true);
boolean upperTermHasDecimalPart = hasDecimalPart(upperTerm);
if ((!upperTermHasDecimalPart && includeUpper == false) ||
(upperTermHasDecimalPart && signum(upperTerm) < 0)) {


jpountz Dec 21, 2016

Contributor

same here

/**
* Returns -1, 0, or 1 if the value is lower than, equal to, or greater than 0
*/
protected double signum(Object value) {


jpountz Dec 21, 2016

Contributor

let's make those two methods private, I don't think we need to extend them?

Contributor Author

scampi commented Dec 21, 2016

@jpountz I have addressed your comments, thanks for the review ;o)

@jpountz jpountz merged commit e1b8528 into elastic:master Dec 22, 2016

1 check passed

CLA Commit author has signed the CLA
Details

@scampi scampi deleted the scampi:range-query-decimal-part branch Dec 22, 2016

jpountz added a commit that referenced this pull request Dec 22, 2016

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 22, 2016

Merge branch 'master' into use-reader-for-doc-stats
* master: (22 commits)
  Support negative numbers in writeVLong (elastic#22314)
  UnicastZenPing's PingingRound should prevent opening connections after being closed
  Add task to clean idea build directory. Make cleanIdea task invoke it.
  add trace logging to UnicastZenPingTests.testResolveReuseExistingNodeConnections
  Adds ingest processor headers to exception for unknown processor. (elastic#22315)
  Remove much ceremony from parsing client yaml test suites (elastic#22311)
  Support numeric bounds with decimal parts for long/integer/short/byte datatypes (elastic#21972)
  inner hits: Don't inline inner hits if the query the inner hits is inlined into can't resolve mappings and ignore_unmapped has been set to true
  Fix stackoverflow error on InternalNumericMetricAggregation
  Date detection should not rely on a hardcoded set of characters. (elastic#22171)
  `value_type` is useful regardless of scripting. (elastic#22160)
  Improve concurrency of ShardCoreKeyMap. (elastic#22316)
  fixed jdocs and removed already fixed norelease
  Adds abstract test classes for serialisation (elastic#22281)
  Introduce translog no-op
  Provide helpful error message if a plugin exists
  Clear static variable after suite
  Repeated language analyzers (elastic#22240)
  Restore deprecation warning for invalid match_mapping_type values (elastic#22304)
  Make `-0` compare less than `+0` consistently. (elastic#22173)
  ...