Failure in HDRPercentilesIT.testScriptSingleValued #92822

DaveCTurner · 2023-01-11T14:42:53Z

CI Link

https://gradle-enterprise.elastic.co/s/4loz5k2jllcvg

Repro line

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.search.aggregations.metrics.HDRPercentilesIT.testScriptSingleValued" -Dtests.seed=928B8BBC20C73401 -Dtests.locale=de-CH -Dtests.timezone=America/Argentina/Catamarca -Druntime.java=19

Does it reproduce?

Yes

Applicable branches

#92817 (a small change over main)

Failure history

No response

Failure excerpt

I encountered this a few times in #92817, which is effectively main plus the following patch which should not cause any failures:

diff --git a/test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java b/test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java
index 18f21cb7368..c51df18675c 100644
--- a/test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java
+++ b/test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java
@@ -152,7 +152,6 @@ import static org.hamcrest.Matchers.equalTo;
 import static org.hamcrest.Matchers.greaterThan;
 import static org.hamcrest.Matchers.greaterThanOrEqualTo;
 import static org.hamcrest.Matchers.not;
-import static org.hamcrest.Matchers.nullValue;
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertThat;
@@ -603,6 +602,8 @@ public final class InternalTestCluster extends TestCluster {
             );
         }

+        builder.put(INITIAL_STATE_TIMEOUT_SETTING.getKey(), "0s");
+
         return builder.build();
     }

@@ -731,14 +732,6 @@ public final class InternalTestCluster extends TestCluster {
         updatedSettings.put("node.name", name);
         updatedSettings.put(NodeEnvironment.NODE_ID_SEED_SETTING.getKey(), seed);

-        if (autoManageMasterNodes) {
-            assertThat(
-                "if master nodes are automatically managed then nodes must complete a join cycle when starting",
-                updatedSettings.get(INITIAL_STATE_TIMEOUT_SETTING.getKey()),
-                nullValue()
-            );
-        }
-
         return updatedSettings.build();
     }

The failure is as follows:

org.elasticsearch.search.aggregations.metrics.HDRPercentilesIT > testScriptSingleValued FAILED	
    java.lang.AssertionError: 	
    Expected: a value less than or equal to <10.00100068359375>	
         but: <10.0068359375> was greater than <10.00100068359375>	
        at __randomizedtesting.SeedInfo.seed([928B8BBC20C73401:2E3A0A78C1851C92]:0)	
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)	
        at org.junit.Assert.assertThat(Assert.java:956)	
        at org.junit.Assert.assertThat(Assert.java:923)	
        at org.elasticsearch.search.aggregations.metrics.HDRPercentilesIT.assertConsistent(HDRPercentilesIT.java:87)	
        at org.elasticsearch.search.aggregations.metrics.HDRPercentilesIT.testScriptSingleValued(HDRPercentilesIT.java:376)

The text was updated successfully, but these errors were encountered:

elasticsearchmachine · 2023-01-11T14:43:17Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

DaveCTurner · 2023-01-11T14:56:50Z

In fact I seem to be getting something similar on other tests in the same suite too - at least:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.search.aggregations.metrics.HDRPercentilesIT.testSingleValuedFieldPartiallyUnmapped" -Dtests.seed=CC100B3B0FAA4466 -Dtests.locale=lt-LT -Dtests.timezone=America/Cayman -Druntime.java=19

DaveCTurner · 2023-01-11T16:23:43Z

Oh this failed in main too: https://gradle-enterprise.elastic.co/s/atstclct72y24

DaveCTurner · 2023-01-11T17:39:22Z

Muted in a2adedf.

salvatore-campagna · 2023-02-10T14:37:20Z

I investigated this isssue and I think the problem is the following.

When creating a sketch for a HdrHistogram we need to provide the numberOfSignificantValueDigits that is used by the HdrHistogram algorithm to calculate percentiles with a specific precision. Some tests return empty results...I guess this happens because the test uses few documents and some shards have no document indexed. The test generates values for numberOfSignificantValueDigits in the range [0, 5]. So, if a test generates a value that is 4 or 5 it expects a certain precision. Anyway, at reduction time we might merge results from sketches using different precisions. Empty results use sketches with 3 significant digits, non-empty results sketches with the number of digits expected by the test.

As an additional note...testing in the main branch I could reproduce the issue for test testSingleValuedFieldPartiallyUnmapped using the seed from the Gradle reproductioin line and not in other cases.
I guess anyway that the issue here is triggered depending on how documents are distributed across shards.

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.search.aggregations.metrics.HDRPercentilesIT.testSingleValuedFieldPartiallyUnmapped" -Dtests.seed=CC100B3B0FAA4466 -Dtests.locale=lt-LT -Dtests.timezone=America/Cayman -Druntime.java=19

For empty histogram aggregations we instantiate by default a DoubleHistogram with 3 significant digits precision. The test generates a random value for the number of significant digits that is in the range [0, 5]. As a result, if the test runs with 4 or 5 significant value digits but the HdrHistogram sketch only uses 3, checking errors on results fails since all computations are done with lower than expected precision. The issue happens at reduction time in AbstractInternalHDRPercentiles when merging histograms coming from different shards. If a shard returns no data the sketch is empty but uses 3 significant digits, while for non empty results the correct number of digits is used. Now, depending on the order sketches are merged it might happen, for instance, that we merge a sketch using 4 or 5 significant digits into one using 3 significant digits (used for the empty result). The result, will then use whatever precision is used by the first "merged" object created. This in some cases leads to correct result and sometimes not. Here, when merging histograms, we always use the one with higher value of numberOfSignificantValueDigits so to avoid reducing precision of the result. Note that, as a result of this merging strategy, we can even use just 0 digits precision for empty results and save on some serialization/deserialization for empty histograms. Resolves #92822

For empty histogram aggregations we instantiate by default a DoubleHistogram with 3 significant digits precision. The test generates a random value for the number of significant digits that is in the range [0, 5]. As a result, if the test runs with 4 or 5 significant value digits but the HdrHistogram sketch only uses 3, checking errors on results fails since all computations are done with lower than expected precision. The issue happens at reduction time in AbstractInternalHDRPercentiles when merging histograms coming from different shards. If a shard returns no data the sketch is empty but uses 3 significant digits, while for non empty results the correct number of digits is used. Now, depending on the order sketches are merged it might happen, for instance, that we merge a sketch using 4 or 5 significant digits into one using 3 significant digits (used for the empty result). The result, will then use whatever precision is used by the first "merged" object created. This in some cases leads to correct result and sometimes not. Here, when merging histograms, we always use the one with higher value of numberOfSignificantValueDigits so to avoid reducing precision of the result. Note that, as a result of this merging strategy, we can even use just 0 digits precision for empty results and save on some serialization/deserialization for empty histograms. Resolves elastic#92822

DaveCTurner added :Analytics/Aggregations Aggregations >test-failure Triaged test failures from CI labels Jan 11, 2023

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 11, 2023

salvatore-campagna self-assigned this Feb 10, 2023

salvatore-campagna mentioned this issue Feb 10, 2023

Merge histograms without losing precision #93704

Merged

salvatore-campagna closed this as completed in #93704 Feb 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failure in HDRPercentilesIT.testScriptSingleValued #92822

Failure in HDRPercentilesIT.testScriptSingleValued #92822

DaveCTurner commented Jan 11, 2023

elasticsearchmachine commented Jan 11, 2023

DaveCTurner commented Jan 11, 2023

DaveCTurner commented Jan 11, 2023

DaveCTurner commented Jan 11, 2023

salvatore-campagna commented Feb 10, 2023 •

edited

Loading

Failure in HDRPercentilesIT.testScriptSingleValued #92822

Failure in HDRPercentilesIT.testScriptSingleValued #92822

Comments

DaveCTurner commented Jan 11, 2023

CI Link

Repro line

Does it reproduce?

Applicable branches

Failure history

Failure excerpt

elasticsearchmachine commented Jan 11, 2023

DaveCTurner commented Jan 11, 2023

DaveCTurner commented Jan 11, 2023

DaveCTurner commented Jan 11, 2023

salvatore-campagna commented Feb 10, 2023 • edited Loading

salvatore-campagna commented Feb 10, 2023 •

edited

Loading