Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge histograms without losing precision #93704

Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
98ff032
fix: use max value 3 for numberOfSignificantValueDigits
salvatore-campagna Feb 10, 2023
fb6ac0d
Update docs/changelog/93704.yaml
salvatore-campagna Feb 10, 2023
f49036f
fix: use the correct number of significant values digits
salvatore-campagna Feb 10, 2023
39b1c42
fix: undo first commit
salvatore-campagna Feb 10, 2023
1f5fc39
fix: update yaml
salvatore-campagna Feb 10, 2023
00fa30b
fix: always merge histograms using the higher precision
salvatore-campagna Feb 10, 2023
8721bc9
fix: use just 0 digits for empty histograms to save memory
salvatore-campagna Feb 10, 2023
18b79a6
fix: do a little bit less work while merging histograms
salvatore-campagna Feb 10, 2023
da68312
docs: include some javadoc for the merge method
salvatore-campagna Feb 10, 2023
7c2f22b
yaml: update bug resolution description
salvatore-campagna Feb 10, 2023
4fc598a
fix: set auto resize
salvatore-campagna Feb 10, 2023
6683480
fix: set autorize before merging
salvatore-campagna Feb 10, 2023
c1be8c2
fix: use 3 digits minimum for empty histograms bwc
salvatore-campagna Feb 10, 2023
44e739e
fix: try using zero digits empty histograms
salvatore-campagna Feb 10, 2023
9f8907f
fix: restore test
salvatore-campagna Feb 10, 2023
7eac3f8
fix: do not use recursive call
salvatore-campagna Feb 13, 2023
5b13951
fix: prevent serialization of empty tdigest objects
salvatore-campagna Feb 13, 2023
30126e7
fix: undo changes to TDigest percentiles
salvatore-campagna Feb 13, 2023
c57b68a
fix: undo changes to TDigest percentiles
salvatore-campagna Feb 13, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/changelog/93704.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
pr: 93704
summary: Use the corredt value of `numberOfSignificantValueDigits` on empty aggregations
area: Aggregations
type: bug
issues:
- 92822
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@
package org.elasticsearch.search.aggregations.metrics;

import org.apache.logging.log4j.LogManager;
import org.apache.lucene.tests.util.LuceneTestCase;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.util.CollectionUtils;
Expand Down Expand Up @@ -50,7 +49,6 @@
import static org.hamcrest.Matchers.notNullValue;
import static org.hamcrest.Matchers.sameInstance;

@LuceneTestCase.AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/92822")
public class HDRPercentilesIT extends AbstractNumericTestCase {

@Override
Expand Down Expand Up @@ -215,10 +213,7 @@ public void testSingleValuedFieldPartiallyUnmapped() throws Exception {
SearchResponse searchResponse = client().prepareSearch("idx", "idx_unmapped")
.setQuery(matchAllQuery())
.addAggregation(
percentiles("percentiles").numberOfSignificantValueDigits(sigDigits)
.method(PercentilesMethod.HDR)
.field("value")
.percentiles(pcts)
percentiles("percentiles").percentilesConfig(new PercentilesConfig.Hdr(sigDigits)).field("value").percentiles(pcts)
)
.get();

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,6 @@ public double metric(String name, long bucketOrd) {

@Override
public InternalAggregation buildEmptyAggregation() {
return new InternalHDRPercentiles(name, keys, null, keyed, format, metadata());
return new InternalHDRPercentiles(name, keys, new DoubleHistogram(numberOfSignificantValueDigits), keyed, format, metadata());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that in TDigestPercentilesAggregator#buildEmptyAggregation we do something similar

return new InternalTDigestPercentiles(name, keys, new TDigestState(compression), keyed, formatter, metadata());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating DoubleHistogram with a high numberOfSignificantValueDigits consumes a lot of memory. Can we avoid doing this here? Also serialising it would be non trivial.

Can we maybe create the right DoubleHistogram instance in AbstractInternalHDRPercentiles#reduce(...) method? Right now it always uses a EMPTY_HISTOGRAM in case of an empty bucket. Maybe we create another instances if numberOfSignificantValueDigits != 3?

}
}