-
Notifications
You must be signed in to change notification settings - Fork 25.6k
T digest csv test support #138391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
T digest csv test support #138391
Conversation
Conflicts: x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
…test-support' into t-digest-csv-test-support
…test-support' into t-digest-csv-test-support
|
More test failures? |
| ), | ||
|
|
||
| /* | ||
| TDIGEST( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this commented-out code intentionally in this PR?
| } | ||
|
|
||
| @Override | ||
| public TDigestHolder getTDigestHolder(int offset, BytesRef scratch) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the point of passing in the BytesRef scratch here?
This method still allocates one TDigestHolder per entry when iterating over the block.
In fact, TDigestHolder is actually the bigger object compared to BytesRef.
I'd recommend to start simple and improve in a follow-up:
Just don't pass a scratch in, simply allocate the BytesRef in this method.
Then later add a custom type for your scratch (or make TDigestHolder reusable) and pass that in as second argument.
| return encodedDigest; | ||
| } | ||
|
|
||
| // TODO - compute these if they're not given? or do that at object creation time, maybe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any possibility for these to be not populated?
I think if users don't provide them at ingest time, we estimate them then.
This means that at query time, they are always present?
I'd recommend adding an assert assertInvariants() call to the TDigestArrayBlock constructor and the corresponding assertInvariants() method similar to the one I added for exponential histogram blocks.
In that method, I'd check that min, max, sum and count are present or null according to our expectance.
Having this check there saved me quite a bit of headache during the exponential histogram work.
|
|
||
| public static TDigestHolder randomTDigest() { | ||
| // TODO: This is mostly copied from TDigestFieldMapperTests; refactor it. | ||
| Map<String, Object> value = new LinkedHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Map<String, Object> value = new LinkedHashMap<>(); |
unused code
| Make sure we can even load tdigest data | ||
| required_capability: tdigest_field_type_basic_functionality | ||
|
|
||
| FROM tdigest_standard_index | KEEP @timestamp,instance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| FROM tdigest_standard_index | KEEP @timestamp,instance; | |
| FROM tdigest_standard_index | WHERE responseTime IS NOT NULL | KEEP @timestamp,instance; |
I don't think that your original query would even load the responseTime, would it?
If I'm wrong here I'd be keen on some explanations
| */ | ||
| EXPONENTIAL_HISTOGRAM_PRE_TECH_PREVIEW_V4(EXPONENTIAL_HISTOGRAM_FEATURE_FLAG), | ||
|
|
||
| TDIGEST_FIELD_TYPE_BASIC_FUNCTIONALITY(T_DIGEST_ESQL_SUPPORT), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I got annoyed having to add a new capability every time I do a small change while still behind a feature flag.
That's why I changed my approach:
I now simply increment the version suffix of EXPONENTIAL_HISTOGRAM_PRE_TECH_PREVIEW_V4 on every change that would wrongly break the bwc tests. That helps with not ending up with a mess of capabilities which we have to clean up later before removing the feature flag.
Wire up the t-digest field type to the ESQL CSV tests. Mostly this involves adding the element type and plumbing it though, with some additional work filling out the block builder behaviors.