New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fail composite aggregation if after key is unparsable #74252
Fail composite aggregation if after key is unparsable #74252
Conversation
Pinging @elastic/es-analytics-geo (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be nice to have a yaml test to make sure everything is plugged in and routes all the way through. I know it's annoying but exception handling is the kind of spooky action at a distance that can break without anyone knowing it. A yaml test'd catch that.
return formatted; | ||
} | ||
|
||
private static Object formatObjectUnchecked (Object obj, DocValueFormat format) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra space I think.
"Composite aggregation with invalid format": | ||
- skip: | ||
version: " - 7.99.99" | ||
reason: After key parse checking added in 7.14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I often add words like "to be backported" so that its super obvious if someone finds this before the backport lands that it'll be changing and isn't just a mistake.
if (formatted != null) { | ||
Object parsed; | ||
// Type Jank ahead | ||
if (obj.getClass() == BytesRef.class) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we avoid this layer of type jank? We already know the type from formatObjectUnchecked
. Maybe if the checking were performed in a callback or something? I'm not sure that'd be more readable, but maybe one less janky type based switch statement would be worth trying?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I'm curious if there are any "fun" cases where folks were doing this sort of thing and it was accidentally working. But if it was they were cutting it really close anyway.
One of the reasons I wrote it the way I did, if your format is bad but happens to work for your data, the aggregation will return just fine. We don't check the format in a vacuum, we format & parse the actual after key value. So the only people who could get broken by this are people who already built a work-around for it on their side instead of fixing their format. Or, I guess, people who ran a composite with a bad format and then never used the after key. |
Yeah, I'm aware. Just feeling paranoid. It's the right thing, I think. LGTM |
If the after key cannot parse back to the same value we generated it from, when the client sends it back we will land on a wrong page. If this page is earlier, it is likely we will (eventually) generate the same wrong after key, resulting in an infinite loop as the client repeatedly ask to retrieve the same page or pages of data. This change fixes that by failing the composite aggregation (with an exception) if the after key is unparsable with the given format. We provide as much information about what failed as possible. Conflicts: server/src/main/java/org/elasticsearch/search/DocValueFormat.java
…74449) If the after key cannot parse back to the same value we generated it from, when the client sends it back we will land on a wrong page. If this page is earlier, it is likely we will (eventually) generate the same wrong after key, resulting in an infinite loop as the client repeatedly ask to retrieve the same page or pages of data. This change fixes that by failing the composite aggregation (with an exception) if the after key is unparsable with the given format. We provide as much information about what failed as possible.
It is possible that the format for the composite after key will generate a string that cannot be parsed back into the original value. This typically occurs either because of a format error (e.g. mixing month and week based date formats) or because the format resolution is coarser than the bucket resolution (e.g. you asked for hourly buckets but formatted the keys as days). When the user then sends back this after key for the next page, they will get an earlier page of data instead. This can result in the client getting into an infinite loop, constantly sending the same after key and getting back the same last page of data.
This PR will detect those unparsable after keys and cause the aggregation to throw an exception instead of returning the malformed key.