New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EQL: Add option for returning results from the tail of the stream #64869
Conversation
Introduce option for specifying whether the results are returned from the tail (end) of the stream or the head (beginning). Improve sequencing algorithm by significantly eliminating the number of in-flight sequences for spare datasets. Refactor the sequence class by eliminating some of the redundant code. Change matching behavior for tail sequences. Return results based on their first entry ordinal instead of insertion order (which was ordered on the last match ordinal). Randomize results position inside test suite. Close elastic#58646
Pinging @elastic/es-ql (:Query Languages/EQL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some clarification comments.
@@ -39,6 +39,7 @@ | |||
private QueryBuilder filter = null; | |||
private String timestampField = "@timestamp"; | |||
private String eventCategoryField = "event.category"; | |||
private String resultPosition = "head"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've kept the existing default for now to not make this PR even bigger or complicate backporting to 7.10.
I'll follow-up with a separate PR for changing the values themselves.
@@ -57,6 +58,7 @@ | |||
static final String KEY_SIZE = "size"; | |||
static final String KEY_FETCH_SIZE = "fetch_size"; | |||
static final String KEY_QUERY = "query"; | |||
static final String KEY_RESULT_POSITION = "result_position"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this name; any suggestions? /cc @jrodewig.
I had variations with order
but that's incorrect since the order of the results is always ascending, it's rather the position from the stream where the results are picked up.
However I don't think the name makes that obvious...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is about oldest or newest sequences first.
Maybe change this in a true/false option and call it newest_first
with default true
. Overriding the default means false
, implicitly newest first == false == newest last == oldest first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is tricky. +1 to Andrei's idea of converting this to a Boolean with a param name of latest_first
or most_recent_first
. That would address most of my concerns.
If we decide to keep this as an enum, here are some suggestions in order of preference:
head_or_tail
orientation
withearliest
andlatest
valuesrecency
withearliest
andlatest
valuesresult_end
I don't think there is a common phrase that encapsulates both the head
/tail
concepts. end
may work but is probably a bit too vague to be useful here. The head
/tail
values make the purpose of the enum param much clearer. As these values are unlikely to change, I'd advocate for including them in the param name if we keep this as an enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most recent / orientation suggests a different order. When one does tail the results are still ordered ASC but one gets the last X results not the first X.
I ended up with head and tail since we already have those in the languages as pipes and it's actually what happens if you don't specify them. We could use something like default_pipe
but then that one is too cryptic...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up with head and tail since we already have those in the languages as pipes and it's actually what happens if you don't specify them.
That makes sense. Rather than forcing another term, it may be simpler and more intuitive to lean into that language:
head_or_tail
withhead
andtail
enum values- A
head
true/false option with a default offalse
OR atail
true/false option with a default oftrue
It seems like the other options are too vague or confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forcing "head" or "tail" in the name of the parameter will implicitly assume the user knows about the purpose of head or tail and I like it. So, theoretically, there shouldn't be a concern about some implied ordering the user would assume. In Elasticsearch there are some settings that have technical terms in them and, unless one reads the documentation and knows something more about Elasticsearch, that setting will not make sense. And I think the events selection window (from start or end of the time stream) can be categorized as a more advanced feature that needs a bit more knowledge about EQL. One has to understand which events are selected and that the order in which these events are returned is always ascending.
Also, since we assume the user knows about "head" and "tail" we can push this a bit more and add "default" and "pipe" to the name of the parameter and make it boolean: "default_head_pipe": true
to be more explicit. Using default_pipe
is too abstract and it doesn't suggest a limited list of pipes to choose from, whereas default_head_pipe
refers explicitly to head
and its negation (a value of false
) implies tail
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Booleans are more akin to on/off switches. default_head_pipe
false simply means there's no default head pipe not that there is a tail one instead.
protected EqlSearchResponse runRequest(EqlClient eqlClient, EqlSearchRequest request) throws IOException { | ||
return eqlClient.search(request, RequestOptions.DEFAULT); | ||
int timeout = Math.toIntExact(timeout().millis()); | ||
|
||
RequestConfig config = RequestConfig.copy(RequestConfig.DEFAULT) | ||
.setConnectionRequestTimeout(timeout) | ||
.setConnectTimeout(timeout) | ||
.setSocketTimeout(timeout) | ||
.build(); | ||
return eqlClient.search(request, RequestOptions.DEFAULT.toBuilder().setRequestConfig(config).build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a timeout setting (which is not straight-forward) to the base test.
expected_event_ids = [16, 60, 66, | ||
54, 55, 59, | ||
55, 59, 61, | ||
59, 61, 65, | ||
16, 60, 66, | ||
60, 66, 71, | ||
61, 65, 67, | ||
65, 67, 70, | ||
60, 66, 71] | ||
65, 67, 70] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously sequences were added based on their insertion order which is essentially the order of the last events in a sequence.
This can create some unexpected results as you can see on the left: [16,60,66] occurs after [55,59,61] and [59,61,65]. That is [16,60] occur after [59,61] because it's the last event that counts.
I've changed this to be the order of the first event (not last one) in the sequence since this is what is seen first.
Revert the sequence matching order from first to current/last.
Polishing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -56,15 +55,13 @@ void add(E element) { | |||
Ordinal ordinal = extractor.apply(element); | |||
if (start == null) { | |||
start = ordinal; | |||
} else if (stop == null) { | |||
} else if (start.compareTo(ordinal) > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (start == null || start.compareTo(ordinal) > 0) {
start = ordinal;
}
if (stop == null || stop.compareTo(ordinal) < 0) {
stop = ordinal;
}
keepOnly(elements.peekLast()); | ||
} | ||
|
||
private void keepOnly(E element) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method doesn't seem to be used anywhere else except in trimToLast
. Is it really necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you notice the last update, it was merged into trimToLast
* Move the window while preserving the same base. | ||
*/ | ||
private void tumbleWindow(int currentStage, ActionListener<Payload> listener) { | ||
if (currentStage > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combine these two conditions into one? if (currentStage > 0 && matcher.hasCandidates() == false) {
Raised #65038 to settle on the parameter name. |
…astic#64869) Introduce option for specifying whether the results are returned from the tail (end) of the stream or the head (beginning). Improve sequencing algorithm by significantly eliminating the number of in-flight sequences for spare datasets. Refactor the sequence class by eliminating some of the redundant code. Change matching behavior for tail sequences. Return results based on their first entry ordinal instead of insertion order (which was ordered on the last match ordinal). Randomize results position inside test suite. Close elastic#58646 (cherry picked from commit e85d9d1) (cherry picked from commit 6a68246)
…4869) (#65039) Introduce option for specifying whether the results are returned from the tail (end) of the stream or the head (beginning). Improve sequencing algorithm by significantly eliminating the number of in-flight sequences for spare datasets. Refactor the sequence class by eliminating some of the redundant code. Change matching behavior for tail sequences. Return results based on their first entry ordinal instead of insertion order (which was ordered on the last match ordinal). Randomize results position inside test suite. Close #58646 (cherry picked from commit e85d9d1)
…4869) (#65040) Introduce option for specifying whether the results are returned from the tail (end) of the stream or the head (beginning). Improve sequencing algorithm by significantly eliminating the number of in-flight sequences for spare datasets. Refactor the sequence class by eliminating some of the redundant code. Change matching behavior for tail sequences. Return results based on their first entry ordinal instead of insertion order (which was ordered on the last match ordinal). Randomize results position inside test suite. Close #58646 (cherry picked from commit e85d9d1) (cherry picked from commit 452c674)
Introduce option for specifying whether the results are returned from
the tail (end) of the stream or the head (beginning).
Improve sequencing algorithm by significantly eliminating the number
of in-flight sequences for spare datasets.
Refactor the sequence class by eliminating some of the redundant code.
Change matching behavior for tail sequences.
Return results based on their first entry ordinal instead of
insertion order (which was ordered on the last match ordinal).
Randomize results position inside test suite.
Close #58646