Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EQL: Add option for returning results from the tail of the stream #64869

Merged
merged 6 commits into from Nov 14, 2020

Conversation

costin
Copy link
Member

@costin costin commented Nov 10, 2020

Introduce option for specifying whether the results are returned from
the tail (end) of the stream or the head (beginning).
Improve sequencing algorithm by significantly eliminating the number
of in-flight sequences for spare datasets.
Refactor the sequence class by eliminating some of the redundant code.
Change matching behavior for tail sequences.
Return results based on their first entry ordinal instead of
insertion order (which was ordered on the last match ordinal).
Randomize results position inside test suite.

Close #58646

Introduce option for specifying whether the results are returned from
the tail (end) of the stream or the head (beginning).
Improve sequencing algorithm by significantly eliminating the number
of in-flight sequences for spare datasets.
Refactor the sequence class by eliminating some of the redundant code.
Change matching behavior for tail sequences.
Return results based on their first entry ordinal instead of
insertion order (which was ordered on the last match ordinal).
Randomize results position inside test suite.

Close elastic#58646
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (:Query Languages/EQL)

@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Nov 10, 2020
Copy link
Member Author

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some clarification comments.

@@ -39,6 +39,7 @@
private QueryBuilder filter = null;
private String timestampField = "@timestamp";
private String eventCategoryField = "event.category";
private String resultPosition = "head";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've kept the existing default for now to not make this PR even bigger or complicate backporting to 7.10.
I'll follow-up with a separate PR for changing the values themselves.

@@ -57,6 +58,7 @@
static final String KEY_SIZE = "size";
static final String KEY_FETCH_SIZE = "fetch_size";
static final String KEY_QUERY = "query";
static final String KEY_RESULT_POSITION = "result_position";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this name; any suggestions? /cc @jrodewig.
I had variations with order but that's incorrect since the order of the results is always ascending, it's rather the position from the stream where the results are picked up.
However I don't think the name makes that obvious...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about oldest or newest sequences first.
Maybe change this in a true/false option and call it newest_first with default true. Overriding the default means false, implicitly newest first == false == newest last == oldest first.

Copy link
Contributor

@jrodewig jrodewig Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is tricky. +1 to Andrei's idea of converting this to a Boolean with a param name of latest_first or most_recent_first. That would address most of my concerns.

If we decide to keep this as an enum, here are some suggestions in order of preference:

  • head_or_tail
  • orientation with earliest and latest values
  • recency with earliest and latest values
  • result_end

I don't think there is a common phrase that encapsulates both the head/tail concepts. end may work but is probably a bit too vague to be useful here. The head/tail values make the purpose of the enum param much clearer. As these values are unlikely to change, I'd advocate for including them in the param name if we keep this as an enum.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most recent / orientation suggests a different order. When one does tail the results are still ordered ASC but one gets the last X results not the first X.
I ended up with head and tail since we already have those in the languages as pipes and it's actually what happens if you don't specify them. We could use something like default_pipe but then that one is too cryptic...

Copy link
Contributor

@jrodewig jrodewig Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up with head and tail since we already have those in the languages as pipes and it's actually what happens if you don't specify them.

That makes sense. Rather than forcing another term, it may be simpler and more intuitive to lean into that language:

  • head_or_tail with head and tail enum values
  • A head true/false option with a default of false OR a tail true/false option with a default of true

It seems like the other options are too vague or confusing.

Copy link
Contributor

@astefan astefan Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forcing "head" or "tail" in the name of the parameter will implicitly assume the user knows about the purpose of head or tail and I like it. So, theoretically, there shouldn't be a concern about some implied ordering the user would assume. In Elasticsearch there are some settings that have technical terms in them and, unless one reads the documentation and knows something more about Elasticsearch, that setting will not make sense. And I think the events selection window (from start or end of the time stream) can be categorized as a more advanced feature that needs a bit more knowledge about EQL. One has to understand which events are selected and that the order in which these events are returned is always ascending.

Also, since we assume the user knows about "head" and "tail" we can push this a bit more and add "default" and "pipe" to the name of the parameter and make it boolean: "default_head_pipe": true to be more explicit. Using default_pipe is too abstract and it doesn't suggest a limited list of pipes to choose from, whereas default_head_pipe refers explicitly to head and its negation (a value of false) implies tail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Booleans are more akin to on/off switches. default_head_pipe false simply means there's no default head pipe not that there is a tail one instead.

Comment on lines 126 to +134
protected EqlSearchResponse runRequest(EqlClient eqlClient, EqlSearchRequest request) throws IOException {
return eqlClient.search(request, RequestOptions.DEFAULT);
int timeout = Math.toIntExact(timeout().millis());

RequestConfig config = RequestConfig.copy(RequestConfig.DEFAULT)
.setConnectionRequestTimeout(timeout)
.setConnectTimeout(timeout)
.setSocketTimeout(timeout)
.build();
return eqlClient.search(request, RequestOptions.DEFAULT.toBuilder().setRequestConfig(config).build());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a timeout setting (which is not straight-forward) to the base test.

Comment on lines 271 to 277
expected_event_ids = [16, 60, 66,
54, 55, 59,
55, 59, 61,
59, 61, 65,
16, 60, 66,
60, 66, 71,
61, 65, 67,
65, 67, 70,
60, 66, 71]
65, 67, 70]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously sequences were added based on their insertion order which is essentially the order of the last events in a sequence.
This can create some unexpected results as you can see on the left: [16,60,66] occurs after [55,59,61] and [59,61,65]. That is [16,60] occur after [59,61] because it's the last event that counts.
I've changed this to be the order of the first event (not last one) in the sequence since this is what is seen first.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -56,15 +55,13 @@ void add(E element) {
Ordinal ordinal = extractor.apply(element);
if (start == null) {
start = ordinal;
} else if (stop == null) {
} else if (start.compareTo(ordinal) > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (start == null || start.compareTo(ordinal) > 0) {
   start = ordinal;
}
if (stop == null || stop.compareTo(ordinal) < 0) {
   stop = ordinal;
}

keepOnly(elements.peekLast());
}

private void keepOnly(E element) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method doesn't seem to be used anywhere else except in trimToLast. Is it really necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you notice the last update, it was merged into trimToLast

* Move the window while preserving the same base.
*/
private void tumbleWindow(int currentStage, ActionListener<Payload> listener) {
if (currentStage > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine these two conditions into one? if (currentStage > 0 && matcher.hasCandidates() == false) {

@costin costin merged commit e85d9d1 into elastic:master Nov 14, 2020
@costin costin deleted the fix/58646 branch November 14, 2020 10:25
@costin
Copy link
Member Author

costin commented Nov 14, 2020

Raised #65038 to settle on the parameter name.

costin added a commit to costin/elasticsearch that referenced this pull request Nov 14, 2020
…astic#64869)

Introduce option for specifying whether the results are returned from
the tail (end) of the stream or the head (beginning).
Improve sequencing algorithm by significantly eliminating the number
of in-flight sequences for spare datasets.
Refactor the sequence class by eliminating some of the redundant code.
Change matching behavior for tail sequences.
Return results based on their first entry ordinal instead of
insertion order (which was ordered on the last match ordinal).
Randomize results position inside test suite.

Close elastic#58646

(cherry picked from commit e85d9d1)
(cherry picked from commit 6a68246)
costin added a commit that referenced this pull request Nov 14, 2020
…4869) (#65039)

Introduce option for specifying whether the results are returned from
the tail (end) of the stream or the head (beginning).
Improve sequencing algorithm by significantly eliminating the number
of in-flight sequences for spare datasets.
Refactor the sequence class by eliminating some of the redundant code.
Change matching behavior for tail sequences.
Return results based on their first entry ordinal instead of
insertion order (which was ordered on the last match ordinal).
Randomize results position inside test suite.

Close #58646

(cherry picked from commit e85d9d1)
costin added a commit that referenced this pull request Nov 14, 2020
…4869) (#65040)

Introduce option for specifying whether the results are returned from
the tail (end) of the stream or the head (beginning).
Improve sequencing algorithm by significantly eliminating the number
of in-flight sequences for spare datasets.
Refactor the sequence class by eliminating some of the redundant code.
Change matching behavior for tail sequences.
Return results based on their first entry ordinal instead of
insertion order (which was ordered on the last match ordinal).
Randomize results position inside test suite.

Close #58646

(cherry picked from commit e85d9d1)
(cherry picked from commit 452c674)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/EQL EQL querying >enhancement Team:QL (Deprecated) Meta label for query languages team v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EQL: Most recent matches returned by default
5 participants