-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Include out-of-order as well as in-order terms in reverse search #950
[ML] Include out-of-order as well as in-order terms in reverse search #950
Conversation
When categories include messages with different token orders, we were not including the tokens whose order varied between the messages in the terms of the reverse search. This change adds these out-of-order tokens that are part of the category definition back into the reverse search terms. Fixes the easy part of elastic#949
// reverse search creation are tested more thoroughly in the unit tests for | ||
// their respective classes, but this test helps to confirm that they work | ||
// together) | ||
BOOST_TEST_REQUIRE(output.find("\"terms\":\"Killing job\"") != std::string::npos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix of this PR makes terms
here Killing job
where before this PR it would have been empty.
// their respective classes, but this test helps to confirm that they work | ||
// together) | ||
BOOST_TEST_REQUIRE(output.find("\"terms\":\"Killing job\"") != std::string::npos); | ||
BOOST_TEST_REQUIRE(output.find("\"regex\":\".*\"") != std::string::npos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex is still poor, .*
, after this PR. A followup that will not be backported as far will fix this.
See #949 (comment) for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somebody more familiar with C++ might want to review as well. The proof is in the pudding (the tests in this case).
I wonder how this will effect existing ML Jobs? Do we adjust the terms when we expand categories?
Yes, we will if some aspect of the category changes as a result of assigning a new message to it. But it will only make the category definitions stored in the results index better, so I don't think it's a problem. The category definition stored in the opaque categorizer state that the C++ uses is not changed at all by this fix. The problem being fixed here is that the way that C++ internal state was reported in a more friendly format was losing some information in this edge case. The other part of the fix to #949 will involve changing the C++ internal state, and that is why it is too difficult/dangerous to include in a 6.8 patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a minor suggestion on variable naming.
size_t end(category.outOfOrderCommonTokenIndex()); | ||
for (size_t index = 0; index < end; ++index) { | ||
size_t tokenId(baseTokenIds[index].first); | ||
bool first{true}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this name. Maybe firstInOrderToken
would be clearer?
…earch When categories include messages with different token orders, we were not including the tokens whose order varied between the messages in the terms of the reverse search. This change adds these out-of-order tokens that are part of the category definition back into the reverse search terms. Backport of elastic#950
…earch When categories include messages with different token orders, we were not including the tokens whose order varied between the messages in the terms of the reverse search. This change adds these out-of-order tokens that are part of the category definition back into the reverse search terms. Backport of elastic#950
…earch When categories include messages with different token orders, we were not including the tokens whose order varied between the messages in the terms of the reverse search. This change adds these out-of-order tokens that are part of the category definition back into the reverse search terms. Backport of elastic#950
…earch (#954) When categories include messages with different token orders, we were not including the tokens whose order varied between the messages in the terms of the reverse search. This change adds these out-of-order tokens that are part of the category definition back into the reverse search terms. Backport of #950
…earch (#956) When categories include messages with different token orders, we were not including the tokens whose order varied between the messages in the terms of the reverse search. This change adds these out-of-order tokens that are part of the category definition back into the reverse search terms. Backport of #950
…earch (#955) When categories include messages with different token orders, we were not including the tokens whose order varied between the messages in the terms of the reverse search. This change adds these out-of-order tokens that are part of the category definition back into the reverse search terms. Backport of #950
When categories include messages with different token orders,
we were not including the tokens whose order varied between
the messages in the terms of the reverse search. This change
adds these out-of-order tokens that are part of the category
definition back into the reverse search terms.
Fixes the easy part of #949