[ML] Frequent items extension pruning #92322

Merged

Conversation

hendrikmuhs
Contributor

@hendrikmuhs hendrikmuhs commented Dec 13, 2022

Machine-generated data in particular often contains items that always appear together, e.g. a URI and a path that is the same URI without the protocol. Such a case is called an "extension". This PR implements so-called extension pruning: it avoids iterating over those extensions to build every subset combination of a superset, all of which have the same item counts.
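To illustrate the idea, here is a minimal Python sketch. It is not the Elasticsearch implementation: the function names, the `field=value` item encoding, and the brute-force enumeration are all assumptions made for this example. Items that occur in exactly the same transactions are collapsed into one group, so the miner never enumerates the subsets of such a group separately.

```python
from collections import defaultdict
from itertools import combinations


def group_cooccurring_items(transactions):
    """Group items that occur in exactly the same transactions.

    Hypothetical helper for illustration only. Returns a list of
    (items, transaction_ids) pairs.
    """
    occurrences = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            occurrences[item].add(tid)

    groups = defaultdict(list)
    for item, tids in occurrences.items():
        # Items with identical transaction-id sets always appear together.
        groups[frozenset(tids)].append(item)
    return [(tuple(sorted(items)), tids) for tids, items in groups.items()]


def frequent_item_sets(transactions, min_count):
    """Brute-force frequent item sets over groups instead of single items.

    Because each group is treated as one unit, subsets of a group (which
    would all carry the same count) are never generated.
    """
    groups = [
        group for group in group_cooccurring_items(transactions)
        if len(group[1]) >= min_count
    ]
    results = {}
    for size in range(1, len(groups) + 1):
        for combo in combinations(groups, size):
            # Transactions that contain every group in this combination.
            tids = frozenset.intersection(*(group[1] for group in combo))
            if len(tids) >= min_count:
                items = frozenset(i for group in combo for i in group[0])
                results[items] = len(tids)
    return results
```

With three frequent items of which two always co-occur, a naive enumeration would report up to seven item sets; this sketch reports three, because the co-occurring pair only ever appears as a whole.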

This fixes problems with datasets based on APM and logging. For those, the benchmarked query execution time drops from 19434198 ms to 103321 ms (a factor of 188).

For more "natural" data sets, like the ones used for rally-tracks, this PR has no effect because pruning doesn't kick in there.

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Dec 13, 2022
@elasticsearchmachine
Collaborator

Hi @hendrikmuhs, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@hendrikmuhs hendrikmuhs force-pushed the frequent-items-extension-pruning-3 branch from 959a1b2 to cfd5f10 Compare December 14, 2022 13:14
Contributor

@droberts195 droberts195 left a comment


LGTM

@hendrikmuhs hendrikmuhs merged commit 7341cfb into elastic:main Dec 14, 2022
Labels
>enhancement :ml Machine learning Team:ML Meta label for the ML team v8.7.0

3 participants