[ML] add new normalize_above parameter to p_value significant terms heuristic #78833

Conversation

benwtrent
Member

This commit adds the new normalize_above parameter to the p_value significant
terms heuristic.

This parameter allows for consistent significance results at various scales. When a total count (in the set or in the background set) is above the normalize_above parameter, both that total and the count of documents in it that include the term are scaled by normalize_above/count, where count is the size of the total set in question.
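
The scaling described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the code from this PR: the function name `normalize_counts` and its argument names are hypothetical, and it assumes that count is the total set size and that a non-positive `normalize_above` disables scaling. Any integer rounding the real heuristic may apply is not modeled.

```python
def normalize_counts(term_count, total_count, normalize_above):
    """Scale a (term_count, total_count) pair when total_count exceeds normalize_above.

    Hypothetical helper: both counts are multiplied by normalize_above / total_count,
    so the term's share of the set is preserved while the total is capped.
    """
    # Assumption: a non-positive threshold means "do not normalize".
    if normalize_above <= 0 or total_count <= normalize_above:
        return float(term_count), float(total_count)

    factor = normalize_above / total_count
    return term_count * factor, total_count * factor


# The same proportion (0.5%) at two very different scales.
small = normalize_counts(term_count=5, total_count=1_000, normalize_above=1_000)
large = normalize_counts(term_count=5_000, total_count=1_000_000, normalize_above=1_000)
print(small, large)
```

Applied separately to the set and to the background set, this makes the same proportions yield the same inputs to the significance test regardless of how many documents were counted, which is the consistency across scales that the description refers to.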

@elasticmachine elasticmachine added the Team:ML (Meta label for the ML team) label Oct 7, 2021
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@benwtrent benwtrent force-pushed the feature/ml-p_value-add-new-normalizing-param branch from 103926f to ef26aa9 Compare October 7, 2021 14:27
Contributor

@droberts195 droberts195 left a comment

LGTM apart from a couple of nits

@benwtrent benwtrent merged commit 843fa42 into elastic:master Oct 12, 2021
@benwtrent benwtrent deleted the feature/ml-p_value-add-new-normalizing-param branch October 12, 2021 14:38
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Oct 12, 2021
…euristic (elastic#78833)

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Oct 12, 2021
benwtrent added a commit that referenced this pull request Oct 12, 2021
…euristic (#78833) (#78999)

Labels
>enhancement :ml Machine learning Team:ML Meta label for the ML team v7.16.0 v8.0.0-beta1