[Lens] Field stats endpoint does not need to use sampler aggregation #74595

Closed
wylieconlon opened this issue Aug 6, 2020 · 5 comments
Labels
  • Feature:Lens
  • Feature:UnifiedFieldList (The unified field list component used by Lens & Discover)
  • Team:Visualizations (Visualization editors, elastic-charts and infrastructure)
  • technical debt (Improvement of the software architecture and operational architecture)

Comments

@wylieconlon
Contributor

Originally, we thought that the sampler aggregation would behave like a random sampling query, with improved performance across large datasets. This is not what the sampler aggregation actually does, which means that we are doing more work instead of less. This aggregation can be removed entirely.
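
For context, Elasticsearch's sampler aggregation restricts its sub-aggregations to the top-scoring documents on each shard (up to `shard_size`); it does not pick documents at random. The sketch below shows roughly what removing it looks like at the request-body level. The helper names, the `terms` sub-aggregation, the field name, and the `shard_size` value are all hypothetical and are not taken from the Lens code.

```typescript
// Illustrative only: helper names, field name, and sizes are assumptions,
// not the actual Lens field stats implementation.
type Body = Record<string, unknown>;

// Inner aggregation that collects the top values of one field.
function topValuesAgg(field: string, size = 10): Body {
  return { top_values: { terms: { field, size } } };
}

// Request body that wraps the inner aggregation in a sampler.
// The sampler keeps the top-scoring documents per shard, not a random subset.
function withSampler(field: string, shardSize = 5000): Body {
  return {
    size: 0,
    aggs: {
      sample: {
        sampler: { shard_size: shardSize },
        aggs: topValuesAgg(field),
      },
    },
  };
}

// Same request with the sampler layer removed.
function withoutSampler(field: string): Body {
  return { size: 0, aggs: topValuesAgg(field) };
}

console.log(JSON.stringify(withSampler("machine.os.keyword"), null, 2));
console.log(JSON.stringify(withoutSampler("machine.os.keyword"), null, 2));
```

The only structural difference is the extra `sample` level, so code that reads the response would also need to stop unwrapping that level.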

@wylieconlon added the technical debt, Team:Visualizations, and Feature:Lens labels on Aug 6, 2020
@wylieconlon added this to Long-term goals in Lens via automation on Aug 6, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@wylieconlon moved this from Long-term goals to Tech Debt in Lens on Aug 6, 2020
@flash1293
Contributor

@wylieconlon Are you sure that dropping the sampler is better than keeping it for the cases where it really matters (very large data sets)?

From the documentation you linked:

Example use cases

  • Reducing the running cost of aggregations that can produce useful results using only samples e.g. significant_terms

Probably missing a nuance here.

@wylieconlon
Contributor Author

@flash1293 Because we don't use any of those aggregations when calculating the samples, that part is not relevant. The other example use case is potentially relevant, with caveats:

Tightening the focus of analytics to high-relevance matches rather than the potentially very long tail of low-quality matches

This part actually might be relevant if the user has added a rank-affecting query to the Lens editor before clicking on the preview. Exact match queries wouldn't have any effect, but a query with OR would affect the rank, as would a wildcard query.

Because only such a narrow subset of queries affects the results, I would say that the sampler is not useful.
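
As a rough illustration of the point about rank, here is a toy model of a single shard. It is not how Elasticsearch implements the sampler; the `ScoredDoc` type, the `sampleTopScoring` helper, and the scores are made up for the example. When every document has the same score, as with an exact-match filter, "top-scoring" just means the first documents encountered; only when scores differ does the sampler change which documents are kept.

```typescript
// Toy model only: types, helper, and scores are hypothetical.
interface ScoredDoc {
  id: number;
  score: number;
}

// Keep the shardSize highest-scoring documents.
// Array.prototype.sort is stable, so ties keep their input order.
function sampleTopScoring(docs: ScoredDoc[], shardSize: number): ScoredDoc[] {
  return [...docs].sort((a, b) => b.score - a.score).slice(0, shardSize);
}

const ids = (docs: ScoredDoc[]): number[] => docs.map((d) => d.id);

// Exact-match style query: every matching document has the same score.
const constantScore: ScoredDoc[] = [1, 2, 3, 4, 5].map((id) => ({ id, score: 1 }));

// Rank-affecting query (for example OR or wildcard): scores differ.
const variedScore: ScoredDoc[] = [
  { id: 1, score: 0.2 },
  { id: 2, score: 1.7 },
  { id: 3, score: 0.9 },
  { id: 4, score: 2.4 },
  { id: 5, score: 0.1 },
];

console.log(ids(sampleTopScoring(constantScore, 2))); // [1, 2]
console.log(ids(sampleTopScoring(variedScore, 2))); // [4, 2]
```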

@flash1293
Contributor

Ah, I forgot we aren't using aggregations for gathering the stats - makes total sense in that case, thanks.

@stratoula
Contributor

We have changed the implementation, so this is no longer valid.


5 participants