[Dataset quality] using msearch to speed up degradedDocs query #183023

@yngrdyn yngrdyn commented May 9, 2024

Relates to #179227.

After gathering some numbers on possible tweaks to the current degradedDocs query (more information), I decided to move forward and split the query to reduce the time Elasticsearch spends aggregating over data streams.

This PR contains the following changes:

  • An mSearch method was added to DatasetQualityESClient to allow the use of multi search.
  • degradedDocsRt now includes not only the number of degraded docs but also the total docs for the data streams within the selected time range.
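
As a rough illustration of the split described above, the sketch below builds the header/body pairs for a multi search with two lightweight queries: one counting documents where _ignored exists, and one counting all documents, both grouped per dataset and namespace. This is not the code from this PR. The function name, the composite aggregation shape, the page size, and the field names (`data_stream.dataset`, `data_stream.namespace`, `@timestamp`) are assumptions made for the example.

```typescript
// Hypothetical types: a multi search is a flat list of alternating
// header ({ index }) and body (the search itself) entries.
type SearchHeader = { index: string };
type SearchBody = Record<string, unknown>;

// Hypothetical helper, not part of DatasetQualityESClient.
function buildDegradedDocsSearches(
  index: string,
  start: number,
  end: number
): Array<SearchHeader | SearchBody> {
  // Assumed timestamp field for the selected time range.
  const timeRange = { range: { '@timestamp': { gte: start, lte: end } } };

  // Assumed grouping: one bucket per (dataset, namespace) pair.
  const datasets = {
    composite: {
      size: 1000, // assumed page size
      sources: [
        { dataset: { terms: { field: 'data_stream.dataset' } } },
        { namespace: { terms: { field: 'data_stream.namespace' } } },
      ],
    },
  };

  return [
    // degraded docs per dataset: only documents where _ignored exists
    { index },
    {
      size: 0,
      query: { bool: { filter: [timeRange, { exists: { field: '_ignored' } }] } },
      aggs: { datasets },
    },
    // total docs per dataset: same grouping, no _ignored filter
    { index },
    {
      size: 0,
      query: { bool: { filter: [timeRange] } },
      aggs: { datasets },
    },
  ];
}
```

Both searches share the same grouping, so their buckets can be matched by key afterwards; neither needs a nested aggregation.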

Nothing visible has changed in terms of functionality.

Screen.Recording.2024-05-09.at.12.33.53.mov

@yngrdyn yngrdyn requested a review from a team as a code owner May 9, 2024 10:32
@botelastic botelastic bot added the ci:project-deploy-observability Create an Observability project label May 9, 2024
@yngrdyn yngrdyn self-assigned this May 9, 2024
@apmmachine

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@yngrdyn yngrdyn added the release_note:skip Skip the PR/issue when compiling release notes label May 9, 2024
@yngrdyn yngrdyn force-pushed the 179227-dataset-quality-adjust-degraded-docs-query branch from 832eab2 to b81cf2e Compare May 9, 2024 10:37
},
// total docs per dataset
{
size: 0,

Do we actually need this query to get the total docs? Don't you automatically get the total doc count from bucket.doc_count?

My thought was that we just need to add one more prop and assign the value to it.

Screenshot 2024-05-09 at 12 45 49

@yngrdyn yngrdyn May 9, 2024

No, in the first query we only get the number of documents where _ignored is not null. A bucket in the first query will look like:

{
  "key": {
    "dataset": "apm.error",
    "namespace": "default"
  },
  "doc_count": 102
},

Notice that we are no longer doing the nested aggregation, which I suspect is the most expensive part. And yes, we do need the total documents in the time range to compute the ratio (percentages).
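
To make the ratio computation concrete, here is a minimal sketch of how buckets from the two queries could be joined by dataset and namespace. It is an illustration only, not the implementation in this PR: the function name and the output fields (`count`, `totalDocs`, `percentage`) are hypothetical, and the bucket shape is taken from the example bucket above.

```typescript
// Bucket shape as shown in the example above.
interface Bucket {
  key: { dataset: string; namespace: string };
  doc_count: number;
}

// Hypothetical output shape; field names are assumptions.
interface DegradedDocsStat {
  dataset: string;
  namespace: string;
  count: number; // docs where _ignored is not null
  totalDocs: number; // all docs in the time range
  percentage: number; // count / totalDocs * 100
}

// Hypothetical helper: joins degraded-doc buckets with total-doc buckets.
function mergeDegradedAndTotal(degraded: Bucket[], totals: Bucket[]): DegradedDocsStat[] {
  const keyOf = (b: Bucket): string => `${b.key.dataset}:${b.key.namespace}`;

  // Index degraded counts by "dataset:namespace" for constant-time lookup.
  const degradedByKey = new Map<string, number>();
  for (const b of degraded) {
    degradedByKey.set(keyOf(b), b.doc_count);
  }

  // Every dataset has a total; datasets without degraded docs get count 0.
  return totals.map((b) => {
    const count = degradedByKey.get(keyOf(b)) ?? 0;
    return {
      dataset: b.key.dataset,
      namespace: b.key.namespace,
      count,
      totalDocs: b.doc_count,
      percentage: b.doc_count > 0 ? (count / b.doc_count) * 100 : 0,
    };
  });
}
```

Iterating over the totals rather than the degraded buckets means datasets with no degraded documents still appear, with a percentage of 0.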

@mohamedhamed-ahmed mohamedhamed-ahmed left a comment

LGTM! Thanks for the quick change 🚀

@kibana-ci
kibana-ci commented May 9, 2024

💚 Build Succeeded

Metrics [docs]

Canvas Sharable Runtime

The Canvas "shareable runtime" is a bundle produced to enable running Canvas workpads outside of Kibana. This bundle is included in third-party webpages that embed Canvas and therefore should be as slim as possible.

| id | before | after | diff |
| --- | --- | --- | --- |
| module count | - | 5407 | +5407 |
| total size | - | 8.8MB | +8.8MB |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb.

| id | before | after | diff |
| --- | --- | --- | --- |
| datasetQuality | 36.2KB | 36.3KB | +19.0B |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @yngrdyn

@yngrdyn yngrdyn merged commit f82d640 into elastic:main May 9, 2024
18 checks passed
@kibanamachine kibanamachine added v8.15.0 backport:skip This commit does not require backporting labels May 9, 2024
Labels
backport:skip This commit does not require backporting ci:project-deploy-observability Create an Observability project release_note:skip Skip the PR/issue when compiling release notes v8.15.0
5 participants