Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[System] Update cpu, load, memory dataset with dimensions #5160

Merged
merged 5 commits into from Feb 2, 2023

Conversation

ruflin
Copy link
Member

@ruflin ruflin commented Feb 1, 2023

We are in the progress of starting to adopt TSDB. To enabled TSDB, dimension fields are required. In this PR, dimensions are added to the following dataset:

  • system.cpu
  • system.load
  • system.memory

The dimensions that were selected are:

  • agent.id
  • container.id
  • host.name
  • cloud.instance.id
  • cloud.project.id
  • cloud.provider

These 4 dimensions should set a good foundation for the dataset selected above. It should make it work well in host, cloud and container environments. Not all dimensions always have to exist. We still need to define a more generic set of dimensions for integrations.

The goal of this PR is to enable us testing TSDB on common dataset. It also allows Kibana developers to have dataset ready for testing.

As long as TSDB is not enabled and on older versions of the stack, this change will not have any effect.

We are in the progress of starting to adopt TSDB. To enabled TSDB, dimension fields are required. In this PR, dimensions are added to the following dataset:

* system.cpu
* system.load
* system.memory

The dimensions that were selected are:

* agent.id
* container.id
* host.name
* cloud.instance.id

These 4 dimensions should set a good foundation for the dataset selected above. It should make it work well in host, cloud and container environments. Not all dimensions always have to exist. We still need to define a more generic set of dimensions for integrations.

The goal of this PR is to enable us testing TSDB on common dataset. It also allows Kibana developers to have dataset ready for testing.

As long as TSDB is not enabled and on older versions of the stack, this change will not have any effect.
@ruflin ruflin requested a review from a team as a code owner February 1, 2023 09:26
@ruflin ruflin requested review from rdner and cmacknz February 1, 2023 09:26
@@ -196,9 +199,11 @@
description: >
OS codename, if any.

- name: cpu.pct
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field also exists in fields.yml, seems it was a duplicate.

@elasticmachine
Copy link

elasticmachine commented Feb 1, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-02-02T07:45:31.843+0000

  • Duration: 16 min 44 sec

Test stats 🧪

Test Results
Failed 0
Passed 194
Skipped 0
Total 194

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link

elasticmachine commented Feb 1, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (3/3) 💚
Files 100.0% (4/4) 💚
Classes 100.0% (4/4) 💚
Methods 60.759% (48/79) 👍 27.426
Lines 99.533% (2771/2784) 👎 -0.467
Conditionals 100.0% (0/0) 💚

@ruflin ruflin self-assigned this Feb 1, 2023
@agithomas
Copy link
Contributor

When we add a field as a dimension would there be performance benefits while querying the data ?

If yes, we may need to rethink what must go as a dimension fields.

using instance name instead of instance id, container name instead of container ID may be useful from a performance standpoint.

@martijnvg
Copy link
Member

When we add a field as a dimension would there be performance benefits while querying the data ?

I don't think there is a direct query performance benefits from less or more dimensions. It is important that it uniquely defines a time serie. Internally we use dimensions to create a tsid and the tsid is used for routing and index sorting. So it does have a direct impact on how well the data gets stored (similar metrics that get routed to the same shard and sorted next together in that shard compress better, which means lower disk space usage, less memory usage, which then can positively impact query performance).

@agithomas
Copy link
Contributor

agithomas commented Feb 1, 2023

Wouldn't it be safe to additionally include - cloud.project.id, cloud.provider as dimension fields ?

Not all apps will be container based. There could be monolithic apps that run on VMs. Host names can remain same in a multi-regional setup. In such scenario, including cloud.project_id may be needed.

@ruflin
Copy link
Member Author

ruflin commented Feb 1, 2023

Good idea, I added the two additional fields.

Copy link
Contributor

@agithomas agithomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@leehinman leehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. nit on the version number.

One side question, many of these fields are ECS fields and we could/should be using the external definition so all of the descriptions are kept up to date automatically. Any idea how adding the demension key works with that?

packages/system/changelog.yml Outdated Show resolved Hide resolved
ruflin and others added 2 commits February 2, 2023 08:43
Co-authored-by: Lee E Hinman <57081003+leehinman@users.noreply.github.com>
@ruflin
Copy link
Member Author

ruflin commented Feb 2, 2023

One side question, many of these fields are ECS fields and we could/should be using the external definition so all of the descriptions are kept up to date automatically. Any idea how adding the demension key works with that?

I had quickly tried that and it didn't work. Somehow my dimension setting was overwritten. But I think it is the path we need to investigate I just didn't want to block this change here on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants