Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First refactor of the system module - system/cpu and system/core #25771

Merged
merged 37 commits into from
Jun 14, 2021

Conversation

fearful-symmetry
Copy link
Contributor

@fearful-symmetry fearful-symmetry commented May 18, 2021

What does this PR do?

This is part of a long-discussed refactor of the system module focused on preparing the system module for fleet GA. This is just system/cpu and system/core, and by starting off the refactor with just two metricsets, I'm hoping to get feedback on the new structure of the system integration, as the design of a separate metrics library, with different implementation around wrapper functions, will be a template for the refactor going forward. This design is based on the go standard library, which takes a similar approach to implementing OS-specific APIs. see os/file*.go for an example of this.

As this change is entirely backwards compatible, we don't need any special logic to determine if we're running on fleet or metricbeat. That will probably become an issue in future PRs, where we'll want wrappers to prevent us from changing field names or metricset locations in metricbeat.

Right now, the only part of gosigar that remains is sys/windows, which I'd like feedback from @narph on. Most of this library is autogenerated by go generate, so I'm not sure it makes sense to "break it up" by metricset, as it just makes more work if we have to update it in the future. Do we want to to move this library into libbeat or metricbeat/module/system, or should we spit apart and re-generate the code based on what API calls are used by what metricset?

I'm also fairly ambivalent about the public API here. The design of Percentages(*common.MapStr) is mostly a matter of convenience for the conditional logic that fills in the event MapStr, and returning separate MapStr objects would require us to just do a deep merge later. Still, I'm open to other ideas, if someone has them.

EDIT: I'm still thinking of "less ugly" ways of dealing with OS-specific data, as opposed to just returning an opaque MapStr. I wonder if we can add OS fields to the tag structs, and then implement a MapStr unmarshaller that filters the values based on the OS. For example:

type cpu struct {
...
    nice uint64 `platform:"linux,darwin",field:"nice.pct"`
}

Why is it important?

This refactor has a few goals in mind:

  • Code should live where it's used. Code from libbeat/metric/ and gosigar that was unused outside outside of the system module has been moved into the new cpu/metrics library.
  • APIs should return only data that is appropriate for the OS. In the past we've had issues with "generic" APIs that return structs, which pass on fields that aren't appropriate for all OSes. This isn't Rust, we don't have an Option<T> type we can put on struct fields or anything, so the next best thing are hashmaps, which aren't perfect. We've worked around this in the past by developing a mess of if runtime.GOOS == ... logic, which isn't particularly pretty.
  • Centralize OS-specific logic. We currently have OS-specific knowledge and data manipulation spread across various libraries, which results in a lot of duplicated code logic. With this refactor, metrics_PLATFORM.go is the only component that needs to worry about what is compatible with what, and logic that cares about OS-specific metrics remains in one place.
  • Deprecate gosigar. Large portions of this library are exclusively used by the system module., and most other beats components use go-sysinfo instead.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Get feedback on gosigar/sys/windows
  • Decide if I want to move metrics up the library, as it's used by two metricsets.

How to test this PR locally

  • Pull down PR
  • On your host, build metricbeat, and run the system/cpu and system/core metricsets to make sure we're generating data correctly.
  • To test on other OSes, use the vagrant boxes in the root beats repo. The freebsd and openbsd boxes have been updated, and should work without much further tinkering.

@fearful-symmetry fearful-symmetry requested review from narph and a team May 18, 2021 19:07
@fearful-symmetry fearful-symmetry self-assigned this May 18, 2021
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label May 18, 2021
@fearful-symmetry fearful-symmetry added the Team:Integrations Label for the Integrations team label May 18, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 18, 2021
@fearful-symmetry fearful-symmetry added the Metricbeat Metricbeat label May 18, 2021
@elasticmachine
Copy link
Collaborator

elasticmachine commented May 18, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: fearful-symmetry commented: /test

  • Start Time: 2021-06-14T21:53:26.437+0000

  • Duration: 62 min 15 sec

  • Commit: 4f780b1

Test stats 🧪

Test Results
Failed 0
Passed 932
Skipped 187
Total 1119

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 932
Skipped 187
Total 1119

@fearful-symmetry
Copy link
Contributor Author

@urso / @jsoriano fiddled with the opt names again, and also removed all but the bare essentials from the package. I figured it'll be better to add methods as we need them.

@fearful-symmetry
Copy link
Contributor Author

/test

1 similar comment
@fearful-symmetry
Copy link
Contributor Author

/test

if err != nil {
errs = append(errs, err)
}
cpuData.User = metrics.OptUintWith(user)
Copy link

@urso urso Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never return early. As we have an OptUint, do we really want to report 0, or shall the optional be unset?

Not really required to change this here, but ways to reduce repetition:

tryParseUint := func(name, field string) (v metrics.OptUint) {
  u, err := touint(field)
  if err != nil {
    errs = append(errs, fmt.Errorf("failed to parse %v: %w", name, field))
  } else {
    v = metrics.OptUintWith(u)
  }
  return v
}

and use it like:

cpuData.User = tryParseUint("user", fields[1])
cpuData.Nice = tryparseUint("nice", fields[2])

Now the error message tells you which field failed to parse, plus we don't have to live with all those if-else blocks.
The tryParseUint not return an uninitialized Opt value if parsing failed. The curent code just returns 0. Not sure which one would fit better here. But given we have an optional type I would assume we don't use 0.

For 'bigger' use cases one might want to turn the closure into a custom type.

In case you want to allow custom parser functions (hex vs. decimal representation) the helper could be written like this:

tryParse := func(u uint, err error) (v metrics.OptUint) {
  if err != nil {
    errs = append(errs, error)
  } else {
    v = metrics.OptUintWith(u)
  }
  return v
}

This function will be used like this (if number of returned values matches number of parameters we can just chain them):

cpuData.User = tryParse(touint(fields[1])
cpuData.Nice = tryPaese(touint(fields[2])

Unfortunately with that approach we loose the nice error messages.

}

// Value returns the value, and the bool indicating if it exists
func (opt OptUint) Value() (uint64, bool) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind the name Value vs. Get. But given that we already have IsZero, I would not return the second bool.

// ValueOrZero returns the stored value, or zero
// Please do not use this for populating reported data,
// as we actually want to avoid sending zeros where values are functionally null
func (opt OptUint) ValueOrZero() uint64 {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if we can configure the value to return if opt is not set. The 0 value is the neutral element for sumation, but not multiplication. Instead of having a ValueOrOne, try have a small API and give me ValueOr(0) or ValueOr(1).

@fearful-symmetry
Copy link
Contributor Author

Thanks for the suggestion @urso ! ended up going with your idea for the ParseUint thing, and I just removed the Value method, since nothing is actually using it.

@fearful-symmetry
Copy link
Contributor Author

Oooops. That's what I get for fixing things at 9PM. Fixed again.

@fearful-symmetry
Copy link
Contributor Author

/test

@fearful-symmetry fearful-symmetry merged commit 2871d29 into elastic:master Jun 14, 2021
michalpristas pushed a commit to michalpristas/beats that referenced this pull request Jun 17, 2021
…stic#25771)

* initial commit

* finux linux refactor

* fix up freebsd

* port main metricset, start openbsd

* start work on openbsd vagrantfile

* refactors of API, add darwin support

* fix darwin implementation

* refactor API, move tests, remove old code, take a crack at AIX

* fix aix init func

* fix tests

* regenerate core data.json

* small fixes, fix host field

* update tests

* run correct mage commands

* try to fix system tests

* more fixes for windows python tests

* refactor CPU struct, use reflection

* refactor reflection code, add validation

* move metrics to its own internal folder

* move directories, again

* move directories, again

* use optional Uint type

* cleanup opt files

* move around naming of opt types

* fix up if block

* change opt names

* move around opt methods, cpu stat reader refactor

* fix IsZero usage

* add changelog
fearful-symmetry added a commit to fearful-symmetry/beats that referenced this pull request Jun 17, 2021
…stic#25771)

* initial commit

* finux linux refactor

* fix up freebsd

* port main metricset, start openbsd

* start work on openbsd vagrantfile

* refactors of API, add darwin support

* fix darwin implementation

* refactor API, move tests, remove old code, take a crack at AIX

* fix aix init func

* fix tests

* regenerate core data.json

* small fixes, fix host field

* update tests

* run correct mage commands

* try to fix system tests

* more fixes for windows python tests

* refactor CPU struct, use reflection

* refactor reflection code, add validation

* move metrics to its own internal folder

* move directories, again

* move directories, again

* use optional Uint type

* cleanup opt files

* move around naming of opt types

* fix up if block

* change opt names

* move around opt methods, cpu stat reader refactor

* fix IsZero usage

* add changelog

(cherry picked from commit 2871d29)
mdelapenya added a commit to mdelapenya/beats that referenced this pull request Jun 21, 2021
* master: (25 commits)
  Fix UBI source URL (elastic#26384)
  Skip test_rotating_file in osx and windows (elastic#26379)
  Remove outdated k8s manifests for managed elastic-agent (elastic#26368)
  Enable agent to send custom headers to kibana/ES (elastic#26275)
  [Automation] Update elastic stack version to 8.0.0-943ef2c0 for testing (elastic#26354)
  Make the Syslog input GA (elastic#26293)
  Move Kerberos FAST config flag to shared kerberos config (elastic#26141)
  Add k8s cluster identifiers (elastic#26056)
  Store message from MongoDB json logs in message field (elastic#26338)
  update threatintel ECS version (elastic#26274)
  update envoyproxy ECS version (elastic#26277)
  [Filebeat] [MongoDB] Support MongoDB 4.4 json logs (elastic#24774)
  Update go-structform to 0.0.9 (elastic#26251)
  Forward port 7.13.2 changelog to master (elastic#26323)
  Updated filter expression for filtering 86 artifacts (elastic#26313)
  Osquerybeat: Align with the rest of the beats, set the ECS version (elastic#26324)
  [Packetbeat] Add `url.extension` to Packetbeat HTTP events (elastic#25999)
  Change link to snapshots in README (elastic#26317)
  Don't include full ES index template in errors (elastic#25743)
  First refactor of the system module - system/cpu and system/core (elastic#25771)
  ...
fearful-symmetry added a commit that referenced this pull request Jun 21, 2021
) (#26359)

* initial commit

* finux linux refactor

* fix up freebsd

* port main metricset, start openbsd

* start work on openbsd vagrantfile

* refactors of API, add darwin support

* fix darwin implementation

* refactor API, move tests, remove old code, take a crack at AIX

* fix aix init func

* fix tests

* regenerate core data.json

* small fixes, fix host field

* update tests

* run correct mage commands

* try to fix system tests

* more fixes for windows python tests

* refactor CPU struct, use reflection

* refactor reflection code, add validation

* move metrics to its own internal folder

* move directories, again

* move directories, again

* use optional Uint type

* cleanup opt files

* move around naming of opt types

* fix up if block

* change opt names

* move around opt methods, cpu stat reader refactor

* fix IsZero usage

* add changelog

(cherry picked from commit 2871d29)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Metricbeat Metricbeat Team:Integrations Label for the Integrations team v7.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants