Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Optimize locking for SNMP MIBs loading. #10206

Merged
merged 7 commits into from
Dec 7, 2021

Conversation

srebhan
Copy link
Member

@srebhan srebhan commented Dec 2, 2021

This PR optimizes locking for the SNMP MIBs loading. This is necessary as the current loading of those files is completely serial. With large setups with dozends or hundreds of SNMP plugins this might take long...

@Hipska will provide measurements... :-)

@telegraf-tiger telegraf-tiger bot added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Dec 2, 2021
@Hipska Hipska added area/snmp plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels Dec 2, 2021
@Hipska Hipska self-assigned this Dec 2, 2021
@Hipska
Copy link
Contributor

Hipska commented Dec 2, 2021

My setup is using around 1700 snmp input instances.

Current release (1.20.4)

2021-12-02T14:12:40Z I! Starting Telegraf 1.20.4
2021-12-02T14:12:41Z D! [agent] Initializing plugins
2021-12-02T14:12:41Z D! [agent] Connecting outputs
...

Note: This version still used snmptranslate and snmptable on the fly when the input is started. (this takes around 10s on a fast system)

Current master (1.21.0 RC0)

2021-12-02T14:19:43Z I! Starting Telegraf 1.21.0
2021-12-02T14:19:44Z D! [agent] Initializing plugins

After this, I waited for 30 minutes until I killed it. CPU was high and memory usage grew steadily to 14GiB during that time.

Artefact 0e707eb

2021-12-02T15:05:14Z I! Starting Telegraf 1.21.0-0e707ebc
2021-12-02T15:05:16Z D! [agent] Initializing plugins

Similar behaviour as RC0, stopped after 23 minutes and 10GiB of memory usage.

@srebhan
Copy link
Member Author

srebhan commented Dec 2, 2021

With the added benchmark, i.e. loading the testdata directory multiple times I see the following

Current master

goos: linux
goarch: amd64
pkg: github.com/influxdata/telegraf/plugins/inputs/snmp
cpu: AMD Ryzen 7 3700X 8-Core Processor             
BenchmarkMibLoading
BenchmarkMibLoading-16                 3         438131129 ns/op
PASS

Optimized locking

goos: linux
goarch: amd64
pkg: github.com/influxdata/telegraf/plugins/inputs/snmp
cpu: AMD Ryzen 7 3700X 8-Core Processor             
BenchmarkMibLoading
BenchmarkMibLoading-16                 3         452756498 ns/op
PASS

Caching

goos: linux
goarch: amd64
pkg: github.com/influxdata/telegraf/plugins/inputs/snmp
cpu: AMD Ryzen 7 3700X 8-Core Processor             
BenchmarkMibLoading
BenchmarkMibLoading-16           4237710               288.9 ns/op
PASS

@reimda
Copy link
Contributor

reimda commented Dec 2, 2021

@Hipska Do you see acceptable performance with 336b0ce?

@Hipska
Copy link
Contributor

Hipska commented Dec 3, 2021

Artefact 336b0ce

2021-12-03T08:35:15Z I! Starting Telegraf 1.21.0-336b0ced
2021-12-03T08:35:16Z D! [agent] Initializing plugins
2021-12-03T08:35:21Z D! [agent] Connecting outputs
...

Whoa, this is what it should have been!

Copy link
Contributor

@Hipska Hipska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you willing to change the log messages a little bit while we are here?

internal/snmp/translate.go Outdated Show resolved Hide resolved
internal/snmp/translate.go Outdated Show resolved Hide resolved
internal/snmp/translate.go Outdated Show resolved Hide resolved
srebhan and others added 3 commits December 3, 2021 11:06
Co-authored-by: Thomas Casteleyn <thomas.casteleyn@me.com>
Co-authored-by: Thomas Casteleyn <thomas.casteleyn@me.com>
Co-authored-by: Thomas Casteleyn <thomas.casteleyn@me.com>
@Hipska Hipska added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Dec 3, 2021
@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Dec 3, 2021

Copy link
Contributor

@MyaLongmire MyaLongmire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this Sven

@srebhan
Copy link
Member Author

srebhan commented Dec 3, 2021

@MyaLongmire thanks for making this possible in the first place. That PR was fun. :-)

@reimda reimda merged commit 2d420fb into influxdata:master Dec 7, 2021
MyaLongmire pushed a commit that referenced this pull request Dec 8, 2021
nward added a commit to nward/telegraf that referenced this pull request Jan 12, 2022
powersj pushed a commit to powersj/telegraf that referenced this pull request Jan 21, 2022
phemmer added a commit to phemmer/telegraf that referenced this pull request Feb 18, 2022
* origin/master: (133 commits)
  chore: restart service if it is already running and upgraded via RPM (influxdata#9970)
  feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10237)
  fix: Handle duplicate registration of protocol-buffer files gracefully. (influxdata#10188)
  fix(http_listener_v2): fix panic on close (influxdata#10132)
  feat: add Vault input plugin (influxdata#10198)
  feat: support aws managed service for prometheus (influxdata#10202)
  fix: Make telegraf compile on Windows with golang 1.16.2 (influxdata#10246)
  Update changelog
  feat: Modbus add per-request tags (influxdata#10231)
  fix: Implement NaN and inf handling for elasticsearch output (influxdata#10196)
  feat: add nomad input plugin (influxdata#10106)
  fix: Print loaded plugins and deprecations for once and test (influxdata#10205)
  fix: eliminate MIB dependency for ifname processor (influxdata#10214)
  feat: Optimize locking for SNMP MIBs loading. (influxdata#10206)
  feat: Add SMART plugin concurrency configuration option, nvme-cli v1.14+ support and lint fixes. (influxdata#10150)
  feat: update configs (influxdata#10236)
  fix(inputs/kube_inventory): set TLS server name config properly (influxdata#9975)
  fix: Sudden close of Telegraf caused by OPC UA input plugin (influxdata#10230)
  fix: bump github.com/eclipse/paho.mqtt.golang from 1.3.0 to 1.3.5 (influxdata#9913)
  fix: json_v2 parser timestamp setting (influxdata#10221)
  fix: ensure graylog spec fields not prefixed with '_' (influxdata#10209)
  docs: remove duplicate links in CONTRIBUTING.md (influxdata#10218)
  fix: pool detection and metrics gathering for ZFS >= 2.1.x (influxdata#10099)
  fix: parallelism fix for ifname processor (influxdata#10007)
  chore: Forbids "log" package only for aggregators, inputs, outputs, parsers and processors (influxdata#10191)
  docs: address documentation gap when running telegraf in k8s (influxdata#10215)
  feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10211)
  fix: mqtt topic extracting no longer requires all three fields (influxdata#10208)
  fix: windows service - graceful shutdown of telegraf (influxdata#9616)
  feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10201)
  feat: Modbus support multiple slaves (gateway feature) (influxdata#9279)
  fix: Revert unintented corruption of the Makefile from influxdata#10200. (influxdata#10203)
  chore: remove triggering update-config bot in CI (influxdata#10195)
  Update changelog
  feat: Implement deprecation infrastructure (influxdata#10200)
  fix: extra lock on init for safety (influxdata#10199)
  fix: resolve influxdata#10027 (influxdata#10112)
  fix: register bigquery to output plugins influxdata#10177 (influxdata#10178)
  fix: sysstat use unique temp file vs hard-coded (influxdata#10165)
  refactor: snmp to use gosmi (influxdata#9518)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/snmp feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants