How to reduce the collection interval to 1s? #50

Open
qq732552048 opened this issue Jun 30, 2023 · 4 comments

Comments

@qq732552048

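The exporter's notes on the collection interval say that a fabric of 60 switches and 900 servers takes 3 s to scan. How can this be optimized to finish within 1 s?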

@gabrieleiannetti
Contributor

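In the standard setup, the exporter runs on a single machine and scans the entire fabric, which is why it takes this long. However, you can run the exporter per channel adapter (CA), which can reduce the runtime considerably.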

The question here is how the runtime of the exporter can be improved.

I would suggest running the exporter separately on each host for its own CA, instead of on one host for the complete fabric.

@guilbaults Can you please provide an example of how to run the exporter for one specific CA and port? I think I am missing something here... I can't get it running with ibqueryerrors, so I can't provide an example myself.

@guilbaults
Owner

In ibqueryerrors, I think --Ca and --Port are only intended to select the local InfiniBand port to use (for setups with two cards, or with ports on different networks):

  --Ca, -C <ca>           Ca name to use
  --Port, -P <port>       Ca port number to use

Since ibqueryerrors scans and discovers the entire fabric on every execution, I don't think it's possible to make it faster while still using ibqueryerrors.

However, using the C API directly and sending the MAD packets to all the switches/ports in parallel should be fast enough to gather the values within 1 second. With that API, it should also be possible to rescan the entire fabric only once in a while to discover new switches and nodes, rather than on every cycle.

https://github.com/linux-rdma/infiniband-diags/blob/master/src/ibqueryerrors.c#L193
https://github.com/linux-rdma/infiniband-diags/blob/master/src/ibqueryerrors.c#L397

Rewriting the exporter in Go and gathering the counters directly, without ibqueryerrors, is something I have had in mind. It would also fix #48, because it would report nodes without errors as well. However, I don't currently have time to rewrite this exporter.

@gabrieleiannetti
Contributor

My idea was to run the exporter locally on each host so that it queries only its own channel adapter.
The runtime should then be quite low compared with querying the whole fabric.

I think we can close the issue?
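To illustrate why a purely local query stays cheap, here is a minimal sketch that reads one local CA port's counters without any fabric discovery. It swaps in a different data source than the exporter uses: instead of ibqueryerrors, it reads the counter files that the Linux kernel exposes under `/sys/class/infiniband/<ca>/ports/<port>/counters/`. The function name `read_local_counters` and the example CA name `mlx5_0` are illustrative assumptions. Note that this covers only the host side of each link; switch ports and switch-to-switch links are not visible this way.

```python
import os

# Linux exposes per-port HCA counters below this sysfs directory.
SYSFS_IB_ROOT = "/sys/class/infiniband"


def read_local_counters(ca, port, sysfs_root=SYSFS_IB_ROOT):
    """Read the counters of one local CA port, e.g. ("mlx5_0", 1).

    Only this host's own port is read and no fabric discovery is done,
    so the cost does not grow with the size of the fabric.
    """
    counters_dir = os.path.join(sysfs_root, ca, "ports", str(port), "counters")
    counters = {}
    for name in sorted(os.listdir(counters_dir)):
        try:
            with open(os.path.join(counters_dir, name)) as fh:
                counters[name] = int(fh.read().strip())
        except (OSError, ValueError):
            # Skip entries that cannot be read or are not plain integers.
            continue
    return counters
```

The `sysfs_root` parameter exists only so the function can be pointed at a test directory; on a real host, `read_local_counters("mlx5_0", 1)` would read the live counters, assuming a CA with that name exists.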

@qq732552048
Author

Thank you for clearing up my confusion; I understand what to do now.
