
[receiver/vcenter]: Optimize vCenter Receiver to use concurrency #31837

Open
schmikei opened this issue Mar 19, 2024 · 4 comments

Labels: enhancement (New feature or request), receiver/vcenter

@schmikei
Contributor
Component(s)

receiver/vcenter

Is your feature request related to a problem? Please describe.

I'm led to believe that larger environments have a harder time using the receiver due to the large number of requests that need to be made.

In larger environments, the volume of requests shows that the current synchronous code has room for improvement. Once we scaled up, it was pretty evident that we were bottlenecking on requests, particularly for VMs.

Describe the solution you'd like

From my recollection, the MetricsBuilder doesn't maintain any state that would prevent easy parallelization, so we could parallelize/batch the requests to get better performance (a rough sketch follows at the end of this section).

The tool mitmproxy really helped us identify the issue: the massive number of individually made requests was stacking up to be quite a time sink.
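As a rough illustration of the parallelize/batch idea above, here is a minimal Go sketch using `golang.org/x/sync/errgroup` with a bounded number of in-flight requests. The `fetchVMMetrics` helper and the limit of 8 are hypothetical stand-ins for the receiver's per-VM queries, not the actual receiver code.

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// fetchVMMetrics stands in for the per-VM query the receiver currently
// issues sequentially; the real call would go through govmomi.
func fetchVMMetrics(ctx context.Context, vm string) (string, error) {
	return "metrics for " + vm, nil
}

func main() {
	ctx := context.Background()
	vms := []string{"vm-1", "vm-2", "vm-3"}
	results := make([]string, len(vms))

	// Bound concurrency so a large inventory does not overwhelm vCenter;
	// the limit of 8 is an arbitrary example value.
	g, gctx := errgroup.WithContext(ctx)
	g.SetLimit(8)

	for i, vm := range vms {
		i, vm := i, vm // capture loop variables
		g.Go(func() error {
			m, err := fetchVMMetrics(gctx, vm)
			if err != nil {
				return err
			}
			results[i] = m
			return nil
		})
	}

	if err := g.Wait(); err != nil {
		panic(err)
	}
	fmt.Println(results)
}
```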

Describe alternatives you've considered

No response

Additional context

Based on early experiments, an environment of 2000 VMs sometimes took minutes to complete a collection; we're hoping to cut that down to a collection interval of seconds.

schmikei added the enhancement and needs triage labels Mar 19, 2024
Contributor

Pinging code owners for receiver/vcenter: @djaglowski @schmikei. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@schmikei
Contributor Author

@StefanKurek and I plan to take this effort on; feel free to assign us to it.

crobert-1 removed the needs triage label Mar 19, 2024
@atoulme
Contributor

atoulme commented Mar 23, 2024

Maybe this spike can help: #30624

djaglowski pushed a commit that referenced this issue Apr 17, 2024
**Description:**
Changes the method for collecting VMs used by the `vcenterreceiver` to
the more time-efficient `CreateContainerView` method. This is the first
step to addressing the issue linked below.

**Link to tracking Issue:** #31837

**Testing:** These changes were tested on an environment with 200+
virtual machines. The original collection time was ~80 seconds.
Collection times with these changes are ~40 seconds.

**Documentation:** N/A

---------

Co-authored-by: Stefan Kurek <stefan.kurek@observiq.com>
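As background on the approach in the commit above: `CreateContainerView` comes from govmomi's `view` package, and bulk-retrieving VM properties with it typically looks like the minimal sketch below. The endpoint, credentials, and property list are placeholder assumptions for illustration, not the receiver's actual configuration.

```go
package main

import (
	"context"
	"fmt"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/view"
	"github.com/vmware/govmomi/vim25/mo"
)

func main() {
	ctx := context.Background()

	// Hypothetical endpoint and credentials, for illustration only.
	u, err := url.Parse("https://vcenter.example.com/sdk")
	if err != nil {
		panic(err)
	}
	u.User = url.UserPassword("user", "password")

	c, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		panic(err)
	}
	defer c.Logout(ctx)

	// One ContainerView fetches every VirtualMachine in bulk instead of
	// walking the inventory object by object.
	m := view.NewManager(c.Client)
	v, err := m.CreateContainerView(ctx, c.ServiceContent.RootFolder, []string{"VirtualMachine"}, true)
	if err != nil {
		panic(err)
	}
	defer v.Destroy(ctx)

	var vms []mo.VirtualMachine
	if err := v.Retrieve(ctx, []string{"VirtualMachine"}, []string{"summary"}, &vms); err != nil {
		panic(err)
	}

	for _, vm := range vms {
		fmt.Println(vm.Summary.Config.Name)
	}
}
```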
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this issue May 8, 2024
djaglowski pushed a commit that referenced this issue May 13, 2024
**Description:**
Some improvements to the network calls centered around Virtual Machines
had already been made. This allowed collection times to decrease from
~90s to ~27s in an environment with 1 Cluster, 2 Hosts, & 280 VMs.

Making similar changes for all resource types helped to further decrease
collection times. Now collection time has decreased from ~27s to <~3s
for the same environment.

Here's a general list of the changes made:
- Now makes all network calls (per datacenter) first and stores returned
data.
- Processes this data afterwards to convert to OTEL resources/metrics
(refactored to new file).
- Moves all metric recording to metrics.go to keep consistent.
- Moves all resource builder creation to resources.go to keep
consistent.
- Updates/fixes tests.

**Link to tracking Issue:** #31837. Although that issue prescribes a
solution to the problem (goroutines), it ended up not being necessary.

**Testing:** Unit tests and integration tests pass; manual testing was
also done in local environments.

**Documentation:** N/A
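As a rough illustration of the fetch-then-process structure listed above, here is a minimal Go sketch; the `dcData`, `fetchDatacenter`, and `processDatacenter` names are hypothetical stand-ins for the receiver's internals, not its real API.

```go
package main

import (
	"context"
	"fmt"
)

// dcData is a hypothetical container for everything fetched from one
// datacenter before any metric is recorded.
type dcData struct {
	name  string
	hosts []string
	vms   []string
}

// fetchDatacenter stands in for the batched govmomi calls; in the real
// receiver this goes through container views and property collectors.
func fetchDatacenter(ctx context.Context, name string) (dcData, error) {
	return dcData{name: name, hosts: []string{"host-1"}, vms: []string{"vm-1", "vm-2"}}, nil
}

// processDatacenter converts the stored data into resources/metrics,
// with no further network calls.
func processDatacenter(d dcData) {
	fmt.Printf("datacenter %s: %d hosts, %d vms\n", d.name, len(d.hosts), len(d.vms))
}

func main() {
	ctx := context.Background()

	// Phase 1: make all network calls per datacenter and store the results.
	var collected []dcData
	for _, dc := range []string{"dc-1", "dc-2"} {
		d, err := fetchDatacenter(ctx, dc)
		if err != nil {
			continue // a real scraper would record a partial-scrape error
		}
		collected = append(collected, d)
	}

	// Phase 2: process the stored data into resources and metrics.
	for _, d := range collected {
		processDatacenter(d)
	}
}
```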
jlg-io pushed a commit to jlg-io/opentelemetry-collector-contrib that referenced this issue May 14, 2024
@StefanKurek
Contributor

@schmikei I think this can be closed now with the recent performance enhancements. I'll let you decide.
