Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/vcenter] No Explicit Distinction Between VMs & VM Templates #32821

Closed
StefanKurek opened this issue May 2, 2024 · 9 comments
Closed
Labels
bug Something isn't working needs triage New item requiring triage receiver/vcenter

Comments

@StefanKurek
Copy link
Contributor

Component(s)

receiver/vcenter

What happened?

Description

Currently this receiver makes no distinction between VMs & VM Templates. VM Templates can be used to quickly spin up a VM, but can not be turned on/active themselves. In the vCenter UI, these two resource types are differentiated throughout.

One area in the receiver where this is most noticeable is the vcenter.cluster.vm.count metric (it includes the count of VM templates in the value where the vm_count_power_state is 'poweredOff`).

Another is how VM & VM templates resources that are returned aren't explicitly differentiated. They have some implicit differences though.

Currently VM Templates are represented as Resources, but do NOT have the Resource Attributes vcenter.resource_pool.name, vcenter.resource_pool.inventory_path, vcenter.virtual_app.name, or vcenter.virtual_app.inventory_path. Technically a standard VM could also be returned without these attributes as well if permissions were incorrect for grabbing Resource Pools from the API (although this would be a prerequisite error and should be extremely rare).

Some metrics will also never return for a VM Template Resource, but this occurs for VMs under certain conditions as well.

Steps to Reproduce

Collect against any vCenter environment with both VMs & VM Templates.

Expected Result

See some sort of differentiation in the vcenter.cluster.vm.count metric between VMs & VM templates. Also be able to see some explicit Resource Attribute indicator that a resource is a VM template rather than a standard VM.

Actual Result

vcenter.cluster.vm.count wraps templates up in its returned values. Returned Resources that represent VM Templates don't have explicit indicators that they are templates.

Collector version

v1.6.0/v0.99.0

Environment information

No response

OpenTelemetry Collector configuration

extensions:
  basicauth/prom:
    client_auth:
      username: [PROMUSER]
      password: [PROMPASS]

exporters:
  prometheusremotewrite:
    endpoint: [PROMENDPOINT]
    auth:
      authenticator: basicauth/prom
    resource_to_telemetry_conversion:
      enabled: true # Convert resource attributes to metric labels

processors:
  batch:
    # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor

receivers:
  vcenter:
    endpoint: https://[VCENTERHOST]
    username: [VCENTERUSER]
    password: [VCENTERPASS]
    tls:
      insecure: true
    collection_interval: 1m
    initial_delay: 1s

service:
  extensions: [basicauth/prom]
  pipelines:
    metrics:
      receivers: [vcenter]
      processors: [batch]
      exporters: [prometheusremotewrite]

Log output

No response

Additional context

No response

@StefanKurek StefanKurek added bug Something isn't working needs triage New item requiring triage labels May 2, 2024
@StefanKurek
Copy link
Contributor Author

StefanKurek commented May 2, 2024

@djaglowski @schmikei I think for the cluster count metric, this is decently straighforward. We could either

  1. Add in a new boolean metric attribute which indicates if the returned value is for templates vs standard VMs. Some follow up questions need to be asked for this route though. Should the template counts be reported with the attribute vm_count_power_state as "poweredOff" (technically true...but a little weird). Should we omit this attribute entirely for a VM template count value (is this possible)?
  2. Add an entirely new metric for VM template count.

For differentiating explicitly between VM and VM Template resources we could just add in a boolean Resource Attribute to tack on to VM Template resources. The main problem is that I don't think there is any way to 100% guarantee that a returned VM template is a template. If it doesn't have a parent resource pool property, that is a "really good" indicator that it is a template. But like I mentioned above, it could happen from bad permissions. IMO, this is still a good enough approach, and better than keeping VM Templates lumped with VMs.

Let me know if you have thoughts.

@djaglowski
Copy link
Member

I would suggest just excluding templates from the existing count and adding a new metric specifically for templates.

For differentiating explicitly between VM and VM Template resources we could just add in a boolean Resource Attribute to tack on to VM Template resources. The main problem is that I don't think there is any way to 100% guarantee that a returned VM template is a template. If it doesn't have a parent resource pool property, that is a "really good" indicator that it is a template. But like I mentioned above, it could happen from bad permissions. IMO, this is still a good enough approach, and better than keeping VM Templates lumped with VMs.

I don't see an obvious solution here. I'm surprised there isn't something in the API which indicates clearly whether or not it's a template. If the API can't indicate it, then I don't think we should attempt to infer it.

@StefanKurek
Copy link
Contributor Author

StefanKurek commented May 2, 2024

@djaglowski that would also imply that we can't change the current VM count metric for Clusters either. All I can say is that in our two environments, all full VMs are returned with a Resource Pool & all VM templates do not have this property (A Resource Pool is required to be specified when creating a VM).

@StefanKurek
Copy link
Contributor Author

I'll add on this. I can understand the position of not doing anything because the API does not make the differentiation clear. But without doing anything, we are guaranteed to be incorrect about VMs vs Templates when both exist in an environment. If we try to differentiate, we have the possibility to be incorrect (I can't guarantee it but like I said I'm pretty sure it would only happen if for some reason the user had permission to read VM data but not Resource Pool data). But we should always be correct as long as this isn't the case.

@StefanKurek
Copy link
Contributor Author

@djaglowski OK, I'm a liar. On another pass of the nested properties I found an explicit template boolean one for VirtualMachine resources. See, all I needed was someone to challenge me for me to dig and find out how to do it correctly 15 minutes later 😆 .

Ultimately this means that it would be easy to add a new Resource Attribute that gets added to template resources.

@djaglowski
Copy link
Member

For differentiating explicitly between VM and VM Template resources we could just add in a boolean Resource Attribute to tack on to VM Template resources

Would it make sense to just apply different names to one or more existing resource attributes? e.g. ...vm_template.name

Copy link
Contributor

github-actions bot commented May 2, 2024

Pinging code owners for receiver/vcenter: @djaglowski @schmikei @StefanKurek. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@StefanKurek
Copy link
Contributor Author

StefanKurek commented May 2, 2024

For differentiating explicitly between VM and VM Template resources we could just add in a boolean Resource Attribute to tack on to VM Template resources

Would it make sense to just apply different names to one or more existing resource attributes? e.g. ...vm_template.name

@djaglowski I considered this. And this could work as well. But then the question comes up of should we do the same thing for the various metric names that still apply to templates (make new metrics replacing vm with vm_template).

Ultimately I don't feel strongly one way or the other as far as the resource attributes go.

@djaglowski
Copy link
Member

I don't think it's necessary to change the metric names as well. My suggestion for adding new resource attributes is because the project more or less differentiates between resource types based on the presence of attributes, rather than by having attributes that explicitly state the type.

djaglowski pushed a commit that referenced this issue May 3, 2024
**Description:** <Describe what has changed.>
- The Cluster `vcenter.cluster.vm.count` metric now does not include
templates in the returned values.
- A new (default disabled with warning) Cluster
`vcenter.cluster.vm_template.count`metric is added to count only VM
templates.
- VM Template Resource Attributes `vcenter.vm_template.name` and
`vcenter.vm_template.id` have been added (default disabled with
warning). If disabled, resources will no longer be created for VM
templates.
- Trimmed down VM Template metrics to a single expected metric.
- Fixed integration test build tag that was accidentally removed in
previous PR

**Link to tracking Issue:** <Issue number if applicable>
#32821 

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit/integration tests updated and tested. Local environment tested.

**Documentation:** <Describe the documentation added.>
New documentation generated based on the metadata.
djaglowski pushed a commit that referenced this issue May 8, 2024
…0 Release (#32913)

**Description:** <Describe what has changed.>
A number of configurations were disabled by default and had warnings
that they were going to be enabled in v0.101.0 (1 metric had a warning
that it was going to be removed).

Now that v0.100.0 has been release, I have removed all of these
warnings, and made the modifications that the warnings "warned" about. I
have also updated the tests to reflect this.

**Link to tracking Issue:** <Issue number if applicable>
#32803 #32805 #32821 #32531 #32557

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit/integration tests updated and tested. Local environment tested.

**Documentation:** <Describe the documentation added.>
New documentation generated based on the metadata.
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this issue May 8, 2024
…try#32831)

**Description:** <Describe what has changed.>
- The Cluster `vcenter.cluster.vm.count` metric now does not include
templates in the returned values.
- A new (default disabled with warning) Cluster
`vcenter.cluster.vm_template.count`metric is added to count only VM
templates.
- VM Template Resource Attributes `vcenter.vm_template.name` and
`vcenter.vm_template.id` have been added (default disabled with
warning). If disabled, resources will no longer be created for VM
templates.
- Trimmed down VM Template metrics to a single expected metric.
- Fixed integration test build tag that was accidentally removed in
previous PR

**Link to tracking Issue:** <Issue number if applicable>
open-telemetry#32821 

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit/integration tests updated and tested. Local environment tested.

**Documentation:** <Describe the documentation added.>
New documentation generated based on the metadata.
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this issue May 8, 2024
…0 Release (open-telemetry#32913)

**Description:** <Describe what has changed.>
A number of configurations were disabled by default and had warnings
that they were going to be enabled in v0.101.0 (1 metric had a warning
that it was going to be removed).

Now that v0.100.0 has been release, I have removed all of these
warnings, and made the modifications that the warnings "warned" about. I
have also updated the tests to reflect this.

**Link to tracking Issue:** <Issue number if applicable>
open-telemetry#32803 open-telemetry#32805 open-telemetry#32821 open-telemetry#32531 open-telemetry#32557

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit/integration tests updated and tested. Local environment tested.

**Documentation:** <Describe the documentation added.>
New documentation generated based on the metadata.
jlg-io pushed a commit to jlg-io/opentelemetry-collector-contrib that referenced this issue May 14, 2024
…try#32831)

**Description:** <Describe what has changed.>
- The Cluster `vcenter.cluster.vm.count` metric now does not include
templates in the returned values.
- A new (default disabled with warning) Cluster
`vcenter.cluster.vm_template.count`metric is added to count only VM
templates.
- VM Template Resource Attributes `vcenter.vm_template.name` and
`vcenter.vm_template.id` have been added (default disabled with
warning). If disabled, resources will no longer be created for VM
templates.
- Trimmed down VM Template metrics to a single expected metric.
- Fixed integration test build tag that was accidentally removed in
previous PR

**Link to tracking Issue:** <Issue number if applicable>
open-telemetry#32821 

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit/integration tests updated and tested. Local environment tested.

**Documentation:** <Describe the documentation added.>
New documentation generated based on the metadata.
jlg-io pushed a commit to jlg-io/opentelemetry-collector-contrib that referenced this issue May 14, 2024
…0 Release (open-telemetry#32913)

**Description:** <Describe what has changed.>
A number of configurations were disabled by default and had warnings
that they were going to be enabled in v0.101.0 (1 metric had a warning
that it was going to be removed).

Now that v0.100.0 has been release, I have removed all of these
warnings, and made the modifications that the warnings "warned" about. I
have also updated the tests to reflect this.

**Link to tracking Issue:** <Issue number if applicable>
open-telemetry#32803 open-telemetry#32805 open-telemetry#32821 open-telemetry#32531 open-telemetry#32557

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit/integration tests updated and tested. Local environment tested.

**Documentation:** <Describe the documentation added.>
New documentation generated based on the metadata.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage New item requiring triage receiver/vcenter
Projects
None yet
Development

No branches or pull requests

3 participants