double reporting metrics for hpa and cronjob in certain k8s versions #26551

Closed
jinja2 opened this issue Sep 8, 2023 · 3 comments
Labels
bug, receiver/k8scluster

Comments

@jinja2
Contributor

jinja2 commented Sep 8, 2023

Component(s)

receiver/k8scluster

What happened?

Description

The receiver double-reports metrics for hpa and cronjob when running on certain k8s versions. On k8s 1.23 to 1.25, the k8scluster receiver currently makes 2 calls to get hpa objects: once with the v2beta2 api and a second time with the v2 api. Both calls return the same hpa objects, irrespective of which apiVersion was used to create the hpa. The k8s api server handles the conversion between versions transparently: all the different versions are representations of the same persisted data.
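To make the duplication concrete, here is a minimal Python sketch. It does not use the real Kubernetes client: `list_hpas` is a hypothetical stand-in that simulates the api server returning every stored hpa in whichever version was requested, and `dedupe_by_uid` is an illustrative helper, not what the receiver does (the fix discussed in this issue is to drop the v2beta2 call instead).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HPA:
    uid: str
    name: str
    api_version: str  # version the object was returned in, not created with


def list_hpas(api_version, stored):
    """Hypothetical stand-in for a list call: every stored hpa is returned,
    converted to the requested api version."""
    return [HPA(h["uid"], h["name"], api_version) for h in stored]


def dedupe_by_uid(*listings):
    """Keep the first occurrence of each uid across all listings."""
    seen = {}
    for listing in listings:
        for hpa in listing:
            seen.setdefault(hpa.uid, hpa)
    return list(seen.values())


# Two hpas, created with different apiVersions but persisted as the same data.
stored = [
    {"uid": "uid-1", "name": "hpa-v2"},
    {"uid": "uid-2", "name": "hpa-v2beta2"},
]

from_v2beta2 = list_hpas("autoscaling/v2beta2", stored)
from_v2 = list_hpas("autoscaling/v2", stored)

naive = from_v2beta2 + from_v2                 # what two calls add up to
unique = dedupe_by_uid(from_v2, from_v2beta2)  # one entry per uid

print(len(naive), len(unique))
```

Each hpa appears once per queried api version in `naive`, which is the double reporting described above; keying on the object uid collapses them back to one entry each.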

So if a user originally created an hpa using the v2beta2 version of its api, the receiver can query that hpa using either the v2beta2 or the v2 version. This means that if we stop collecting with the v2beta2 api, the only clusters which will be impacted are those still running k8s v1.22 and lower, since the v2 api version is available starting 1.23.

Upstream k8s currently maintains 1.25 to 1.28. Below are the EOL dates for the major managed k8s offerings, none of which still support 1.22 (the last version that would need the hpa beta api):

https://endoflife.date/google-kubernetes-engine
https://endoflife.date/azure-kubernetes-service
https://endoflife.date/amazon-eks

So even after dropping the v2beta2 api we will still be compatible back to latest - 5 (1.23, with 1.28 as latest).
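The version reasoning above can be sketched as a small lookup. This is an illustration only: the table and function names are hypothetical, the 1.23 lower bound for v2 comes from this issue, the 1.26 removal of v2beta2 matches the 1.23 to 1.25 overlap described here, and the 1.12 introduction of v2beta2 is an assumption that does not affect the result.

```python
# (first minor serving the api, first minor where it is removed); None = still served.
# Hypothetical table for Kubernetes 1.x, see the assumptions in the text above.
HPA_API_VERSIONS = {
    "autoscaling/v2beta2": (12, 26),  # lower bound assumed, only the upper one matters here
    "autoscaling/v2": (23, None),
}


def served_versions(minor, table=HPA_API_VERSIONS):
    """Return the hpa api versions served by a cluster at 1.<minor>."""
    return [
        name
        for name, (introduced, removed) in table.items()
        if introduced <= minor and (removed is None or minor < removed)
    ]


# Minors where both versions are served, i.e. where querying both double-reports.
overlap = [m for m in range(20, 29) if len(served_versions(m)) == 2]

# Oldest minor that still works once only v2 is queried.
latest_minor = 28
oldest_compatible = latest_minor - 5

print(overlap, oldest_compatible)
```

Clusters at 1.22 and lower only serve the beta api, so they are the only ones that lose hpa metrics once the v2beta2 call is removed.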

Steps to Reproduce

Collect hpa metrics from a k8s cluster running version 1.23 to 1.25.
Here are the test setup scripts and the e2e test I performed. In the test, I create an hpa hpa-v2 with apiVersion v2 and an hpa hpa-v2beta2 with apiVersion v2beta2. The metrics we get from the collector for these 2 hpas are here. We see the same metrics twice for each of the 2 hpas, because the receiver queries in 2 api formats.

Expected Result

There should be no duplicate metrics for hpa.

Actual Result

There are duplicate metrics for hpa.

image

Collector version

latest

Environment information

Environment

k8s v1.23

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

jinja2 added the bug and needs triage labels Sep 8, 2023
@github-actions
Contributor

github-actions bot commented Sep 8, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dmitryax
Member

dmitryax commented Sep 8, 2023

@jinja2 thank you for putting this together. This is very informative.

I believe we can drop the duplicated v2beta2 versions along with 1.22- support (we should document that).

Going forward we should probably still keep the discovery logic, but instead of calling both deprecated/new versions, we should call the latest supported by the cluster, right?

@jinja2
Contributor Author

jinja2 commented Sep 11, 2023

> Going forward we should probably still keep the discovery logic, but instead of calling both deprecated/new versions, we should call the latest supported by the cluster, right?

I think we'd be okay without the extra logic to support multiple apiVersions, at least for the objects we are querying in the receiver right now. They are fairly stable native k8s kinds. But if we find ourselves having to pick between versions, yes, we should update the discovery logic. The api exposes the preferred version for an api group, and we should check the in-code options against that. I'll create a new issue for this and work on it. We can go ahead with the api removal in this issue, which is holding up auto-upgrades in some of the managed clusters.
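A rough sketch of what checking the in-code options against the preferred version could look like, in Python rather than the receiver's Go. It assumes the APIGroup discovery shape (`versions` and `preferredVersion`, each with a `version` field); the sample documents, `pick_version` and `SUPPORTED_IN_CODE` are hypothetical and only meant to illustrate the idea.

```python
import json

# Versions the (hypothetical) collector code knows how to handle, newest first.
SUPPORTED_IN_CODE = ["v2", "v2beta2"]


def pick_version(api_group, supported=SUPPORTED_IN_CODE):
    """Pick exactly one api version to query for a group.

    Prefer the server's preferredVersion when the code supports it; otherwise
    fall back to the first in-code option the server serves. Returns None when
    nothing matches.
    """
    served = {v["version"] for v in api_group.get("versions", [])}
    preferred = api_group.get("preferredVersion", {}).get("version")
    if preferred in supported:
        return preferred
    for version in supported:
        if version in served:
            return version
    return None


# Illustrative discovery documents, not captured from a real cluster.
group_new = json.loads("""
{
  "name": "autoscaling",
  "versions": [{"version": "v2"}, {"version": "v1"}, {"version": "v2beta2"}],
  "preferredVersion": {"version": "v2"}
}
""")
group_old = json.loads("""
{
  "name": "autoscaling",
  "versions": [{"version": "v1"}, {"version": "v2beta2"}],
  "preferredVersion": {"version": "v1"}
}
""")

print(pick_version(group_new), pick_version(group_old))
```

Because only a single version is ever returned, a cluster serving both v2 and v2beta2 would be queried once, which avoids the double reporting without dropping older clusters.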

FYI for users: you can still force upgrade the clusters to 1.26, and the older versions of the collector will stop querying the removed api (even with the fix applied, I think GKE will only attempt the upgrade 30 days after the calls to the deprecated api stop).

dmitryax added a commit that referenced this issue Sep 11, 2023
…es (#26516)

**Description:**
Remove support for deprecated Kubernetes API resources: 
- `batch/v1beta1`
- `autoscaling/v2beta2`

This also resolves the issue with double reporting metrics for hpa and
cronjob in certain k8s versions.

**Link to tracking Issue(s):**
#23612 #26551

**Testing:**
From source, built the custom image
[coolboi567/otelcontribcol:0.83.1](https://hub.docker.com/layers/coolboi567/otelcontribcol/0.83.1/images/sha256-fe111e1dff87a26eb64217a8505b84679263c8d7b3ffa16656bba1c1865052d5?context=explore)
and tested in kubernetes cluster `v1.25` as well as `v1.27`.

In K8s `v1.25`, we no longer see the warnings in the logs about usage of
deprecated APIs.

**Documentation:**
N/A

---------

Signed-off-by: Prashant Shahi <me@prashantshahi.dev>
Co-authored-by: Dmitrii Anoshin <anoshindx@gmail.com>
jorgeancal pushed a commit to jorgeancal/opentelemetry-collector-contrib that referenced this issue Sep 18, 2023
…es (open-telemetry#26516)
