Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/resourcedetection]: Add support for k8s.cluster.name where possible #26794

Closed
gouthamve opened this issue Sep 18, 2023 · 22 comments · Fixed by #28649
Closed

[processor/resourcedetection]: Add support for k8s.cluster.name where possible #26794

gouthamve opened this issue Sep 18, 2023 · 22 comments · Fixed by #28649
Assignees
Labels
enhancement New feature or request processor/resourcedetection Resource detection processor Stale

Comments

@gouthamve
Copy link
Member

Component(s)

processor/k8sattributes

Is your feature request related to a problem? Please describe.

When using the k8sattributes processor, I shouldn't need to declare any attributes manually. I still need to manually specify k8s.cluster.name as the processor doesn't support it yet.

There is no standard API to get the cluster name, but GKE and EKS make it easy to detect. We should support this in the processor and provide k8s.cluster.name where possible.

Describe the solution you'd like

For GKE, See: https://github.com/open-telemetry/opentelemetry-js-contrib/blob/deb9aa441dc7d2b0fd5ec11b41c934a1e93134fd/detectors/node/opentelemetry-resource-detector-gcp/src/detectors/GcpDetector.ts#L81 (it gets it from curl http://metadata/computeMetadata/v1/instance/attributes/cluster-name -H "Metadata-Flavor: Google")

For EKS see: https://github.com/open-telemetry/opentelemetry-js-contrib/blob/deb9aa441dc7d2b0fd5ec11b41c934a1e93134fd/detectors/node/opentelemetry-resource-detector-aws/src/detectors/AwsEksDetector.ts#L86 (it gets it from the cert name on kubernetes.default.svc, iiuc)

Describe alternatives you've considered

No response

Additional context

No response

@gouthamve gouthamve added enhancement New feature or request needs triage New item requiring triage labels Sep 18, 2023
@github-actions github-actions bot added the processor/k8sattributes k8s Attributes processor label Sep 18, 2023
@github-actions
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Copy link
Member

I wonder if this would be better suited for the resourcedetectionprocessor where we already have some GKE and EKS specific functionality

@crobert-1
Copy link
Member

The resource detection processor is the best place for this. The k8s cluster name is actually already included for GKE, but not for EKS. I think it's a good idea.

@crobert-1 crobert-1 added processor/resourcedetection Resource detection processor and removed processor/k8sattributes k8s Attributes processor needs triage New item requiring triage labels Sep 19, 2023
@github-actions
Copy link
Contributor

Pinging code owners for processor/resourcedetection: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth TylerHelmuth changed the title k8sattributes: Add support for k8s.cluster.name where possible [processor/resourcedetection]: Add support for k8s.cluster.name where possible Sep 19, 2023
@crobert-1
Copy link
Member

I'll work to add this.

@crobert-1 crobert-1 self-assigned this Sep 19, 2023
@jinja2
Copy link
Contributor

jinja2 commented Sep 21, 2023

re: EKS cluster name, the example linked library is getting the cluster name from configmap cluster-info from namespace amazon-cloudwatch. I think the referenced configmap is only created in an eks cluster if cloudwatch agent with fluentd is installed (checked an eks cluster and this is definitely not available by default). So I don't think we should be adopting the same approach.

The aws-auth configmap which is used in the detector to verify if the collector is running in an EKS cluster also does not have any info which can be used to derive the cluster name.

EC2 instances run as workers in eks should have tags such as kubernetes.io/cluster/CLUSTER_NAME, eks:cluster-name or the billing tag aws:eks:cluster-name. So checking the tags might be a viable approach. One issue with this might be access to querying tags. Tags are not available from instance metadata by default, so the IAM role assigned to the node might need updating (usually the AWS managed role AmazonEKSWorkerNodePolicy is used and it allows ec2:DescribeInstance).

@crobert-1
Copy link
Member

crobert-1 commented Sep 21, 2023

Thanks for calling this out @jinja2. I was planning on re-using the opentelemetry-go-contrib's method of detecting the cluster name, but it looks like it's using the same methodology you're referring to here. There's also an open issue that echoes your comment here. (There's a comment here pointing out the same cloudwatch dependency problem.)

I'll try these others options, thanks again!

@jsirianni
Copy link
Member

I ran into this today, I need a way to detect cluster name using a generic config running in multiple clouds (GKE, AKS, EKS). It looks like EKS is actively being worked on. Does AKS require a contributor?

@crobert-1
Copy link
Member

I ran into this today, I need a way to detect cluster name using a generic config running in multiple clouds (GKE, AKS, EKS). It looks like EKS is actively being worked on. Does AKS require a contributor?

@jsirianni It'd be great to add that as well, you're more than welcome to work on it if you'd like!

@jsirianni
Copy link
Member

I ran into this today, I need a way to detect cluster name using a generic config running in multiple clouds (GKE, AKS, EKS). It looks like EKS is actively being worked on. Does AKS require a contributor?

@jsirianni It'd be great to add that as well, you're more than welcome to work on it if you'd like!

Great, I will give it a shot 👍

@jsirianni
Copy link
Member

@crobert-1 I have spent some time playing with the Azure metadata service used by resource detection here https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/internal/metadataproviders/azure/metadata.go.

They do not expose the cluster name in a useful way. I am not sure we can detect it easily without using the Azure SDK (and probably requiring some form of authentication).

I did notice that the DataDog exporter attempts to detect the cluster name, but the solution is not very robust. https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/datadogexporter/internal/hostmetadata/internal/azure/provider.go#L36.

When you create an AKS cluster, the default behavior is for Azure to create a resource group for the node pool, with its name derived from the cluster's resource group, name and location.

It usually looks like this:

# resource group: otel
# cluster name: test
# location: northcentralus
MC_otel_test_northcentralus

But it could easily be something like MC_my_resource_group_my_cluster_northcentralus

This is not a great solution because resource name and cluster name can contain underscores. The user can also specify their cluster's node pool's resource group name, instead of using the default option.

Do you have an opinion on how we should proceed?

@crobert-1
Copy link
Member

I agree that name parsing doesn't seem like a very good option here. I'm not very familiar with AKS and Azure's go SDK, so I don't have any good direction there either.

I think it comes down to what the user wants. If they just want something to identify the cluster with the understanding it could be wrong, then parsing may not be too bad here. We'd have to make sure it was clear to users that it may be wrong if we went this route.

It took a while to put together the proper EKS methods and libraries to use to get to the cluster name, so it might be hidden somewhere in the AKS documentation, I'm not sure.

@jinja2
Copy link
Contributor

jinja2 commented Oct 12, 2023

I think it is acceptable to best effort detect the cluster name the way the exporter does it (bail if more than 3 underscores) as long as it is documented as such. From AKS docs, I see they have a reserved node label kubernetes.azure.com/cluster. Can you check if this is set to just the cluster name? We might not want to add k8s api call to the mix here since it is supposed to be a cloud provider specific detector, but this might be useful.

@jsirianni
Copy link
Member

I think it is acceptable to best effort detect the cluster name the way the exporter does it (bail if more than 3 underscores) as long as it is documented as such. From AKS docs, I see they have a reserved node label kubernetes.azure.com/cluster. Can you check if this is set to just the cluster name? We might not want to add k8s api call to the mix here since it is supposed to be a cloud provider specific detector, but this might be useful.

Good call out on the label.

My test cluster gives me the following where my cluster name is jsirianni in resource group devops:

service_azure@Azure:~$ kubectl get node aks-agentpool-99263437-vmss000000 -o json | jq .metadata.labels
{
  "agentpool": "agentpool",
...
  "kubernetes.azure.com/agentpool": "agentpool",
  "kubernetes.azure.com/cluster": "MC_devops_jsirianni_northcentralus",
 ...
  "topology.kubernetes.io/region": "northcentralus",
  "topology.kubernetes.io/zone": "0"
}

The label's value looks similar to what we get from the metadata service. I will play around with different resource group and cluster names to see how it behaves. If it is always the same as the metadata service, perhaps we can just use the value of resourceGroupName as the cluster name?

  1. best effort parsing
  2. fall back on using the unparsed value

@jsirianni
Copy link
Member

I tested and this is what I have observed. The node label "kubernetes.azure.com/cluster" matches the value for ResourceGroupName, which is already available to the resource detection processor, just unused.

This value differs from the actual name of the cluster, but it might be enough to uniquely identify the cluster without over-complicating the implementation.

My cluster with name "stage" was deployed to resource group "devops" and used the default generated infrastructure resource group. It ends up with the following value:

MC_devops_stage_northcentralus

My second cluster, production_cluster was deployed to resource group devops and has the node pool resource group prod_infra. It ends up with the following value:

prod_infra

If I try to deploy a third cluster with name "test" but with infrastructure group prod_infra (used by my previous cluster), azure rejects it with:

A resource group already exists in subscription "redacted" with the same name

Additionally, Azure prefers that users do not override the default infrastructure resource group.

Screenshot from 2023-10-12 13-28-30

Screenshot from 2023-10-12 13-28-41

In summary, the IMDS endpoint already in use provides this exact value, and I think we can rely on it to uniquely identify the cluster.

@crobert-1 @jinja2 Unless there is an objection, I am going to move forward with a pull request using the value of ResourceGroupName to set k8s.cluster.name.

@jinja2
Copy link
Contributor

jinja2 commented Oct 12, 2023

My second cluster, production_cluster was deployed to resource group devops and has the node pool resource group prod_infra. It ends up with the following value:

prod_infra

To clarify, if the nodepool is created in a different resource group, the ResourceGroupName from IMDS does not have the aks cluster name right? Can nodepools in the cluster have different resource groups (e.g. the default nodepool was created in the cluster group, but another nodepool with separate resource group is added)? If yes, then this would result in nodes reporting different cluster names if defaulted to ResourceGroupName. And I think we are better of not setting the attr is the ResourceGroupName does not match MC_resourcegroup_cluster_region pattern.

Is the label value "kubernetes.azure.com/cluster" on the node still MC_resourcegroup_cluster_region irrespective of its pool's resource group?

@jsirianni
Copy link
Member

I may have worded it poorly, it is not "node pool resource group", it is "Infrastructure Resource group" and it is set when creating the cluster. It is also not usable by more than one cluster.

Screenshot from 2023-10-12 14-02-31

Azure does not allow you to define the resource group that the node pool will belong to, so all node pools should fall under the same infrastructure resource group.

Screenshot from 2023-10-12 14-02-44

To test this, I have deployed a new cluster and attached a second node pool. I was unable to define an alternative resource group. When checking the node labels, all nodes in the cluster match.

service_azure@Azure:~$ kubectl get node -o yaml | grep kubernetes.azure.com/cluster:
      kubernetes.azure.com/cluster: MC_devops_test_northcentralus
      kubernetes.azure.com/cluster: MC_devops_test_northcentralus
      kubernetes.azure.com/cluster: MC_devops_test_northcentralus

@jinja2
Copy link
Contributor

jinja2 commented Oct 16, 2023

Thank you for trying out the different combinations. To summarize my understanding of AKS cluster and resource group -

There are 2 resourceGroups in play in an AKS cluster. One is the resource group in which AKS k8s service is created, and second is the resource group in which VMs, network, etc. infra resources are created. This 2nd group name is what we see when querying a VM's metadata endpoint. By default, this 2nd group is named as MC_AKSResourceGroup_ClusterName_Location. But it is possible for users to provide a custom node resource group name.

I assume we want the k8s.cluster.name attribute value to match the name of the AKS cluster resource (result of az aks list --resource-group groupName --query "[].name"), and not the name of the node's resource group. In this case, we seem to have only one option, parse the resource group name from VM's metadata, and if it matches this pattern MC_AKSResourceGroup_ClusterName_Location extract the cluster name. I don't think we should fallback to setting it to the full resourceGroupName of the VM which could be a custom group, and not what users think of as the cluster name. There doesn't seem to be a better way to extract this information from metadata endpoint, and the parsing/extraction will work iff, the node-resource-group is not explicitly set and the cluster's name and resource group name do not have underscores. This seems very restrictive, and maybe not worth adding after all. There is also the edge case that user could have their custom group name also match the pattern. What do others think about this?

@jsirianni
Copy link
Member

Hi @jinja2

Sorry for the delayed response, I have a draft PR here that I will open against this repo if we think it is satisfactory.

There are 2 resourceGroups in play in an AKS cluster. One is the resource group in which AKS k8s service is created, and second is the resource group in which VMs, network, etc. infra resources are created. This 2nd group name is what we see when querying a VM's metadata endpoint. By default, this 2nd group is named as MC_AKSResourceGroup_ClusterName_Location. But it is possible for users to provide a custom node resource group name.

I assume we want the k8s.cluster.name attribute value to match the name of the AKS cluster resource (result of az aks list --resource-group groupName --query "[].name"), and not the name of the node's resource group. In this case, we seem to have only one option, parse the resource group name from VM's metadata, and if it matches this pattern MC_AKSResourceGroup_ClusterName_Location extract the cluster name. I don't think we should fallback to setting it to the full resourceGroupName of the VM which could be a custom group, and not what users think of as the cluster name. There doesn't seem to be a better way to extract this information from metadata endpoint, and the parsing/extraction will work iff, the node-resource-group is not explicitly set and the cluster's name and resource group name do not have underscores. This seems very restrictive, and maybe not worth adding after all. There is also the edge case that user could have their custom group name also match the pattern. What do others think about this?

I think your assessment is accurate. When querying the IMDS endpoint, the resource group name in the response is sufficient to identify the cluster.

If we can extract the cluster name, that is ideal, however, falling back onto the full group name is fine because it is unique within the azure account. I tried to create multiple AKS clusters (in different resource groups) using the same name for the infrastructure resource group, and Azure prevented me from proceeding. It shows an error saying that the name is already in use.

dmitryax pushed a commit that referenced this issue Nov 21, 2023
…nvironment (#28649)

**Description:** 
This enhancement detects the k8s cluster name in EKS. The solution uses
EC2 instance tags to determine the cluster name, which means it will
only work on EC2 (as noted in documentation updates).

Resolves #26794

---------

Co-authored-by: bryan-aguilar <46550959+bryan-aguilar@users.noreply.github.com>
RoryCrispin pushed a commit to ClickHouse/opentelemetry-collector-contrib that referenced this issue Nov 24, 2023
…nvironment (open-telemetry#28649)

**Description:** 
This enhancement detects the k8s cluster name in EKS. The solution uses
EC2 instance tags to determine the cluster name, which means it will
only work on EC2 (as noted in documentation updates).

Resolves open-telemetry#26794

---------

Co-authored-by: bryan-aguilar <46550959+bryan-aguilar@users.noreply.github.com>
@dmitryax dmitryax reopened this Jan 8, 2024
@dmitryax
Copy link
Member

dmitryax commented Jan 8, 2024

EKS is done. AKS is still in progress #29328

dmitryax pushed a commit that referenced this issue Jan 29, 2024
… name (#29328)

**Description:**

Added best effort support for detecting Azure Kubernetes Service cluster
name: `k8s.cluster.name`.

The cluster name can be extracted from the cluster's "resource group
name" which is retrieved using existing functionality. The
`parseClusterName` function has comments explaining the limitations.

**Link to tracking Issue:**

#26794

**Testing:**

I added unit tests for each scenario, and have tested against live AKS
clusters that fit each scenario. I am happy to spin these up if anyone
has any questions.

Added `k8s.cluster.name` to the list of AKS resource attributes.
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Feb 1, 2024
… name (open-telemetry#29328)

**Description:**

Added best effort support for detecting Azure Kubernetes Service cluster
name: `k8s.cluster.name`.

The cluster name can be extracted from the cluster's "resource group
name" which is retrieved using existing functionality. The
`parseClusterName` function has comments explaining the limitations.

**Link to tracking Issue:**

open-telemetry#26794

**Testing:**

I added unit tests for each scenario, and have tested against live AKS
clusters that fit each scenario. I am happy to spin these up if anyone
has any questions.

Added `k8s.cluster.name` to the list of AKS resource attributes.
anthoai97 pushed a commit to anthoai97/opentelemetry-collector-contrib that referenced this issue Feb 12, 2024
… name (open-telemetry#29328)

**Description:**

Added best effort support for detecting Azure Kubernetes Service cluster
name: `k8s.cluster.name`.

The cluster name can be extracted from the cluster's "resource group
name" which is retrieved using existing functionality. The
`parseClusterName` function has comments explaining the limitations.

**Link to tracking Issue:**

open-telemetry#26794

**Testing:**

I added unit tests for each scenario, and have tested against live AKS
clusters that fit each scenario. I am happy to spin these up if anyone
has any questions.

Added `k8s.cluster.name` to the list of AKS resource attributes.
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Mar 11, 2024
@crobert-1
Copy link
Member

Closing as k8s.cluster.name support has been added for EKS and AKS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request processor/resourcedetection Resource detection processor Stale
Projects
None yet
6 participants