Service principal with working Azure Roles as tf context is unable to authenticate via kubernetes provider block but az aks get-credentials and kubectl get pods -n xy works #20843
Linked comment with root cause analysis and the advice to file it with the azurerm provider: #issuecomment-1435228138
Hello @slzmruepp, thanks for opening this issue. According to your error message:
According to the provider code:

```go
func dataSourceKubernetesClusterRead(d *pluginsdk.ResourceData, meta interface{}) error {
	client := meta.(*clients.Client).Containers.KubernetesClustersClient
	subscriptionId := meta.(*clients.Client).Account.SubscriptionId
	ctx, cancel := timeouts.ForRead(meta.(*clients.Client).StopContext, d)
	defer cancel()

	// First call: read the managed cluster itself (sp2 is allowed to do this)
	id := managedclusters.NewManagedClusterID(subscriptionId, d.Get("resource_group_name").(string), d.Get("name").(string))
	resp, err := client.Get(ctx, id)
	if err != nil {
		if response.WasNotFound(resp.HttpResponse) {
			return fmt.Errorf("%s was not found", id)
		}
		return fmt.Errorf("retrieving %s: %+v", id, err)
	}

	// Second call: fetch the "clusterUser" access profile (this is where sp2's request fails)
	profileId := managedclusters.NewAccessProfileID(subscriptionId, d.Get("resource_group_name").(string), d.Get("name").(string), "clusterUser")
	profile, err := client.GetAccessProfile(ctx, profileId)
	if err != nil {
		return fmt.Errorf("retrieving Access Profile for %s: %+v", id, err)
	}
	// ... (remainder of the function omitted)
}
```

It looks like your sp does have permission to query the cluster's information, but its request to get the AccessProfile failed. Unlike the Azure CLI, the Terraform provider must query more information to populate the data source completely. Personally, I wouldn't consider this a bug, since it's by design (please correct me if HashiCorp has a different opinion). One workaround I can suggest is to store the Kubernetes credential data in a Key Vault and then grant sp2 permission to read secrets and certificates from it.
According to the root cause analysis by @browley86, the terraform azurerm provider queries the wrong (and soon-to-be-deprecated) API, which I don't think is "by design". The detailed analysis and write-up follow. According to @browley86, this belongs in terraform-provider-azurerm because that is where the code for the API query lives. That's why I moved this issue and linked the original issue. Please read here:

"I just wanted to shed a bit more light on the issue. The TL;DR is that Terraform is calling a soon-to-be-deprecated API. More specifically, based on the error message, the provider is calling https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ContainerService/managedClusters/{resourceName}/accessProfiles/{roleName}/listCredential which the link above calls out will soon be deprecated in favour of either the ListClusterUserCredentials or the ListClusterAdminCredentials API. While "soon-to-be-deprecated" normally implies there is time to update the underlying APIs, the issue is that newer Azure service principal permissions are going to be scoped against the non-deprecated API. This means that newer workflows with newly created service principals, such as calling the Azure az cli, will work, while Terraform will fail unless the service principal was created with permissions scoped to the old API. I wanted to post the steps to re-create this, but I really struggled to get curl running. Finally I found a post that illustrates az rest and hitting endpoints, so I did the following"

Also, the tf code follows the documentation, but it does not work as documented. Therefore I think it should be considered a bug.
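To make the difference between the two APIs concrete, here is a minimal Go sketch that builds both ARM request paths and pairs each with the RBAC action it presumably requires. The access-profile path is taken from the analysis above. The `listClusterUserCredential` path and both action strings are assumptions based on the public ARM REST reference and on the role definition quoted later in this thread. The `api-version` query parameter is omitted, and the helper names are hypothetical.

```go
package main

import "fmt"

// clusterPath returns the ARM resource path of an AKS managed cluster.
func clusterPath(sub, rg, name string) string {
	return fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ContainerService/managedClusters/%s", sub, rg, name)
}

// endpoint pairs a credential endpoint path with the RBAC action assumed to guard it.
type endpoint struct {
	Path   string
	Action string
}

// accessProfileCredential is the older endpoint named in the analysis above.
func accessProfileCredential(sub, rg, name, role string) endpoint {
	return endpoint{
		Path:   clusterPath(sub, rg, name) + "/accessProfiles/" + role + "/listCredential",
		Action: "Microsoft.ContainerService/managedClusters/accessProfiles/listCredential/action", // assumed action name
	}
}

// clusterUserCredential is the replacement endpoint (ListClusterUserCredentials).
func clusterUserCredential(sub, rg, name string) endpoint {
	return endpoint{
		Path:   clusterPath(sub, rg, name) + "/listClusterUserCredential",
		Action: "Microsoft.ContainerService/managedClusters/listClusterUserCredential/action",
	}
}

func main() {
	const base = "https://management.azure.com" // api-version query parameter omitted
	for _, e := range []endpoint{
		accessProfileCredential("{subscriptionId}", "{resourceGroupName}", "{resourceName}", "clusterUser"),
		clusterUserCredential("{subscriptionId}", "{resourceGroupName}", "{resourceName}"),
	} {
		fmt.Printf("POST %s%s\n  requires %s\n", base, e.Path, e.Action)
	}
}
```

Both calls return a kubeconfig. The point is that a role granting only the second action would not satisfy the first.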
@slzmruepp I see your point. I'll try to reproduce this issue on my side, and if everything goes smoothly, I'll try a fix.
Hi @slzmruepp, I've done some study on AKS's permissions. The role you've assigned to your sp2 is:

```json
{
  "id": "/providers/Microsoft.Authorization/roleDefinitions/3498e952-d568-435e-9b2c-8d77e338d7f7",
  "properties": {
    "roleName": "Azure Kubernetes Service RBAC Admin",
    "description": "Lets you manage all resources under cluster/namespace, except update or delete resource quotas and namespaces.",
    "assignableScopes": [
      "/"
    ],
    "permissions": [
      {
        "actions": [
          "Microsoft.Authorization/*/read",
          "Microsoft.Resources/subscriptions/operationresults/read",
          "Microsoft.Resources/subscriptions/read",
          "Microsoft.Resources/subscriptions/resourceGroups/read",
          "Microsoft.ContainerService/managedClusters/listClusterUserCredential/action"
        ],
        "notActions": [],
        "dataActions": [
          "Microsoft.ContainerService/managedClusters/*"
        ],
        "notDataActions": [
          "Microsoft.ContainerService/managedClusters/resourcequotas/write",
          "Microsoft.ContainerService/managedClusters/resourcequotas/delete",
          "Microsoft.ContainerService/managedClusters/namespaces/write",
          "Microsoft.ContainerService/managedClusters/namespaces/delete"
        ]
      }
    ]
  }
}
```

I've tried to reproduce this issue on my side, but I got an error message. I used the following code to reproduce this issue; would you please help me by correcting my code so I can reproduce the error? Thanks:

```hcl
variable "client_id" {
  default = ""
}
variable "client_secret" {
  default = ""
}
variable "subscription_id" {
  default = ""
}
variable "tenant_id" {
  default = ""
}

provider "azurerm" {
  features {}
  client_id       = var.client_id
  client_secret   = var.client_secret
  subscription_id = var.subscription_id
  tenant_id       = var.tenant_id
}

provider "azuread" {
  client_id     = var.client_id
  client_secret = var.client_secret
  tenant_id     = var.tenant_id
}

resource "azurerm_resource_group" "example" {
  name     = "f-20843-1"
  location = "West Europe"
}

resource "azurerm_kubernetes_cluster" "example" {
  name                              = "20843-aks1"
  location                          = azurerm_resource_group.example.location
  resource_group_name               = azurerm_resource_group.example.name
  dns_prefix                        = "exampleaks1"
  role_based_access_control_enabled = true

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2_v2"
  }

  identity {
    type = "SystemAssigned"
  }
}

resource "azuread_application" "temp_aks" {
  display_name = "temp_aks1"
}

resource "azuread_application_password" "example" {
  application_object_id = azuread_application.temp_aks.object_id
}

resource "azuread_service_principal" "sp" {
  application_id = azuread_application.temp_aks.application_id
}

provider "kubernetes" {
  host                   = azurerm_kubernetes_cluster.example.kube_config.0.host
  client_certificate     = base64decode(azurerm_kubernetes_cluster.example.kube_config.0.client_certificate)
  client_key             = base64decode(azurerm_kubernetes_cluster.example.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.example.kube_config.0.cluster_ca_certificate)
}

resource "kubernetes_namespace" "test1" {
  metadata {
    name = "test1"
  }
}

resource "azurerm_role_assignment" "binding" {
  principal_id         = azuread_service_principal.sp.object_id
  scope                = "${azurerm_kubernetes_cluster.example.id}/namespaces/test1"
  role_definition_name = "Azure Kubernetes Service RBAC Admin"
  depends_on           = [kubernetes_namespace.test1]
}

output "azuread_app_id" {
  value = azuread_application.temp_aks.application_id
}

output "azuread_application_password" {
  sensitive = true
  value     = azuread_application_password.example.value
}
```
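The "Azure Kubernetes Service RBAC Admin" definition quoted above appears consistent with the reported 403. Its `actions` include `.../listClusterUserCredential/action` but nothing that would cover the access-profile endpoint. Its broad `Microsoft.ContainerService/managedClusters/*` entry sits under `dataActions`, which apply only to the Kubernetes data plane.

The following Go sketch checks this mechanically. It embeds the control-plane part of that JSON and treats `*` as matching any run of characters, case-insensitively. That matching rule is a simplification assumed here, not Azure's exact algorithm. The sketch also assumes the old endpoint is guarded by `Microsoft.ContainerService/managedClusters/accessProfiles/listCredential/action`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// rbacAdminJSON is the control-plane part of the role definition quoted above.
// dataActions are omitted: they cover the Kubernetes data plane and never grant ARM actions.
const rbacAdminJSON = `{
  "properties": {
    "roleName": "Azure Kubernetes Service RBAC Admin",
    "permissions": [{
      "actions": [
        "Microsoft.Authorization/*/read",
        "Microsoft.Resources/subscriptions/operationresults/read",
        "Microsoft.Resources/subscriptions/read",
        "Microsoft.Resources/subscriptions/resourceGroups/read",
        "Microsoft.ContainerService/managedClusters/listClusterUserCredential/action"
      ],
      "notActions": []
    }]
  }
}`

type permission struct {
	Actions    []string `json:"actions"`
	NotActions []string `json:"notActions"`
}

type roleDefinition struct {
	Properties struct {
		RoleName    string       `json:"roleName"`
		Permissions []permission `json:"permissions"`
	} `json:"properties"`
}

// loadRole parses the embedded role definition.
func loadRole() roleDefinition {
	var r roleDefinition
	if err := json.Unmarshal([]byte(rbacAdminJSON), &r); err != nil {
		panic(err)
	}
	return r
}

// matches reports whether action matches pattern; '*' matches any run of
// characters (including '/'), and the comparison is case-insensitive.
func matches(pattern, action string) bool {
	pattern, action = strings.ToLower(pattern), strings.ToLower(action)
	parts := strings.Split(pattern, "*")
	if len(parts) == 1 {
		return pattern == action
	}
	if !strings.HasPrefix(action, parts[0]) {
		return false
	}
	rest := action[len(parts[0]):]
	for _, p := range parts[1 : len(parts)-1] {
		i := strings.Index(rest, p)
		if i < 0 {
			return false
		}
		rest = rest[i+len(p):]
	}
	return strings.HasSuffix(rest, parts[len(parts)-1])
}

// allowed reports whether some permission entry grants the action via Actions
// without it being removed again by NotActions.
func allowed(r roleDefinition, action string) bool {
	for _, p := range r.Properties.Permissions {
		granted := false
		for _, a := range p.Actions {
			if matches(a, action) {
				granted = true
				break
			}
		}
		for _, n := range p.NotActions {
			if matches(n, action) {
				granted = false
			}
		}
		if granted {
			return true
		}
	}
	return false
}

func main() {
	r := loadRole()
	for _, action := range []string{
		"Microsoft.ContainerService/managedClusters/listClusterUserCredential/action",
		"Microsoft.ContainerService/managedClusters/accessProfiles/listCredential/action", // assumed guard of the old endpoint
	} {
		fmt.Printf("%s: allowed=%v\n", action, allowed(r, action))
	}
}
```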
Hi, yes, it is in the description, but I did not copy it into the tf code. I wrote in the Expected Behaviour section: "the sp-2 of the tf context has Kubernetes User Role, which should allow it to download the certs and authenticate for acting on the specific namespace." So sp-2 has also been assigned the following role (via a security group the SP is a member of):
According to the documentation, the User Role should allow sp-2 to fetch the cluster config (kubeconfig). According to @browley86, this does not work with the following (soon-to-be-deprecated) API: https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ContainerService/managedClusters/{resourceName}/accessProfiles/{roleName}/listCredential Here is the code that tests this assumption:
Unfortunately, terraform-provider-azurerm uses the former API, which does not respect the IAM roles assigned here.
Thanks @slzmruepp for your detailed explanation. I'll try a PR to switch the API, but I cannot promise a timeline.
I added the missing pieces to the comment. Basically, we create a group, add the service principal as a member, and assign the group the two roles, Kubernetes User and Kubernetes RBAC Admin, for the namespace. We expect the roles to be inherited by the group members. (The group would also act as a break-glass group, so human users can be added to it when kubectl interactions are necessary.)
Hi @lonegunmanb, instead of switching the backend API for the existing resource, it might be easier, faster, and less risky to just add a parameter in the provider config that specifies the API backend. That way, for now, the default would be to keep the existing working
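The suggestion above is essentially a feature flag with a backwards-compatible default. Here is a minimal Go sketch of that idea. `credentialBackend` and its values are hypothetical names, not a real azurerm provider option, and the fix that eventually shipped may work differently. An empty value falls back to the existing access-profile call, and an opt-in value selects the newer endpoint.

```go
package main

import "fmt"

// credentialBackend selects which ARM API would be used to fetch kubeconfig
// credentials. Hypothetical setting, for illustration only.
type credentialBackend string

const (
	backendAccessProfile   credentialBackend = "accessProfile"   // existing behaviour (default)
	backendListClusterUser credentialBackend = "listClusterUser" // newer ListClusterUserCredentials API
)

// credentialSuffix returns the path appended to the managed cluster ID for the
// chosen backend; an empty value keeps the existing behaviour.
func credentialSuffix(b credentialBackend, role string) (string, error) {
	switch b {
	case "", backendAccessProfile:
		return "/accessProfiles/" + role + "/listCredential", nil
	case backendListClusterUser:
		return "/listClusterUserCredential", nil
	default:
		return "", fmt.Errorf("unknown credential backend %q", b)
	}
}

func main() {
	for _, b := range []credentialBackend{"", backendListClusterUser, "bogus"} {
		suffix, err := credentialSuffix(b, "clusterUser")
		fmt.Printf("backend=%q suffix=%q err=%v\n", b, suffix, err)
	}
}
```

Rejecting unknown values, rather than silently falling back, keeps a typo in the setting from quietly reverting to the old API.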
This functionality has been released in v3.49.0 of the Terraform Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. |
Terraform Version
1.3.7
AzureRM Provider Version
3.45.0
Affected Resource(s)/Data Source(s)
data.azurerm_kubernetes_cluster
Terraform Configuration Files
Debug Output/Panic Output
Expected Behaviour
What should have happened?
We want sp-2, which has limited permissions, to be able to see and manage only the project namespace for which it already has RBAC Admin rights, and to deploy Kubernetes objects through Terraform only to this namespace.
We want the provider configuration to work as documented: the sp-2 of the tf context has the Kubernetes User Role, which should allow it to download the certs and authenticate for acting on the specific namespace.
Actual Behaviour
What actually happened?
Although sp-2 has the appropriate roles, verified by using az aks commands to download the kubeconfig and kubectl commands to act on the specific namespace it has the RBAC Admin role for, the kubernetes provider fails with a 403 error.
Steps to Reproduce
(This allows sp-2 to do everything in its namespace: kubectl get all -n var.aks_proj_ns works, kubectl get all does not.)
This was tested by running az login as sp-2 and executing kubectl commands in Azure Pipelines; it works.
If I grant sp-2 the Contributor role on the AKS resource group, it works without error, but if we then do:
we get the error kubernetes_namespace.example unauthenticated (or similar)
Only if we then change the provider setup to the following:
everything works as expected. But then we have granted the project sp-2, which should have limited permissions, Contributor rights on the AKS resource group (which is a no-go) and also RBAC Admin on the cluster. I don't even know where the latter comes from; I only suspect that it is inherited from the Contributor role on the resource group.
Important Factoids
This issue was already filed against the kubernetes provider, but I was told to file it here because the bug originates in the azurerm provider.
References
Linked Issue: #issue-1551239910