
GKE autopilot is always created with default service account II #9505


@tSte

tSte commented Jul 5, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

This is a duplicate of #8918 (see #8918 (comment)). Sorry for creating this, but I don't seem to have rights to re-open the original issue, and there doesn't seem to be any activity there.

@tSte tSte added the bug label Jul 5, 2021
@tSte
Author

tSte commented Aug 11, 2021

@slevenick is there any update on this?

@slevenick
Collaborator

I'm not sure how to proceed with this. This bug is due to a weird interaction between autopilot & the default service account field.

Basically, the API is not respecting the service account that is sent in the request. I'm not sure how gcloud is successfully setting up the autopilot cluster with a non-default service account. Can you capture the HTTP requests to see if that is happening in a single request, or if there is a later update that applies the service account?
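
For example (a rough sketch, not an exact repro; substitute your own project, region, and service account), gcloud can dump the raw requests with the global --log-http flag:

gcloud container clusters create-auto my-cluster \
    --region europe-west3 \
    --service-account my-gke-sa@my-project.iam.gserviceaccount.com \
    --log-http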

@lrk

lrk commented Sep 6, 2021

Hi, I ran into the same problem.

@slevenick Is there any update on this subject?

Best regards.

@tSte
Author

tSte commented Sep 14, 2021

Sorry for the late answer @slevenick - I was on vacation...

I executed this:

gcloud container --project "hmplayground" clusters create-auto "my-cluster" \
    --region "europe-west3" \
    --release-channel "regular" \
    --network "projects/hmplayground/global/networks/my-vpc" \
    --subnetwork "projects/hmplayground/regions/europe-west3/subnetworks/my-subnet" \
    --cluster-secondary-range-name="my-pods" \
    --services-secondary-range-name="my-services" \
    --enable-master-authorized-networks \
    --enable-private-nodes \
    --master-ipv4-cidr="172.16.0.16/28" \
    --service-account="my-gke-sa@hmplayground.iam.gserviceaccount.com" \
    --scopes="logging-write,monitoring,storage-ro" \
    --log-http

This is the request:

==== request start ====
uri: https://container.googleapis.com/v1/projects/hmplayground/locations/europe-west3/clusters?alt=json
method: POST
== headers start ==
b'X-Goog-User-Project': b'hmplayground'
b'accept': b'application/json'
b'accept-encoding': b'gzip, deflate'
b'authorization': --- Token Redacted ---
b'content-length': b'926'
b'content-type': b'application/json'
b'user-agent': b'google-cloud-sdk gcloud/344.0.0 command/gcloud.container.clusters.create-auto invocation-id/9db76483e82c490f9d34ad2fdffeda72 environment/None environment-version/None interactive/True from-script/False python/3.9.7 term/xterm-256color (Linux 5.13.13)'
== headers end ==
== body start ==
{"cluster": {"autopilot": {"enabled": true}, "ipAllocationPolicy": {"clusterSecondaryRangeName": "my-pods", "createSubnetwork": false, "servicesSecondaryRangeName": "my-services", "useIpAliases": true}, "masterAuthorizedNetworksConfig": {"enabled": true}, "name": "my-cluster", "network": "projects/hmplayground/global/networks/my-vpc", "nodePools": [{"config": {"oauthScopes": ["https://www.googleapis.com/auth/devstorage.read_only", "https://www.googleapis.com/auth/logging.write", "https://www.googleapis.com/auth/monitoring"], "serviceAccount": "my-gke-sa@hmplayground.iam.gserviceaccount.com"}, "initialNodeCount": 1, "name": "default-pool"}], "privateClusterConfig": {"enablePrivateNodes": true, "masterIpv4CidrBlock": "172.16.0.16/28"}, "releaseChannel": {"channel": "REGULAR"}, "subnetwork": "projects/hmplayground/regions/europe-west3/subnetworks/my-subnet"}, "parent": "projects/hmplayground/locations/europe-west3"}
== body end ==
==== request end ====
---- response start ----
status: 200
-- headers start --
-content-encoding: gzip
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
cache-control: private
content-length: 446
content-type: application/json; charset=UTF-8
date: Tue, 14 Sep 2021 14:03:39 GMT
server: ESF
transfer-encoding: chunked
vary: Origin, X-Origin, Referer
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-xss-protection: 0
-- headers end --
-- body start --
{
  "name": "operation-1631628219731-15754d1b",
  "zone": "europe-west3",
  "operationType": "CREATE_CLUSTER",
  "status": "RUNNING",
  "selfLink": "https://container.googleapis.com/v1/projects/306799302406/locations/europe-west3/operations/operation-1631628219731-15754d1b",
  "targetLink": "https://container.googleapis.com/v1/projects/306799302406/locations/europe-west3/clusters/my-cluster",
  "startTime": "2021-09-14T14:03:39.731893675Z"
}

-- body end --
total round trip time (request+response): 4.417 secs
---- response end ----

@cvega77

cvega77 commented Oct 25, 2021

Hi, I ran into the same issue: I'm not able to assign a custom service account to an Autopilot GKE cluster with Terraform v1.0.1.

@slevenick Is there any update on this subject?

Regards,
C.

@ngarv

ngarv commented Nov 24, 2021

Hi,
Any update on this bug?
I need to create an Autopilot cluster with a custom service account.
With the gcloud command it works.
I understand that the API call made by Terraform is different from the one made by gcloud; is that right?
With the latest Terraform version I still have this issue.
Regards

@tSte
Author

tSte commented Dec 1, 2021

@nilsoulinou I created the GKE cluster via the gcloud CLI and then imported it into the Terraform configuration. This works.
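
For reference, the import step looks roughly like this (a sketch with hypothetical resource, project, and cluster names; adjust to your own config):

terraform import google_container_cluster.my_cluster \
    projects/my-project/locations/europe-west3/clusters/my-cluster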

@venkykuberan @slevenick is this still considered active?

@lrk

lrk commented Dec 6, 2021

@tSte are you saying that you can't create a GKE cluster in Autopilot mode with a non-default service account directly with the Google provider, and that you have to create it with the gcloud command and then import it with Terraform?

If so, I think this issue is still active, because I expect this to be doable with Terraform alone, without manual steps.

@tSte
Author

tSte commented Dec 6, 2021

@lrk you're right - all of our clusters are currently created via the gcloud CLI and then imported and managed via TF.

@sandy-0007

sandy-0007 commented Jan 27, 2022

Are there any updates on this thread, i.e. on the ability to use a non-default SA to provision an Autopilot GKE cluster?

@cagataygurturk
Contributor

cagataygurturk commented Feb 14, 2022

The issue occurs because Terraform uses a deprecated field to set the service account, while the API no longer respects this field when the cluster type is Autopilot.

The following payload to the API will create the cluster successfully:

{
    "cluster": {
        "autopilot": {
            "enabled": true
        },
        "binaryAuthorization": {
            "enabled": false
        },
        "ipAllocationPolicy": {
            "clusterSecondaryRangeName": "cluster-1",
            "servicesSecondaryRangeName": "service-1",
            "useIpAliases": true
        },
        "legacyAbac": {
            "enabled": false
        },
        "maintenancePolicy": {
            "window": {}
        },
        "masterAuthorizedNetworksConfig": {
            "cidrBlocks": [
                {
                    "cidrBlock": "172.16.0.0/16"
                }
            ],
            "enabled": true
        },
        "name": "gke-cluster",
        "network": "projects/network-host-0372/global/networks/production",
        "networkConfig": {
            "datapathProvider": "ADVANCED_DATAPATH",
            "enableIntraNodeVisibility": true
        },
        "nodePools":[
         {
            "config":{
               "oauthScopes":[
                  "https://www.googleapis.com/auth/devstorage.read_only",
                  "https://www.googleapis.com/auth/logging.write",
                  "https://www.googleapis.com/auth/monitoring"
               ],
               "serviceAccount":"gke-cluster@nonprod-2c64.iam.gserviceaccount.com"
            },
            "initialNodeCount":1,
            "name":"default-pool"
         }
        ],
        "privateClusterConfig": {
            "enablePrivateEndpoint": true,
            "enablePrivateNodes": true,
            "masterGlobalAccessConfig": {
                "enabled": true
            },
            "masterIpv4CidrBlock": "10.128.65.0/28"
        },
        "shieldedNodes": {
            "enabled": true
        },
        "subnetwork": "projects/network-host-0372/regions/europe-west3/subnetworks/node-1"
    }
}

However, Terraform generates the following payload:

{
 "cluster": {
  "autopilot": {
   "enabled": true
  },
  "binaryAuthorization": {
   "enabled": false
  },
  "ipAllocationPolicy": {
   "clusterSecondaryRangeName": "cluster-1",
   "servicesSecondaryRangeName": "service-1",
   "useIpAliases": true
  },
  "legacyAbac": {
   "enabled": false
  },
  "maintenancePolicy": {
   "window": {}
  },
  "masterAuthorizedNetworksConfig": {
   "cidrBlocks": [
    {
     "cidrBlock": "172.16.0.0/16"
    }
   ],
   "enabled": true
  },
  "name": "gke-cluster",
  "network": "projects/network-host-0372/global/networks/production",
  "networkConfig": {
   "datapathProvider": "ADVANCED_DATAPATH",
   "enableIntraNodeVisibility": true
  },
  "nodeConfig": {
   "oauthScopes": [
    "https://www.googleapis.com/auth/monitoring",
    "https://www.googleapis.com/auth/devstorage.read_only",
    "https://www.googleapis.com/auth/logging.write"
   ],
   "serviceAccount": "gke-cluster@nonprod-2c64.iam.gserviceaccount.com"
  },
  "privateClusterConfig": {
   "enablePrivateEndpoint": true,
   "enablePrivateNodes": true,
   "masterGlobalAccessConfig": {
    "enabled": true
   },
   "masterIpv4CidrBlock": "10.128.65.0/28"
  },
  "shieldedNodes": {
   "enabled": true
  },
  "subnetwork": "projects/network-host-0372/regions/europe-west3/subnetworks/node-1"
 }
}

The difference between the two is that the former uses the nodePools[].config property, while the latter (generated by Terraform) uses the top-level nodeConfig property, which is already deprecated. Apparently Autopilot does not recognise the deprecated property, although this is not documented.

Perhaps the Terraform provider should move away from the deprecated property to avoid not only this issue but also any future ones, @slevenick. There is already a TODO item here for that :)

@cagataygurturk
Contributor

Thinking about this a little bit more, I believe the API should not simply ignore the field even though it is deprecated. I have also created an issue at https://issuetracker.google.com/issues/219237911. Impacted people may consider starring it.

@rileykarson
Collaborator

@slevenick: Updating assignment because I think this has gone inactive, please correct this if you're still working on it!

Perhaps the Terraform provider should move away from the deprecated property to avoid not only this issue but also any future ones, @slevenick. There is already a TODO item here for that :)

The TODO in that file was for another tool that the MM generator used to be used for; Terraform's implementation is handwritten. #7185 and #4963 (roughly) track potential removal of the field. We haven't gone forward with it because of the projected impact (requiring users to rewrite configs, and recreating their clusters if they get it wrong) and the lack of signal from the API that they'll actually remove the field.

The API respecting the service account in one case and not the other is confusing and frustrating, as both of those messages should have created the same cluster; thanks for filing upstream. Luckily, I think there's a workaround in the provider today, as you should be able to create clusters with node_pools set (see the sketch below). We're passing the message directly on to the API, and the transformation to config highlighted in #4963 (comment) should make it possible to produce a working payload.
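
For illustration, a minimal sketch of that workaround (hypothetical names, untested, and subject to the caveats above). Defining the pool explicitly should make the provider send nodePools[].config instead of the deprecated top-level nodeConfig:

resource "google_container_cluster" "autopilot" {
  name             = "my-cluster"
  location         = "europe-west3"
  enable_autopilot = true

  node_pool {
    name               = "default-pool"
    initial_node_count = 1
    node_config {
      # Non-default service account for the nodes
      service_account = "my-gke-sa@my-project.iam.gserviceaccount.com"
      oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
    }
  }
}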

@cagataygurturk
Contributor

Hi all, the underlying API issue seems to be resolved, according to this comment:

https://issuetracker.google.com/issues/219237911#comment3

If someone can confirm that this also fixed the issue on the Terraform side, then this one can be closed.

@ngarv

ngarv commented Mar 24, 2022

Hi all,
I'm new to this community. It seems that the bug has been fixed. Could you tell me which Terraform release or Google provider version to use in order to test a custom SA for GKE Autopilot?

Regards

Nils

@ngarv

ngarv commented Mar 29, 2022

Hi,
I still have the default service account attached to the GKE cluster with these versions:

Terraform v1.1.7
on linux_amd64

  • provider registry.terraform.io/hashicorp/google v4.15.0
  • provider registry.terraform.io/hashicorp/google-beta v4.15.0

with the following terraform block:

resource "google_container_cluster" "private" {
 name                     = "XXXXX"
 location                 = var.region

 network                  = google_compute_network.xxxx.id
 subnetwork               = google_compute_subnetwork.xxxx.id

 node_config {
   service_account = google_service_account.yyy.email
   oauth_scopes    = [
     "https://www.googleapis.com/auth/cloud-platform"
   ]
 }

 private_cluster_config {
   enable_private_endpoint = true
   enable_private_nodes    = true
   master_ipv4_cidr_block  = "XXX.XXX.XXX.XXX/28"
 }

 master_authorized_networks_config {
   cidr_blocks {
     cidr_block = "XXX.XXX.XXX.XXX/24"
     display_name = "xxxx" 
   }
   cidr_blocks {
     cidr_block = "XXX.XXX.XXX.XXX/16"
     display_name = "xxxx" 
   }
 }

 # Enable Autopilot for this cluster
 enable_autopilot = true

 vertical_pod_autoscaling {
   enabled = true
 }
 # Configuration of cluster IP allocation for VPC-native clusters
 ip_allocation_policy {
   cluster_ipv4_cidr_block  = "XXX.XXX.XXX.XXX/16"
   services_ipv4_cidr_block = "XXX.XXX.XXX.XXX/24"
 }

 # Configuration options for the Release channel feature, which provide more control over automatic upgrades of your GKE clusters.
 release_channel {
   channel = "REGULAR"
 }
}

Do you need any additional information?

Nils

@cagataygurturk
Contributor

If you feel the issue was not fixed, please drop a comment to https://issuetracker.google.com/issues/219237911#comment3

@deekthesqueak

deekthesqueak commented May 30, 2022

I've recently run into this issue myself. Below are my findings.

Terraform v1.1.5
on darwin_amd64

  • provider registry.terraform.io/hashicorp/external v2.2.2
  • provider registry.terraform.io/hashicorp/google v4.22.0
  • provider registry.terraform.io/hashicorp/google-beta v4.22.0
  • provider registry.terraform.io/hashicorp/kubernetes v2.11.0
  • provider registry.terraform.io/hashicorp/null v3.1.1
  • provider registry.terraform.io/hashicorp/random v3.2.0

Like in #9505 (comment), I noticed that the payload being generated for a new Autopilot cluster was the following:

POST /v1beta1/projects/{project_id}/locations/us-west1/clusters?alt=json&prettyPrint=false HTTP/1.1
Host: container.googleapis.com
...

{
 "cluster": {
  "addonsConfig": {
   "horizontalPodAutoscaling": {
    "disabled": false
   },
   "httpLoadBalancing": {
    "disabled": false
   }
  },
  "autopilot": {
   "enabled": true
  },
  "binaryAuthorization": {
   "enabled": false
  },
  "ipAllocationPolicy": {
   "clusterSecondaryRangeName": "network-pods",
   "servicesSecondaryRangeName": "network-services",
   "useIpAliases": true
  },
  "legacyAbac": {
   "enabled": false
  },
  "locations": [
   "us-west1-a",
   "us-west1-b",
   "us-west1-c"
  ],
  "loggingService": "logging.googleapis.com/kubernetes",
  "maintenancePolicy": {
   "window": {
    "dailyMaintenanceWindow": {
     "startTime": "05:00"
    }
   }
  },
  "masterAuth": {
   "clientCertificateConfig": {}
  },
  "masterAuthorizedNetworksConfig": {},
  "monitoringService": "monitoring.googleapis.com/kubernetes",
  "name": "us-west1-dev-autopilot-test",
  "network": "projects/{project_id}/global/networks/anthos-network",
  "networkConfig": {
   "defaultSnatStatus": {
    "disabled": false
   },
   "enableIntraNodeVisibility": true
  },
  "nodeConfig": {
   "oauthScopes": [
    "https://www.googleapis.com/auth/devstorage.read_only",
    "https://www.googleapis.com/auth/logging.write",
    "https://www.googleapis.com/auth/monitoring",
    "https://www.googleapis.com/auth/service.management.readonly",
    "https://www.googleapis.com/auth/servicecontrol",
    "https://www.googleapis.com/auth/trace.append"
   ]
  },
  "notificationConfig": {
   "pubsub": {}
  },
  "releaseChannel": {
   "channel": "REGULAR"
  },
  "shieldedNodes": {
   "enabled": true
  },
  "subnetwork": "projects/{project_id}/regions/us-west1/subnetworks/anthos-subnet",
  "verticalPodAutoscaling": {
   "enabled": true
  }
 }
}

Looking at the documentation for creating a cluster at [1], it lists the command to use as:

gcloud container clusters create-auto CLUSTER_NAME \
    --region REGION \
    --project=PROJECT_ID

So that means the Terraform provider is using the default cluster creation API [2], which doesn't list any flags to specify Autopilot, when it should be using [3] instead. I've verified that the following command will create an Autopilot cluster with the correct service account:

gcloud container --project {project_id} clusters create-auto autopilot-test \
--region=us-west1 \
--release-channel=regular \
--service-account=cluster-admin@{project_id}.iam.gserviceaccount.com \
--network=test-network \
--subnetwork=test-subnet \
--cluster-secondary-range-name=network-pods \
--services-secondary-range-name=network-services 

While I see that there is discussion of a deprecation at [4], it seems like a quicker solution may be to use the API specified in [3], which currently works.

[1] https://cloud.google.com/kubernetes-engine/docs/how-to/creating-an-autopilot-cluster#gcloud
[2] https://cloud.google.com/sdk/gcloud/reference/container/clusters/create
[3] https://cloud.google.com/sdk/gcloud/reference/container/clusters/create-auto
[4] https://issuetracker.google.com/issues/219237911?pli=1

modular-magician added a commit to GoogleCloudPlatform/terraform-validator that referenced this issue Nov 12, 2022
fixes hashicorp/terraform-provider-google#9505

Signed-off-by: Modular Magician <magic-modules@google.com>
@mgoodness

mgoodness commented Nov 16, 2022

I don't think this is fixed. I've built the provider with #13024 and am trying to provision an Autopilot cluster. We'd previously deleted the default GCE SA from the project entirely, and we get:

Error: googleapi: Error 400: Service account "-compute@developer.gserviceaccount.com" does not exist., badRequest

even when specifying a custom SA.

@shuyama1
Collaborator

shuyama1 commented Nov 16, 2022

Hey @mgoodness! Would you mind providing the Terraform config so that we can use it to reproduce the error, and also the debug log if possible?
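
For reference, one way to capture a debug log (a minimal sketch using Terraform's standard logging environment variables):

TF_LOG=DEBUG TF_LOG_PATH=./terraform-debug.log terraform apply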

@mgoodness

Hey @mgoodness! Would you mind providing the Terraform config so that we can use it to reproduce the error, and also the debug log if possible?

Sure thing! Let me know if there's anything else I can try/share.

Archive.zip

@JeremyOT

Thanks @mgoodness - looking into this, I don't think it's related to the TF implementation. There may be a dependency issue at initialization. If the default SA exists, you should still see your workloads scheduled on nodes using your provided SA, but we may still be looking for the default anyway.

@mgoodness

@JeremyOT So more of a GCP API issue? Worth opening a ticket with them...somewhere?

I should note that we are able to provision non-Autopilot clusters using (essentially) the same TF config, even with the missing default SA. Seems to only be an issue with AP.

@johanneswuerbach

johanneswuerbach commented Nov 21, 2022

FYI: This fix was just released as part of https://github.com/hashicorp/terraform-provider-google/releases/tag/v4.44.0 🎉

Sadly, the following tf config is now accepted, but returns an API error:

resource "google_container_cluster" "my_autopilot_cluster" {
  ## other config

  enable_autopilot = true

  cluster_autoscaling {
    auto_provisioning_defaults {
      service_account = google_service_account.my_account.email
    }
  }
}

googleapi: Error 400: Overriding Autopilot autoscaling settings is not allowed.

Is the config wrong?

@kev-in047

kev-in047 commented Nov 25, 2022

@johanneswuerbach I'm getting the same error as you, with a similar config. Did you manage to find a fix?

@johanneswuerbach

Sadly not, I think this needs to be reopened @shuyama1

@JeremyOT

This appears to be a server-side issue. The TF config passes the correct parameters to the backend, but the initial nodes created at bootstrapping still use the default SA. New nodes created as workloads are added do use the supplied SA, but this is preventing proper startup when the default SA is deleted. A fix is in progress.

@diegosucaria

diegosucaria commented Dec 2, 2022

Sadly not, I think this needs to be reopened @shuyama1

+1 here. Using beta-autopilot-private-cluster, I added the block:

cluster_autoscaling {
  auto_provisioning_defaults {
    service_account = google_service_account.my_account.email
  }
}

I re-deployed the cluster and the new nodes are still being created with the default service account.
Using provider beta 4.44.1

@JeremyOT

JeremyOT commented Dec 2, 2022

new nodes are still being created with the default service account.

@diegosucaria (nice pilot pic :) ) this is what I was referring to in my comment above. If you deploy workloads and additional nodes are created, they should use your supplied SA - I just verified with both google and google-beta at 4.44.1.

@TrieBr

TrieBr commented Dec 5, 2022

@diegosucaria I think that is a problem in the module. I filed terraform-google-modules/terraform-google-kubernetes-engine#1488

It seems like their module config (cluster.tf) doesn't even set the service account anywhere.

@diegosucaria

Yes, that is correct. I had to make a local copy and add the cluster_autoscaling block in cluster.tf manually.

I still cannot get the new nodes (the new nodes for my workloads) to use the non-default service account.

It is not a critical problem, but it goes against the best practices we recommend.

@bharathkkb

@diegosucaria If you have bandwidth, happy to review a PR fixing this in the module too. Some context in terraform-google-modules/terraform-google-kubernetes-engine#1488

@shuyama1 shuyama1 reopened this Dec 6, 2022
@shuyama1 shuyama1 closed this as completed Dec 6, 2022
@shuyama1
Collaborator

shuyama1 commented Dec 6, 2022

Looks like it's a module issue now, and a ticket has been filed against https://github.com/terraform-google-modules/terraform-google-kubernetes-engine. I'm therefore closing this issue (after reopening it). Sorry for the confusion. Please let me know if the issue still occurs in the provider and this issue needs to be reopened.

@johanneswuerbach

johanneswuerbach commented Dec 6, 2022

I don't understand how this resolves #9505 (comment).

Does it mean that you can't change the service account of an existing cluster? Shouldn't this parameter be set to require a recreation in this case?

@rileykarson
Collaborator

rileykarson commented Dec 6, 2022

Hey all! This is a little messy, so I checked with @JeremyOT to summarise what's up with this issue.

tl;dr: Specifying a service account through cluster_autoscaling.auto_provisioning_defaults.service_account should work for the default node pool, but currently doesn't. We're waiting on a fix on the server side, at which point provider versions v4.44.0+ should be able to configure this successfully.

The API currently supports specifying service accounts through a few places, and one of those (nodePools[].config) is what currently works for gcloud in https://cloud.google.com/kubernetes-engine/docs/how-to/creating-an-autopilot-cluster#create_an_autopilot_cluster to provision with a custom account. The top-level nodeConfig field, which manages the default pool, does not work, however, as it's deprecated in the API and support was not added for it in Autopilot.

The method that was unblocked in GoogleCloudPlatform/magic-modules#6733 (& released in v4.44.0) is the path that's recommended by the GKE product team, corresponding to the following Terraform configuration:

resource "google_container_cluster" "my_autopilot_cluster" {
  name = "my-autopilot-cluster"
  location = "us-central1"
  networking_mode = "VPC_NATIVE"
  ip_allocation_policy {}

  enable_autopilot = true

  cluster_autoscaling {
    auto_provisioning_defaults {
      service_account = "my-service-account@my-project.iam.gserviceaccount.com"
    }
  }

As raised in #9505 (comment), it was discovered that there was an issue with the server-side implementation: the default node pool continues to be created with the default SA. I can doubly confirm a fix is in progress, but can't speak to an exact timeline in this thread (sorry!).

The method that gcloud uses to specify the SA today doesn't work in our Terraform provider due to a conflicts rule (used to restrict users from setting Autopilot-managed settings, where malformed configurations risk creating recreation diffs on clusters). When the conflict is removed, Terraform sees a diff between some unrelated fields, meaning there's no extremely simple way to unblock a workaround in the provider.

Between a server-side fix on the way and a workaround requiring a code change + release + provider upgrade (a best case of a week, but more likely two due to release timing), the best path forward appears to be to wait for the server-side fix to roll out.

@TrieBr

TrieBr commented Dec 6, 2022

Just wanted to comment here since I made a previous comment claiming that it wasn't working (that I since deleted).

I'm using the following in my Terraform config and, as far as I can tell, it is working as expected. I'm not sure how to 100% confirm which service account is actually being used on the Autopilot nodes (they seem hidden from the GCP console UI; see the sketch at the end of this comment), but I was dealing with some permission issues which didn't resolve until I added my-service-account@my-project.iam.gserviceaccount.com to a role (and not the default compute service account), so it seems to be working.

The only downside is that I think you still have to keep the default compute service account active, since Autopilot requires it in other ways. As JeremyOT mentioned, I think you need to make sure the nodes scale to zero, or rebuild your cluster, since only new nodes will be created with the correct service account.

enable_autopilot = true

cluster_autoscaling {
  auto_provisioning_defaults {
    service_account = "my-service-account@my-project.iam.gserviceaccount.com"
  }
}
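
One way to check which SA a node pool actually uses (a hedged sketch with hypothetical cluster and pool names; Autopilot-managed pool names may differ):

gcloud container node-pools describe default-pool \
    --cluster my-autopilot-cluster \
    --region us-central1 \
    --format "value(config.serviceAccount)"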

@rileykarson
Collaborator

rileykarson commented Dec 6, 2022

Ah, I lost a qualifier at some point when writing my message; I'll edit it back in. My understanding is that the default node pool doesn't respect the setting, but future ones do at the moment. I believe that's the case preventing you from using Autopilot with the account removed. Once the server-side fix rolls out, the default node pool should respect the setting as well.

@github-actions

github-actions bot commented Jan 6, 2023

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 6, 2023
@rileykarson
Collaborator

Note: This should have rolled out fully by now, and a configuration like the following will apply to all nodes, including the default node pool:

cluster_autoscaling {
  auto_provisioning_defaults {
    service_account = "my-service-account@my-project.iam.gserviceaccount.com"
  }
}
