Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

googleapi: Error 400: Invalid request: Failed to delete user root. #3820

Closed
shaulishaked opened this issue Jun 10, 2019 · 24 comments
Closed
Labels
persistent-bug Hard to diagnose or long lived bugs for which resolutions are more like feature work than bug work

Comments

@shaulishaked
Copy link

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.11.13

  • provider.google v2.7.0
  • provider.google-beta v2.7.0

Affected Resource(s)

  • google_sql_user

Terraform Configuration Files

resource "google_sql_database_instance" "<name>-db-instance" {
  name = "<name>-db-instance"
  region = "${var.region}"
  database_version = "POSTGRES_9_6"

  settings {
    tier = "db-f1-micro"
...
  }
}

resource "google_sql_database" "<name>-db" {
  name = "<name>-db"
  instance = "${google_sql_database_instance.<name>-db-instance.name}"
}

resource "google_sql_user" "<name>-db-user" {
  instance = "${google_sql_database_instance.<name>-db-instance.name}"
  name = "root"
  password = "${random_string.<name>-db-password.result}"
}

Panic Output

* google_sql_user.<name>-db-user: Error, failed to deleteuser <name>-db-user in instance <name>-db-instance: googleapi: Error 400: Invalid request: Failed to delete user <name>-user. Detail: pq: role "<name>-db-user" cannot be dropped because some objects depend on it

Expected Behavior

Terraform should destroy the user and database successfully on the first run.

Actual Behavior

Database and user aren't deleted on the first run.
Re-run (with the same code) deletes successfully (sometimes require more than one re-run).

Steps to Reproduce

  1. terraform destroy
  • #0000
@ghost ghost added the bug label Jun 10, 2019
@emilymye
Copy link
Contributor

If you can repro and provide debug logs it helps a lot (i.e. TF_LOG=DEBUG terraform apply) but my guess is that the user owns the created database and the error is happening when the user delete call happens before the database deletion on a given run. Could you confirm this from your terraform logs or see if adding a explicit depends_on = ["google_sql_user.<name>-db-user"] to the database fixes the issue?

@shaulishaked
Copy link
Author

I'm trying to re-produce while running with the debug enabled.
Since it's not consistent behavior, it's harder.

(By the way, I tried upgrading Terraform to 0.12.1 and the Google provider, and the same thing happened).

I will attach with debug enabled as soon as it happens again.

Thanks,

@shaulishaked
Copy link
Author

Added logs from Terraform.

I tried to remove all the resources that are not relevant in the log.

Hope this will help.

terraform_log.log

@ghost ghost removed the waiting-response label Jun 16, 2019
@emilymye emilymye self-assigned this Jun 17, 2019
@emilymye
Copy link
Contributor

Hi @shaulishaked, could you actually give the full logs, if possible? The debug logs you gave are missing some of the HTTP requests and logs I might need to debug.

If you could also give the config you are using or which users and databases are linked to which instances, that would be helpful.

@shaulishaked
Copy link
Author

Hello @emilymye,

Could I send you the full logs and the config in private?
I prefer not to share it publicly.

Thanks again,

@shaulishaked
Copy link
Author

Hi,

Any update on this?

I sent the logs I could share, and I can send full logs privately.
Upgrading both Terraform and the Provider didn't help.

We are unable to destroy a setup in one run and require to run Terraform twice, which isn't reliable.

We still get sporadic errors of:

Error: Error, failed to deleteuser root in instance server-db-instance: googleapi: Error 400: Invalid request: Failed to delete user root. Detail: pq: role "root" cannot be dropped because some objects depend on it
., invalid

Please let me know if any additional info can help.

Thanks,

@lbornov2
Copy link

lbornov2 commented Aug 7, 2019

We have the same problem as well.

terraform --version
Terraform v0.12.5

@emilymye
Copy link
Contributor

Hi @shaulishaked - so sorry for the late reply! If you are still encountering this issue, you can send your logs to me at my google.com address (emilyye).

@JlUgia
Copy link

JlUgia commented Oct 10, 2019

I have encountered this issue too. I think this is the result of dependencies at the DB level, getting to be at conflict with others at the infra level that we define ourselves depending on how we layout our configuration files at Terraform.

At the DB level: Database -> User -> Instance, such that users can't be deleted if there are DBs that depend on them (eg.: they own), and users themselves can't be deleted if there are instances with dependencies to users.

At the same time, DBs will conflict at deletion time if there are resources using them (eg.: existing connections). With that, there is a deletion flow that allows for these dependencies to succeed:
Delete Instances connected to the database instance -> Database -> Users -> DB instance

Now, that's all great, unless one is in need of using the IPs for newly created machines to define the authorized nets / fw rules of the SQL instance. In that case, the machines need to be created first, hence deleted last, breaking the flow above.

One way to circumvent that is to ignore users and DBs at destroy time by taking manually these out of the state (terraform state rm <resource>), but the need to pollute a pipeline with custom code like that is far from ideal.

Without getting into more fundamental discussions of who has the responsibility to provision infra and DB schemas, one pattern that I find appealing is to have Terraform take care of the infra, which includes the creation of an implicit root / admin / base user and use SW provisioning technologies (Spinnaker, Ansible, [local-remote]-exec) to take care of the rest. Cloud SQL already includes a default postgres user, though to set its password it is necessary to create an additional google_sql_user resource (that brings back the dependency issues at destroy time mentioned above).

A solution to this would be to add the ability to set the default password for the instance on the google_sql_database_instance resource itself, such that, creating and destroying the admin user and password is tightly bound to the provisioning of the instance, and hence something that happens synchronously in one block.

@shaulishaked
Copy link
Author

@emilymye can what @JlUgia wrote help you? Can you work with that?

I'll try to reproduce again and send you the logs once this happens on my setup again.

Thanks,

@jclarksnps
Copy link

jclarksnps commented Jan 6, 2020

We're having lots of problems with the sql instance relating to the above feedback, basically when I say destroy sql instance, i expect it to be fully torn down (I don't care if there are connections open or if a user owns a database they shouldn't etc.. ) . Destroy means destroy... and from what I see, it currently does not because of the intertwined dependencies. Basically delete sql instance should be equivalent to: gcloud sql instances delete INSTANCE_NAME --quiet

Our config for our modules have the database instance -- users (depend on instance) -- databases (depend on users) FYI. I've seen random failures deleting the sql instance, some have come from a dev creating a new database with the user, and when terraform tries to teardown it fails as the user owns another database. If the database instance is being destroyed, i dont see why it would matter what the user owns :)

@emilymye
Copy link
Contributor

emilymye commented Jan 10, 2020

Sorry this fell on the backlog, I think I still need some clarification before I actually start trying to fix this. I feel like there are multiple issues being presented by @JlUgia and since I'm not firsthand using Cloud SQL, please bear with me.

  1. Postgres will prevent deletion of a user if a database depends on it, so deletion order must be enforced from database --> user. Terraform doesn't inherently have a dependency between user and database, so in some cases it tries to delete the user first and fails. This is solved by adding a dependency for the database on the user (i.e. database depends_on = [ google_sql_user.user ]).

  2. Cloud SQL will prevent deletion of User if specifically the only root for a DBInstance? Because, google_sql_user relies on google_sql_dbinstance, Terraform when building the dependency graph (i.e. something managed by core, we can't change in the provider) forces the deletion of users before instances, but SQL demands you delete the instance (and the user will just get deleted during that). The real issue is then that we force you to create a new root user, i.e. delete the root user on creation of an master instance. This was because up until around summer of 2019, we couldn't set root_password on an instance and would have to modify the user, so rather than do this implicitly we just had the user do it on their own and make it one operation (i.e. apply instead of import post instance-creation). That means, the solution for this problem is to (as @JlUgia suggested) add a root_password field (currently in the beta-provider only) to DB Instance so TFers no longer have to create a separate sql_user for this purpose. On our end, we'll need to change the behavior to allow for root user to continue to exist

  3. Here's where I got a bit lost:

Now, that's all great, unless one is in need of using the IPs for newly created machines to define the authorized nets / fw rules of the SQL instance. In that case, the machines need to be created first, hence deleted last, breaking the flow above.

Is this related to your issue @shaulishaked? This sounds like something I'm not sure how to manage this in Terraform, both because it's not an inherent dependency link in Terraform (or is it?) and becuase I'm not the most familiar with the proper deletion order for networking Cloud SQL instances. I guess I'm really confused by what this means. Do you have example config or do you mind giving a explaination, @JlUgia?

@shaulishaked
Copy link
Author

Hi,

  1. Explicit dependencies:
    I tried adding an explicit dependency. It did not resolve the issue. Probably because the error is related to "root" user, and not the user I created myself.

  2. root_password field:
    I am using Google Postgres SQL. root_password is ignored in Postgres SQL: https://www.terraform.io/docs/providers/google/r/sql_database_instance.html#root_password.

I agree with @jclarksnps with the saying that "Basically delete sql instance should be equivalent to: gcloud sql instances delete INSTANCE_NAME --quiet". For now, I need to do pre-destroy commands manually instead of relying on the Terraform module, which I really want to remove.

Thanks again,

@JlUgia
Copy link

JlUgia commented Jan 13, 2020

I think it's two separate issues we are discussing.

  1. @shaulishaked's issue relates to the fact that, when using Terraform, attempting to delete a Cloud SQL user fails if there are other objects making use of that user. I think I saw this issue in the past, and IIRC, it's not related to the instance but to the database objects created in Terraform. Have you tried adding a dependency to your databases such you users can only be deleted once the DBs have been destroyed (on your google_sql_database, add a depends_on = [ google_sql_user.user ])?
    We decided to remove any intra-application/service logic, like DB creation and management (google_sql_database), away from Terraform. After all, TF shines on infra/resource provisioning, and one can argue that databases inside of a database engine or instance is something beyond that purpose.
    In any case, @jclarksnps's point stands IMO: deleting a Cloud SQL instance probably should not look at application or software level dependencies (eg.: databases, connections, users), but only infrastructure related ones (eg.: IPs).
    Said so, the problem may automagically get resolved for those using the new root_password field introduced in the beta provider, since this will remove the dependency responsibility away from the TF user.

  2. The issue I experienced is also related to dependencies preventing deletion of a Cloud SQL instance, although in my case, it was VMs with active connections to the DB creating the dependency issue. @jclarksnps's suggestion would help in this case too. I'm happy to go into more detail about this one, though I feel it's beyond what's needed to address this case.

@jclarksnps
Copy link

@JlUgia Yeah our plan is to remove database creation from TF all-together, I agree that it does not belong there at all. I did try adding dependencies, although not reverse from the instance to the users. Most of my issues have been someone using the user to create another db or having connections open, which im not sure the dependencies would fix (unless I'm missing something)?

@JlUgia
Copy link

JlUgia commented Jan 14, 2020

That's right per my understanding too. If there are other objects or connections open or created outside of the scope of Terraform, that'd conflict with the deletion of the resource.

@kenmazaika
Copy link

kenmazaika commented Apr 9, 2020

Any update on this? I'm running into this issue, too.

It seems the best option is to manually do terraform state rm on the postgres user outside of the terraform configuration.

Seems like a hack.

I tried to create a remote-exec or local-exec on the google_sql_user to automatically trigger the action when destroyed but it didn't seem to work.

IMO a good solution to this would be to allow me (the user) to execute custom SQL on destroy for the user.

Can't imagine terraform would be able to handle all use cases, but if we could drop into SQL to manually do what we need to during this sequence it would be extremely helpful.

I think if I could execute SQL that looks like this:

REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM username;
REVOKE ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public FROM username;
REVOKE ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public FROM username;
DROP USER username;

It could work, but would need some sort of SQL hook.

@jclarksnps
Copy link

@kenmazaika that's what we did, we remove the state and we actually use gcloud to tear the instance down.

@kenmazaika
Copy link

@jclarksnps Thanks for the update!

How do you hook into SQL and drop into the database? I think that local-exec to call terraform commands directly causes lock errors.

Did you just make a separate script to combine the terraform destroy and the manual SQL?

Thanks for the reply! Really appreciate it!

@jclarksnps
Copy link

@kenmazaika we actually use gcloud commands to tear the instance down, prior to that call we use terraform to remove the state for the db and its databases and users. We basically gave up on terraform doing the teardown. So in our bash that calls terraform (we use concourse pipelines), we first use gcloud to delete the instance and then we remove all the state, then we go about regular TF activities.

@JlUgia
Copy link

JlUgia commented May 2, 2020

FWIW, I'm taking the same approach as @kenmazaika, force removing the user from the state prior to destroying the Cloud SQL instance.

@emilymye emilymye added persistent-bug Hard to diagnose or long lived bugs for which resolutions are more like feature work than bug work and removed bug labels Jun 16, 2020
@emilymye emilymye removed their assignment Jun 16, 2020
@bharathkkb
Copy link

I was able to fix a similar issue where I was receiving

Error: Error, failed to deleteuser foo in instance foo-db: 

based on comment #3820 (comment) part 1 by explicitly enforcing dependency.

@emilymye
Copy link
Contributor

It seems like this issue is now related to many different things. The original issue seems to be related to adding explicit dependencies and managing the root user in Terraform (given that Terraform should delete the root user upon instance creation unless on a replica instance). I'm going to close this issue and ask that @JlUgia and other folks open a new issue to discuss ways in which we can manage , which is difficult - as you may have figured out, there isn't a way Terraform can force destroy objects that Terraform doesn't know about and I'm hesistant to make terraform handle SQL commands or similar.

@emilymye emilymye removed their assignment Jun 26, 2020
@ghost
Copy link

ghost commented Jul 27, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators Jul 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
persistent-bug Hard to diagnose or long lived bugs for which resolutions are more like feature work than bug work
Projects
None yet
Development

No branches or pull requests

7 participants