
Cleanup, sync and commit actions for Providers #63

Open
mongrelion opened this issue Apr 20, 2016 · 14 comments
Labels
enhancement New feature or request upstream-protocol Requires change of protocol specification, i.e. can't be done under the current protocol

Comments

@mongrelion

Hi!

The Provider API could use a TeardownFunc method, just like it offers a ConfigureFunc method.
Basically, some use cases need to make a final call to an API after the whole Terraform run is done. For example, if the client used to communicate with the API does not support token-based authentication but only a user/password combination, it must make a call at the end to kill the session; some systems start to show issues if you leave too many connections open, because the session timeout is quite long and the developer may have no control whatsoever over that value.
Another use case I can think of at the moment is the Cobbler provider: after creating a bunch of systems you need to call its sync method. You can call it from within each resource in Terraform, but that has led to odd behaviour, because Cobbler is not thread safe and calling sync several times in a short period of time makes things look funky (cobbler/cobbler#1570 & hashicorp/terraform#5969).

Something like this is what I have in mind:

schema.Provider{
  Schema: ...,
  ResourcesMap: ...,
  ConfigureFunc: func(d *schema.ResourceData) (interface{}, error) {},
  TeardownFunc: func(meta interface{}) error {},
}

And the teardown method would receive the value previously returned by the ConfigureFunc.
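
For illustration, here's a rough, hypothetical sketch of how a provider for a session-based API might use the pair. The sessionClient type, the login helper, and the TeardownFunc field itself are all made up for the example (TeardownFunc is exactly what's being proposed here, so this wouldn't compile today):

package provider

import (
    "github.com/hashicorp/terraform/helper/schema"
)

// sessionClient is a made-up client for a user/password, session-based API.
type sessionClient struct{ token string }

func login(user, pass string) (*sessionClient, error) {
    // ... open a session against the API and keep the resulting token ...
    return &sessionClient{}, nil
}

func (c *sessionClient) Logout() error {
    // ... kill the session so it doesn't linger on the server side ...
    return nil
}

func Provider() *schema.Provider {
    return &schema.Provider{
        Schema: map[string]*schema.Schema{
            "username": {Type: schema.TypeString, Required: true},
            "password": {Type: schema.TypeString, Required: true},
        },
        ResourcesMap: map[string]*schema.Resource{},

        // Log in once; the returned client is passed to every resource as meta.
        ConfigureFunc: func(d *schema.ResourceData) (interface{}, error) {
            return login(d.Get("username").(string), d.Get("password").(string))
        },

        // Proposed (does not exist today): called once the whole run is done,
        // with the value ConfigureFunc returned, so the session can be closed.
        TeardownFunc: func(meta interface{}) error {
            return meta.(*sessionClient).Logout()
        },
    }
}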

What do you guys think?

@apparentlymart
Member

This seems like a reasonable idea to me!

In the core Terraform layer we have an interface ResourceProviderCloser which a provider can implement so that Close() will be called on it once it's no longer needed, but as far as I can tell the helper/schema layer doesn't actually implement it and so the real-world providers we have today (which all use the schema helper) don't currently support it.

However, I expect we could extend schema.Provider to implement a Close method in addition to its existing Configure method -- probably with a very similar implementation -- to get what you're suggesting here. We stash a pointer to the meta value inside the provider instance so it should be easy to support the interface you suggested.

I would suggest that the attribute name be CloseFunc rather than TeardownFunc, since the Close method is already named by the layer below and it would be consistent with the existing Configure/ConfigureFunc pairing.

We might consider also exposing the *schema.ResourceData for the config to CloseFunc, in case there are some settings in there that only apply on close. I'm ambivalent about it since the provider could in principle save the relevant settings within the arbitrary meta object, but for many providers meta is just a direct pointer to some object from the underlying client library and so doesn't have a place to stash extra book-keeping information like this.
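
As a very rough sketch of what I mean (none of this exists yet -- CloseFunc is hypothetical, and the shape just mirrors the existing Configure/ConfigureFunc pairing and the meta value the provider already keeps):

// Sketch only: CloseFunc is not a real field today. It would be declared on
// schema.Provider alongside ConfigureFunc:
//
//     CloseFunc func(meta interface{}) error
//
// and this method would let *schema.Provider satisfy the core
// terraform.ResourceProviderCloser interface, reusing the meta value that
// Configure already stashed on the provider instance.
func (p *Provider) Close() error {
    if p.CloseFunc == nil {
        return nil
    }
    return p.CloseFunc(p.Meta())
}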

@srinarayanant

+1
Is the teardown method available in some form? It would be ideal to log in in Configure and log out in Teardown.

@mavogel

mavogel commented Jan 26, 2018

@apparentlymart are there any updates or an ETA on this issue?
Is it as straightforward as I imagine: given the graph, is it possible to determine the last resource created by a provider and then call CloseFunc after that resource's final operation, depending on whether the run was a plan/apply/destroy/show? If so, I'd be keen to contribute as a way to get deeper into Terraform.

@apparentlymart
Member

Oh my... this is an old issue! Sorry for the long silence here. Before I respond, I just want to be explicit that my comment above was written long before I was a HashiCorp employee, when I was just participating in this project as a community contributor, and therefore I didn't have the full context of the long- and short-term product goals.


We've actually ended up returning to this topic recently, since we've been working on revamping the provider RPC protocol to work with the new capabilities we get from introducing the improved version of HCL. My above comment was long enough ago that I don't have a good sense of what I was thinking at that time, but I do have context from our discussion this week.

A key problem we identified while discussing the possibility of teardown-type actions is how to handle failure. How best to handle failure depends on what sort of operation the provider is attempting to do:

  • In the case of terminating a stateful session, producing an error that explains to the user how they might manually terminate their session seems like a reasonable outcome, since Terraform will create a new session on the next run anyway and so the current session is effectively abandoned.
  • For the use-case of performing some sort of "commit" action things are more complex. For any operation that affects the persistent state of remote objects, we always try to retain an accurate picture in Terraform's state file of the outcome of an action, even if it fails partway, so that on the next run Terraform can attempt to retry or conclude any pending operations. If Terraform leaves some changes uncommitted without tracking that, then a subsequent Terraform run may inadvertently apply those changes even though they are not listed in the plan, which violates Terraform's primary promise.

So with that said, we'd concluded that we need to think some more about the design of a feature like this to ensure that it can be safe and explicit in its behavior. We did an initial brainstorm of an idea of making Terraform Core have an explicit representation of actions that need to be "committed" in some way, with these indicated in plan output so the user can see where in the process commit actions may occur. This builds on some other design work we've started around supporting batch requests (discussed in hashicorp/terraform#7388) but is different because our conception of batch requests was as a hidden optimization rather than as something explicitly shown to the user.

We're not able to do detailed research and prototyping on this problem at present because our current focus is on configuration language improvements, but we know there are several providers that would benefit from explicit modelling of a "deploy" or "commit" action (the Fastly provider being a key example, and hypothetical providers for various network equipment similarly) and so do intend to return to this issue at some point.

From reading the linked issues it seems like the Cobbler provider in particular has since been fixed with a workaround: syncing manually after each operation, but using a mutex to avoid concurrent syncs. The Fastly provider currently works around this by managing all of the versioned API objects as a single, huge "service" resource. Neither of these approaches is ideal by any means, but hopefully similar patterns can be employed for other providers in the meantime, until we're able to devote more time to a solution in Terraform Core.
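
For reference, that mutex workaround amounts to roughly the following -- a minimal sketch with a hypothetical syncCobbler helper standing in for the real client call, not the provider's actual code:

package cobbler

import "sync"

// cobblerSyncMu serializes calls to Cobbler's sync endpoint, since Cobbler is
// not thread safe and concurrent syncs misbehave.
var cobblerSyncMu sync.Mutex

// syncCobbler is a hypothetical stand-in for the real client's sync call.
func syncCobbler() error { return nil }

// Called at the end of each create/update/delete so that Terraform's parallel
// resource operations never trigger two syncs at once.
func syncAfterOperation() error {
    cobblerSyncMu.Lock()
    defer cobblerSyncMu.Unlock()
    return syncCobbler()
}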

Sorry for the delay here and the long lack of response. I'll try to come back here and update this issue again when we're ready to do more in-depth work here; to help us find this again I'm going to change the summary to something more likely to come up in search when we go looking for issues on this topic.

@apparentlymart changed the title from "Provider teardown" to "Cleanup, sync and commit actions for Providers" on Jan 28, 2018
@mavogel

mavogel commented Jan 29, 2018

Thank you @apparentlymart for your detailed answer. I was too optimistic in thinking it could be that straightforward. The Docker provider also uses a temporary workaround.

@shinmog
Contributor

shinmog commented May 6, 2018

I've mentioned this during the Palo Alto Networks panos provider validation process, but we will eventually need commit type functionality for our provider as well. When you guys circle back to this issue, we would definitely like to be involved so we can explain our use case in more depth.

@jonshern

Yeah, we are using the Pan OS provider for Palo Alto, and this functionality would be very important for full automation.

@phamann

phamann commented Nov 29, 2018

The Dyn provider would also benefit from such functionality. The Dynect API is stateful and uses a session token, which the provider currently creates during ConfigureFunc by calling the Dyn Go client's login method. However, the API's logout method is never called once the provider has finished, leaving open sessions in the Dyn API. This causes unforeseen side effects, for example when you have multiple Terraform runs in CI environments.

@hashibot hashibot transferred this issue from hashicorp/terraform Sep 26, 2019
@hashibot hashibot added the enhancement New feature or request label Oct 2, 2019
@radeksimko radeksimko added the upstream-protocol Requires change of protocol specification, i.e. can't be done under the current protocol label Dec 8, 2019
@paultyng
Contributor

paultyng commented Apr 2, 2020

The protocol has a Close which we plan to expose in the future. That being said, it will strictly be for cleanup, session logout, etc., not for commit-type operations. Terraform's state model doesn't have any concept of a "commit" or of which resources are committed, and as far as I know adding one is not on the roadmap. If you need to support a commit model, there are potential resource design changes you could make to make it work, and I'm happy to discuss those on specific providers, but without a significant change to Terraform upstream, I don't think this will be added anytime soon.

@jwatte

jwatte commented Apr 21, 2020

That is unfortunate. I, too, have the need for stashing a bunch of changes, and then committing them at the end (which may fail if the whole batch isn't overall consistent.)

The workaround of "one giant synthetic resource that sources everything else" is imperfect at best.

@AlexHunterCodes

AlexHunterCodes commented Aug 26, 2020

I do a lot of work involving DNS records, and it's currently very easy for Terraform to temporarily break these during a large plan because there's no guarantee on how the changes will be scheduled, particularly when these records relate to resources the plan doesn't know about.

From the plan, the start and end states can be reasoned about, but the intermediate state while resources are deploying currently cannot. It's also impossible to batch lots of changes together and commit them as one atomic operation, which causes API rate limiting issues (see #7) in addition to broken infrastructure if a plan fails to apply.

@poroping

poroping commented Sep 6, 2021

I also have a use case for this with an IOS XE Restconf provider I'm creating. It's not as much of a deal breaker as PanOS needing to commit changes, since the changes are actually applied on the fly. However, 'committing' the running config to NVRAM (copy run start) is needed after all the changes are made. Doing this after every change is not really workable, as it takes a few seconds to complete. For now I will need to create a workaround resource, which is not ideal.
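
For what it's worth, the workaround resource I have in mind is roughly the following shape; saveRunningConfig is a hypothetical stand-in for the real Restconf call, and the exact SDK details may differ:

package iosxe

import "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"

// saveRunningConfig is a hypothetical stand-in for the Restconf request that
// copies the running config to startup (copy run start).
func saveRunningConfig(meta interface{}) error { return nil }

func resourceSaveConfig() *schema.Resource {
    return &schema.Resource{
        // Creating (or replacing) this resource performs the save; there is
        // nothing real to read or delete afterwards.
        Create: func(d *schema.ResourceData, meta interface{}) error {
            if err := saveRunningConfig(meta); err != nil {
                return err
            }
            d.SetId("save-config")
            return nil
        },
        Read:   func(d *schema.ResourceData, meta interface{}) error { return nil },
        Delete: func(d *schema.ResourceData, meta interface{}) error { return nil },

        Schema: map[string]*schema.Schema{
            // Change this value to force a new save after other resources change.
            "trigger": {Type: schema.TypeString, Optional: true, ForceNew: true},
        },
    }
}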

@bflad
Contributor

bflad commented Jun 3, 2022

Drive-by note relating to "commit" behaviors: there have been lengthy discussions on this topic between the Terraform core and providers teams. The main challenges continue to be how to potentially provide configuration for these behaviors, how to handle errors during applies, and how practitioners could trigger this behavior without configuration changes.

For this use case, introducing a RPC that always gets called before providers are stopped means that core needs to return some information about any potential errors, so that providers can choose their appropriate action. Otherwise, there are risks of providers performing a "commit" against an undesirable system state. Furthermore, practitioners may wish to control this behavior. Given that provider configurations should not be specified within callable modules, there could be some friction between practitioner desires and real world usability in that scenario.

For this use case, introducing a RPC that only gets called on success means that this RPC loses its effectiveness for use cases such as closing sessions.

The solutions for all of these tend to point towards providers relying on the existing Terraform resource lifecycle model. Resource creation can perform the action, while other actions can be ignored or implemented as appropriate. More succinctly, configuration of behaviors is already well defined (e.g. resource configuration/state is tied to the resource lifecycle and provider developers have the choice of in-place updates or replacement), error behaviors are well defined (e.g. a failing dependency will prevent the resource from performing its action), and practitioners can trigger resource behaviors manually if necessary (e.g. terraform taint / terraform apply -replace).

The biggest challenge with using a resource then becomes the configuration needed to properly set up the desired behavior, since every dependency must be explicitly configured. Terraform CLI 1.2 has introduced the lifecycle configuration block's replace_triggered_by argument, which can be used to automatically replace a resource if any of the references change.

This means it's possible to create something like the following:

resource "example_thing" "example" {}

resource "example_widget" "example" {}

resource "example_commit" "example" {
  lifecycle {
    replace_triggered_by = [
      example_thing.example,
      example_widget.example,
      # ...
    ]
  }
}

If this solution is usable at smaller scales (please let us know if it is!), then this problem space could theoretically shift to thinking about other ways to define "groupings" of resources in a configuration beyond just a module, to help reduce the amount of configuration necessary.

@alexandrubordei

I can't speak for all commit-needing use cases, but at least with the MetalSoft provider we ended up applying a similar strategy with a "deployer" resource (see https://registry.terraform.io/providers/metalsoft-io/metalcloud/latest/docs/resources/infrastructure_deployer), working within the limitations of the current model, so perhaps that might help answer some of the questions.

Indeed, dependencies are a weak point of the model. Unless users specify all of the depends_on entries properly, you'll miss resource changes, which means all users of the provider need to be instructed to do that in their manifests; that is of course prone to all sorts of preventable user mistakes. What makes this even more complicated is the fact that you cannot use wildcards. An option we sometimes use to simplify things is to put all the dependent resources in a module and the deployer resource at some upper level, but it is far from ideal.

Perhaps support for "depends on: all resources" might help, or better yet some way to determine the dependencies internally (by specifying classes of resources, for example). The need for a "drive" or an "instance" to deploy via the "deployer" would not change regardless of what the user wants, so I wouldn't even give the user a choice.

The second issue we've seen is that the deployer needs to run not just at the initial deploy but also during edit operations on a dependent resource. Since the deployer itself is not edited, it will not be deployed again. We ended up having to force a property to always be "edited", which of course screws up the plan when nothing is actually edited, but there was no other way at the time. I guess using replace_triggered_by instead of depends_on might help with this issue, but it wasn't available when we implemented this strategy.

The third (minor) issue is that this resource is not "natural" from a user's perspective. It feels forced (because, of course, it is). All other resources are true resources reflected on the server side, whereas this resource is something that exists only at the Terraform level.

So, to cut a long story short, I think the answer to the question is that the solution would work, since we're already doing it to some degree, but if we could get some help to make it more elegant (perhaps a way to set implicit dependencies, a way to force the deploy of the deployer even if the deployer resource itself is not edited, and perhaps the concept of a singleton), it would be brilliant.
