Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helper/schema: Custom diff logic support for providers #14887

Merged
merged 33 commits into from
Nov 1, 2017
Merged

helper/schema: Custom diff logic support for providers #14887

merged 33 commits into from
Nov 1, 2017

Conversation

vancluever
Copy link
Contributor

@vancluever vancluever commented May 28, 2017

This is far along enough now that I think putting in a PR is warranted :)

See #8769 for more details!


This adds a new object, ResourceDiff, to the schema package. This object, in conjunction with a function defined in CustomizeDiff in the resource schema, allows for the in-flight customization of a Terraform
diff after the initial diff has been performed.

Use cases for this functionality include:

  • Creating diffs off of computed values alone, for when a state transition requires action on a field that cannot be reasonably controlled via config
  • Leaning on the provider's API as much as possible when said provider has diff logic of its own (ie: Nomad)
  • Vetoing changes to the diff that could not be reasonably carried out (ie: illegal AWS RDS operations) by kicking back an error to the CustomizeDiff function, much like the other CRUD functions.

As part of this work, many internal diff functions have been moved to use a special resourceDiffer interface, to allow for shared functionality between ResourceDiff and ResourceData. This may be extended to the DiffSuppressFunc as well which would restrict use of ResourceData in DiffSuppressFunc to generally read-only fields.

@vancluever vancluever changed the title [WIP] helper/schema: Custom diff logic support for providers helper/schema: Custom diff logic support for providers May 28, 2017
@vancluever
Copy link
Contributor Author

Some notes on destroy logic:

  • DestroyTainted is easy to track as it's already exposed to schemaMap. We have coverage on this.
  • It doesn't look like Destroy (pure destroy) diffs call down the Diff stack at all, so testing for that is ultimately moot.
  • Deposed resources are kinda hard to track. The deposition logic doesn't run until after the diff so I'm not too sure if it's worthwhile tracking if an instance is deposed, or if we just run the diff for the deposed resource anyway. If we need to know, we will probably have to propagate a deposed flag down the stack, and do deposition discovery earlier (ie: move the referenced block to before the diff, store the result in a bool), and flag the resource appropriately after the diff. However, if the resource is a known deposed resource, then I don't don't know if it's really necessary to run a Diff at all - it could just be treated the same as Destroy, with an empty diff created with only the DestroyDeposed flag set, as a destroy diff node currently does.

@vancluever
Copy link
Contributor Author

vancluever commented May 28, 2017

This is now ready for review. What I've done is just removed the unused Destroy flag check for now, adding notes in the latest commit that pretty much mirror what I've said above.

This leaves 2 non-blocking outstanding questions:

  • Whether or not to change deposed logic as outlined above
  • Whether or not to change how DiffSuppressFunc gets access to the resource data, ie: if we want to move this to schema.ResourceDiffer instead. Note that this will be a breaking change - we can update all of the builtin providers, but plugins using DiffSuppressFuncs will be running in an "undefined" state until we enforce the use of the new interface instead by bumping the required plugin version. However this might need to happen anyway or else there will not be much way to communicate that CustomizeFunc now exists in the resource "schema".

Otherwise, this is good to go!

Copy link
Contributor

@apparentlymart apparentlymart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, @vancluever!

I left various questions and comments inline, but it's generally just me trying to make sure I understand correctly what's going on here. I apologize that I didn't get a chance yet to actually run the code, so my questions so far are based just on reading it.

r.e. your thoughts about deposed resources, I would've expected that we could ignore that problem since we don't allow customizing of destroy diffs anyway. Is there a subtlety there that I missed?

I think it'd be helpful to pick out at least one of the use-cases from the original proposal and see how it looks in practice with this new API in place... that would help to make this feel more real and get to grips with how this behaves. No worries if you don't have time to do that... it's probably a good exercise for me to try to implement one anyway, to cement my understanding of how this is all fitting together.

Thanks for working on this! It's looking really promising, and I'm pleasantly surprised to see how little this disturbed the existing helper/schema logic, which has a history of being a bit "finicky" when changed.

}
// Note that this gets put into state after the update, regardless of whether
// or not anything is acted upon in the diff.
d.SetNew("computed", d.Get("computed").(int)+1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any ill-effects if the Create / Update implementations override this to be a different value before they return? If the final write "wins" then that seems fine, but I want to make sure we don't end up creating the dreaded "Diffs didn't match during apply" error here in that case.

At the very least, we'd need to document that providers shouldn't do that if so, but ideally we'd find a way to make it not arise in the first place since those errors tend to be pretty annoying to track down when the do arise in the wild.

Copy link
Contributor Author

@vancluever vancluever May 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, I don't think there should be any ill effects. What would happen is any values that got set in with Set* functions in the schema.ResourceData that gets passed along would be set in the set writer layer, while the data set in the diff function would have been already properly rolled into the diff and would exist exclusively in the diff reader layer. set takes precedence, so what I'm pretty sure will happen is that after Create or Update returns any value that has been put into the set writer will be the one that "wins" when the new state is calculated.

The final step of a Resource.Apply is to merge the state using ResourceData.State(), which reconciles the state with all the data collected in the various levels of the writer, starting with the set layer (or state if in partial mode without a SetPartial for the specific key, which bypasses the diff anyway, before this change), and returns it, which ultimately becomes the new state.

The good old "diffs didn't match during apply" error was something I was worried about too, but I don't think we are at risk for that, mainly because EvalCompareDiff gets evaluated before EvalApply, which means that 1) state hasn't been written yet, and 2) Create or Update will not have fired anyway, meaning that there is no risk in either of these creating an issue. What I think we need to be more worried about is a pre-written plan drifting too far from the current real state of the resource that any custom diff behaviour is pulling to calculate the diff. However, I think that's an accurate case of what you were describing in #8769 where the user may need to re-refresh (and then re-plan, ultimately).

I think this could be properly vetoed - the drift could be detected via checking the current real world value and then comparing it to the data from GetChange, giving the opportunity to provide a friendly error to the user to request they re-plan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kind of wondering if we can possibly build some safety in here so that this can be guarded against before it hits EvalCompareDiff and spews both of the diffs in a way that's not so friendly to the user. I want to come back to this.

return &schema.Resource{
Create: testResourceCustomDiffCreate,
Read: testResourceCustomDiffRead,
CustomizeDiff: testResourceCustomDiffCustomizeDiff,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor cosmetic nit: visually most of the existing Resource attributes are short, so they line up nicely when formatted in a struct literal like this. Perhaps we could preserve that here by just calling this Diff. I think that's consistent with the other usage here, since all of these operations are in some sense "customizing" the default behavior that helper/schema naturally does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - I kind of was wondering this myself and I guess I didn't want a developer thinking that Terraform wasn't doing any diffing behaviour if this was left out now, but I do like the more concise Diff for sure. Will adjust!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so it turns out that this can't necessarily function because it collides with the already existing resource.Diff, of course... doh!

Trying to think of something else that still keeps the spirit of what we are trying to do here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about Review or Revise? That is kind of what is happening here... the diff is being reviewed, and possibly being vetoed or rejected. :)

Other names I could think of were either too close to CRUD-like things that already exist (like Patch) or just didn't really roll off the tongue like the current set of functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yes, of course. I forgot that there are actually methods on this struct. 😀

I think part of why this is hard to name is that this function kinda serves two purposes:

  • Check whether a diff is valid
  • Alter a diff to provide more context

So AlterDiff seems promising if we consider the checking to be an implicit side-effect, but that's still longer than all of the other verbs here, so doesn't really make a great deal of difference.

So with all of that said, maybe let's just stick with CustomizeDiff and accept the weird visual texture it creates... too minor a detail to have sweated this much over it. 😀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I did go forward with changing it to Review last night - check 6cbfe22 for the commit and how it looks cosmetically. I can roll it back to CustomizeDiff if you still think it's too ambiguous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to bike-shed it too much, but to me "reviewing" a diff sounds like something a human would do rather than something code would do, and feels a little vague.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No sweat :) will roll this back. I think after this the only thing left to review will be the deposed stuff.

}

if computed {
w.computedKeys[strings.Join(address, ".")] = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a legitimate reason to pass both a non-nil value and computed = true at the same time? If not, I'd suggest that we catch that and return an error, since I think otherwise we might produce diffs that violate assumptions made elsewhere in Terraform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was running under the assumption that a zero value could still be legitimately zero and not computed, probably a mistake I made by looking at what NewComputed diffs look like. Will fix for sure - we will also need to adjust SetNewComputed so that it doesn't try and pull a zero value for the key in the schema, and just sets a nil value. Does that sound right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be entirely honest I'm not sure what the necessary behavior is here... eventually this all gets flattened to strings in the current diff format, so I would guess that we'd want to set this to something that eventually becomes the empty string, but I might be wrong about that. If you're seeing Terraform currently generating diffs where e.g. computed ints appear as "0" in the diff then what you did here is probably the right behavior, even though it feels a little counter-intuitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right about this - the more I think about the int diffs specifically I don't think I saw a diff of NewComputed where the value was "0". I probably just tunnel visioned on the string case.

w.lock.Lock()
defer w.lock.Unlock()
if w.computedKeys == nil {
w.computedKeys = make(map[string]bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth having an init method which can deal with this initialization task... that way it will be easier to keep the different initialization paths in sync if this evolves in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do! Looks like a pattern kind of used in ResourceData already, so I'll just follow that general example of gating it behind sync.Once.

//
// The object functions similar to ResourceData, however most notably lacks
// Set, SetPartial, and Partial, as it should be used to change diff values
// only. Most other frist-class ResourceData functions exist, namely Get,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frist 😀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns spell checking on for source files in vim ;)

return fmt.Errorf("SetNew, SetNewComputed, and SetDiff are allowed on computed attributes only - %s is not one", key)
}

return d.setDiff(key, old, new, computed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious to know what Terraform's behavior is if you set old to a "lying" value that doesn't match the current state. That feels like it would violate some invariants expected by Terraform core. It might be "safer" to not allow that and instead to expect that Read should get the "old" value into a consistent state first, and then this diff API can then worry only about new values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that :) it makes sense to completely offload old value calculation to refresh/read for sure. If we do this too, we can get rid of the old value writer layer as well and just use state for old values.

@vancluever
Copy link
Contributor Author

Hey @apparentlymart! Answered the comments!

And finally, with regard to deposed resources - the main issue is that diff logic in schemaMap actually really does not set Destroy* flags. DestroyTainted is propagated through state, so that's easy to catch, but nothing else is passed down the stack to tell the schemaMap.Diff if the resource is being destroyed or deposed. In that respect, it doesn't look like EvalDiffDestroy runs a full Diff at all, and checking the state for deposed resources in EvalDiff doesn't happen until after the diff. So I think it'd be safe to say that without a way to expose deposition to schemaMap.Diff, diffs would still be possibly run on deposed resources, depending on where you are in the graph (mainly plan, from what I guess).

I haven't gone down the rabbit hole enough to fully understand how a resource is deposed and ultimately how it moves to the Deposed slice in ResourceState, so I'm not too sure if it's safe to skip the diff for deposed resources altogether and just return an &InstanceDiff{DestroyDeposed: true}, or if we should determine from state if the instance is deposed first and pass that down to the diff functionality so that it can be explicitly skipped.

Will apply the other changes when I can!

@vancluever
Copy link
Contributor Author

vancluever commented May 31, 2017

Alright, I think we have touched all of the specific points of the review now, aside from the issue on deposed resources. Let me know your thoughts on that! EDIT: Also want to look into making SetNew safer against drifted diffs.

if !d.schema[key].Computed {
return fmt.Errorf("SetNew only operates on computed keys - %s is not one", key)
}
return d.setDiff(key, d.get(strings.Split(key, "."), "state").Value, value, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During some testing in which I was prototyping using these features in the Nomad provider, I noticed that if you call SetNew with a key that doesn't exist in the schema you can get a panic:

2017/05/31 15:36:04 [DEBUG] plugin: terraform: [signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0xace9b1]
2017/05/31 15:36:04 [DEBUG] plugin: terraform: 
2017/05/31 15:36:04 [DEBUG] plugin: terraform: goroutine 72 [running]:
2017/05/31 15:36:04 [DEBUG] plugin: terraform: github.com/hashicorp/terraform/helper/schema.(*ResourceDiff).SetNew(0xc4203f6940, 0x51fdf73, 0x2, 0x44cba80, 0xc4202a13f0, 0x0, 0x7c67d80)
2017/05/31 15:36:04 [DEBUG] plugin: terraform: 	/home/mart/Devel/terraform/src/github.com/hashicorp/terraform/helper/schema/resource_diff.go:259 +0x71
2017/05/31 15:36:04 [DEBUG] plugin: terraform: github.com/hashicorp/terraform/builtin/providers/nomad.resourceJobCustomizeDiff(0xc4203f6940, 0x50303a0, 0xc42059d480, 0xc4200c5300, 0xc4203f6940)
2017/05/31 15:36:04 [DEBUG] plugin: terraform: 	/home/mart/Devel/terraform/src/github.com/hashicorp/terraform/builtin/providers/nomad/resource_job.go:77 +0x156
2017/05/31 15:36:04 [DEBUG] plugin: terraform: github.com/hashicorp/terraform/helper/schema.schemaMap.Diff(0xc42055aba0, 0xc4201e0f00, 0xc420535f50, 0x5399a80, 0x50303a0, 0xc42059d480, 0x1, 0x18, 0x28)
2017/05/31 15:36:04 [DEBUG] plugin: terraform: 	/home/mart/Devel/terraform/src/github.com/hashicorp/terraform/helper/schema/schema.go:413 +0xb9d
2017/05/31 15:36:04 [DEBUG] plugin: terraform: github.com/hashicorp/terraform/helper/schema.(*Resource).Diff(0xc420329420, 0xc4201e0f00, 0xc420535f50, 0x50303a0, 0xc42059d480, 0x1, 0x28, 0xc420418030)
2017/05/31 15:36:04 [DEBUG] plugin: terraform: 	/home/mart/Devel/terraform/src/github.com/hashicorp/terraform/helper/schema/resource.go:225 +0x187
2017/05/31 15:36:04 [DEBUG] plugin: terraform: github.com/hashicorp/terraform/helper/schema.(*Provider).Diff(0xc420329490, 0xc4201e08c0, 0xc4201e0f00, 0xc420535f50, 0x7f75821c1e10, 0x0, 0x0)
2017/05/31 15:36:04 [DEBUG] plugin: terraform: 	/home/mart/Devel/terraform/src/github.com/hashicorp/terraform/helper/schema/provider.go:255 +0x9b
2017/05/31 15:36:04 [DEBUG] plugin: terraform: github.com/hashicorp/terraform/plugin.(*ResourceProviderServer).Diff(0xc42015a980, 0xc42026b840, 0xc420429600, 0x0, 0x0)
2017/05/31 15:36:04 [DEBUG] plugin: terraform: 	/home/mart/Devel/terraform/src/github.com/hashicorp/terraform/plugin/resource_provider.go:499 +0x57
2017/05/31 15:36:04 [DEBUG] plugin: terraform: reflect.Value.call(0xc420362720, 0xc420434240, 0x13, 0x51ff043, 0x4, 0xc4205e7f20, 0x3, 0x3, 0x53a3bc8, 0xc4200286e8, ...)
2017/05/31 15:36:04 [DEBUG] plugin: terraform: 	/usr/lib/go-1.8/src/reflect/value.go:434 +0x91f
2017/05/31 15:36:04 [DEBUG] plugin: terraform: reflect.Value.Call(0xc420362720, 0xc420434240, 0x13, 0xc420028720, 0x3, 0x3, 0xc42002870c, 0x180001, 0x300000000)
2017/05/31 15:36:04 [DEBUG] plugin: terraform: 	/usr/lib/go-1.8/src/reflect/value.go:302 +0xa4
2017/05/31 15:36:04 [DEBUG] plugin: terraform: net/rpc.(*service).call(0xc4201c13c0, 0xc4201c1380, 0xc420428e40, 0xc420576600, 0xc42026b060, 0x43484c0, 0xc42026b840, 0x16, 0x4348500, 0xc420429600, ...)
2017/05/31 15:36:04 [DEBUG] plugin: terraform: 	/usr/lib/go-1.8/src/net/rpc/server.go:387 +0x144
2017/05/31 15:36:04 [DEBUG] plugin: terraform: created by net/rpc.(*Server).ServeCodec
2017/05/31 15:36:04 [DEBUG] plugin: terraform: 	/usr/lib/go-1.8/src/net/rpc/server.go:481 +0x404
2017/05/31 15:36:04 [DEBUG] plugin: /home/mart/Devel/terraform/bin/terraform: plugin process exited

The panic went away once I realized my mistake in naming the attribute, but if possible it'd be nice for this to return an error rather than crash. 😀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added code to fix this in e61e892.

@apparentlymart
Copy link
Contributor

I built a prototype of how the nomad provider might use this.

The work of including Nomad's diff inside Terraform's diff is unfortunately rather arduous due to the way Nomad returns diff information... the API structure is primarily designed for mechanical display in the CLI, rather than to be machine-readable. But it does achieve the goal of allowing Terraform to ensure that the configuration doesn't drift between plan and apply, and allows us to validate the jobspec during the plan phase, both of which are great improvements on their own, I think.

For a real first version I'd probably tear out all of the Computed: true attributes except job_modify_index and get the drift-detection and validation benefits, leaving the display of the actual diff details for later where hopefully we can find a way to get this done in a more-robust way with some help from Nomad itself.

@apparentlymart
Copy link
Contributor

While I was playing with d.ForceNew I unfortunately hit the dreaded "diffs didn't match during apply" error. Unfortunately I've run out of time to dig into this today, but I figured I'd share the info in case you have a quick theory as to what might cause this:

$ terraform plan -out=tfplan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

nomad_job.test: Refreshing state... (ID: foo)
The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed. Cyan entries are data sources to be read.

Your plan was also saved to the path below. Call the "apply" subcommand
with this plan file and Terraform will exactly execute this execution
plan.

Path: tfplan

-/+ nomad_job.test
    datacenters.#:         "1" => "2"
    datacenters.0:         "dc1" => "dc2"
    datacenters.1:         "" => "dc3"
    datacenters.2:         "" => "dc3"
    deregister_on_destroy: "true" => "true"
    evaluation_id:         "7d2695a6-b5a2-a212-81d4-4294b45cc128" => "<computed>"
    job_modify_index:      "171" => "<computed>"
    jobspec:               "job \"foo\" {\n  datacenters = [\"dc1\"]\n  type = \"service\"\n\n  group \"foo\" {\n    task \"foo\" {\n      driver = \"raw_exec\"\n      config = {\n        command = \"/bin/sleep\"\n        args = [\"25\"]\n      }\n\n      resources {\n        cpu = 20\n        memory = 10\n      }\n    }\n  }\n}\n\n" => "job \"fooo\" {\n  datacenters = [\"dc2\", \"dc3\"]\n  region = \"global\"\n  type = \"service\"\n\n  group \"foo\" {\n    count = 5\n\n    task \"foo\" {\n      driver = \"raw_exec\"\n      config = {\n        command = \"/bin/sleep\"\n        args = [\"25\"]\n      }\n\n      resources {\n        cpu = 20\n        memory = 10\n      }\n    }\n  }\n}\n\n"
    name:                  "foo" => "fooo" (forces new resource)
    priority:              "50" => "<computed>"
    region:                "global" => "global"
    task_groups.#:         "1" => "1"
    task_groups.0.count:   "1" => "5"
    task_groups.0.meta.%:  "0" => "<computed>"
    task_groups.0.name:    "foo" => "foo"
    task_groups.0.tasks.#: "1" => "<computed>"


Plan: 1 to add, 0 to change, 1 to destroy.
$ terraform apply tfplan
nomad_job.test: Destroying... (ID: foo)
nomad_job.test: Destruction complete
Error applying plan:

1 error(s) occurred:

* nomad_job.test: nomad_job.test: diffs didn't match during apply. This is a bug with Terraform and should be reported as a GitHub Issue.

Please include the following information in your report:

    Terraform Version: 0.9.6
    Resource ID: nomad_job.test
    Mismatch reason: attribute mismatch: datacenters.2
    Diff One (usually from plan): *terraform.InstanceDiff{mu:sync.Mutex{state:0, sema:0x0}, Attributes:map[string]*terraform.ResourceAttrDiff{"datacenters.0":*terraform.ResourceAttrDiff{Old:"dc1", New:"dc2", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "task_groups.0.count":*terraform.ResourceAttrDiff{Old:"1", New:"5", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "name":*terraform.ResourceAttrDiff{Old:"foo", New:"fooo", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:true, Sensitive:false, Type:0x0}, "region":*terraform.ResourceAttrDiff{Old:"global", New:"global", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "task_groups.0.meta.%":*terraform.ResourceAttrDiff{Old:"0", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "task_groups.0.name":*terraform.ResourceAttrDiff{Old:"foo", New:"foo", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "datacenters.1":*terraform.ResourceAttrDiff{Old:"", New:"dc3", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "task_groups.0.tasks.#":*terraform.ResourceAttrDiff{Old:"1", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "job_modify_index":*terraform.ResourceAttrDiff{Old:"171", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "datacenters.2":*terraform.ResourceAttrDiff{Old:"", New:"dc3", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "deregister_on_destroy":*terraform.ResourceAttrDiff{Old:"true", New:"true", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "jobspec":*terraform.ResourceAttrDiff{Old:"job \"foo\" {\n  datacenters = [\"dc1\"]\n  type = \"service\"\n\n  group \"foo\" {\n    task \"foo\" {\n      driver = \"raw_exec\"\n      config = {\n        command = \"/bin/sleep\"\n        args = [\"25\"]\n      }\n\n      resources {\n        cpu = 20\n        memory = 10\n      }\n    }\n  }\n}\n\n", New:"job \"fooo\" {\n  datacenters = [\"dc2\", \"dc3\"]\n  region = \"global\"\n  type = \"service\"\n\n  group \"foo\" {\n    count = 5\n\n    task \"foo\" {\n      driver = \"raw_exec\"\n      config = {\n        command = \"/bin/sleep\"\n        args = [\"25\"]\n      }\n\n      resources {\n        cpu = 20\n        memory = 10\n      }\n    }\n  }\n}\n\n", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "priority":*terraform.ResourceAttrDiff{Old:"50", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "evaluation_id":*terraform.ResourceAttrDiff{Old:"7d2695a6-b5a2-a212-81d4-4294b45cc128", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "datacenters.#":*terraform.ResourceAttrDiff{Old:"1", New:"2", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "task_groups.#":*terraform.ResourceAttrDiff{Old:"1", New:"1", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}}, Destroy:false, DestroyDeposed:false, DestroyTainted:false, Meta:map[string]interface {}(nil)}
    Diff Two (usually from apply): *terraform.InstanceDiff{mu:sync.Mutex{state:0, sema:0x0}, Attributes:map[string]*terraform.ResourceAttrDiff{"task_groups.0.tasks.#":*terraform.ResourceAttrDiff{Old:"", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "job_modify_index":*terraform.ResourceAttrDiff{Old:"", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "priority":*terraform.ResourceAttrDiff{Old:"", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "deregister_on_destroy":*terraform.ResourceAttrDiff{Old:"", New:"true", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "evaluation_id":*terraform.ResourceAttrDiff{Old:"", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "datacenters.0":*terraform.ResourceAttrDiff{Old:"", New:"dc2", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "datacenters.#":*terraform.ResourceAttrDiff{Old:"", New:"2", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "jobspec":*terraform.ResourceAttrDiff{Old:"", New:"job \"fooo\" {\n  datacenters = [\"dc2\", \"dc3\"]\n  region = \"global\"\n  type = \"service\"\n\n  group \"foo\" {\n    count = 5\n\n    task \"foo\" {\n      driver = \"raw_exec\"\n      config = {\n        command = \"/bin/sleep\"\n        args = [\"25\"]\n      }\n\n      resources {\n        cpu = 20\n        memory = 10\n      }\n    }\n  }\n}\n\n", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "task_groups.0.count":*terraform.ResourceAttrDiff{Old:"", New:"5", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "name":*terraform.ResourceAttrDiff{Old:"", New:"fooo", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:true, Sensitive:false, Type:0x0}, "task_groups.0.meta.%":*terraform.ResourceAttrDiff{Old:"", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "region":*terraform.ResourceAttrDiff{Old:"", New:"global", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "datacenters.1":*terraform.ResourceAttrDiff{Old:"", New:"dc3", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "task_groups.#":*terraform.ResourceAttrDiff{Old:"", New:"1", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "task_groups.0.name":*terraform.ResourceAttrDiff{Old:"", New:"foo", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}}, Destroy:false, DestroyDeposed:false, DestroyTainted:false, Meta:map[string]interface {}(nil)}

Also include as much context as you can about your config, state, and the steps you performed to trigger this error.


Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

@vancluever
Copy link
Contributor Author

I can't make heads or tails of that diff at the moment... looks like some sort of weird off-by-one-ish error to me, but I have no idea where that would have gotten so messed up! Only thing I can think of that would be different is the lack of state in the apply diff might be preventing something from happening that is causing the messed up plan diff.

Trying to figure out what would cause that, I added a repro test that modifies the list as you would expect it to be modified but haven't come up with anything yet 🤔

@vancluever
Copy link
Contributor Author

vancluever commented Jun 1, 2017

Still having a hard time re-producing this... even set up a mock-ish repro in the test provider (which I will commit soon) but still nothing... short of setting up a full multi-dc environment in Nomad I'm not too sure how closer I can get.

Do you have any Vagrantfiles or TF code for standing up a multi-dc Nomad for testing? :)

@apparentlymart
Copy link
Contributor

Although I was changing my nomad job definition to include other datacenters I was actually just running a single instance of nomad agent in dev mode, with nomad agent -dev. Of course had that error not appeared the apply would ultimately have failed due to the missing datacenters, but it didn't get that far. 😀

@vancluever
Copy link
Contributor Author

Gotcha - will give that a spin later then 😀

@vancluever
Copy link
Contributor Author

Okay, got the repro going - now to figure out why flagging one attribute as ForceNew affects a completely different one 😨

@vancluever
Copy link
Contributor Author

One thing I'm kind of thinking of here - possibly the re-diff (with no state)/attribute copy that a RequiresNew forces is causing issues. Chasing it down now.

@vancluever
Copy link
Contributor Author

vancluever commented Jun 2, 2017

Figured it out 😀

It's the second run here that performs a new "create" diff on an empty state.

Once I dropped the part where we pull the existing datacenters from state and just worked solely off of Nomad as the source of truth, things worked great. I'm guessing the lack of state was creating undefined behaviour in the slice diff logic when the Nomad diff data was trying to work off an empty slice, and then combining that with the original list diff.

-/+ nomad_job.test
    datacenters.#:         "1" => "2"
    datacenters.0:         "dc1" => "dc2"
    datacenters.1:         "" => "dc3"
    deregister_on_destroy: "true" => "true"
    evaluation_id:         "df48011b-907e-8f5e-9a74-69f43c2a0086" => "<computed>"
    job_modify_index:      "67" => "<computed>"
    jobspec:               "job \"foo\" {\n  datacenters = [\"dc1\"]\n  type = \"service\"\n\n  group \"foo\" {\n    task \"foo\" {\n      driver = \"raw_exec\"\n      config = {\n        command = \"/bin/sleep\"\n        args = [\"25\"]\n      }\n\n      resources {\n        cpu = 20\n        memory = 10\n      }\n    }\n  }\n}\n" => "job \"bar\" {\n  datacenters = [\"dc2\", \"dc3\"]\n  type = \"service\"\n\n  group \"foo\" {\n    task \"foo\" {\n      driver = \"raw_exec\"\n      config = {\n        command = \"/bin/sleep\"\n        args = [\"25\"]\n      }\n\n      resources {\n        cpu = 20\n        memory = 10\n      }\n    }\n  }\n}\n"
    name:                  "foo" => "bar" (forces new resource)
    priority:              "50" => "<computed>"
    region:                "global" => "global"
    task_groups.#:         "1" => "1"
    task_groups.0.count:   "1" => "1"
    task_groups.0.meta.%:  "0" => "<computed>"
    task_groups.0.name:    "foo" => "foo"
    task_groups.0.tasks.#: "1" => "<computed>"
nomad_job.test: Destroying... (ID: foo)
nomad_job.test: Destruction complete
nomad_job.test: Creating...
  datacenters.#:         "" => "2"
  datacenters.0:         "" => "dc2"
  datacenters.1:         "" => "dc3"
  deregister_on_destroy: "" => "true"
  evaluation_id:         "" => "<computed>"
  job_modify_index:      "" => "<computed>"
  jobspec:               "" => "job \"bar\" {\n  datacenters = [\"dc2\", \"dc3\"]\n  type = \"service\"\n\n  group \"foo\" {\n    task \"foo\" {\n      driver = \"raw_exec\"\n      config = {\n        command = \"/bin/sleep\"\n        args = [\"25\"]\n      }\n\n      resources {\n        cpu = 20\n        memory = 10\n      }\n    }\n  }\n}\n"
  name:                  "" => "bar"
  priority:              "" => "<computed>"
  region:                "" => "global"
  task_groups.#:         "" => "1"
  task_groups.0.count:   "" => "1"
  task_groups.0.meta.%:  "" => "<computed>"
  task_groups.0.name:    "" => "foo"
  task_groups.0.tasks.#: "" => "<computed>"
nomad_job.test: Creation complete (ID: bar)

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

This is behaviour that was inherited from the existing diff logic, with the main purpose from what I can see of making a diff that best reflects a newly created resource.

I kind of wonder if this is one of those gotchas that we are going to have to document - that a second pass is made when a new resource is forced, and that providers are going to need to be coded to prepare for that scenario. Thoughts?

@apparentlymart
Copy link
Contributor

Ahh... so the problem was that the diff was being run twice, and we were re-appending the change a second time? Interesting...

Unfortunately Nomad's API doesn't currently give us enough information to produce a full image of the final state without doing these partial updates (it happened to work in this case, I think, just because the diff was to entirely replace the list) but that's a problem with the Nomad provider, not with your code here.

Would it be possible for us to arrange for that second call to be made against the original diff, rather than the one that resulted from running the CustomizeDiff function the first time? That way I think we'd better preserve the intended abstraction, making the code then not need to deal with the fact that it's sometimes being asked to re-run against its own output.

@vancluever
Copy link
Contributor Author

Actually, it's not necessarily running off its own output - the second run runs off of the new diff created from the empty state.

The workflow goes like this (with CustomizeDiff included):

All Runs

  • First diff happens (state vs config) (diff named result)
  • CustomizeDiff is called, working off of diff now in result
  • Re-diff happens against keys that were modified (rolled into same result diff)

If RequiresNew is Set

  • New diff (named result2) is created
  • DestroyTainted is preserved from result diff
  • ResourceData used by previous diff is reset with an empty state
  • New diff is performed, with results going into result2
  • CustomizeDiff is called again, but this time with diff results working off of and going into result2
  • Re-diff for modified fields are rolled into result2
  • Reset any RequiresNew flags in result2
  • Copy state from the start of this entire process into the old values for result2
  • Copy in RequiresNew diffs from old result diff (by copying in the diff completely from result if it doesn't exist in result2, or by just setting RequiresNew if needed)
  • Swap out result for result2

So basically I think what is happening here is that an essential step to make sure that, for diffs that require a new resource to be created, that the diff as closely as possible reflects what creating a new resource looks like, meaning that the customization needs to run as if there was no state as well the second time (and with the same diff data the nil state vs config diff worked off of too, to make sure that any logic in the second CustomizeDiff run can as closely reflect the diff as it will go into Create as).

Does this make sense? By all of this, passing the original diff in to perform the logic again off of that seems like it could possibly lead to an undefined state, as you are now kind of working off of the original data in an incremental fashion, which is also not necessarily what the spirit of that second diff is going for as well.

Last night I actually even tried to have result2 re-diff off of the original ResourceDiff used in result, but ironically I ended up with an off-by-one set in the tests when I did that 😛

@apparentlymart
Copy link
Contributor

Hmm. I'm not sure I follow here.

If the second run for RequiresNew has no state, shouldn't reading the list with Get return an empty list, and thus be equivalent to not reading it at all and just starting with an empty list?

It is certainly possible that we could ship a first version that relies on docs for what to do and not to do here, but since the "diffs didn't match" error is often hard to debug if possible I'd like to find a way to at least catch it and show a different error, so we can more easily recognize when we've hit this situation.

@vancluever
Copy link
Contributor Author

vancluever commented Jun 3, 2017

Hmm, this is interesting - added some debug logging to monitor the slice before/after:

2017/06/02 19:08:43 [DEBUG] plugin: terraform: nomad-provider (internal) 2017/06/02 19:08:43 [DEBUG] datacenters, before: []interface {}{"dc1"}
2017/06/02 19:08:43 [DEBUG] plugin: terraform: nomad-provider (internal) 2017/06/02 19:08:43 [DEBUG] datacenters, after: []interface {}{"dc1", "dc2", "dc3"}
2017/06/02 19:08:43 [DEBUG] plugin: terraform: nomad-provider (internal) 2017/06/02 19:08:43 [DEBUG] datacenters, before: []interface {}{}
2017/06/02 19:08:43 [DEBUG] plugin: terraform: nomad-provider (internal) 2017/06/02 19:08:43 [DEBUG] datacenters, after: []interface {}{"dc2", "dc3"}

So it looks like the issue is with the first diff and not the second...

More on this: grepped on a trace-level log (which includes the nomad diff), and no Deleted entry at all mentioned. Looks like deleted datacenters are not mentioned at all. Log below:

Moved to Gist

So the second run did return an empty list, and it looks like it was actually the correct diff, on part of the lack of state to pull the dc1 from that was never removed 🙂

I do agree though that the two runs can be confusing to a resource author, I'm just kind of wondering how we can go about mitigating the confusion, seeing as it seems like an important part of the diff when a new resource needs to be forced... maybe we can pass some context down to ResourceDiff that will tell the author what diff run they are in?

@apparentlymart
Copy link
Contributor

I think we should plan to not include detailed Nomad diff information in the Terraform diff to start, since that requires a lot of complex wrangling of the diff structure. But it seems like there's something underlying this -- something I'm not sure I quite follow -- that indicates a bug or limitation of the ResourceDiff API. I would expect that the second run of the diff function for a replace would behave identically to how it would be run for the initial create, but it seems like something's not quite lining up there...

Regarding allowing the provider to distinguish between a create and an update, I think that's a reasonable idea. I think we have something similar on the ResourceData type too, and it would allow the custom diff implementation to skip certain parts of the logic wholesale if we're in a create and thus the "old value" isn't meaningful.

Ideally though I'd prefer to see the CustomizeDiff function only need to deal with the "create" and "update" cases, and avoid having to distinguish "replace". The CustomizeDiff function already has a lot of power and potential complexity relative to the other functions, so I'm concerned that if we put too much on the resource author we are inviting lots of weird, hard-to-test bugs; for example, acceptance tests rarely test the replacement case.

@vancluever
Copy link
Contributor Author

The reason why the diff doesn't necessarily function the exact same way the second run is because of the lack of state, which - and I'm guessing that the only way we'd know for sure is maybe to page @mitchellh - is because the intention here is to try and make the diff look as close as possible to what the diff for a completely new resource looks like. I would say it's not a bug, but a side-effect of how a diff operates when a resource needs to be replaced, and something that I think should be respected as much as possible and I don't think is really worth it to break, because in that instance, the second run is pretty close to how CustomizeDiff would be run for a Create operation, and especially if some context was passed, a provider could definitely account for it and act accordingly.

I think right now, the line is a little too blurred between create and replace to really try and distinguish between the two - the moment that a change is made that ultimately requires a new resource during customization, you need to support the creation case again anyway, so we would probably have to drop support for ForceNew, and also remove the ability of this function to work with Optional+Computed fields as well, restricting it to just Computed only.

@apparentlymart
Copy link
Contributor

Sorry I think I wasn't clear... I was trying to say that it is expected that the second run in the replace case has no state, and that from the implementer's perspective it should be indistinguishable from the create case, so that the CustomizeDiff implementer only needs to care about two cases.

So I think we are agreeing. 😀

vancluever and others added 19 commits November 1, 2017 14:25
To keep with the current convention of most other schema.Resource
functional fields being fairly short, CustomizeDiff has been changed to
"Review". It would be "Diff", however it is already used by existing
functions in schema.Provider and schema.Resource.
Meant to remove this before finalizing the PR. :P
When working on this initially, I think I thought that since NewComputed
values in the diff were empty strings, that it was using the zero value.
After review, it doesn't seem like this is the case - so I have adjusted
NewComputed to pass nil values. There is also a guard now that keeps the
new value writer from accepting computed fields with non-nil values.
Removed some outdated comments for SetNew, and normalized the computed
key note for SetNew, SetNewComputed, and SetDiff.
The consensus is that it's a generally better idea to move setting the
functionality of old values completely to the refresh/read process,
hence it's moot to have this function around anymore. This also means we
don't need the old value reader/writer anymore, which simplifies things.
This could panic if we sent a parent that was actually longer than the
child (should almost never come up, but the guard will make it safe
anyway).

Also fixed a slice style lint warning in UpdatedKeys.
This will make this check safer and will remove the risk of it passing
on keys with a similar prefix.
This simplifies the new value initialization and future-proofs it
against more complex initialization functionality.
The old comments said that this interface was API compatible with
terraform.ResourceProvider's Diff method - with the addition of passing
down meta to it, this is no longer the case.

Not too sure if this is really a big deal - schema.Resource never fully
implemented terraform.ResourceProvider, from what I can see, and the
path from Provdier.Diff to Resource.Diff is still pretty clear. Just
wanted to remove an outdated comment.
Restoring the naming of this field in the resource back to
CustomizeDiff, as this is generally more descriptive of the process
that's happening, despite the lengthy name.
Added a list SetNew test to try and reproduce issues testing diff
customization with the Nomad provider. We are running into "diffs didn't
match during apply", with the plan diff exhibiting a strange
off-by-one-type error in a list diff:

  datacenters.#:         "1" => "2"
  datacenters.0:         "dc1" => "dc2"
  datacenters.1:         "" => "dc3"
  datacenters.2:         "" => "dc3"

The test here does not reproduce that issue, unfortunately, but should
help pinpoint the root cause through elimination.
This fixes nil pointer issues that could come up if an invalid key was
referenced (ie: not one in the schema). Also ships a helper validation
function to streamline things.
setDiff does not make use of its new parameter anymore, so it has been
removed. Also, as there is no more SetDiff (exported) function, mentions
of that have been removed from comments too.
A diff new needs to pass basic value checks to be considered the
"same". Several provisions have been added to ensure that the list, set,
and RequiresNew behaviours that have needed some exceptions in the past
are preserved in this new logic.

This ensures that we are checking for value equality as much as
possible, which will be more important when we transition to the
possibility of diffs being sourced from external data.
Needed to add more cases to support value comparison exceptions that the
rest of TF expects to work (this fixes tests in various places).

Also moved things to a switch block so that it's a little more compact.
Added some more detailed comments to CustomizeDiff's comments. The new
comments detail how CustomizeDiff will be called in the event of
different scenarios like creating a new resource, diffing an existing
one, diffing an existing resource that has a change that requires a new
resource, and destroy/tainted resources.

Also added similar detail to ForceNew in ResourceDiff.

This should help mitigate any confusion that may come up when using
CustomizeDiff, especially in the ForceNew scenario when the second run
happens with no state.
This keeps CustomizeDiff from being defined on data sources, where it
would be useless. We just catch this in InternalValidate like the rest
of the CRUD functions that are not used in data sources.
@apparentlymart apparentlymart merged commit 4787428 into hashicorp:master Nov 1, 2017
@vancluever
Copy link
Contributor Author

🎉 thanks for the merge @apparentlymart!

@ghost
Copy link

ghost commented Apr 6, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants