Tofu Graph Target Reduction #1639

Open
theherk opened this issue May 10, 2024 · 0 comments
Labels
pending-decision (This issue has not been accepted for implementation nor rejected. It's still open to discussion.), rfc

Comments


theherk commented May 10, 2024

Summary

Give users a new tool to speed up tofu in some cases by letting them request graph reduction in configuration, without manually specifying targets on the command line. In addition, add intelligence so that tofu expands to the complete graph by default when no reductions are requested or eligible, and allow the reduction to be skipped via a new CLI option.

Initial caveat: I am not a competent graph theoretician, I am not deeply knowledgeable about the core source or the dag package, and I am not aware of how all the graph transformations are implemented.

It is important to note that this is a completely non-breaking change.

Problem Statement

This is a slightly more fleshed-out version of the idea that came up in this comment. I believe there is a transformation somewhere of the completed graph which takes a subset of that graph based on -target arguments [1].
There must be cases [2] [3] where a user may want to specify in configuration that, when a state address's configuration differs from the refreshed state, it should automatically be targeted and its subgraph should be used for plan and apply.

User-facing description

lifecycle {target = true} would be like automatically assuming the -target option was given for this state address. Not setting it anywhere in the configuration would be just like not giving any -target options, and giving -no-targets would act as though no -target options were given, even if the lifecycle attribute is set.

Technical Description

Implementation

Lifecycle attribute

Add a new boolean attribute, target, to the lifecycle meta-argument. Its default value is false, and it is only relevant when set to true. Setting it to true signals to tofu that, once the final graph is generated, the newly proposed process should take place.

CLI option

Add a CLI option, -no-targets, that ignores all requested targeting. On its own this would make no sense, since the same behavior is already assumed when no -target options are given. With the added lifecycle attribute, however, it serves to negate the new default behavior and traverse the whole graph precisely as though no such attribute had ever been added.
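The precedence described so far can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the `Options` type, the `EffectiveTargets` function, and the example addresses are hypothetical and do not come from the OpenTofu source. It also assumes that addresses given with -target and addresses marked in configuration are simply combined, which the proposal implies but does not spell out.

```go
package main

import (
	"fmt"
	"sort"
)

// Options models the CLI inputs relevant to this proposal.
// Hypothetical type; not taken from the OpenTofu source.
type Options struct {
	CLITargets []string // addresses passed with -target
	NoTargets  bool     // the proposed -no-targets flag
}

// EffectiveTargets returns the addresses to treat as targeted.
// changedMarked holds addresses that set lifecycle { target = true }
// and whose configuration differs from the refreshed state.
// A nil result means "walk the whole graph".
func EffectiveTargets(opts Options, changedMarked []string) []string {
	if opts.NoTargets {
		// -no-targets ignores all requested targeting.
		return nil
	}
	seen := map[string]bool{}
	var out []string
	// Assumption: CLI targets and lifecycle targets are unioned.
	for _, list := range [][]string{opts.CLITargets, changedMarked} {
		for _, addr := range list {
			if !seen[addr] {
				seen[addr] = true
				out = append(out, addr)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	changed := []string{"gateway.policy"} // hypothetical address

	// Lifecycle attribute only: behaves as if -target had been given.
	fmt.Println(EffectiveTargets(Options{}, changed))

	// Explicit -target plus lifecycle attribute: both are targeted.
	fmt.Println(EffectiveTargets(Options{CLITargets: []string{"instance.web"}}, changed))

	// -no-targets: nothing is targeted, so the full graph is walked.
	fmt.Println(EffectiveTargets(Options{NoTargets: true}, changed))
}
```

Returning nil for "no targeting" keeps the existing full-graph path as the default, which matches the claim that the change is non-breaking.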

Process

  1. Refresh only state along edges for nodes where reduction is requested (i.e. lifecycle {target = true}).
  2. If none are affected, finish state refresh walk and plan normally.
  3. If any are affected, only plan the subgraph containing those nodes.
  4. Notify the user that targeting was in effect, and for which nodes, so another cycle may need to be run for non-reduction addresses.
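The four steps above can be sketched roughly as follows. This is a minimal model, not how tofu's dag package works: the `Graph` type, `PlanScope`, the `hasDiff` callback, and the addresses are all hypothetical, and the refresh in step 1 is reduced to a callback that reports whether a marked address differs from refreshed state. The reduced scope is taken to be each affected address plus everything it transitively depends on, mirroring how -target is documented to behave.

```go
package main

import "fmt"

// Graph maps each address to the addresses it depends on.
// Hypothetical stand-in for tofu's real graph structures.
type Graph map[string][]string

// closure returns the roots plus everything they transitively depend on.
func closure(g Graph, roots []string) map[string]bool {
	seen := map[string]bool{}
	var visit func(string)
	visit = func(n string) {
		if seen[n] {
			return
		}
		seen[n] = true
		for _, dep := range g[n] {
			visit(dep)
		}
	}
	for _, r := range roots {
		visit(r)
	}
	return seen
}

// PlanScope decides which nodes to plan. marked lists the addresses
// with lifecycle { target = true }; hasDiff stands in for step 1.
// It returns the set of nodes to plan and the addresses that caused
// a reduction (nil when the full graph is used).
func PlanScope(g Graph, marked []string, hasDiff func(string) bool) (map[string]bool, []string) {
	var affected []string
	for _, addr := range marked {
		if hasDiff(addr) { // step 1: check only the marked addresses
			affected = append(affected, addr)
		}
	}
	if len(affected) == 0 {
		// Step 2: nothing affected, so plan the whole graph.
		all := map[string]bool{}
		for n, deps := range g {
			all[n] = true
			for _, d := range deps {
				all[d] = true
			}
		}
		return all, nil
	}
	// Step 3: plan only the subgraph containing the affected nodes.
	return closure(g, affected), affected
}

func main() {
	g := Graph{
		"gateway.policy": {"gateway.api"},
		"instance.web":   {"network.vpc"},
	}
	scope, reduced := PlanScope(g, []string{"gateway.policy"},
		func(string) bool { return true })
	if len(reduced) > 0 {
		// Step 4: tell the user that targeting was in effect.
		fmt.Printf("targeting in effect for %v; %d nodes planned\n", reduced, len(scope))
	}
}
```

In this model the fallback in step 2 is what makes the attribute safe to leave in place: when none of the marked addresses has a diff, the result is the same scope an ordinary plan would use.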

Rationale and alternatives

The rationale is simple. If one regularly targets portions of the state because doing so can be much faster in some cases, this would allow that preference to be set in code rather than requiring the CLI. Further, it has very little downside, since it is completely backward compatible.

Downsides

Downside in parallelism

One of the great gains of the current graph implementation is that it allows a parallel walk for each operation. A node can therefore wait for the provider dependencies it called asynchronously to come back before proceeding. This proposal could (only where used) potentially stunt that gain for other nodes that depend on the target address but upon which the target does not depend. That is because of step 2 in the process. Refreshing is done first only to check whether the targeted address has a diff from the refreshed state, which allows the user to avoid many unnecessary API calls. Only if the target address has no diff, and the fallback to the full graph is taken, would those remaining calls then trigger. Now, they would only trigger synchronously in that subgraph, but this would be a cost to be considered by the user. I believe that in nearly every case the benefit would far outweigh the cost.

Unresolved Questions

No response

Related Issues

No response

Proof of Concept

No PoC has been performed yet. I did do some work on illustrative graphs, but they are incomplete. Rather than waiting until time allows me to finish them, I thought I'd get this into the public eye so the community can take it further if others are interested.

Footnotes

  1. targeting assumption
    I am making this assumption partly out of ignorance, but strictly based on the reasoning given in the documentation about how targeting works. Targeting can only work if a subset of the graph is walked in certain steps. In either case, the implementation shouldn't affect the viability of the proposal.

  2. simple case
    An API Gateway has a resource policy with an IP allow list. When that list is updated, the change is pressing, and waiting for a full state refresh and apply is undesirable.

  3. must exist logic
    The reason I'm confident such a case must exist, aside from the contrived but private example, is that targeting exists. That itself implies that, along the way, some people thought it would be useful.

theherk added the pending-decision and rfc labels on May 10, 2024