Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate missing post plan terminal statuses to ensure plan is completely processed before attempting to confirm run #921

Merged
merged 5 commits into from
Jun 29, 2023

Conversation

Uk1288
Copy link
Contributor

@Uk1288 Uk1288 commented Jun 6, 2023

Description

The changes in this PR checks potential terminal post plan statuses to ensure that a plan has been completely processed before proceeding to confirm the associated run. The terminal status changes depending on the post plan lifecycle that a run has. For example a plan can terminate at planned status or cost_estimated or policy_checked.

See issue for more details: #898

This PR depends on go-tfe PR#712

Testing plan

  1. Workspace_run resource should create run as usual, the only way to produce the issue consistently is to force a run to remain in a status, e.g policy_checked, for a longer period of time such that workspace_run receives that status while polling the run. I achieved this by adding delays to each run state

External links

Output from acceptance tests

$ TESTARGS="-run TestAccTFEWorkspaceRun" make testacc

...

@Uk1288 Uk1288 requested a review from a team as a code owner June 6, 2023 14:39
@Uk1288 Uk1288 force-pushed the TF-6735-fix-post-plan-ops-in-workspace-run-resource branch 2 times, most recently from dc9fe26 to 4ec3109 Compare June 6, 2023 14:51
tfe.RunQueuingApply: true,
}

var policyOverridenStatuses = map[tfe.RunStatus]bool{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var policyOverridenStatuses = map[tfe.RunStatus]bool{
var policyOverriddenStatuses = map[tfe.RunStatus]bool{

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I find your diagnosis and fix to be convincing... but I'm also finding the helpers to be difficult to read and reason about and feel like we should take a moment to simplify them. I identified these specific things that could make this module simpler:

  1. syntactically, I think there are two problems with awaitRun:
    a) nested switch statements-- I find it difficult to mentally retain a stack context once we reach that second nesting.
    b) this function takes a lot of arguments, which is an indicator that it could be doing too much or the input data is not related enough
  2. mixed responsibilities: there are interface{} and provider-specific args in many helpers. Example: meta interface{} and schema.ResourceData args could be replaced with strong types like ConfiguredClient, retry/retry backoff integers, workspace ID, etc -- The idea would be that the helpers aren't interpreting provider schema or interfaces so that narrows the scope of their input.
  3. the main entry point, createWorkspaceRun is looong so it may be best to try and break it into conceptual stages. I don't have a strong feeling about how to improve it, yet, but just moving code to subroutines that reflect the actual run stages might be a first step.

Does any of this resonate with you? What are your thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this makes sense to me. I broke down the methods into smaller ones to represent action steps. The only one I have not updated is the double switch statement because the method is now smaller and straight forward. If we still want to get rid of that, I can replace it with an if..else statement. WDYT?

@Uk1288 Uk1288 force-pushed the TF-6735-fix-post-plan-ops-in-workspace-run-resource branch 2 times, most recently from 368d0b3 to 321d64e Compare June 14, 2023 15:09
@Uk1288
Copy link
Contributor Author

Uk1288 commented Jun 14, 2023

  • TO-DO: upgrade go-tfe after changes are good to go.

Copy link
Member

@nfagerlund nfagerlund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting there! I think you've split out the duties in a fairly legible way among these functions; some of the argument lists are still a bit unpleasant, but I think that's destiny, kind of, when the task is this annoying.

But I have some major concerns about mutating that top-scope planPendingStatuses map, and I don't think we should do it. I think what I'd suggest for a refactor goes something like this:

The root issue:

  • isPlanComplete() needs to inherently know about the list of plan terminal statuses.
  • The call to awaitRun() needs to pass in the list of plan pending statuses.
  • The actual lists of plan pending/terminal statuses depend on the run, plus whether hasPostPlanTaskStage (which is inherent in the run but has to be discovered by making additional requests and then remembered for later, and we don't want those tightly-scoped helper funcs hitting the client.)

What to do:

  • Make a func that takes (run, hasPostPlanTaskStage) and returns a fresh planPendingStatuses map. (Which you'll later pass to awaitRun().)
  • Make a func that takes (run, hasPostPlanTaskStage) and returns a fresh planTerminalStatuses map.
  • Make a func that takes planTerminalStatuses and returns an anonymous function that can be used as an isPlanComplete() func.
  • Then completely remove hasPostPlanTaskStage from the signatures of awaitRun and all the completion checker helper funcs, because isPlanComplete doesn't need it anymore -- it'll be a closure that already knows the things it currently needs hasPostPlanTaskStage in order to derive.

Comment on lines +461 to +487
planPendingStatuses[tfe.RunCostEstimated] = true
planPendingStatuses[tfe.RunPlanned] = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Uh...... hmm. The planPendingStatuses map is defined at global scope in this package, which means that this is 🚨 mutating shared global state in the process of checking the run status. That freaks me out a bit! I can't remember whether the lifecycle of a provider server means we'll never be processing multiple runs concurrently, but this really feels like it's asking for trouble.

I DO like the logic of "sort out which statuses are actually pending or terminal based on the nature of the run we're working with," but I believe this requires each run to have its own list of pending statuses. (Or, like you already did with the terminal map, each invocation of the polling helper to have its own list.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, mutating the global planPendingStatuses map raised questions for me as well, am glad that this comment validated my thought.

@nfagerlund nfagerlund mentioned this pull request Jun 22, 2023
@Uk1288 Uk1288 force-pushed the TF-6735-fix-post-plan-ops-in-workspace-run-resource branch from 321d64e to 25c1498 Compare June 26, 2023 18:36
@nfagerlund
Copy link
Member

So these revisions look pretty great. Once the required extra status is merged into go-tfe and the provider can build properly, hit me up if I'm around and I'll try and do a smoke test; I think I still have a config sitting around from the last time we jammed on this.

@Uk1288 Uk1288 force-pushed the TF-6735-fix-post-plan-ops-in-workspace-run-resource branch from 25c1498 to 2d12bfd Compare June 29, 2023 14:00
Copy link
Member

@nfagerlund nfagerlund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right!!! I smoke-tested this with the update to a released go-tfe version, and it works like a charm. We won't know for sure that this fixes the issue until we have the depth of randomness provided by the CI tests for some weeks, but I have a very good feeling about it at this point!!

We still need to fix the dang TFC runs API though 😆

:shipit:

@Uk1288 Uk1288 merged commit 98a029e into main Jun 29, 2023
9 checks passed
@Uk1288 Uk1288 deleted the TF-6735-fix-post-plan-ops-in-workspace-run-resource branch June 29, 2023 18:13
github-merge-queue bot pushed a commit to panda-den/colorful-pandas that referenced this pull request Jul 4, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [tfe](https://registry.terraform.io/providers/hashicorp/tfe)
([source](https://togithub.com/hashicorp/terraform-provider-tfe)) |
required_provider | minor | `0.45.0` -> `0.46.0` |

---

### ⚠ Dependency Lookup Warnings ⚠

Warnings were logged while processing this repo. Please check the
Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>hashicorp/terraform-provider-tfe (tfe)</summary>

###
[`v0.46.0`](https://togithub.com/hashicorp/terraform-provider-tfe/blob/HEAD/CHANGELOG.md#v0460-July-3-2023)

[Compare
Source](https://togithub.com/hashicorp/terraform-provider-tfe/compare/v0.45.0...v0.46.0)

FEATURES:

- **New Resource**: `r/tfe_agent_pool_allowed_workspaces` restricts the
use of an agent pool to particular workspaces, by
[@&#8203;hs26gill](https://togithub.com/hs26gill)
[870](https://togithub.com/hashicorp/terraform-provider-tfe/pull/870)
- `r/tfe_organization_token`: Add optional `expired_at` field to
organization tokens, by
[@&#8203;juliannatetreault](https://togithub.com/juliannatetreault)
(#&#8203[hashicorp/terraform-provider-tfe#844))
- `r/tfe_team_token`: Add optional `expired_at` field to team tokens, by
[@&#8203;juliannatetreault](https://togithub.com/juliannatetreault)
(#&#8203[hashicorp/terraform-provider-tfe#844))
- `r/tfe_agent_pool`: Add attribute `organization_scoped` to set the
scope of an agent pool, by
[@&#8203;hs26gill](https://togithub.com/hs26gill)
[870](https://togithub.com/hashicorp/terraform-provider-tfe/pull/870)
- `d/tfe_agent_pool`: Add attribute `organization_scoped` and
`allowed_workspace_ids` to retrieve agent pool scope and associated
allowed workspace ids, by
[@&#8203;hs26gill](https://togithub.com/hs26gill)
[870](https://togithub.com/hashicorp/terraform-provider-tfe/pull/870)

BUG FIXES:

- `r/tfe_workspace_run`: Ensure `wait_for_run` correctly results in a
fire-and-forget run when set to `false`, by
[@&#8203;lucymhdavies](https://togithub.com/lucymhdavies)
(#&#8203[hashicorp/terraform-provider-tfe#910))
- `r/tfe_workspace_run`: Fix rare random run failures; adjust lists of
expected run statuses to ensure that a plan is completely processed
before attempting to apply it, by
[@&#8203;uk1288](https://togithub.com/uk1288)
(#&#8203[hashicorp/terraform-provider-tfe#921))
- `r/tfe_notification_configuration`: Add support for missing "Check
failed" Health Event notifications, by
[@&#8203;lucymhdavies](https://togithub.com/lucymhdavies)
(#&#8203[hashicorp/terraform-provider-tfe#927))
- `r/tfe_registry_module`: Fix a bug that prevented users from being
able to create a registry module using a github app, by
[@&#8203;dsa0x](https://togithub.com/dsa0x)
(#&#8203[hashicorp/terraform-provider-tfe#935))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Renovate
Bot](https://togithub.com/renovatebot/renovate).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNS4xNTkuNCIsInVwZGF0ZWRJblZlciI6IjM1LjE1OS40IiwidGFyZ2V0QnJhbmNoIjoibWFpbiJ9-->

Co-authored-by: astronaut-panda[bot] <137164246+astronaut-panda[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants