Branch-based Planner for TF-controller #527

Closed
6 tasks
chanwit opened this issue Feb 17, 2023 · 21 comments

@chanwit
Collaborator

chanwit commented Feb 17, 2023

Overview

The Branch-based Planner (aka Terraform plan result in PR) is a feature of the Terraform Controller that enables users to establish a link between a Pull Request and a Terraform CR (Custom Resource) object, which is also referred to as a Terraform Plan Object. This feature allows users to create a new branch, which is then used to create a temporary GitRepository object. The Terraform Plan Object is derived from the original Terraform CR object and points to the same Git repository, but on the new branch instead of the main branch.

The system allows users to make changes to the branch in a "plan-only" mode, displaying the plan information as a comment. The new plan is displayed as another comment when changes are made to the branch. Users can also trigger a replan by commenting under the pull request.

Once the user is satisfied with the changes made to the branch, they can merge it to the main branch, after which the new branch's Terraform plan object and GitRepository can be deleted.

The Branch-based Planner system can be extended to support other Git hosting platforms like GitLab, Bitbucket, or Azure DevOps by using their respective webhook mechanisms. The core idea of the Branch-based Planner system remains the same, but the webhook integration would need to be adapted to work with the desired platform. But GitHub is the only platform we would support for the first MVP.

Goals

  • To allow users to make changes to the branch in a "plan-only" mode and display plan information as comments.
  • To allow users to trigger replans through comments.
  • To promote collaboration and communication between team members.

Objectives

Objective 1

Allow users to make changes to the branch in a "plan-only" mode and display plan information as comments.

Use case

Assuming a scenario where a developer needs to make changes to the infrastructure managed by Terraform, the use case for this objective is to enable the developer to create a new branch, make changes to the configuration files, and generate a plan for the changes made in the new branch. The resulting plan information should be displayed as comments in the corresponding pull request, allowing the developer to review the changes and collaborate with other team members.

Objective 2

Allow users to trigger replans through comments.

Use Case

Assuming a scenario where a developer needs to make further changes to the infrastructure managed by Terraform after a pull request is created, the use case for this objective is to enable the developer to trigger a replan by posting specific comments within the corresponding pull request. The updated plan information should be automatically generated and displayed as comments in the pull request.
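A minimal sketch of how a "specific comment" could be recognized. The command words `!restart` and `!replan` are the ones named in the workflow update later in this thread; the function names and the rule that a command must stand alone on its own line are assumptions for illustration, not confirmed behavior.

```python
from typing import Iterable

# Command words taken from the workflow update later in this thread.
REPLAN_COMMANDS = {"!restart", "!replan"}


def is_replan_command(comment_body: str) -> bool:
    """Return True if any line of the comment is exactly one of the commands.

    Matching whole lines only is an assumption; it avoids triggering on
    sentences that merely mention a command.
    """
    return any(line.strip() in REPLAN_COMMANDS for line in comment_body.splitlines())


def needs_replan(comment_bodies: Iterable[str]) -> bool:
    """Return True if at least one comment asks for a replan."""
    return any(is_replan_command(body) for body in comment_bodies)
```

A real implementation would also need to remember which comments were already handled so that one comment does not trigger a replan on every pass; that bookkeeping is left out here.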

Components

  • Git Repository - This is the Git repository that contains the Terraform configuration files for the infrastructure. Please note that it's not a Source controller's GitRepository.
  • GitHub Webhooks - This is a feature in GitHub that allows the repository to send notifications to external services (in this case, the Branch-based Planner system) whenever certain events happen in the repository, such as the creation of a pull request. The Branch-based Planner is the GitHub webhook receiver.
  • Branch-based Planner System - This is the software system responsible for automatically creating a new Terraform plan object for the changes made in the new branch and generating new plan information when changes are made to the Terraform configuration files in the new branch. This system also allows users to trigger replans and view plan information in the pull request comments. The system runs inside Kubernetes alongside the Terraform Controller.
  • Terraform CR Object - This is a Kubernetes object that defines the infrastructure to be created, managed, and destroyed by the Terraform Controller. The Branch-based Planner system uses this object to create a new Terraform Plan Object for the changes made in the new branch. This object is of type infra.contrib.fluxcd.io/v1alpha2.Terraform.
  • Terraform Plan Object - This is a copy of the Terraform CR object, but the Terraform Plan Object points to a planning branch instead of the main branch. The Branch-based Planner system automatically creates a new Terraform plan object for the changes made in the new branch. It generates new plan information when changes are made to the Terraform configuration files in the new branch. This object is of type infra.contrib.fluxcd.io/v1alpha2.Terraform.

Scenarios

Scenario 1

  1. The user creates a new branch in the repository that contains Terraform files.
  2. The user makes changes to the Terraform files, commits the changes, and pushes the new branch to GitHub.
  3. The user creates a pull request for the new branch to merge it into the main branch.
  4. GitHub sends a notification of the new pull request to the Branch-based Planner system.
  5. The Branch-based Planner system automatically creates a new Terraform plan object derived from the original Terraform CR object for the new branch.
  6. The Branch-based Planner system points the new Terraform plan object to the GitRepository of the new branch instead of the main repository.
  7. The Branch-based Planner system generates a new plan for the changes made in the new branch and displays the resulting plan information as comments in the corresponding pull request.
  8. The Branch-based Planner system automatically updates the plan information when the user makes a change to the Terraform configuration files in the new branch and displays the resulting plan information as comments in the corresponding pull request.
  9. Suppose the user triggers a replan by posting a specific comment within the corresponding pull request. In that case, the Branch-based Planner system will automatically generate an updated plan for the changes made in the new branch and display the resulting plan information as comments in the corresponding pull request.
  10. When the user merges the new branch into the main branch, GitHub sends a notification to the Branch-based Planner system.
  11. The Branch-based Planner system deletes the Terraform plan object and GitRepository of the new branch.
  12. The Branch-based Planner system records the pull request in the Terraform CR object to keep track of the changes made to the infrastructure.
  13. The Branch-based Planner system closes the pull request and updates the Terraform CR object to reflect the changes made to the infrastructure.

User Stories

  • As a developer, I want to be able to create a new branch and make changes to the Terraform configuration files in a "plan-only" mode so that I can review the plan information before merging the changes into the main branch. Addressed by [tracker] Start plan-only mode for Terraform CR when creating a new branch on GitHub #576.
  • As a developer, I want to be able to trigger a replan by posting specific comments within the corresponding pull request so that I can update the plan information when further changes are made to the branch.
  • As a team member, I want to be able to review and comment on the plan information displayed in the pull request so that I can collaborate with other team members and ensure that the infrastructure changes are as intended.
  • As a manager, I want to be able to track the status of infrastructure changes and review the pull requests created for those changes so that I can ensure that the changes are reviewed and approved before being merged into the main branch.
  • As a developer, I want to be able to display the plan information in a user-friendly format so that I can easily understand the plan and identify any issues.
  • As a user, I want to be able to manage Terraform plans more efficiently and effectively using the Terraform Controller so that I can focus on other important tasks.
@chanwit chanwit self-assigned this Feb 17, 2023
@chanwit chanwit added the Epic label Feb 17, 2023
@JamWils

JamWils commented Feb 28, 2023

@chanwit I'll review this in a couple days, moved it to the top of the new column on our board.

@chanwit
Collaborator Author

chanwit commented Mar 28, 2023

ping @JamWils

@JamWils

JamWils commented Mar 28, 2023

@yiannistri, can you review this and give your input?

@yiannistri
Contributor

The proposal looks reasonable to me. Some clarifying questions:

  • Will this only support GitHub due to the use of webhooks?
  • Could you provide an example of a Terraform CR object and a Terraform Plan object and explain the differences, if any? For example, how does one use main while the other uses a branch? And who is responsible for creating them?
  • If main changes while a branch is in progress, how is this reflected in the current branch?

@chanwit
Collaborator Author

chanwit commented Mar 30, 2023

Thank you @yiannistri, these are all very good questions!

  1. The Branch-based Planner system can be extended to support other Git hosting platforms like GitLab, Bitbucket, or Azure DevOps by using their respective webhook mechanisms. The core idea of the Branch-based Planner system remains the same, but the webhook integration would need to be adapted to work with the desired platform. But GitHub is the only platform we would support for the first MVP.

  2. They are essentially the same object. The Terraform Plan object is just a copy of its original Terraform CR. With the current Flux Source + TF controller architecture, we have to clone the Terraform CR and its source, then change the branch of the new source to point to the new planning branch. We then point the copy of the Terraform CR (aka the Terraform Plan object) to the new source. The Branch-based Planner system is responsible for cloning them when it receives a webhook notification about a new pull request (which is where we receive the information about the new branch).

If main changes while a branch is in progress, how is this reflected in the current branch?

  3. That is undefined behavior. It is the user's responsibility to rebase the planning branch onto the new main branch, just as they would in any Git workflow.
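The cloning step described in answer 2 can be sketched as follows, with the objects modeled as plain dictionaries. This is an illustration under assumptions, not the controller's code: the `-pr-1` style name suffix, the example object names, the repository URL, and the GitRepository `apiVersion` are hypothetical, and the `planOnly` flag is the one mentioned in the workflow update later in this thread.

```python
import copy


def derive_plan_objects(terraform_cr: dict, git_repository: dict, branch: str, suffix: str):
    """Clone a Terraform CR and its source so the copies track `branch` in plan-only mode."""
    # Copy the source and point it at the planning branch.
    plan_source = copy.deepcopy(git_repository)
    plan_source["metadata"]["name"] = f'{git_repository["metadata"]["name"]}-{suffix}'
    plan_source["spec"]["ref"] = {"branch": branch}

    # Copy the Terraform CR, mark it plan-only, and point it at the new source.
    plan_tf = copy.deepcopy(terraform_cr)
    plan_tf["metadata"]["name"] = f'{terraform_cr["metadata"]["name"]}-{suffix}'
    plan_tf["spec"]["planOnly"] = True
    plan_tf["spec"]["sourceRef"]["name"] = plan_source["metadata"]["name"]
    return plan_tf, plan_source


# Hypothetical originals that track the main branch.
original_source = {
    "apiVersion": "source.toolkit.fluxcd.io/v1",  # assumed API version
    "kind": "GitRepository",
    "metadata": {"name": "infra", "namespace": "flux-system"},
    "spec": {"url": "https://example.com/org/infra.git", "ref": {"branch": "main"}},
}
original_tf = {
    "apiVersion": "infra.contrib.fluxcd.io/v1alpha2",
    "kind": "Terraform",
    "metadata": {"name": "network", "namespace": "flux-system"},
    "spec": {"path": "./terraform", "sourceRef": {"kind": "GitRepository", "name": "infra"}},
}

plan_tf, plan_source = derive_plan_objects(original_tf, original_source, "feature-x", "pr-1")
```

The only intended differences between each original and its copy are the object name, the branch on the source, and the plan-only flag and source reference on the Terraform object; the originals are left unmodified.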

@chanwit
Collaborator Author

chanwit commented Mar 30, 2023

I'll reflect your questions back into the proposal. Thank you again @yiannistri

@JamWils

JamWils commented Apr 6, 2023

@foot and @bigkevmcd is there an opportunity to leverage GitOps Sets here? Maybe I am wrong, but I guess the missing piece outside of that is to be able to post back to the github issue in question.

@yitsushi
Collaborator

It sounds reasonable, and I think it's a good feature.

A few notes on what I would expect as a user:

  1. I can define a branch prefix; if a branch does not match the pattern, I don't expect plan runs. If a repo has a lot of documentation changes, or application code lives in the same repo, I don't want to waste resources creating a plan for nothing. I can say "if you change Terraform, prefix the branch with tf-". Another option is to scope the planner: if no known Terraform file (by known file extensions) changed, no run. That would be the smart way.
  2. Adding a new comment for every plan can lead to chaos: iterative changes on a branch can end up with 10-20 plans, and that would be 10-20 comments. With small plans it's not a big deal, but if the plan output is around 80 lines, that's a lot to scroll through to find the right comment for the final state. I have no solution/suggestion for this. Maybe update old comments with <details></details> to collapse long and old content.
  3. As a user, I would like to see GitOps resource names/references in the comment. For example, the plan comment should show the namespace/name of the GitRepository object, so I can debug without spending too much time figuring out which GitRepository object is used when I have 60 of them.
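The three notes above could look roughly like this in code. Everything here is a hypothetical sketch: the function names, the set of file extensions, and the comment layout are assumptions, and `should_plan` applies the prefix check and the changed-file check together although the note presents them as alternatives.

```python
from pathlib import PurePosixPath

# Assumed set of "known Terraform file extensions"; the thread does not list them.
TERRAFORM_SUFFIXES = {".tf", ".tfvars"}


def should_plan(branch: str, changed_files: list[str], branch_prefix: str = "") -> bool:
    """Plan only when the branch matches the prefix and a Terraform file changed."""
    if branch_prefix and not branch.startswith(branch_prefix):
        return False
    return any(PurePosixPath(f).suffix in TERRAFORM_SUFFIXES for f in changed_files)


def render_plan_comment(plan_output: str, namespace: str, terraform_name: str,
                        source_name: str, collapsed: bool = False) -> str:
    """Build a PR comment that names the backing objects and can be collapsed."""
    header = (
        f"Terraform: {namespace}/{terraform_name}\n"
        f"GitRepository: {namespace}/{source_name}\n"
    )
    fence = "`" * 3  # Markdown code fence, built this way to keep this example readable
    body = f"{fence}\n{plan_output}\n{fence}"
    if collapsed:
        # Older plans are wrapped in <details> so only the latest one is expanded.
        body = f"<details>\n<summary>Superseded plan</summary>\n\n{body}\n\n</details>"
    return f"{header}\n{body}"
```

With this shape, an older comment could be rewritten with `collapsed=True` when a newer plan is posted, which keeps the latest plan as the only expanded one.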

Questions:

  1. I haven't used the TF controller before. Where do input variables come from? On main it can be a fixed external source, a ConfigMap, or a Secret, but what happens if I add a new input variable to the Terraform code and it is not in the "main variables" list?
  2. Does the Branch-based Planner system know anything about users and groups on GitHub? Can anyone comment on the PR and trigger a re-plan, or just a subset of users (maintainers, the person who created the PR)?

@chanwit
Collaborator Author

chanwit commented Apr 19, 2023

Thank you @yitsushi

I didn't use the TF controller before. Where are input variables coming from? On main it can be a fix external source, a config-map, or a secret, but what happens if I add new input variable to the Terraform code and they are not in the "main variables" list?

Input variables can be defined in several ways:

  • embedded directly in a Terraform CR
  • referenced from a Secret

but what happens if I add new input variable to the Terraform code and they are not in the "main variables" list?

It follows the behavior you would expect from running the Terraform binary. If the input has a default value, that value is used. If it has no default and is mandatory, the Terraform binary makes the runner exit with an error, and the reconciliation retries.
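The rule described above can be modeled in a few lines. This is only an illustration of the described behavior, not how Terraform itself is implemented; the function name and the two dictionaries are hypothetical.

```python
def resolve_variable(name: str, supplied: dict, defaults: dict):
    """Pick the value for one input variable, or fail if none is available.

    `supplied` stands for values given to the run (for example from the
    Terraform CR or a Secret); `defaults` stands for defaults declared in
    the Terraform code.
    """
    if name in supplied:
        return supplied[name]
    if name in defaults:
        return defaults[name]
    # A mandatory variable without a value makes the run fail; per the
    # comment above, the reconciliation would then be retried.
    raise ValueError(f"no value for required variable {name!r}")
```

A variable newly added on the planning branch therefore plans successfully only if it declares a default or its value is already supplied in the cluster.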

@chanwit
Collaborator Author

chanwit commented Apr 19, 2023

Does the Branch-based Planner system know anything about users on github and groups? Can anyone comment on the PR and trigger re-plan, or just a subset of users (maintainers, the one created the pr)?

This behavior has not been defined yet. We could refine them together.

@chanwit
Collaborator Author

chanwit commented Apr 26, 2023

A draft of the component diagram:

graph LR
    GH[GitHub PR] <-- retrieve --> PRP
    subgraph "Planner System"
        PRP[PR Poller]
        MC[Informer]
    end
    MC -- talk to --> K8s[K8s API Server]

@LappleApple
Contributor

LappleApple commented May 12, 2023

Hi all, some questions from a newcomer here. If we've answered these questions elsewhere, let's sync quickly offline (i.e., don't feel obligated to rehash material).

  • How many users do we expect will be impacted by this change? Do we have a way of measuring that?
  • What is the impact we want to make with this change? I.e., as a result of this work, what will be a positive outcome/next step we want users to take?
  • Why is it important for us to do this now/in near future in relation to other priorities?
  • What is the risk/opportunity cost of not doing this?
  • Can we assess the required effort for this work and how it will impact other WIP?

@LappleApple
Contributor

LappleApple commented May 12, 2023

FWIW I quickly ran the questions above past @yiannistri before posting

@LappleApple
Contributor

LappleApple commented Jun 8, 2023

@yitsushi reviewed the user stories posted above to see what's done, and noted that they all roughly suggest the same thing. As the user stories are all describing a full workflow, they'll all be done at once when the MVP is ready to release.

@squaremo
Contributor

Input variables could be defined in many ways,

[...]

* refer as a Secret

If my branch has a change to use a secret for inputs, how does the secret get into the cluster?

@chanwit
Collaborator Author

chanwit commented Jun 20, 2023

If my branch has a change to use a secret for inputs, how does the secret get into the cluster?

That secret must be available in the cluster first as a pre-condition before the planning process kicks off.

@chanwit
Collaborator Author

chanwit commented Jun 20, 2023

Updated workflow as we chose to implement the polling mechanism instead of webhooks.

  1. The poller reads information from Pull Requests (PRs).
  2. Using the PR information, the poller creates a Terraform Custom Resource (CR) with planOnly=true and sets labels. New Terraform resources are always created with planOnly=true, which means they are only planned, never applied with terraform apply.
  3. The poller also creates a GitRepository object, which points to the PR branch.
  4. The poller ensures that every PR of interest is associated with a GitRepository and Terraform object and triggers "replans" when necessary.
  5. The informer is responsible for watching a set of Terraform CRs with the labels set by the poller.
  6. When there's a new commit in the PR branch, Flux's source controller automatically detects the change and updates the source.
  7. In the case of specific comments like !restart or !replan, the poller initiates a "force restart" of the plan. This triggers the "replan" process.
  8. The informer is also responsible for relaying the Terraform plan back to GitHub.
  9. Once a PR is merged, the poller deletes the Terraform resource and GitRepository.
  • Existing Terraform resources managed by Git remain untouched.
  • All of the above communication happens via the Kubernetes API.
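The create and delete decisions in steps 2, 3, 4, and 9 can be sketched as a pure function over the polled PR state. The `PullRequest` type and `reconcile` function are hypothetical names for illustration; talking to GitHub and to the Kubernetes API is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    number: int
    branch: str
    merged: bool = False


def reconcile(pull_requests: list[PullRequest], existing: set[int]) -> tuple[set[int], set[int]]:
    """Return (to_create, to_delete) PR numbers for plan objects.

    `existing` holds the PR numbers that already have a Terraform and
    GitRepository pair in the cluster.
    """
    open_numbers = {pr.number for pr in pull_requests if not pr.merged}
    merged_numbers = {pr.number for pr in pull_requests if pr.merged}
    to_create = open_numbers - existing    # PRs of interest without objects yet
    to_delete = merged_numbers & existing  # merged PRs whose objects remain
    return to_create, to_delete


prs = [PullRequest(1, "tf-a"), PullRequest(2, "tf-b", merged=True), PullRequest(3, "tf-c")]
to_create, to_delete = reconcile(prs, existing={1, 2})
```

The workflow above only describes cleanup on merge; how PRs that are closed without merging should be handled is not stated in the thread, so this sketch does not cover it.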

@chanwit
Collaborator Author

chanwit commented Jun 20, 2023

sequenceDiagram
    participant P as Poller
    participant GH as GitHub PR
    participant I as Informer    
    participant T as Terraform CR
    participant GR as GitRepository
    participant SC as Source Controller
    P->>GH: Read PR information
    P->>T: Create Terraform CR (planOnly=true)
    P->>GR: Create GitRepository (points to PR branch)
    P->>I: Associate PR with GitRepository and Terraform CR
    P->>GH: Read comments (!restart or !replan)
    P->>T: Initiate "force restart" of the plan
    SC-->>T: Detects changes, trigger plan
    I->>T: Watch for plan generation
    I-->>GH: Show Terraform plan as comment
    P->>GH: Read merged status
    P->>T: Deletes Terraform resource
    P->>GR: Deletes GitRepository

@bigkevmcd
Contributor

If this was me, I'd design to work on either polled comments or webhook notifications.

The simplest to get going in many enterprises will be polled, but there are definitely advantages to hook-driven (fewer API calls, faster turnaround).

@squaremo
Contributor

If this was me, I'd design to work on either polled comments or webhook notifications.

Right, it shouldn't be either/or. Design for adding webhooks later.

@chanwit
Collaborator Author

chanwit commented Jun 20, 2023

Yep, the first couple of MVPs will go with polling. We'll take a look at the webhook approach again after that.
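One way to keep the door open for webhooks, sketched under assumptions: both delivery mechanisms feed the same handler through a shared interface. None of these names come from the project; `PREvent`, `EventSource`, `PollingSource`, and `WebhookSource` are hypothetical, as are the event kinds.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


@dataclass(frozen=True)
class PREvent:
    number: int
    kind: str  # hypothetical kinds, e.g. "opened", "comment", "merged"


class EventSource(Protocol):
    def events(self) -> Iterable[PREvent]: ...


class PollingSource:
    """Produces events by calling a fetch function on each poll."""

    def __init__(self, fetch: Callable[[], Iterable[PREvent]]):
        self._fetch = fetch

    def events(self) -> list[PREvent]:
        return list(self._fetch())


class WebhookSource:
    """Buffers events pushed by a webhook receiver until they are drained."""

    def __init__(self) -> None:
        self._queue: list[PREvent] = []

    def receive(self, event: PREvent) -> None:
        self._queue.append(event)

    def events(self) -> list[PREvent]:
        drained, self._queue = self._queue, []
        return drained


def process(source: EventSource, handle: Callable[[PREvent], None]) -> int:
    """Feed every pending event to the handler; return how many were handled."""
    count = 0
    for event in source.events():
        handle(event)
        count += 1
    return count
```

Because `process` depends only on the `events()` method, a polling-only MVP could ship first and a webhook receiver could be added later without changing the planning logic.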


7 participants