
[ws-deployment] Prelim checkin to create a workspace cluster using cli #6338

Merged: 1 commit merged into main from prs/add-build-cluster-step on Oct 22, 2021

Conversation

@princerachit (Contributor) commented Oct 21, 2021

Description

[screenshot]

Related Issue(s)

Fixes #6331

How to test

Release Notes

Automated workspace deployment framework and design proposal, plus a preliminary check-in for workspace cluster creation via the CLI

Documentation

@princerachit (Author):

The execution is failing with this error:

│ Error: Failed to get existing workspaces: querying Cloud Storage failed: googleapi: Error 403: gitpod-nodes-workload@gitpod-io-dev.iam.gserviceaccount.com does not have storage.objects.list access to the Google Cloud Storage bucket., forbidden
│ 

I am fixing this.

@princerachit princerachit changed the title Prs/add build cluster step [ws-deployment] Prelim checkin to create a workspace cluster using cli Oct 21, 2021
@princerachit princerachit marked this pull request as ready for review October 21, 2021 11:38
@princerachit (Author) commented Oct 21, 2021

The PR does its job. I could trigger a workspace cluster creation from my local environment, which eventually failed because the Helm provider could not install cert-manager:


Invalid attribute in provider configuration

  with provider["registry.terraform.io/hashicorp/kubernetes"].gitpod-io-dev,
  on providers.tf line 48, in provider "kubernetes":
  48: provider "kubernetes" {

'host' is not a valid URL

╷
│ Error: failed post-install: timed out waiting for the condition
│ 
│   with module.gitpod-workspaces.module.certmanager.helm_release.certmanager[0],
│   on ../../../../terraform/modules/certmanager/main.tf line 51, in resource "helm_release" "certmanager":
│   51: resource "helm_release" "certmanager" {
│ 
╵
{"level":"info","message":"Error creating cluster eu181: exit status 1","serviceContext":{"service":"ws-deployment","version":""},"severity":"INFO","time":"2021-10-21T11:37:57Z"}

Cluster was created:
[screenshot]

I am updating the ops repo branch with some Terraform module changes. Once those are done, this CLI will be fully operational.

Review thread on operations/workspace/deployment/example-config.yaml (outdated; resolved).
// MetaClusters is optional as we may not want to register the cluster
MetaClusters []*common.MetaCluster `yaml:"metaClusters"`
WorkspaceClusters []*common.WorkspaceCluster `yaml:"workspaceClusters"`
// TODO(princerachit): Add gitpod version here when we decide to use installed instead of relying solely on ops repository
Reviewer (Contributor):

We should not rely on the ops repo at all, but use the Gitpod version/version manifest/installer image directly

princerachit (Author):

I agree. Since these changes are only for cluster creation and not for Gitpod installation, I have not added it here yet. It will be part of the next set of changes.
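For context on the config shape discussed above, here is a minimal sketch of how such a deployment config might be loaded. The package layout, the stand-in field sets, and the LoadConfig helper are illustrative assumptions, not code from this PR; it only mirrors the `metaClusters`/`workspaceClusters` keys shown in the snippet.

```go
package deployment

import (
	"os"

	"gopkg.in/yaml.v2" // assumption: any YAML library would do
)

// MetaCluster and WorkspaceCluster stand in for the common.* types referenced
// in the PR; their real field sets live in the deployment's common package.
type MetaCluster struct {
	Name   string `yaml:"name"`
	Region string `yaml:"region"`
}

type WorkspaceCluster struct {
	Name   string `yaml:"name"`
	Region string `yaml:"region"`
}

// Config mirrors the snippet under review: MetaClusters is optional because
// we may not want to register the workspace cluster against a meta cluster.
type Config struct {
	MetaClusters      []*MetaCluster      `yaml:"metaClusters"`
	WorkspaceClusters []*WorkspaceCluster `yaml:"workspaceClusters"`
}

// LoadConfig reads and parses a config file such as example-config.yaml.
func LoadConfig(path string) (*Config, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := yaml.Unmarshal(b, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```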

Review threads (outdated; resolved) on:
- operations/workspace/deployment/pkg/runner/shellcommand.go
- operations/workspace/deployment/pkg/step/createcluster.go (3 threads)
Comment on lines +82 to +80
commandToRun := fmt.Sprintf("cd %s && terraform init && terraform apply -auto-approve", tfModulesDir)
err := runner.ShellRunWithDefaultConfig("/bin/sh", []string{"-c", commandToRun})
Reviewer (Contributor):

At this point I think we might want to introduce a bit of execution infrastructure akin to leeway (executeCommandForPackage and how it's used). There we first build up a set of commands and then execute them one after the other. Benefits:

  • the code is easy to read because you can write the entire recipe in one function without needing to deal with the intricacies of things going wrong (mostly you're just building string arrays)
  • adding a debug log that prints the commands you execute is easy
  • if all modifications happen as part of this execution mechanism adding a dry run is easy enough

princerachit (Author):

I think we should introduce this but not at this point in time. These are improvements that we should be iterating on. I don't want this PR to become one big feature branch before we merge it.

I would rather have working code and then stitch the missing pieces later.

Reviewer (Contributor):

I agree - we don't want to blow this up. Chances are we'll be running a lot of code execution though. Hence we could try and follow a pattern where the individual steps produce a list of commands which get executed at the end.

We don't need fancy reporter infrastructure, just something like:

func createCluster() error {
    var commands [][]string
    commands = append(commands, []string{DefaultTFModuleGeneratorScriptPath, "-l", cfg.Region, "-n", cfg.Name})
    commands = append(commands, []string{"terraform", "init"})
    commands = append(commands, []string{"terraform", "apply", "-auto-approve"})
    return runCommands(wd, commands)
}

That's very similar to what you've started with the "runner"
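For readers, a minimal sketch of what a runCommands helper along these lines could look like; the function name, the working-directory argument, and the output handling are assumptions for illustration, not code from this PR or from leeway.

```go
package runner

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// runCommands executes each argv slice in order inside workingDir,
// streaming output to the caller's stdout/stderr and stopping at the
// first failing command.
func runCommands(workingDir string, commands [][]string) error {
	for _, argv := range commands {
		// A debug log of the exact command is one of the benefits mentioned above.
		fmt.Printf("running: %s\n", strings.Join(argv, " "))
		cmd := exec.Command(argv[0], argv[1:]...)
		cmd.Dir = workingDir
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("command %q failed: %w", strings.Join(argv, " "), err)
		}
	}
	return nil
}
```

A dry-run mode then becomes a matter of printing the commands and skipping cmd.Run().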

princerachit (Author):

I don't want to make this change. It feels like going back to the discussion of introducing structures.

Once we have at least the Gitpod installation step working (which will be a good way of learning what gets repeated and which patterns emerge), I would be happy to introduce these structures.

Do you see this as a blocker?

Reviewer (Contributor):

> Do you see this as a blocker?

No.

> It feels like going back to the discussion of introducing structures.

Not sure - we should not sit at either extreme end of this spectrum. That said, I'm also fine with going ahead as things are and harmonising afterwards.

@csweichel (Contributor):

For future notice: prior to marking something as ready for approval, please check that the commits could land in main like this. Right now it's 60 commits that clearly need some squashing/authoring :)

@codecov bot commented Oct 22, 2021

Codecov Report

Merging #6338 (a6102bc) into main (a996c98) will decrease coverage by 3.04%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##             main    #6338      +/-   ##
==========================================
- Coverage   19.04%   16.00%   -3.05%     
==========================================
  Files           2       13      +11     
  Lines         168     1406    +1238     
==========================================
+ Hits           32      225     +193     
- Misses        134     1162    +1028     
- Partials        2       19      +17     
| Flag | Coverage Δ |
|---|---|
| components-blobserve-app | 30.43% <ø> (?) |
| components-blobserve-lib | 31.34% <ø> (?) |
| components-local-app-app-linux-amd64 | ? |
| components-local-app-app-linux-arm64 | ? |
| components-local-app-app-windows-386 | ? |
| components-local-app-app-windows-amd64 | ? |
| components-local-app-app-windows-arm64 | ? |
| dev-loadgen-app | ∅ <ø> (?) |
| dev-poolkeeper-app | ∅ <ø> (?) |
| installer-app | 6.08% <ø> (?) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| components/local-app/pkg/auth/pkce.go | |
| components/local-app/pkg/auth/auth.go | |
| installer/pkg/components/ws-manager/deployment.go | 0.00% <0.00%> (ø) |
| installer/pkg/components/ws-manager/role.go | 0.00% <0.00%> (ø) |
| installer/pkg/components/ws-manager/configmap.go | 23.45% <0.00%> (ø) |
| installer/pkg/components/ws-manager/rolebinding.go | 0.00% <0.00%> (ø) |
| installer/pkg/common/common.go | 1.34% <0.00%> (ø) |
| installer/pkg/common/objects.go | 0.00% <0.00%> (ø) |
| components/blobserve/pkg/blobserve/refstore.go | 64.82% <0.00%> (ø) |
| installer/pkg/components/ws-manager/tlssecret.go | 0.00% <0.00%> (ø) |
| ... and 5 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a996c98...a6102bc.

)

func Deploy(context *common.ProjectContext, clusters []*common.WorkspaceCluster) error {
var wg sync.WaitGroup
Reviewer (Contributor):

for future reference (not in this PR): an errgroup might be better suited here to actually fail when the creation of a cluster fails
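A hedged sketch of what that could look like with golang.org/x/sync/errgroup; the function and the stand-in types are illustrative, and createClusterFn is a placeholder for whatever per-cluster work Deploy actually performs.

```go
package deploy

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// WorkspaceCluster stands in for common.WorkspaceCluster from the PR.
type WorkspaceCluster struct {
	Name string
}

// deployAll creates all clusters concurrently and, unlike a bare
// sync.WaitGroup, surfaces the first error from any cluster creation.
func deployAll(ctx context.Context, clusters []*WorkspaceCluster,
	createClusterFn func(context.Context, *WorkspaceCluster) error) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, c := range clusters {
		c := c // capture the loop variable (needed before Go 1.22)
		g.Go(func() error {
			return createClusterFn(ctx, c)
		})
	}
	// Wait returns the first non-nil error, which is exactly the
	// "actually fail when a cluster creation fails" behaviour.
	return g.Wait()
}
```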

@roboquat (Contributor):

LGTM label has been added.

Git tree hash: 66a8d733374fb2d2e4dc4f7084626f7e20bfe0d6

@roboquat (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csweichel

Associated issue: #6331

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@roboquat roboquat merged commit 560f35b into main Oct 22, 2021
@roboquat roboquat deleted the prs/add-build-cluster-step branch October 22, 2021 12:09
@roboquat roboquat added deployed: workspace Workspace team change is running in production deployed Change is completely running in production labels Oct 31, 2021
Labels:
- approved
- deployed: workspace (Workspace team change is running in production)
- deployed (Change is completely running in production)
- release-note
- size/XXL
- team: workspace (Issue belongs to the Workspace team)
Development

Successfully merging this pull request may close this issue: [ws-deployment] Make workspace cluster creation automated (#6331)

3 participants