
[ws-deployment] Prelim checkin to create a workspace cluster using cli #6338

Merged: 1 commit merged into main from prs/add-build-cluster-step on Oct 22, 2021

Conversation

@princerachit (Contributor) commented Oct 21, 2021

Description

[screenshot]

Related Issue(s)

Fixes #6331

How to test

Release Notes

Automated workspace deployment framework and design proposal, plus a preliminary check-in for workspace cluster creation via the CLI

Documentation

@princerachit (Author):

The execution is failing with this error:

│ Error: Failed to get existing workspaces: querying Cloud Storage failed: googleapi: Error 403: gitpod-nodes-workload@gitpod-io-dev.iam.gserviceaccount.com does not have storage.objects.list access to the Google Cloud Storage bucket., forbidden
│ 

I am fixing this.

@princerachit princerachit changed the title Prs/add build cluster step [ws-deployment] Prelim checkin to create a workspace cluster using cli Oct 21, 2021
@princerachit princerachit marked this pull request as ready for review October 21, 2021 11:38
@princerachit (Author) commented Oct 21, 2021

The PR does its job. I could trigger a workspace cluster creation from my local environment, which eventually failed because the Helm provider could not install cert-manager:


Invalid attribute in provider configuration

  with provider["registry.terraform.io/hashicorp/kubernetes"].gitpod-io-dev,
  on providers.tf line 48, in provider "kubernetes":
  48: provider "kubernetes" {

'host' is not a valid URL

╷
│ Error: failed post-install: timed out waiting for the condition
│ 
│   with module.gitpod-workspaces.module.certmanager.helm_release.certmanager[0],
│   on ../../../../terraform/modules/certmanager/main.tf line 51, in resource "helm_release" "certmanager":
│   51: resource "helm_release" "certmanager" {
│ 
╵
{"level":"info","message":"Error creating cluster eu181: exit status 1","serviceContext":{"service":"ws-deployment","version":""},"severity":"INFO","time":"2021-10-21T11:37:57Z"}

Cluster was created:
[screenshot]

I am updating the ops repo branch with some Terraform module changes. Once those are done, this CLI will be fully operational.

Review thread on operations/workspace/deployment/example-config.yaml (outdated; resolved).
// MetaClusters is optional as we may not want to register the cluster
MetaClusters []*common.MetaCluster `yaml:"metaClusters"`
WorkspaceClusters []*common.WorkspaceCluster `yaml:"workspaceClusters"`
// TODO(princerachit): Add gitpod version here when we decide to use installed instead of relying solely on ops repository
Reviewer (Contributor):

We should not rely on the ops repo at all, but use the Gitpod version/version manifest/installer image directly

princerachit (Author):

I agree. Since these changes are only for cluster creation and not for Gitpod installation, I have not added it here yet. It will be part of the next set of changes.
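For context on the config shape discussed above, here is a minimal sketch of how such a deployment config might be loaded. The package layout, the stand-in field sets, and the LoadConfig helper are illustrative assumptions, not code from this PR; it only mirrors the `metaClusters`/`workspaceClusters` keys shown in the snippet.

```go
package deployment

import (
	"os"

	"gopkg.in/yaml.v2" // assumption: any YAML library would do
)

// MetaCluster and WorkspaceCluster stand in for the common.* types referenced
// in the PR; their real field sets live in the deployment's common package.
type MetaCluster struct {
	Name   string `yaml:"name"`
	Region string `yaml:"region"`
}

type WorkspaceCluster struct {
	Name   string `yaml:"name"`
	Region string `yaml:"region"`
}

// Config mirrors the snippet under review: MetaClusters is optional because
// we may not want to register the workspace cluster against a meta cluster.
type Config struct {
	MetaClusters      []*MetaCluster      `yaml:"metaClusters"`
	WorkspaceClusters []*WorkspaceCluster `yaml:"workspaceClusters"`
}

// LoadConfig reads and parses a config file such as example-config.yaml.
func LoadConfig(path string) (*Config, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := yaml.Unmarshal(b, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```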

Review threads (outdated; resolved) on:
- operations/workspace/deployment/pkg/runner/shellcommand.go
- operations/workspace/deployment/pkg/step/createcluster.go (3 threads)
Comment on lines +82 to +80
commandToRun := fmt.Sprintf("cd %s && terraform init && terraform apply -auto-approve", tfModulesDir)
err := runner.ShellRunWithDefaultConfig("/bin/sh", []string{"-c", commandToRun})
Reviewer (Contributor):

At this point I think we might want to introduce a bit of execution infrastructure akin to leeway (executeCommandForPackage and how it's used). There we first build up a set of commands and then execute them one after the other. Benefits:

  • the code is easy to read because you can write the entire recipe in one function without needing to deal with the intricacies of things going wrong (mostly you're just building string arrays)
  • adding a debug log that prints the commands you execute is easy
  • if all modifications happen as part of this execution mechanism adding a dry run is easy enough

princerachit (Author):

I think we should introduce this but not at this point in time. These are improvements that we should be iterating on. I don't want this PR to become one big feature branch before we merge it.

I would rather have working code and then stitch the missing pieces later.

Reviewer (Contributor):

I agree - we don't want to blow this up. Chances are we'll be running a lot of code execution though. Hence we could try and follow a pattern where the individual steps produce a list of commands which get executed at the end.

We don't need fancy reporter infrastructure, just something like:

func createCluster() error {
    var commands [][]string
    commands = append(commands, []string{DefaultTFModuleGeneratorScriptPath, "-l", cfg.Region, "-n", cfg.Name})
    commands = append(commands, []string{"terraform", "init"})
    commands = append(commands, []string{"terraform", "apply", "-auto-approve"})
    return runCommands(wd, commands)
}

That's very similar to what you've started with the "runner"
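For readers, a minimal sketch of what a runCommands helper along these lines could look like; the function name, the working-directory argument, and the output handling are assumptions for illustration, not code from this PR or from leeway.

```go
package runner

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// runCommands executes each argv slice in order inside workingDir,
// streaming output to the caller's stdout/stderr and stopping at the
// first failing command.
func runCommands(workingDir string, commands [][]string) error {
	for _, argv := range commands {
		// A debug log of the exact command is one of the benefits mentioned above.
		fmt.Printf("running: %s\n", strings.Join(argv, " "))
		cmd := exec.Command(argv[0], argv[1:]...)
		cmd.Dir = workingDir
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("command %q failed: %w", strings.Join(argv, " "), err)
		}
	}
	return nil
}
```

A dry-run mode then becomes a matter of printing the commands and skipping cmd.Run().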

princerachit (Author):

I don't want to make this change. It feels like going back to the discussion of introducing structures.

Once we have at least the Gitpod installation step working (which will be a good way of learning what gets repeated and which patterns emerge), I would be happy to introduce these structures.

Do you see this as a blocker?

Reviewer (Contributor):

> Do you see this as a blocker?

No.

> It feels like going back to the discussion of introducing structures.

Not sure - we should not sit at either extreme end of this spectrum. That said, I'm also fine with going ahead as things are and harmonising afterwards.

@csweichel (Contributor):

For future notice: prior to marking something as ready for approval, please check that the commits could land in main like this. Right now it's 60 commits that clearly need some squashing/authoring :)

@codecov bot commented Oct 22, 2021

Codecov Report

Merging #6338 (a6102bc) into main (a996c98) will decrease coverage by 3.04%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##             main    #6338      +/-   ##
==========================================
- Coverage   19.04%   16.00%   -3.05%     
==========================================
  Files           2       13      +11     
  Lines         168     1406    +1238     
==========================================
+ Hits           32      225     +193     
- Misses        134     1162    +1028     
- Partials        2       19      +17     
| Flag | Coverage Δ |
|---|---|
| components-blobserve-app | 30.43% <ø> (?) |
| components-blobserve-lib | 31.34% <ø> (?) |
| components-local-app-app-linux-amd64 | ? |
| components-local-app-app-linux-arm64 | ? |
| components-local-app-app-windows-386 | ? |
| components-local-app-app-windows-amd64 | ? |
| components-local-app-app-windows-arm64 | ? |
| dev-loadgen-app | ∅ <ø> (?) |
| dev-poolkeeper-app | ∅ <ø> (?) |
| installer-app | 6.08% <ø> (?) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| components/local-app/pkg/auth/pkce.go | |
| components/local-app/pkg/auth/auth.go | |
| installer/pkg/components/ws-manager/deployment.go | 0.00% <0.00%> (ø) |
| installer/pkg/components/ws-manager/role.go | 0.00% <0.00%> (ø) |
| installer/pkg/components/ws-manager/configmap.go | 23.45% <0.00%> (ø) |
| installer/pkg/components/ws-manager/rolebinding.go | 0.00% <0.00%> (ø) |
| installer/pkg/common/common.go | 1.34% <0.00%> (ø) |
| installer/pkg/common/objects.go | 0.00% <0.00%> (ø) |
| components/blobserve/pkg/blobserve/refstore.go | 64.82% <0.00%> (ø) |
| installer/pkg/components/ws-manager/tlssecret.go | 0.00% <0.00%> (ø) |
| ... and 5 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a996c98...a6102bc.

)

func Deploy(context *common.ProjectContext, clusters []*common.WorkspaceCluster) error {
var wg sync.WaitGroup
Reviewer (Contributor):

for future reference (not in this PR): an errgroup might be better suited here to actually fail when the creation of a cluster fails
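A hedged sketch of what that could look like with golang.org/x/sync/errgroup; the function and the stand-in types are illustrative, and createClusterFn is a placeholder for whatever per-cluster work Deploy actually performs.

```go
package deploy

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// WorkspaceCluster stands in for common.WorkspaceCluster from the PR.
type WorkspaceCluster struct {
	Name string
}

// deployAll creates all clusters concurrently and, unlike a bare
// sync.WaitGroup, surfaces the first error from any cluster creation.
func deployAll(ctx context.Context, clusters []*WorkspaceCluster,
	createClusterFn func(context.Context, *WorkspaceCluster) error) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, c := range clusters {
		c := c // capture the loop variable (needed before Go 1.22)
		g.Go(func() error {
			return createClusterFn(ctx, c)
		})
	}
	// Wait returns the first non-nil error, which is exactly the
	// "actually fail when a cluster creation fails" behaviour.
	return g.Wait()
}
```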

@roboquat (Contributor):

LGTM label has been added.

Git tree hash: 66a8d733374fb2d2e4dc4f7084626f7e20bfe0d6

@roboquat (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csweichel

Associated issue: #6331

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@roboquat roboquat merged commit 560f35b into main Oct 22, 2021
@roboquat roboquat deleted the prs/add-build-cluster-step branch October 22, 2021 12:09
@roboquat roboquat added deployed: workspace Workspace team change is running in production deployed Change is completely running in production labels Oct 31, 2021
Labels:
- approved
- deployed: workspace (Workspace team change is running in production)
- deployed (Change is completely running in production)
- release-note
- size/XXL
- team: workspace (Issue belongs to the Workspace team)
Development

Successfully merging this pull request may close this issue: [ws-deployment] Make workspace cluster creation automated (#6331)

3 participants