Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[loadgen] Add benchmark for prod startup time #4820

Merged
merged 2 commits into from
Jul 19, 2021
Merged

Conversation

csweichel
Copy link
Contributor

@csweichel csweichel commented Jul 14, 2021

This PR adds a benchmark command to loadgen that can start multiple different workspaces against a workspace cluster.

Thanks @meysholdt for the pairing session and for producing the scenario file.

/werft no-preview

@csweichel csweichel requested review from meysholdt, a team and fntlnz and removed request for a team and fntlnz July 14, 2021 14:54
@aledbf
Copy link
Member

aledbf commented Jul 14, 2021

@csweichel this will run in a fresh cluster without any image? or we will pull before the benchmark? what are we testing?

@csweichel
Copy link
Contributor Author

For the sake of the comparisons we intent to run, running this against an empty cluster (i.e. fresh nodes that haven't seen an image pull, disabled ghosts), would make most sense IMHO.

Copy link
Contributor

@princerachit princerachit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the README.md with steps that are needed to run benchmark. If any prereq is needed.

dev/loadgen/prod-benchmark.yaml Show resolved Hide resolved
@princerachit
Copy link
Contributor

Can you update the README.md with steps that are needed to run benchmark. If any prereq is needed.

I tried running loadgen with workspace count set to 500 and it broke my cluster. We should have a NOTE in the readme about this.

@meysholdt
Copy link
Member

meysholdt commented Jul 15, 2021

Thx for this PR! It's great to have an easy way to benchmark clusters.

I tried it an see

./loadgen benchmark prod-benchmark.yaml  --tls ./wsman-tls/
INFO[0000] load worker started                           worker=0
INFO[0000] load worker started                           worker=4
INFO[0000] load worker started                           worker=3
INFO[0000] load worker started                           worker=1
INFO[0000] load worker started                           worker=2
spinning up: [███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████↙] 2 p/s 500/500

Now there is no terminal output for about 30min. Is the tool waiting for something?

Maybe related: Not all workspaces are in "RUNNING" yet due to scheduling issues.

@csweichel
Copy link
Contributor Author

The loadgen command does not terminate, but has to be stopped manually (SIGINT/Ctrl+C).
That's because there can be some "observer" registered which cannot signal when they're done.

Beware: loadgen is not a polished load testing solution, but rather a pragmatic tool to get the job done :) Some readme wouldn't hurt though.

@csweichel
Copy link
Contributor Author

@princerachit a readme would be handy but is outside the scope of this PR

@meysholdt
Copy link
Member

When running the benchmark, about half of the workspaces fail.

I suspect this error

cannot initialize workspace:
    github.com/gitpod-io/gitpod/content-service/pkg/initializer.InitializeWorkspace
        github.com/gitpod-io/gitpod/content-service@v0.0.0-00010101000000-000000000000/pkg/initializer/initializer.go:425
  - git initializer:
    github.com/gitpod-io/gitpod/content-service/pkg/initializer.(*GitInitializer).Run
        github.com/gitpod-io/gitpod/content-service@v0.0.0-00010101000000-000000000000/pkg/initializer/git.go:117
  - git checkout -B main origin/main failed (exit status 128): fatal: 'origin/main' is not a commit and a branch 'main' cannot be created from it

to be the cause and it looks like loadgen's template assumes that every repo has a "main" branch.

@csweichel
Copy link
Contributor Author

Good point. Let's add the branch to the scenario as well

@meysholdt
Copy link
Member

meysholdt commented Jul 15, 2021

for the record, this is the scenarios file I used for these benchmarks. I updated the IDE image and commented out the repos that don't have a main branch.

## start with
##    loadgen benchmark prod-benchmark.yaml

workspaces: 500
ideImage: eu.gcr.io/gitpod-core-dev/build/ide/code:commit-ff263e14024f00d0ed78386b4417dfa6bcd4ae2f
repos:
#  - cloneURL: https://github.com/gitpod-io/template-typescript-node/
    # image: gitpod/workspace-mongodb
#    workspaceImage: eu.gcr.io/gitpod-dev/workspace-images:53489ee25aa4d1797edd10485d8ecc2fc7a7456ae37718399a54efb498f7236f
  - cloneURL: https://github.com/gitpod-io/template-typescript-react
    # image: gitpod/workspace-full:latest
    workspaceImage: eu.gcr.io/gitpod-dev/workspace-images:63bf2cbae693a7ecf60a40fb1eadc90b83a99919a2a010019a336a81b0c54b84
  - cloneURL: https://github.com/gitpod-io/template-python-django/
    # image: gitpod/workspace-full:latest
    workspaceImage: eu.gcr.io/gitpod-dev/workspace-images:63bf2cbae693a7ecf60a40fb1eadc90b83a99919a2a010019a336a81b0c54b84
  - cloneURL: https://github.com/gitpod-io/template-python-flask
    # image: gitpod/workspace-full:latest
    workspaceImage: eu.gcr.io/gitpod-dev/workspace-images:63bf2cbae693a7ecf60a40fb1eadc90b83a99919a2a010019a336a81b0c54b84
#  - cloneURL: https://github.com/gitpod-io/spring-petclinic/
    # image: gitpod/workspace-full:latest
#    workspaceImage: eu.gcr.io/gitpod-dev/workspace-images:63bf2cbae693a7ecf60a40fb1eadc90b83a99919a2a010019a336a81b0c54b84
#  - cloneURL: https://github.com/gitpod-io/template-php-drupal-ddev
    # image: (dockerfile)
#    workspaceImage:
#  - cloneURL: https://github.com/gitpod-io/template-php-laravel-mysql
    # image: (dockerfile)
    #workspaceImage:
#  - cloneURL: https://github.com/gitpod-io/template-ruby-on-rails-postgres/
    # image: (dockerfile)
    #workspaceImage:
  - cloneURL: https://github.com/gitpod-io/template-golang-cli/
    # image: gitpod/workspace-full:latest
    workspaceImage: eu.gcr.io/gitpod-dev/workspace-images:63bf2cbae693a7ecf60a40fb1eadc90b83a99919a2a010019a336a81b0c54b84
#  - cloneURL: https://github.com/gitpod-io/template-rust-cli/
    # image: gitpod/workspace-full:latest
#    workspaceImage: eu.gcr.io/gitpod-dev/workspace-images:63bf2cbae693a7ecf60a40fb1eadc90b83a99919a2a010019a336a81b0c54b84
#  - cloneURL: https://github.com/gitpod-io/template-dotnet-core-cli-csharp/
    # image: gitpod/workspace-dotnet
#    workspaceImage: eu.gcr.io/gitpod-dev/workspace-images:0b62a3c575c8f00b9130f8a6572d9b4fd935e06de4498cef4ec2db8507cd6159
#  - cloneURL: https://github.com/gitpod-io/template-sveltejs/
    # image: gitpod/workspace-full:latest
#    workspaceImage: eu.gcr.io/gitpod-dev/workspace-images:63bf2cbae693a7ecf60a40fb1eadc90b83a99919a2a010019a336a81b0c54b84

@csweichel
Copy link
Contributor Author

I've updated the PR to support per-repo clone targets.

}

// benchmarkCommand represents the run command
var benchmarkCommand = &cobra.Command{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Too big. Can you split this to call smaller functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your level/criterion for an adequate function size?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find if functions can be refactored then it should be refactored to small functions. I don't think there could be/should be strict checks.

The primary reason I pointed this here is:

  1. There are multiple variables being created in this function. The variables are complex and use other preceeding variables.
  2. Variable/(s) have nested function in their field definition -> Hard to read and understand what is going on
  3. The big constant string in the variable definition
  4. When I try to review it becomes hard for me because now in step X the abstraction is less. I have to review not just the variable creation but also how it is being created.

Does it make any sense? From a reviewer perspective I find it hard to review. The code is good other wise.

@princerachit
Copy link
Contributor

/lgtm

@roboquat
Copy link
Contributor

LGTM label has been added.

Git tree hash: e65b799fb628bbb334a63b82935a0dd2351902b9

@roboquat
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csweichel, princerachit

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@roboquat roboquat merged commit f5793c3 into main Jul 19, 2021
@roboquat roboquat deleted the cw/loadgen-benchmark branch July 19, 2021 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants