Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add make gardener-debug as a skaffold-based debugging experience #7755

Merged
merged 4 commits into from
Apr 19, 2023

Conversation

oliver-goetz
Copy link
Member

@oliver-goetz oliver-goetz commented Apr 5, 2023

How to categorize this PR?

/area dev-productivity
/kind enhancement

What this PR does / why we need it:
Similar to the dev-flow of #7659, developers can now use make gardener-debug to start a skaffold-based debugging loop which allows remote debugging of Gardener Core pods using Delve.
Specific skaffold modules can be selected by setting SKAFFOLD_MODULE to reduce load on laptops.

Which issue(s) this PR fixes:
Part of #6016

Special notes for your reviewer:
skaffold debugging documentation

Release note:

Developers can now use `make gardener-debug` to start a skaffold-based debugging loop which allows remote debugging of Gardener Core pods using Delve. See the [documentation](https://github.com/gardener/gardener/blob/master/docs/deployment/getting_started_locally.md#debugging-gardener) for more details.

@gardener-prow gardener-prow bot added area/dev-productivity Developer productivity related (how to improve development) kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. labels Apr 5, 2023
@gardener-prow gardener-prow bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 5, 2023
@timuthy
Copy link
Member

timuthy commented Apr 6, 2023

/assign

@timebertt
Copy link
Member

/assign

Copy link
Member

@timebertt timebertt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I left some early feedback, but I need some more time to try it out

Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
make gardener-debug
```

This is using skaffold debugging features. In the Gardener case, Go debugging using [Delve](https://github.com/go-delve/delve) is the most relevant use case.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth mentioning that the Cloud Code integration features even deeper integration with IDEs than running make gardener-debug and attaching manually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a google cloud specific feature, isn't it? That's at least what the docs say. We don't use Google Services in make gardener-up etc, so I would not mention it explicitly here. The information is just one click away anyway, because there is a link to https://skaffold.dev/docs/workflows/debug/

@istvanballok
Copy link
Contributor

👍
I also tried out debugging the gardenlet in a similar setup this week. I found one caveat which is worth sharing:

Note: Resuming or stopping only a single goroutine (Go Issue 25578, 31132) is currently not supported, so the action will cause all the goroutines to get activated or paused.
(vscode-go wiki)

This means that when a goroutine of gardenlet is paused on a breakpoint, all the other goroutines are paused. This is different to e.g. debugging the JVM where individual threads can be paused. Hence, when the whole gardenlet process is paused, it can not renew its lease and can not respond to the liveness and readiness probes. So when debugging the gardenlet, one should temporarily turn off the leader election e.g. by adding

  leaderElection:
    leaderElect: false

to example/gardener-local/gardenlet/values.yaml and disabling the various probes. I'm not sure what is the best way to achieve that - I interactively edited the gardenlet deployment and increased the failureThreshold property of the probes to a high value.

The VS Code extension https://marketplace.visualstudio.com/items?itemName=golang.Go is sufficient to attach to the debuggee. I used the following launch configuration with the remote-local-setup, where the caveat was that the path of the go files as the compiler saw them (the paths are added to the binary) need to be mapped to the paths in the workspace.
Without proper file path mapping, debugging does not work and the failure symptoms are not that conclusive. This should not be an issue in a regular local setup.

{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Connect to gardenlet",
      "type": "go",
      "request": "attach",
      "mode": "remote",
      "debugAdapter": "dlv-dap",  // legacy would be the default in the remote attach case, so it should be set to dlv-dap, based on the vs code go doc
      "port": 56268,  // or another port that is port forwarded by skaffold to the 56268 port of the gardenlet process
      "host": "127.0.0.1",
      "substitutePath": [
        {
          "from": "${workspaceFolder}",
          "to": "/root/gardener"  // in the remote local setup the go compiler compiles the source code is in this folder
        }
      ]
    }
  ],
}

For the VS Code debugging features, I recommend reading https://github.com/golang/vscode-go/wiki/debugging.

cc @shafeeqes, thank you for the support!

@oliver-goetz
Copy link
Member Author

@istvanballok I added some of your caveats.
I did not face any of your issues, when I tested it. That's probably because skaffold increases timeoutSeconds of liveness and readiness probes to 600 and all our gardener-core components run with 1 replica only. This should be ok for most debugging sessions. I added that too.

@istvanballok
Copy link
Contributor

I did not face any of your issues, when I tested it.

Just for the record, the issues I faced are reproducible (and confirmed by @oliver-goetz) when stepping through a function during debugging. The liveness probes tend to fail well before reaching the 10 minute timeout with 500 HTTP status code or TCP connection errors and the kubelet restarts the gardenlet process after a few minutes of debugging in the default configuration. I think it is related to the fact that when the Go program is paused on a breakpoint, all the goroutines are paused. When it is resumed to proceed to the next line, all the go routines are resumed, and then paused again. This can then lead to issues on the other unrelated goroutines that are involved in the liveness probe handling. So if you face issues, it is recommended to disable the probes by editing the deployment that was created by skaffold debug, as it is described in the last sentence of the documentation in this PR.

@oliver-goetz
Copy link
Member Author

/cc @briantopping
any idea how to solve this?

@istvanballok
Copy link
Contributor

Note that during a debugging session with @rickardsjp we noticed that the gardenlet process panicked with leader election lost. So both caveats are needed for successful debugging (turn off leader election and disable the liveness and readiness probes), see #7755 (comment).
Otherwise, this is a very helpful feature, thank you!

/lgtm

One more trick: skaffold configures dlv to start the process and not wait for the debugger to be attached. This is fine for the regular case, but it is not configurable on the skaffold level. If we need to debug something on startup, we can remove the --continue flag from the gardenlet deployment temporarily. Then, gardenlet will wait for the remote debugger to be attached. See https://skaffold.dev/docs/workflows/debug/#why-arent-my-breakpoints-being-hit.

Note: it is clearly out of scope for this PR, but long term it would be great to also support debugging the components that are not deployed by skaffold, but rather by the gardenlet, e.g. the gardener-resource-manager.

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2023
@gardener-prow
Copy link
Contributor

gardener-prow bot commented Apr 14, 2023

LGTM label has been added.

Git tree hash: cf8051a039d9a88580fe9300e695c0f717d90291

@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 14, 2023
@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2023
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 14, 2023
@briantopping
Copy link
Member

@oliver-goetz #7284 was the proposal I had to fix this three months ago. I'm not sure if it's related to how you guys solved it.

Copy link
Contributor

@istvanballok istvanballok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Apr 17, 2023
@gardener-prow
Copy link
Contributor

gardener-prow bot commented Apr 17, 2023

LGTM label has been added.

Git tree hash: 3d9174a1fcd70f002e9d13bd4adcd1363b972067

@oliver-goetz
Copy link
Member Author

/milestone v1.69

@gardener-prow gardener-prow bot added this to the v1.69 milestone Apr 17, 2023
@oliver-goetz
Copy link
Member Author

/approve

@gardener-prow
Copy link
Contributor

gardener-prow bot commented Apr 19, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: oliver-goetz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 19, 2023
@gardener-prow gardener-prow bot merged commit b976f78 into gardener:master Apr 19, 2023
@oliver-goetz oliver-goetz deleted the enh/enable-debugging branch April 25, 2023 08:25
andrerun pushed a commit to andrerun/gardener that referenced this pull request Jul 6, 2023
…ardener#7755)

* Introduce `*-debug` make targets

* Update documentation

* Address PR review feedback

* Address PR review feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/dev-productivity Developer productivity related (how to improve development) cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/enhancement Enhancement, improvement, extension lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants