Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close connection and stop listening when port forwarding errors occur so that kubectl can exit #103526

Merged
merged 1 commit into from
Nov 17, 2021

Conversation

brianpursley
Copy link
Member

@brianpursley brianpursley commented Jul 6, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently, if port forwarding and the connection is lost and re-established, port forwarding gets into an unrecoverable state where it logs errors but does not fail. When this happens port forwarding no longer is working, but there is no way to handle this case except to manually terminate kubectl.

This PR changes the port forwarding code to capture errors that are received on the error stream and surface them to be returned so that kubectl can exit with a non-zero exit code instead of being "hung" in an unrecoverable error state.

This PR changes the port forwarding code to close the remote connection and stop listening for new connections when an error occurs while port forwarding.

Which issue(s) this PR fixes:

Related to kubernetes/kubectl#686

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kubectl port-forward service will now properly exit when the attached pod dies

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 6, 2021
@brianpursley
Copy link
Member Author

/cc @deads2k you had some comments on a different, unmerged PR a few months ago for this issue.

@brianpursley brianpursley changed the title Surface port forwarding errors so that kubectl can exit with error Return port forwarding errors so that kubectl can exit with error Jul 7, 2021
@caesarxuchao
Copy link
Member

/assign
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 8, 2021
@caesarxuchao
Copy link
Member

This looks reasonable. Can you add a unit test?

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 12, 2021
@brianpursley
Copy link
Member Author

This looks reasonable. Can you add a unit test?

Good idea. I added a couple unit tests.

@brianpursley
Copy link
Member Author

/retest

@brianpursley brianpursley changed the title Return port forwarding errors so that kubectl can exit with error [WIP] Return port forwarding errors so that kubectl can exit with error Jul 13, 2021
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 13, 2021
@brianpursley brianpursley force-pushed the kubectl-686 branch 4 times, most recently from 6e41199 to 9596c48 Compare July 15, 2021 19:16
@brianpursley brianpursley changed the title [WIP] Return port forwarding errors so that kubectl can exit with error Return port forwarding errors so that kubectl can exit with error Jul 15, 2021
@brianpursley
Copy link
Member Author

/retest

@@ -231,6 +233,8 @@ func (pf *PortForwarder) forward() error {
case <-pf.stopChan:
case <-pf.streamConn.CloseChan():
runtime.HandleError(errors.New("lost connection to pod"))
case err := <-pf.errorChan:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. in the first case, a stop was called. portforward should probably shut down its ports on this for the library (pr wanted), but the intent is likely "process shutting down" when we wrote it.
  2. in the second case, the streamconnection was lost
  3. in this new case, the local handleConnection will stop https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/portforward/portforward.go#L356, but the callsite will restart it: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/portforward/portforward.go#L302-L312

if I'm reading this code correctly, this would return an error back and start listening on the port again. the tests are for the handleConnection, but miss the waitForConnection which will restart it indefinitely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error actually gets returned all the way up to the kubectl command itself and stops execution.

Return from forward()

Return from ForwardPorts

Return from ForwardPorts

Return from RunPortForward

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that still races for whether the ForwardPort go route runs before the reconnect from waitForConnection.

I'm supportive of trying to return errors. I just want deterministic behavior when we make this change. Closing a stop channel (or even better a context) that is honored throughout the call chain is an approach that will make this deterministic and non-leaky.

@deads2k
Copy link
Contributor

deads2k commented Nov 16, 2021

/hold

holding until #103526 (comment) is resolved.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 16, 2021
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 17, 2021
@brianpursley brianpursley changed the title Return port forwarding errors so that kubectl can exit with error Close connection and stop listening when port forwarding errors occur so that kubectl can exit Nov 17, 2021
@brianpursley
Copy link
Member Author

/retest

Comment on lines +312 to +314
return
}
go pf.handleConnection(conn, port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-existng and not blocking for this PR, but this code is so strange. it seems ripe for cleanup to have fewer go funcs launch go funcs and infinite loops.

runtime.HandleError(fmt.Errorf("error accepting connection on port %d: %v", port.Local, err))
}
select {
case <-pf.streamConn.CloseChan():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good improvement.

@deads2k
Copy link
Contributor

deads2k commented Nov 17, 2021

I think that overall the existing code could benefit from simplification and context wiring, but this PR now looks predictable to me.

/approve
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 17, 2021
@deads2k
Copy link
Contributor

deads2k commented Nov 17, 2021

bug fix

/milestone v1.23

@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Nov 17, 2021
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 17, 2021
Copy link
Member

@eddiezane eddiezane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 17, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brianpursley, deads2k, eddiezane, justinmchase

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants