New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issues in SPDY RoundTripper #109651
Fix issues in SPDY RoundTripper #109651
Conversation
Please note that we're already in Test Freeze for the |
netD := s.Dialer | ||
if netD == nil { | ||
netD = &net.Dialer{ | ||
Timeout: 30 * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches timeout used on line 222: https://github.com/kubernetes/kubernetes/pull/109651/files#diff-be99fb028cbdec52a7ee54d6b9a97c60a76f288124640e5f41a6aa96807fb8bbL222
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not set a timeout here. The socks proxy dialer is ~new, so copying its timeout to this old code path is not compelling... since we're now plumbing the context, inheriting any timeout associated with it seems sufficient to me.
staging/src/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper.go
Outdated
Show resolved
Hide resolved
if err != nil { | ||
return nil, err | ||
} | ||
defer func() { | ||
if !success { | ||
c.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The connection is now always closed on error. It was leaked before (on error).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure is leaked, it looks to me that it will be closed later
// Dial implements k8s.io/apimachinery/pkg/util/net.Dialer.
func (s *SpdyRoundTripper) Dial(req *http.Request) (net.Conn, error) {
conn, err := s.dial(req)
if err != nil {
return nil, err
}
if err := req.Write(conn); err != nil {
conn.Close()
return nil, err
}
return conn, nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it will not be - s.dial(req)
returns an error and function returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, yeah, you are right, do you mind if we leave the connection leaking for other PR?
it seems a whole topic in its own and I can't see tests for those
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I still think the conn will not leak
func (s *SpdyRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
header := utilnet.CloneHeader(req.Header)
header.Add(httpstream.HeaderConnection, httpstream.HeaderUpgrade)
header.Add(httpstream.HeaderUpgrade, HeaderSpdy31)
var (
conn net.Conn
rawResponse []byte
err error
)
clone := utilnet.CloneRequest(req)
clone.Header = header
conn, err = s.Dial(clone)
if err != nil {
return nil, err
}
is not that it runs in a goroutine, once the RoundTrip exits it will be destroyed, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A connection is a kind of object that needs to get it's lifetime managed by the program. GC will collect the object, but I don't know if there is a finalizer to clean up the unreleased underlying OS resources. It's poor practice to rely on finalizers anyway. I don't understand what is your concern. Why not close the connection if it's not needed (because an error is returned)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because this code is too old that there are a lot of assumptions that we don't know and we may break something without noticing, I'm sure you are right, but what I'm not sure is about the consequences of the changes, that is why I'm trying to be conservative ... is not the first time that for fixing something a bug arise in other place of the stack, and then is very hard to solve these bugs ... we can open a bug for the conns leaks and follow up later, I think that is not blocking your problem, and I can work on the test if you want
staging/src/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper.go
Show resolved
Hide resolved
staging/src/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper.go
Outdated
Show resolved
Hide resolved
staging/src/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper.go
Show resolved
Hide resolved
too many different changes in a single commit? |
@aojea They are all one the same lines of code, so it was easier to just clean it all up in a single pass. Need to fix tests... |
I personally think that each item should be an independent commit because it is easier to backport the changes or revert if required. Also, I think that the new feature and the bugfix should have unit tests |
/assign @aojea |
fb08dc0
to
34b7240
Compare
@aojea I've split it into 3 commits, should be cleaner now. PTAL. |
34b7240
to
d925c9c
Compare
staging/src/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper_test.go
Outdated
Show resolved
Hide resolved
staging/src/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper.go
Outdated
Show resolved
Hide resolved
LGTM if we remove the "connection leaking" code so we can tackle that problem in a different PR verifying the leaks with tests, :) |
I'm sorry but I'm not going to come back to this. The only reason I'm looking at this code is because I want to fix #89163 and I was learning how things work currently. I don't want to polish this code to look perfectly, I want to make it obsolete and eventually delete it. It's clear that there is a leak and to test that I'd need to inject a stub dialer (to return a mock connection and then check that Close() has been called). Honestly, I don't think it's worth the effort. Maybe you have a better idea of how to test it? |
@ash2k I will work on the tests, I think that your commits add value, give me some days to come back to you Regarding websockets, this comment will be interesting for you #7452 (comment) |
👍 |
the tls.Dial already performs the handshake and closes the underline connection if it fails https://cs.opensource.google/go/go/+/refs/tags/go1.18.1:src/crypto/tls/tls.go;l=156-161;drc=860704317e02d699e4e4a24103853c4782d746c1 rawConn, err := netDialer.DialContext(ctx, network, addr)
if err != nil {
return nil, err
}
colonPos := strings.LastIndex(addr, ":")
if colonPos == -1 {
colonPos = len(addr)
}
hostname := addr[:colonPos]
if config == nil {
config = defaultConfig()
}
// If no ServerName is set, infer the ServerName
// from the hostname we're connecting to.
if config.ServerName == "" {
// Make a copy to avoid polluting argument or default.
c := config.Clone()
c.ServerName = hostname
config = c
}
conn := Client(rawConn, config)
if err := conn.HandshakeContext(ctx); err != nil {
rawConn.Close()
return nil, err
}
return conn, nil Another thing is that the code that verify the hostname seems not needed anymore, https://groups.google.com/g/golang-nuts/c/4vnt7NdLvVU/m/b1SJ4u0ikb0J You can play with the different combinations using https://gist.github.com/aojea/4e172d7f3835cd3ffc6ea4f83eea6608 I've filed a new PR for that adding you as co-author #109750 |
I'll rebased once your PR is merged. Cheers. |
CloneRequest() clones headers too
@aojea Rebased, PTAL. /remove-kind bug |
staging/src/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper.go
Show resolved
Hide resolved
/retest /Lgtm For approval Thanks |
require.NoError(t, err) | ||
spdyTransport := NewRoundTripper(&tls.Config{}) | ||
_, err = spdyTransport.Dial(req) | ||
assert.EqualError(t, err, "dial tcp 127.0.0.1:1233: operation was canceled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a way to check the type of this error rather than the string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dialer returns a *net.OpError
but it's a generic thing. Within it there is an Err
field which is the original error. In Go 1.18.x it's NOT possible to check it, but in the yet unreleased 1.19 it will be possible via errors.Is()
. See golang/go#51428 (commit golang/go@85b5f86). In any case, I think it's not that important as it's a test only. String comparison is probably good enough for a test.
staging/src/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper_test.go
Show resolved
Hide resolved
netD := s.Dialer | ||
if netD == nil { | ||
netD = &net.Dialer{ | ||
Timeout: 30 * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not set a timeout here. The socks proxy dialer is ~new, so copying its timeout to this old code path is not compelling... since we're now plumbing the context, inheriting any timeout associated with it seems sufficient to me.
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ash2k, liggitt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
/sig api-machinery
What this PR does / why we need it:
Does this PR introduce a user-facing change?