New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test TestAdvertisedReferencesNotExists #969

Closed
smola opened this Issue Sep 27, 2018 · 19 comments

Comments

Projects
None yet
2 participants
@smola
Member

smola commented Sep 27, 2018

This test is flaky, it fails randomly and quite often:

FAIL: <autogenerated>:1: UploadPackSuite.TestAdvertisedReferencesNotExists
/home/travis/gopath/src/gopkg.in/src-d/go-git.v4/plumbing/transport/test/upload_pack.go:55:
    c.Assert(err, Equals, transport.ErrRepositoryNotFound)
... obtained *errors.errorString = &errors.errorString{s:"unexpected EOF"} ("unexpected EOF")
... expected *errors.errorString = &errors.errorString{s:"repository not found"} ("repository not found")
... Difference:
...     s: "unexpected EOF" != "repository not found"

https://travis-ci.org/src-d/go-git/jobs/433952695#L1615

@smola smola added the bug label Sep 27, 2018

@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 26, 2018

@smola I'd like to dig in here and try to figure out why this test is flaky. Can I claim this one?

@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 26, 2018

In the travis details that you linked, it looks like this test isn't even being run.

=== RUN   Test
OK: 36 passed, 1 skipped
--- PASS: Test (0.37s)
PASS
coverage: 73.5% of statements
ok  	gopkg.in/src-d/go-git.v4/plumbing/transport/ssh	0.376s	coverage: 73.5% of statements
?   	gopkg.in/src-d/go-git.v4/plumbing/transport/test	[no test files]
?   	gopkg.in/src-d/go-git.v4/storage	[no test files]

I tried the following:

cd plumbing/transport/test

Then,

$ go test -check.f UploadPackSuite
?   	gopkg.in/src-d/go-git.v4/plumbing/transport/test	[no test files]

Can you point me in the correct direction?

@smola

This comment has been minimized.

Member

smola commented Oct 26, 2018

@mccurdyc Of course, feel free to dig in (good luck!). The issue is actually in the ssh package tests, which uses transport/test.

The link I originally posted was probably to a job that was later manually re-triggered. Here's one failing example:

----------------------------------------------------------------------
FAIL: <autogenerated>:1: UploadPackSuite.TestAdvertisedReferencesNotExists
/home/travis/gopath/src/gopkg.in/src-d/go-git.v4/plumbing/transport/test/upload_pack.go:55:
    c.Assert(err, Equals, transport.ErrRepositoryNotFound)
... obtained *errors.errorString = &errors.errorString{s:"unexpected EOF"} ("unexpected EOF")
... expected *errors.errorString = &errors.errorString{s:"repository not found"} ("repository not found")
... Difference:
...     s: "unexpected EOF" != "repository not found"
OOPS: 35 passed, 1 skipped, 1 FAILED
--- FAIL: Test (0.58s)
FAIL
coverage: 73.5% of statements
FAIL	gopkg.in/src-d/go-git.v4/plumbing/transport/ssh	0.589s

I don't know anything else, but feel free to ask any specific question as you progress.

@smola smola added the ssh label Oct 26, 2018

@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 26, 2018

@smola (or has anyone else) been able to get this to fail on their local machine or does it seem to only happen on the travis server?

@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 26, 2018

@smola I have gotten it to fail locally.

@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 26, 2018

my thought is that it is resource-related (probably CPU as my memory didn't spike, but my CPU did) because I can quickly get it to fail when running 4 instances of the following script

#!/usr/bin/env bash
for i in {1..5000}
do
  echo $i

  mpstat -P ALL &>> out.txt
  go test -v &>> out.txt

  if grep -q 'FAIL' out.txt
  then
    break
  fi
done
@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 26, 2018

previously, I was able to get it to fail after about 350 iterations with a single instance of the script running, so I will try to collect more info during a single iteration

@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 27, 2018

Obviously still trying to figure out why, but the firstErrLine channel is sending an empty object when it fails:

A normal run (the expected output):

=== RUN   Test
OK: 36 passed, 1 skipped
--- PASS: Test (0.42s)
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/transport/ssh	0.440s
s.firstErrLine: fatal: '//tmp/go-git-ssh-36699748929944/non-existent.git' does not appear to be a git repository OK: true
s.firstErrLine: fatal: '//tmp/go-git-ssh-36699748929944/non-existent.git' does not appear to be a git repository OK: true
OK: 36 passed, 1 skipped
--- PASS: Test (0.41s)
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/transport/ssh	0.415s

The FAIL, unexpected output:

=== RUN   Test
s.firstErrLine: fatal: '//tmp/go-git-ssh-42559873101469/non-existent.git' does not appear to be a git repository OK: true
s.firstErrLine:  OK: false

----------------------------------------------------------------------
FAIL: <autogenerated>:1: UploadPackSuite.TestAdvertisedReferencesNotExists

/home/mccurdyc/go/src/gopkg.in/src-d/go-git.v4/plumbing/transport/test/upload_pack.go:55:
    c.Assert(err, Equals, transport.ErrRepositoryNotFound)
... obtained *errors.errorString = &errors.errorString{s:"unexpected EOF"} ("unexpected EOF")
... expected *errors.errorString = &errors.errorString{s:"repository not found"} ("repository not found")
... Difference:
...     s: "unexpected EOF" != "repository not found"


OOPS: 35 passed, 1 skipped, 1 FAILED
--- FAIL: Test (0.56s)
FAIL
exit status 1
FAIL	gopkg.in/src-d/go-git.v4/plumbing/transport/ssh	0.565s
@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 29, 2018

After digging in a bit more, I actually think it is something about https://github.com/src-d/go-git/blob/master/plumbing/transport/internal/common/common.go#L145 not being safe.

It may be related to reading a sub-processes' Stderr in a concurrent fashion.

@smola

This comment has been minimized.

Member

smola commented Oct 30, 2018

@mccurdyc There's no problem about reading stderr from a goroutine, but I think you got a bit closer.

We are reaching this point when we do not get any input from server's stdout. See:

func (s *session) handleAdvRefDecodeError(err error) error {
// If repository is not found, we get empty stdout and server writes an
// error to stderr.
if err == packp.ErrEmptyInput {
s.finished = true
if err := s.checkNotFoundError(); err != nil {
return err
}

But then on the goroutine where we read the error, we find that stderr is already closed:

func (c *client) listenFirstError(r io.Reader) chan string {
if r == nil {
return nil
}
errLine := make(chan string, 1)
go func() {
s := bufio.NewScanner(r)
if s.Scan() {
errLine <- s.Text()
} else {
close(errLine)
}

This code (package transport/internal/common) is reused for all protocols that we implement, but it's only failing for SSH, so it is probably be related to something in transport/ssh package or its tests.

I suspect it is related to the way we set up the testing SSH server (which uses the git-upload-pack binary under the hood):

if err := cmd.Start(); err != nil {
fmt.Println(err)
return
}
go func() {
defer stdin.Close()
io.Copy(stdin, s)
}()
go func() {
defer stderr.Close()
io.Copy(s.Stderr(), stderr)
}()
defer stdout.Close()
io.Copy(s, stdout)
if err := cmd.Wait(); err != nil {
return
}

I would say that it is one of these:

  • We are closing stdout from git-upload-pack before starting to read stderr.
  • Or, more likely, we close SSH session from the server before writing stderr from git-upload-pack to it.

Or something similar.

@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 31, 2018

@smola this is super helpful! I will continue looking

@smola

This comment has been minimized.

Member

smola commented Oct 31, 2018

Also defer stderr.Close() and defer stdout.Close() are probably not needed at all, closing stdin is good enough.

@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 31, 2018

@smola

defer func() { c.Assert(r.Close(), IsNil) }()
is where we are deferring the ssh session close, correct?

@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 31, 2018

Also defer stderr.Close() and defer stdout.Close() are probably not needed at all, closing stdin is good enough.

Removing defer stderr.Close() and defer stdout.Close() seems to block the tests. Admittedly, I haven't spent time looking into why.

@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 31, 2018

Also defer stderr.Close() and defer stdout.Close() are probably not needed at all, closing stdin is good enough.

Removing defer stderr.Close() and defer stdout.Close() seems to block the tests. Admittedly, I haven't spent time looking into why.

Sorry, it was because I removed the goroutine wrapping the Copy

@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 31, 2018

Why is it that stdout doesn't need to wrapped in a goroutine like stdin and stderr?

go func() {
defer stdin.Close()
io.Copy(stdin, s)
}()
go func() {
defer stderr.Close()
io.Copy(s.Stderr(), stderr)
}()
defer stdout.Close()
io.Copy(s, stdout)

@smola

This comment has been minimized.

Member

smola commented Oct 31, 2018

@mccurdyc I'm not really sure, one of these:

  • stdout gets closed by the executed command itself (git-upload-pack) when it finishes. But on the other hand, cmd.Wait() should wait until that happens anyway.
  • s gets closed when the client closes the ssh session from its side.
@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Oct 31, 2018

@smola while trying to read up a bit on how to idiomatically pipe output, I did come across golang/go#5582 and which would align with why it is only for the ssh package. But, golang/go#19685 also looks relevant.

mccurdyc added a commit to mccurdyc/go-git that referenced this issue Nov 1, 2018

plumbing: ssh, Fix flaky test TestAdvertisedReferencesNotExists. Fixes
…src-d#969

Signed-off-by: Colton McCurdy <mccurdyc22@gmail.com>
@mccurdyc

This comment has been minimized.

Contributor

mccurdyc commented Nov 1, 2018

@smola I opened a PR (#1013) that I believe fixes this issue.

mcuadros added a commit that referenced this issue Nov 5, 2018

Merge pull request #1013 from mccurdyc/mccurdyc/Issue#969/fix-flaky-s…
…sh-test

 plumbing: ssh, Fix flaky test TestAdvertisedReferencesNotExists. Fixes #969
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment