Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestHttpInsecure timeout on the openbsd-amd64 builder #328

Closed
mvdan opened this issue Feb 20, 2018 · 20 comments
Closed

TestHttpInsecure timeout on the openbsd-amd64 builder #328

mvdan opened this issue Feb 20, 2018 · 20 comments
Assignees

Comments

@mvdan
Copy link
Contributor

mvdan commented Feb 20, 2018

Full log: https://storage.googleapis.com/go-build-log/198cc863/openbsd-amd64-62_382c8558.log

Interesting part:

Generating report in /tmp/profile_proto127733951
Generating report in /tmp/profile_output976672530
Generating report in /tmp/profile_proto864760137
Generating report in /tmp/profile_output749897236
Generating report in /tmp/profile_proto674757731
Generating report in /tmp/profile_output998323814
Generating report in /tmp/profile_proto228678285
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output259635848
Generating report in /tmp/profile_proto174538311
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output131752698
Generating report in /tmp/profile_proto182942225
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output948220476
Generating report in /tmp/profile_proto644947563
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output820104910
Generating report in /tmp/profile_proto172666325
Generating report in /tmp/profile_output310512944
Generating report in /tmp/profile_proto416149711
Generating report in /tmp/profile_output955108834
Generating report in /tmp/profile_proto134901721
Generating report in /tmp/profile_output655382884
Generating report in /tmp/profile_proto990546803
Hide expression matched no samples
Generating report in /tmp/profile_output065374262
Generating report in /tmp/profile_proto189982237
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output344876760
Generating report in /tmp/profile_proto098495063
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output391055306
Generating report in /tmp/profile_proto873545889
Generating report in /tmp/profile_output950568844
Generating report in /tmp/profile_proto350487931
Generating report in /tmp/profile_output148036254
Generating report in /tmp/profile_proto238922597
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output361437056
Generating report in /tmp/profile_proto859589343
Generating report in /tmp/profile_output948475570
Generating report in /tmp/profile_proto888026217
Generating report in /tmp/profile_output462067892
Generating report in /tmp/profile_proto216871555
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output432729094
Generating report in /tmp/profile_proto753219501
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output419030824
Generating report in /tmp/profile_proto161255527
Ignoring local file /path/to/buildid-contention: build-id mismatch (buildid-contention != abcdef)
Generating report in /tmp/profile_output262854298
Generating report in /tmp/profile_proto245839153
Ignoring local file /path/to/buildid-contention: build-id mismatch (buildid-contention != abcdef)
Generating report in /tmp/profile_output420925148
Generating report in /tmp/profile_proto344449675
Generating report in /tmp/profile_output334905966
Generating report in /tmp/profile_proto050744565
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output992652240
Generating report in /tmp/profile_proto759083759
Generating report in /tmp/profile_output162414466
Generating report in /tmp/profile_proto365944057
TagHide expression matched no samples
Generating report in /tmp/profile_output107096068
Generating report in /tmp/profile_proto478871443
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output781034966
Generating report in /tmp/profile_proto208459069
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output901856120
Generating report in /tmp/profile_proto196185719
Generating report in /tmp/profile_output094373226
Generating report in /tmp/profile_proto848061889
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output292484140
Generating report in /tmp/profile_proto123168155
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output518198334
TagShow expression matched no samples
Generating report in /tmp/profile_proto943217797
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
TagShow expression matched no samples
Generating report in /tmp/profile_output876305952
Generating report in /tmp/profile_proto694926591
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
Generating report in /tmp/profile_output105629778
Generating report in /tmp/profile_proto552467849
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
tagfocus:Interpreted '1mb:2gb' as range, not regexp
Generating report in /tmp/profile_output486548308
Generating report in /tmp/profile_proto752542883
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
tagfocus:Interpreted '30kb:' as range, not regexp
tagignore:Interpreted '1mb:2mb' as range, not regexp
TagHide expression matched no samples
Generating report in /tmp/profile_output913352614
Generating report in /tmp/profile_proto318557389
Generating report in /tmp/profile_output348401096
Generating report in /tmp/profile_proto892583047
Generating report in /tmp/profile_output133244986
Generating report in /tmp/profile_proto556095569
Generating report in /tmp/profile_output653629820
Generating report in /tmp/profile_proto463020715
Ignoring local file /path/to/buildid: build-id mismatch (buildid != abcdef)
tagfocus:Interpreted '400kb:' as range, not regexp
Generating report in /tmp/profile_output997609486
Local symbolization failed for cppbench_server_main: stat cppbench_server_main: no such file or directory
Local symbolization failed for libpthread-2.15.so: stat /libpthread-2.15.so: no such file or directory
Some binary filenames not available. Symbolization may be incomplete.
Try setting PPROF_BINARY_PATH to the search path for local binaries.
Local symbolization failed for cppbench_server_main: stat cppbench_server_main: no such file or directory
Local symbolization failed for libpthread-2.15.so: stat /libpthread-2.15.so: no such file or directory
Some binary filenames not available. Symbolization may be incomplete.
Try setting PPROF_BINARY_PATH to the search path for local binaries.
Local symbolization failed for cppbench_server_main: stat cppbench_server_main: no such file or directory
Local symbolization failed for libpthread-2.15.so: stat /libpthread-2.15.so: no such file or directory
Some binary filenames not available. Symbolization may be incomplete.
Try setting PPROF_BINARY_PATH to the search path for local binaries.
--- FAIL: TestHttpsInsecure (15.03s)
	proftest.go:114: https+insecure://127.0.0.1:28700/debug/pprof/profile: http fetch: Get https://127.0.0.1:28700/debug/pprof/profile?seconds=10: net/http: timeout awaiting response headers
	fetch_test.go:426: failed to fetch any source profiles
FAIL
FAIL	cmd/vendor/github.com/google/pprof/internal/driver	17.284s

This is with the vendored version in the Go tree, which is 0e0e5b7.

Also, the tests are very noisy. Is this intended? You can see that the rest of the standard library tests are all quiet by default, as they should be. Perhaps these logs should be hidden behind testing.Verbose().

@mvdan mvdan changed the title TestHttpInsecure failure on the openbsd-amd64 builder TestHttpInsecure timeout on the openbsd-amd64 builder Feb 20, 2018
@mvdan
Copy link
Contributor Author

mvdan commented Feb 20, 2018

/cc @nolanmar511, since this seems somewhat similar to #300.

@mvdan
Copy link
Contributor Author

mvdan commented Feb 20, 2018

Also note that this looks like a test flake - if it were failing every single time, the vendored version wouldn't have been updated.

@nolanmar511
Copy link
Contributor

nolanmar511 commented Feb 28, 2018

Working on reproducing.
Currently getting different test failures (which seem to be related to the same issue) when running the tests on a freebsd GCE instance.

--- FAIL: TestHttpsInsecure (0.01s)
        fetch_test.go:386: net.Listen: got error listen tcp: lookup localhos
t on 169.254.169.254:53: no such host, want no error
--- FAIL: TestGetHostAndPort (0.00s)
        webui_test.go:258: could not get host and port for ":": could not ge
nerate random port: listen tcp: lookup localhost on 169.254.169.254:53: no s
uch host
        webui_test.go:261: for :, got host , want localhost
FAIL

@aalexand
Copy link
Collaborator

@nolanmar Looks like it tries to resolve localhost via DNS. Apparently there isn't localhost in /etc/hosts. What's your /etc/hosts?

@nolanmar511
Copy link
Contributor

This is the only entry in /etc/hosts

169.254.169.254 metadata.google.internal metadata

@nolanmar511
Copy link
Contributor

Thanks -- was able to modify /etc/hosts to get test to pass. Currently running the test to try to reproduce the flake.

@nolanmar511
Copy link
Contributor

Haven't been able to reproduce on a freebsd GCE instance. Test is now passing.

@nolanmar511
Copy link
Contributor

I think that maybe if I shorten the duration of the profile collected, then this might be more reproducible.
Also want to try testing on openBSD.

@mvdan -- Any idea how often this fails?

@mvdan
Copy link
Contributor Author

mvdan commented Mar 7, 2018

It doesn't look like it happens often. For example, look at the openbsd columns in https://build.golang.org/ - they're almost entirely green. I went through the most recent four pages of the build history and found no other occurrences.

Noone raised an issue about this flake on the Go issue tracker either, so it's unlikely that many others ran into the same flake. Although it would be quite the coincidence if I was the only one to run into the flake.

@aalexand
Copy link
Collaborator

@nolanmar511 Here is another pprof test failure I just spotted: https://build.golang.org/log/227d9194d392913ea91f96c9085cad59a7274ec7

@nolanmar511
Copy link
Contributor

The other pprof test failure has this error message:

--- FAIL: TestHttpsInsecure (15.14s)
	fetch_test.go:449: fetchProfiles(https+insecure://127.0.0.1:34665/debug/pprof/profile) got non-symbolized profile: len(p.Function)==0

So TestHttpsInsecure may be flaky in multiple ways.

@mvdan
Copy link
Contributor Author

mvdan commented Mar 15, 2018

@nolanmar511
Copy link
Contributor

I have been able to run this test on gomote, and ran the test about 200 times, and did not see this failure.
I'm still trying to reproduce on gomote.

@mvdan
Copy link
Contributor Author

mvdan commented Mar 31, 2018

@nolanmar511 you might be interested in @josharian's findings here: golang/go#24611

I also suggest that you use the stress tool. Given that the builders run a ton of tests, they tend to find more flakes than if you run one test many times in sequence. Usually, using stress to run the test many more times in parallel with itself gives better results. You're more likely to find flakes, especially because the resources it uses (such as network) are under heavier load.

@aalexand
Copy link
Collaborator

I reproduced a variant of this issue with a command like

taskset -c 0 stress -p 16 ./driver.test -test.v

on a x86_64 linux machine. The error I got is

=== RUN   TestHttpsInsecure
--- FAIL: TestHttpsInsecure (15.19s)
        fetch_test.go:451: fetchProfiles(https+insecure://127.0.0.1:41215/debug/pprof/profile) got non-symbolized profile: len(p.Function)==0

which I think we saw before. I was first confused here about why running this test on a stressed core would produce unsymbolized profile, but then realized that what really happens is that we get a profile with no samples at all - apparently when the system is stressed enough it may happen so that the code in the test that is supposed to generate some CPU usage doesn't get a chance to execute at all. I verified it by making a local change like

diff --git a/internal/driver/fetch_test.go b/internal/driver/fetch_test.go
index 0e6c930..c44b810 100644
--- a/internal/driver/fetch_test.go
+++ b/internal/driver/fetch_test.go
@@ -434,8 +434,8 @@ func TestHttpsInsecure(t *testing.T) {
        if err != nil {
                t.Fatal(err)
        }
-       if len(p.SampleType) == 0 {
-               t.Fatalf("fetchProfiles(%s) got empty profile: len(p.SampleType)==0", address)
+       if len(p.Sample) == 0 {
+               t.Fatalf("fetchProfiles(%s) got no samples: len(p.Sample)==0", address)
        }
        switch runtime.GOOS {
        case "plan9":

and running same stress command again saw expected error

=== RUN   TestHttpsInsecure
--- FAIL: TestHttpsInsecure (15.30s)
        fetch_test.go:438: fetchProfiles(https+insecure://127.0.0.1:41761/debug/pprof/profile) got no samples: len(p.Sample)==0

Anyway, what I think I am going to do is switch this test to use a different profile type, like goroutine. Trying to test against the CPU profile type has been too much trouble so far, and this test is really just an attempt to test end-to-end an https fetch in pprof so it's not that important which profile type we use.

aalexand added a commit to aalexand/pprof that referenced this issue Mar 31, 2018
Trying to test against /debug/pprof/profile has been giving some flakes
and trouble so far (google#146, google#253, google#328, golang/go#24611, golang/go#22594). While
some of the discussions the failures triggered appear to be useful (such
as whether CPU profiles are expected to be working on Windows XP or
not), that kind of testing is not really in the scope for this
particular test. This change switches the test to use goroutine profile
instead which is hopefully much less dependent on the environment. The
change also makes the test much faster to run.
@aalexand
Copy link
Collaborator

I think #350 will fix the issue.

@mvdan
Copy link
Contributor Author

mvdan commented Mar 31, 2018

Feel free to close this issue as fixed if you're confident you fixed it, but I'd wait for Josh to clarify what error he got on the other issue (before marking that one as closed too). I just imagined they are duplicates, but I have no confirmation.

@josharian
Copy link
Contributor

Sorry, AFK for a while. Yes, all my failures were of the “no samples” variety. Could that PR also now safely reduce the sample collection time from 10s to something much smaller? Thanks for working on this.

@aalexand
Copy link
Collaborator

aalexand commented Apr 2, 2018

@josharian PR #350 switches the test to use goroutine profile instead of CPU so 10s thing is not relevant now (goroutine profile is instant).

nolanmar511 pushed a commit that referenced this issue Apr 2, 2018
Trying to test against /debug/pprof/profile has been giving some flakes
and trouble so far (#146, #253, #328, golang/go#24611, golang/go#22594). While
some of the discussions the failures triggered appear to be useful (such
as whether CPU profiles are expected to be working on Windows XP or
not), that kind of testing is not really in the scope for this
particular test. This change switches the test to use goroutine profile
instead which is hopefully much less dependent on the environment. The
change also makes the test much faster to run.
@aalexand
Copy link
Collaborator

aalexand commented Apr 3, 2018

I am fairly certain this is fixed.

@aalexand aalexand closed this as completed Apr 3, 2018
gmarin13 pushed a commit to gmarin13/pprof that referenced this issue Dec 17, 2020
Trying to test against /debug/pprof/profile has been giving some flakes
and trouble so far (google#146, google#253, google#328, golang/go#24611, golang/go#22594). While
some of the discussions the failures triggered appear to be useful (such
as whether CPU profiles are expected to be working on Windows XP or
not), that kind of testing is not really in the scope for this
particular test. This change switches the test to use goroutine profile
instead which is hopefully much less dependent on the environment. The
change also makes the test much faster to run.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants