Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add go tool and makefile goal to debug flaky tests #4633

Merged
merged 4 commits into from
Sep 10, 2018

Conversation

freddygv
Copy link
Contributor

@freddygv freddygv commented Sep 5, 2018

This PR adds a Go CLI tool that allows us to reproduce flaky tests inside of a Docker container. Single or package-wide tests are run for multiple iterations with a configurable amount of CPU resources.

There is also an associated goal in the makefile called test-flake which builds the CLI then runs it with the ARGS passed.

Basics:

  • A test binary is created for the target package. (At most one package can be tested at a time)
  • A script will loop over go test until the max number of iterations is reached or one of the tests fails.
  • The full Consul logs are dumped to test.log which is in the directory of the package being tested.

Options:

  -pkg=""             Target package
  -test=""            Target test (requires pkg flag)
  -cpus=0.15          Amount of CPU resources for container
  -n=30               Number of times to run tests

Usage examples:

Run all tests for a package:

make test-flake ARGS="-pkg connect/proxy"

Run tests matching a pattern:

make test-flake ARGS="-pkg connect/proxy -test Listener"

Run a single test 100 times with 0.5 CPUs:

make test-flake ARGS="-pkg connect/proxy -test TestUpstreamListener -n 100 -cpus 0.5"

Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super cool @freddygv great job.

It would be awesome to get any extra context/tips for how this has been most useful for you and Siva in finding and fixing issues.

Not sure if that belongs in the README here or somewhere else.

I get that in general being able to reproduce more gives better chance to debug, but any practical tips like:

  • how do you debug/instrument the consul test binary inside the docker container to try figure out why it's flaky?
  • does it work for API test where we run a full external consul binary process? (I guess not for now)

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As my other comment says this would be better to be done in bash rather than writing Go source to compile which does littler more than run shell commands as subprocesses.

Secondly most of this should fit into the existing structure.

build-support/flake-repro/Dockerfile really belongs in build-support/docker/Flake-Test.dockerfile (or pick a different name)

build-support/flake-repro/main.go should be build-support/scripts/flake-test.sh

build-support/flake-repro/test.sh should probably go in build-support/docker/flake-test.sh or something similarly named.

Also all the logic regarding running a flake repro test could be put into the build-support/functions/*.sh files and then have the top level script just source all of those, process arguments and invoke the correct functions.

GNUmakefile Outdated
@ if [ -z $(ARGS) ] ; then \
echo ""; \
echo "error: ARGS required for flake-repro"; \
echo "ex: make test-flake ARGS=\"-pkg api\""; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expose any flags as separate make vars so you could do something like this:

make test-flake FLAKE_PKG=api FLAKE_TEST="TestFlaky" FLAKE_...

It makes it easier to run when you don't have to deal with escaping quotes in your shell arguments.

@@ -0,0 +1,180 @@
package main
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really should be a bash script and not Go source as all that is done is to shell out and run commands. It would be much easier to follow (and probably less code) if it were done in a shell scripting language like bash and would be consistent with the other build scripts. If you want some pointers on this ping me.

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions/comments

echo -e "Iterations:\t$4"
echo

echo "-----> Cleaning up old containers..."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge deal at all but you could use the status or status_stage functions available and it will colorize the output for you.

It could help to distinguish this scripts output from a subcommands output.

-e APP="$APP" \
$IMAGE

docker rm -f $CONTAINER > /dev/null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use --rm to docker run instead of another invocation.

env GOOS=$GOOS GOARCH=$GOARCH go test -c "./$1" -o $TEST_BINARY

echo "-> Running container..."
docker run \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the exit status of this if there is a test failure? Can it be used as the return code of the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exit status is 1 when there's a failure, will also print the name of the test and the assertion output


USER travis

COPY test.sh /usr/local/bin/test.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be COPY flake.sh ...

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@freddygv freddygv merged commit b2032b0 into hashicorp:master Sep 10, 2018
@freddygv freddygv deleted the flake-repro-env branch September 10, 2018 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants