Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run majority of unit tests in CI with V2 Pytest runner #7724

Merged
merged 18 commits into from May 23, 2019

Conversation

Projects
None yet
4 participants
@Eric-Arellano
Copy link
Contributor

commented May 14, 2019

Problem

Beyond wanting to move everything towards V2 in general, this change will allow us to start remoting our unit tests for greater parallelism and less dependance on Travis.

Solution

We must make several changes to land this:

Result

CI is overall less useful. The unit tests now take 50 minutes to run, rather than 20. Likewise, the logging is less useful because the use of travis_wait means that it is not logged until the entire process completes, and currently we log stdout and stderr for successful tests, so there is more noise.

However, these costs are deemed worth it to allow us to soon use remoting.

@Eric-Arellano Eric-Arellano force-pushed the Eric-Arellano:unit-tests-use-v2 branch 4 times, most recently from 3e72aab to 359cc48 May 14, 2019

@Eric-Arellano Eric-Arellano force-pushed the Eric-Arellano:unit-tests-use-v2 branch from 359cc48 to 512833a May 20, 2019

@Eric-Arellano Eric-Arellano changed the title WIP: Run unit tests in CI with V2 Pytest runner Run unit tests in CI with V2 Pytest runner May 20, 2019

Show resolved Hide resolved build-support/bin/ci.sh Outdated
Show resolved Hide resolved build-support/bin/ci.sh Outdated

@Eric-Arellano Eric-Arellano changed the title Run unit tests in CI with V2 Pytest runner Run majority of unit tests in CI with V2 Pytest runner May 20, 2019

Eric-Arellano added some commits May 21, 2019

Bump travis_wait to 50 minutes
Turns out that travis_wait does not actually read the input to check if the process is still running. Instead, it says "you have up to x minutes for this process to complete, otherwise it will time out." It took 41 minutes for the Py2 unit tests to pass last time, so we set it to 50 minutes to avoid flakiness.
@stuhood
Copy link
Member

left a comment

Let's do it.

@illicitonion

This comment has been minimized.

Copy link
Contributor

commented May 21, 2019

Do you know the cause of the 250% perf regression? How much do we think is down to each of:

  • Resolving more (once-per-target instead of once per pants invocation)
  • Copying files into tempdirs more
  • Overhead of starting more python processes
  • Overhead of starting more pytest runs
    ?
@Eric-Arellano

This comment has been minimized.

Copy link
Contributor Author

commented May 21, 2019

Do you know the cause of the 250% perf regression? How much do we think is down to each of:

I haven't quantitatively confirmed anything, but my intuition from working with the V2 test runner the past few weeks is that it's the result of resolving requirements for each individual test, as opposed to not having to resolve any requirements at all because we just use what's already in the venv.

I'm not sure how we could solve this...copying files from the venv into the subprocess sounds horribly error prone when we introduce remoting. Precomputing a Pex with every single permutation of dependency combinations would likely take too much memory, and it's not clear how we could get that cache up to Travis.

@illicitonion

This comment has been minimized.

Copy link
Contributor

commented May 21, 2019

Do you know the cause of the 250% perf regression? How much do we think is down to each of:

I haven't quantitatively confirmed anything, but my intuition from working with the V2 test runner the past few weeks is that it's the result of resolving requirements for each individual test, as opposed to not having to resolve any requirements at all because we just use what's already in the venv.

Interesting... We should do one resolve per 3rdparty dependency combination, right? I'd be interested to know how many resolves we're actually doing... From some quick grepping, it looks like we have 38 3rdparty deps from our python code, and I would be surprised if we were doing more than maybe 30 resolves, and 30 minutes is a long time for 30 resolves to be happening, especially in parallel...

I guess the easiest way to find this out right now would be to grab a --native-engine-visualize-to trace for a whole run, and look at the graph...

@stuhood

This comment has been minimized.

Copy link
Member

commented May 21, 2019

Do you know the cause of the 250% perf regression? How much do we think is down to each of:

I haven't quantitatively confirmed anything, but my intuition from working with the V2 test runner the past few weeks is that it's the result of resolving requirements for each individual test, as opposed to not having to resolve any requirements at all because we just use what's already in the venv.

Interesting... We should do one resolve per 3rdparty dependency combination, right? I'd be interested to know how many resolves we're actually doing... From some quick grepping, it looks like we have 38 3rdparty deps from our python code, and I would be surprised if we were doing more than maybe 30 resolves, and 30 minutes is a long time for 30 resolves to be happening, especially in parallel...

+1. Also, it's worth noting that caching (either pantsd, local as in #6898, or remote) should completely eliminate the resolve overhead in 99% of cases.

I guess the easiest way to find this out right now would be to grab a --native-engine-visualize-to trace for a whole run, and look at the graph...

That or getting the zipkin patch relanded. I suspect that visualize-to is too noisy for this case...

Eric-Arellano added a commit that referenced this pull request May 23, 2019

Fix dependency cycle between `backend/native/subsystems` and `backend…
…/python/subsystems` (#7793)

### Problem
`backend/python/subsystems/python_native_code.py` was depending on `pants.backend.native.subsystems`, but not declaring the dependency in its `BUILD` file. 

The naive solution of adding the proper depedency to its `BUILD` results in a dependency cycle, as `backend/native/subsystems/conan.py` already depends depending on `pants.backend.python.subsystems.python_tool_base`.

So, `./pants test tests/python/pants_test/backend/python/tasks:pytest_run` would fail when ran directly, but pass in CI because all of the targets would get the sources combined.

When changing to using the V2 test runner, we no longer allow this sort of leaky dependency usage—every dependency must be properly declared in the appropriate `BUILD` file. So, `./pants test tests/python/pants_test/backend/python/tasks:pytest_run` started failing in #7724.

### Solution
Move `conan.py` into a new subdirectory `backend/native/subsystems/packaging`, as suggested by @cosmicexplorer.
@@ -947,23 +947,23 @@ matrix:
- *py27_linux_test_config_env
- CACHE_NAME=linuxunittests.py27
script:
- ./build-support/bin/ci.sh -2lp
- travis_wait 50 ./build-support/bin/ci.sh -2lp

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano May 23, 2019

Author Contributor

Frustrating UX experience of using travis_wait: the resulting std{out,err} will be stripped off any color. We normally print those in red upon test failures to make it easier to scan.

We won't be able to remove travis_wait until improving the performance via #7795 or via remoting.

@Eric-Arellano Eric-Arellano merged commit c9ea445 into pantsbuild:master May 23, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Eric-Arellano Eric-Arellano deleted the Eric-Arellano:unit-tests-use-v2 branch May 23, 2019

blorente added a commit to blorente/pants that referenced this pull request May 28, 2019

Eric-Arellano added a commit that referenced this pull request Jun 7, 2019

Split up CI unit tests into two distinct shards (#7867)
### Problem
Now that we use the V2 test runner as of #7724, unit tests both take much longer (20 minutes -> 40-50 minutes) and have become very flaky (not exclusively thanks to V2).

Especially because the tests flake so much, it is frustrating to have to wait a whole 50 minutes to rerun the shard.

### Solution
We can't use automated sharding because V2 does not support that yet, but we can introduce our own manual shards. One shard runs the V2 tests, and the other runs all blacklisted tests, the contrib tests, and the `pants-plugin` tests.

### Result
Flakes will be slightly less painful, because when something flakes you will not have to run the entire 50 minutes of CI again, but just the subset for that specific shard.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.