New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port BaseTest to v2 engine (attempt two) #5611

Merged
merged 12 commits into from May 26, 2018

Conversation

Projects
None yet
4 participants
@stuhood
Copy link
Member

stuhood commented Mar 19, 2018

This is a revert of the revert of #4867.

The only interesting differences here are in the topmost commit, which hides a few more APIs that we want to deprecate, and fixes the flakiness that caused the revert (which turned out to be leaked references to the scheduler causing "Too many open file handle" issues).

@stuhood stuhood changed the title WIP: Port BaseTast to v2 engine (attempt two) WIP: Port BaseTest to v2 engine (attempt two) Mar 19, 2018

@stuhood stuhood force-pushed the twitter:stuhood/test-base-round-2 branch 3 times, most recently from b0aafa8 to bdc4688 Mar 19, 2018

@stuhood stuhood force-pushed the twitter:stuhood/test-base-round-2 branch 4 times, most recently from 28ad9c1 to 6347674 Apr 3, 2018

@stuhood stuhood requested review from kwlzn , benjyw and illicitonion Apr 5, 2018

@stuhood stuhood changed the title WIP: Port BaseTest to v2 engine (attempt two) Port BaseTest to v2 engine (attempt two) Apr 5, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Apr 5, 2018

This is now reviewable.

@stuhood stuhood force-pushed the twitter:stuhood/test-base-round-2 branch from 6347674 to ad6912b Apr 5, 2018

@baroquebobcat
Copy link
Contributor

baroquebobcat left a comment

Looks ok to me, though I'd prefer if the src changes that seem unrelated to updating the test class were broken out.

From looking at the rest of the patch, it seems like some of those changes are to bring the error messages in line with the existing assertions, but there are some moves that look like they could be broken out.

@@ -285,7 +285,8 @@ def test_stitch_deps_remote_existing_rev_respected(self):
pkg='prod',
rev='v1.2.3')
pre_execute_files = self.stitch_deps_remote(materialize=True)
self.build_graph.reset() # Force targets to be loaded off disk
self.reset_build_graph(reset_build_files=True) # Force targets to be loaded off disk
print('>>> {}'.format(self.buildroot_files()))

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

xx print?

@@ -253,7 +253,7 @@ def _raise_incorrect_address_error(self, spec_path, wrong_target_name, addresses

if not addresses:
raise self.EmptyBuildFileError(
'{was_not_found_message}, because that directory contains no BUILD files defining addressable entities.'
'{was_not_found_message}, because that directory does not contain any BUILD files defining addressable entities.'

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

This feels like it's unrelated to this change, but I do like the updated wording better.

This comment has been minimized.

@stuhood

stuhood Apr 5, 2018

Member

It's related to the change because a bunch of tests are swapping from being run in v1 to being run in v2.

'TargetMacro.Factory instances that construct more than one type are no longer supported. '
'Consider using a `context_aware_object_factory, which can construct any number of '
'different objects.'
)

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

Should this be a separate change?

This comment has been minimized.

@stuhood

stuhood Apr 5, 2018

Member

Maybe, but I no longer remember which test I was fixing when I changed it, so I'd rather land it here.

@@ -319,27 +319,12 @@ def spec_to_globs(address_mapper, specs):
elif type(spec) is AscendantAddresses:
patterns.update(join(f, pattern)
for pattern in address_mapper.build_patterns
for f in _recursive_dirname(spec.directory))
for f in recursive_dirname(spec.directory))

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

Separate change?

This comment has been minimized.

@stuhood

stuhood Apr 5, 2018

Member

The method is now used in file invalidation in test_base.

@@ -0,0 +1,573 @@
# coding=utf-8
# Copyright 2014 Pants project contributors (see CONTRIBUTORS.md).

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

Should this have an updated year? Or is it primarily a copy of 2014 code? Looks like the latter.

This comment has been minimized.

@stuhood

stuhood Apr 5, 2018

Member

Mm. Good question. Might as well refresh.

@stuhood stuhood force-pushed the twitter:stuhood/test-base-round-2 branch from ad6912b to 5e21e8b Apr 5, 2018

@baroquebobcat
Copy link
Contributor

baroquebobcat left a comment

Looks good to me!

@kwlzn

kwlzn approved these changes Apr 6, 2018

Copy link
Member

kwlzn left a comment

thanks!

@stuhood stuhood force-pushed the twitter:stuhood/test-base-round-2 branch from 5e21e8b to ee791e0 Apr 6, 2018

@benjyw

benjyw approved these changes Apr 6, 2018

@stuhood stuhood referenced this pull request Apr 20, 2018

Merged

Break a Core / Node cycle #5733

stuhood added a commit that referenced this pull request Apr 21, 2018

Break a Core / Node cycle (#5733)
### Problem

`Core` instances are currently being leaked in cases where `Nodes` have not completed running, and thus hold a `Context` in the closure for their `Future`. The cycle is:
```
Core -> Graph -> Node -> Context -> Core
```

### Solution

Clear all `Node` states when we `Drop` a `Scheduler`, breaking their cycles with the `Core`.

### Result

Fixes #5732 by ensuring that the `Store` held via `Scheduler -> Core -> Store` is dropped. It should also unblock #5611.

@stuhood stuhood force-pushed the twitter:stuhood/test-base-round-2 branch from ee791e0 to bb136fb Apr 21, 2018

stuhood added a commit that referenced this pull request Apr 21, 2018

Break a Core / Node cycle (#5733)
### Problem

`Core` instances are currently being leaked in cases where `Nodes` have not completed running, and thus hold a `Context` in the closure for their `Future`. The cycle is:
```
Core -> Graph -> Node -> Context -> Core
```

### Solution

Clear all `Node` states when we `Drop` a `Scheduler`, breaking their cycles with the `Core`.

### Result

Fixes #5732 by ensuring that the `Store` held via `Scheduler -> Core -> Store` is dropped. It should also unblock #5611.

@stuhood stuhood force-pushed the twitter:stuhood/test-base-round-2 branch 4 times, most recently from f383188 to 2e99612 May 8, 2018

@stuhood stuhood force-pushed the pantsbuild:master branch from b6bb42d to 9e2fdb5 May 11, 2018

@stuhood stuhood force-pushed the twitter:stuhood/test-base-round-2 branch from 2e99612 to 5984fde May 22, 2018

@stuhood stuhood force-pushed the twitter:stuhood/test-base-round-2 branch 2 times, most recently from 9568019 to 6daee11 May 23, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented May 23, 2018

This now depends on #5859.

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented May 23, 2018

There are two remaining failures in this job, both of which (to be confirmed shortly) are caused by SIGSEGVs in tests, which should now be more cleanly exposed by #5859. Searching for the phrase FAILURE: Test was killed in the logs should show which.

As far as I can tell, the failures occur in stable locations, although I'm unable to reproduce either of them locally:

  1. tests/python/pants_test/backend/python/tasks: These are unit tests using TestBase, so it might potentially make sense to see a failure here due to the usage of the new scheduler.
  2. contrib/go/tests/python/pants_test/contrib/go/tasks:integration: These are integration tests (extending PantsRunIntegrationTest), and thus should not actually have been affected by this PR at all (integration tests have run with v2 for a while).

At some point in the past I was briefly able to reproduce these failures locally, and running the tests with ./pants test $target -- -s (to disable all pytest output capturing) exposed a panic due to having run out of file handles. But unfortunately, I have not had luck attempting to reproduce that error today (and it's possible that it has changed to a virtual memory error at this point... unclear).

Having typed all of this out, I think I have one more thing I want to try, which is to see whether we might be able to generically catch panics in the scheduler via https://doc.rust-lang.org/std/panic/fn.catch_unwind.html and convert them directly to error messages. EDIT: I went ahead and did this here: master...twitter:stuhood/catch-unwind ... it's not pretty, because using catch_unwind safely is apparently quite challenging, but it's possible that it will allow us to get the errors in a more useful context?

illicitonion added a commit that referenced this pull request May 23, 2018

Improve logging/handling of signaled, killed, and terminated tests (#…
…5859)

### Problem

While debugging #5611, I determined that when a test was killed by a signal (and thus had a negative return value from `poll()`), `TestRunnerTaskMixin` incorrectly interpreted the failure as a still-living and hung process, and would later try to kill it. There were a few broken tests allowing for this, but primarily: `test_pytest_run_timeout_cant_terminate` was not waiting long enough for the tested-test to start up, and so was killing it before it had its signal handler in place.

Additionally, the usage of a `Timer` + `poll()` to implement test termination meant that we were guaranteed to wait the full `timeout_terminate_wait` time (10 seconds by default) before `poll()`ing to see whether the test had exited.

### Solution

1. Interpret `poll() == None` as a still running process, and `poll() < 0` as a process killed by a signal.
2. Report processes killed by signals before our initial attempt to kill them (which should expose a SIGSEV in #5611).
3. Rather than a timer, use `wait()` between `terminate()` and `kill()`, which avoids unnecessary sleeping.

### Result

Improved debuggability for tests that exit abnormally.

@illicitonion illicitonion force-pushed the twitter:stuhood/test-base-round-2 branch from 6daee11 to 92f1381 May 23, 2018

@stuhood stuhood force-pushed the twitter:stuhood/test-base-round-2 branch from e930cb7 to 78cd420 May 24, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented May 26, 2018

The two failing shards are due to the Yarn outage earlier. Going to go for it.

@stuhood stuhood merged commit 2e65f46 into pantsbuild:master May 26, 2018

1 check failed

continuous-integration/travis-ci/pr The Travis CI build failed
Details

@stuhood stuhood deleted the twitter:stuhood/test-base-round-2 branch May 26, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment