Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Pantsd] Enable pantsd in Travis #7320

Closed
blorente opened this issue Mar 6, 2019 · 12 comments

Comments

Projects
None yet
3 participants
@blorente
Copy link
Contributor

commented Mar 6, 2019

In the context of #4438, a large step will be enabling pantsd in Travis CI. This ticket is created to track progress towards that.


An unordered summary of known issues discovered in #7322:

  • Forking issues

    • "OSX 10.12 sanity check shard fails fast"[ ]
    • (#7380) "dynamically adding to --pythonpath does not affect a forked pantsd instance" ?:
      • tests/python/pants_test/backend/python:integration
  • (have seen it be flaky in other places) https://gist.github.com/stuhood/2d87abf9d08df80c9062ffcb93cff393:

    • tests/python/pants_test/backend/python/tasks/native:integration
  • https://gist.github.com/stuhood/a2a927816f7d7712278b454949778d15

    • tests/python/pants_test/backend/project_info/tasks:idea_plugin_integration
  • "test directly inspecting value of the ENABLE_PANTSD option":

    • tests/python/pants_test/goal:run_tracker_integration
  • "not necessarily broken, just expecting log output". When exceptions are logged to files as opposed to stderr, and tests check for some string in stderr, that's a problem:

    • tests/python/pants_test/logging:test_native_engine_logging
    • tests/python/pants_test/rules:test_integration
    • tests/python/pants_test/engine/legacy:pants_engine_integration
    • tests/python/pants_test/engine/legacy:graph_integration
    • tests/python/pants_test/build_graph:build_graph_integration
    • TestPantsDaemonIntegration.test_pantsd_sigterm
  • pants.pantsd.watchman.Watchman.WatchmanCrash: error from watchman: RootResolveError: unable to resolve root /home/travis/build/pantsbuild/pants/.pants.d/tmp/$snip.pants.d: directory /home/travis/build/pantsbuild/pants/.pants.d/tmp/$snip.pants.d is not watched -> This presents the case where problems arise if tests create their own buildroots I don't think that's something we should deal with in production, for now? Not sure though.

    • tests/python/pants_test/backend/jvm/tasks/jvm_compile/java:cache_compile_integration
    • tests/python/pants_test/backend/python/tasks:integration
  • When hermetic is enabled in PantsDaemonIntegrationTestBase, many of the pantsd tests complain about absolute symlinks on in the python distribution: Exception: Failed to read_link for Link("build-support/pants_dev_deps.py37.venv/.Python"): Custom { kind: InvalidData, error: StringError("Absolute symlink: \"/Users/bescobar/workspace/otherpants/build-support/pants_dev_deps.py37.venv/.Python\"") }, or because some plugins are not loaded. I think this is because of the --pants-config-files=[] that is passed when hermetic. To this point, the reason things had to be hermetic in the first place was because some tests run pantsd multiple times, and expect it to be alive between runs in the same test. Since the PANTS_SHUTDOWN_PANTSD_AFTER_RUN option leaked from the travis env, every pants run in the test shut down the daemon. Stripping that option should be enough to make those tests run non-hermetically, I think.

    • tests/python/pants_test/pantsd:pantsd_integration
      • test_pantsd_compile
      • test_pantsd_run
      • test_pantsd_filesystem_invalidation
      • test_pantsd_aligned_output
    • tests/python/pants_test/engine/legacy:console_rule_integration
      • test_v2_list

So, TL;DR, the current set of issues this has (which may or may not be relevant to enabling pantsd itself). Barring these, travis passes (ticked means "probably not relevant outside of the test harness"):

  • Forking issues.
  • Some tests had to be marked hermetic. The root cause was that the env variable PANTS_SHUTDOWN_PANTSD_AFTER_RUN=true was leaking from the travis env. There are tests that expect the daemon to be alive run after run, because they test things like FS invalidation. Even though we turned the CLI option off in the pantsd running context, the ENV var still leaked. That meant that pantsd got restarted after every run within the test. This problem is both not very relevant because it only affects the test harness, and not very hard to solve for the test harness.
  • Some tests expect output in stderr, but they are not getting it. When the daemon is enabled, the output gets logged into the log file or logs/exceptions.log, and doesn't reach stderr. I have to investigate more to see if this is an error that actually happens in the wild.
    • This one seems legitimate, it's pretty easy to repro locally with:
      $ cat stderr_repro.sh
      #!/bin/bash
      mv $1/TEST_BUILD $1/BUILD
      ./pants --enable-pantsd --build-file-imports=warn run "${1}:hello"
      mv $1/BUILD $1/TEST_BUILD
      $ ./stderr_repro.sh testprojects/src/python/build_file_imports_module
      <The warning will not appear in the console, but in the pantsd log>
      
      When we stop forking, fixing this issue should be easy.
  • Some tests create custom buildroots, inside the workdir of the pants instance running the tests. This means that the watchman instance orchestrating the tests has a hard time tracking those roots. I'm not sure how much this is an Issue, but I can see a world where we don't care that much about buildroots within buildroots, particularly given that specifying a custom buildroot is already discouraged.
    • The current hypothesis is that mock_buildroot() actually symlinks the inner .pants.d directory to the outer one (something like .pants.d/tmpabc123/.pants.d -> .pants.d). Therefore, when we run the test wit the daemon enabled, and we try to watch the workdir (.pants.d/tmpabc123/.pants.d), watchman will crash because it doesn't support symlinks. If this is the case, this means that the issues is in the test harness, not pantsd (except maybe we could give a more graceful error if we fail because of this).
    • EDIT: I don't think it's symlinks specifically anymore, because when setting a folder other than the outer workdir as the root of our mock buildroot, the test doesn't repro anymore.
    • EDIT: No, that was not what was wrong. The story goes: When calling mock_buildroot(), we link some essential files to basically recreate a minimal buildroot. We don't link a file called .python_interpreter_constraints. This means that, when the test runs pants in the current buildroot, it sees that the python interpreter constraints file is missing, and triggers a ./pants clean-all, which in turn gets rid of the symlink of .pants.d/tmpabc123/.pants.d -> .pants.d, and replaces it with a new folder. I'm not sure what the connection is between this and watchman crashing, but if we also link the .python_interpreter_constraints file, it works.
@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 6, 2019

I believe that #7300 failed due to a known issue with pantsd on OSX < 10.12 (EDIT: now reported as #7323).

Nonetheless, I don't think it's actually necessary to use pantsd on the binary builder shards yet... will take a new tack in a new PR.

@stuhood stuhood added the pantsd label Mar 6, 2019

@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 6, 2019

Iterating on #7322.

@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 12, 2019

Will gather a log today of issues, and triage them based on whether I think that they will be fixed by #7341.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

commented Mar 12, 2019

Is there a ticket or comment somewhere describing the blockers to using pantsd run-over-run for integration tests in travis? I am vaguely aware that the integration tests are supposed to have a separate buildroot, which means pantsd can't immediately be used. There is even a specific warning when using pantsd in an integration test:

pidfile_absolute = PantsDaemon.metadata_file_path('pantsd', 'pid', bootstrap_options.pants_subprocessdir)
if pidfile_absolute.startswith(build_root):
pidfile = os.path.relpath(pidfile_absolute, build_root)
else:
pidfile = None
logging.getLogger(__name__).warning(
'Not watching pantsd pidfile because subprocessdir is outside of buildroot. Having '
'subprocessdir be a child of buildroot (as it is by default) may help avoid stray '
'pantsd processes.'
)

Is there a reason we can't share that across runs? The time we get back in our lives from having integration tests run faster is going to make everything else easier to iterate on. If this is considered an unnecessary addendum to the pantsd work right now then I can work on it myself, but I'd appreciate having the blockers more clearly laid out.

@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 12, 2019

@cosmicexplorer : All integration tests use new anonymous workdirs, intentionally, to avoid mixing of results from one test method into another. This is mostly because having to worry about state from other tests being mixed into your current test method would drive you fairly quickly insane.

I think that we need to build a huge amount of trust in pantsd before we begin to do that. We could begin collecting blockers into an issue, but I know of 3 or 4 off the top of my head (which we may or may not fix while enabling pantsd by default... unclear), so it doesn't feel like a priority to me.

EDIT: But, if you are interested in pushing on it, I would recommend making it a separate ticket blocked by this one.

@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 12, 2019

I've updated the description with a summary of failures from #7322.

In short... there aren't that many? And it doesn't seem like any of them would be fixed by #7341 (with the exception of maybe the watchman-related failures). I don't think that that means we shouldn't implement and use #7341, but it might affect the priority.

@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 12, 2019

Unassigning for now so that I can work on the release.

@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 14, 2019

Will pick this back up later today.

@stuhood stuhood self-assigned this Mar 14, 2019

@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 16, 2019

I've added and applied a decorator to the tests mentioned above in #7322: so it should go mostly-green.

@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 20, 2019

Still looking at a few more failures here.

@stuhood stuhood assigned blorente and unassigned stuhood Mar 26, 2019

@stuhood

This comment has been minimized.

Copy link
Member

commented Apr 3, 2019

This is being continued in #7440.

blorente added a commit that referenced this issue Apr 15, 2019

Enable pantsd in Travis (#7440)
** Problem **
See #7320 for context. We would like to run integration tests under pantsd, to discover more issues.

** Solution **
Enable pantsd in the integration tests in the cron shard for py36 (not in the test runner, but on individual integration tests themselves) via a class property, and blacklist the tests that don't work with a useful docstring.

** Result **
Pantsd will now be enabled in integration and contrib tests running in the cron py36 shards.
@blorente

This comment has been minimized.

Copy link
Contributor Author

commented May 9, 2019

Closed with #7440

@blorente blorente closed this May 9, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.