Remote execution fails to import native_engine.so #8067

Closed
Eric-Arellano opened this issue Jul 17, 2019 · 4 comments

@Eric-Arellano
Contributor

Eric-Arellano commented Jul 17, 2019

About half of the unit tests fail when run through remote execution, with an error like the one below:

tests/python/pants_test/engine stdout:
============================= test session starts ==============================
platform linux -- Python 3.6.8, pytest-3.6.4, py-1.8.0, pluggy-0.7.1
rootdir: /b/f/w, inifile:
plugins: timeout-1.2.1, cov-2.4.0
collected 8 items

pants_test/engine/test_engine.py FFFFFFFs                                [100%]

=================================== FAILURES ===================================
_________________________ EngineTest.test_fork_context _________________________

self = <pants_test.engine.test_engine.EngineTest testMethod=test_fork_context>

    def test_fork_context(self):
      # A smoketest that confirms that we can successfully enter and exit the fork context, which
      # implies acquiring and releasing all relevant Engine resources.
      expected = "42"
      def fork_context_body():
        return expected
>     res = self.mk_scheduler().with_fork_context(fork_context_body)

pants_test/engine/test_engine.py:142:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pants_test/engine/scheduler_test_base.py:61: in mk_scheduler
    include_trace_on_error=include_trace_on_error)
pants/engine/scheduler.py:87: in __init__
    self._tasks = native.new_tasks()
pants/engine/native.py:752: in new_tasks
    return self.gc(self.lib.tasks_create(), self.lib.tasks_destroy)
pants/util/memo.py:115: in memoize
    result = func(*args, **kwargs)
pants/engine/native.py:638: in lib
    lib = self.ffi.dlopen(self.binary)
pants/util/memo.py:115: in memoize
    result = func(*args, **kwargs)
pants/engine/native.py:645: in ffi
    return getattr(self._ffi_module, 'ffi')
pants/util/memo.py:115: in memoize
    result = func(*args, **kwargs)
pants/engine/native.py:658: in _ffi_module
    return importlib.import_module(NATIVE_ENGINE_MODULE)
/pyenv-docker-build/versions/3.6.8/lib/python3.6/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:994: in _gcd_import
    ???
<frozen importlib._bootstrap>:971: in _find_and_load
    ???
<frozen importlib._bootstrap>:955: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:658: in _load_unlocked
    ???
<frozen importlib._bootstrap>:571: in module_from_spec
    ???
<frozen importlib._bootstrap_external>:922: in create_module
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

f = <built-in function create_dynamic>
args = (ModuleSpec(name='native_engine', loader=<_frozen_importlib_external.ExtensionFileLoader object at 0x7ff0fcab5cf8>, origin='/tmp/tmpn6uegfmq/native_engine.so'),)
kwds = {}

>   ???
E   ImportError: /tmp/tmpn6uegfmq/native_engine.so: invalid ELF header

<frozen importlib._bootstrap>:219: ImportError

To reproduce, rebase against #8066, then run ./pants --pants-config-files=pants.remote.ini --no-v1 --v2 --remote-oauth-bearer-token-path=<(gcloud auth application-default print-access-token | perl -p -e 'chomp if eof') --process-execution-speculation-strategy=none test tests/python/pants_test/engine

--

I think this makes sense, as we seem to never explicitly depend on native_engine.so? V2 does not work with implicit runtime dependencies.

@illicitonion
Contributor

I just ran this and couldn't reproduce... And from looking at the BUILD files, it looks like there is a dep on native_engine.so...

@Eric-Arellano
Contributor Author

Indeed, we do have a dep on native_engine.so:

resources(
  name='native_engine_shared_library',
  sources=['native_engine.so'],
)

However, this dep is not very safe, in that the file is generated and provided by the host machine. For example, the native_engine.so generated on my local Mac works there, but when it is used in the remote RBE Docker image (Linux), the import fails with this invalid ELF header error. Meanwhile, the native_engine.so file on your computer might happen to work in both environments.
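To make the mismatch concrete, here is a minimal sketch for checking which platform a given native_engine.so was built for by reading its leading magic bytes. It is not part of Pants: `binary_format` and the two magic-number constants are names introduced here for illustration. The only facts it relies on are that ELF files start with `\x7fELF`, while Mach-O files (what a Mac build produces) start with one of the `feedface`/`feedfacf`/`cafebabe` magics.

```python
# Hypothetical helper, not Pants code: classify a shared library by its magic bytes.

ELF_MAGIC = b"\x7fELF"

# Mach-O magic numbers as they appear in the first four bytes of the file.
MACHO_MAGICS = frozenset({
    b"\xfe\xed\xfa\xce",  # 32-bit, big-endian
    b"\xce\xfa\xed\xfe",  # 32-bit, little-endian
    b"\xfe\xed\xfa\xcf",  # 64-bit, big-endian
    b"\xcf\xfa\xed\xfe",  # 64-bit, little-endian
    b"\xca\xfe\xba\xbe",  # universal (fat) binary; Java class files share this magic
})


def binary_format(path):
    """Return 'elf', 'mach-o', or 'unknown' based on the first four bytes of `path`."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == ELF_MAGIC:
        return "elf"
    if magic in MACHO_MAGICS:
        return "mach-o"
    return "unknown"
```

If this returns `'mach-o'` for the file that gets uploaded, a Linux worker would be expected to reject it with exactly the kind of invalid ELF header error shown in the traceback.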

In general, any ideas on a robust way to provide remote workers with a native_engine.so that works for their platform? This is important to solve in order to remote integration tests, which all depend on this file.

(P.S. @cosmicexplorer was able to reproduce today using ./pants --pants-config-files=pants.remote.ini --no-v1 --v2 --remote-oauth-bearer-token-path=<(gcloud auth application-default print-access-token | perl -p -e 'chomp if eof') --process-execution-speculation-strategy=none test tests/python/pants_test/auth:auth on the #8051 branch)

@stuhood
Sponsor Member

stuhood commented Jul 22, 2019

> In general, ideas on a robust way to provide to remote workers a solid native_engine.so that works for their platform? This is important to solve to remote integration tests, which all depend on this file.

@Eric-Arellano: This is precisely #7735, I think.

EDIT: Oh. I guess it isn't. It's definitely related, but I think the situation in this case is essentially that unless we were to port the bootstrap of pants itself into remote execution, we would not be able to build the native portion for the remote host.

And porting the bootstrap of pants into remote execution is a large project: @cosmicexplorer started it at one point, but there were a series of hermeticity leaks due to limitations in the v2 API that we would need to resolve to get that working.

I think that we should assume that we will not be diving into that one too soon (not any time in the next two months, maybe?), and focus on "doing the right thing" in this case via #7735 (i.e., not attempting to remote something platform-dependent to a mismatched platform).

@Eric-Arellano
Contributor Author

Closing because Pants now does the right thing: it won't even attempt remote execution from a Mac against a Linux remote platform, as that is not safe.
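For readers who want the shape of that check, a rough sketch follows. This is an illustration only, not the actual Pants implementation: `normalize_os` and `should_attempt_remote` are hypothetical names, and the alias table is an assumption. The only real API used is `platform.system()` from the Python standard library, which returns strings such as `'Linux'` or `'Darwin'`.

```python
import platform

# Hypothetical alias table; the names accepted here are an assumption for illustration.
_OS_ALIASES = {
    "darwin": "darwin",
    "macos": "darwin",
    "osx": "darwin",
    "linux": "linux",
}


def normalize_os(name):
    """Map an OS name to a canonical lowercase form, passing unknown names through."""
    lowered = name.strip().lower()
    return _OS_ALIASES.get(lowered, lowered)


def should_attempt_remote(remote_os, local_os=None):
    """Return True only when the local OS matches the remote worker's OS.

    Platform-dependent files built on the host (such as native_engine.so) are
    only usable remotely when both sides run the same OS.
    """
    if local_os is None:
        local_os = platform.system()
    return normalize_os(local_os) == normalize_os(remote_os)
```

With this sketch, `should_attempt_remote("linux", local_os="Darwin")` is false, which corresponds to the Mac-to-Linux case described in this issue.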
