New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a --toolchain-variant option to select the compiler for C/C++ #6800

Merged
merged 28 commits into from Dec 10, 2018

Conversation

Projects
None yet
3 participants
@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Nov 21, 2018

Problem

We have not previously made it possible for the user to select the compiler to use for C/C++ code (although we have unit tests for both compilers). It is currently hardcoded to use LLVM.

Solution

  • Add a --toolchain-variant enum option to NativeBuildSettings.
  • Convert the subsystem dependency in NativeTask on NativeBuildSettings to be a global subsystem dependency instead of a scoped dependency.
  • Add some boilerplate in NativeTask to select the toolchain based on the option.
  • Add an info-level log to the compile and link tasks stating the executable name which is being used.
  • Expand an integration test to cover each toolchain variant.

Result

Users can now elect to use --native-build-settings-toolchain-variant=llvm to use clang/clang++ for the compiler. The default compiler has been changed to be gcc/g++ (previously llvm was the only option).

@CMLivingston
Copy link
Contributor

CMLivingston left a comment

LGTM, thanks for adding this! One comment on a possible way to improve the test.

self._assert_ctypes_binary(variant)

def _assert_ctypes_binary(self, toolchain_variant):
# TODO: figure out a way to check that when we select 'gnu' as the `toolchain_variant`, we use

This comment has been minimized.

@CMLivingston

CMLivingston Nov 22, 2018

Contributor

Maybe you could use ldebug and pattern match against what exe we use?

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Nov 22, 2018

Contributor

I was thinking of reading the output, hadn't thought of using ldebug. When I tried to set 'level': 'debug' in the pants run config, it failed with an error relating to run tracking -- however, I realized that knowing the compiler and linker's filenames isn't that intrusive of a message and might be really useful as an info log. So the current commit checks the compiler and linker filenames are in the output from the info logs that were added.

# Check that we have selected the appropriate compilers for our selected toolchain variant,
# for both C and C++ compilation.
for compiler_name in self._compiler_names_for_variant[toolchain_variant]:
self.assertIn("selected compiler exe name: '{}'".format(compiler_name),

This comment has been minimized.

@jsirois

jsirois Nov 22, 2018

Member

I am definitely not a fan of scraping logs to implement tests. This was cheap since there was already an integration test but expensive since it adds production logging of questionable value (it effectively parrots the very well tested option system) and that will surprisingly break tests when altered. Consider ~never doing this when possible.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Nov 22, 2018

Contributor

I will make an effort to remember your comment as the concerns you mention are quite bothersome. I would love to implement any alternative -- the info logs here were pretty much exactly for testing purposes and that conflation makes me sick at heart.

I suppose the point of an integration test is not to try to inspect internal state but to examine some properties of the result of a pants run. I was wondering whether it was possible to tag an object file or shared library with metadata that includes e.g. which compiler produced it. The rationale for testing both compilers here is to avoid introducing a breakage in one toolchain, and just changing the option value doesn't seem like it does enough to verify that we are choosing the right compiler.

It's probably possible to propagate this metadata (which compiler/linker was used) into the produced python_dist() (by making some edits to BuildLocalPythonDistributions -- not by tagging any native code), which we could then inspect in this test. I will look up whether this is feasible -- adding a "compiler" tag (or something) to a dist seems like a very interesting and concise way to begin the foray into classifying compatibility between e.g. native code cache artifacts. I'm also seeing the implementation of this requiring a relatively small amount of code, which would be feasible to add in this PR or another. I'm working on this approach now to avoid the addition of these info-level logs. I'm thinking of potentially storing another file in data_files which contains info about which compiler was used.

Internally we have had to address the situation several times where e.g. tensorflow and related wheels assume quite a lot of things about the native code execution environment (mostly relating to glibc and stdlibc++ version), so if there is some way to tag wheels with this type of more in-depth native code compatibility (e.g. which compiler is used), we could potentially apply that method to 3rdparty wheels as well in the future.

Bazel might have an analogue of a far more general form of compatibility with e.g. compatible_with (with application to bazel toolchains) -- I'm not familiar enough with any of that to say whether it's applicable here. Bazel's compatibility also handles e.g. ensuring "secret" code doesn't get mixed with non-"secret" code, so it's a far more general formulation than I care about right now, but was interesting to look at just now.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Nov 22, 2018

Contributor

I've linked the above to #native so the content isn't lost. In essence, I'm just looking for a simple method for now to propagate the identity (even just the compiler name) of the toolchain we use to the python_dist() we build, in a way we can check later in this integration test. Right now this is the only such check we make so it can probably be hacked on to the specific dist we build, now that I think of it.

This comment has been minimized.

@jsirois

jsirois Nov 26, 2018

Member

That would certainly be preferable. That said, just keep scope in mind. All this to test NativeTask.get_c_toolchain_variant and NativeTask.get_cpp_toolchain_variant work - which just tests subsystems / options. I think that can be done simply in a unit test (although you do request engine products - not sure how easy that is but I'm pretty sure unit test infra has support for this already). You need not test every bit of program logic through an integration test or unit test or both. Pick your battles based on risk or complexity and this is a tiny skirmish your whacking with a howitzer.

This comment has been minimized.

@jsirois

jsirois Dec 4, 2018

Member

So, what's the status of this? The test still log scrapes. Do you want to follow-up with an issue tracking fixing this test and removing the production log.info it requires?

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Dec 5, 2018

Contributor

Added a link to #6866 to cover my interpretation of this discussion!

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-toolchain-variant-option branch from dfa661e to 3fca43e Nov 26, 2018

@@ -186,20 +193,19 @@ def select_llvm_cpp_toolchain(platform, native_toolchain):
# NB: we use g++'s headers on Linux, and therefore their C++ standard library.
include_dirs=provided_gpp.include_dirs,
extra_args=(llvm_cpp_compiler_args + provided_clangpp.extra_args + gcc_install.as_clang_argv))
linking_library_dirs = provided_gpp.library_dirs + provided_clangpp.library_dirs
# TODO: why are these necessary? this is very mysterious.

This comment has been minimized.

@CMLivingston

CMLivingston Nov 27, 2018

Contributor

+1 that we should better understand why certain toolchains require knowing about both library dirs or both include dirs.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Nov 27, 2018

Contributor

I don't know if it is required -- that's why this PR is so necessary (testing all of these configurations in an integration test).

@CMLivingston
Copy link
Contributor

CMLivingston left a comment

Curious how the crti.o on command line became necessary after these changes?

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Nov 27, 2018

See the commit message: adding it explicitly to the command line was necessary in the past, but mysteriously became not a problem (because clang searches for it itself). I was in bed 24/7 around this time and didn't realize that using LLVM as the default would change the results of our integration tests. Our subsystem unit testing in test_native_toolchain.py contains an approximation of the changes to native_toolchain.py, and that file can probably be made significantly smaller after this change. This is why the testing introduced in this change is so significant.

EDIT: I'll find sources for the above, multitasking currently.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-toolchain-variant-option branch 3 times, most recently from 366659f to 3d91846 Nov 27, 2018

@jsirois

This comment has been minimized.

Copy link
Member

jsirois commented Nov 27, 2018

It appears a good deal has changed since my initial review, can you ping when you think this is ready for re-review?

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Nov 27, 2018

It appears a good deal has changed since my initial review, can you ping when you think this is ready for re-review?

Yes! I expected to be able to skip some tests here, but in digging into those, found more I needed to investigate before being able to merge this (if CI passes this time, then that would be done, but Travis itself is being extremely flaky today -- not even our tests). The lengthy comment I made earlier is going into a followup issue -- perhaps one more such issue or PR will be made before this is ready to review again.

cosmicexplorer added some commits Nov 21, 2018

add crti.o back to the command line
i think we were doing it wrong this whole time because we were only invoking llvm in our testing. a
mistake i hope we won't be able to make again.
fix error introduced by incorrect git rebase operation
the commit 082a1c0 was the product of a rebase, and this broke the
code in unexpected ways, leading to the CI failure. this is a very strange occurrence

cosmicexplorer added some commits Nov 29, 2018

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-toolchain-variant-option branch from 29a3d0e to 9a25d32 Nov 30, 2018


def test_ctypes_third_party_integration(self):
pants_binary = self.run_pants(['binary', self._binary_target_with_third_party])
self.assert_success(pants_binary)

# TODO(#6848): this fails when run with gcc on osx as it requires gcc's libstdc++.so.6.dylib to
# be available on the runtime library path.
pants_run = self.run_pants(['-q', 'run', self._binary_target_with_third_party])

This comment has been minimized.

@CMLivingston

CMLivingston Nov 30, 2018

Contributor

Any way we could wrap this in a conditional that skips if the condition is true and logs this info to the user?

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Nov 30, 2018

Contributor

I would prefer that this just gets fixed for real as part of #6848, as the underlying issue seems to require some infra work beforehand (figuring out how to run things which depend on runtime libraries from our BinaryTools). Does that sound reasonable?

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Nov 30, 2018

This PR has ballooned, again, but this is also the PR that tests both of our toolchains separately and therefore hopefully avoids massive diffs to native_toolchain.py in the future. I left a TODO regarding an "algebra" of Executables which would have made this diff much smaller, but I haven't developed that enough right now.

I have made a stub PR #6848 to cover the unfinished business in this one. Let me know if this diff is inscrutable @jsirois and I can work on breaking out some more of the changes.

@memoized_property
def linker(self):
return self._cpp_toolchain.cpp_linker
# NB: we are using the C++ toolchain here for linking every type of input, including compiled C

This comment has been minimized.

@CMLivingston

CMLivingston Nov 30, 2018

Contributor

Curious why this is necessary and why we don't link through the frontend of the C toolchain?

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Nov 30, 2018

Contributor

Well, we would have to choose one, as in this phase we are linking C and C++ object files together into the shared lib defined by the native_artifact() build object. In this case, g++/clang++ are necessary to add the appropriate arguments to the linker subcommand for C++ (which can be seen if we add -v to the command line). This will also work for C. Because we are linking all these object files at once, we are forced to use the C++ compiler to support linking the C++ object files. We could e.g. use our C compiler to compile a target closure of only C targets, but I don't see any benefit to doing that. The reason we still have a specific C linker is so that we can support any C-specific link process in the future (and we do test our C linker in test_native_toolchain.py, so that's not going to rot).

@@ -219,7 +260,7 @@ def select_gcc_c_toolchain(platform, native_toolchain):
# TODO: we should be providing all of these (so we can eventually phase out XCodeCLITools
# entirely).
xcode_clang = yield Get(CCompiler, XCodeCLITools, native_toolchain._xcode_cli_tools)
new_include_dirs = xcode_clang.include_dirs + provided_gcc.include_dirs
new_include_dirs = provided_gcc.include_dirs + xcode_clang.include_dirs

This comment has been minimized.

@CMLivingston

CMLivingston Nov 30, 2018

Contributor

Why was this swap necessary? (it might have implications for internal use cases, will verify)

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Nov 30, 2018

Contributor

This is actually a correction which is necessary for tests to pass (this is due to the gcc headers' use of #include_next). The same switch is done for the GCCCppToolchain. The reason this bug existed before was because there was no concise way to say "this executable is composed of these other executables' resources", so we lost a lot of readability by requiring repeated code.

This comment has been minimized.

@jsirois

jsirois Dec 4, 2018

Member

...and so this probably deserves a comment in the code. Your cohort which is probably the closest to this code found it mysterious - excellent indicator that it truly is and should get note not just in review.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Dec 5, 2018

Contributor

I agree. It has historically been difficult to justify spending time on fixing tech debt for this backend as I wasn't able to get a lot of help understanding the time cost of different approaches, so I attempted to muddle through solutions and leave millions of TODOs because I was extremely concerned about wasting time on unnecessary abstractions. @CMLivingston has been extremely helpful in developing the larger picture.

The first longstanding issue is this PR, which addresses the lack of toolchain choice, which you have mentioned multiple times previously (which was appreciated). The second is #6855, which I have listed in a couple comments in this file, and which is intended to remove the mystery from repeated but slightly varying Executable compositions, and to remove the chance of making ordering mistakes like the above by combining Executables at a higher level of abstraction. The last is #5970, which needs to be redone entirely but may be much simpler after the previous are complete.

@CMLivingston
Copy link
Contributor

CMLivingston left a comment

LGTM - a few comments. I am currently building a Linux pex for internal validation against our use cases (verifying that swapping include dirs doesn't cause any unexpected issues).

Show resolved Hide resolved src/python/pants/backend/native/config/environment.py
Show resolved Hide resolved src/python/pants/backend/native/config/environment.py
Show resolved Hide resolved src/python/pants/backend/native/subsystems/libc_dev.py Outdated
@@ -219,7 +260,7 @@ def select_gcc_c_toolchain(platform, native_toolchain):
# TODO: we should be providing all of these (so we can eventually phase out XCodeCLITools
# entirely).
xcode_clang = yield Get(CCompiler, XCodeCLITools, native_toolchain._xcode_cli_tools)
new_include_dirs = xcode_clang.include_dirs + provided_gcc.include_dirs
new_include_dirs = provided_gcc.include_dirs + xcode_clang.include_dirs

This comment has been minimized.

@jsirois

jsirois Dec 4, 2018

Member

...and so this probably deserves a comment in the code. Your cohort which is probably the closest to this code found it mysterious - excellent indicator that it truly is and should get note not just in review.

Show resolved Hide resolved src/python/pants/backend/native/subsystems/xcode_cli_tools.py Outdated
# Check that we have selected the appropriate compilers for our selected toolchain variant,
# for both C and C++ compilation.
for compiler_name in self._compiler_names_for_variant[toolchain_variant]:
self.assertIn("selected compiler exe name: '{}'".format(compiler_name),

This comment has been minimized.

@jsirois

jsirois Dec 4, 2018

Member

So, what's the status of this? The test still log scrapes. Do you want to follow-up with an issue tracking fixing this test and removing the production log.info it requires?

expand docstrings and add types to Executable properties
also respond to various review comments
remove some of the include dir dark magic by standardizing options
specifically remove the hardcoded extra include paths in the xcode clang wrapper

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-toolchain-variant-option branch from 297c6c4 to 55e6908 Dec 5, 2018

@jsirois
Copy link
Member

jsirois left a comment

Please give notice when you've got this change green for final review.

@@ -42,34 +42,54 @@ def resolve_platform_specific(self, platform_specific_funs):
return fun_for_platform()


class Executable(object):
# NB: @abstractproperty requires inheriting from AbstractClass to work!

This comment has been minimized.

@jsirois

jsirois Dec 5, 2018

Member

This is both untrue and doesn't stand the novelty test. Using AbstractClass is a convenient way to properly support @abstractproperty but not the only way, and we do this in many places in the codebase, so the revelation - though personally novel is not so generally and so probably not worth a comment.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Dec 7, 2018

Contributor

This was a note to myself which I was planning to remove, and this was a note that I was going to word as "inheriting from AbstractClass, or an equivalent method to set the metaclass", then decided I probably didn't need to go that in depth for now. I am aware this is being presented as code for review, and I agree that this shouldn't be in there, but as mentioned in the other comment, since I "need" travis to run tests, this all gets flattened.

I suppose in the future I could keep TODOs in another file, but that's not ideal and I would prefer to make the local testing situation better to avoid pushing code to a PR that I intend to remove.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Dec 7, 2018

Contributor

This note has been removed.

"""

# TODO(#???): rename this to 'runtime_library_dirs'!

This comment has been minimized.

@jsirois

jsirois Dec 5, 2018

Member

Please fill in the issue link or else kill the unfinished #??? thought. A todo that itself is in a half-done state is just too much. Maybe fine when you're hacking locally, but you really should clean things up like this before posting for review. You may be making an effort at this though and just missed this one.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Dec 7, 2018

Contributor

If it was possible to run testing through the travis shards before pushing a commit, I would never have any such comments. I'll grant that magit in emacs makes it easy to stage hunks of files and not the whole diff, but since bootstrapping pants when running in a docker image takes a very long time, the only method of iteration I can see is to abuse travis runs on each commit. I feel that this has continually led to the assumption that I think TODO(#???) is an acceptable thing to have in code for review, which I don't, at all -- I left this particular comment as is because I was trying to test the new code against travis instead of context switching. This also applies to the NB I left for myself about @abstractproperty, which was mentioned above. I have been working on trying to fix the local testing situation but it is difficult to do all of these things at the same time.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Dec 7, 2018

Contributor

Removed the (#???)!

@@ -161,6 +163,7 @@ def select_llvm_c_toolchain(platform, native_toolchain):
llvm_c_compiler_args = [
'-x', 'c', '-std=c11',
'-nobuiltininc',
'-nostdinc',

This comment has been minimized.

@jsirois

jsirois Dec 5, 2018

Member

You've added this in four places and it seems related to the failing unit tests:

  • TestNativeToolchain.test_hello_c_clang
  • TestNativeToolchain.test_hello_c_gcc
  • TestNativeToolchain.test_hello_cpp_clangpp
  • TestNativeToolchain.test_hello_cpp_gpp

As well as in the TestProjectsIntegrationTest.test_shard_44 integration test which shows:

...
                     E   	03:49:57 00:01   [native-compile]
                     E   	03:49:57 00:01     [native-third-party-fetch]
                     E   	                   Invalidated 2 targets.
                     E   	03:50:11 00:15     [c-for-ctypes]
                     E   	03:50:11 00:15     [cpp-for-ctypes]
                     E   	                   Invalidated 2 targets.
                     E   	                   selected compiler exe name: 'clang++'
                     E   	03:50:11 00:15       [cpp-compile]
                     E   	                     clang-6.0: warning: argument unused during compilation: '-nobuiltininc' [-Wunused-command-line-argument]
                     E   	                     /home/travis/build/pantsbuild/pants/testprojects/src/python/python_distribution/ctypes_with_extra_compiler_flags/some_more_math.cpp:2:10: fatal error: 'assert.h' file not found
                     E   	                     #include <assert.h>
                     E   	                              ^~~~~~~~~~
                     E   	                     1 error generated.
...

And CTypesIntegrationTest.test_ctypes_third_party_integration which shows:

...
                     E   	03:21:58 00:01   [native-compile]
                     E   	03:21:58 00:01     [native-third-party-fetch]
                     E   	                   Invalidated 2 targets.
                     E   	03:22:11 00:14     [c-for-ctypes]
                     E   	03:22:12 00:15     [cpp-for-ctypes]
                     E   	                   Invalidated 1 target.
                     E   	                   selected compiler exe name: 'clang++'
                     E   	03:22:12 00:15       [cpp-compile]
                     E   	                     clang-6.0: warning: argument unused during compilation: '-nobuiltininc' [-Wunused-command-line-argument]
                     E   	                     In file included from /home/travis/build/pantsbuild/pants/testprojects/src/python/python_distribution/ctypes_with_third_party/some_more_math_with_third_party.cpp:4:
                     E   	                     /home/travis/build/pantsbuild/pants/.pants.d/tmp/tmpQX3l3R.pants.d/native-compile/native-third-party-fetch/a33349f38f6e/testprojects.src.python.python_distribution.ctypes_with_third_party.rang/current/include/rang.hpp:15:10: fatal error: 'unistd.h' file not found
                     E   	                     #include <unistd.h>
                     E   	                              ^~~~~~~~~~
                     E   	                     1 error generated.
...

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Dec 7, 2018

Contributor

-nobuiltininc (which clang recognizes, but doesn't use), as well as -nostdinc (which both clang and gcc recognize) have been removed, with a more concise TODO link to #6143 to look into precisely what these flags do (and whether we need to start making pull requests to llvm...).

cosmicexplorer added some commits Dec 7, 2018

@jsirois

jsirois approved these changes Dec 8, 2018

@cosmicexplorer cosmicexplorer merged commit 5036877 into pantsbuild:master Dec 10, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment