Associate cli arguments with executables and refactor llvm/gcc c/c++ toolchain selection #6217

Merged
merged 24 commits into from Jul 24, 2018

Conversation

@cosmicexplorer
Contributor

cosmicexplorer commented Jul 23, 2018

Problem

#5951 explains the problem addressed by moving CLI arguments to individual Executable objects -- this greatly reduces the difficulty of generating appropriate command lines for the executables invoked. In this PR, this change removes a significant amount of repeated boilerplate.

Additionally, we weren't distinguishing between a Linker intended to link the compiled object files of gcc or g++ vs clang or clang++. We were attempting to generate a single linker object which would work with any of gcc, g++, clang, or clang++, and this wasn't really feasible. Along with the above, this made it extremely difficult and error-prone to generate correct command lines and environments for executing the linker, which led to symptoms such as not being able to find crti.o.

Solution

  • Introduce CToolchain and CppToolchain in environment.py, which can be generated from LLVMCToolchain, LLVMCppToolchain, GCCCToolchain, or GCCCppToolchain. These toolchain datatypes are created in native_toolchain.py, where a single @rule for each ensures that no C or C++ compiler that someone can request is created without an accompanying linker configured to work with that compiler.
  • Introduce the extra_args property on the Executable mixin in environment.py, which Executable subclasses override by declaring a datatype field named extra_args. This is used in native_toolchain.py to ensure platform-specific arguments and environment variables are set in the same @rule which produces a paired compiler and linker -- there is a single place to look to see where all the process invocation environment variables and command-line arguments are set for a given toolchain.
  • Introduce the ArchiveFileMapper subsystem and use it to declare sets of directories to resolve within our BinaryTool archives GCC and LLVM. This subsystem allows globbing (and checks that there is a unique expansion), which makes it robust to e.g. platform-specific paths to things like include or lib directories.
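The extra_args override described above can be sketched in plain Python. The real implementation uses pants' datatype machinery, which is simplified away here, so the class shapes and field handling below are illustrative rather than the actual environment.py code.

```python
# Simplified sketch of the `extra_args` override pattern: the mixin supplies a
# default, and a subclass that declares its own `extra_args` overrides it, so
# platform-specific flags can be attached to the executable itself.

class Executable(object):
  """Mixin: subclasses may declare an `extra_args` field to override this."""

  @property
  def extra_args(self):
    return []

  def command_line(self, sources):
    # The extra args travel with the executable, so every caller generating a
    # command line picks them up automatically.
    return [self.exe_filename] + list(self.extra_args) + list(sources)


class CCompiler(Executable):
  def __init__(self, exe_filename, extra_args=None):
    self.exe_filename = exe_filename
    self._extra_args = extra_args or []

  @property
  def extra_args(self):
    return self._extra_args


compiler = CCompiler('clang', extra_args=['-mmacosx-version-min=10.11'])
print(compiler.command_line(['hello.c']))
```

The point of the design is visible in the last two lines: the @rule that constructs the compiler decides the platform-specific arguments once, and downstream code never reassembles them.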

Result

Removes several FIXMEs, including heavily-commented parts of test_native_toolchain.py. Partially addresses #5951 -- setup_py.py still generates its own execution environment from scratch, and this could be made more hygienic in the future. As noted in #6179 and #6205, this PR seems to immediately fix the CI failures in those PRs.

cosmicexplorer added some commits Jul 23, 2018

@jsirois
Member

jsirois left a comment

This is a lot to grok at once, so a bit of a superficial skim.

def c_compiler(self):
exe_filename = 'gcc'
path_entries = self.path_entries()
_PLATFORM_INTERMEDIATE_DIRNAME = {


@jsirois

jsirois Jul 23, 2018

Member

Are these solid for all 64-bit Linux and OSX? They look a bit specific. If they are just enough to get CI tests working, that's a step forward, but they deserve a TODO with an issue pointer. On the other hand, if they are solid (they don't look it), they deserve a comment pointing off to the GCC docs that show they are.


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

So this is actually only a path into the specific gcc archive we provide, which is why I was thinking it was ok to lean on it. However, the ParseSearchDirs approach may actually be able to get this -- doing it manually results in:

> PATH="${HOME}/.cache/pants/bin/gcc/mac/10.13/7.3.0/gcc/bin" g++ -print-search-dirs | grep -P '^libraries' | sed -re 's#^libraries: =##g' | tr ':' '\n' | sort | uniq | parallel -n1 readlink -f
/Users/dmcclanahan/.cache/pants/bin/gcc/mac/10.13/7.3.0/gcc/lib/gcc/x86_64-apple-darwin17.5.0/7.3.0
/Users/dmcclanahan/.cache/pants/bin/gcc/mac/10.13/7.3.0/gcc/lib
/Users/dmcclanahan/.cache/pants/bin/gcc/mac/10.13/7.3.0/gcc/lib/gcc

This looks like it can be manipulated to find the include dirs we want instead of generating those paths ourselves, which would allow us to revert many of the changes to this file.
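That manipulation could look roughly like the following pure-Python sketch, which parses the `libraries:` line of `-print-search-dirs` output. The sample output string is made up, and os.path.normpath is a weaker stand-in for the shell pipeline's readlink -f (it collapses `..` segments but does not resolve symlinks).

```python
import os

def parse_library_search_dirs(print_search_dirs_output):
  """Parse the `libraries:` line of `gcc -print-search-dirs` output into
  unique, normalized directories, preserving first-seen order."""
  for line in print_search_dirs_output.splitlines():
    if line.startswith('libraries: ='):
      # gcc separates the search path entries with ':'.
      paths = line[len('libraries: ='):].split(':')
      seen = []
      for p in paths:
        # normpath collapses '..' segments (readlink -f would also resolve
        # symlinks, which this sketch does not attempt).
        norm = os.path.normpath(p)
        if norm not in seen:
          seen.append(norm)
      return seen
  return []

# Illustrative output, not captured from a real gcc run:
sample = 'libraries: =/opt/gcc/lib/../lib:/opt/gcc/lib'
print(parse_library_search_dirs(sample))
```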


@jsirois

jsirois Jul 23, 2018

Member

OK, that would be excellent.

But this clarification also points to this issue pantsbuild/binaries#78; ie: x86_64-apple-darwin17.5.0 is a bit specific, which would be OK iff it corresponded to 10.11.


@stuhood

stuhood Jul 23, 2018

Member

The fact that this entire class refers to the specifics of our GCC build is worth pointing out in a class level doc most likely.


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

Yes, 100% agreed.


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

To clarify, 17.5.0 is from the LLVM binary release for OSX -- this is the package we provide for all OSX (because it's the package LLVM provides for all OSX), and is not an artifact of any build process we perform ourselves. I am making the ParseSearchDirs approach work regardless, just wanted to clear up why I thought that path was ok.


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

I was having difficulty getting the ParseSearchDirs solution to work, so e67044c scraps it entirely and uses a helper method _get_check_single_path_by_glob() to locate directories within our provided gcc distribution.

I was annoyed that we couldn't just invoke the binary -- there may be a better way in the future, but adding /lib and /include to directory paths (which I mentioned above could be doable) ended up way too long and needlessly complex.
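A helper with the shape described could look like the following sketch; the name is taken from the comment above, but the body is an illustration, not the actual pants implementation.

```python
import glob
import os

def get_check_single_path_by_glob(*components):
  """Join `components` into a glob pattern and require that it expands to
  exactly one existing path, making platform-specific directory layouts
  robust to resolve."""
  glob_path = os.path.join(*components)
  candidates = glob.glob(glob_path)
  if len(candidates) != 1:
    raise ValueError(
      'glob {!r} should match exactly one path, but matched {!r}'
      .format(glob_path, candidates))
  return candidates[0]
```

The uniqueness check is the important part: a silent first-match pick could mask an archive layout change, whereas failing loudly points straight at the unexpected expansion.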


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

The process of using globs was split off into another pants.backend.native.utils subsystem named ArchiveFileMapper, and docstrings describing that GCC and LLVM rely on the current specific layout of the archive were added as well, in 3051235.


@abstractproperty
def as_c_toolchain(self):
"""???"""


@jsirois

jsirois Jul 23, 2018

Member

Please doc or pass.


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

I think this should just be a pass, will make that change.


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

Removed in 3441e74!


@abstractproperty
def as_cpp_toolchain(self):
"""???"""


@jsirois

jsirois Jul 23, 2018

Member

Ditto doc or pass.


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

Removed in 3441e74!

@@ -92,12 +92,11 @@ class SetupPyNativeTools(datatype([
"""


@rule(SetupPyNativeTools, [Select(CCompiler), Select(CppCompiler), Select(Linker), Select(Platform)])
def get_setup_py_native_tools(c_compiler, cpp_compiler, linker, platform):
@rule(SetupPyNativeTools, [Select(LLVMCToolchain), Select(LLVMCppToolchain), Select(Platform)])


@jsirois

jsirois Jul 23, 2018

Member

This seems like a step back - SetupPyNativeTools which only cares about a CppToolchain and a CToolchain is not LLVM specific and cannot be produced for GCC. Am I missing something or is a TODO/comment missing?


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

Totally right! In 3441e74 I moved all the rules and datatypes into python_native_code.py. The rules select_c_toolchain_for_local_dist_compilation and select_cpp_toolchain_for_local_dist_compilation mean that LLVM is used as the default toolchain, but SetupPyNativeTools only requires an injected CToolchain and CppToolchain.

class LLVMCToolchain(datatype([
('llvm_c_compiler', LLVMCCompiler),
('llvm_c_linker', LLVMCLinker),
]), CToolchainProvider):


@jsirois

jsirois Jul 23, 2018

Member

I'm not understanding why class LLVMCCompiler(datatype([('c_compiler', CCompiler)])): pass vs just class LLVMCCompiler(CCompiler): pass, etc. It seems to me this would eliminate the need for both CToolchainProvider and CppToolchainProvider.


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

You're completely right. 3441e74 does exactly this. That removed ~70 lines from this diff.

yield GCCCCompiler(gcc.c_compiler())
@rule(GCCCCompiler, [Select(GCC), Select(Platform)])
def get_gcc(gcc, platform):
yield GCCCCompiler(gcc.c_compiler(platform))


@jsirois

jsirois Jul 23, 2018

Member

Why a yield here and below instead of a return? Afaict that just causes a little extra work looping through generator send gymnastics for what will be a blocking call either way:

deps
.then(move |deps_result| match deps_result {
Ok(deps) => externs::call(&externs::val_for(&func.0), &deps),
Err(failure) => Err(failure),
})
.then(move |task_result| match task_result {
Ok(val) => {
if externs::satisfied_by(&context.core.types.generator, &val) {
Self::generate(context, entry, val)
} else {
ok(val)


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

It had slipped my mind that return was allowed for an @rule -- this is using return now.


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

It absolutely makes sense, but I wasn't aware before that yield was interpreted to mean this (hadn't thought about it in a while) -- this makes the execution process of @rules more clear to me.

library_dirs=[])
library_dirs=[],
linking_library_dirs=[],
extra_args=['-mmacosx-version-min=10.11'])


@jsirois

jsirois Jul 23, 2018

Member

The 10.11 seems arbitrary / tied to Pants' current min version support. Does this deserve to be lifted out into a constant used to generate all such flags?


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

It does! And it was, in MIN_OSX_VERSION_ARG from b2a750b.
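The lifted constant could look something like this sketch; the MIN_OSX_VERSION_ARG name comes from the comment above, but the surrounding code is illustrative, not the committed change.

```python
# Single source of truth for the minimum supported OSX version, so every flag
# derived from it stays in sync if the minimum ever changes.
MIN_OSX_SUPPORTED_VERSION = '10.11'

# The flag handed to clang/gcc invocations targeting OSX.
MIN_OSX_VERSION_ARG = '-mmacosx-version-min={}'.format(MIN_OSX_SUPPORTED_VERSION)

print(MIN_OSX_VERSION_ARG)
```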

return NativeToolchain.scoped_instance(self)

def get_compile_settings(self):
return CppCompileSettings.scoped_instance(self)

@memoized_property
def _cpp_toolchain(self):
llvm_cpp_toolchain = self._request_single(LLVMCppToolchain, self._native_toolchain)


@jsirois

jsirois Jul 23, 2018

Member

Another instance of regressing to LLVM lock-in here, in a presumably generic CppCompile class.


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

Yep! This was fixed in 6843579! Thanks for describing the regressions!

return NativeToolchain.scoped_instance(self)

@memoized_property
def _cpp_toolchain(self):
llvm_cpp_toolchain = self._request_single(LLVMCppToolchain, self._native_toolchain)


@jsirois

jsirois Jul 23, 2018

Member

More lock-in.


@cosmicexplorer

cosmicexplorer Jul 23, 2018

Contributor

Also should be fixed by 6843579!

@stuhood
Member

stuhood left a comment

John has provided a very useful review for this, so I'll stay out of it. But will say that what you've done to inline things into src/python/pants/backend/native/subsystems/native_toolchain.py makes a lot of sense to me, and removes a bit of unnecessary abstraction.

def c_compiler(self):
exe_filename = 'gcc'
path_entries = self.path_entries()
_PLATFORM_INTERMEDIATE_DIRNAME = {


@stuhood

stuhood Jul 23, 2018

Member

The fact that this entire class refers to the specifics of our GCC build is worth pointing out in a class level doc most likely.

@@ -30,7 +30,9 @@ def linker(self):
return Linker(
path_entries=self.path_entries(),
exe_filename='ld',
library_dirs=[])
library_dirs=[],


@CMLivingston

CMLivingston Jul 23, 2018

Contributor

Slightly confused about the distinction between this param and the one below it. This is slightly ambiguous to me as well:

@abstractproperty
def library_dirs(self):
  """Directories containing shared libraries required for a subprocess to run."""

Maybe add an abstract property with a docstring to the Linker class?


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

This makes things a lot neater! Added LinkerMixin in b2a750b!
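A mixin along the lines discussed could document the distinction with docstringed properties: library_dirs for shared libraries a subprocess needs at runtime, and linking_library_dirs for directories searched at link time via -L. The names follow the thread, but the bodies below are an illustrative sketch, not the pants implementation.

```python
class LinkerMixin(object):
  """Sketch of a mixin separating the two kinds of library directories."""

  @property
  def library_dirs(self):
    """Directories containing shared libraries required for a subprocess to run."""
    raise NotImplementedError

  @property
  def linking_library_dirs(self):
    """Directories to search for libraries needed at link time (passed via -L)."""
    raise NotImplementedError

  def linker_args(self):
    # Only the link-time directories become -L arguments; runtime library
    # dirs belong in the subprocess environment instead.
    return ['-L{}'.format(d) for d in self.linking_library_dirs]


class Linker(LinkerMixin):
  def __init__(self, library_dirs, linking_library_dirs):
    self._library_dirs = library_dirs
    self._linking_library_dirs = linking_library_dirs

  @property
  def library_dirs(self):
    return self._library_dirs

  @property
  def linking_library_dirs(self):
    return self._linking_library_dirs
```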

libc_dev = yield Get(LibcDev, NativeToolchain, native_toolchain)
working_linker = Linker(
path_entries=(base_linker.path_entries + working_c_compiler.path_entries),
exe_filename=working_c_compiler.exe_filename,


@CMLivingston

CMLivingston Jul 23, 2018

Contributor

So we are still linking through the compiler frontend? Based on path_entries, it looks like we will hit our linker first, but why not explicitly invoke it? I know you explained this to me once but I'm having a tough time recalling... I think it had to do with critical search dirs that we are unable to find by directly executing the linker?


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

It has been very difficult to invoke the linker directly when I've tried, for C or C++. If you try adding -v to the extra_args for a Linker somewhere, you can see the (extremely long) generated command line. I have not been able to make it work yet, after trying for a while, and as far as I can tell there is no reason we would want to invoke the linker directly, unless the compiler frontend is pulling in something we don't want (which is what we are using the extra_args to avoid, with arguments such as -nostdinc++).


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

(to clarify, I would love to invoke the linker directly, but it's not something that has borne fruit any time I've tried. if there's a way to do this that I'm missing, I'm all ears)


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

(what I said to you before was a condensed version of that)


@CMLivingston

CMLivingston Jul 24, 2018

Contributor

Sure, I don't think this is a problem if we are hitting our linker regardless. If we find that the frontend is making things too difficult to build/debug, we can lean into direct invocation.

'-x', 'c++', '-std=c++11',
# These mean we don't use any of the headers from our LLVM distribution.
'-nobuiltininc',
'-nostdinc++',


@CMLivingston

CMLivingston Jul 23, 2018

Contributor

This implies that we will always be relying on system headers/includes to be there for any calls to std::*, correct?


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

No, the opposite -- these flags stop clang from searching for or using the system headers (because we don't control what it finds in that way) except for the ones we provide via *_INCLUDE_PATH. We add the system headers that we control / know about specifically later in the @rule, either from XCodeCLITools or GCC.
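That approach could be sketched as follows. create_path_env_var exists in pants but is inlined here in simplified form, and the include paths are illustrative, not the real archive layout.

```python
import os

def create_path_env_var(path_list):
  """Simplified stand-in for pants' helper: join paths with the platform's
  PATH-style separator."""
  return os.pathsep.join(path_list)

# Only headers we control / know about, e.g. from our GCC archive or
# XCodeCLITools (paths are made up for illustration).
include_dirs = ['/opt/gcc/include', '/opt/xcode-cli-tools/usr/include']

env = {
  # With -nobuiltininc / -nostdinc++ passed to clang, the builtin and C++
  # standard header searches are suppressed, so the directories listed in
  # these variables are the header search paths we explicitly provide.
  'C_INCLUDE_PATH': create_path_env_var(include_dirs),
  'CPLUS_INCLUDE_PATH': create_path_env_var(include_dirs),
}
print(env['C_INCLUDE_PATH'])
```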


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

Expanded the definition of these more in 973f26e. Also, -nobuiltininc may not exist, or may not exist on Linux? I'm not sure, but it's not being recognized now -- will see if it's required on OSX.


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

clang++ --help is the only way I've been getting information about what any of these options do.


@CMLivingston

CMLivingston Jul 24, 2018

Contributor

👍

extra_args=llvm_cpp_compiler_args)
linking_library_dirs = provided_gpp.library_dirs + provided_clang.library_dirs
# Ensure we use libstdc++, provided by g++, during the linking stage.
linker_extra_args=['-stdlib=libstdc++']


@CMLivingston

CMLivingston Jul 23, 2018

Contributor

Am I correct in that we are using the provided lib dirs for linking, but not using them for compilation? This is for LLVM-based toolchains only, correct?


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

linking_library_dirs aren't used in compilation; only the compiler object is used there. This is only for LLVM toolchains -- this @rule produces the single LLVMCppToolchain. The -stdlib argument is only recognized by clang; see the libcxx site (libc++ is the new LLVM C++ standard library, which we are turning off with this option because it is not yet complete on Linux). See the LinkerMixin introduced in b2a750b in response to your other review comment, which makes the difference between these directories more clear.

@CMLivingston
Contributor

CMLivingston left a comment

Thanks for circling back to fix these TODOs. A few questions from me, but I would like to hear your take on the llvm-locking happening in the native tasks (w.r.t. John's comments) before I approve these changes.

It would also be a great idea to get one more member of the Build team to give this a look to promote knowledge sharing of the native compilation support.

cosmicexplorer added some commits Jul 23, 2018

centralize -mmacosx-version-min=10.11 argument creation
also introduce `LinkerMixin` as per review comments

cosmicexplorer added some commits Jul 24, 2018

@cosmicexplorer

Contributor

cosmicexplorer commented Jul 24, 2018

I removed some arguments to clang and clang++ while iterating on Linux that turned out to be necessary for OSX, but it should all be sorted out now (we'll see what travis has to say about that). All comments above should have been addressed.

cosmicexplorer added some commits Jul 24, 2018

@cosmicexplorer

Contributor

cosmicexplorer commented Jul 24, 2018

Green, and OP updated.

@cosmicexplorer

Contributor

cosmicexplorer commented Jul 24, 2018

Also broke out #6224 for further documentation of what we are doing and why we do what we are doing in native_toolchain.py.


if self.include_dirs:
ret['C_INCLUDE_PATH'] = create_path_env_var(self.include_dirs)
ret = super(CCompiler, self).as_invocation_environment_dict.copy()

ret['CC'] = self.exe_filename


@CMLivingston

CMLivingston Jul 24, 2018

Contributor

Have you been able to verify that setup.py compilation is taking place with the correct exe? I recall we hit some cases where it wasn't actually respecting this variable.


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

With the correct exe_filename? There is no testing that the setup.py compilation does anything other than succeed, currently. But if you make the build fail (e.g. by inserting a syntax error into a C/C++ source), pex will print the stdout and stderr to the terminal, and you can see that modifying exe_filename (or e.g. selecting GCCCToolchain instead of the LLVM one in python_native_code.py) changes the compiler used in the generated command line that builds setup.py native sources.

At this point, I'm not sure what we should be testing wrt setup.py compilation other than success. If you have some itches you want to scratch, an issue just listing what tests we should add, or a PR doing that, would be great so I or someone else can address it in full. This wouldn't need to be more than a few sentences.


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

And that may have been the case (wrt not respecting CC), but I don't remember the situation, and the issue may have been e.g. that we were adding the wrong entries to the PATH.

@jsirois
Member

jsirois left a comment

A general unease I have with new and existing code here is the proliferation of tiny rules that don't obviously do anything expensive. Some definitely just construct a datatype instance from other datatype instances, which is not good. Others are less clear. The problem is that each of these tiny rules will run on a different machine in the future! That's completely unnecessary, and although it would be nice not to have to think about that when writing rules, I think -- currently at least -- you do have to consider it.

@@ -129,3 +139,157 @@ def check_build_for_current_platform_only(self, targets):
'native code. Please ensure that the platform arguments in all relevant targets and build '
'options are compatible with the current platform. Found targets for platforms: {}'
.format(str(platforms_with_sources)))


@rule(CToolchain, [Select(PythonNativeCode)])


@jsirois

jsirois Jul 24, 2018

Member

CToolchain but you limit to LLVMCToolchain here (GCCCToolchain is ruled out). Why is this OK? Is this required to avoid the central problem that you have 2 producers for CToolchain components in GCC and LLVM registered rules with the engine, and to request a CCompiler straight up would thus blow up with an ambiguous producer? A comment / TODO to address would be good.


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

100% correct on why this is required. This concern was vaguely addressed with the TODO(#4020): These classes are performing the work of variants. I can add your comment to the usage here to make it clear how variants would improve the ux.


@jsirois

jsirois Jul 24, 2018

Member

A comment would be great, but this is not just UX, it's a bug. There is currently no way to select GCC.


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

Hm, my comment was under the assumption that it was possible to select GCC for C/C++ elsewhere (this is done in test_native_toolchain.py) -- we just currently elect to use the LLVM toolchain when compiling python_dist()s without making that configurable yet. Am I missing something?

I just added 4f5194c which removes the rules which provide a bare CToolchain or CppToolchain, and expands the explanation of why the wrappers are necessary/how to request them. EDIT: the right commit is ac543fe now.


@jsirois

jsirois Jul 24, 2018

Member

Sure, but in the test you select GCC explicitly. What's the point of the generic CToolchain if you know you're getting an LLVMCToolchain or a GCCCToolchain because you ask for one of those directly by type? There is perhaps more abstraction than needed.

class CToolchain(datatype([('c_compiler', CCompiler), ('c_linker', Linker)])): pass


class LLVMCToolchain(datatype([('c_toolchain', CToolchain)])): pass


@jsirois

jsirois Jul 24, 2018

Member

I'm still missing the need for indirection in CToolchain and CppToolchain variants. Why not just subtype directly?:

class CToolchain(datatype([('c_compiler', CCompiler), ('c_linker', Linker)])): pass


class LLVMCToolchain(CToolchain): pass


class GCCCToolchain(CToolchain): pass


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

Because then I can't figure out how to do e.g. this in python_native_code.py:

@rule(CToolchain, [Select(PythonNativeCode)])
def select_c_toolchain_for_local_dist_compilation(python_native_code):
  return Get(LLVMCToolchain, NativeToolchain, python_native_code.native_toolchain)


@rule(CppToolchain, [Select(PythonNativeCode)])
def select_cpp_toolchain_for_local_dist_compilation(python_native_code):
  return Get(LLVMCppToolchain, NativeToolchain, python_native_code.native_toolchain)

because afaik the engine matches on an Exactly constraint when type-checking @rule return types. Just now, when trying the edits in your comment and changing python_native_code.py to the above, I get this error message excerpt when running ./pants test tests/python/pants_test/backend/python/tasks:python_native_code_testing -- -vsk test_python_create_platform_specific_distribution:

E         ExecutionError: Received unexpected Throw state(s):
                     E         Computing Select(<pants.backend.python.subsystems.python_native_code.PythonNativeCode object at 0x11141ad90>, =SetupPyNativeTools)
                     E           Noop(No task was available to compute the value.)

cc @stuhood: is it correct to say this is happening because of an Exactly type constraint on rule results? Or is there something else I'm missing?


@jsirois

jsirois Jul 24, 2018

Member

I assume by return Get(... you mean yield Get(.... Gets are only meant to work with yields.


@stuhood

stuhood Jul 24, 2018

Member

Yep. This will give a more useful error post #5788... but currently returning the wrong type from a rule triggers that Noop behaviour.


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

Correct -- I had tried both, and yield Get(...) also fails, with a StopIteration error, but I had neglected to mention that above.

# TODO: could this kind of @rule be automatically generated?
@rule(SetupPyNativeTools, [Select(CToolchain), Select(CppToolchain), Select(Platform)])
def get_setup_py_native_tools(c_toolchain, cpp_toolchain, platform):
yield SetupPyNativeTools(


@jsirois

jsirois Jul 24, 2018

Member

return


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

This rule was removed entirely in 9846475!

# only creating an environment at the very end.
native_tools = self.setup_py_native_tools
if native_tools:
# TODO: an as_tuple() method for datatypes could make this destructuring cleaner!


@jsirois

jsirois Jul 24, 2018

Member

I think this is actually a sign you should eliminate the middleman. See comment below RE SetupPyNativeTools rule.


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

I have so far ended up keeping the SetupPyExecutionEnvironment for now, but removing all the rules associated with it. This is because the right environment is difficult to create simply by composing together other executable environments (for now) -- but I did expand the comment here and remove the TODO in 7ebfe51.

"""


# TODO: could this kind of @rule be automatically generated?


@jsirois

jsirois Jul 24, 2018

Member

Any time you have a rule like this I think it's a sign you don't need a rule like this, for 2 reasons:

  1. You're likely creating a middleman wrapper struct - if so, let the folks needing the struct components ask for them directly.
  2. You're just running non-IO python code - this is generally an abuse of parallelism and is inefficient; just run that code where you need to run it, not off in a rule.


@cosmicexplorer

cosmicexplorer Jul 24, 2018

Contributor

This was a very helpful comment! Along these lines, I was actually able to remove all the rules from the python backend in 9846475 and request LLVMCToolchain/etc directly.

cosmicexplorer added some commits Jul 24, 2018

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:refactor-llvm-gcc-toolchain-selection branch from 4f5194c to ac543fe Jul 24, 2018

@cosmicexplorer


Copy link
Contributor

cosmicexplorer commented Jul 24, 2018

In response to the above and other comments on individual lines in the diff, I was able to remove all of the rules from the python backend. I would strongly appreciate not having to mix concerns of the execution model with the @rule interface -- I'm really digging its use as a generic typed dependency injection facility. I will see if it would be reasonable to intersperse a non-futurized/parallelized execution model into the one we have now, to avoid having to run certain logic in the coroutine execution model -- incredibly, that is something I was thinking a lot about before I ever started working on pants.

@Eric-Arellano
Contributor

Eric-Arellano left a comment

Looks Py3 compliant to me.

@cosmicexplorer cosmicexplorer merged commit 2fc57cb into pantsbuild:master Jul 24, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

cosmicexplorer added a commit that referenced this pull request Jul 28, 2018

Associate cli arguments with executables and refactor llvm/gcc c/c++ toolchain selection (#6217)


CMLivingston pushed a commit to CMLivingston/pants that referenced this pull request Aug 27, 2018

Associate cli arguments with executables and refactor llvm/gcc c/c++ toolchain selection (pantsbuild#6217)
