Skip to content

Conversation

@Teemperor
Copy link
Contributor

@Teemperor Teemperor commented Jul 11, 2017

With this patch we can finally do make -j200! (And I no longer have to explain to the summer students why they have to type make Cling; make to compile root :) ). We also get the module dependencies for rootcling right.

See the specific commits for the individual fixes.

@Teemperor Teemperor force-pushed the FixCMakeModuleDependencies branch from 5c9cf52 to 23c5a14 Compare July 11, 2017 13:16
@root-project root-project deleted a comment from phsft-bot Jul 11, 2017
@root-project root-project deleted a comment from phsft-bot Jul 11, 2017
@root-project root-project deleted a comment from phsft-bot Jul 11, 2017
@amadio
Copy link
Member

amadio commented Jul 11, 2017

Just for information, this was tested on KNL with 200 threads without a crash. More testing is needed, but I think that this is definitely a step in the right direction. Thanks @Teemperor for worknig on this!

@root-project root-project deleted a comment from phsft-bot Jul 11, 2017
@Teemperor Teemperor force-pushed the FixCMakeModuleDependencies branch 3 times, most recently from 4e228e1 to 896503a Compare July 11, 2017 13:28
@Teemperor Teemperor changed the title Fix race conditions in the build system Fix some race conditions in the build system Jul 11, 2017

std::string getTmpFileName(const std::string &filename) {
return filename + "_tmp";
return filename + "_tmp_" + std::to_string(getpid());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this problematic - this will mean we'll never fix the underlying build system issue that triggers concurrent invocations on the same dictionary. Could you add a note to the relevant Jira ticket https://sft.its.cern.ch/jira/browse/ROOT-8244 that this commit should be reverted once the issue is fixed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part can actually go away now, as far as I can tell. It was needed before because more than one process was trying to make the dictionaries, and one process would move the files of the other, causing a crash as seen here.

Copy link
Contributor Author

@Teemperor Teemperor Jul 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the CMake bug tracker and the CMake mailing list, the underlying issue is a design flaw in Make (without GNU extensions) which prevents from coordinating the generation of custom_command targets between multiple jobs (this is also why Pere doesn't see this in Ninja). See the man page of "add_custom_command" which states that we should never depend on the output files directly, but instead on a custom target. If we don't do this, then the make jobs can't communicate who is responsible for this target and then produce it multiple times (usually twice from what I saw).

We already have this custom target in the current CMakeLists.txt, we just had the wrong dependencies based on it (because of the inconsistent naming). Those missing links are fixed in the commit about fixing the dependencies, which should prevent this issue in the way the CMake manual recommends (even though it seems they still make no guarantee that this works as intended on every platform).

I also added the commented out debug code which helps figuring out if dependencies are missing for rootcling invocations (the developer warnings are enabled by default in CMake, so there isn't a better way of doing this without spamming the compile log). This should fix the problem of "How do we debug this in the future" in a systematic (and better) way than hoping to find race conditions by trial-and-error and trying to increase the error-chance by using always the same file name.

Also, we should behave like every other generator/compiler which tries to further mitigate this problem by choosing process unique tmp names (see clang etc. with their FILE.o-UNIQUE names). We don't know if someone generates build files for something really obscure (e.g. Visual Studio or an old Make version) which chooses the target order in a different way (e.g. on the CMake mailing list people complained that VS seems to have yet another behavior for building these things1 ), so let's at least give those people at least some relief in the future by making it less likely that things blow up when building :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Teemperor Can you drop this commit from the PR? I think you agree it's not needed anymore, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amadio I think I ninja'd you there :)

Copy link
Member

@Axel-Naumann Axel-Naumann Jul 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Teemperor

We don't know if someone generates build files for something really obscure (e.g. Visual Studio or an old Make version) which chooses the target order in a different way

That's not the issue.

What you're saying is that build systems might trigger concurrent commands to generate the same target. That's horrendous. It's a bug. It would mean e.g. that the same library is linked twice in parallel. I buy your argument that this is a consequence of your CMake-based configuration to use add_custom_command.

I'd hope that we can not use add_custom_command and instead use something that allows CMake to construct a proper dependency tree.

Can we keep the original tmp name, issue an warning message if that file exists and then go use a unique name? That would allows us to detect whether the build system is generating the same dictionary in multiple concurrent processes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Axel-Naumann nope, no way around add_custom_command and add_custom_target here. LLVM is in the same situation with TableGen (even though I'm not sure if they do unique temp files). But we can easily check from CMake if someone forget to link to the custom_target as we have quite obvious naming conventions (and we further prevent this when we refactor most stuff to use the ROOT_STANDARD_WHATEVER function from RootNewMacros).

I prefer the first suggestion with reverting the "_tmp" commit before we start turning rootcling into a tool for checking our CMake build system files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @Teemperor . Anything that prevents CMake from spawning two rootcling processes to build the same product is majorly appreciated by anyone building with LLVM_BUILD_TYPE=Debug! :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Teemperor ping - can this be reverted so we see build system issues again?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Teemperor and/or should be it be replaced by a 'if already exist' fail?

string(REPLACE "multi" "" header ${header})
ROOT_GENERATE_DICTIONARY(G__std_${dict} ${header} STAGE1 MODULE ${dict}Dict LINKDEF src/${dict}Linkdef.h DEPENDENCIES Core)
ROOT_LINKER_LIBRARY(${dict}Dict G__std_${dict}.cxx DEPENDENCIES Core)
ROOT_LINKER_LIBRARY(std_${dict} G__std_${dict}.cxx DEPENDENCIES Core)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this maybe the source of https://sft.its.cern.ch/jira/browse/ROOT-8244 ? How does this renaming from ${dict}Dict to std_${dict} relate to ROOT_GENERATE_DICTIONARY's MODULE ${dict}Dict?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see the commit about fixing the naming which states that this prevents dependency linking between compiling and generating (and my comment above).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the cause. I added a comment in ROOT-8244 explaining the problem.

@Teemperor
Copy link
Contributor Author

I also added the dependencies for rootcling regarding the BUILTINS. This should prevent the unlikely case that rootcling starts before the BUILTINS are ready (which is more likely once we don't have to build LLVM anymore).

@Teemperor Teemperor force-pushed the FixCMakeModuleDependencies branch 4 times, most recently from d0a2a2a to 330ecf2 Compare July 11, 2017 16:09
@Teemperor
Copy link
Contributor Author

Ok, seems like touching the module names is making everything collapse. Let's just change the G__*.cxx filenames...

@Teemperor Teemperor force-pushed the FixCMakeModuleDependencies branch 2 times, most recently from 6d5e4ff to 2ec7efc Compare July 11, 2017 18:26
@Teemperor
Copy link
Contributor Author

Ok, all tests pass now on all platforms. I don't see any traces for missing dependencies anymore (e.g. multiple Generating G__* messages for the same target).

ROOT_GENERATE_DICTIONARY allows to specify dependencies, but we currently
don't actually specify those. As we need this for getting the module
dependencies right (as we can't build missing modules on demand), we should
add those dependencies here. Then those dependencies propagate to the
rootcling invocation which will in the future also generate the C++ module
for the selected dictionary.
In case CMake spawns multiple instances of a rootcling invocation
(which can happen as CMake doesn't have a good mechanism for ensuring
that we only spawn one when using the make build system), we now
choose process-unique file names for the temporaries to prevent the
build errors when multiple rootcling invocations try to work
on the same temporary file.

This isn't a real fix, but as renaming files from temporary to the
real name is usually FS atomic, this will mitigate any future race
conditions we have in this problematic part of the building process
on obscure build systems. Those systems possible don't strictly follow the
order in which they are supposed to build those targets, so this hopefully
makes the consequences of any race conditions less intrusive  (e.g. Visual
Studio also seems to suffer from this problem according to the mailing
list).
@Teemperor Teemperor force-pushed the FixCMakeModuleDependencies branch from 2ec7efc to d25a03d Compare July 12, 2017 06:07
@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

The naming of the library in the two function calls ROOT_GENERATE_DICTIONARY
and ROOT_LINKER_LIBRARY was often not matching. This causes that
ROOT_LINKER_LIBRARY couldn't correctly set its dependencies to the output
file of ROOT_GENERATE_DICTIONARY and therefore causes race conditions in
the build system.

Note: We're attached all dependencies to a custom target, and
NOT to the output files generated by rootcling. This should mitigate
the race conditions we experience when multiple targets in different
Make jobs request the same output file as a dependency and then
suffer from this Make problem that is described here:
https://cmake.org/Bug/view.php?id=10082

This patch also adds a (commented out) piece of code that can print
warnings if we don't have a fitting G__*.cxx file for a
ROOT_LINKER_LIBRARY call. This is sometimes intended, so I didn't
enable this warning by default. We should enable this code
by default in the future once we have a way to express if we
intentionally don't provide the G__*.cxx file when calling
ROOT_LINKER_LIBRARY.
@Teemperor Teemperor force-pushed the FixCMakeModuleDependencies branch from d25a03d to 8dd8abc Compare July 12, 2017 06:10
@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

Right now rootcling doesn't know if the headers depend on a
target that is provided by a BUILTIN. This patch adds a new
parameter that allows expressing if thie rootcling invocation
depends on a certain BUILTIN in a way that is identical to the
ROOT_LINKER_LIBRARY way of handling this.
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@root-project root-project deleted a comment from phsft-bot Jul 12, 2017
@amadio
Copy link
Member

amadio commented Jul 12, 2017

@Axel-Naumann Could you please take a final look and merge? I believe this PR fixes ROOT-8244. I tested with make on KNL with 256 threads and on a dual-socket Xeon machine (72 threads) and don't see race conditions anymore.

@Teemperor Thanks for fixing this!


include_directories(${X11_INCLUDE_DIR})

ROOT_GENERATE_DICTIONARY(G__X3D TViewerX3D.h TX3DFrame.h MODULE ${libname} LINKDEF LinkDef.h DEPENDENCIES Graf3d Gui)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should be using G__${libname} all over the place to keep things consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but that's out of the scope of this particular PR. It can be addressed in a new PR later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two single calls will anyway disappear when we can replace them with the ROOT_STANDARD_LIBRARY_PACKAGE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amadio, I feel fairly strongly about this. If we aim to improve something we have to go the last mile.

@Teemperor, then let's replace this with ROOT_STANDARD_LIBRARY_PACKAGE. I think we can afford to wait for a few days and fix this once and for all.

@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

So far GENERATE_DICTIONARY depends on all the targets passed to its
DEPENDENCIES argument. However, this means for some targets not
only the generation of PCMs, rootmaps etc, but also the linking of
this target (as for example the target `Hist` generates Hist.pcm
and then also links libHist.so). We only care about the files
generated by rootcling when we specify the dependencies here,
so we can improve build performance here.

This patch creates a new target for each dictionary generation
command called ROOTCLING_{MODULE} which can be refernced by other
rootcling invocations to state that they depend on the
rootcling files (PCMs, rootmap, root-PCMs) of this module, but
not on a fully built module. We then start checking for each
dependency passed to GENERATE_DICTIONARY if there is a
ROOTCLING_MODULE target and depend on this if possible.

This should cause that all the rootcling invocations can be
started earlier and the linking of modules and rootcling
invocations of its dependencies now run in parallel.
@amadio amadio merged commit a9d16cc into root-project:master Jul 12, 2017
@vgvassilev
Copy link
Member

@amadio, I'd prefer @Axel-Naumann to have given his green light and my comments to be addressed. Why did you find this so urgent to merge?

@amadio
Copy link
Member

amadio commented Jul 12, 2017

We need to test thoroughly, and I wanted to give it a go at the nightlies.

@vgvassilev
Copy link
Member

Let's take this discussion offline.

@Teemperor Teemperor deleted the FixCMakeModuleDependencies branch August 23, 2017 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants