vepadulano changed the title from "TMVA cannot be initialized properly in distributed environments" to "TMVA cannot be initialized properly in distributed Python environments" on Oct 3, 2023
The direct cause of the failure seems to be that, at least when using SWAN, the ROOT installation is somehow ill-formed. A simpler reproducer just uses Spark primitives to try to run the root-config command on the worker.
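A minimal sketch of what such a check could look like, assuming a PySpark `SparkContext` is available (called `sc` here). The helper names `probe_command` and `probe_on_worker` are hypothetical and only serve the illustration; this is not necessarily the snippet used in the investigation.

```python
import shutil
import subprocess


def probe_command(command="root-config", args=("--version",)):
    """Try to locate and run `command` on the current host.

    Returns a plain dict so the result can be shipped back from a Spark task.
    """
    result = {"command": command, "found": False, "returncode": None, "output": None}
    path = shutil.which(command)  # looks the command up through PATH
    if path is None:
        return result
    result["found"] = True
    try:
        completed = subprocess.run(
            [path, *args], capture_output=True, text=True, check=False
        )
    except OSError as exc:
        # The file was found but could not be executed
        result["output"] = str(exc)
        return result
    result["returncode"] = completed.returncode
    result["output"] = completed.stdout.strip()
    return result


def probe_on_worker(sc, command="root-config"):
    """Run the probe inside a single Spark task.

    Assumption: `sc` is a pyspark SparkContext created by the caller.
    """
    return sc.parallelize([0], 1).map(lambda _: probe_command(command)).collect()[0]
```

Calling `probe_command()` in the client session and `probe_on_worker(sc)` against the cluster should show whether the command is visible on both sides.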
In all fairness, I don't understand the reason to call into that command in the first place, which happens here, so I will investigate if that is needed at all.
Hello. I think this issue is quite relevant, not only for distributed execution, but also for PyROOT usability in general. This is why I proposed the changes in #13803, in case they turn out to be useful.
Thanks @dpiparo for your PR. After it was merged, I was able to continue investigating this issue. After removing this code from the TMVA initialization, we are seeing different problems, so I will open a new issue for those and consider this one closed. Still, since all the problems we see share the same trigger, I will describe it here for completeness. It turns out that the Spark workers have a different set of environment variables from the one available in the client session. In particular, PATH and ROOT_INCLUDE_PATH are completely removed in the Spark workers, which is also why the root-config command could not be found.
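To make the comparison concrete, here is a small sketch of how the two environments could be compared. It assumes a PySpark `SparkContext` named `sc`; the helper names `missing_variables` and `worker_environment` are hypothetical, and the values in the usage example are made up.

```python
import os


def missing_variables(client_env, worker_env, names=("PATH", "ROOT_INCLUDE_PATH")):
    """Return the names that are set on the client but absent or empty on the worker."""
    return [n for n in names if client_env.get(n) and not worker_env.get(n)]


def worker_environment(sc):
    """Collect os.environ from a single Spark task.

    Assumption: `sc` is a pyspark SparkContext created by the caller.
    """
    return sc.parallelize([0], 1).map(lambda _: dict(os.environ)).collect()[0]


# Usage with illustrative (made-up) environments:
client = {"PATH": "/usr/bin", "ROOT_INCLUDE_PATH": "/opt/include", "HOME": "/home/user"}
worker = {"HOME": "/home/user"}
print(missing_variables(client, worker))
```

On a real cluster, the second argument would be `worker_environment(sc)` and the first `dict(os.environ)` from the client session.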
Check duplicate issues.
Description
As seen in a few forum posts:
https://root-forum.cern.ch/t/issue-with-rdataframe-using-spark-cluster-on-lcg-102-and-higher/56568/3
https://root-forum.cern.ch/t/error-while-using-rdataframe-with-spark-cluster-analytix/56006
https://swan-community.web.cern.ch/t/distributed-rdataframes-with-spark/690
When running on lxplus/SWAN with recent LCG stacks, TMVA fails its part of the Python initialization with the following error
This has surfaced when users try distributed RDataFrame applications on such platforms, which cannot even start due to the reported error.
Reproducer
See the related forum posts
ROOT version
6.26 and above (based on the LCG stacks reported)
Installation method
LCG build
Operating system
Linux
Additional context
No response