
TMVA cannot be initialized properly in distributed Python environments #13798

Closed
vepadulano opened this issue Oct 3, 2023 · 3 comments


@vepadulano (Member)

Check duplicate issues.

  • Checked for duplicates

Description

As seen in a few forum posts:

https://root-forum.cern.ch/t/issue-with-rdataframe-using-spark-cluster-on-lcg-102-and-higher/56568/3

https://root-forum.cern.ch/t/error-while-using-rdataframe-with-spark-cluster-analytix/56006

https://swan-community.web.cern.ch/t/distributed-rdataframes-with-spark/690

When running on lxplus/SWAN with recent LCG stacks, TMVA fails its part of the Python initialization with the following error:

File "/cvmfs/sft.cern.ch/lcg/views/LCG_103swan/x86_64-centos7-gcc11-opt/lib/ROOT/_pythonization/_tmva/__init__.py", line 25, in <module>
    hasRDF = gSystem.GetFromPipe("root-config --has-dataframe") == "yes"
ValueError: TString TSystem::GetFromPipe(const char* command) =>
    ValueError: nullptr result where temporary expected

This has surfaced when users try to run distributed RDataFrame applications on such platforms; the applications cannot even start because of the reported error.

Reproducer

See the related forum posts linked above. A minimal sketch of the kind of application they describe is given below.
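
The following is only an illustrative sketch, not a snippet taken from the forum posts: a distributed RDataFrame using the Spark backend, which on the affected LCG/SWAN setups fails when the workers import ROOT and hit the TMVA initialization error reported above. The trivial workload is an assumption.

import ROOT

# Distributed RDataFrame with the Spark backend (available in ROOT >= 6.26).
RDataFrame = ROOT.RDF.Experimental.Distributed.Spark.RDataFrame

# Empty data source with 100 entries; any real analysis would do.
df = RDataFrame(100)

# Triggering the event loop ships the work to the Spark workers, where the
# failing Python initialization of ROOT/TMVA happens.
print(df.Count().GetValue())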

ROOT version

6.26 and above (based on the LCG stacks reported)

Installation method

LCG build

Operating system

Linux

Additional context

No response

@vepadulano added this to the 6.30/00 milestone Oct 3, 2023
@vepadulano self-assigned this Oct 3, 2023
@vepadulano changed the title TMVA cannot be initialized properly in distributed environments → TMVA cannot be initialized properly in distributed Python environments Oct 3, 2023
@vepadulano (Member, Author)

The direct reason for the failure seems to be that, at least when using SWAN, the ROOT installation is somehow ill-formed. Here is a simpler reproducer that just uses Spark primitives to try to run the root-config command on a worker:

[Screenshot: Spark-based reproducer and its output, showing that root-config cannot be run on the worker]
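
A hedged sketch of that reproducer (the exact snippet is only visible in the screenshot above): use plain Spark primitives, without importing ROOT, to run root-config on a worker and report what happens there.

import subprocess
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

def run_root_config(_):
    # Run root-config on the worker and return its output, or the error
    # raised when the command cannot be found on the worker's PATH.
    try:
        out = subprocess.run(["root-config", "--has-dataframe"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except Exception as exc:
        return repr(exc)

print(sc.parallelize(range(1), numSlices=1).map(run_root_config).collect())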

In all fairness, I don't understand why that command is called in the first place (which happens here), so I will investigate whether it is needed at all.
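
For illustration only (this is an assumption, not the change that was eventually merged): one way such a check could be done in-process, without shelling out to root-config, is to ask the already-loaded ROOT build for its configured features.

import ROOT

# root-config --features lists "dataframe" when ROOT was built with
# RDataFrame; the same information is available from the loaded build.
has_rdf = "dataframe" in str(ROOT.gROOT.GetConfigFeatures())
# An even simpler Python-level check would be: hasattr(ROOT, "RDataFrame")
print("RDataFrame available:", has_rdf)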

vepadulano added a commit to vepadulano/root that referenced this issue Oct 3, 2023
Which also relied on TSystem::GetFromPipe which sometimes may not work
properly for any reason. See
root-project#13798 for more details.
@dpiparo (Member) commented Oct 4, 2023

Hello. I think this issue is quite relevant, not only for distributed execution but also for PyROOT usability in general. This is why I proposed these changes, in case they turn out to be useful: #13803

@vepadulano (Member, Author)

Thanks @dpiparo for your PR; after merging it I was able to continue the investigation of this issue. After removing this code from the TMVA initialization we are seeing different problems, so I will open a new issue for those and consider this one closed. Still, since all the problems we see are related to the same triggering factor, I will also write it down here for completeness: it turns out that the Spark workers have a different set of environment variables from the one available in the client session. In particular, PATH and ROOT_INCLUDE_PATH are completely removed on the Spark workers, which was also causing the absence of the root-config command.
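
A sketch (assuming a working PySpark session) of how that environment mismatch can be inspected, by comparing PATH and ROOT_INCLUDE_PATH as seen by the client and by the Spark workers:

import os
from pyspark import SparkContext

VARS = ("PATH", "ROOT_INCLUDE_PATH")

def env_snapshot(_=None):
    # Collect the variables of interest from the current process environment.
    return {var: os.environ.get(var) for var in VARS}

sc = SparkContext.getOrCreate()
print("client :", env_snapshot())
print("workers:", sc.parallelize(range(1), numSlices=1).map(env_snapshot).collect())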

@vepadulano added this to Issues in Fixed in 6.30/00 via automation Oct 5, 2023