Problems with running slc6 based GaudiExec #2320

Closed
mesmith75 opened this issue Apr 9, 2024 · 9 comments · Fixed by #2321

Comments

@mesmith75
Contributor

For some very old applications (things requiring slc6) you need to use apptainer as they are not functional with el9.

This should be a very short addition to the run command line.

@heistera

heistera commented Apr 9, 2024

Looks like the standard ganga virtualisation, e.g. using something like:

j.virtualization = Apptainer("/cvmfs/cernvm-prod.cern.ch/cvm4")

or

j.virtualization = Apptainer("docker://gitlab-registry.cern.ch/lhcb-core/lbdocker/slc6-build:latest")

does not work for GaudiExec?

@egede
Member

egede commented Apr 10, 2024

While it might be possible to get that to work, it will be better to just implement it in a transparent way for the GaudiExec application.

@heistera

As a user a timely solution would be great. Let me know if and how I could help.

@egede changed the title from "Add option to start GaudiExec in apptainer" to "Problems with running slc6 based GaudiExec" on Apr 12, 2024
@egede
Member

egede commented Apr 12, 2024

I was investigating this a bit further. So there are two issues at play here. I consider a job of the type

j = Job(application = prepareGaudiExec('DaVinci','v39r1p6', myPath='.', platform='x86_64-slc6-gcc49-opt', options=['empty.py']))

where empty.py is a Python file containing just the line pass.

Configuration

If you are on an el9 machine, j.prepare() will fail for this job because the cmake command fails. If Ganga is started inside a centos7 apptainer, the j.prepare() step works. Clearly there is an issue that should be fixed there.
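To make the mismatch between the host and the requested platform visible before calling j.prepare(), one could compare the OS field of the platform string with the host release. This is only a minimal sketch: the helper names and the version-to-tag mapping are assumptions for illustration, not part of Ganga, and the release strings are passed in as text (for example the contents of /etc/redhat-release).

```python
import re
from typing import Optional, Tuple

# Hypothetical mapping from the release major version to the short OS tags
# used in this thread; extend it for other hosts as needed.
_MAJOR_TO_TAG = {6: "slc6", 7: "centos7", 9: "el9"}


def platform_os(platform: str) -> str:
    """Return the OS field of a platform string such as 'x86_64-slc6-gcc49-opt'."""
    parts = platform.split("-")
    if len(parts) != 4:
        raise ValueError(f"unexpected platform string: {platform!r}")
    return parts[1]


def host_os_tag(release_text: str) -> Optional[str]:
    """Map text like 'Scientific Linux release 6.9 (Carbon)' to a short tag."""
    match = re.search(r"release (\d+)", release_text)
    if match is None:
        return None
    return _MAJOR_TO_TAG.get(int(match.group(1)))


def compare(platform: str, release_text: str) -> Tuple[str, Optional[str], bool]:
    """Return (requested OS, host OS tag, whether the two agree)."""
    wanted = platform_os(platform)
    host = host_os_tag(release_text)
    return wanted, host, wanted == host
```

A result with the last element False only signals that the host differs from the requested platform; whether j.prepare() actually works on that host still has to be tested.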

Running

Having prepared the job inside a centos7 apptainer, the job can then be submitted from a standard session running on el9, and it runs on the Dirac backend just fine. This is compatible with observations from others that jobs start but then crash later. In the JDL for the job, as shown in the Dirac monitoring, I see Platform = "x86_64-slc6", which is correct, and in the stderr of the job I also see WARNING:lb-run:Decided best container to use is apptainer, which indicates that the job already runs inside an apptainer. I confirmed this by running the job with the Local backend. So for runtime errors, it looks like a problem with how lb-run works and not a Ganga problem.
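The stderr check described above can be automated. The sketch below looks for the warning line quoted in this comment and returns the container name; the function name is hypothetical, and it assumes the message format stays exactly as shown here.

```python
import re
from typing import Optional

# Pattern taken from the warning quoted in this thread; the wording may
# differ between lb-run versions.
_CONTAINER_RE = re.compile(r"WARNING:lb-run:Decided best container to use is (\S+)")


def container_from_stderr(stderr_text: str) -> Optional[str]:
    """Return the container lb-run reports choosing, or None if no such line exists."""
    for line in stderr_text.splitlines():
        match = _CONTAINER_RE.search(line)
        if match is not None:
            return match.group(1)
    return None
```

Passing the text of a job's stderr file to container_from_stderr gives "apptainer" for the job described here, and None if lb-run did not report a container choice.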

@egede
Member

egede commented Apr 12, 2024

The command

exec lb-run   --siteroot=${MYSITEROOT:-/cvmfs/lhcb.cern.ch/lib} -c x86_64-slc6-gcc49-opt --path-to-project ${base_dir}/DaVinciDev_v39r1p6 bash

indeed sees you ending up in an slc6 environment

[DaVinciDev v39r1p6] DaVinciDev_v39r1p6 $ cat /etc/redhat-release 
Scientific Linux release 6.9 (Carbon)
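For scripting, the same lb-run invocation can be assembled as an argument list. This is a rough sketch that uses only the flags shown in the command above; the helper name and its parameters are assumptions, and the default siteroot mirrors the fallback value in that command.

```python
import shlex
from typing import List, Sequence


def lb_run_command(
    platform: str,
    project_dir: str,
    inner: Sequence[str] = ("bash",),
    siteroot: str = "/cvmfs/lhcb.cern.ch/lib",
) -> List[str]:
    """Build the lb-run argument list; `inner` is the command run in the environment."""
    return [
        "lb-run",
        f"--siteroot={siteroot}",
        "-c",
        platform,
        "--path-to-project",
        project_dir,
        *inner,
    ]


cmd = lb_run_command("x86_64-slc6-gcc49-opt", "/home/egede/DaVinciDev_v39r1p6")
# Render the list as a shell-quoted string for logging or copy and paste.
print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting problems when it is handed to subprocess.run, and shlex.join is only used for display.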

however, you can't run the configuration step inside that apptainer

[DaVinciDev v39r1p6] DaVinciDev_v39r1p6 $ cmake --build /home/egede/DaVinciDev_v39r1p6/build.x86_64-slc6-gcc49-opt --target ganga-input-sandbox
cmake: symbol lookup error: /cvmfs/lhcb.cern.ch/lib/var/lib/LbEnv/3114/stable/linux-64/bin/../lib/libuv.so.1: undefined symbol: sendmmsg

So we can't run the cmake command inside an slc6 environment (for a DaVinci version that requires slc6), but it works on centos7. Not very helpful.
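The symbol lookup error suggests that the libuv bundled with this cmake expects sendmmsg from the C library, which the slc6 one apparently does not export. A plausible explanation, stated here as an assumption, is that the slc6 glibc predates the release that added sendmmsg. A quick way to test any environment is to ask the running process whether the symbol resolves; the function name below is hypothetical.

```python
import ctypes


def libc_exports(symbol: str) -> bool:
    """Report whether `symbol` resolves in the libraries loaded into this process."""
    # CDLL(None) opens the running program itself on POSIX systems, which
    # exposes the C library symbols already loaded by the Python interpreter.
    handle = ctypes.CDLL(None)
    return hasattr(handle, symbol)


print("sendmmsg available:", libc_exports("sendmmsg"))
```

Running this with a Python interpreter inside the slc6 container and again on centos7 would show whether the missing symbol accounts for the difference between the two.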

@mesmith75
Contributor Author

Yes we can run the command inside the apptainer. I'll open an MR later today.

@mesmith75
Contributor Author

As far as I can tell it is just the make that needs adjusting. The jobs seem to run automatically with apptainer on the WN.

@egede
Member

egede commented Apr 12, 2024

In that case the runtime errors reported are completely unrelated. Let's see.

@mesmith75
Contributor Author

Getting the build fixed at least is useful though. I ran an example job fine - the log showed it ran inside a container.
