recipe for installing SLURM and friends on Debian 11 #70

Open
judith-ipac opened this issue Jul 21, 2022 · 1 comment

Comments

judith-ipac commented Jul 21, 2022

Hello, and apologies if this question is in the wrong place. We are upgrading from Debian 8 to Debian 11. I am a developer with no particular background in system administration or configuration. Several weeks into a cycle of install, google the error message, install something else, I have installed munge, slurm, slurm-drmaa, and bats(!). slurmctld and slurmd are now running, but calls to drmaa_run_job() segfault. (The surrounding C++ code is copied from our Debian 8 host, where drmaa_run_job() runs successfully.) I'll print some debug output below, but what I'm really looking for is start-to-finish, step-by-step instructions for configuring, installing, and running whatever it takes to make SLURM usable on Debian 11. Thanks in advance.

Last few steps of debug output from drmaa_run_job:

d #597f9 [ 40.42] * finalizing job constraints
d #597f9 [ 40.42] * set min_cpus to ntasks: 1
t #597f9 [ 40.42] <- slurmdrmaa_parse_native
ORA-24550: signal received: [si_signo=11] [si_errno=0] [si_code=1] [si_int=0] [si_ptr=(nil)] [si_addr=0x1656]
kpedbg_dmp_stack()+394<-kpeDbgCrash()+204<-kpeDbgSignalHandler()+113<-skgesig_sigactionHandler()+258<-__sighandler()<-0x00007F06CFEC9B71<-slurm_pack_selected_step()+1286<-slurm_send_node_msg()+505<-slurm_send_recv_msg()+66<-slurm_send_recv_controller_msg()+315<-slurm_submit_batch_job()+119<-slurmdrmaa_session_run_bulk()+518<-slurmdrmaa_session_run_job()+179<-drmaa_run_job()+374<-_ZN19custom_code::submit_jobERKN5boost10filesystem4pathES4_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESC_bb()+4407<-0x0000000000000009<-0x7453705F6D00626F

runscript.sh: line 62: 366577 Segmentation fault

Stack trace from gdb:

           Stack trace of thread 366585:
            #0  0x00007f06d1914fe1 raise (libpthread.so.0 + 0x13fe1)
            #1  0x00007f06c254893f skgesigOSCrash (libclntsh.so + 0x267293f)
            #2  0x00007f06c2c63cdd kpeDbgSignalHandler (libclntsh.so + 0x2d8dcdd)
            #3  0x00007f06c2548c12 skgesig_sigactionHandler (libclntsh.so + 0x2672c12)
            #4  0x00007f06d1915140 __restore_rt (libpthread.so.0 + 0x14140)
            #5  0x00007f06cfec9b71 __strlen_avx2 (libc.so.6 + 0x15fb71)
            #6  0x00007f06d0467cb3 n/a (libslurm.so.36 + 0xf8cb3)
            #7  0x00007f06d047c646 n/a (libslurm.so.36 + 0x10d646)
            #8  0x00007f06d0456cf9 slurm_send_node_msg (libslurm.so.36 + 0xe7cf9)
            #9  0x00007f06d0457f72 slurm_send_recv_msg (libslurm.so.36 + 0xe8f72)
            #10 0x00007f06d04580db slurm_send_recv_controller_msg (libslurm.so.36 + 0xe90db)
            #11 0x00007f06d03b76e7 slurm_submit_batch_job (libslurm.so.36 + 0x486e7)
            #12 0x00007f06d05414f1 slurmdrmaa_session_run_bulk (libdrmaa.so.1 + 0xb4f1)
            #13 0x00007f06d054123b slurmdrmaa_session_run_job (libdrmaa.so.1 + 0xb23b)
            #14 0x00007f06d055c133 drmaa_run_job (libdrmaa.so.1 + 0x26133)
            #15 0x000056442ad0bf37 n/a (XXX + 0xd1f37)
            #16 0x0000000000000009 n/a (n/a + 0x0)
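
To make the trace above easier to read, here is a minimal Python sketch that groups the frames by shared library and picks out the frame directly below `__restore_rt`, the signal-return trampoline. That frame is where the signal was raised; the frames above it belong to the signal handler. The helper names (`parse_frames`, `frames_by_module`, `faulting_frame`) are hypothetical and assume the frame layout shown in the trace. This only summarizes the trace and does not diagnose the cause.

```python
import re

# Frames copied verbatim from the stack trace above.
TRACE = """\
#0  0x00007f06d1914fe1 raise (libpthread.so.0 + 0x13fe1)
#1  0x00007f06c254893f skgesigOSCrash (libclntsh.so + 0x267293f)
#2  0x00007f06c2c63cdd kpeDbgSignalHandler (libclntsh.so + 0x2d8dcdd)
#3  0x00007f06c2548c12 skgesig_sigactionHandler (libclntsh.so + 0x2672c12)
#4  0x00007f06d1915140 __restore_rt (libpthread.so.0 + 0x14140)
#5  0x00007f06cfec9b71 __strlen_avx2 (libc.so.6 + 0x15fb71)
#6  0x00007f06d0467cb3 n/a (libslurm.so.36 + 0xf8cb3)
#7  0x00007f06d047c646 n/a (libslurm.so.36 + 0x10d646)
#8  0x00007f06d0456cf9 slurm_send_node_msg (libslurm.so.36 + 0xe7cf9)
#9  0x00007f06d0457f72 slurm_send_recv_msg (libslurm.so.36 + 0xe8f72)
#10 0x00007f06d04580db slurm_send_recv_controller_msg (libslurm.so.36 + 0xe90db)
#11 0x00007f06d03b76e7 slurm_submit_batch_job (libslurm.so.36 + 0x486e7)
#12 0x00007f06d05414f1 slurmdrmaa_session_run_bulk (libdrmaa.so.1 + 0xb4f1)
#13 0x00007f06d054123b slurmdrmaa_session_run_job (libdrmaa.so.1 + 0xb23b)
#14 0x00007f06d055c133 drmaa_run_job (libdrmaa.so.1 + 0x26133)
#15 0x000056442ad0bf37 n/a (XXX + 0xd1f37)
#16 0x0000000000000009 n/a (n/a + 0x0)
"""

# One frame: "#N  0xADDR symbol (module + 0xOFFSET)".
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+0x[0-9a-f]+\s+(?P<symbol>\S+)\s+"
    r"\((?P<module>\S+) \+ 0x[0-9a-f]+\)"
)


def parse_frames(trace):
    """Return a list of (index, symbol, module) tuples in trace order."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (int(match.group("index")), match.group("symbol"), match.group("module"))
            )
    return frames


def frames_by_module(trace):
    """Group frames by shared library, keyed in order of first appearance."""
    grouped = {}
    for index, symbol, module in parse_frames(trace):
        grouped.setdefault(module, []).append((index, symbol))
    return grouped


def faulting_frame(trace):
    """Return the frame just below __restore_rt, or None if it is absent."""
    frames = parse_frames(trace)
    for position, (_, symbol, _) in enumerate(frames):
        if symbol == "__restore_rt" and position + 1 < len(frames):
            return frames[position + 1]
    return None


if __name__ == "__main__":
    for module, frames in frames_by_module(TRACE).items():
        print(f"{module}: frames {[index for index, _ in frames]}")
    print("signal raised in:", faulting_frame(TRACE))
```

Read this way, frames #1 to #3 are a signal handler in libclntsh.so reporting the crash, while the fault itself occurs in `__strlen_avx2`, called from unnamed code in libslurm.so.36 during `slurm_submit_batch_job`.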

Any advice would be greatly appreciated.

judith-ipac commented Jul 21, 2022

FWIW, when I run "make check" in the slurm-drmaa-1.1.3 repo, it stalls after the first test suite:

============================================================================
Testsuite summary for FedStage DRMAA utilities library 2.0.1
TOTAL: 1
PASS: 1
SKIP: 0
XFAIL: 0
FAIL: 0
XPASS: 0
ERROR: 0
============================================================================

make[4]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils/test'
make[3]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils/test'
make[2]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils/test'
make[2]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils'
make[2]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils'
make[1]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/drmaa_utils'
Making check in slurm_drmaa
make[1]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/slurm_drmaa'
make[1]: Nothing to be done for 'check'.
make[1]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/slurm_drmaa'
Making check in test
make[1]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/test'
make slurm_ping
make[2]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/test'
make[2]: 'slurm_ping' is up to date.
make[2]: Leaving directory 'ROOTDIR/slurm-drmaa-1.1.3/test'
make check-TESTS
make[2]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/test'
make[3]: Entering directory 'ROOTDIR/slurm-drmaa-1.1.3/test'

Thanks.
