Running CLI commands without options segfaults. #64

Closed
richc-admin-gcai opened this issue Sep 13, 2021 · 2 comments · Fixed by natefoo/drmaa-utils#5

Comments

@richc-admin-gcai

I'm testing slurm-drmaa in a container, but the binary segfaults every time I run it. The same happens outside a container, whether I build from source or install from the Galaxy RPM.

Am I missing something?

Error is:
[root@f8ddc11bc51e /]# DRMAA_LIBRARY_PATH=/usr/lib64/libdrmaa.so /usr/bin/drmaa-run
Segmentation fault (core dumped)

Backtrace shows:

[root@f8ddc11bc51e /]# export DRMAA_LIBRARY_PATH=/usr/lib64/libdrmaa.so
[root@f8ddc11bc51e /]# gdb /usr/bin/drmaa-run
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/bin/drmaa-run...Reading symbols from /usr/lib/debug/usr/bin/drmaa-run.debug...done.
done.
(gdb) run
Starting program: /usr/bin/drmaa-run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00000000004129b6 in parse_args (argc=0, argv=0x7fffffffe7a0) at drmaa_run.c:254
254 while (argc >= 0 && argv[0][0] == '-')
(gdb) backtrace
#0 0x00000000004129b6 in parse_args (argc=0, argv=0x7fffffffe7a0) at drmaa_run.c:254
#1 0x00000000004120df in main (argc=1, argv=0x7fffffffe798) at drmaa_run.c:122
(gdb)

My test setup is as follows:

Dockerfile:
$ cat Dockerfile
FROM centos:7

RUN (cd /lib/systemd/system/sysinit.target.wants/; for i in *; do [ $i == systemd-tmpfiles-setup.service ] || rm -f $i; done); \
rm -f /lib/systemd/system/multi-user.target.wants/*; \
rm -f /etc/systemd/system/*.wants/*; \
rm -f /lib/systemd/system/local-fs.target.wants/*; \
rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
rm -f /lib/systemd/system/basic.target.wants/*; \
rm -f /lib/systemd/system/anaconda.target.wants/*;

VOLUME [ "/sys/fs/cgroup"]

RUN yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
RUN yum-config-manager --add-repo https://depot.galaxyproject.org/yum/galaxy.repo

RUN yum -y install which strace gdb
RUN debuginfo-install -y libgcc-4.8.5-44.el7.x86_64
RUN debuginfo-install -y glibc-2.17-324.el7_9.x86_64
RUN yum -y install slurm-slurmd-20.11.8 slurm-devel-20.11.8glibc-2.17-324.el7_9.x86_64

RUN yum clean all && yum -y update

RUN yum -y install slurm-drmaa slurm-drmaa-debuginfo

RUN yum clean all && \
rm -rf /var/cache/yum

VOLUME [ "/sys/fs/cgroup"]

ENTRYPOINT ['/usr/sbin/init']

This results in a working container. When I log in to the container, I'm running:

[root@f8ddc11bc51e /]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)

[root@f8ddc11bc51e7 /]# rpm -qa slurm*
slurm-slurmd-20.11.8-1.el7.x86_64
slurm-drmaa-debuginfo-1.1.2-1.el7.x86_64
slurm-20.11.8-1.el7.x86_64
slurm-devel-20.11.8-1.el7.x86_64
slurm-drmaa-1.1.2-1.el7.x86_64

[root@f8ddc11bc51e /]# yum info slurm-drmaa-1.1.2-1.el7.x86_64
Loaded plugins: fastestmirror, ovl
Loading mirror speeds from cached hostfile

 * base: mirrors.vinters.com
 * extras: mirrors.coreix.net
 * updates: mirrors.coreix.net
Installed Packages
Name : slurm-drmaa
Arch : x86_64
Version : 1.1.2
Release : 1.el7
Size : 863 k
Repo : installed
From repo : galaxy
Summary : DRMAA for Slurm
URL : https://github.com/natefoo/slurm-drmaa
License : GPLv3+
Description : DRMAA for Slurm is an implementation of Open Grid Forum DRMAA 1.0 (Distributed
            : Resource Management Application API) specification for submission and control of
            : jobs to SLURM. Using DRMAA, grid applications builders, portal developers and
            : ISVs can use the same high-level API to link their software with different
            : cluster/resource management systems.

@richc-admin-gcai
Author

I believe the issue is in:

drmaa_utils/drmaa_utils/drmaa_run_bulk.c: while (argc >= 0 && argv[0][0] == '-')
drmaa_utils/drmaa_utils/drmaa_run.c: while (argc >= 0 && argv[0][0] == '-')

Shouldn't this be:

while (argc > 0 && argv[0][0] == '-')

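With argc >= 0 the loop condition still evaluates argv[0][0] when argc is 0. At that point argv[0] is the NULL terminator of the argument vector, so dereferencing it to check for '-' causes the segfault.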

If I make that change then the binaries throw the expected error:

[root@f8ddc11bc51e slurm-drmaa-1.1.2]# ./drmaa-run-bulk
F #9472 [ 0.00] * syntax error
F #9472 [ 0.00] | drmaa-run-bulk {start} {end} {step} {command}

[root@f8ddc11bc51e slurm-drmaa-1.1.2]# ./drmaa-run
F #9473 [ 0.00] * Failed to submit a job: drmaa_remote_command not set for job template

@natefoo
Owner

natefoo commented Oct 7, 2021

Your analysis looks correct to me. I'll commit a fix and include it in the next release of slurm-drmaa. Thanks!
