fails with SIGSEGV #36

Closed
prspkt opened this issue Sep 30, 2020 · 3 comments


prspkt commented Sep 30, 2020

After a quick build on Alpine Linux, slurm consistently segfaults at runtime.
Stepping through with gdb, the error happens here, with BUFSIZ - 1 being greater than BUFSIZE.

Applying this patch seems to fix the issue.

@mattthias
Owner

Hi,

this answer might come a year too late :-)

BUFSIZ is a macro defined in stdio.h (8192 on glibc, 1024 on musl), while BUFSIZE is set to 256 in src/linux.h.

The fgets call tries to read at most BUFSIZ-1 chars (8191 on my amd64 with glibc) into a buffer that is only BUFSIZE (256) chars big.

That does look wrong. I wondered how this could ever have worked, but the first two lines of /proc/net/dev are only ~125 chars long. If your slurm really crashes on this line of code, I would like to see the contents of your /proc/net/dev.
I can imagine, though, that the actual interface / data lines could reach a length of more than 256 chars.
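To illustrate the mismatch described above, here is a minimal sketch of a bounded line read. The helper name `read_line` and its return codes are hypothetical and are not taken from slurm's source or from the actual fix; the only values borrowed from this thread are the 256-byte BUFSIZE and the use of fgets. The point is that the size passed to fgets is derived from the buffer itself rather than from the unrelated BUFSIZ macro.

```c
#include <stdio.h>
#include <string.h>

/* Same value the thread quotes from src/linux.h. */
#define BUFSIZE 256

/*
 * Hypothetical helper (not from slurm): read one line into buf without
 * ever writing more than buflen bytes.
 * Returns 1 if a line was read, 0 on EOF or read error, and -1 if the
 * buffer filled up before a newline or end of file was seen.
 */
static int read_line(FILE *fp, char *buf, size_t buflen)
{
    /* Pass the real buffer size, not BUFSIZ - 1. */
    if (buflen == 0 || fgets(buf, (int)buflen, fp) == NULL)
        return 0;

    /* No newline and not at EOF: the line did not fit into buf. */
    if (strchr(buf, '\n') == NULL && !feof(fp))
        return -1;

    return 1;
}
```

A caller would declare `char buf[BUFSIZE];` and call `read_line(fp, buf, sizeof buf)`, so the limit follows the buffer if its size ever changes. A return value of -1 lets the caller decide whether to skip or reject an over-long interface line instead of overrunning the buffer.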


klockeph commented Oct 4, 2021

I just installed slurm on Arch Linux from the AUR (https://aur.archlinux.org/packages/slurm/) and I see similar behaviour: it segfaults after a few seconds of runtime.
I have not investigated with gdb yet, but I imagine it could be the same issue. My /proc/net/dev looks like this:

Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 391504450597 8543650    0    0    0     0          0         0 391504450597 8543650    0    0    0     0       0          0
enp2s0f0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
wlp3s0: 132889587  223204    0  529    0     0          0         0 32549645   67318    0    0    0     0       0          0
docker0: 144529703   98316    0    0    0     0          0         0 337031942   58305    0    0    0     0       0          0
veth6a04989: 145906127   98316    0    0    0     0          0         0 337033088   58320    0    0    0     0       0          0

@mattthias
Owner

Hello @klockeph & @prspkt,

the issue was fixed by commit 43ed060.

Thanks for your report (even though I have a horrible lag in answering / fixing stuff).
