-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jobs: change p_job variable type from short to int #763
jobs: change p_job variable type from short to int #763
Conversation
This change addresses an issue where ksh hangs when the number of jobs exceeds `2^15 - 1` (32767), which is the maximum value for a `short`.
src/cmd/INIT/mamake.c: - Option -V: Silence an annoying gcc warning by checking the return value of write(2). It's kind of pointless just for printing the version, but on gcc, (void)write will not suppress the warning. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66425 src/cmd/ksh93/tests/arith.sh: - The Gentoo ebuild explicitly unsets A, which is apparently exported in their environment, killing the arith.sh tests. So this unsets it in the script. https://gitweb.gentoo.org/repo/gentoo.git/tree/app-shells/ksh/ksh-1.0.9.ebuild README.md, src/cmd/ksh93/README: - Fix typos.
I could not run the reproducer without freezing my entire system. Changing In my version of the reproducer, ksh "hung" at 32767 jobs as you expected, but after the first So I don't think this is a really a bug. ksh dutifully waits for some jobs to exit and make their job table entries available so it can create new jobs. This never happens in your reproducer, because your background jobs sleep for infinite time. I am not necessarily opposed to making it possible to have more than 32767 simultaneous jobs, although I do wonder if there is an actual use case for that. Either way, I want to research the possible implications more first. For instance, if the job table can now be up to 2^31-1 (2147483647) entries long, I imagine that could take up a heck of a lot of memory. |
On second thought, it's a bug. The overflow happens exactly on jobs.c line 1210. The job_alloc function happily returns an integer value > 32767. Assigning that to a short is a signed short integer overflow, which is undefined behaviour in C. But on most current systems (all that I know of, anyway), it will simply wrap around to -1, which also happens to be the temporary-failure return code, so it waits. Plus, ksh will re-use low job numbers as they become available again. So, the correct thing accidentally happens and ksh recovers as soon as some jobs exit. But that doesn't make it not a bug. Your PR cleanly fixes it. Thanks for that. |
This change addresses an issue where ksh has undefined behavior when the number of jobs exceeds `2^15 - 1` (32767), which is the maximum value for a `short`. The short integer overflow happened in jobs.c, line 1210.
This change addresses an issue where ksh has undefined behavior when the number of jobs exceeds `2^15 - 1` (32767), which is the maximum value for a `short`. The short integer overflow happened in jobs.c, line 1210.
Actually, I'd say this a bug in
edit: typos |
I can't reproduce that on either macOS or Debian (both arm64). On Debian arm64, I get this strace output for
...and it sleeps, If you'd like to help debug this on Fedora, the relevant sources are at src/lib/libast/tm/tvsleep.c and src/cmd/ksh93/bltins/sleep.c. |
Also, possibly related: #750 |
This PR addresses a problem where the number of jobs exceeds
2^15 - 1
(32767), the variableshort p_job
overflows and ksh hangs (here).How to reproduce?
reproducer.ksh
:and run
ksh -lic '. ./reproducer.ksh'
.This bug was found during a debugging session with @lzaoral and @msekletar - thank you!