Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The default meson option epoll=false causes a busy loop and segmentation fault. #1391

Closed
amano-kenji opened this issue Feb 3, 2024 · 10 comments
Labels
bug This is not expected behavior, and needs fixing

Comments

@amano-kenji
Copy link
Contributor

amano-kenji commented Feb 3, 2024

If I build janet with the default meson option epoll=false, then the following code causes a busy loop.

(def proc (os/spawn ["ls"] :p {:out :pipe}))
(print (ev/read (proc :out) :all))

(forever
  (ev/sleep 1))

The workaround is to replace ev/sleep with os/sleep, remove {:out :pipe}, or close proc with os/proc-close which closes pipes and calls os/proc-wait if it hasn't been called already.

If I build janet with epoll=true, then the issue goes away.

This issue was separated from #1386

@bakpakin bakpakin added the bug This is not expected behavior, and needs fixing label Feb 3, 2024
@amano-kenji
Copy link
Contributor Author

amano-kenji commented Feb 3, 2024

With epoll=false, the following code causes segmentation fault.

(import spork/sh)
(def devnull (sh/devnull))
(ev/spawn-thread (ev/write devnull "ok"))

Welcome to hell.

@amano-kenji amano-kenji changed the title The default meson option epoll=false causes a busy loop. The default meson option epoll=false causes a busy loop and segmentation fault. Feb 3, 2024
@sogaiu
Copy link
Contributor

sogaiu commented Feb 3, 2024

I got the segfault as well.

I edited src/conf/janetconf.h to have:

#define JANET_EV_NO_EPOLL

and rebuilt janet to reproduce with the code in the comment above.

I don't know how reliable the results are but I used rr to record execution and then snooped around a bit using rr replay.

This is the output I got around the report of a segfault:

Thread 2 received signal SIGSEGV, Segmentation fault.
0x00007f05b167366e in janet_stream_close_impl (stream=<optimized out>) at src/core/ev.c:349
349	        JanetStream *last = janet_vm.streams[j];

I guess that's this line.

@bakpakin
Copy link
Member

The issue is the copying of streams between threads. There is some thread local (interpreter global) state that needs copying. This only exists for the poll implementation, though.

@bakpakin
Copy link
Member

This should be fixed for me as of 12630d3 - the issue was janet_unmarshal_stream didn't re-register the stream with the event loop.

@amano-kenji
Copy link
Contributor Author

amano-kenji commented Feb 20, 2024

I built the latest commit with epoll=false, and this issue can still be reproduced with the sample code above.

@bakpakin
Copy link
Member

Is the segfault still happening? The 100% cpu is not yet addressed

@amano-kenji
Copy link
Contributor Author

At least, the segmentation fault is gone now. The 100% cpu usage is not gone, yet.

@czkz
Copy link
Contributor

czkz commented Mar 13, 2024

I used "TCP Echo Server" example from README as the server and janet -e '(def c (net/connect "127.0.0.1" 8000)) (:write c "abc") (ev/sleep 1e10)' as client.
The server gets stuck in a busy loop after the client sends some data.
Janet compiled with meson setup build && ninja -C build.

0ff8f58 is the first bad commit.
The commit is quite big unfortunately.

Server strace before commit (fd=5 - connection socket):

<...>
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 3, -1
<waits for data>

Server strace after commit (fd=5 - connection socket):

<...>
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 3, -1) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 3, -1) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 3, -1) = 1 ([{fd=5, revents=POLLOUT}])
<...>

From man poll:
POLLIN - There is data to read.
POLLOUT - Writing is now possible.

Janet doesn't want to write anything to the socket, yet still waits for it to be writable. Poll immediately signals that it is writable, hence the busy loop.

@bakpakin
Copy link
Member

Found the issue, the problem is that passing a valid file descriptor to poll with events = 0 immediately returns with POLLHUP.

See https://groups.google.com/g/comp.unix.programmer/c/bNNadBIEpTo/m/G5gs1mqNhbIJ?pli=1 for a conversation and workaround.

@bakpakin
Copy link
Member

@czkz Thanks for that secondary repro, that seems to be a closely related issue with the poll backend. This is not fixed by a9b8f8e however

bakpakin added a commit that referenced this issue Apr 16, 2024
Properly set read_fiber and write_fiber to NULL when unused.
This was causing extra listening in the poll implemenation leading to
busy loops where a read was accidentally listening for POLLOUT.
bakpakin added a commit that referenced this issue Jun 7, 2024
This doesn't seem to reintroduce the original issue. There was
definitely some interplay with #1431

Doing git bisect landed me at commit
2f0c789 as the first bad commit for
issue #1452.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This is not expected behavior, and needs fixing
Projects
None yet
Development

No branches or pull requests

4 participants