[SR-9033] Dispatch spins in a tight loop when receiving EPOLLHUP #644

@weissi

Description

Previous ID SR-9033
Radar rdar://problem/45369001
Original Reporter @weissi
Type Bug
Status Resolved
Resolution Done

Attachment: Download

Additional Detail from JIRA
Votes 0
Component/s libdispatch
Labels Bug, Linux
Assignee None
Priority Medium

md5: 61275b614c6118d52abb224da518046b

is duplicated by:

  • SR-5773 DispatchIO.read on Linux inconsistent with macOS

Issue Description:

Dispatch internally assumes that epoll_wait will only ever deliver events that Dispatch subscribed to. That assumption is wrong: EPOLLHUP is an unmaskable event that every registered file descriptor is always subscribed to. As a result, epoll_wait returns with EPOLLHUP, Dispatch ignores the event and immediately calls epoll_wait again, which returns again right away, and so on: a 100% CPU spin.

man epoll_ctl says:

EPOLLHUP
Hang up happened on the associated file descriptor.
epoll_wait(2) will always wait for this event; it is not
necessary to set it in events.

Note that when reading from a channel such as a pipe or a
stream socket, this event merely indicates that the peer
closed its end of the channel. Subsequent reads from the
channel will return 0 (end of file) only after all outstanding
data in the channel has been consumed.
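The "always reported" behaviour is easy to observe in isolation. The sketch below uses poll(2) rather than epoll, as a swapped-in stand-in so that it runs on both Linux and Darwin: poll likewise reports POLLHUP in revents even when only POLLIN was requested. The helper name pollOnce is made up for this illustration and is not part of Dispatch.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical helper: polls `fd` once without blocking, requesting only
/// POLLIN, and returns whatever events the kernel reported.
func pollOnce(_ fd: CInt) -> Int16 {
    var pfd = pollfd(fd: fd, events: Int16(POLLIN), revents: 0)
    let n = poll(&pfd, 1, 0)
    precondition(n >= 0, "poll failed")
    return pfd.revents
}

var fds: [CInt] = [-1, -1]
precondition(pipe(&fds) == 0, "pipe failed")
let (readFD, writeFD) = (fds[0], fds[1])

// Close the only writer without writing anything:
// the pipe is now empty and its peer has hung up.
close(writeFD)

let revents = pollOnce(readFD)
let sawHangup = revents & Int16(POLLHUP) != 0
print("hangup reported although only POLLIN was requested: \(sawHangup)")

// A read at this point does not block; it returns 0 (end of file).
var byte: UInt8 = 0
let bytesRead = read(readFD, &byte, 1)
print("read returned \(bytesRead)")
close(readFD)
```

An event loop that looks only for the bits it asked for sees nothing to do here, even though a read would complete immediately with end of file.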

EPOLLHUP seems to be raised in the following situations:

  • the other end of a FIFO hung up

  • TCP resets

  • the other end of a UNIX domain socket hung up

This simple demo program reproduces the spin:

import Dispatch
#if os(macOS)
import Darwin
#else
import Glibc
#endif

func withPipe(_ body: (CInt, CInt) -> Void) -> Void {
    var fds: [Int32] = [-1, -1]
    fds.withUnsafeMutableBufferPointer { ptr in
        let err = pipe(ptr.baseAddress!)
        precondition(err == 0)
    }
    body(fds[0], fds[1])
}

withPipe { readFD, writeFD in
    print("readFD=\(readFD), writeFD=\(writeFD)")
    let q = DispatchQueue(label: "q")
    let io = DispatchIO(type: .stream, fileDescriptor: readFD, queue: q) { err in
        print("cleanup, err=\(err)")
        close(readFD)
        print("all done")
        exit(0)
    }
    io.setLimit(lowWater: 0)
    io.read(offset: 0, length: .max, queue: q) { done, data, err in
        print("read: \(done), \((data?.count).debugDescription), \(err)")
        if let data = data, data.count > 0 {
            // will only happen once
            print("closing writeFD")
            close(writeFD)
            q.asyncAfter(deadline: .now() + 1) {
                io.close()
            }
        }
    }
    io.resume()
    print("writing")
    write(writeFD, "x", 1)
    print("wrtten")
    dispatchMain()
}

On Darwin, running it looks like this:

readFD=3, writeFD=4
writing
wrtten
read: false, Optional(1), 0
closing writeFD
read: true, Optional(0), 0
cleanup, err=0
all done
Program ended with exit code: 0

On Linux, however, we get:

readFD=3, writeFD=4
writing
wrtten
read: false, Optional(1), 0
closing writeFD
[hang with 100% CPU spin...]

Running the program under strace shows the loop:

 strace -f -e trace=epoll_ctl,epoll_wait ./main
readFD=3, writeFD=4
Process 103 attached
writing
wrtten
Process 104 attached
Process 105 attached
[pid   105] epoll_ctl(5, EPOLL_CTL_ADD, 6, {EPOLLIN|0x4000, {u32=1, u64=1}}) = 0
Process 106 attached
[pid   106] epoll_wait(5, {{EPOLLIN, {u32=1, u64=1}}}, 16, 0) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLONESHOT, {u32=3, u64=3}}read: false, Optional(1), 0
) = 0
closing writeFD
[pid   106] epoll_wait(5, {}, 16, 0)    = 0
[pid   106] epoll_ctl(5, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLIN, {u32=1, u64=1}}, {EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 2
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[continued forever...]
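In the trace above, each wakeup carries only EPOLLHUP, the registration asked for EPOLLIN, and the descriptor is simply re-armed. One common way to break such a cycle is to fold an unsolicited hangup into the interests that were registered, so that the pending read runs and observes end of file. The sketch below models that idea in plain Swift. The type IOEvents and the function eventsToDeliver are hypothetical names for this illustration, and this is not claimed to be the fix that Dispatch or SwiftNIO actually shipped.

```swift
/// Hypothetical event flags, loosely mirroring EPOLLIN / EPOLLOUT / EPOLLHUP.
struct IOEvents: OptionSet {
    let rawValue: UInt32
    static let readable = IOEvents(rawValue: 1 << 0)
    static let writable = IOEvents(rawValue: 1 << 1)
    static let hangup   = IOEvents(rawValue: 1 << 2)
}

/// Maps what the kernel reported onto the interests the caller registered.
/// A hangup is folded into the registered read/write interests, so the
/// pending operation runs, observes end of file (or an error), and the
/// descriptor can be torn down instead of being re-armed forever.
func eventsToDeliver(reported: IOEvents, registered: IOEvents) -> IOEvents {
    var deliver = reported.intersection(registered)
    if reported.contains(.hangup) {
        deliver.formUnion(registered.intersection([.readable, .writable]))
    }
    return deliver
}

// Naive filtering: a lone hangup matches nothing that was registered,
// so nothing is delivered and the loop just waits again.
let naive = IOEvents.hangup.intersection(.readable)

// With the folding rule, the same wakeup is delivered as "readable".
let fixed = eventsToDeliver(reported: .hangup, registered: .readable)

print(naive.isEmpty, fixed == .readable)
```

The design choice here is to reuse the existing read path for teardown rather than add a separate hangup callback; whether that suits a given event loop depends on how it handles end of file.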

FWIW, SwiftNIO used to have the same bug: apple/swift-nio@b109389?diff=unified
