Move erts IO Polling to dedicated threads #1552

merged 25 commits into from Oct 2, 2017


6 participants

garazdawi commented Aug 29, 2017

This PR implements a solution to the problem described in

The implementation moves all IO polling out from the main scheduler loops into separate dedicated thread(s). By default one thread is started to be the poller and all file descriptors (or waitable objects) are handled by it using a kernel-poll mechanism.
It is possible to tune the number of polling threads using the +IOt and +IOPt options. For example, +IOt 2 will start 2 polling threads, and +IOPt 50 will start 50% as many polling threads as there are schedulers.
It is possible to tune the number of poll-sets using the +IOp and +IOPp options. On Linux and BSD it is possible to have 1..t poll-sets configured; on all other OSs the number of poll-sets has to be equal to the number of poll-threads.

On operating systems that can do concurrent updates of the kernel poll-set (Linux and BSD), the new implementation has no global locks that need to be taken when updating the poll-set. On operating systems that cannot do concurrent updates, there is still a lock per poll-set that has to be taken.

The +K flag no longer has any effect and is ignored. If the user wants to not use kernel-poll, this can now be achieved through a configure flag when building the emulator.

Implementation details

Linux and BSD

The basic flow of operation for handling a driver_select call is as follows:

  • driver_select is called in a linked-in driver in a scheduler thread
    • the fd is used to hash into a global array to find its slot
      • the hash is protected by striped mutexes
    • the port that was interested in the fd event is stored in the slot
    • the pollset is modified to reflect the new interest using epoll_ctl(kp_fd, EPOLL_CTL_ADD, fd, { .events = EPOLLIN|EPOLLONESHOT, .data = fd })
  • During all this, the poll-threads are using epoll_wait
    • When the event comes one poll-thread will be woken (because we use ONESHOT)
    • It looks for the correct slot and locks it
    • Delivers the correct signal to the port which calls ready_input/output
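
The flow above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code from this PR: the names (`fd_slot`, `sketch_select`, `sketch_poll_once`), the table and stripe sizes, and the plain modulo hash without collision handling are all hypothetical, and an integer `port_id` stands in for the port. It is Linux-only and leaves out output events and error handling.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/epoll.h>
#include <unistd.h>

#define N_SLOTS 64  /* hypothetical size of the global fd table */
#define N_LOCKS 8   /* hypothetical number of lock stripes */

struct fd_slot { int fd; int port_id; };  /* port_id stands in for the port */

static struct fd_slot slots[N_SLOTS];
static pthread_mutex_t stripe[N_LOCKS];

static void table_init(void) {
    for (int i = 0; i < N_LOCKS; i++) pthread_mutex_init(&stripe[i], NULL);
    for (int i = 0; i < N_SLOTS; i++) { slots[i].fd = -1; slots[i].port_id = -1; }
}

/* Scheduler side: find the slot, record the interested port, arm the fd. */
static int sketch_select(int kp_fd, int fd, int port_id) {
    assert(fd >= 0);
    unsigned ix = (unsigned)fd % N_SLOTS;          /* no collision handling */
    pthread_mutex_t *mtx = &stripe[ix % N_LOCKS];  /* one mutex guards several slots */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };

    pthread_mutex_lock(mtx);
    slots[ix].fd = fd;
    slots[ix].port_id = port_id;
    int res = epoll_ctl(kp_fd, EPOLL_CTL_ADD, fd, &ev);
    pthread_mutex_unlock(mtx);
    return res;
}

/* Poll-thread side: wait for one event, lock the slot, and return the id of
 * the port that should be signalled, or -1 if nothing was ready. */
static int sketch_poll_once(int kp_fd, int timeout_ms) {
    struct epoll_event ev;
    if (epoll_wait(kp_fd, &ev, 1, timeout_ms) != 1) return -1;

    unsigned ix = (unsigned)ev.data.fd % N_SLOTS;
    pthread_mutex_t *mtx = &stripe[ix % N_LOCKS];
    pthread_mutex_lock(mtx);
    int port_id = (slots[ix].fd == ev.data.fd) ? slots[ix].port_id : -1;
    pthread_mutex_unlock(mtx);
    return port_id;
}
```

Because of EPOLLONESHOT, an fd that has reported an event stays disarmed until it is re-armed with EPOLL_CTL_MOD, so a second `sketch_poll_once` call reports nothing for it even if data is still unread.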

Solaris/Windows/non-kernel poll

Basically the same flow happens. The major difference is that the actual insertion into the poll-set is done by the poll-thread itself instead of by the scheduler threads.
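
One way to picture this variant is a sketch in which scheduler threads only queue their requests under the per-poll-set lock, and the poll-thread applies them before it polls. Everything here is an assumption made for illustration: the names (`ps_request_add`, `ps_poll_once`), the fixed-size arrays, and the self-pipe used to wake the poller are hypothetical and not taken from this PR. It uses poll(2) and handles read interest only.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

#define MAX_FDS 16  /* hypothetical capacity */

static pthread_mutex_t ps_mtx = PTHREAD_MUTEX_INITIALIZER; /* per-poll-set lock */
static int pending[MAX_FDS];        /* fds queued by scheduler threads */
static int n_pending;
static struct pollfd set[MAX_FDS];  /* touched only by the poll-thread */
static int n_set;
static int wake[2] = { -1, -1 };    /* self-pipe used to interrupt poll() */

static int ps_init(void) {
    if (pipe(wake) != 0) return -1;
    set[0].fd = wake[0];
    set[0].events = POLLIN;
    n_set = 1;
    return 0;
}

/* Scheduler side: queue the request and wake the poll-thread. */
static int ps_request_add(int fd) {
    int ok = 0;
    pthread_mutex_lock(&ps_mtx);
    if (n_pending < MAX_FDS) { pending[n_pending++] = fd; ok = 1; }
    pthread_mutex_unlock(&ps_mtx);
    if (ok && write(wake[1], "w", 1) != 1) ok = 0;
    return ok ? 0 : -1;
}

/* Poll-thread side: apply queued requests, then poll.
 * Returns a readable fd, or -1 if none (timeout or wakeup only). */
static int ps_poll_once(int timeout_ms) {
    pthread_mutex_lock(&ps_mtx);
    for (int i = 0; i < n_pending && n_set < MAX_FDS; i++) {
        set[n_set].fd = pending[i];
        set[n_set].events = POLLIN;
        n_set++;
    }
    n_pending = 0;  /* requests beyond capacity are dropped in this sketch */
    pthread_mutex_unlock(&ps_mtx);

    if (poll(set, (nfds_t)n_set, timeout_ms) <= 0) return -1;
    if (set[0].revents & POLLIN) {  /* drain wakeup bytes */
        char buf[16];
        ssize_t n = read(wake[0], buf, sizeof buf);
        (void)n;
    }
    for (int i = 1; i < n_set; i++)
        if (set[i].revents & POLLIN) return set[i].fd;
    return -1;
}
```

The lock is held only while the queue is read or written, never across the poll() call, so a scheduler thread is not blocked for the duration of a poll.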


I've done quite a bit of benchmarking, mainly to make sure that we don't get any regressions when using this approach. The throughput and latency figures look promising, though I don't really have access to any great benchmarks or hardware to run them on. So if you have any socket IO intensive benchmarks, please run them and report back with the good or bad results.

Outstanding issues

There are a few testcases that still fail on various platforms due to testcase timing issues, and some new testcases have to be written for the new erts options.


@garazdawi garazdawi self-assigned this Aug 29, 2017

@garazdawi garazdawi requested a review from sverker Aug 29, 2017


vans163 commented Aug 30, 2017

Is it possible to lock the polling threads to specific logical cores?



garazdawi commented Aug 30, 2017

@vans163, at the moment it is not possible to bind the polling threads. I've been thinking that it may be a good thing to make this possible, but it is not implemented yet.

sverker and others added some commits Mar 23, 2017

erts: Increase number of check_io-fd-locks
and use correct cache alignment.
erts: Rename ErtsPollSet_ structS to not confuse gdb
by having different structs with same name.
erts: Refactor move check_io interface from sys to check_io
# Conflicts:
#	erts/emulator/beam/erl_process.c
#	erts/emulator/beam/sys.h
#	erts/emulator/sys/common/erl_check_io.c
#	erts/emulator/sys/common/erl_check_io.h
#	erts/emulator/sys/unix/sys.c
erts: Refactor check_io to use one static struct
that is shared between _kp and _nkp versions.
Makes it easier to access in debugger.
erts: temp_alloc can no longer be disabled
temp_alloc is used in such a way that if it ever results
in a malloc/free sequence it will slow down the system
a lot. So it will no longer be possible to disable it and
it will not be disabled when using +Mea min.

usable from any (managed?) thread.
erts: Optimize port_task quick allocator
for non scheduler threads by using ERTS_THR_PREF_QUICK_ALLOC_IMPL.
erts: Fix smp_select testcase to use ERL_DRV_USE
This is needed with the new poll-thread implementation
as now closed fd's in the pollset will be triggered much
faster than before.
erts: Remove eager check io
It is no longer relevant when using the poll thread
erts: disable kernel-poll on OS X vsn < 16
kqueue is broken on earlier versions of OS X.
kernel: Rewrite gen_udp_SUITE:read_packet tc
The old testcase did not test anything at all, it seems like
it was written with the non-smp emulator in mind.

@garazdawi garazdawi merged commit 0c6df1f into erlang:master Oct 2, 2017

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress

@mrallen1 mrallen1 referenced this pull request Oct 2, 2017


Discussion for Oct 6 #18
