eventfd: support delayed wakeup for non-semaphore eventfd to reduce cpu utilization

For the NON SEMAPHORE eventfd, if its counter has a nonzero value,
then a read(2) returns 8 bytes containing that value, and the counter's
value is reset to zero. Therefore, in the NON SEMAPHORE scenario, the
values of N eventfd writes can be consumed by ONE eventfd read.

However, the current implementation wakes up the read thread immediately
in eventfd_write, so the cpu utilization increases unnecessarily.

By adding a configurable delay after eventfd_write, these unnecessary
wakeup operations are avoided, thereby reducing cpu utilization.

We used the following test code:

    #include <assert.h>
    #include <errno.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <string.h>
    #include <poll.h>
    #include <sys/eventfd.h>
    #include <sys/prctl.h>

    /* Writer: adds to the eventfd counter in a tight loop. */
    void publish(int fd)
    {
        unsigned long long i = 0;
        int ret;

        prctl(PR_SET_NAME, "publish");
        while (1) {
            i++;
            ret = write(fd, &i, sizeof(i));
            if (ret < 0)
                printf("XXX: write error: %s\n", strerror(errno));
        }
    }

    /* Reader: waits for POLLIN, then drains the counter with one read. */
    void subscribe(int fd)
    {
        unsigned long long i = 0;
        struct pollfd pfds[1];
        int ret;

        prctl(PR_SET_NAME, "subscribe");
        pfds[0].fd = fd;
        pfds[0].events = POLLIN;

        usleep(10);
        while (1) {
            ret = poll(pfds, 1, -1);
            if (ret == -1)
                printf("XXX: poll error: %s\n", strerror(errno));
            if (pfds[0].revents & POLLIN)
                read(fd, &i, sizeof(i));
        }
    }

    int main(int argc, char *argv[])
    {
        pid_t pid;
        int fd;

        fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
        assert(fd >= 0);    /* eventfd() returns -1 on failure */

        pid = fork();
        if (pid == 0)
            subscribe(fd);
        else if (pid > 0)
            publish(fd);
        else {
            printf("XXX: fork error!\n");
            return -1;
        }

        return 0;
    }

    # taskset -c 2-3 ./a.out

The original cpu usage is as follows:

    07:02:55 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
    07:02:57 PM  all   16.43    0.00   16.28    0.16    0.00    0.00    0.00    0.00    0.00   67.14
    07:02:57 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
    07:02:57 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
    07:02:57 PM    2   29.21    0.00   34.83    1.12    0.00    0.00    0.00    0.00    0.00   34.83
    07:02:57 PM    3   51.97    0.00   48.03    0.00    0.00    0.00    0.00    0.00    0.00    0.00

    07:02:57 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
    07:02:59 PM  all   18.75    0.00   17.47    2.56    0.00    0.32    0.00    0.00    0.00   60.90
    07:02:59 PM    0    6.88    0.00    1.59    5.82    0.00    0.00    0.00    0.00    0.00   85.71
    07:02:59 PM    1    1.04    0.00    1.04    2.59    0.00    0.00    0.00    0.00    0.00   95.34
    07:02:59 PM    2   26.09    0.00   35.87    0.00    0.00    1.09    0.00    0.00    0.00   36.96
    07:02:59 PM    3   52.00    0.00   47.33    0.00    0.00    0.67    0.00    0.00    0.00    0.00

    07:02:59 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
    07:03:01 PM  all   16.15    0.00   16.77    0.00    0.00    0.00    0.00    0.00    0.00   67.08
    07:03:01 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
    07:03:01 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
    07:03:01 PM    2   27.47    0.00   36.26    0.00    0.00    0.00    0.00    0.00    0.00   36.26
    07:03:01 PM    3   51.30    0.00   48.70    0.00    0.00    0.00    0.00    0.00    0.00    0.00

Then setting the new control parameter, as follows:

    echo 5 > /proc/sys/fs/eventfd_wakeup_delay_msec

The cpu usage was observed to decrease by more than 20%
(cpu #2, %usr 26% -> 0.x%), as follows:

    07:03:01 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
    07:03:03 PM  all   10.31    0.00    8.36    0.00    0.00    0.00    0.00    0.00    0.00   81.34
    07:03:03 PM    0    0.00    0.00    1.01    0.00    0.00    0.00    0.00    0.00    0.00   98.99
    07:03:03 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
    07:03:03 PM    2    0.52    0.00    1.05    0.00    0.00    0.00    0.00    0.00    0.00   98.43
    07:03:03 PM    3   56.59    0.00   43.41    0.00    0.00    0.00    0.00    0.00    0.00    0.00

    07:03:03 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
    07:03:05 PM  all   10.61    0.00    7.82    0.00    0.00    0.00    0.00    0.00    0.00   81.56
    07:03:05 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
    07:03:05 PM    1    0.00    0.00    1.01    0.00    0.00    0.00    0.00    0.00    0.00   98.99
    07:03:05 PM    2    0.53    0.00    0.53    0.00    0.00    0.00    0.00    0.00    0.00   98.94
    07:03:05 PM    3   58.59    0.00   41.41    0.00    0.00    0.00    0.00    0.00    0.00    0.00

    07:03:05 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
    07:03:07 PM  all    8.99    0.00    7.25    0.72    0.00    0.00    0.00    0.00    0.00   83.04
    07:03:07 PM    0    0.00    0.00    1.52    2.53    0.00    0.00    0.00    0.00    0.00   95.96
    07:03:07 PM    1    0.00    0.00    0.50    0.00    0.00    0.00    0.00    0.00    0.00   99.50
    07:03:07 PM    2    0.54    0.00    0.54    0.00    0.00    0.00    0.00    0.00    0.00   98.92
    07:03:07 PM    3   57.55    0.00   42.45    0.00    0.00    0.00    0.00    0.00    0.00    0.00

Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dylan Yudaken <dylany@fb.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fu Wei <wefu@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
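
The coalescing behaviour the commit message relies on (N writes consumed by one read) can be checked from user space on an unpatched kernel. The following is a minimal sketch, not part of the patch: `coalesced_read()` is a hypothetical helper name chosen for illustration, and it only exercises the documented eventfd(2) semantics for a non-semaphore eventfd (no EFD_SEMAPHORE flag). It does not touch the proposed eventfd_wakeup_delay_msec sysctl.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/types.h>

/*
 * Hypothetical helper (illustration only, not from the patch):
 * write the values 1..n to a fresh non-semaphore eventfd, then issue a
 * single read(2) and return the value it yields.
 *
 * n must be >= 1: the fd is blocking, so a read on a zero counter
 * would never return.
 */
uint64_t coalesced_read(unsigned int n)
{
	uint64_t val;
	uint64_t sum = 0;
	ssize_t ret;
	int fd;

	/* No EFD_SEMAPHORE: read returns the whole counter and resets it. */
	fd = eventfd(0, EFD_CLOEXEC);
	assert(fd >= 0);

	/* Each write adds its 8-byte value to the kernel-side counter. */
	for (unsigned int k = 1; k <= n; k++) {
		val = k;
		ret = write(fd, &val, sizeof(val));
		assert(ret == (ssize_t)sizeof(val));
	}

	/* One read drains everything accumulated by the n writes. */
	ret = read(fd, &sum, sizeof(sum));
	assert(ret == (ssize_t)sizeof(sum));

	close(fd);
	return sum;
}
```

Calling `coalesced_read(3)` performs three writes (1, 2, 3) followed by one read. Because the reader only ever sees the accumulated sum, waking it once per write gives it no extra information, which is the observation the delayed wakeup builds on.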