CPU pinned / spinning on epoll_wait #53

Closed
igrigorik opened this issue Dec 19, 2010 · 11 comments

Haven't been able to identify the exact culprit, but out of the blue one of our beanstalk processes is pinning the CPU (system). Doing an strace shows:

clock_gettime(CLOCK_MONOTONIC, {3162510, 551700172}) = 0
epoll_wait(4, {{EPOLLIN, {u32=20, u64=20}}, {EPOLLIN, {u32=26, u64=26}}, {EPOLLIN, {u32=23, u64=23}}}, 32, 998) = 3
clock_gettime(CLOCK_MONOTONIC, {3162510, 551773172}) = 0
epoll_wait(4, {{EPOLLIN, {u32=23, u64=23}}, {EPOLLIN, {u32=26, u64=26}}, {EPOLLIN, {u32=20, u64=20}}}, 32, 998) = 3
clock_gettime(CLOCK_MONOTONIC, {3162510, 551844172}) = 0
epoll_wait(4, {{EPOLLIN, {u32=20, u64=20}}, {EPOLLIN, {u32=26, u64=26}}, {EPOLLIN, {u32=23, u64=23}}}, 32, 998) = 3
clock_gettime(CLOCK_MONOTONIC, {3162510, 551916172}) = 0
epoll_wait(4, {{EPOLLIN, {u32=23, u64=23}}, {EPOLLIN, {u32=26, u64=26}}, {EPOLLIN, {u32=20, u64=20}}}, 32, 998) = 3
clock_gettime(CLOCK_MONOTONIC, {3162510, 551987172}) = 0
epoll_wait(4, {{EPOLLIN, {u32=20, u64=20}}, {EPOLLIN, {u32=26, u64=26}}, {EPOLLIN, {u32=23, u64=23}}}, 32, 998) = 3
clock_gettime(CLOCK_MONOTONIC, {3162510, 552059172}) = 0
epoll_wait(4, {{EPOLLIN, {u32=23, u64=23}}, {EPOLLIN, {u32=26, u64=26}}, {EPOLLIN, {u32=20, u64=20}}}, 32, 998) = 3
clock_gettime(CLOCK_MONOTONIC, {3162510, 552131172}) = 0
epoll_wait(4, {{EPOLLIN, {u32=7, u64=7}}, {EPOLLIN, {u32=20, u64=20}}, {EPOLLIN, {u32=26, u64=26}}, {EPOLLIN, {u32=23, u64=23}}}, 32, 998) = 4
clock_gettime(CLOCK_MONOTONIC, {3162510, 552203172}) = 0

Running 1.4.6 -- anyone else experienced this?

lucky commented Feb 9, 2011

Haven't seen this. What version of libevent?

lucky commented Mar 11, 2011

As an update, I believe we've run into the same behavior. Note that this elevated CPU usage does not correspond with an increase in throughput. Mean reserves per second is 150, though it does spike to 1000/s for very short periods of time (a few minutes).

Here's what our CPU usage looked like before and after this problem cropped up.

We're running a 32-bit EC2 instance (a small):

  • Ubuntu 10.04.1 LTS
  • Ubuntu/Debian package of libevent 1.4.13-stable-1, libc6 (>= 2.4)
  • beanstalkd 1.4.6 (built from source)

CPU Info
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
stepping : 10
cpu MHz : 2666.760
cache size : 6144 KB

uname -a:
Linux queue-0 2.6.32-305-ec2 #9-Ubuntu SMP Thu Apr 15 04:14:01 UTC 2010 i686 GNU/Linux

ghost assigned kr Apr 10, 2011
kr (Member) commented May 8, 2011

We don't use libevent at all any more, so it's possible the bug is
simply gone. Even if not, it should be easier to fix now.

Can anyone reproduce this with recent code from git (since April 19)?
If so, please reopen the issue.

kr closed this as completed May 8, 2011
igrigorik (Author) commented

Keith, would you consider trunk to be "production quality" at this point? We still have the IO-pinned instance in prod, which I'd love to fix up, but given that I've had a very hard time trying to synthetically reproduce this behavior, I don't want to break other stuff while building from master. ;-)

kr (Member) commented May 8, 2011

Understood. Yes, I'd consider it production quality, but if you're
using the binlog, be aware that the format has changed (from
version 5 to 6) and I haven't yet written code to read version 5.
Also, it will change again (to at least version 7) before the next
release, and I wasn't planning on writing a routine to read version 6.

igrigorik (Author) commented

Ok, thanks, we'll give it a shot and let you know what we find out.

await commented Jun 10, 2011

Hi Keith, I compiled and deployed the tip of beanstalkd as of June 10, 2011, and I'm seeing the same behavior described in the bug. It's not completely pegging the CPU, but we're getting the following from an strace. I didn't see the same behavior on 1.4.6. We have a large number of clients doing a 'reserve-with-timeout'. Would that cause timestamp calls in a tight for-loop?

epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 547447}, NULL) = 0
gettimeofday({1307731157, 547522}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 553925}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 563925}, NULL) = 0
gettimeofday({1307731157, 564007}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 573923}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 583918}, NULL) = 0
gettimeofday({1307731157, 583975}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 598114}, NULL) = 0
gettimeofday({1307731157, 598163}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 603916}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 613894}, NULL) = 0
gettimeofday({1307731157, 613947}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 620005}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 633839}, NULL) = 0
gettimeofday({1307731157, 633897}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 643883}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 649118}, NULL) = 0
gettimeofday({1307731157, 649167}, NULL) = 0
epoll_wait(6, {}, 500, 10) = 0
gettimeofday({1307731157, 660758}, NULL) = 0
gettimeofday({1307731157, 660833}, NULL) = 0

kr (Member) commented Jun 15, 2011

This output looks normal. The process wakes up at least once
every 10ms to see if anything needs to be done (such as moving
a job from the delay queue to the ready queue).

Beanstalkd used to compute the exact amount of time to wait,
so you'd see absolutely no activity when nothing needs to be
done. I changed it to wake up at a fixed period while refactoring
the networking code and removing libevent, because doing it
this way is easier to get right. Eventually I'd like to switch back
to an exact calculation of the sleep time, but that's pretty far
down on the priority list, since the performance impact is tiny,
and it only makes a difference when the system is idle.

kr reopened this Feb 5, 2012
kr (Member) commented Feb 5, 2012

See https://groups.google.com/d/topic/beanstalk-talk/MZNodUgzGHM for
further discussion. I suspect it's the same problem.

Edit: fixed link.

chexov commented Feb 7, 2012

Just to keep it in the GitHub issue tracker, here's my current situation with the event loop on beanstalkd 1.5 on OpenBSD (compiled with port-bsd.c, sock-bsd.c).

Here's the loop eating up to 100% CPU (one core):

CALL kevent(0x6,0,0,0x7f7ffffc5f80,0x1,0x829e80)
RET kevent 1
CALL gettimeofday(0x7f7ffffc5f50,0)
RET gettimeofday 0
CALL read(0x1d,0x207db7854,0xd0)
RET read -1 errno 35 Resource temporarily unavailable
CALL kevent(0x6,0,0,0x7f7ffffc5f80,0x1,0x829e80)
RET kevent 1
CALL gettimeofday(0x7f7ffffc5f50,0)
RET gettimeofday 0
CALL read(0x1b,0x207db7c54,0xd0)
RET read -1 errno 35 Resource temporarily unavailable
CALL kevent(0x6,0,0,0x7f7ffffc5f80,0x1,0x829e80)
RET kevent 1
CALL gettimeofday(0x7f7ffffc5f50,0)
RET gettimeofday 0
CALL kevent(0x6,0,0,0x7f7ffffc5f80,0x1,0x829e80)
RET kevent 1
CALL gettimeofday(0x7f7ffffc5f50,0)
RET gettimeofday 0
CALL kevent(0x6,0,0,0x7f7ffffc5f80,0x1,0x829e80)
RET kevent 1
CALL gettimeofday(0x7f7ffffc5f50,0)
RET gettimeofday 0
CALL read(0x10,0x20cd73054,0xd0)
RET read -1 errno 35 Resource temporarily unavailable
CALL kevent(0x6,0,0,0x7f7ffffc5f80,0x1,0x829e80)
RET kevent 1
CALL gettimeofday(0x7f7ffffc5f50,0)
RET gettimeofday 0
CALL kevent(0x6,0,0,0x7f7ffffc5f80,0x1,0x829e80)
RET kevent 1

kr (Member) commented Feb 13, 2012

This was separately reported as #96, and it's fixed now.

kr closed this as completed Feb 13, 2012