Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not install a file event to send data to rewrite child when parent stop sending diff to child in aof rewrite. #8767

Merged
merged 3 commits into from
Jul 11, 2021

Conversation

huangzhw
Copy link
Collaborator

Do not install a file event to send data to rewrite child when parent
stop sending diff to child in aof rewrite.

In aof rewrite, when parent stop sending data to child, if there is
new rewrite data, aofChildWriteDiffData write event will be installed.
Then this event is issued and deletes the file event without do anyting.
This will happen over and over again until aof rewrite finish.

stop sending diff to child in aof rewrite.

In aof rewrite, when parent stop sending data to child, if there is
new rewrite data, aofChildWriteDiffData write event will be installed.
Then this event is issued and deletes the file event without do anyting.
This will happen over and over again until aof rewrite finish.
@huangzhw huangzhw requested a review from yossigo June 3, 2021 12:03
@huangzhw huangzhw requested review from oranagra and removed request for yossigo July 8, 2021 11:35
@oranagra
Copy link
Member

WOW... that's bad.. a busy loop though the kernel eating 100% CPU.
i suppose it was like that since forever.. @huangzhw how did you notice it?

i wonder if we should maybe improve the tests to detect it?
i.e. similar to what i did in replication.tcl (compute_cpu_usage) to measure stime.
although unlike that diskless replication code, which is quite complicated, this one is quite simple, and i don't imagine we'll ever have a regression here, so maybe instead we should try to think of some generic mechanism that can detect these.
@yossigo do you have an idea?

@oranagra oranagra added the state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten label Jul 11, 2021
@oranagra oranagra added this to To do in 6.2 Backport via automation Jul 11, 2021
@oranagra oranagra added this to To Do in 6.0 Backport via automation Jul 11, 2021
@huangzhw
Copy link
Collaborator Author

@oranagra I read the code to find it.
But why a busy loop though the kernel eating 100% CPU?
This event is just installed once in an event loop. Then the event handler is executed, the event is removed. Just some useless event but not so severe. If there are no other commands written to aof, this event will not be installed.

@yossigo
Copy link
Member

yossigo commented Jul 11, 2021

@oranagra The only thing I can think of is extend ae to track and report event creation/deletion, theoretically we could then use this info to find unexpectedly large number of operations on given fds in a given time. Practically I doubt if this effort would be very effective - I'm not in favor of it.

@oranagra
Copy link
Member

@huangzhw you're right, it is not as severe as i thought... it just reminded me similar cases we had in the past that could stay in a loop due to epoll being level triggered, getting an event, exiting without doing nothing and then getting it again.

in this case, for each time we wake up due to a socket event and get some write command, we'll wake up just one additional time to do nothing. and each of these is paired with two additional epoll_ctl calls, so in total 3 excessive system calls per event.

@oranagra oranagra merged commit cb961d8 into redis:unstable Jul 11, 2021
@huangzhw huangzhw deleted the aof-rewrite branch July 11, 2021 12:19
x77a1 pushed a commit to x77a1/redis that referenced this pull request Jul 11, 2021
… stop sending diff to child in aof rewrite. (redis#8767)

In aof rewrite, when parent stop sending data to child, if there is
new rewrite data, aofChildWriteDiffData write event will be installed.
Then this event is issued and deletes the file event without do anyting.
This will happen over and over again until aof rewrite finish.

This bug used to waste a few system calls per excessive wake-up
(epoll_ctl and epoll_wait) per cycle, each cycle triggered by receiving
a write command from a client.
oranagra pushed a commit to oranagra/redis that referenced this pull request Jul 20, 2021
… stop sending diff to child in aof rewrite. (redis#8767)

In aof rewrite, when parent stop sending data to child, if there is
new rewrite data, aofChildWriteDiffData write event will be installed.
Then this event is issued and deletes the file event without do anyting.
This will happen over and over again until aof rewrite finish.

This bug used to waste a few system calls per excessive wake-up
(epoll_ctl and epoll_wait) per cycle, each cycle triggered by receiving
a write command from a client.

(cherry picked from commit cb961d8)
oranagra pushed a commit to oranagra/redis that referenced this pull request Jul 21, 2021
… stop sending diff to child in aof rewrite. (redis#8767)

In aof rewrite, when parent stop sending data to child, if there is
new rewrite data, aofChildWriteDiffData write event will be installed.
Then this event is issued and deletes the file event without do anyting.
This will happen over and over again until aof rewrite finish.

This bug used to waste a few system calls per excessive wake-up
(epoll_ctl and epoll_wait) per cycle, each cycle triggered by receiving
a write command from a client.

(cherry picked from commit cb961d8)
oranagra pushed a commit that referenced this pull request Jul 21, 2021
… stop sending diff to child in aof rewrite. (#8767)

In aof rewrite, when parent stop sending data to child, if there is
new rewrite data, aofChildWriteDiffData write event will be installed.
Then this event is issued and deletes the file event without do anyting.
This will happen over and over again until aof rewrite finish.

This bug used to waste a few system calls per excessive wake-up
(epoll_ctl and epoll_wait) per cycle, each cycle triggered by receiving
a write command from a client.

(cherry picked from commit cb961d8)
oranagra pushed a commit that referenced this pull request Jul 21, 2021
… stop sending diff to child in aof rewrite. (#8767)

In aof rewrite, when parent stop sending data to child, if there is
new rewrite data, aofChildWriteDiffData write event will be installed.
Then this event is issued and deletes the file event without do anyting.
This will happen over and over again until aof rewrite finish.

This bug used to waste a few system calls per excessive wake-up
(epoll_ctl and epoll_wait) per cycle, each cycle triggered by receiving
a write command from a client.

(cherry picked from commit cb961d8)
@oranagra oranagra moved this from To Do to In progress in 6.0 Backport Jul 22, 2021
@oranagra oranagra moved this from In progress to Done in 6.0 Backport Jul 22, 2021
@oranagra oranagra moved this from To Do to Done in 6.2 Backport Jul 22, 2021
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
… stop sending diff to child in aof rewrite. (redis#8767)

In aof rewrite, when parent stop sending data to child, if there is
new rewrite data, aofChildWriteDiffData write event will be installed.
Then this event is issued and deletes the file event without do anyting.
This will happen over and over again until aof rewrite finish.

This bug used to waste a few system calls per excessive wake-up
(epoll_ctl and epoll_wait) per cycle, each cycle triggered by receiving
a write command from a client.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten
Projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants