Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix module assertion crash when timer and timeout are unlocked in the same event loop #13015

Merged
merged 2 commits into from Jan 31, 2024

Conversation

enjoy-binbin
Copy link
Collaborator

@enjoy-binbin enjoy-binbin commented Jan 30, 2024

When we use a timer to unblock a client in module, if the timer
period and the block timeout are very close, they will unblock the
client in the same event loop, and it will trigger the assertion.
The reason is that in moduleBlockedClientTimedOut we will protect
against re-processing, so we don't actually call updateStatsOnUnblock
(see #12817), so we are not able to reset the c->duration.

The reason is unblockClientOnTimeout() didn't realize that bc had
been unblocked. We add a function to the module to determine if bc
is blocked, and then use it in unblockClientOnTimeout() to exit.

There is the stack:

beforeSleep
blockedBeforeSleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
unblockClient
resetClient
-- assertion, crash the server
'c->duration == 0' is not true

@enjoy-binbin
Copy link
Collaborator Author

enjoy-binbin commented Jan 30, 2024

second commit:

=== REDIS BUG REPORT START: Cut & paste starting from here ===
44562:M 30 Jan 2024 22:24:25.784 # === ASSERTION FAILED ===
44562:M 30 Jan 2024 22:24:25.784 # ==> networking.c:2089 'c->duration == 0' is not true

------ STACK TRACE ------

Backtrace:
0   redis-server                        0x000000010ad9b5fe resetClient + 238
1   redis-server                        0x000000010ae402ba unblockClient + 394
2   redis-server                        0x000000010ae40fae unblockClientOnTimeout + 78
3   redis-server                        0x000000010ae9b356 checkBlockedClientTimeout + 86
4   redis-server                        0x000000010ae9b82f handleBlockedClientsTimeout + 239
5   redis-server                        0x000000010ae410d9 blockedBeforeSleep + 9
6   redis-server                        0x000000010ad6e467 beforeSleep + 279
7   redis-server                        0x000000010ad5fb6f aeProcessEvents + 159
8   redis-server                        0x000000010ad604cf aeMain + 63
9   redis-server                        0x000000010ad7ee22 main + 4066
10  dyld                                0x00007ff817c9c41f start + 1903

------ STACK TRACE DONE ------

@enjoy-binbin
Copy link
Collaborator Author

the second crash seem introduced in #12817, in the test case, we did not clear the duration.
@oranagra i am short on time, please suggest a fix (if you have an idea soon)

    if (!from_module)
        updateStatsOnUnblock(c, bc->background_duration, 0, server.stat_total_error_replies != prev_error_replies);

@oranagra
Copy link
Member

i think we can give up the assertion in resetClient. @sundb PTAL and tell us if you have a better idea.

@sundb
Copy link
Collaborator

sundb commented Jan 31, 2024

@enjoy-binbin Not sure it's a good idea to remove this assertion, i need some time to recall it.

@enjoy-binbin
Copy link
Collaborator Author

enjoy-binbin commented Jan 31, 2024

Not sure it's a good idea to remove this assertion, i need some time to recall it.

sure, take your time, i just gona push the commit first and see how the Daily situation. https://github.com/enjoy-binbin/redis/actions/runs/7720486045

please also take a look at the top comment, see if i get it right.

@enjoy-binbin enjoy-binbin changed the title Fix two crash around module block Fix module two unblock crash: no timeout_callback crash and c->duration assertion crash Jan 31, 2024
@sundb
Copy link
Collaborator

sundb commented Jan 31, 2024

eventloop()
  - timer_callback()
    - RM_UnblockClient()
      - moduleUnblockClientByHandle() /* set bc->unblocked to 1, and add bc to moduleUnblockedClients */
  - handleBlockedClientsTimeout()
    - replyToBlockedClientTimedOut()
      - moduleBlockedClientTimedOut() /* since bc->unblocked is 1, return ASAP and doesn't call  
                                       * updateStatsOnUnblock() */
    - unblockClient() /* crash due to not calling updateStatsOnUnblock() which will
                           reset c->duration */

The main reason for this is that bc was unblocked when the timer timeout, both timeouts were triggered at the same time in the same event loop, but unblockClientOnTimeout() didn't realize that bc had been unblocked.
We can add an method to the module to determine if bc is blocked, and then use it in unblockClientOnTimeout() to exit ASAP.
@oranagra @enjoy-binbin WDYT?

@enjoy-binbin
Copy link
Collaborator Author

The main reason for this is that bc was unblocked when the timer timeout, both timeouts were triggered at the same time in the same event loop, but unblockClientOnTimeout() didn't realize that bc had been unblocked.
We can add an method to the module to determine if bc is blocked, and then use it in unblockClientOnTimeout() to exit ASAP.

great, this is consistent with what i understood in the top comment, and the fix sounds like a good idea

@oranagra
Copy link
Member

ok. good.
maybe fix it in two separate PRs, otherwise i'll have to avoid squash-merging it, or replace the references to first commit and second commit with references to functions.

@enjoy-binbin enjoy-binbin changed the title Fix module two unblock crash: no timeout_callback crash and c->duration assertion crash Fix module assertion crash when timer and timeout are unlocked in the same event loop Jan 31, 2024
@enjoy-binbin
Copy link
Collaborator Author

src/server.h Outdated Show resolved Hide resolved
src/blocked.c Outdated Show resolved Hide resolved
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ready for merge?

@enjoy-binbin
Copy link
Collaborator Author

ready for merge?

yes, i think so

@oranagra oranagra merged commit 6016973 into redis:unstable Jan 31, 2024
13 checks passed
@enjoy-binbin enjoy-binbin deleted the fix_module_block branch January 31, 2024 11:11
roggervalf pushed a commit to roggervalf/redis that referenced this pull request Feb 11, 2024
… same event loop (redis#13015)

When we use a timer to unblock a client in module, if the timer
period and the block timeout are very close, they will unblock the
client in the same event loop, and it will trigger the assertion.
The reason is that in moduleBlockedClientTimedOut we will protect
against re-processing, so we don't actually call updateStatsOnUnblock
(see redis#12817), so we are not able to reset the c->duration. 

The reason is unblockClientOnTimeout() didn't realize that bc had
been unblocked. We add a function to the module to determine if bc
is blocked, and then use it in unblockClientOnTimeout() to exit.

There is the stack:
```
beforeSleep
blockedBeforeSleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
unblockClient
resetClient
-- assertion, crash the server
'c->duration == 0' is not true
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants