Fail fast when systemic error occurs in poll #8749
AFAIK, we don't deal with EINTR in more than one place; I remember we once discussed that |
I don't quite get this point, would you share more details about it? |
I think this is wrong (and it's also wrong in the existing code in ae_evport.c).
it doesn't really handle EINTR.
EINTR is actually already implicitly handled by the caller (aeProcessEvents): it'll do a before/after sleep cycle and go back to sleep without processing any file events.
what this does is terminate the process on any other error (so the title is misleading).
i suppose it may be a good idea to do that, but not in this way.
the perror is likely to go nowhere (if redis is daemonized).
i suppose that a better way to deal with that is to let the caller check for the negative return value and errno, and handle it.
but actually the caller (aeProcessEvents) doesn't have access to serverLog these days either.
considering the implications of the missing error check (if there are any?), i'm not sure this is worth fixing.
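To make the "let the caller check the negative return value and errno" idea concrete, here is a minimal sketch. It is not code from ae.c: `pollOutcome`, `classifyPollResult` and `handlePollResult` are hypothetical names, and the message buffer stands in for whatever logging the caller actually has access to.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome codes for this sketch; they are not part of ae.c. */
typedef enum { POLL_OK, POLL_RETRY, POLL_FATAL } pollOutcome;

/* Classify what a poll backend returned. `retval` is the backend's return
 * value and `err` is errno captured immediately after the call. */
static pollOutcome classifyPollResult(int retval, int err) {
    if (retval >= 0) return POLL_OK;
    if (err == EINTR) return POLL_RETRY; /* interrupted: just poll again */
    return POLL_FATAL;
}

/* Caller-side handling: returns the number of events to process (0 for a
 * retryable error), or -1 after writing a message into `msg` that the
 * caller can pass to its own logger. */
static int handlePollResult(int retval, int err, char *msg, size_t msglen) {
    switch (classifyPollResult(retval, err)) {
    case POLL_OK:
        return retval;
    case POLL_RETRY:
        return 0;
    default:
        snprintf(msg, msglen, "poll failed: %s", strerror(err));
        return -1;
    }
}
```

The point of the split is that the backend only reports, and the decision to log or terminate stays with a caller that knows where log output should go.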
You are right about the title, changed it. As for this fix, I think Redis ought to fail fast when systemic errors (which are irreparable) occur and notify users about them; otherwise, it will just fail over and over again without being dealt with properly |
you are right.. if such an error ever happens, the user will see a busy loop, and no log line or anything that can explain what happened.
i'm not sure there are actually any such errors in reality, and terminating the server on a temporary error would lead to a regression, which will take time to fix and re-release. but anyway, the current solution of using |
Maybe set up a maximum number of error retries? We would only terminate the redis server when it exceeds that limit. |
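A rough sketch of what such a retry limit could look like, purely for illustration. The tracker struct, the function name and the limit of 3 are all assumptions, not anything proposed in the patch itself.

```c
#include <errno.h>

/* Hypothetical limit chosen for illustration only. */
#define MAX_CONSECUTIVE_POLL_ERRORS 3

/* Hypothetical per-event-loop state for counting failures. */
typedef struct pollErrorTracker {
    int consecutive_errors;
} pollErrorTracker;

/* Record one poll result. Returns 1 when the caller should give up and
 * terminate, 0 when it should keep going. */
static int pollErrorTrackerUpdate(pollErrorTracker *t, int retval, int err) {
    if (retval >= 0) {
        t->consecutive_errors = 0; /* a successful poll resets the count */
        return 0;
    }
    if (err == EINTR || err == EAGAIN) return 0; /* benign, not counted */
    t->consecutive_errors++;
    return t->consecutive_errors > MAX_CONSECUTIVE_POLL_ERRORS;
}
```

Even in this small form the approach adds state that has to live somewhere in the event loop, which is the complexity cost to weigh against simply failing on the first unexpected error.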
then we use |
i'm not sure we can easily distinguish between a transient error and a repeated one, without adding too much code complexity which honestly i don't think is worth it. regarding the error log, yes, we need to propagate the error upwards and print it there to serverLog. |
So we just include "server.h" in |
i don't think we wanna include server.h in ae.c.
@yossigo WDYT? |
@oranagra I think |
@yossigo note that in ziplist.c and few other places we now include |
So actually we've already broken some boundaries in Redis? |
yes. but including redisassert.h is not like including server.h.. it just exposes panic and assert. |
i don't know why we put energy on that, is there a bad case? |
I agree both options are ok. |
@panjf2000 the link errors are probably easily solvable. the current code may also be ok, maybe a bit more lines, and less hermetic, since some callers may ignore the error, and/or need to know exactly which errors skip, like what aeMain does... |
OK, I will try to fix those link errors first. |
BTW, where exactly should I put those two function symbols in redis-cli and redis-benchmark? I don't want to put it in the wrong place that you're not gonna like it. |
if it were me, i would have probably tried to put them at the very bottom of each C file (below if that's more than one or two line per function, and this code gets complex, maybe we wanna have it shared between them, so maybe cli_common.c, or create a redisassert.c |
0e65802 to 061e437
061e437 to e7be49e
e7be49e to d0944fa
Most of the ae.c backends didn't explicitly handle errors, and instead ignored all errors and did an implicit retry. This is desired for EAGAIN and EINTR, but in case of other systemic errors, we prefer to fail and log the error we got rather than get into a busy loop.
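The behaviour described in that message can be sketched against plain `poll(2)`. This is an illustration under assumptions, not the code from the commit: `sketchPanic` and `sketchPollOnce` are hypothetical names, and the panic helper here just prints to stderr and aborts.

```c
#include <errno.h>
#include <poll.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a panic helper: print the message and abort. */
static void sketchPanic(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    abort();
}

/* Poll once. Returns the number of ready descriptors, or 0 when the call
 * was interrupted or should simply be retried. Any other error aborts the
 * process instead of being silently retried in a busy loop. */
static int sketchPollOnce(struct pollfd *fds, nfds_t nfds, int timeout_ms) {
    int retval = poll(fds, nfds, timeout_ms);
    if (retval >= 0) return retval;
    if (errno == EINTR || errno == EAGAIN) return 0;
    sketchPanic("sketchPollOnce: poll, %s", strerror(errno));
    return -1; /* not reached */
}
```

Returning 0 for EINTR and EAGAIN keeps the implicit retry: the caller sees no events, finishes its cycle, and polls again.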
ae_evport.c handled EINTR: redis/src/ae_evport.c, lines 288 to 296 in fb66e2e.
Therefore we should also do that in the other poll backends.