*: optimize auth/etcdserver logs to facilitate troubleshooting data inconsistency #11670
Codecov Report
@@ Coverage Diff @@
## master #11670 +/- ##
==========================================
- Coverage 66.65% 65.83% -0.82%
==========================================
Files 402 402
Lines 36655 36673 +18
==========================================
- Hits 24432 24145 -287
- Misses 10734 11020 +286
- Partials 1489 1508 +19
The content LGTM. The commit logs should be `auth: ...`. etcd's commit log format is `<package name>: <a short description of the commit>`.
We encountered another serious data inconsistency issue and can confirm that it has nothing to do with the auth revision. We are trying to reproduce it, but the etcd log contains no useful information, so it is better to print a warning log when a request fails to apply. I have added that in this PR.
@@ -132,6 +132,9 @@ func (a *applierV3backend) Apply(r *pb.InternalRaftRequest) *applyResult {
	ar := &applyResult{}
	defer func(start time.Time) {
		warnOfExpensiveRequest(a.s.getLogger(), start, &pb.InternalRaftStringer{Request: r}, ar.resp, ar.err)
		if ar.err != nil {
I wonder how much this affects server throughput. Maybe log at debug level instead?
The logging overhead will be introduced only for failed requests, so I feel adding the warning log would help troubleshooting.
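The point about overhead can be illustrated with a minimal, self-contained sketch. It is not the etcd code: `applyResult` is reduced to a single field, `apply` and `warnCount` are hypothetical names, and the standard `log` package stands in for etcd's structured logger. The deferred function only reaches the logging call when the result carries an error, so successful requests pay for one nil check and nothing more.

```go
package main

import (
	"errors"
	"log"
)

// applyResult is a stripped-down stand-in for the type of the same name in the diff.
type applyResult struct {
	err error
}

// warnCount counts emitted warnings so the behavior can be observed (hypothetical).
var warnCount int

// apply is a hypothetical function mimicking the shape of the Apply method:
// the deferred closure logs a warning only if the request failed.
func apply(fail bool) *applyResult {
	ar := &applyResult{}
	defer func() {
		if ar.err != nil {
			warnCount++
			log.Printf("warn: failed to apply request: %v", ar.err)
		}
	}()
	if fail {
		ar.err = errors.New("simulated apply failure")
	}
	return ar
}

func main() {
	apply(false) // success: nothing is logged
	apply(true)  // failure: one warning is logged
	log.Printf("warnings emitted: %d", warnCount)
}
```

Whether a nil check per request is negligible for a given deployment is an assumption that would need to be measured.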
FYI, there is a known data inconsistency bug which is fixed by #11613.
We have ruled out that bug and have reproduced the issue we hit; we are investigating further.
LGTM, defer to @jingyih
LGTM
There are three minor changes:
1. Add a warning log for troubleshooting when the auth revision is inconsistent (currently, no log is printed when a node fails to apply a command because of an inconsistent auth revision). For example:
17:04:56 etcd1 | {"level":"warn","ts":"2020-03-03T17:04:56.173+0800","caller":"auth/store.go:856","msg":"request auth revision is less than current node auth revision","auth store revision":23,"request auth revision":19,"request key":"hello","error":"auth: revision in header is old"}
2. Do not save consistentIndex in NewAuthStore. No command is executed there, and saving it causes an error log to be printed when starting an empty etcd cluster.
3. Print a warning log when a request fails to apply.
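For readers unfamiliar with the auth revision check, the first change can be sketched roughly as follows. This is a simplified illustration, not the code in `auth/store.go`: `checkAuthRevision` is a hypothetical helper, the error variable is a local stand-in carrying the message from the log line above, and the standard `log` package replaces etcd's structured logger.

```go
package main

import (
	"errors"
	"log"
)

// ErrAuthOldRevision is a local stand-in whose text matches the "error"
// field in the example log line.
var ErrAuthOldRevision = errors.New("auth: revision in header is old")

// checkAuthRevision is a hypothetical helper: it rejects a request whose auth
// revision is older than the store's and logs a warning with both revisions,
// so the rejection can be traced afterwards.
func checkAuthRevision(storeRev, reqRev uint64, key string) error {
	if reqRev < storeRev {
		log.Printf("warn: request auth revision is less than current node auth revision; "+
			"auth store revision=%d request auth revision=%d request key=%q error=%q",
			storeRev, reqRev, key, ErrAuthOldRevision)
		return ErrAuthOldRevision
	}
	return nil
}

func main() {
	// Same values as in the example log line: store at 23, request at 19.
	if err := checkAuthRevision(23, 19, "hello"); err != nil {
		log.Println("request rejected:", err)
	}
}
```

Without the warning, the request would fail with the same error but leave no trace of which revisions were involved, which is the gap this change addresses.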