Properly stop logger during (re)connect failure #82
The motivation is reasonable, so I'm in favor of fixing that problem.
CI is fixed, but I'm still doing some manual testing to make sure everything is OK. I'll try to add more test cases 😉
@NiR- Great! Please let me know when you have finished that.
Any news on this? I'm experiencing the same issue with
@angulito We first want to merge #83, which introduces a new way of testing edge cases like the one identified and fixed by this PR. The new test cases and the fix were first submitted in a single PR, but since the diff was quite large and important, I split it into two PRs to make it easier for the repo's maintainer(s) to review and accept these changes. Hopefully this will be included in the next major Docker release. In the meantime, I created a logger plugin including this fix. It works like the original fluentd logger and is available in akerouanton/fluentd-async-logger.
Thank you for working hard on it! @akerouanton
Hey @tagomoris, I took a look at this PR in the past days and I finally found out why tests were failing on the CI. I also overhauled the previous implementation to make the code substantially simpler.
CI was failing because of tests with
On the overhaul part, I decided to remove the newly introduced
This PR is ready to be re-reviewed 🙂
I'll try to find time to review this change this week or next.
fluent/fluent.go
// Unfortunately, using a seeded random number generator isn't enough to ensure
// ack hashes are deterministically generated during tests, thus we need to
// change randomGenerator value to ensure the hashes are stable during tests
// such that we can expect how the logger behaves with RequestAck option enabled.
How about replacing this comment with something like "make this a var so the random generator can be replaced in tests",
and moving the content of the above comment to the test file?
The content is a little complex, and it is almost entirely about testing.
I took a glance at this, and need more time.
I reviewed the updated change. The clear problem is that this code acquires the connection lock while writing messages and waiting for the ack message. This causes terrible performance degradation if the ack option is enabled. Even without the ack option, the critical section for writing messages could cause performance problems. And I found another problem (not introduced by this change): ack messages may arrive out of order, but the current implementation doesn't consider such cases. 🤔
Hi! I'd like to know if you were able to move forward with this issue. We're still facing the problem described in moby/moby#40063. Thanks!
We are having this issue as well. Do we know if there's anything preventing a merge?
Hey @tagomoris,
Hey @akerouanton,
@hakashya Unordered ack messages are not a problem introduced by this change. Of course, it would be good if we could solve it.
@akerouanton @tagomoris Hi, I'm Wesley, from the AWS containers team, and I'm also the AWS maintainer of fluent/fluent-bit. We need to get this change out since it fixes the linked Moby/Docker issue. My proposal is that I will take over this fix. I will preserve @akerouanton's commit so that they get credit. And then add commits on top of it as needed to get this PR to a merge-able state. To confirm, the issues blocking merge are:
If I fix those, can we merge it? I will begin work on this on Wednesday, September 29th.
I'm totally OK with merging a change written by two contributors (it's almost the same as two independent pull requests) if it solves the problems.
I can rename the current
Btw, I did not move
This is due to how
// SetWriteDeadline sets the deadline for future Write calls
// and any currently-blocked Write call.
// Even if write times out, it may return n > 0, indicating that
// some of the data was successfully written.
// A zero value for t means Write will not time out.
SetWriteDeadline(t time.Time) error
Of course, this is limited to use cases where many goroutines are used to send logs, but I prefer to point it out.
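The quoted documentation says a new deadline also applies to "any currently-blocked Write call". The sentence introducing the quote is cut off in this capture, so the following is only a minimal sketch of that documented behavior, not of the library's code: one goroutine blocks in Write on an in-memory `net.Pipe` (nobody reads the other end), and another goroutine sets a deadline in the past. The helper name `blockedWriteTimesOut` is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// blockedWriteTimesOut starts a Write that cannot complete because nobody
// reads the other end of the pipe, then sets a write deadline from a different
// goroutine. It reports whether the Write ended with a deadline error.
func blockedWriteTimesOut() (bool, error) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	errCh := make(chan error, 1)
	go func() {
		_, err := client.Write([]byte("log line"))
		errCh <- err
	}()

	// Give the writer time to block, then move the deadline into the past.
	// The deadline is shared by every goroutine using this connection.
	time.Sleep(20 * time.Millisecond)
	if err := client.SetWriteDeadline(time.Now().Add(-time.Second)); err != nil {
		return false, fmt.Errorf("setting deadline: %w", err)
	}
	return errors.Is(<-errCh, os.ErrDeadlineExceeded), nil
}

func main() {
	timedOut, err := blockedWriteTimesOut()
	fmt.Println(timedOut, err)
}
```

Because the deadline belongs to the connection rather than to a single call, a goroutine that sets it changes the timeout of writes already in flight on other goroutines.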
PR fluent#77 introduced a new parameter named ForceStopAsyncSend. It can be used to tell the logger not to try to send all the log messages in its buffer before closing. Without this parameter, the logger hangs whenever it has logs to write and the target Fluentd server is down.
But this new parameter is not enough: the connection is currently lazily initialized when the logger receives its first message. This blocks the select reading messages from the queue (until the connection is ready). Moreover, the connection dialing uses an exponential back-off retry. Because of that, the logger won't look for messages on the `stopRunning` channel (the channel introduced by PR fluent#77), either because it's blocked by the Sleep used for the retry or because the connection dialing is waiting until the dialing timeout (e.g. TCP timeout).
To fix these edge cases, the time.Sleep() used for back-off retry has been transformed into a time.After(). Moreover, the dialer.Dial() call used to connect to Fluentd has been replaced with dialer.DialContext() and a cancellable context is now used to stop the call to that method. The associated cancel function is stored in Fluent and is called by Close() when the ForceStopAsyncSend option is enabled.
Finally, the Fluent.run() method has been adapted to wait for both new messages on f.pending and the stop signal sent to the f.stopRunning channel. Previously, both channels were selected independently, in a procedural fashion.
This fix is motivated by the issue described in: moby/moby#40063.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
I moved the body of the
My 👀 couldn't find any clear problems about locking/consistency on this change. Great work!
I left some comments about comments and naming.
fluent/fluent.go
// Here, we don't want to retry the write since connectOrRetry already
// retries Config.MaxRetry times to connect.
👍
Using deferred calls to unlock the muconn mutex ensures the mutex doesn't stay locked if the lib's users have a panic-recovery mechanism in place and f.connect(), f.conn.Write() or f.close() panics. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
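The reasoning in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `guardedConn`, `write`, and `safeWrite` are hypothetical names, and only the mutex name `muconn` comes from the commit message. The caller recovers from a panic raised while the lock is held, then writes again; without the deferred unlock the second call would block forever.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedConn is a hypothetical stand-in for a logger whose connection is
// protected by a mutex.
type guardedConn struct {
	muconn sync.Mutex
	writes int
}

// write locks muconn and releases it with defer, so the mutex is unlocked
// even when fn panics.
func (g *guardedConn) write(fn func()) {
	g.muconn.Lock()
	defer g.muconn.Unlock()
	fn()
	g.writes++
}

// safeWrite mimics a library user that recovers from panics and turns them
// into errors.
func safeWrite(g *guardedConn, fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	g.write(fn)
	return nil
}

func main() {
	g := &guardedConn{}
	fmt.Println(safeWrite(g, func() { panic("write failed") }))
	// This second call would deadlock if the first one had left muconn locked.
	fmt.Println(safeWrite(g, func() {}))
	fmt.Println(g.writes)
}
```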
LGTM!
Done. Thank you for the great work!
Updates the fluent logger library to v1.7.0. Following PRs were merged since last bump: * [Add callback for error handling when using async](fluent/fluent-logger-golang#97) * [Fix panic when accessing unexported struct field](fluent/fluent-logger-golang#99) * [Properly stop logger during (re)connect failure](fluent/fluent-logger-golang#82) Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Updates the fluent logger library to v1.7.0. Following PRs were merged since last bump: * [Add callback for error handling when using async](fluent/fluent-logger-golang#97) * [Fix panic when accessing unexported struct field](fluent/fluent-logger-golang#99) * [Properly stop logger during (re)connect failure](fluent/fluent-logger-golang#82) See https://github.com/fluent/fluent-logger-golang/compare/v1.6.1..v1.7.0 Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Updates the fluent logger library to v1.8.0. Following PRs/commits were merged since last bump: * [Add callback for error handling when using async](fluent/fluent-logger-golang#97) * [Fix panic when accessing unexported struct field](fluent/fluent-logger-golang#99) * [Properly stop logger during (re)connect failure](fluent/fluent-logger-golang#82) * [Support a TLS-enabled connection](fluent/fluent-logger-golang@e5d6aa1) See https://github.com/fluent/fluent-logger-golang/compare/v1.6.1..v1.8.0 Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Updates the fluent logger library to v1.8.0. Following PRs/commits were merged since last bump: * [Add callback for error handling when using async](fluent/fluent-logger-golang#97) * [Fix panic when accessing unexported struct field](fluent/fluent-logger-golang#99) * [Properly stop logger during (re)connect failure](fluent/fluent-logger-golang#82) * [Support a TLS-enabled connection](fluent/fluent-logger-golang@e5d6aa1) See https://github.com/fluent/fluent-logger-golang/compare/v1.6.1..v1.8.0 Signed-off-by: Albin Kerouanton <albinker@gmail.com> Upstream-commit: e24d61b7efac787ff3d5176d994608937a057522 Component: engine
Updates the fluent logger library to v1.8.0. Following PRs/commits were merged since last bump: * [Add callback for error handling when using async](fluent/fluent-logger-golang#97) * [Fix panic when accessing unexported struct field](fluent/fluent-logger-golang#99) * [Properly stop logger during (re)connect failure](fluent/fluent-logger-golang#82) * [Support a TLS-enabled connection](fluent/fluent-logger-golang@e5d6aa1) See https://github.com/fluent/fluent-logger-golang/compare/v1.6.1..v1.8.0 Signed-off-by: Wesley <wppttt@amazon.com>
Updates the fluent logger library to v1.8.0. Following PRs/commits were merged since last bump: * [Add callback for error handling when using async](fluent/fluent-logger-golang#97) * [Fix panic when accessing unexported struct field](fluent/fluent-logger-golang#99) * [Properly stop logger during (re)connect failure](fluent/fluent-logger-golang#82) * [Support a TLS-enabled connection](fluent/fluent-logger-golang@e5d6aa1) See https://github.com/fluent/fluent-logger-golang/compare/v1.6.1..v1.8.0 Signed-off-by: Albin Kerouanton <albinker@gmail.com> (cherry picked from commit e24d61b) Signed-off-by: Wesley <wppttt@amazon.com>
PR #77 introduced a new parameter named ForceStopAsyncSend. It can be used to tell the logger not to try to send all the log messages in its buffer before closing. Without this parameter, the logger hangs whenever it has logs to write and the target Fluentd server is down.
But this new parameter is not enough: the connection is currently lazily initialized when the logger receives its first message. This blocks the select reading messages from the queue (until the connection is ready). Moreover, the connection dialing uses an exponential back-off retry. Because of that, the logger won't look for messages on the `stopRunning` channel (the channel introduced by PR #77), either because it's blocked by the Sleep used for the retry or because the connection dialing is waiting until the dialing timeout (e.g. TCP timeout).
To fix these edge cases, the time.Sleep() used for back-off retry has been transformed into a time.After() and a new stopAsyncConnect channel has been introduced to stop it. Moreover, the dialer.Dial() call used to connect to Fluentd has been replaced with dialer.DialContext() and a cancellable context is now used to stop the call to that method.
Finally, the Fluent.run() method has been adapted to wait for both new messages on f.pending and the stop signal sent to the f.stopRunning channel. Previously, both channels were selected independently, in a procedural fashion.
This fix is motivated by the issue described in: moby/moby#40063.
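The last change in the description, waiting on the pending queue and the stop channel in a single select, could look roughly like the sketch below. It is a simplified illustration, not the library's run() method: the `asyncLogger` type, `newAsyncLogger`, and `forceStop` are hypothetical, and appending to a slice stands in for writing to the connection.

```go
package main

import (
	"fmt"
	"sync"
)

// asyncLogger is a hypothetical, stripped-down logger with a message queue
// and a stop channel.
type asyncLogger struct {
	pending     chan string
	stopRunning chan struct{}
	wg          sync.WaitGroup
	sent        []string // only read this after forceStop has returned
}

func newAsyncLogger(buffer int) *asyncLogger {
	l := &asyncLogger{
		pending:     make(chan string, buffer),
		stopRunning: make(chan struct{}),
	}
	l.wg.Add(1)
	go l.run()
	return l
}

// run waits for a new message and for the stop signal in the same select, so
// a stop request is noticed even when no message ever arrives.
func (l *asyncLogger) run() {
	defer l.wg.Done()
	for {
		select {
		case msg, ok := <-l.pending:
			if !ok {
				return
			}
			// Stand-in for writing the message to the connection.
			l.sent = append(l.sent, fmt.Sprintf("sent:%s", msg))
		case <-l.stopRunning:
			return
		}
	}
}

// forceStop asks the run loop to exit without draining the queue and waits
// for it to finish.
func (l *asyncLogger) forceStop() {
	close(l.stopRunning)
	l.wg.Wait()
}

func main() {
	l := newAsyncLogger(0)
	l.pending <- "hello"
	l.forceStop()
	fmt.Println(l.sent)
}
```

Closing the stop channel rather than sending on it means the signal is never lost, whether or not the loop is currently blocked in the select.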