Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't buffer doctor logger #19982

Merged
merged 3 commits into from
Jun 16, 2022
Merged

Conversation

Gusted
Copy link
Contributor

@Gusted Gusted commented Jun 15, 2022

  • We don't need to buffer the logger with a thousand capacity. It's not a high-throughput logger, this also caused issue whereby the logger can't keep up with repeated messages being send(somehow they are lost in the queue? Seems like a issue in MultiChannelledLog's Start code).
  • Resolves gitea doctor --run output is sometime truncated #19969

- We don't need to buffer the logger with a thousand capacity. It's not
a high-throughput logger, this also caused issue whereby the logger
can't keep up with repeated messages being send(somehow they are lost in
the queue?).
- Resolves go-gitea#19969
@GiteaBot GiteaBot added the lgtm/need 1 This PR needs approval from one additional maintainer to be merged. label Jun 15, 2022
Copy link
Contributor

@wxiaoguang wxiaoguang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this fix can significantly reduce the possibility of log loose.

My thought is, as long as the code is :

put log in channel
fetch log from channel
output log

Then, if there is no real Flush, there is still a (small possibility) chance that the exit occurs before the output, which will lose the last log.

It's fine to have this fix in 1.17.

If my understanding is correct, it's better to add a TODO or FIXME in code, in case some future users will be bitten by this problem.~~

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Jun 16, 2022
@Gusted
Copy link
Contributor Author

Gusted commented Jun 16, 2022

it's better to add a TODO or FIXME in code, in case some future users will be bitten by this problem.

Eh, I haven't been able to find any lost logs anymore. Unless someone is still able to reproduce it with a low probability I'm fine with that otherwise the bug is fixed to me.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 16, 2022

Because the Gitea's logger system works like this (which is not ideal):

func main() {
	c := make(chan string, 1)
	go func() {
		s := <-c
		_, _ = os.Stdout.WriteString(s)
	}()
	c <- "test\n"
}

You may not see the "test" output. But if you add a time.Sleep after the c <- ..., then you can see the "test" again.

And how Gitea's logger works:

func (l *ChannelledLog/MultiChannelledLog) Close() {
	l.close <- true
	<-l.closed
}

func (l *ChannelledLog/MultiChannelledLog) Flush() {
	l.flush <- true
}

func (l *ChannelledLog/MultiChannelledLog) Start() {
		case _, ok := <-l.flush:
			subLoggers.Flush()
		case <-l.close:
			m.closed <- true
}

I think there might be still a low possibility chance to make the log still in the channel when exiting, not that serious.

log.Warn("...") // put the event in channel, assume that the `queue` channel is not handled due to concurrency
log.Flush() // activate the `flush` channel, and assume that the `flush` channel is handled in next for-select loop
log.Close() // activate the `close` channel, and assume that the `close` channel is handled in next for-select loop
exit // then the event is still in the channel

In most cases, then queue channel will be activated before flush or close, then the event in channel can be processed correctly.

@Gusted
Copy link
Contributor Author

Gusted commented Jun 16, 2022

You may not see the "test" output. But if you add a time.Sleep after the c <- ..., then you can see the "test" again.

Yeah, but it seems like early exiting the program isn't the issue in this case, it's rather that close channel can take precedence over the queue channel.

I think there might be still a low possibility chance to make the log still in the channel when existing, not that serious.

You end up with non-clean code IMO, as within the case <-m.close: you would need to check for the channel again and then either say goto xxx or duplicate the code.


This due that buffered channels don't seem to take precedence in a select operation(which is pseudo-random according to the spec), however it seems like if you have unbuffered channel, you can actually take precedence in the select operation, if this is pure luck or actual defined behavior is a question.

CLOSING
RECEIVED AFTER CLOSING: true, &log.Event{level:3, msg:" - \x1b[1;33m[W]\x1b[0m old hook file /home/gusted/Desktop/coding/gitea/data/gitea-repositories/gusted/test_push_mirror.git/hooks/update does not exist", caller:"code.gitea.io/gitea/modules/doctor.checkHooks.func1()", filename:"modules/doctor/misc.go", line:67, time:time.Time{wall:0xc0a2d98bc1497275, ext:142911751, loc:(*time.Location)(0x57c9940)}, stacktrace:""}
 - [W] new hook file /home/gusted/Desktop/coding/gitea/data/gitea-repositories/gusted/test_push_mirror.git/hooks/pre-receive.d/gitea does not exist
diff --git a/modules/log/event.go b/modules/log/event.go
index f66ecd179..41b80c99c 100644
--- a/modules/log/event.go
+++ b/modules/log/event.go
@@ -328,6 +328,7 @@ func (m *MultiChannelledLog) Start() {
                                m.level = INFO
                        }
                case event, ok := <-m.queue:
+                       fmt.Printf("RECEIVING: %t, %#v\n", ok, event)
                        if !ok {
                                m.closeLoggers()
                                return
@@ -351,6 +352,14 @@ func (m *MultiChannelledLog) Start() {
                        }
                        m.rwmutex.RUnlock()
                case <-m.close:
+                       fmt.Println("CLOSING")
+                       select {
+                       case event, ok := <-m.queue:
+                               fmt.Printf("RECEIVED AFTER CLOSING: %t, %#v\n", ok, event)
+                       case <-time.After(100 * time.Millisecond):
+                               fmt.Println("No item in queue after closing.")
+                       }
+
                        m.closeLoggers()
                        return
                }
@@ -359,10 +368,12 @@ func (m *MultiChannelledLog) Start() {
 
 // LogEvent logs an event to this MultiChannelledLog
 func (m *MultiChannelledLog) LogEvent(event *Event) error {
+       fmt.Printf("SENDING: %#v\n", event)
        select {
        case m.queue <- event:
                return nil
        case <-time.After(100 * time.Millisecond):
+               fmt.Println("TIMING ERROR")
                // We're blocked!
                return ErrTimeout{
                        Name:     m.name,

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 16, 2022

Yup, the select operation is pseudo-random, I updated my demo steps. Even the flush and close channels are activated, the queue channel might not be processed in rare cases, so the event is left in the queue channel.

Not serious, just my thoughts. 😊


And you are correct, 0 length channel doesn't have such problem. 👍

@codecov-commenter

This comment was marked as off-topic.

@wxiaoguang wxiaoguang merged commit e399f0f into go-gitea:main Jun 16, 2022
@zeripath
Copy link
Contributor

zeripath commented Jun 16, 2022

It'd probably be better to add the code the empty the eventstream into close than to get rid of the buffer really.

I knew that this was a potential problem with the closing code but I thought that it would be a rare enough occurrence that this should not happen or actually matter.

Not having a buffer means that there is the potential for the logger to block on logging and I think that that is more likely to cause problems than the one you're trying to fix.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 16, 2022

Agree to "empty the eventstream into close".

ps: I think this PR has done what should do for 1.17. Maybe the logger system could be refactored in next milestone. And maybe LevelLoggerLogger could have a better name then, appender like Java? 😊

@zeripath
Copy link
Contributor

OK,

I think:

diff --git a/modules/log/event.go b/modules/log/event.go
index f66ecd179..40beefedb 100644
--- a/modules/log/event.go
+++ b/modules/log/event.go
@@ -345,18 +345,41 @@ func (m *MultiChannelledLog) Start() {
 				m.closeLoggers()
 				return
 			}
+			m.emptyQueue()
 			m.rwmutex.RLock()
 			for _, logger := range m.loggers {
 				logger.Flush()
 			}
 			m.rwmutex.RUnlock()
 		case <-m.close:
+			m.emptyQueue()
 			m.closeLoggers()
 			return
 		}
 	}
 }
 
+func (m *MultiChannelledLog) emptyQueue() bool {
+	for {
+		select {
+		case event, ok := <-m.queue:
+			if !ok {
+				return false
+			}
+			m.rwmutex.RLock()
+			for _, logger := range m.loggers {
+				err := logger.LogEvent(event)
+				if err != nil {
+					fmt.Println(err)
+				}
+			}
+			m.rwmutex.RUnlock()
+		default:
+			return true
+		}
+	}
+}
+
 // LogEvent logs an event to this MultiChannelledLog
 func (m *MultiChannelledLog) LogEvent(event *Event) error {
 	select {

Will do the correct thing here.

zeripath added a commit to zeripath/gitea that referenced this pull request Jun 17, 2022
It is possible for log events to remain in the buffer off the multichannelledlog
and thus not be logged despite close or flush.

This PR simply adds a function to empty the queue before closing or flushing.
(Except when the logger is paused.)

Reference go-gitea#19982

Signed-off-by: Andrew Thornton <art27@cantab.net>
lunny added a commit that referenced this pull request Jun 18, 2022
* Empty log queue on flush and close

It is possible for log events to remain in the buffer off the multichannelledlog
and thus not be logged despite close or flush.

This PR simply adds a function to empty the queue before closing or flushing.
(Except when the logger is paused.)

Reference #19982

Signed-off-by: Andrew Thornton <art27@cantab.net>

* and do similar for ChannelledLog

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
vsysoev pushed a commit to IntegraSDL/gitea that referenced this pull request Aug 10, 2022
- We don't need to buffer the logger with a thousand capacity. It's not
a high-throughput logger, this also caused issue whereby the logger
can't keep up with repeated messages being send(somehow they are lost in
the queue?).
- Resolves go-gitea#19969

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
vsysoev pushed a commit to IntegraSDL/gitea that referenced this pull request Aug 10, 2022
* Empty log queue on flush and close

It is possible for log events to remain in the buffer off the multichannelledlog
and thus not be logged despite close or flush.

This PR simply adds a function to empty the queue before closing or flushing.
(Except when the logger is paused.)

Reference go-gitea#19982

Signed-off-by: Andrew Thornton <art27@cantab.net>

* and do similar for ChannelledLog

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
AbdulrhmnGhanem pushed a commit to kitspace/gitea that referenced this pull request Aug 24, 2022
- We don't need to buffer the logger with a thousand capacity. It's not
a high-throughput logger, this also caused issue whereby the logger
can't keep up with repeated messages being send(somehow they are lost in
the queue?).
- Resolves go-gitea#19969

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
AbdulrhmnGhanem pushed a commit to kitspace/gitea that referenced this pull request Aug 24, 2022
* Empty log queue on flush and close

It is possible for log events to remain in the buffer off the multichannelledlog
and thus not be logged despite close or flush.

This PR simply adds a function to empty the queue before closing or flushing.
(Except when the logger is paused.)

Reference go-gitea#19982

Signed-off-by: Andrew Thornton <art27@cantab.net>

* and do similar for ChannelledLog

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gitea doctor --run output is sometime truncated
7 participants