Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: profileReporter internal event circulation #77

Merged
merged 71 commits into from
Mar 16, 2022
Merged

fix: profileReporter internal event circulation #77

merged 71 commits into from
Mar 16, 2022

Conversation

songzhibin97
Copy link
Member

No description provided.

@songzhibin97
Copy link
Member Author

本质上面对的还是用户,如果发送到channel的消息不能够被成功的发送,合理吗~

另外,channel本来是解决速差的,如果选择发送是否可以在send的地方选择select适当丢弃?

@Jun10ng
Copy link
Contributor

Jun10ng commented Mar 10, 2022

本质上面对的还是用户,如果发送到channel的消息不能够被成功的发送,合理吗~

另外,channel本来是解决速差的,如果选择发送是否可以在send的地方选择select适当丢弃?

你说服我了。
“如果发送到channel的消息不能够被成功的发送”这个确实是不合理的。

holmes.go Outdated
func (h *Holmes) ReportProfile(pType string, buf []byte, reason string, eventID string) {
if atomic.LoadInt64(&h.stopped) == 1 {
close(h.rptEventsCh)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个地方是不是有bug会导致panic。
场景如下:
timeline1: ticke.c-> grDump() -> close(rptEventsCh) -> memDump -> close(rptEventsCh) ...
timeline2: stop holmes->

rptEventsCh 似乎会被close多次? 因为stop标识位不会在Dump 过程中检查

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close(h.rptEventsCh) 可以移到 h.Stop() 中?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

嗯,应该放 stop 里,最好保留 stopReporter 这个函数名,语义更清晰

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

确实这里可能会有异常

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

嗯,应该放 stop 里,最好保留 stopReporter 这个函数名,语义更清晰

如果单独开一个stop方法,是否我们的start和stop的粒度就变了呢,我更倾向在发送端去close,当然我现在的做法可能有点问题,我需要修复一下

@Jun10ng
Copy link
Contributor

Jun10ng commented Mar 10, 2022

能否增加一个单元测试:不stop holmes的同时只关闭reporter,然后再开启。
改造下 reporter_test.go 里的单元测试应该就可以,tks。

@songzhibin97
Copy link
Member Author

好的,我尝试一下

holmes.go Outdated
Comment on lines 767 to 769
h.rptEventsChLock.Lock()
h.rptEventsCh = make(chan rptEvent, 32)
h.rptEventsChLock.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好多个锁...

我有个小建议,Start 里初始化这些 channel,stop 里关闭这些 channel
同时,start 和 stop 用 CompareAndSwap 保护一下重入

这样 race 应该足够小了,除非是 start 和 stop 的并发 race 了,这个我觉得这个很小 的race 是可以接受的了

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果在stop上去close,有可能在发送端的时候发送消息的时候发送到close channel中

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

嗯,也对,close 得放在 sender 这里

我们搞成一个锁吧,fin 和 reporter 都放在 h.mutex 里,复用一下

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

holmes/holmes.go

Lines 284 to 289 in 17ac08a

h.goroutineCheckAndDump(gNum)
h.memCheckAndDump(mem)
h.cpuCheckAndDump(cpu)
h.threadCheckAndDump(tNum)
h.threadCheckAndShrink(tNum)
}

判断并关闭的操作放在 每次dumpLoop函数的最后面如何,这样就不会有多次关闭channe的问题了,锁也可以删掉。
然后每个类型只需要判断stopped 然后 return。

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

想法不错,不过,gcheap 是另外一个循环,不在这个 loop 里呢

Copy link
Contributor

@Jun10ng Jun10ng Mar 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

想法不错,不过,gcheap 是另外一个循环,不在这个 loop 里呢

gcheap 也做类似判断吧。
2处判断 比 6处判断+锁竞争好一些。
@songzhibin97 你觉得呢

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

放外面的话感觉别人阅读代码是不是有点抽象

holmes.go Outdated
Comment on lines 169 to 173
ch := gc.h.getGcEventCh()
if ch == nil {
return
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这段代码没必要了?

holmes.go Outdated
Comment on lines 175 to 178
if ch != nil {
close(gc.h.gcEventsCh)
gc.h.setGcEventCh(nil)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这段也可以删了?

holmes.go Outdated
@@ -189,7 +229,6 @@ func (h *Holmes) Start() {
// Stop the dump loop.
func (h *Holmes) Stop() {
atomic.StoreInt64(&h.stopped, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在这里 close channel,set channel = nil 就可以?在锁的保护下

holmes.go Outdated
@@ -161,6 +200,7 @@ type gcHeapFinalizer struct {
}

func (h *Holmes) startGCCycleLoop() {
h.setGcEventCh(make(chan struct{}, 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这些 channel 初始化,可以统一放到 Start 函数里?

holmes.go Outdated
continue
}
opts.reporter.Report(evt.PType, evt.Buf, evt.Reason, evt.EventID) // nolint: errcheck
ch := h.getRptEventsCh()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里直接通过函数参数传参,更好

holmes.go Outdated
return
}

if h.opts.rptOpts.allowDiscarding {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个配置很鸡肋呢,要是不允许丢失,就阻塞主流程了,这个是更坏的影响

设计上来说,就不应该允许这种破坏主流程的行为

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

要想不丢失消息,有的是办法,消费者直接起协程来处理就可以

holmes.go Outdated
return
}
// Waiting to be sent
ch <- rptEvent{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这种有问题的,如果发生了阻塞写,此时再 close 就 panic 了

holmes_test.go Outdated
@@ -102,6 +102,9 @@ func createThread(n int, blockTime time.Duration) {
}

func TestWithShrinkThread(t *testing.T) {
h.Stop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为啥要在这里加呢?单独起个用例更好一些

go.mod Outdated
@@ -6,5 +6,5 @@ require (
github.com/gin-gonic/gin v1.7.7
github.com/shirou/gopsutil v3.20.11+incompatible
github.com/stretchr/testify v1.7.0
mosn.io/pkg v0.0.0-20211217101631-d914102d1baf
mosn.io/pkg v0.0.0-20220308091858-ea728aacbe63
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

请不要更新这个

pType,
buf,
reason,
eventID}
eventID}:
default:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

channel 满了的时候,最好加个 infof log

holmes.go Outdated
}

// startReporter starts a background goroutine to consume event channel,
// and finish it at after receive from cancel channel.
func (h *Holmes) startReporter() {
func (h *Holmes) startReporter(ch chan rptEvent) {
if ch == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个不能是 nil 了吧?

holmes.go Outdated
@@ -191,17 +222,35 @@ func (h *Holmes) Start() {
h.Errorf("Holmes has started, please don't start it again.")
return
}
h.setGcEventCh(make(chan struct{}, 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 这里的 channel 也可以当做参数往下传,可以简化很多逻辑
  2. 这两个 set channel,不用搞封装了,直接在这里显示 lock 就行了

holmes.go Outdated
Comment on lines 139 to 166
// getGcEventCh get gcEventsCh
func (h *Holmes) getGcEventCh() chan struct{} {
h.Lock()
defer h.Unlock()
return h.gcEventsCh
}

// setGcEventCh set gcEventsCh
func (h *Holmes) setGcEventCh(ch chan struct{}) {
h.Lock()
defer h.Unlock()
h.gcEventsCh = ch
}

// getRptEventsCh get rptEventsCh
func (h *Holmes) getRptEventsCh() chan rptEvent {
h.Lock()
defer h.Unlock()
return h.rptEventsCh
}

// setRptEventsCh set rptEventsCh
func (h *Holmes) setRptEventsCh(ch chan rptEvent) {
h.Lock()
defer h.Unlock()
h.rptEventsCh = ch
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这些应该都没什么必要了,你可以捋一下的,传参可以简化很多

@doujiang24
Copy link
Member

提了一些改动在这个 PR:#85

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cla:yes size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants