Fix logrus logger hook data race condition #181

Merged
merged 6 commits into main from fix-data-race on Mar 24, 2021

Conversation

@bjsignalfx (Contributor) commented Mar 18, 2021

Data race issue #166 (comment) is fixed by updating logrus to version v1.8.1, plus some refactoring.
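
As a side note for readers, and not code from this PR's diff: below is a minimal sketch of a logrus hook whose shared state is guarded by a mutex, so concurrent Fire calls cannot race on it. The type and field names are illustrative only, not the receiver's actual code.

package main

import (
    "sync"

    "github.com/sirupsen/logrus"
)

// raceFreeHook is an illustrative hook: a mutex guards its mutable state
// so concurrent Fire calls from different goroutines cannot race on it.
type raceFreeHook struct {
    mu     sync.Mutex
    fields logrus.Fields // hypothetical shared state touched on every Fire
}

// Levels tells logrus which log levels this hook receives.
func (h *raceFreeHook) Levels() []logrus.Level { return logrus.AllLevels }

// Fire runs for every matching log entry; the lock serializes callers.
func (h *raceFreeHook) Fire(entry *logrus.Entry) error {
    h.mu.Lock()
    defer h.mu.Unlock()
    for k, v := range entry.Data {
        h.fields[k] = v
    }
    return nil
}

func main() {
    logger := logrus.New()
    logger.AddHook(&raceFreeHook{fields: logrus.Fields{}})
    logger.WithField("component", "example").Info("hook registered")
}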

codecov bot commented Mar 18, 2021

Codecov Report

Merging #181 (729f746) into main (a1c52ac) will increase coverage by 0.20%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##             main     #181      +/-   ##
==========================================
+ Coverage   88.58%   88.78%   +0.20%     
==========================================
  Files          18       18              
  Lines         981      981              
==========================================
+ Hits          869      871       +2     
+ Misses         78       77       -1     
+ Partials       34       33       -1     
Impacted Files                                  Coverage Δ
internal/receiver/smartagentreceiver/log.go     95.18% <0.00%> (+2.40%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1c52ac...729f746.

@rmfitzpatrick (Contributor)

I don't believe this resolves the issue noted in #166 (comment).

If you merge that branch and run go test -failfast -race -run TestMultipleStartAndShutdownInvocations -count 10000, the race occurs with and without these changes.
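
For illustration only, and not the repository's actual TestMultipleStartAndShutdownInvocations: a hypothetical test in the same spirit that hammers a shared logger from several goroutines, so that go test -race can flag unsynchronized hook access.

package logexample

import (
    "sync"
    "testing"

    "github.com/sirupsen/logrus"
)

// TestConcurrentLogging is a hypothetical stress test: run it with
// go test -race -count 10000 and the race detector will report any
// unsynchronized access inside the logger's hooks.
func TestConcurrentLogging(t *testing.T) {
    logger := logrus.New()
    var wg sync.WaitGroup
    for i := 0; i < 8; i++ {
        wg.Add(1)
        go func(worker int) {
            defer wg.Done()
            logger.WithField("worker", worker).Info("logging concurrently")
        }(i)
    }
    wg.Wait()
}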

@bjsignalfx (Contributor, Author) commented Mar 23, 2021

@rmfitzpatrick you are correct. Nonetheless, it looks like this problem is fixed in logrus v1.8.1 (see v1.8.1 here in contrast to v1.8.0 here). I cleaned up a few things and added tests. Can you please review those changes?

@bjsignalfx (Contributor, Author)

@rmfitzpatrick I forgot to mention that we are using v1.8.0.
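
For reference, the bump itself would look roughly like this, assuming the module is managed with Go modules (the go.mod line below is illustrative of the resulting entry, not the repository's actual file):

    go get github.com/sirupsen/logrus@v1.8.1
    go mod tidy

    // resulting go.mod entry
    require github.com/sirupsen/logrus v1.8.1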

@rmfitzpatrick (Contributor)

@bjsignalfx would just bumping the logrus version resolve this on its own?

@bjsignalfx (Contributor, Author) commented Mar 23, 2021

@rmfitzpatrick yes, that resolves the data race condition. However, below is an excerpt of the error we are getting now:

goroutine 23150 [semacquire, 1 minutes]:
github.com/shirou/gopsutil/cpu.allCPUTimes(0x61eb6e5, 0x4010ceb, 0x70d6250, 0x0, 0x0)
        /Users/bjonathan/go/pkg/mod/github.com/shirou/gopsutil@v3.21.2+incompatible/cpu/cpu_darwin_cgo.go:87 +0x55
github.com/shirou/gopsutil/cpu.TimesWithContext(0x8427050, 0xc0001a4008, 0xc00b2df800, 0xc00007a900, 0x4011458, 0xc0024c6f00, 0xc00b2df860, 0xc006ca0cd0)
        /Users/bjonathan/go/pkg/mod/github.com/shirou/gopsutil@v3.21.2+incompatible/cpu/cpu_darwin.go:44 +0xcb
github.com/shirou/gopsutil/cpu.Times(0x9813b00, 0x18, 0x0, 0xffffffffffffffff, 0xc006ca0d70, 0x403e8c5)
        /Users/bjonathan/go/pkg/mod/github.com/shirou/gopsutil@v3.21.2+incompatible/cpu/cpu_darwin.go:36 +0x65
github.com/signalfx/signalfx-agent/pkg/monitors/cpu.(*Monitor).generateDatapoints(0xc016436480, 0xc006ca0f20, 0xc006512ee0, 0xc005856000)
        /Users/bjonathan/go/pkg/mod/github.com/signalfx/signalfx-agent@v1.0.1-0.20210218155823-9b6186460a32/pkg/monitors/cpu/cpu.go:108 +0x65
github.com/signalfx/signalfx-agent/pkg/monitors/cpu.(*Monitor).Configure.func1()
        /Users/bjonathan/go/pkg/mod/github.com/signalfx/signalfx-agent@v1.0.1-0.20210218155823-9b6186460a32/pkg/monitors/cpu/cpu.go:243 +0x76
github.com/signalfx/signalfx-agent/pkg/utils.RunOnInterval.func1(0xc007391a60, 0xc0119281e0, 0x8427018, 0xc016436500)
        /Users/bjonathan/go/pkg/mod/github.com/signalfx/signalfx-agent@v1.0.1-0.20210218155823-9b6186460a32/pkg/utils/time.go:56 +0x9c
created by github.com/signalfx/signalfx-agent/pkg/utils.RunOnInterval
        /Users/bjonathan/go/pkg/mod/github.com/signalfx/signalfx-agent@v1.0.1-0.20210218155823-9b6186460a32/pkg/utils/time.go:46 +0x79

@bjsignalfx (Contributor, Author) commented Mar 23, 2021

@rmfitzpatrick Then again, maybe not. I ran the test again and am no longer getting the error mentioned above; instead I got the result below.

. . .

ERRO[0490]/Users/bjonathan/go/pkg/mod/github.com/signalfx/signalfx-agent@v1.0.1-0.20210218155823-9b6186460a32/pkg/monitors/cpu/cpu.go:79 github.com/signalfx/signalfx-agent/pkg/monitors/cpu.(*Monitor).generatePerCoreDatapoints() failed to calculate utilization for cpu core cpu0  error="usedDiff < 0" monitorType=cpu

2021-03-23T11:29:53.273-0400    ERROR   cpu/cpu.go:79   failed to calculate utilization for cpu core cpu0       {"component_kind": "receiver", "component_type": "smartagent", "monitorType": "cpu", "error": "usedDiff < 0"}
race: limit on 8128 simultaneously alive goroutines is exceeded, dying
exit status 66
FAIL    github.com/signalfx/splunk-otel-collector/internal/receiver/smartagentreceiver  493.321s

@bjsignalfx changed the title from "Handle logging events in a thread safe manner" to "Fix logrus logger hook data race condition" on Mar 24, 2021.
@bjsignalfx merged commit 4517674 into main on Mar 24, 2021.
The delete-merged-branch bot deleted the fix-data-race branch on March 24, 2021 at 23:13.