fix(inputs.mongodb): resolve SIGSEGV when restarting MongoDB node #12604

Merged

@dkhamitov dkhamitov commented Feb 2, 2023

Below is an example of the SIGSEGV I occasionally get when restarting a MongoDB node that is monitored by the Telegraf MongoDB plugin. I think it actually happens during node shutdown: according to the stack trace, the newStat variable does not have Locks populated as the code expects. The proposed fix is straightforward. It adds the necessary nil checks, which are the same (symmetric) checks already performed on the oldStat data.
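For readers who want to see the shape of such a guard, here is a minimal sketch. It is not the merged diff: the types are simplified stand-ins for the plugin's structures, the helper names `collectionWaitCounts` and `readWaitCountDiff` are hypothetical, and the pointer-typed `AcquireWaitCount` field is an assumption consistent with the nil dereference in the trace.

```go
package main

import "fmt"

// Simplified stand-ins for the plugin's types (assumption: the real
// definitions in mongostat.go carry many more fields).
type ReadWriteLockTimes struct {
	Read  int64
	Write int64
}

type LockStats struct {
	// Assumed to be a pointer, which is what makes a nil dereference possible.
	AcquireWaitCount *ReadWriteLockTimes
}

type ServerStatus struct {
	Locks map[string]LockStats
}

// collectionWaitCounts is a hypothetical helper that applies the same
// checks to any sample, old or new, before its counters are read.
func collectionWaitCounts(s *ServerStatus) (*ReadWriteLockTimes, bool) {
	if s == nil || s.Locks == nil {
		return nil, false
	}
	coll, ok := s.Locks["Collection"]
	if !ok || coll.AcquireWaitCount == nil {
		return nil, false
	}
	return coll.AcquireWaitCount, true
}

// readWaitCountDiff returns the read wait count delta between two samples,
// or false when either sample lacks the "Collection" lock statistics.
func readWaitCountDiff(oldStat, newStat *ServerStatus) (int64, bool) {
	oldCounts, okOld := collectionWaitCounts(oldStat)
	newCounts, okNew := collectionWaitCounts(newStat)
	if !okOld || !okNew {
		return 0, false
	}
	return newCounts.Read - oldCounts.Read, true
}

func main() {
	oldStat := &ServerStatus{Locks: map[string]LockStats{
		"Collection": {AcquireWaitCount: &ReadWriteLockTimes{Read: 10}},
	}}
	// A sample taken while the node is shutting down: Locks is not populated.
	newStat := &ServerStatus{}

	if diff, ok := readWaitCountDiff(oldStat, newStat); ok {
		fmt.Println("read wait count diff:", diff)
	} else {
		fmt.Println("lock stats incomplete, skipping this interval")
	}
}
```

Indexing a nil map is safe in Go and yields the zero value, so the panic comes from dereferencing the nil `AcquireWaitCount` pointer rather than from the map lookup itself. Routing both samples through one helper keeps the checks symmetric by construction.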

The stack trace below is from Telegraf 1.24.2. The nil pointer dereference occurs at mongostat.go:1227, on this line:

readWaitCountDiff := newStat.Locks["Collection"].AcquireWaitCount.Read - oldStat.Locks["Collection"].AcquireWaitCount.Read

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2637e51]

goroutine 6343563 [running]:
github.com/influxdata/telegraf/plugins/inputs/mongodb.NewStatLine({{0xc0eeb390fb1262f5, 0x1a554c27316d, 0x95c0b40}, 0xc00167a360, 0xc0027761a0, 0xc001db02d0, 0xc000760060, 0xc0029b8e88, 0xc00076e2d0, 0xc0019800e8, ...}, ...)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/mongodb/mongostat.go:1227 +0x1871
github.com/influxdata/telegraf/plugins/inputs/mongodb.(*Server).gatherData(0xc00146a5a0, {0x671b7a0, 0xc0009a6460}, 0x1, 0x1, 0x1, 0x1, {0x9603620, 0x0, 0x0})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/mongodb/mongodb_server.go:363 +0x79d
github.com/influxdata/telegraf/plugins/inputs/mongodb.(*MongoDB).Gather.func1(0xc002622850?)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/mongodb/mongodb.go:164 +0x1a9
created by github.com/influxdata/telegraf/plugins/inputs/mongodb.(*MongoDB).Gather
	/go/src/github.com/influxdata/telegraf/plugins/inputs/mongodb/mongodb.go:155 +0x65

Often, though not always, the plugin also logs the following error before the crash:

2023-02-01T12:11:10Z E! [inputs.mongodb] failed to gather data: %!w(mongo.CommandError={13436 node is not in primary or recovering state [] NotPrimaryOrSecondary <nil> [180 0 0 0 3 116 111 112 111 108 111 103 121 86 101 114 115 105 111 110 0 45 0 0 0 7 112 114 111 99 101 115 115 73 100 0 99 218 86 207 130 25 202 195 230 35 33 178 18 99 111 117 110 116 101 114 0 0 0 0 0 0 0 0 0 0 1 111 107 0 0 0 0 0 0 0 0 0 2 101 114 114 109 115 103 0 43 0 0 0 110 111 100 101 32 105 115 32 110 111 116 32 105 110 32 112 114 105 109 97 114 121 32 111 114 32 114 101 99 111 118 101 114 105 110 103 32 115 116 97 116 101 0 16 99 111 100 101 0 124 52 0 0 2 99 111 100 101 78 97 109 101 0 22 0 0 0 78 111 116 80 114 105 109 97 114 121 79 114 83 101 99 111 110 100 97 114 121 0 0]})

Required for all PRs

resolves #12611

@dkhamitov dkhamitov force-pushed the fix/mongodb-sigsegv-newstat-locks branch from ca9271a to 005d1a8 Compare February 2, 2023 17:01
@dkhamitov dkhamitov changed the title SIGSEGV when restarting MongoDB node fix: SIGSEGV when restarting MongoDB node Feb 2, 2023
Reproducible fairly often, but not always. The issue is that newStat (ServerStatus) does not have its Locks populated. This change adds the necessary nil checks so that a nil pointer dereference no longer crashes the telegraf process. oldStat already has the exact same (symmetric) checks in place.
@dkhamitov dkhamitov force-pushed the fix/mongodb-sigsegv-newstat-locks branch from 005d1a8 to 90f1997 Compare February 2, 2023 17:02
@dkhamitov dkhamitov changed the title fix: SIGSEGV when restarting MongoDB node fix(inputs.mongodb): SIGSEGV when restarting MongoDB node Feb 2, 2023
telegraf-tiger bot commented Feb 2, 2023

@powersj powersj changed the title fix(inputs.mongodb): SIGSEGV when restarting MongoDB node fix(inputs.mongodb): resolve SIGSEGV when restarting MongoDB node Feb 2, 2023
@powersj powersj left a comment

Thanks for the PR and tests! Is there an issue that is associated with this fix that we can close as well?

@powersj powersj added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Feb 2, 2023
@dkhamitov

Hi @powersj, I haven't found any related issues, either open or closed.

srebhan commented Feb 3, 2023

@dkhamitov I took the liberty of opening an issue with the information you provided. In the future, please open an issue first describing the problem and then reference this issue in your PR. This makes it easier for other users with a similar problem to discover the solution (your PR), as they might only search in issues...

@srebhan srebhan left a comment

Looks good to me. Thanks @dkhamitov for tracking this down and fixing it!

@srebhan srebhan merged commit e466cab into influxdata:master Feb 3, 2023
@dkhamitov

@srebhan

In the future, please open an issue first describing the problem and then reference this issue in your PR

Got it. Thanks!

powersj pushed a commit that referenced this pull request Feb 13, 2023
Successfully merging this pull request may close these issues.

Panic/SIGSEGV when restarting MongoDB node