Heavy contention in filestore could result in underflow and panic. #2959

Merged (1 commit) on Mar 28, 2022

derekcollison (Member)

Under heavy contention, skipMsg combined with removeMsg and writeIndexInfo could result in the index being stamped with an underflowed message count. This could lead to a panic when mb.msgs is later used to allocate a block buffer.
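To see why an underflowed count becomes a panic rather than just a wrong number, here is a minimal Go sketch. The `allocIndex` helper is hypothetical: it only assumes, as the stack trace suggests, that a buffer is sized with `make` from the stored message count, and it is not the actual `indexCacheBuf` code. Decrementing an unsigned zero wraps to the maximum `uint64`, a capacity the Go runtime rejects with the same `makeslice: cap out of range` error seen in the report.

```go
package main

import "fmt"

// allocIndex is a hypothetical helper: it sizes an index buffer from a
// stored message count, converting the runtime panic into an error so the
// effect of an underflowed count can be observed.
func allocIndex(msgs uint64) (idx []uint32, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	idx = make([]uint32, 0, msgs)
	return idx, nil
}

func main() {
	var msgs uint64 // the block currently holds 0 messages
	msgs--          // an unguarded decrement wraps to the maximum uint64

	fmt.Println("stamped msgs:", msgs) // stamped msgs: 18446744073709551615

	if _, err := allocIndex(msgs); err != nil {
		fmt.Println("allocation failed:", err)
	}
	if idx, err := allocIndex(10); err == nil {
		fmt.Println("sane count, cap:", cap(idx)) // sane count, cap: 10
	}
}
```

Recovering from the panic is only for demonstration here; the real problem is the bad count that was persisted, not the allocation that trips over it.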

We had a report of a panic on server restart with 2.8.0-beta.1 (thanks to Derek Wang!). The panic occurred while trying to malloc/make a load block sized from the number of messages the index file said the block had. Previously, SkipMsg would decrement the message count, and writeMsgRecord would add it back when the record was written. However, the lock was released in between, so under load other operations could run in that window. If SkipMsg had decremented the count to 0 (we did protect against underflow there), a removal followed by writeIndexInfo in that window would stamp an underflowed count into the index.
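The interleaving can be replayed deterministically in a small Go sketch. Everything in it is hypothetical and heavily simplified: the `block` type, its fields, and `buggySkip`/`fixedSkip` only model the lock-release pattern described above, not the actual filestore code, and `fixedSkip` shows one way to close the window (assumed for illustration, not necessarily the merged change). Instead of goroutines, `removeMsg` is passed as a callback that runs while the lock is released, so the bad ordering happens every time.

```go
package main

import (
	"fmt"
	"sync"
)

// block is a hypothetical, heavily simplified stand-in for a message block.
type block struct {
	mu      sync.Mutex
	last    uint64 // last sequence number assigned (hypothetical field)
	msgs    uint64 // live message count
	stamped uint64 // count last persisted by writeIndexInfoLocked
}

// writeIndexInfoLocked persists the current count. Caller must hold mu.
func (b *block) writeIndexInfoLocked() { b.stamped = b.msgs }

// removeMsg deletes one message and rewrites the index info.
func (b *block) removeMsg() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.msgs-- // wraps around if msgs is already 0
	b.writeIndexInfoLocked()
}

// buggySkip models the old shape: a guarded pre-decrement, then the lock is
// released, then the record write counts the skip back in.
// between runs in the window where the lock is not held.
func (b *block) buggySkip(between func()) {
	b.mu.Lock()
	if b.msgs > 0 {
		b.msgs--
	}
	b.mu.Unlock()

	between() // under load, a concurrent removeMsg can land here

	b.mu.Lock()
	b.last++
	b.msgs++ // record write adds the skip back to the count
	b.mu.Unlock()
}

// fixedSkip is one possible remedy (assumed for illustration): the skip
// consumes a sequence number inside a single critical section and never
// adjusts msgs, so no other operation can observe an intermediate count.
func (b *block) fixedSkip() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.last++
}

func main() {
	bad := &block{last: 1, msgs: 1}
	bad.buggySkip(bad.removeMsg)
	fmt.Println("buggy: in-memory", bad.msgs, "stamped", bad.stamped)
	// buggy: in-memory 0 stamped 18446744073709551615

	good := &block{last: 1, msgs: 1}
	good.fixedSkip()
	good.removeMsg()
	fmt.Println("fixed: in-memory", good.msgs, "stamped", good.stamped)
	// fixed: in-memory 0 stamped 0
}
```

In the buggy run the in-memory count returns to 0 once the skip is counted back in, but the index already holds 18446744073709551615, which is what a restart would read. Keeping the whole skip inside one critical section means removeMsg can only run before or after it, never in the middle.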

[168] 2022/03/26 03:13:33.264814 [INF] Starting nats-server
[168] 2022/03/26 03:13:33.264850 [INF]   Version:  2.8.0-beta.1
[168] 2022/03/26 03:13:33.264854 [INF]   Git:      [edcddfae]
[168] 2022/03/26 03:13:33.264858 [DBG]   Go build: go1.17.7
[168] 2022/03/26 03:13:33.264861 [INF]   Cluster:  default
[168] 2022/03/26 03:13:33.264863 [INF]   Name:     isbs-default-js-2
[168] 2022/03/26 03:13:33.264869 [INF]   Node:     PGEBPas1
[168] 2022/03/26 03:13:33.264873 [INF]   ID:       NAT3OC54QTW67RFGU5LWV7NBFQENCNEWJXT4OBAN657HZ7YIHN5F5HNS
[168] 2022/03/26 03:13:33.264877 [WRN] Plaintext passwords detected, use nkeys or bcrypt
[168] 2022/03/26 03:13:33.264880 [INF] Using configuration file: /etc/nats-config/nats-js.conf
[168] 2022/03/26 03:13:33.266101 [INF] Starting http monitor on 0.0.0.0:8222
[168] 2022/03/26 03:13:33.266162 [INF] Starting JetStream
[168] 2022/03/26 03:13:33.266432 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[168] 2022/03/26 03:13:33.266442 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[168] 2022/03/26 03:13:33.266444 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[168] 2022/03/26 03:13:33.266446 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[168] 2022/03/26 03:13:33.266447 [INF]
[168] 2022/03/26 03:13:33.266449 [INF]          https://docs.nats.io/jetstream
[168] 2022/03/26 03:13:33.266451 [INF]
[168] 2022/03/26 03:13:33.266453 [INF] ---------------- JETSTREAM ----------------
[168] 2022/03/26 03:13:33.266459 [INF]   Max Memory:      5.00 GB
[168] 2022/03/26 03:13:33.266462 [INF]   Max Storage:     50.00 GB
[168] 2022/03/26 03:13:33.266465 [INF]   Store Directory: "/data/jetstream/store/jetstream"
[168] 2022/03/26 03:13:33.266466 [INF] -------------------------------------------
[168] 2022/03/26 03:13:33.266522 [DBG]   Exports:
[168] 2022/03/26 03:13:33.266525 [DBG]      $JS.API.>
[168] 2022/03/26 03:13:33.266546 [DBG] Enabled JetStream for account "js"
[168] 2022/03/26 03:13:33.266554 [DBG]   Max Memory:      -1 B
[168] 2022/03/26 03:13:33.266556 [DBG]   Max Storage:     -1 B
[168] 2022/03/26 03:13:33.266571 [DBG] Recovering JetStream state for account "js"
[168] 2022/03/26 03:13:33.369794 [INF]   Restored 50,000 messages for stream 'js > kafka-rw-pipeline-oss-analytics-sampledataflow-usw2-prd-kafka-rw-pipeline-input-p1'
[168] 2022/03/26 03:13:33.377819 [INF] Server is ready
panic: runtime error: makeslice: cap out of range

goroutine 1 [running]:
github.com/nats-io/nats-server/v2/server.(*msgBlock).indexCacheBuf(0xc00021fa00, {0xc003408000, 0xecd348, 0x1000000})
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:3075 +0x195
github.com/nats-io/nats-server/v2/server.(*msgBlock).loadMsgsWithLock(0xc00021fa00)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:3375 +0x414
github.com/nats-io/nats-server/v2/server.(*msgBlock).generatePerSubjectInfo(0xc00021fa00)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:4502 +0xf0
github.com/nats-io/nats-server/v2/server.(*msgBlock).readPerSubjectInfo(0xc00021fa00)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:4555 +0x394
github.com/nats-io/nats-server/v2/server.(*fileStore).recoverMsgBlock(0xc000334780, {0xb02ad8, 0xc00021cd00}, 0x108b)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:641 +0xff0
github.com/nats-io/nats-server/v2/server.(*fileStore).recoverMsgs(0xc000334780)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:911 +0x858
github.com/nats-io/nats-server/v2/server.newFileStoreWithCreated({{0xc000316ab0, 0x84}, _, _, _, _}, {{0xc001875ec0, 0x59}, {0x0, 0x0}, ...}, ...)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:311 +0x774
github.com/nats-io/nats-server/v2/server.(*stream).setupStore(0xc000334500, 0xc0000f4a88)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/stream.go:2519 +0x2d5
github.com/nats-io/nats-server/v2/server.(*Account).addStreamWithAssignment(0xc0000b3440, 0xc00031e108, 0x0, 0x0)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/stream.go:413 +0xd8c
github.com/nats-io/nats-server/v2/server.(*Account).addStream(0xc001875f20, 0xc0001ffe50)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/stream.go:264 +0x1d
github.com/nats-io/nats-server/v2/server.(*Account).EnableJetStream(0xc0000b3440, 0xe0c3a0)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream.go:1159 +0x2be5
github.com/nats-io/nats-server/v2/server.(*Server).configJetStream(0xc0001c0000, 0xc0000b3440)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream.go:630 +0x9c
github.com/nats-io/nats-server/v2/server.(*Server).configAllJetStreamAccounts(0xc0001c0000)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream.go:677 +0x176
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamAccounts(0xc000162370)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream.go:563 +0xb1
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStream(0xc0001c0000, {0x140000000, 0xc80000000, {0xc00014a520, 0x1f}, {0x0, 0x0}})
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream.go:377 +0x919
github.com/nats-io/nats-server/v2/server.(*Server).EnableJetStream(0xc0001c0000, 0xc0000f5e38)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream.go:189 +0x40c
github.com/nats-io/nats-server/v2/server.(*Server).Start(0xc0001c0000)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/server.go:1754 +0xf25
github.com/nats-io/nats-server/v2/server.Run(...)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/service.go:22
main.main()
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/main.go:118 +0x2fa

Signed-off-by: Derek Collison <derek@nats.io>

/cc @nats-io/core

…x being stamped with underflow for number of messages.

We had a report of a panic on server restart with 2.8.0-beta.1. The panic was trying to malloc the size of a load block based off of the number of messages we thought the block had from the index.
Before, SkipMsg would decrement and when we added the record via writeMsgRecord we would add it back in. However we did release the lock, meaning other things could run.
If in between the decrement, say to 0 (we did protect against underflow there), then a remove and subsequent writeIndexInfo would stamp an underflow.

Signed-off-by: Derek Collison <derek@nats.io>
@kozlovic (Member) left a comment

LGTM

@derekcollison derekcollison merged commit 0b8aa47 into main Mar 28, 2022
@derekcollison derekcollison deleted the fs_msgs_underflow branch March 28, 2022 16:30