Clustering: possible panic when channel is being deleted during replication #757

Closed
kozlovic opened this issue Mar 1, 2019 · 0 comments

kozlovic commented Mar 1, 2019

If a channel is being deleted while the server replicates messages for it, the server can panic with a stack similar to this:

[1] 2019/02/28 06:01:24.220198 [INF] STREAM: Compaction in progress...
[1] 2019/02/28 06:01:24.229041 [INF] STREAM: Deletion took 8.837128ms
[1] 2019/02/28 06:03:12.365609 [INF] STREAM: server lost leadership, performing leader stepdown actions
[1] 2019/02/28 06:04:13.031298 [INF] STREAM: Deleting raft logs from 3594733 to 3613126
[1] 2019/02/28 06:04:13.031307 [INF] STREAM: Compaction in progress...
[1] 2019/02/28 06:04:13.034431 [INF] STREAM: Deletion took 3.129704ms
[1] 2019/02/28 06:05:03.315186 [INF] STREAM: finished leader stepdown actions
panic: failed to store replicated message 177 on channel fmc2.segment.1009: stan: channel is being deleted

goroutine 36 [running]:
github.com/nats-io/nats-streaming-server/server.(*raftFSM).Apply(0xc0004a2000, 0xc007be8ae0, 0xfdade0, 0xc0077eff08)
        /go/src/github.com/nats-io/nats-streaming-server/server/clustering.go:479 +0x71b
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc0088dc250)
        /go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/fsm.go:57 +0x155
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).runFSM(0xc0004b2000)
        /go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/fsm.go:120 +0x2ef
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).runFSM-fm()
        /go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/api.go:506 +0x2a
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc0004b2000, 0xc0004ac060)
        /go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/state.go:146 +0x53
created by github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*raftState).goFunc
        /go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/state.go:144 +0x66
kozlovic added a commit that referenced this issue Mar 8, 2019
If a server was the leader and lost leadership while about to delete
a channel due to inactivity, then became a follower and needed
to replicate data for this channel, there was a risk that the
server would panic while trying to store messages for this channel.

Resolves #757

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
kozlovic self-assigned this Mar 9, 2019
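
For readers who want to see the race in isolation, here is a minimal sketch of the scenario described in the commit message. It is not the server's code and not necessarily how the referenced commit fixes the problem. The `channel` type, its `deleting` flag, and the `markForDeletion`, `cancelDeletion`, `apply`, and `tryApply` helpers are all hypothetical names made up for illustration. The sketch assumes that a store call fails while a delete is pending and that the apply step treats any store failure as fatal, which is what the panic message above suggests.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errChannelDeleting mirrors the error text seen in the panic above.
var errChannelDeleting = errors.New("stan: channel is being deleted")

// channel is a hypothetical stand-in for the server's per-channel state.
type channel struct {
	mu       sync.Mutex
	name     string
	deleting bool     // set when a leader schedules an inactivity delete
	msgs     []uint64 // sequence numbers stored so far
}

// markForDeletion is what a leader would do just before deleting the channel.
func (c *channel) markForDeletion() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.deleting = true
}

// cancelDeletion is a hypothetical stepdown action: a server that is no
// longer leader drops its pending delete.
func (c *channel) cancelDeletion() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.deleting = false
}

// store rejects writes while a delete is pending.
func (c *channel) store(seq uint64) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.deleting {
		return errChannelDeleting
	}
	c.msgs = append(c.msgs, seq)
	return nil
}

// apply mimics an apply step that treats any store failure as fatal.
func apply(c *channel, seq uint64) {
	if err := c.store(seq); err != nil {
		panic(fmt.Errorf("failed to store replicated message %d on channel %s: %v",
			seq, c.name, err))
	}
}

// tryApply runs apply and reports whether it panicked.
func tryApply(c *channel, seq uint64) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	apply(c, seq)
	return false
}

func main() {
	c := &channel{name: "fmc2.segment.1009"}

	// The leader is about to delete the channel, then loses leadership.
	c.markForDeletion()
	fmt.Println("panicked without cleanup:", tryApply(c, 177))

	// Clearing the pending delete on stepdown lets replication proceed.
	c.cancelDeletion()
	fmt.Println("panicked after cleanup:", tryApply(c, 177))
}
```

In this sketch the first replicated store panics because the stale deletion flag survives the leadership change, and the second succeeds once the flag is cleared. Clearing the flag on stepdown is only one possible guard; the referenced commit may take a different approach.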