nats stream cluster peer-remove puts R1 stream in non-recoverable state #4396

Closed
2 tasks done
jzhn opened this issue Aug 14, 2023 · 1 comment · Fixed by #4420
Labels
defect Suspected defect such as a bug or regression

Comments

jzhn commented Aug 14, 2023

Defect

Make sure that these boxes are checked before submitting your issue -- thank you!

Versions of nats-server and affected client libraries used:

$ nats-server --version
nats-server: v2.9.21

$ nats --version
0.0.35

OS/Container environment:

macOS 13.5 (22G74)

Steps or code to reproduce the issue:

  1. Set up a simple supercluster of three clusters, each with a single server,
    following the steps at https://natsbyexample.com/examples/topologies/supercluster-jetstream/cli
$ nats --context east-sys server report jetstream
╭───────────────────────────────────────────────────────────────────────────────────────────────╮
│                                       JetStream Summary                                       │
├────────┬─────────┬─────────┬───────────┬──────────┬───────┬────────┬──────┬─────────┬─────────┤
│ Server │ Cluster │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ API Err │
├────────┼─────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┤
│ n2     │ central │ 1       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 3       │ 0       │
│ n1*    │ east    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 22      │ 1       │
│ n3     │ west    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
├────────┼─────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┤
│        │         │ 1       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 25      │ 1       │
╰────────┴─────────┴─────────┴───────────┴──────────┴───────┴────────┴──────┴─────────┴─────────╯

╭────────────────────────────────────────────────────────────╮
│                RAFT Meta Group Information                 │
├──────┬──────────┬────────┬─────────┬────────┬────────┬─────┤
│ Name │ ID       │ Leader │ Current │ Online │ Active │ Lag │
├──────┼──────────┼────────┼─────────┼────────┼────────┼─────┤
│ n1   │ fjFyEjc1 │ yes    │ true    │ true   │ 0.00s  │ 0   │
│ n2   │ 44jzkV9D │        │ true    │ true   │ 0.44s  │ 0   │
│ n3   │ BXScrY9i │        │ true    │ true   │ 0.44s  │ 0   │
╰──────┴──────────┴────────┴─────────┴────────┴────────┴─────╯
  2. Create a simple R1 stream:
$ nats --context east stream add \
  --subjects test \
  --storage file \
  --replicas 1 \
  --retention limits \
  --discard old \
  --max-age 1m \
  --max-msgs=100 \
  --max-msgs-per-subject=-1 \
  --max-msg-size=-1 \
  --max-bytes=-1 \
  --dupe-window=1m  \
  --no-allow-rollup \
  --no-deny-delete \
  --no-deny-purge \
  test
  3. Verify that the stream was created and placed on one of the clusters:
$ nats --context east stream report
Obtaining Stream stats

╭─────────────────────────────────────────────────────────────────────────────────────────╮
│                                      Stream Report                                      │
├────────┬─────────┬───────────┬───────────┬──────────┬───────┬──────┬─────────┬──────────┤
│ Stream │ Storage │ Placement │ Consumers │ Messages │ Bytes │ Lost │ Deleted │ Replicas │
├────────┼─────────┼───────────┼───────────┼──────────┼───────┼──────┼─────────┼──────────┤
│ test   │ File    │           │ 0         │ 0        │ 0 B   │ 0    │ 0       │ n2*      │
╰────────┴─────────┴───────────┴───────────┴──────────┴───────┴──────┴─────────┴──────────╯
  4. Run the peer-remove command on the newly created stream:
$ nats --context east stream cluster peer-remove test
? Select a Peer n2
11:33:19 Removing peer "n2"
nats: error: peer remap failed (10075)

Expected result:

The peer-remove command should do one of the following:

  • fail with an error message stating that the stream cannot be relocated to another server in the same cluster (every cluster in this supercluster has a single node), or
  • succeed and relocate the stream to another cluster.

Actual result:

  • The peer-remove command fails and leaves the stream in an intermediate state.
  • The stream no longer has any replicas:
$ nats --context east stream report
Obtaining Stream stats

╭─────────────────────────────────────────────────────────────────────────────────────────╮
│                                      Stream Report                                      │
├────────┬─────────┬───────────┬───────────┬──────────┬───────┬──────┬─────────┬──────────┤
│ Stream │ Storage │ Placement │ Consumers │ Messages │ Bytes │ Lost │ Deleted │ Replicas │
├────────┼─────────┼───────────┼───────────┼──────────┼───────┼──────┼─────────┼──────────┤
│ test   │ File    │           │ 0         │ 0        │ 0 B   │ 0    │ 0       │          │
╰────────┴─────────┴───────────┴───────────┴──────────┴───────┴──────┴─────────┴──────────╯
  • Every command that manages or inspects the stream returns an error. There is no way to unblock the stream or to remove it from the cluster:
$ nats --context east stream edit test
nats: error: could not request Stream test configuration: stream is offline (10118)

$ nats --context east stream rm test
? Really delete Stream test Yes
nats: error: could not remove Stream: stream is offline (10118)

$ nats --context east stream info test
nats: error: could not request Stream info: stream is offline (10118)
  • It is impossible to create another stream that subscribes to the same subject(s). Once this happens, the cluster is left in a bad state in which JetStream can no longer capture the affected subjects.
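
The last point is consistent with JetStream refusing to create a stream whose subjects overlap with those of an existing stream: the offline stream still exists, so it still holds the subject. Below is a minimal Python sketch of such an overlap check, assuming the standard NATS wildcards (`*` matches exactly one token, `>` matches one or more trailing tokens). The names `subjects_overlap`, `can_create`, and `existing` are hypothetical and this is not the server's actual code.

```python
def subjects_overlap(a, b):
    """Return True if some concrete subject could match both patterns.

    Assumes '>' only ever appears as the last token of a pattern.
    """
    ta, tb = a.split("."), b.split(".")
    for i in range(max(len(ta), len(tb))):
        # '>' swallows the rest, but needs at least one token to swallow.
        if i < len(ta) and ta[i] == ">":
            return i < len(tb)
        if i < len(tb) and tb[i] == ">":
            return i < len(ta)
        # One pattern ran out of tokens before the other.
        if i >= len(ta) or i >= len(tb):
            return False
        # Literal tokens must be equal unless one side is '*'.
        if ta[i] != tb[i] and ta[i] != "*" and tb[i] != "*":
            return False
    return True


# Hypothetical registry: the offline stream "test" still owns subject "test".
existing = {"test": ["test"]}


def can_create(subjects):
    """A new stream is allowed only if none of its subjects overlap."""
    return not any(
        subjects_overlap(new, old)
        for olds in existing.values()
        for old in olds
        for new in subjects
    )
```

In this model `can_create(["test"])` is rejected for as long as the broken stream remains registered, which matches the behaviour described above.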
@jzhn jzhn added the 🐞 bug label Aug 14, 2023
@derekcollison derekcollison self-assigned this Aug 14, 2023
@bruth bruth added defect Suspected defect such as a bug or regression and removed 🐞 bug labels Aug 18, 2023
@derekcollison
Member

Left the error code the same as we will pull this into 2.9.22. Can look at expanding the error description possibly in 2.10.

And if you want to move a stream, you can place it in any cluster or provide placement tags that it will use to select new peers and possibly a new cluster.
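
As a rough illustration of the placement idea in the comment above, the Python sketch below filters servers by a requested cluster and a set of tags. The server names and clusters come from the report; the tags, the `eligible_servers` helper, and the rule that a server must carry every requested tag are assumptions made for the example, not taken from nats-server.

```python
def eligible_servers(servers, cluster=None, tags=()):
    """Return the servers that satisfy a placement request.

    servers: mapping of server name to {"cluster": str, "tags": list}
    cluster: required cluster name, or None for any cluster
    tags:    tags a server must all carry (assumed rule)
    """
    return [
        name
        for name, info in servers.items()
        if (cluster is None or info["cluster"] == cluster)
        and set(tags) <= set(info["tags"])
    ]


# Topology from the report; the tags are invented for illustration.
servers = {
    "n1": {"cluster": "east", "tags": ["ssd"]},
    "n2": {"cluster": "central", "tags": []},
    "n3": {"cluster": "west", "tags": ["ssd"]},
}
```

Under these assumptions, asking for `cluster="west"` selects only `n3`, while asking for the tag `ssd` selects `n1` and `n3`, so a tag-based request can move the stream to a different cluster.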

derekcollison added a commit that referenced this issue Aug 23, 2023
We should not remove a peer from a stream when we can not find a
replacement unless R>1.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4396
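
The rule in the commit message can be modelled as follows. This is a Python sketch of the decision only, not the nats-server implementation; `remove_peer`, `PeerRemapError`, and the `candidates` argument are hypothetical names, and the error text simply echoes the message shown in the report.

```python
class PeerRemapError(Exception):
    """Raised when a peer cannot be removed safely (hypothetical name)."""


def remove_peer(replicas, peers, peer, candidates):
    """Return the new peer list for a stream after removing `peer`.

    replicas:   configured replica count (R)
    peers:      current peer names (not modified)
    candidates: servers eligible to host a replacement replica
    """
    if peer not in peers:
        raise PeerRemapError(f"{peer!r} is not a peer of this stream")

    remaining = [p for p in peers if p != peer]
    replacement = next((c for c in candidates if c not in peers), None)

    if replacement is not None:
        # Normal case: swap the removed peer for a replacement.
        return remaining + [replacement]
    if replicas > 1:
        # R>1: the surviving replicas still hold the data.
        return remaining
    # R1 with no replacement: refuse and leave the assignment untouched.
    raise PeerRemapError("peer remap failed")
```

In the scenario from the report, `n2` is the only server in its cluster, so `candidates` is empty and `remove_peer(1, ["n2"], "n2", [])` raises instead of leaving the stream with no replicas.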