Partition allocator creates clustered node groups #17925

Closed
StephanDollberg opened this issue Apr 17, 2024 · 2 comments · Fixed by #17962
Labels: area/replication, kind/bug

StephanDollberg (Member) commented Apr 17, 2024

Version & Environment

Redpanda version: dev

What went wrong?

In a 12-node cluster with one topic of 1860 partitions, we get the following partition distribution: there appear to be two node groups (one of three nodes and one of nine) whose partition replicas are shared only within the group.

For example, looking at the first group we see:

stephan@fedora@ducktape-node-0-jolly-vital-polliwog:~$ rpk topic describe test-topic-rMlZSlE-0000 -a | head -n 100 | grep "\[0"
3          5       1      [0 5 11]  0                 576598
7          0       1      [0 5 11]  0                 578720
11         5       1      [0 5 11]  0                 576760
15         0       1      [0 5 11]  0                 576672
19         5       1      [0 5 11]  0                 578980
23         0       1      [0 5 11]  0                 577444
27         5       2      [0 5 11]  0                 574880
31         11      1      [0 5 11]  0                 571951
35         11      1      [0 5 11]  0                 576533
39         5       1      [0 5 11]  0                 580554
43         5       1      [0 5 11]  0                 581244
47         11      1      [0 5 11]  0                 578692
51         5       1      [0 5 11]  0                 575650
55         0       1      [0 5 11]  0                 566436
59         11      2      [0 5 11]  0                 583530
63         11      2      [0 5 11]  0                 578035
67         0       1      [0 5 11]  0                 577500
stephan@fedora@ducktape-node-0-jolly-vital-polliwog:~$ rpk topic describe test-topic-rMlZSlE-0000 -a | grep "\[0 5 11\]" | wc -l
465 # 1860 / 12 * 3

We see that those three nodes share all of their partition replicas exclusively with each other: 1860 partitions × 3 replicas / 12 nodes = 465 replicas per node, and every replica on node 0 falls into the single set [0 5 11].

Leadership is otherwise balanced.
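For scale, a rough back-of-the-envelope comparison against an unbiased allocator (a sketch only; the uniform-choice baseline is my assumption, while the 1860 partitions / 12 nodes / RF=3 figures come from the report above): with 12 nodes and RF=3 there are C(12,3) = 220 possible replica sets, so a uniform choice would place roughly 1860 / 220 ≈ 8.5 partitions on any one set, not 465.

# Hypothetical sketch (not part of Redpanda or rpk): how far the observed
# [0 5 11] count is from what a uniform choice of replica sets would give.
from math import comb

partitions = 1860                 # partitions in the topic (from the report above)
nodes = 12                        # brokers in the cluster
rf = 3                            # replication factor

possible_sets = comb(nodes, rf)                # 220 distinct 3-node replica sets
expected_per_set = partitions / possible_sets  # ~8.5 per set under a uniform choice
observed = 465                                 # count reported above for [0 5 11]

print(f"{possible_sets} possible sets, ~{expected_per_set:.1f} expected per set, "
      f"{observed} observed for [0 5 11] (~{observed / expected_per_set:.0f}x)")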

What should have happened instead?

Clustering should not have happened. The problem is that this creates uneven traffic between nodes:

(screenshot: inter-broker network traffic graph for one node)

We see that this node only receives traffic from the other nodes in the "group".

How to reproduce the issue?

Benchrunner config is as follows, though the issue can probably be reproduced with a simpler setup:

environment:
  client:
    provider: gcp
    provider_config:
      region: us-west1
      zone: b
      client_machine_type: n2d-standard-32
      subnet: omb-benchrunner-2
      client_disks: 2
  redpanda:
    provider: gcp
    provider_config:
      subnet: omb-benchrunner-2
      nodes: 12
      region: us-west1
      zone: b
      machine_type: n2d-standard-32
      disks: 24
      disk_interface: SCSI
      enable_monitoring: true

deployment:
  prometheus_scrape_interval: 5s
  prometheus_scrape_timeout: 5s
  openmessaging_benchmark_repo: https://github.com/redpanda-data/openmessaging-benchmark
  openmessaging_benchmark_version: main

benchmark:
  create_canary_topic: false
  provider: openmessaging.OpenMessaging
  client_count: 9
  driver:
    name: simple-driver
    replication_factor: 3
    request_timeout: 300000
    reset: true
    producer:
      enable.idempotence: true
      max.in.flight.requests.per.connection: 5
      acks: all
      linger.ms: 1
      batch.size: 16384
    consumer:
      auto.offset.reset: latest
      enable.auto.commit: false
      max.partition.fetch.bytes: 131072
  workload:
    name: Simple-workload-config
    topics: 1
    partitions_per_topic: 1860
    subscriptions_per_topic: 2
    consumer_per_subscription: 320
    producers_per_topic: 320
    consumer_backlog_size_GB: 0
    warmup_duration_minutes: 3
    test_duration_minutes: 8
    producer_rate: 1600000
    message_size: 1024
    payload_file: payload/payload-1Kb.data

Full rpk topic list -a output: https://gist.github.com/StephanDollberg/482127f2dc427219ef7fac42cbb9a1ab

stephan@fedora@ducktape-node-4-jolly-vital-polliwog:~$ rpk redpanda admin brokers list
NODE-ID  NUM-CORES  MEMBERSHIP-STATUS  IS-ALIVE  BROKER-VERSION
0        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
1        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
2        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
3        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
4        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
5        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
6        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
7        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
8        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
9        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
10       32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
11       32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
stephan@fedora@ducktape-node-4-jolly-vital-polliwog:~$ rpk cluster health
CLUSTER HEALTH OVERVIEW
=======================
Healthy:                          true
Unhealthy reasons:                []
Controller ID:                    0
All nodes:                        [0 1 2 3 4 5 6 7 8 9 10 11]
Nodes down:                       []
Leaderless partitions (0):        []
Under-replicated partitions (0):  []
stephan@fedora@ducktape-node-4-jolly-vital-polliwog:~$ rpk cluster partitions balancer-status
Status:                      ready
Seconds Since Last Tick:     0
Current Reassignment Count:  0

JIRA Link: CORE-2412

StephanDollberg added the kind/bug and area/replication labels on Apr 17, 2024
StephanDollberg (Member, Author) commented:
See also #4809

travisdowns (Member) commented Apr 17, 2024

Easy to reproduce from a clean cluster on nightly:

rpk container start --nodes=6 --set rpk.additional_start_flags="--smp=1" --image redpandadata/redpanda-nightly:v0.0.0-20240417git13273f3
rpk topic create foo -r3 -p50
rpk topic describe foo --print-partitions | grep -Eo '\[.*\]' | sort | uniq -c

output:

      6 [0 1 2]
      3 [0 1 4]
      8 [0 1 5]
      4 [0 2 3]
      1 [0 2 5]
      1 [0 3 4]
      1 [0 3 5]
      1 [0 4 5]
      1 [1 2 3]
      1 [1 2 4]
      1 [1 2 5]
      1 [1 3 4]
      4 [1 4 5]
      8 [2 3 4]
      3 [2 3 5]
      6 [3 4 5]

I.e., a very biased distribution of replica sets.
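To put a number on the bias (a quick sketch, not part of rpk; the counts are copied from the `sort | uniq -c` output above): with 6 nodes and RF=3 there are C(6,3) = 20 possible replica sets, so 50 partitions spread uniformly would give 2.5 per set, while the observed counts range from 1 to 8 and only 16 of the 20 sets appear at all.

# Sketch only: summarize the `sort | uniq -c` counts pasted above.
from math import comb

observed = [6, 3, 8, 4, 1, 1, 1, 1, 1, 1, 1, 1, 4, 8, 3, 6]  # copied from the output above
nodes, rf, partitions = 6, 3, 50

possible = comb(nodes, rf)        # 20 possible 3-node replica sets
uniform = partitions / possible   # 2.5 partitions per set if unbiased

print(f"sets seen: {len(observed)}/{possible}")
print(f"uniform expectation: {uniform} per set; observed min/max: {min(observed)}/{max(observed)}")
assert sum(observed) == partitions  # sanity check: counts add up to 50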
