Partition allocator creates clustered node groups #17925

Closed
StephanDollberg opened this issue Apr 17, 2024 · 2 comments · Fixed by #17962
Labels: area/replication, kind/bug

StephanDollberg (Member) commented Apr 17, 2024

Version & Environment

Redpanda version: dev

What went wrong?

In a 12-node cluster with one topic of 1860 partitions, we get the following partition distribution: there appear to be two node groups (one of three nodes and one of nine) whose partition replicas are shared only within the group.

For example, looking at the first group we see:

stephan@fedora@ducktape-node-0-jolly-vital-polliwog:~$ rpk topic describe test-topic-rMlZSlE-0000 -a | head -n 100 | grep "\[0"
3          5       1      [0 5 11]  0                 576598
7          0       1      [0 5 11]  0                 578720
11         5       1      [0 5 11]  0                 576760
15         0       1      [0 5 11]  0                 576672
19         5       1      [0 5 11]  0                 578980
23         0       1      [0 5 11]  0                 577444
27         5       2      [0 5 11]  0                 574880
31         11      1      [0 5 11]  0                 571951
35         11      1      [0 5 11]  0                 576533
39         5       1      [0 5 11]  0                 580554
43         5       1      [0 5 11]  0                 581244
47         11      1      [0 5 11]  0                 578692
51         5       1      [0 5 11]  0                 575650
55         0       1      [0 5 11]  0                 566436
59         11      2      [0 5 11]  0                 583530
63         11      2      [0 5 11]  0                 578035
67         0       1      [0 5 11]  0                 577500
stephan@fedora@ducktape-node-0-jolly-vital-polliwog:~$ rpk topic describe test-topic-rMlZSlE-0000 -a | grep "\[0 5 11\]" | wc -l
465 # 1860 / 12 * 3

We see that those three nodes share all of their partition replicas exclusively with each other: 1860 partitions × 3 replicas / 12 nodes = 465 replicas per node, and every replica on node 0 falls into the single set [0 5 11].

Leadership is otherwise balanced.
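For scale, a rough back-of-the-envelope comparison against an unbiased allocator (a sketch only; the uniform-choice baseline is my assumption, while the 1860 partitions / 12 nodes / RF=3 figures come from the report above): with 12 nodes and RF=3 there are C(12,3) = 220 possible replica sets, so a uniform choice would place roughly 1860 / 220 ≈ 8.5 partitions on any one set, not 465.

# Hypothetical sketch (not part of Redpanda or rpk): how far the observed
# [0 5 11] count is from what a uniform choice of replica sets would give.
from math import comb

partitions = 1860                 # partitions in the topic (from the report above)
nodes = 12                        # brokers in the cluster
rf = 3                            # replication factor

possible_sets = comb(nodes, rf)                # 220 distinct 3-node replica sets
expected_per_set = partitions / possible_sets  # ~8.5 per set under a uniform choice
observed = 465                                 # count reported above for [0 5 11]

print(f"{possible_sets} possible sets, ~{expected_per_set:.1f} expected per set, "
      f"{observed} observed for [0 5 11] (~{observed / expected_per_set:.0f}x)")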

What should have happened instead?

Clustering should not have happened. The problem is that this creates uneven traffic between nodes:

(screenshot: inter-broker network traffic graph for one node)

We see that this node only receives traffic from the other nodes in the "group".

How to reproduce the issue?

Benchrunner config is as follows, though the issue can probably be reproduced with a simpler setup:

environment:
  client:
    provider: gcp
    provider_config:
      region: us-west1
      zone: b
      client_machine_type: n2d-standard-32
      subnet: omb-benchrunner-2
      client_disks: 2
  redpanda:
    provider: gcp
    provider_config:
      subnet: omb-benchrunner-2
      nodes: 12
      region: us-west1
      zone: b
      machine_type: n2d-standard-32
      disks: 24
      disk_interface: SCSI
      enable_monitoring: true

deployment:
  prometheus_scrape_interval: 5s
  prometheus_scrape_timeout: 5s
  openmessaging_benchmark_repo: https://github.com/redpanda-data/openmessaging-benchmark
  openmessaging_benchmark_version: main

benchmark:
  create_canary_topic: false
  provider: openmessaging.OpenMessaging
  client_count: 9
  driver:
    name: simple-driver
    replication_factor: 3
    request_timeout: 300000
    reset: true
    producer:
      enable.idempotence: true
      max.in.flight.requests.per.connection: 5
      acks: all
      linger.ms: 1
      batch.size: 16384
    consumer:
      auto.offset.reset: latest
      enable.auto.commit: false
      max.partition.fetch.bytes: 131072
  workload:
    name: Simple-workload-config
    topics: 1
    partitions_per_topic: 1860
    subscriptions_per_topic: 2
    consumer_per_subscription: 320
    producers_per_topic: 320
    consumer_backlog_size_GB: 0
    warmup_duration_minutes: 3
    test_duration_minutes: 8
    producer_rate: 1600000
    message_size: 1024
    payload_file: payload/payload-1Kb.data

Full rpk topic list -a output: https://gist.github.com/StephanDollberg/482127f2dc427219ef7fac42cbb9a1ab

stephan@fedora@ducktape-node-4-jolly-vital-polliwog:~$ rpk redpanda admin brokers list
NODE-ID  NUM-CORES  MEMBERSHIP-STATUS  IS-ALIVE  BROKER-VERSION
0        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
1        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
2        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
3        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
4        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
5        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
6        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
7        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
8        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
9        32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
10       32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
11       32         active             true      v24.1.1-rc5-74-g13273f3482 - 13273f3482d5f7bf74ec2e287d9fe7416c7dd702
stephan@fedora@ducktape-node-4-jolly-vital-polliwog:~$ rpk cluster health
CLUSTER HEALTH OVERVIEW
=======================
Healthy:                          true
Unhealthy reasons:                []
Controller ID:                    0
All nodes:                        [0 1 2 3 4 5 6 7 8 9 10 11]
Nodes down:                       []
Leaderless partitions (0):        []
Under-replicated partitions (0):  []
stephan@fedora@ducktape-node-4-jolly-vital-polliwog:~$ rpk cluster partitions balancer-status
Status:                      ready
Seconds Since Last Tick:     0
Current Reassignment Count:  0

JIRA Link: CORE-2412

StephanDollberg added the kind/bug and area/replication labels on Apr 17, 2024
StephanDollberg (Member, Author) commented:
See also #4809

travisdowns (Member) commented Apr 17, 2024

Easy to reproduce from a clean cluster on nightly:

rpk container start --nodes=6 --set rpk.additional_start_flags="--smp=1" --image redpandadata/redpanda-nightly:v0.0.0-20240417git13273f3
rpk topic create foo -r3 -p50
rpk topic describe foo --print-partitions | grep -Eo '\[.*\]' | sort | uniq -c

output:

      6 [0 1 2]
      3 [0 1 4]
      8 [0 1 5]
      4 [0 2 3]
      1 [0 2 5]
      1 [0 3 4]
      1 [0 3 5]
      1 [0 4 5]
      1 [1 2 3]
      1 [1 2 4]
      1 [1 2 5]
      1 [1 3 4]
      4 [1 4 5]
      8 [2 3 4]
      3 [2 3 5]
      6 [3 4 5]

I.e., a very biased distribution of replica sets.
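To put a number on the bias (a quick sketch, not part of rpk; the counts are copied from the `sort | uniq -c` output above): with 6 nodes and RF=3 there are C(6,3) = 20 possible replica sets, so 50 partitions spread uniformly would give 2.5 per set, while the observed counts range from 1 to 8 and only 16 of the 20 sets appear at all.

# Sketch only: summarize the `sort | uniq -c` counts pasted above.
from math import comb

observed = [6, 3, 8, 4, 1, 1, 1, 1, 1, 1, 1, 1, 4, 8, 3, 6]  # copied from the output above
nodes, rf, partitions = 6, 3, 50

possible = comb(nodes, rf)        # 20 possible 3-node replica sets
uniform = partitions / possible   # 2.5 partitions per set if unbiased

print(f"sets seen: {len(observed)}/{possible}")
print(f"uniform expectation: {uniform} per set; observed min/max: {min(observed)}/{max(observed)}")
assert sum(observed) == partitions  # sanity check: counts add up to 50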
