kafka rebalance with a failure broker needs repair when online #20

Closed
joestein opened this issue Mar 22, 2015 · 1 comment

@joestein
Contributor

I started 4 brokers, created a topic with replication factor 3, then added 3 more brokers and ran a rebalance. During the rebalance one of the brokers (id 0) died, and several partitions showed an error in the rebalance status. Mesos restarted broker 0 (woo hoo), but when I ran the rebalance again every partition ended up with replication factor 4. It needs to go back to 3, spread evenly.


# ./kafka-mesos.sh add 0..3
Brokers added

brokers:
  id: 0
  active: false
  state: stopped
  resources: cpus:0.50, mem:128, heap:128
  failover: delay:10s, maxDelay:60s

  id: 1
  active: false
  state: stopped
  resources: cpus:0.50, mem:128, heap:128
  failover: delay:10s, maxDelay:60s

  id: 2
  active: false
  state: stopped
  resources: cpus:0.50, mem:128, heap:128
  failover: delay:10s, maxDelay:60s

  id: 3
  active: false
  state: stopped
  resources: cpus:0.50, mem:128, heap:128
  failover: delay:10s, maxDelay:60s

# ./kafka-mesos.sh start 0..3
Brokers 0,1,2,3 started

# bin/kafka-topics.sh --zookeeper master0:2181 --topic TESTING --create --replication-factor 3 --partitions 12
Created topic "TESTING".

# ./kafka-mesos.sh add 4..6
Brokers added

brokers:
  id: 4
  active: false
  state: stopped
  resources: cpus:0.50, mem:128, heap:128
  failover: delay:10s, maxDelay:60s

  id: 5
  active: false
  state: stopped
  resources: cpus:0.50, mem:128, heap:128
  failover: delay:10s, maxDelay:60s

  id: 6
  active: false
  state: stopped
  resources: cpus:0.50, mem:128, heap:128
  failover: delay:10s, maxDelay:60s

# ./kafka-mesos.sh start 4..6
Brokers 4,5,6 started

# bin/kafka-topics.sh --zookeeper master0:2181 --topic TESTING --describe
Topic:TESTING PartitionCount:12 ReplicationFactor:3 Configs:
  Topic: TESTING  Partition: 0  Leader: 1 Replicas: 1,3,0 Isr: 1,3,0
  Topic: TESTING  Partition: 1  Leader: 2 Replicas: 2,0,1 Isr: 2,0,1
  Topic: TESTING  Partition: 2  Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
  Topic: TESTING  Partition: 3  Leader: 0 Replicas: 0,2,3 Isr: 0,2,3
  Topic: TESTING  Partition: 4  Leader: 1 Replicas: 1,0,2 Isr: 1,0,2
  Topic: TESTING  Partition: 5  Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
  Topic: TESTING  Partition: 6  Leader: 3 Replicas: 3,2,0 Isr: 3,2,0
  Topic: TESTING  Partition: 7  Leader: 0 Replicas: 0,3,1 Isr: 0,3,1
  Topic: TESTING  Partition: 8  Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
  Topic: TESTING  Partition: 9  Leader: 2 Replicas: 2,3,0 Isr: 2,3,0
  Topic: TESTING  Partition: 10 Leader: 3 Replicas: 3,0,1 Isr: 3,0,1
  Topic: TESTING  Partition: 11 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
root@master0:/vagrant/


# ./kafka-mesos.sh rebalance 0..6
Rebalance started: 
TESTING
  0: 1,3,0 -> 6,5,0 - running
  1: 2,0,1 -> 0,6,1 - running
  2: 3,1,2 -> 1,0,2 - running
  3: 0,2,3 -> 2,1,3 - running
  4: 1,0,2 -> 3,2,4 - running
  5: 2,1,3 -> 4,3,5 - running
  6: 3,2,0 -> 5,4,6 - running
  7: 0,3,1 -> 6,0,1 - running
  8: 1,2,3 -> 0,1,2 - running
  9: 2,3,0 -> 1,2,3 - running
  10: 3,0,1 -> 2,3,4 - running
  11: 0,1,2 -> 3,4,5 - running

# ./kafka-mesos.sh rebalance status
Rebalance is running: 
TESTING
  0: 1,3,0 -> 6,5,0 - running
  1: 2,0,1 -> 0,6,1 - running
  2: 3,1,2 -> 1,0,2 - running
  3: 0,2,3 -> 2,1,3 - running
  4: 1,0,2 -> 3,2,4 - running
  5: 2,1,3 -> 4,3,5 - running
  6: 3,2,0 -> 5,4,6 - running
  7: 0,3,1 -> 6,0,1 - running
  8: 1,2,3 -> 0,1,2 - running
  9: 2,3,0 -> 1,2,3 - running
  10: 3,0,1 -> 2,3,4 - running
  11: 0,1,2 -> 3,4,5 - running

# ./kafka-mesos.sh rebalance status
Rebalance is idle: 
TESTING
  0: 1,3,0 -> 6,5,0 - error
  1: 2,0,1 -> 0,6,1 - error
  2: 3,1,2 -> 1,0,2 - error
  3: 0,2,3 -> 2,1,3 - done
  4: 1,0,2 -> 3,2,4 - done
  5: 2,1,3 -> 4,3,5 - done
  6: 3,2,0 -> 5,4,6 - done
  7: 0,3,1 -> 6,0,1 - error
  8: 1,2,3 -> 0,1,2 - error
  9: 2,3,0 -> 1,2,3 - done
  10: 3,0,1 -> 2,3,4 - done
  11: 0,1,2 -> 3,4,5 - done

# bin/kafka-topics.sh --zookeeper master0:2181 --topic TESTING --describe
Topic:TESTING PartitionCount:12 ReplicationFactor:5 Configs:
  Topic: TESTING  Partition: 0  Leader: 1 Replicas: 0,5,1,6,3 Isr: 0,5,1,6,3
  Topic: TESTING  Partition: 1  Leader: 2 Replicas: 0,6,1,2 Isr: 2,0,1,6
  Topic: TESTING  Partition: 2  Leader: 3 Replicas: 1,0,2,3 Isr: 3,1,2,0
  Topic: TESTING  Partition: 3  Leader: 2 Replicas: 2,1,3 Isr: 2,3,1
  Topic: TESTING  Partition: 4  Leader: 3 Replicas: 3,2,4 Isr: 2,3,4
  Topic: TESTING  Partition: 5  Leader: 4 Replicas: 4,3,5 Isr: 5,3,4
  Topic: TESTING  Partition: 6  Leader: 5 Replicas: 5,4,6 Isr: 5,6,4
  Topic: TESTING  Partition: 7  Leader: 3 Replicas: 6,0,1,3 Isr: 3,1,6,0
  Topic: TESTING  Partition: 8  Leader: 1 Replicas: 0,1,2,3 Isr: 1,2,3,0
  Topic: TESTING  Partition: 9  Leader: 2 Replicas: 1,2,3 Isr: 2,3,1
  Topic: TESTING  Partition: 10 Leader: 3 Replicas: 2,3,4 Isr: 2,3,4
  Topic: TESTING  Partition: 11 Leader: 5 Replicas: 3,4,5 Isr: 5,3,4


# ./kafka-mesos.sh rebalance 0..6
Rebalance started: 
TESTING
  0: 0,5,1,6,3 -> 2,1,3,4 - running
  1: 0,6,1,2 -> 3,2,4,5 - running
  2: 1,0,2,3 -> 4,3,5,6 - running
  3: 2,1,3 -> 5,4,6,0 - running
  4: 3,2,4 -> 6,5,0,1 - running
  5: 4,3,5 -> 0,6,1,2 - running
  6: 5,4,6 -> 1,0,2,3 - running
  7: 6,0,1,3 -> 2,3,4,5 - running
  8: 0,1,2,3 -> 3,4,5,6 - running
  9: 1,2,3 -> 4,5,6,0 - running
  10: 2,3,4 -> 5,6,0,1 - running
  11: 3,4,5 -> 6,0,1,2 - running

# ./kafka-mesos.sh rebalance status
Rebalance is idle: 
TESTING
  0: 0,5,1,6,3 -> 2,1,3,4 - done
  1: 0,6,1,2 -> 3,2,4,5 - done
  2: 1,0,2,3 -> 4,3,5,6 - done
  3: 2,1,3 -> 5,4,6,0 - done
  4: 3,2,4 -> 6,5,0,1 - done
  5: 4,3,5 -> 0,6,1,2 - done
  6: 5,4,6 -> 1,0,2,3 - done
  7: 6,0,1,3 -> 2,3,4,5 - done
  8: 0,1,2,3 -> 3,4,5,6 - done
  9: 1,2,3 -> 4,5,6,0 - done
  10: 2,3,4 -> 5,6,0,1 - done
  11: 3,4,5 -> 6,0,1,2 - done

# bin/kafka-topics.sh --zookeeper master0:2181 --topic TESTING --describe
Topic:TESTING PartitionCount:12 ReplicationFactor:4 Configs:
  Topic: TESTING  Partition: 0  Leader: 1 Replicas: 2,1,3,4 Isr: 1,2,3,4
  Topic: TESTING  Partition: 1  Leader: 2 Replicas: 3,2,4,5 Isr: 5,2,3,4
  Topic: TESTING  Partition: 2  Leader: 3 Replicas: 4,3,5,6 Isr: 5,6,3,4
  Topic: TESTING  Partition: 3  Leader: 5 Replicas: 5,4,6,0 Isr: 0,5,6,4
  Topic: TESTING  Partition: 4  Leader: 6 Replicas: 6,5,0,1 Isr: 0,5,1,6
  Topic: TESTING  Partition: 5  Leader: 0 Replicas: 0,6,1,2 Isr: 0,1,6,2
  Topic: TESTING  Partition: 6  Leader: 1 Replicas: 1,0,2,3 Isr: 0,1,2,3
  Topic: TESTING  Partition: 7  Leader: 3 Replicas: 2,3,4,5 Isr: 5,2,3,4
  Topic: TESTING  Partition: 8  Leader: 3 Replicas: 3,4,5,6 Isr: 5,6,3,4
  Topic: TESTING  Partition: 9  Leader: 4 Replicas: 4,5,6,0 Isr: 0,5,6,4
  Topic: TESTING  Partition: 10 Leader: 5 Replicas: 5,6,0,1 Isr: 0,5,1,6
  Topic: TESTING  Partition: 11 Leader: 6 Replicas: 6,0,1,2 Isr: 0,1,6,2
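
The describe outputs above show replica lists of mixed length after the interrupted rebalance (3, 4 and 5 replicas), then 4 everywhere after the second run. A minimal sketch for flagging such partitions automatically, assuming the line layout printed by `kafka-topics.sh --describe` in this issue; `parse_describe` and `wrong_replica_count` are hypothetical helper names, not part of kafka-mesos or Kafka:

```python
import re

# Matches the per-partition lines shown in this issue; the summary line
# ("Topic:TESTING PartitionCount:12 ...") does not match and is skipped.
LINE_RE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+)"
    r"\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]+)"
)


def parse_describe(text):
    """Return {partition id: [replica broker ids]} from describe output."""
    assignment = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            partition = int(match.group(2))
            assignment[partition] = [int(b) for b in match.group(4).split(",")]
    return assignment


def wrong_replica_count(assignment, expected):
    """Return only the partitions whose replica list length differs from expected."""
    return {p: r for p, r in assignment.items() if len(r) != expected}


# First four partition lines from the describe output taken after the failed rebalance.
sample = """\
Topic:TESTING PartitionCount:12 ReplicationFactor:5 Configs:
  Topic: TESTING  Partition: 0  Leader: 1 Replicas: 0,5,1,6,3 Isr: 0,5,1,6,3
  Topic: TESTING  Partition: 1  Leader: 2 Replicas: 0,6,1,2 Isr: 2,0,1,6
  Topic: TESTING  Partition: 2  Leader: 3 Replicas: 1,0,2,3 Isr: 3,1,2,0
  Topic: TESTING  Partition: 3  Leader: 2 Replicas: 2,1,3 Isr: 2,3,1
"""

flagged = wrong_replica_count(parse_describe(sample), expected=3)
print(sorted(flagged))  # prints [0, 1, 2]
```

On the full output, the flagged partitions (0, 1, 2, 7, 8) are exactly the ones the rebalance status reported as "error".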

@dmitrypekar
Contributor

This should be fixed once the rebalance-replication-factor PR is merged.
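
Until that change is available, a manual reassignment is one possible way to get the topic back to 3 replicas. The sketch below is only an illustration under assumptions: it builds a plan in the JSON format read by Kafka's stock `kafka-reassign-partitions.sh` tool by keeping the first 3 replicas of each partition. `trim_plan` is a hypothetical helper, not part of kafka-mesos, and it makes no attempt to check that the result is spread evenly across brokers.

```python
import json


def trim_plan(topic, assignment, target_rf):
    """Build a reassignment plan that keeps the first target_rf replicas.

    assignment maps partition id -> current replica list (broker ids).
    Keeping the head of the list preserves each partition's first
    (preferred) replica; it does not rebalance load between brokers.
    """
    partitions = []
    for partition in sorted(assignment):
        replicas = assignment[partition]
        if len(replicas) < target_rf:
            raise ValueError(
                "partition %d has only %d replicas" % (partition, len(replicas))
            )
        partitions.append(
            {"topic": topic, "partition": partition, "replicas": replicas[:target_rf]}
        )
    return {"version": 1, "partitions": partitions}


# Two partitions copied from the final describe output in this issue.
current = {0: [2, 1, 3, 4], 1: [3, 2, 4, 5]}
plan = trim_plan("TESTING", current, target_rf=3)
print(json.dumps(plan, indent=2))
```

The printed JSON could be saved to a file (for example `plan.json`, an arbitrary name) and applied with `bin/kafka-reassign-partitions.sh --zookeeper master0:2181 --reassignment-json-file plan.json --execute`. Reviewing the plan by hand first is advisable, since trimming alone does not guarantee an even spread.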
