Kafka Assigner

Todd Palino edited this page Apr 28, 2016 · 3 revisions

Working with partition assignments in Kafka clusters can be a pain. The standard admin CLI tools are quite basic, and do not make it easy to perform common tasks such as "remove a broker from the cluster". Over time, LinkedIn SRE developed a number of tools for working with partitions in clusters, and these have now been consolidated into a single script that performs the most common functions.

Prerequisites

In order to run kafka-assigner, you will need to have the following Python modules installed:

  • Paramiko
  • Kazoo

In addition, you will need to run it on a host that has the following:

  • A copy of the Kafka admin tools (including kafka-reassign-partitions.sh)
  • Access to the Zookeeper ensemble for the cluster
  • SSH access to the Kafka brokers (with credentials preferably loaded into ssh-agent)
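
As a quick sanity check before a first run, the prerequisites above can be partly verified from Python. This is a minimal sketch and not part of kafka-assigner: the function names are hypothetical, and it only checks that the modules are importable and that kafka-reassign-partitions.sh is on the PATH. It does not test Zookeeper or SSH connectivity.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names):
    """Return the executables from `names` that are not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Import names for Paramiko and Kazoo, as they appear in `import` statements.
    for module in missing_modules(["paramiko", "kazoo"]):
        print("Missing Python module: " + module)
    for tool in missing_tools(["kafka-reassign-partitions.sh"]):
        print("Not found on PATH: " + tool)
```

If the Kafka admin tools live outside the PATH, the second check will report them as missing even though kafka-assigner can still be pointed at them with --tools-path.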

Running kafka-assigner.py

At a high level, kafka-assigner is run as follows:

kafka-assigner.py -z <zkhost:port/path> [OPTIONS] <module name> [MODULE OPTIONS]

The argument to the -z command line option is the full Zookeeper connect string for your Kafka cluster. For example, if your Zookeeper host is zook.example.com, running on port 2181, and the Kafka cluster uses a chroot path of /kafka/clustername, then the argument is zook.example.com:2181/kafka/clustername.
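
To make the structure of the connect string concrete, here is a small sketch that splits one into its host list and chroot path. The helper name split_connect_string is hypothetical and this is not how kafka-assigner itself parses the argument. It assumes the usual Zookeeper convention of comma-separated host:port pairs followed by an optional chroot, and it treats a missing chroot as "/".

```python
def split_connect_string(connect):
    """Split a Zookeeper connect string into (list of host:port, chroot path)."""
    slash = connect.find("/")
    if slash == -1:
        # No chroot given: assume the cluster lives at the Zookeeper root.
        hosts, chroot = connect, "/"
    else:
        hosts, chroot = connect[:slash], connect[slash:]
    return [h for h in hosts.split(",") if h], chroot


hosts, chroot = split_connect_string("zook.example.com:2181/kafka/clustername")
print(hosts)   # ['zook.example.com:2181']
print(chroot)  # /kafka/clustername
```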

The following command line options can be used as [OPTIONS] and are all optional:

| Option | Argument | Default | Description |
|--------|----------|---------|-------------|
| --leadership | | none | Show the cluster leadership balance before and after module processing |
| --generate | | none | Generate the partition reassignment file(s) and print them out |
| --execute | | none | Execute the partition reassignment (if omitted, dry run only) |
| --moves | integer | 10 | The number of partition moves to execute in a single step |
| --ple-size | integer | 900000 | Max size in bytes for a preferred leader election string |
| --ple-wait | integer | 300 | Time in seconds to wait between preferred leader elections |
| --tools-path | path | none | Path to Kafka admin utilities, overriding the PATH env var |
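
The sketch below shows how the options combine with the usage line by assembling an argument list in Python. The build_command helper and its parameter names are hypothetical. Only the flags listed above are used, and module options are passed through untouched because they are documented separately for each module.

```python
def build_command(zk_connect, module, module_args=(), leadership=False,
                  generate=False, execute=False, moves=None,
                  ple_size=None, ple_wait=None, tools_path=None):
    """Assemble a kafka-assigner.py argument list from the documented options."""
    argv = ["kafka-assigner.py", "-z", zk_connect]
    # Boolean flags take no argument and are simply present or absent.
    for flag, enabled in (("--leadership", leadership),
                          ("--generate", generate),
                          ("--execute", execute)):
        if enabled:
            argv.append(flag)
    # Valued options are only emitted when set, so the tool's defaults apply otherwise.
    for flag, value in (("--moves", moves), ("--ple-size", ple_size),
                        ("--ple-wait", ple_wait), ("--tools-path", tools_path)):
        if value is not None:
            argv += [flag, str(value)]
    # The module name and its own options always come last.
    argv.append(module)
    argv += list(module_args)
    return argv


# Without --execute this is a dry run; module options are omitted in this example.
dry_run = build_command("zook.example.com:2181/kafka/clustername", "balance",
                        leadership=True)
print(" ".join(dry_run))
```

The resulting list can be handed to subprocess.run(argv, check=True). Adding execute=True is the only change needed to turn the dry run into a real reassignment, which is a good reason to review the dry-run output first.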

Module Documentation

The following modules are currently available and can be specified as the module name. Each has its own wiki page, named in parentheses, with specifics on using the module.

  • clone (module-clone)
  • trim (module-trim)
  • remove (module-remove)
  • elect (module-elect)
  • set-replication-factor (module-set-replication-factor)
  • reorder (module-reorder)
  • balance (module-balance)

Known Issues and TODOs

  • Positional arguments suck. This is the way argparse works for common arguments, however, so we're stuck with it for now. If someone has a better way I'd love to see it.
  • There should be helper functions for changing the partition replica lists that maintain the partition and the brokers at the same time.
  • The reorder module is very heavy-handed in the way it operates, as it doesn't consider the initial state of the cluster. It should be changed to perform a minimal number of moves.
  • The batching of partition moves is pretty simple. It could be optimized to move similarly sized partitions in a batch, as well as to allow only one move per broker tuple in a batch.
  • A rack-aware balancing module would be awesome.
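
The batching idea in the list above could look roughly like the following. This is only a sketch of one possible approach, not the current or planned implementation. The Move record and batch_moves function are hypothetical, "broker tuple" is assumed to mean a (source, target) broker pair, and partition sizes are assumed to be known in advance.

```python
from collections import namedtuple

# Hypothetical record for a single replica move between two brokers.
Move = namedtuple("Move", ["topic", "partition", "source", "target", "size"])


def batch_moves(moves, batch_size=10):
    """Group moves into batches of similar size with one move per broker pair."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Sorting by size places similarly sized partitions next to each other.
    remaining = sorted(moves, key=lambda m: m.size)
    batches = []
    while remaining:
        batch, used_pairs, deferred = [], set(), []
        for move in remaining:
            pair = (move.source, move.target)
            if len(batch) < batch_size and pair not in used_pairs:
                batch.append(move)
                used_pairs.add(pair)
            else:
                # Batch is full or this broker pair is taken: try a later batch.
                deferred.append(move)
        batches.append(batch)
        remaining = deferred
    return batches


example = [
    Move("t", 0, 1, 2, 100),
    Move("t", 1, 1, 2, 50),
    Move("t", 2, 3, 4, 75),
]
for batch in batch_moves(example):
    print([m.partition for m in batch])
```

Here partitions 0 and 1 both move from broker 1 to broker 2, so they land in separate batches, while partition 2 can share a batch with the smaller of the two. The default batch_size of 10 mirrors the default of the --moves option.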