rpk: Enhance partition management commands #9205

Closed
daisukebe opened this issue Mar 2, 2023 · 4 comments · Fixed by #13258 or #13684

@daisukebe
Contributor

daisukebe commented Mar 2, 2023

Who is this for and what problem do they have today?

Revised description:
Add the following new commands under rpk cluster partitions: move, move-status, and move-cancel. They take advantage of the more powerful admin APIs.

Complete plan is in #9205 (comment)

Original description:
rpk should have commands that invoke the new APIs in 23.1: AlterPartitionReassignments and ListPartitionReassignments.

What are the success criteria?

Revised:
Implement move, move-status, and move-cancel commands

Original:
rpk should be able to change replica assignments easily

Why is solving this problem impactful?

Improve supportability and troubleshooting

Additional notes

Could we follow the same approach as https://github.com/twmb/kcl/blob/master/commands/admin/partas/partition_assignments.go?

@jrkinley self-assigned this Mar 6, 2023
@daisukebe
Contributor Author

@twmb @r-vasquez @jrkinley
Does it make sense to add a new command under the rpk topic namespace? Here are rough examples where I'm borrowing the idea from the trim-prefix command in terms of reading from a file.

Alter the replica location of partition 1 in 'foo' topic to 2, 3, and 4

rpk topic alter-partition foo --partition 1 --replicas 2,3,4

Cancel the ongoing reassignment

rpk topic alter-partition foo --partition 1 

Alter the location from a JSON file

rpk topic alter-partition --from-file /tmp/alt_partition.json

@twmb
Contributor

twmb commented Aug 10, 2023

  • Cancel -- we might want to be explicit and require a flag -- i.e. either --replicas or --cancel should be specified
  • +1 to the rest -- and we probably want --partition to support multiple partitions at once
  • We need a corresponding command to list partition reassignments. Maybe this could be alter-partition --list or list-alter-partitions

Two other syntax alternatives, if we want to support moving multiple partitions to different replicas, are:

rpk topic alter-partition foo 1:2,3,4 2:4,3,2
rpk topic alter-partition foo -p 1:2,3,4 -p 2:3,4,5

Basically {partition}:{replica},{replica}..., either from a flag or as an arg. I'm undecided on what's best. The simpler idea is easiest to understand but requires multiple invocations if you want to move different partitions to different replicas. The more complicated idea avoids that but is harder to read, and how common is that case anyway?

@daisukebe
Contributor Author

daisukebe commented Aug 18, 2023

Thanks @twmb.

Cancel -- we might want to be explicit and require a flag -- i.e. either --replicas or --cancel should be specified

This makes sense as long as we accept only one partition reassignment when not using --from-file, because kam.AlterPartitionAssignments requires at least one topic-partition to be specified. The main benefit of this approach, compared with accepting multiple partitions at once, is its simplicity. If we want to move multiple partitions at once, we can use --from-file.

We need a corresponding command to list partition reassignments. Maybe this could be alter-partition --list or list-alter-partitions

Yes. From my research, I think the command will be more powerful if it uses the /v1/partitions/reconfigurations endpoint rather than ListPartitionReassignments, because the former now exposes progress at a finer granularity, per #10201.

Wdyt?

@daisukebe
Contributor Author

daisukebe commented Aug 30, 2023

After further offline discussion with @twmb, we're leaning toward using the admin APIs entirely instead of the Kafka API, because the admin APIs have become more powerful and therefore give users more insight.

Changes we're going to make at a glance

  • Add move, move-status, move-cancel
  • Hide movement-cancel (hide by making new command, marking it hidden, use the same Run function as move-cancel)

Command details (priority in descending order)

Show status of partition movement
rpk cluster partitions move-status TOPICS... --partition {partition},{partition}...

Command specs:

  • Entirely admin API (/v1/partitions/reconfigurations)
  • Optional partition flag
  • Optional TOPICS
  • The command reports the progress in bytes (with -H available) and percentage, similar to decommission-status
  • If a TOPIC is missing the {namespace}/ prefix (note the slash), we assume kafka/. If the topic has a slash, then we accept as is (this allows showing internal topics).
  • If no TOPIC, all topics and all partitions are printed
  • If no --partition, all partitions for the given topics are printed
  • If --partition and TOPIC, only the set of partitions for the requested topics are printed
  • If --partition and no TOPIC, fail, because it makes no sense to print specific partitions without naming the topics they belong to
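
The namespace and argument rules above can be sketched in Go as follows. This is only an illustration of the spec as written, not rpk's actual code: the names normalizeTopic and validateStatusArgs are hypothetical, and the input redpanda/controller is just an example of a topic that already carries a namespace.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// normalizeTopic applies the namespace rule from the spec: a topic without a
// slash is assumed to be in the "kafka" namespace; a topic that already
// contains a slash is accepted as is.
// (Hypothetical helper name, not taken from rpk.)
func normalizeTopic(topic string) string {
	if strings.Contains(topic, "/") {
		return topic
	}
	return "kafka/" + topic
}

// validateStatusArgs rejects the one combination the spec marks as invalid:
// --partition given without any TOPIC. Every other combination is allowed.
func validateStatusArgs(topics []string, partitions []int) error {
	if len(topics) == 0 && len(partitions) > 0 {
		return errors.New("--partition requires at least one TOPIC")
	}
	return nil
}

func main() {
	fmt.Println(normalizeTopic("foo"))                 // kafka/foo
	fmt.Println(normalizeTopic("redpanda/controller")) // redpanda/controller
	fmt.Println(validateStatusArgs(nil, []int{1}))     // non-nil error
	fmt.Println(validateStatusArgs(nil, nil))          // <nil>
}
```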

Cancel ongoing partition movements
rpk cluster partitions move-cancel TOPICS... --partition {partition},{partition} --node {node}

Command specs:

  • Entirely admin API
  • Optional TOPICS
  • Optional partition flag
  • --node and --partition are mutually exclusive
  • --node and TOPIC are mutually exclusive
  • If a TOPIC is missing the {namespace}/ prefix (note the slash), we assume kafka/. If the topic has a slash, then we accept as is (this allows canceling internal topic movements).
  • If no TOPIC and no --partition, cancel everything (this already exists)
  • If --node, cancel all movement on the requested node (this already exists)
  • If TOPIC with no --partition: cancel movements for all partitions for the requested topics (ref: Add an admin API to cancel partition reassignments for a batch of partitions #13171)
  • If TOPIC with --partition, cancel only requested partitions
  • If --partition with no TOPIC, fail as invalid
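
One way to picture the move-cancel rules is as a single decision function. The Go sketch below is an assumption-laden illustration, not the rpk implementation: the cancelScope type and resolveCancelScope function are hypothetical, a negative node value stands in for "--node not set", and the "cancel everything" case is read as neither TOPIC nor --partition being given.

```go
package main

import (
	"errors"
	"fmt"
)

// cancelScope names the four valid outcomes in the spec (hypothetical type).
type cancelScope int

const (
	cancelAll        cancelScope = iota // no TOPIC, no --partition, no --node
	cancelNode                          // --node only
	cancelTopics                        // TOPIC without --partition
	cancelPartitions                    // TOPIC with --partition
)

// resolveCancelScope maps the arguments to a scope or an error.
// Assumption: node < 0 means the --node flag was not provided.
func resolveCancelScope(topics []string, partitions []int, node int) (cancelScope, error) {
	nodeSet := node >= 0
	switch {
	case nodeSet && len(partitions) > 0:
		return 0, errors.New("--node and --partition are mutually exclusive")
	case nodeSet && len(topics) > 0:
		return 0, errors.New("--node and TOPIC are mutually exclusive")
	case nodeSet:
		return cancelNode, nil
	case len(topics) == 0 && len(partitions) > 0:
		return 0, errors.New("--partition requires at least one TOPIC")
	case len(topics) == 0:
		return cancelAll, nil
	case len(partitions) == 0:
		return cancelTopics, nil
	default:
		return cancelPartitions, nil
	}
}

func main() {
	scope, err := resolveCancelScope([]string{"foo"}, []int{1, 2}, -1)
	fmt.Println(scope == cancelPartitions, err)

	_, err = resolveCancelScope(nil, []int{1}, -1)
	fmt.Println(err)
}
```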

Move partition replicas
rpk cluster partitions move TOPICS... --partition {partition:replica,replica,replica} (repeatable)
rpk cluster partitions move -p {topic/partition:replica,replica,replica} (repeatable)

Command specs:

  • Entirely admin API (POST /v1/partitions/{namespace}/{topic}/{partition}/replicas)
  • If a TOPIC is missing a namespace/ prefix (note the slash), we assume kafka/. If the topic has a slash, then we accept as is (this allows moving internal topics).
  • TOPICS is optional. If it exists, all --partition must be number:number,number,.. (for partition:replica,replica,...).
  • If no TOPIC, each --partition must have {topic}/ prefix. Otherwise fail as invalid
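
The two accepted shapes of the --partition value could be parsed roughly as in the Go sketch below. It is a minimal illustration under assumptions rather than the final implementation: the move struct, parseMove, and validateMoves are hypothetical names, replicas are treated as plain integer node IDs, and the topic prefix is taken to be everything before the last slash so that a namespace/topic prefix also works.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// move is one parsed --partition value (hypothetical type).
type move struct {
	Topic     string // empty when the value has no {topic}/ prefix
	Partition int
	Replicas  []int
}

// parseMove parses "{partition}:{replica},..." or
// "{topic}/{partition}:{replica},...".
func parseMove(flag string) (move, error) {
	target, replicaList, ok := strings.Cut(flag, ":")
	if !ok {
		return move{}, fmt.Errorf("%q: expected {partition}:{replica},{replica}", flag)
	}
	var m move
	// Everything before the last slash is the topic (optionally namespaced).
	if i := strings.LastIndex(target, "/"); i >= 0 {
		m.Topic, target = target[:i], target[i+1:]
	}
	p, err := strconv.Atoi(target)
	if err != nil {
		return move{}, fmt.Errorf("%q: invalid partition: %w", flag, err)
	}
	m.Partition = p
	for _, r := range strings.Split(replicaList, ",") {
		id, err := strconv.Atoi(strings.TrimSpace(r))
		if err != nil {
			return move{}, fmt.Errorf("%q: invalid replica: %w", flag, err)
		}
		m.Replicas = append(m.Replicas, id)
	}
	return m, nil
}

// validateMoves enforces the spec: with TOPICS, values must be
// number:number,...; without TOPICS, every value needs a {topic}/ prefix.
func validateMoves(topics []string, moves []move) error {
	for _, m := range moves {
		if len(topics) > 0 && m.Topic != "" {
			return fmt.Errorf("partition %d: topic prefix not allowed when TOPICS are given", m.Partition)
		}
		if len(topics) == 0 && m.Topic == "" {
			return fmt.Errorf("partition %d: missing {topic}/ prefix", m.Partition)
		}
	}
	return nil
}

func main() {
	m, err := parseMove("foo/1:2,3,4")
	fmt.Println(m.Topic, m.Partition, m.Replicas, err)
	fmt.Println(validateMoves(nil, []move{m}))
}
```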

Full command space at the end

rpk cluster partitions move            // alias: alter
rpk cluster partitions move-status     // alias: alter-status
rpk cluster partitions move-cancel     // alias: alter-cancel, hidden alias: movement-cancel (hide by making new command, marking it hidden, and use the same Run function)
rpk cluster partitions balancer-status
