Add proposal for a router impl by tombentley · Pull Request #112 · kroxylicious/design

tombentley · 2026-05-22T04:04:14Z

No description provided.

Signed-off-by: Tom Bentley <tbentley@redhat.com>

Co-authored-by: Gunnar Morling <gunnar.morling@googlemail.com> Signed-off-by: Tom Bentley <tombentley@users.noreply.github.com>

Signed-off-by: Tom Bentley <tbentley@redhat.com>

k-wall · 2026-05-22T14:31:14Z

+**Topic-based partitioning.** 
+Each topic is owned by exactly one upstream cluster.
+The router determines the owning cluster from the topic name (via an unambiguous mapping) and routes data-plane operations accordingly.
+When a single client request references topics on different clusters (e.g. a `PRODUCE` batch containing records for topics on two clusters), the router decomposes the request, fans out the sub-requests, fans-in the sub-responses and the recomposes a single response to be returned to the client.


"PRODUCE batch containing records for topics on two clusters". When would this happen in Kafka?

Something is confused here. With the virtualised node id scheme described in the Route API proposal every broker had a unique node id. The client knows all the node ids. Right?

Producing Kafka clients work by connecting to the broker that is the topic leader. So in the in the case where a client needs to send to two topic on different clusters, the client knows this and will be making two broker connections.

It will send a produce request for topic A to the first, and a produce request for topic B in the second.

The produce request that the router sees won't need to be decomposed because it will only contain topic partition request for the 'correct' broker.

Am I missing something?

You're right that with the current dedicated (one-to-one) BijectiveNodeIdMapping, a well-behaved Kafka producer will not send a PRODUCE request spanning routes. Each virtual node belongs to exactly one route, the client batches by partition leader, and so each ProduceRequest naturally contains only topics from one route.

However, the Router API is designed to support different NodeIdMapping strategies (see proposal 070's "Node ID mapping implementation" section). The current dedicated mapping is one strategy (the first one we'd implement).

I wanted to have an API which didn't lock router implementation into only working like that. I think it would be a strategic mistake to pour a lot of effort into implementing routers that could only work with a single kind of deployment if there's a relatively simple way to do something much more general.

For example: We may want to support future strategies where a single virtual node may serve brokers from multiple routes — for example, if proxy instances were presented to clients as broker nodes. With such a shared (many-to-one) mapping, a single virtual node fronts brokers from different clusters, and a client's leader-directed PRODUCE request can contain topics from different routes.

The router doesn't know which mapping is in use — that's a runtime implementation detail. So it must handle the general case. Decomposition logic would ensure correctness regardless of the mapping strategy.

That said, your observation prompted a clarification this in the proposal — the overview now explains when decomposition is and isn't needed depending on the mapping, and proposal 070 now describes the dedicated and shared strategies concretely.

Thank you. The explanation make sense. Having the API shaped in such a way to support the alternative pattern is useful.

It worries me that the Data Virtualisation Router PR resulting from this proposal will have code paths that we can't hit in production. I mean it will only ever see requests Produce/Fetch requests that don't need to be decomposed - so it we'll have redundant code. YAGNI tells us that is a the wrong approach. We can enhance Data Virtualisation to support other NodeIdMapping later if the need arrives.

Signed-off-by: Tom Bentley <tbentley@redhat.com>

k-wall · 2026-05-26T14:45:06Z

+| **Data** | `FETCH` | Split by partition leader (see _Leader caching_ below). |
+| **Data** | `LIST_OFFSETS` | Split by partition leader. |
+| **Metadata** | `METADATA` | Three variants: all-topics (fan out to all routes, filter responses to owned topics), empty-topics/broker-info-only (fan out to all routes for broker discovery), specific topics (split by ownership). Broker lists unioned by node ID. |
+| **Metadata** | `API_VERSIONS` | Forwarded to the specific broker via `sendRequestToNode(route, virtualNodeId, ...)`; response capped (see _Version capping_). |


I don't understand how the Router implementation knows the node id in order to call RoutingContext#sendRequest

Neither the Router or RouterContext have sight of it.

On the branch you have a version of the io.kroxylicious.proxy.routing.RoutingContext#sendRequest that doesn't require a node id.

k-wall · 2026-05-26T14:50:14Z

+| **Data** | `FETCH` | Split by partition leader (see _Leader caching_ below). |
+| **Data** | `LIST_OFFSETS` | Split by partition leader. |
+| **Metadata** | `METADATA` | Three variants: all-topics (fan out to all routes, filter responses to owned topics), empty-topics/broker-info-only (fan out to all routes for broker discovery), specific topics (split by ownership). Broker lists unioned by node ID. |
+| **Metadata** | `API_VERSIONS` | Forwarded to the specific broker via `sendRequestToNode(route, virtualNodeId, ...)`; response capped (see _Version capping_). |


If the Router implementation is agnostic to the NodeIdMapping strategy, then surely the ApiVersions response needs to be a logical AND of the all the brokers. I say this because we need to make sure the client sends requests that the can are intelligible to any of the brokers and the router can't predict in advance which broker(s) the channel will end up needing to talk to.

k-wall · 2026-05-26T14:54:05Z

+**Security consideration.**
+The topic ID resolution cache does not perform authorisation checks.
+A client that knows a topic ID can discover the corresponding topic name (and confirm the topic's existence) by submitting a request containing that ID, regardless of whether the client is authorised to access the topic.
+In practice we do not consider this a significant weakness: topic IDs are UUIDs with a vast keyspace that is neither guessable nor enumerable (for pratical purposes), so an attacker would need to have obtained a valid topic ID from another source (e.g. logs or a compromised client), at which point they likely already know the topic name.


Suggested change

In practice we do not consider this a significant weakness: topic IDs are UUIDs with a vast keyspace that is neither guessable nor enumerable (for pratical purposes), so an attacker would need to have obtained a valid topic ID from another source (e.g. logs or a compromised client), at which point they likely already know the topic name.

In practice we do not consider this a significant weakness: topic IDs are UUIDs with a vast keyspace that is neither guessable nor enumerable (for practical purposes), so an attacker would need to have obtained a valid topic ID from another source (e.g. logs or a compromised client), at which point they likely already know the topic name.

k-wall · 2026-05-26T15:02:44Z

+Subject-based routing is only needed for transactional producers, which have the stronger coordinator-affinity requirements described in the _Overview_.
+
+PID mappings are evicted after a configurable TTL (default: 7 days).
+If a mapping is evicted and a subsequent `PRODUCE` arrives, the router returns `UNKNOWN_PRODUCER_ID`, which causes the Kafka producer to re-initialise.


I think this strategy works even if proxy replicas are scaled, right. If a producing client reconnects any lands on a different proxy, it will follow the UNKNOWN_PRODUCER_ID path. The flow is the same as for eviction case.

k-wall · 2026-05-26T15:07:02Z

+If a mapping is evicted and a subsequent `PRODUCE` arrives, the router returns `UNKNOWN_PRODUCER_ID`, which causes the Kafka producer to re-initialise.
+
+
+### KIP-227 fetch session management


I think this section could use a bit more details.

k-wall · 2026-05-26T15:08:11Z

+These two sides are independent: a pre-v7 client (no session support) can still benefit from server-side sessions with backends, and vice versa.
+
+When a backend sends an _incremental_ response (containing only changed partitions), the router reconstructs a _full_ response using its cached session state before merging across routes.
+This ensures the client always receives a complete picture.


Are you saying the client will always receive a full-response when using this Router? Isn't that negaing KIP-227?

k-wall · 2026-05-26T15:10:23Z

+* `CONSUMER_GROUP_HEARTBEAT`, `CONSUMER_GROUP_DESCRIBE`
+* `OFFSET_COMMIT`, `OFFSET_FETCH` (when the subject is assigned to a route)
+
+**Prerequisite**: subject routing requires that the client's identity is known, which in practice means SASL termination must occur before the router in the topology.


Why doesn't downstream TLS client auth work for Subject routing?

k-wall · 2026-05-26T15:15:32Z

+
+### Assignment rejection
+
+`CREATE_TOPICS` and `CREATE_PARTITIONS` accept optional explicit broker assignments (replica placement lists that name specific broker node IDs).


Shouldn't create_topic need reject topic creations where topic name would be unroutable? Otherwise, the client would be allowed to create topics that would be invisible.

k-wall · 2026-05-26T15:21:38Z

+| **Offsets** | `OFFSET_FOR_LEADER_EPOCH` | Split by partition leader. |
+| **Admin** | `CREATE_TOPICS` | Split by topic ownership; reject explicit broker assignments (see _Assignment rejection_). |
+| **Admin** | `DELETE_TOPICS` | Split by topic ownership. |
+| **Admin** | `CREATE_PARTITIONS` | Split by topic ownership; reject explicit broker assignments. |


ALTER_PARTITION_REASSIGNMENTS should split by topic ownership. Same for LIST_PARTITION_REASSIGNMENTS It also needs to validate that the request does not try to move partitions between clusters. To be honest, I think it might be better to deliberately leave these out of scope for v1.

k-wall · 2026-05-26T15:25:50Z

+Many request types batch operations across multiple topics in a single request (e.g. a `PRODUCE` request may contain records for topics owned by different clusters).
+The router must decompose such requests, fan out the sub-requests in parallel, and recompose a single response.
+Beyond basic produce and consume, stateful protocol features — idempotent producers, transactions, consumer groups, fetch sessions — each add further complexity.
+This proposal addresses these features incrementally, with the simplest cases (produce, metadata) first and the more complex cases (transactions, consumer groups) building on that foundation.


I presume we are deferring things like Share Consumers for another proposal.

There are a lot of Kafka features which were omitted from the POC in the name of time, including share groups. They use topic ids, so there's a prereq to solve that. But otherwise I'm not aware of any showstoppers (which doesn't mean there aren't any). We could try to bite off more features as part of this proposal and its implementation (aiming for completeness), or we could prioritise timely delivery. The bottleneck will likely be the code review which we'd all need to feel comfortable that we had an understanding of the code.

They use topic ids, so there's a prereq to solve that. But otherwise I'm not aware of any showstoppers (which doesn't mean there aren't any). We could try to bite off more features as part of this proposal and its implementation (aiming for completeness), or we could prioritise timely delivery.

I'd favour the latter. The sooner we get a release out with the feature, the sooner we get feedback on real use-cases. There's also a chance we can decompose work on the missing parts into separate issues, that could be worked in parallel, perhaps by community members.

The proposal just need to enumerate what's out of scope.

k-wall · 2026-05-26T16:29:43Z

Couple more comments, but otherwise heading in the right direction.

k-wall · 2026-05-27T15:04:21Z

@tombentley please add a TL;DR to the PR's description. Also fix up the PR's title to reflect the proposal's name. Thank you.

tombentley and others added 11 commits May 21, 2026 16:31

Routing API

567e375

Signed-off-by: Tom Bentley <tbentley@redhat.com>

Apply suggestions from code review

88e1bc3

Co-authored-by: Gunnar Morling <gunnar.morling@googlemail.com> Signed-off-by: Tom Bentley <tombentley@users.noreply.github.com>

Clarify wording

dc16dce

Signed-off-by: Tom Bentley <tbentley@redhat.com>

Rename routing proposal for new numbering scheme

5a00700

Signed-off-by: Tom Bentley <tbentley@redhat.com>

Add number to title

e5c581e

Signed-off-by: Tom Bentley <tbentley@redhat.com>

Start rework

ccf38a6

Signed-off-by: Tom Bentley <tbentley@redhat.com>

Big reworking

689e40b

Signed-off-by: Tom Bentley <tbentley@redhat.com>

Tweak following review of the router impl proposal

90f82be

Signed-off-by: Tom Bentley <tbentley@redhat.com>

Add a proposal for a router impl

2ffaee3

Signed-off-by: Tom Bentley <tbentley@redhat.com>

rm proposal destained for a separate branch

ceb3cd7

Signed-off-by: Tom Bentley <tbentley@redhat.com>

Use right number

643c31c

Signed-off-by: Tom Bentley <tbentley@redhat.com>

k-wall reviewed May 22, 2026

View reviewed changes

Fix for Keith's comment

671ab18

Signed-off-by: Tom Bentley <tbentley@redhat.com>

k-wall reviewed May 26, 2026

View reviewed changes

tombentley mentioned this pull request Jun 5, 2026

Routing API #70

Open

	In practice we do not consider this a significant weakness: topic IDs are UUIDs with a vast keyspace that is neither guessable nor enumerable (for pratical purposes), so an attacker would need to have obtained a valid topic ID from another source (e.g. logs or a compromised client), at which point they likely already know the topic name.
	In practice we do not consider this a significant weakness: topic IDs are UUIDs with a vast keyspace that is neither guessable nor enumerable (for practical purposes), so an attacker would need to have obtained a valid topic ID from another source (e.g. logs or a compromised client), at which point they likely already know the topic name.

		If a mapping is evicted and a subsequent `PRODUCE` arrives, the router returns `UNKNOWN_PRODUCER_ID`, which causes the Kafka producer to re-initialise.


		### KIP-227 fetch session management


		### Assignment rejection

		`CREATE_TOPICS` and `CREATE_PARTITIONS` accept optional explicit broker assignments (replica placement lists that name specific broker node IDs).

Uh oh!

Conversation

tombentley commented May 22, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

k-wall May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

k-wall May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

k-wall May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

k-wall May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

k-wall May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

k-wall commented May 26, 2026

Uh oh!

k-wall commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tombentley commented May 22, 2026 •

edited by github-actions Bot

Loading

k-wall May 22, 2026 •

edited

Loading

k-wall May 26, 2026 •

edited

Loading

k-wall May 26, 2026 •

edited

Loading

k-wall May 26, 2026 •

edited

Loading

k-wall May 26, 2026 •

edited

Loading

k-wall commented May 27, 2026 •

edited

Loading