Conversation

@ayirr7
Contributor

@ayirr7 ayirr7 commented Jun 5, 2025

A really simple way to unblock the work where a batching step happens before parsing. I can also introduce the concept of Schema in this PR, where we have an enum of all the allowed schemas / topic names.

We should not stick with this in the long term; we should likely enforce that all Accumulator/Reduce steps output a Tuple[MutableSequence[InputType], Schema]. That is a more significant typing change, as it involves making sure primitives like FlatMap can also handle that output type.
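A minimal sketch of that long-term shape, with a hypothetical `make_batch_output` helper and made-up `Schema` members (the real enum values would be the actual allowed topic names, which this PR only alludes to):

```python
from enum import Enum
from typing import MutableSequence, Tuple, TypeVar

class Schema(Enum):
    """Enum of the allowed schemas / topic names (placeholder values)."""
    TOPIC_A = "topic-a"
    TOPIC_B = "topic-b"

TInput = TypeVar("TInput")

# The long-term output shape proposed for Accumulator/Reduce steps:
BatchOutput = Tuple[MutableSequence[TInput], Schema]

def make_batch_output(
    items: MutableSequence[TInput], schema: Schema
) -> BatchOutput:
    # Keep the schema alongside the batch so downstream primitives
    # (e.g. FlatMap) can propagate it after a batching step.
    return (items, schema)
```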

@ayirr7 ayirr7 requested a review from fpacifici June 6, 2025 20:32
Collaborator

@fpacifici fpacifici left a comment


Please see the comments inline.

schema=None,
)

if isinstance(payload, MutableSequence) and isinstance(payload[0], tuple):
Collaborator

Why only a MutableSequence? Shouldn't this work with Sequences as well?
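For illustration, a hypothetical `is_tuple_batch` helper sketching the broader `Sequence` check the comment suggests (note that `str` and `bytes` are also Sequences, so they have to be excluded explicitly):

```python
from collections.abc import Sequence

def is_tuple_batch(payload: object) -> bool:
    # Accept any non-empty Sequence of tuples, not just a MutableSequence.
    return (
        isinstance(payload, Sequence)
        and not isinstance(payload, (str, bytes))
        and len(payload) > 0
        and isinstance(payload[0], tuple)
    )
```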

Comment on lines 177 to 198

if isinstance(payload, MutableSequence) and isinstance(payload[0], tuple):
    # Each element is a (message, schema) tuple: collect the messages
    # into a batch and take the schema from the first element.
    batch = [tup[0] for tup in payload]
    schema = payload[0][1]

    msg = PyMessage(
        payload=batch,
        headers=[],
        timestamp=timestamp,
        schema=schema,
    )
else:
    msg = PyMessage(
        payload=payload,
        headers=[],
        timestamp=timestamp,
        schema=None,
    )
Collaborator

Let's write unit tests for this logic (there is already a test_reduce_delegate you can add onto).
As this is a bit of a hack, a good way to keep it under control is to have the behavior well tested, so that when we refactor it we will not miss corner cases.
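A sketch of what such tests could cover, using a hypothetical standalone `unwrap_batch` helper that mirrors the branching above rather than the real PyMessage/test_reduce_delegate plumbing:

```python
from collections.abc import MutableSequence
from typing import Any, Optional, Tuple

def unwrap_batch(payload: Any) -> Tuple[Any, Optional[Any]]:
    """Return (payload, schema) the way the batching step does."""
    if (
        isinstance(payload, MutableSequence)
        and payload
        and isinstance(payload[0], tuple)
    ):
        batch = [tup[0] for tup in payload]
        # The schema is taken from the first (message, schema) tuple.
        schema = payload[0][1]
        return batch, schema
    # Non-batched payloads keep no schema, matching the else branch.
    return payload, None

def test_unwrap_batch_retains_schema() -> None:
    payload = [("a", "schema-1"), ("b", "schema-1")]
    batch, schema = unwrap_batch(payload)
    assert batch == ["a", "b"]
    assert schema == "schema-1"

def test_unwrap_non_batch_has_no_schema() -> None:
    batch, schema = unwrap_batch("raw")
    assert batch == "raw"
    assert schema is None
```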

Contributor Author

Will do. Initially I just wanted to check whether we're okay with this sort of hack in the code.

Contributor Author

This test is already checking what we need here. I made it a bit more "complex".

@ayirr7 ayirr7 changed the title add msg schema to batch quick-fix: Ensure schema is retained after a batching step Jun 11, 2025
@ayirr7 ayirr7 marked this pull request as ready for review June 11, 2025 18:38
@ayirr7 ayirr7 merged commit 24673ae into main Jun 12, 2025
17 checks passed