Skip to content

FEAT: Add max_records to kafka task#67

Merged
Divyanshu Tiwari (divyanshu-tiwari) merged 3 commits into
mainfrom
feat/kafka-max-records
May 28, 2026
Merged

FEAT: Add max_records to kafka task#67
Divyanshu Tiwari (divyanshu-tiwari) merged 3 commits into
mainfrom
feat/kafka-max-records

Conversation

@divyanshu-tiwari
Copy link
Copy Markdown
Contributor

Summary

  • Adds a max_records field to the kafka read task that stops the reader after N records have been forwarded downstream.
  • Independent from end_after (wall-clock) and retry_limit (idle-based); the three can be combined.
  • In group consumer mode, offsets up to the last forwarded record are committed on shutdown via the deferred c.Close().
  • README updated with the new field, an example, and a behavior note; new test pipeline test/pipelines/kafka_read_max_records.yaml added.

Test plan

  • Run test/pipelines/kafka_read_max_records.yaml against a populated topic and verify exactly 10 records are emitted before the reader stops.
  • Re-run the same pipeline in group-consumer mode and confirm offsets for the consumed records are committed (subsequent runs resume past them).
  • Run against an empty/low-traffic topic and confirm the reader still exits cleanly via end_after / retry_limit when max_records cannot be reached.
  • go build ./... is clean.

🤖 Generated with Claude Code

Stop the kafka reader after a fixed number of records have been
forwarded downstream. Independent from end_after (wall-clock) and
retry_limit (idle-based). In group mode, offsets up to the last
forwarded record are committed on shutdown via deferred Close().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 27, 2026 19:30
@divyanshu-tiwari
Copy link
Copy Markdown
Contributor Author

Copilot resolve the merge conflicts in this pull request

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a max_records read-mode limit to the Kafka pipeline task so the consumer can stop cleanly after forwarding a fixed number of messages, independent of existing end_after (wall-clock) and retry_limit (idle/error-based) stop conditions.

Changes:

  • Introduces max_records configuration and enforces the stop condition in the Kafka consumer read loop.
  • Updates Kafka task README with the new field, examples, and behavior notes.
  • Adds a new test pipeline YAML demonstrating max_records + end_after.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
test/pipelines/kafka_read_max_records.yaml Adds an example pipeline that stops after forwarding 10 Kafka records (with end_after safety net).
internal/pkg/pipeline/task/kafka/README.md Documents max_records, including usage example and interaction with end_after/retry_limit.
internal/pkg/pipeline/task/kafka/kafka.go Implements max_records counter and early exit after N forwarded records.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/pkg/pipeline/task/kafka/kafka.go Outdated
Adds `validate:"omitempty,gte=0"` so a negative max_records is
rejected at config-load time rather than silently behaving as
unlimited (which contradicts the documented "0 = unlimited"
semantics).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@alephys26 Yash Shrivastava (alephys26) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is problematic, the kafka consumer would fetch records in batch, but would not forward all of them downstream. This will lead to dropped records.
Add flush of all records here as well as in end_after part. So that we ensure at_least_once delivery.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: This is valid for the kafka task. The ctx cancellation happens before a new read and the commit happens before the count increment. Both cases are valid.

The only pain point now is the failure of record processing in any downstream task. If it fails there we will drop the messages, since the commits are already there on kafka.

@divyanshu-tiwari
Copy link
Copy Markdown
Contributor Author

Correction: This is valid for the kafka task. The ctx cancellation happens before a new read and the commit happens before the count increment. Both cases are valid.

The only pain point now is the failure of record processing in any downstream task. If it fails there we will drop the messages, since the commits are already there on kafka.

It's a known issue, same as the SQS.

@divyanshu-tiwari Divyanshu Tiwari (divyanshu-tiwari) merged commit 922a7de into main May 28, 2026
7 checks passed
@divyanshu-tiwari Divyanshu Tiwari (divyanshu-tiwari) deleted the feat/kafka-max-records branch May 28, 2026 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants