
FEAT: Add end_after and auto_offset_reset to kafka task#65

Merged
Divyanshu Tiwari (divyanshu-tiwari) merged 3 commits into main from kafka-task-feats on May 19, 2026

Divyanshu Tiwari (divyanshu-tiwari) commented on May 18, 2026

Description

This pull request adds two new features to the Kafka pipeline task: a wall-clock read deadline via the end_after field and configurable group consumer offset reset behavior via the auto_offset_reset field. It also updates documentation and configuration to reflect these enhancements and ensure correct defaulting and validation.

Kafka read behavior enhancements:

  • Added an end_after field to the Kafka task that makes the reader stop cleanly after a specified wall-clock duration, regardless of message traffic. It is implemented in kafka.go and documented in the README with YAML examples and notes. [1] [2] [3] [4] [5] [6]
  • Introduced the auto_offset_reset field for group consumers, letting users choose whether to start from the earliest or the latest offset when no committed offset exists or the stored offset is out of range. The value is validated to be either earliest or latest and defaults to earliest, which was previously hardcoded. Documentation and examples have been updated accordingly. [1] [2] [3] [4] [5] [6] [7]

Documentation and configuration updates:

  • Expanded the README with details and YAML examples for both end_after and auto_offset_reset, clarifying their behavior and use cases. [1] [2] [3] [4]
  • Updated the test pipeline YAML to demonstrate the new fields in action.
  • Set the default value for auto_offset_reset to earliest in code and ensured it is always initialized. [1] [2]
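The defaulting and validation described above can be sketched as a small normalize step that runs before the consumer is built. The type groupConfig and its method are hypothetical stand-ins; the PR only states that the default is earliest and that the accepted values are earliest and latest.

```go
package main

import "fmt"

// groupConfig is a hypothetical stand-in for the Kafka task's group-consumer settings.
type groupConfig struct {
	AutoOffsetReset string // expected: "earliest" or "latest"
}

const defaultAutoOffsetReset = "earliest"

// normalize fills in the default when the field is unset and rejects
// anything other than the two supported values.
func (c *groupConfig) normalize() error {
	if c.AutoOffsetReset == "" {
		c.AutoOffsetReset = defaultAutoOffsetReset
	}
	switch c.AutoOffsetReset {
	case "earliest", "latest":
		return nil
	default:
		return fmt.Errorf("auto_offset_reset must be earliest or latest, got %q", c.AutoOffsetReset)
	}
}

func main() {
	for _, v := range []string{"", "latest", "newest"} {
		c := groupConfig{AutoOffsetReset: v}
		err := c.normalize()
		fmt.Printf("%q -> %q, err=%v\n", v, c.AutoOffsetReset, err)
	}
}
```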

Internal refactoring:

  • Changed the Kafka task struct to embed task.ServerBase instead of task.Base, which provides end_after support and mirrors the pattern already used by the SQS task.

Test

  • Tested the change with the Amazon SERP pipeline. With auto_offset_reset: latest, the consumer moved to the latest available offset after the messages had been deleted from the topic by the Kafka retention policy.
  • Verified the timed read by running the pipeline: without end_after, the pipeline kept pulling keywords from the topic; with end_after, it stopped pulling keywords after the specified duration.

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation and I have updated the documentation accordingly.
  • I have added tests to cover my changes.

- end_after: wall-clock read deadline that stops the consumer regardless
  of traffic, mirroring the ServerBase pattern used by the SQS task.
- auto_offset_reset: exposes the group-mode reset policy so pipelines
  can opt into 'latest' (skip historical data) instead of the previously
  hardcoded 'earliest'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review was requested due to automatic review settings on May 18, 2026 12:35
Copilot AI left a comment

Pull request overview

This PR extends the Kafka pipeline task’s read-mode behavior by (1) supporting a wall-clock cutoff via end_after and (2) making the consumer-group offset reset policy configurable via auto_offset_reset (defaulting to earliest). It also updates task docs and example pipeline config to reflect the new options.

Changes:

  • Embed task.ServerBase in the Kafka task so end_after is supported and enforced via a context timeout in the read loop.
  • Add auto_offset_reset to Kafka group-consumer configuration (validated to earliest|latest, default earliest).
  • Update Kafka task README and the test pipeline YAML to document and demonstrate the new fields.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| test/pipelines/kafka_read.yaml | Adds example usage of end_after and auto_offset_reset in a Kafka read pipeline. |
| internal/pkg/pipeline/task/kafka/README.md | Documents end_after and auto_offset_reset, including YAML examples and behavioral notes. |
| internal/pkg/pipeline/task/kafka/kafka.go | Implements the end_after read cutoff (via ServerBase.EndAfter) and configures the group consumer's auto.offset.reset from auto_offset_reset. |
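How auto_offset_reset might reach the group consumer's auto.offset.reset setting can be sketched as building a property map. The helper groupConsumerProps and the group name are hypothetical, and the PR does not name the Kafka client library; bootstrap.servers, group.id and auto.offset.reset are standard Kafka consumer property names. The value is assumed to have been defaulted and validated already.

```go
package main

import "fmt"

// groupConsumerProps is a hypothetical helper that builds a Kafka consumer
// property map for group mode. autoOffsetReset is assumed to be already
// defaulted and validated ("earliest" or "latest").
func groupConsumerProps(brokers, groupID, autoOffsetReset string) map[string]string {
	return map[string]string{
		"bootstrap.servers": brokers,
		"group.id":          groupID,
		// Consulted only when the group has no committed offset for a
		// partition, or the committed offset is out of range (for example,
		// after retention deleted the data it pointed to).
		"auto.offset.reset": autoOffsetReset,
	}
}

func main() {
	props := groupConsumerProps("localhost:9092", "example-group", "latest")
	fmt.Println(props["auto.offset.reset"])
}
```

Once a group has a valid committed offset, the consumer resumes from it and this setting has no effect, which is why latest only changed behavior in the retention scenario described under Test.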

Divyanshu Tiwari (divyanshu-tiwari) merged commit 6da69a3 into main on May 19, 2026
7 checks passed
Divyanshu Tiwari (divyanshu-tiwari) deleted the kafka-task-feats branch on May 19, 2026 10:53

6 participants