FEAT: Support for S3 object tagging in file task by divyanshu-tiwari · Pull Request #59 · patterninc/caterpillar

Divyanshu Tiwari (divyanshu-tiwari) · 2026-04-20T06:10:03Z

Description

This pull request adds support for applying S3 object tags when writing files to S3 using the file pipeline task. It introduces a new tags configuration option, validates S3 tagging constraints, and ensures tags are applied to both data files and optional success markers. The documentation is updated with usage details and examples.

S3 Object Tagging Support

Added a tags field to the file task configuration, allowing users to specify S3 object tags as a map of key-value pairs. Tag values support macros and context templates. [1] [2]
Implemented validation for S3 tagging constraints: maximum of 10 tags, key length up to 128 UTF-16 code units, and value length up to 256 UTF-16 code units. Key uniqueness is enforced by the map structure. [1] [2] [3]
Ensured that tags are applied to both the main S3 object and the optional success_file marker when writing.
Tag values are evaluated per record, enabling dynamic tagging based on record content or context.

Documentation Updates

Updated the README.md for the file task to document the new tags option and S3 tagging constraints, and to provide a configuration example. [1] [2]

Types of changes

Docs change / refactoring / dependency upgrade
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist

My code follows the code style of this project.
My change requires a change to the documentation and I have updated the documentation accordingly.
I have added tests to cover my changes.

Adds a `tags` field to the file task that is applied as the `x-amz-tagging` header on S3 PutObject (including the optional _SUCCESS marker). Tag values support macro/context templating so they can be evaluated per record. Validates S3 limits: up to 10 tags per object, key length up to 128 UTF-16 code units, and value length up to 256 UTF-16 code units.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
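The `x-amz-tagging` header carries tags in URL query format (`key1=value1&key2=value2`). A minimal sketch of building that string from resolved tags, assuming the standard library's `net/url` is acceptable for the encoding; `buildTaggingHeader` is a hypothetical name and is not the PR's `buildTags`:

```go
package main

import (
	"net/url"
	"strings"
)

// buildTaggingHeader (hypothetical name) encodes resolved tags in URL query
// format for the x-amz-tagging header, e.g. "env=prod&team=data%20eng".
func buildTaggingHeader(tags map[string]string) string {
	if len(tags) == 0 {
		return ""
	}
	vals := url.Values{}
	for k, v := range tags {
		vals.Set(k, v)
	}
	// Values.Encode sorts by key and uses form encoding, which turns spaces
	// into "+". A literal "+" is already escaped as "%2B", so every remaining
	// "+" stands for a space; rewrite it as "%20" to avoid any ambiguity.
	return strings.ReplaceAll(vals.Encode(), "+", "%20")
}
```

Sorting by key (done by `Values.Encode`) keeps the header deterministic, which makes per-record output easy to compare in tests.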

wiz-55ccc8b716 · 2026-04-20T06:10:30Z

Wiz Scan Summary

Scanner	Findings
Vulnerabilities	-
Sensitive Data	-
Secrets	-
IaC Misconfigurations	-
SAST Findings	1
Software Management Findings	-

Total	1


Copilot

Pull request overview

Adds support for applying S3 object tags when writing via the file pipeline task, including config/schema updates and documentation.

Changes:

Introduces tags in file task config (map of tag key → templated value).
Applies tags to S3 PutObject requests and adds validation for S3 tag limits.
Documents the new tags option, limits, and an example configuration.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
internal/pkg/pipeline/task/file/s3.go	Build/encode S3 tagging header per record; validate S3 tag constraints; add UTF-16 length helper.
internal/pkg/pipeline/task/file/file.go	Add `Tags` to task config; validate tags at startup; propagate tags to success marker writer.
internal/pkg/pipeline/task/file/README.md	Document `tags` option, limits, and usage example.


- Move tag validation out of task startup into writeS3File so read mode and local-scheme writes are unaffected by tag config.
- Fix UTF-16 length accounting to use utf16.Encode (handles surrogate code points safely instead of letting RuneLen return -1).
- Substitute an empty record when buildTags is called with nil (the _SUCCESS marker case) so unresolved {{ context }} placeholders surface as an explicit error rather than being uploaded verbatim.
- Document the success-marker restriction in the file task README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
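The nil-record substitution can be illustrated with Go's `text/template`. This is a sketch under stated assumptions: caterpillar's real template engine and record type are not shown in this PR page, so the `record` type, the `context` template function, and `resolveTagValue` are all hypothetical stand-ins. Only the error text `context keys were not set: ...` comes from the README excerpt.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"strings"
	"text/template"
)

// record is a hypothetical stand-in for a pipeline record; only its
// context map matters for tag resolution.
type record struct {
	Context map[string]string
}

// resolveTagValue (hypothetical) renders one tag value template. A nil
// record (the _SUCCESS marker case) is replaced with an empty one, so any
// {{ context "..." }} reference is reported as missing instead of being
// uploaded verbatim.
func resolveTagValue(raw string, r *record) (string, error) {
	if r == nil {
		r = &record{Context: map[string]string{}}
	}
	var missing []string
	funcs := template.FuncMap{
		// context looks up a key in the record and remembers misses.
		"context": func(key string) string {
			v, ok := r.Context[key]
			if !ok {
				missing = append(missing, key)
			}
			return v
		},
	}
	tmpl, err := template.New("tag").Funcs(funcs).Parse(raw)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, nil); err != nil {
		return "", err
	}
	if len(missing) > 0 {
		sort.Strings(missing)
		return "", fmt.Errorf("context keys were not set: %s", strings.Join(missing, ", "))
	}
	return buf.String(), nil
}
```

Static values resolve the same way with or without a record, which is why the README limits success-marker tags to static strings and startup-time templates.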

Copilot

Pull request overview

Adds first-class support for applying Amazon S3 object tags when the file pipeline task writes to S3, including per-record tag value evaluation and documented configuration/limits.

Changes:

Introduces a tags field on the file task config (map of tag key → templated value).
Applies resolved tags on S3 PutObject uploads, including the optional _SUCCESS marker.
Adds validation for S3 tag count and UTF-16 key/value length limits, and updates task documentation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
internal/pkg/pipeline/task/file/s3.go	Builds URL-encoded tag strings for S3 uploads and validates tagging constraints.
internal/pkg/pipeline/task/file/file.go	Adds `Tags` to the task configuration and propagates tags to success-marker writes.
internal/pkg/pipeline/task/file/README.md	Documents the new `tags` option, constraints, and usage examples.


validateS3Tags runs on every writeS3File call, not just the first. Update the README to match so the described timing matches the code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

prasadlohakpure · 2026-04-20T12:54:48Z

+
+The `_SUCCESS` marker is not tied to any record, so tag values for the success marker must only use static strings or startup-time templates (`env`, `secret`, `macro`). A tag that references `{{ context "..." }}` will fail at the success-marker write with `context keys were not set: ...`, since there is no record context to resolve against.
+
+If you need record-derived tag values, either drop the context reference from the success-marker tags, or disable `success_file`.


Nit pick:

> context reference from the success-marker tags

The same tags would be applied to both files, right?
So do you think we should change "success-marker tags" to "tags"?

Copilot AI review requested due to automatic review settings April 20, 2026 06:10

Divyanshu Tiwari (divyanshu-tiwari) requested a review from a team as a code owner April 20, 2026 06:10

Copilot started reviewing on behalf of Divyanshu Tiwari (divyanshu-tiwari) April 20, 2026 06:10

Copilot AI reviewed Apr 20, 2026


Comment thread internal/pkg/pipeline/task/file/README.md

Comment thread internal/pkg/pipeline/task/file/file.go Outdated

Comment thread internal/pkg/pipeline/task/file/file.go

Comment thread internal/pkg/pipeline/task/file/s3.go Outdated

Divyanshu Tiwari (divyanshu-tiwari) requested a review from Copilot April 20, 2026 06:24

Copilot started reviewing on behalf of Divyanshu Tiwari (divyanshu-tiwari) April 20, 2026 06:24

Copilot AI reviewed Apr 20, 2026


Comment thread internal/pkg/pipeline/task/file/s3.go

Comment thread internal/pkg/pipeline/task/file/s3.go

Comment thread internal/pkg/pipeline/task/file/README.md Outdated

docs: align tag validation description with implementation

6f29753


prasadlohakpure reviewed Apr 20, 2026


prasadlohakpure approved these changes Apr 21, 2026


Divyanshu Tiwari (divyanshu-tiwari) merged commit 18da6e8 into main Apr 22, 2026
7 checks passed

Divyanshu Tiwari (divyanshu-tiwari) deleted the DATA-8036_s3_tag_support branch April 22, 2026 05:09

Divyanshu Tiwari (divyanshu-tiwari) merged 3 commits into main from DATA-8036_s3_tag_support

3 participants