FEAT: Support for S3 object tagging in file task #59

Merged
Divyanshu Tiwari (divyanshu-tiwari) merged 3 commits into main from DATA-8036_s3_tag_support
Apr 22, 2026

Conversation

@divyanshu-tiwari
Contributor

Description

This pull request adds support for applying S3 object tags when writing files to S3 using the file pipeline task. It introduces a new tags configuration option, validates S3 tagging constraints, and ensures tags are applied to both data files and optional success markers. The documentation is updated with usage details and examples.

S3 Object Tagging Support

  • Added a tags field to the file task configuration, allowing users to specify S3 object tags as a map of key-value pairs. Tag values support macros and context templates. [1] [2]
  • Implemented validation for S3 tagging constraints: maximum of 10 tags, key length up to 128 UTF-16 code units, and value length up to 256 UTF-16 code units. Key uniqueness is enforced by the map structure. [1] [2] [3]
  • Ensured that tags are applied to both the main S3 object and the optional success_file marker when writing.
  • Tag values are evaluated per record, enabling dynamic tagging based on record content or context.
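
The constraints listed above can be sketched in Go roughly as follows. This is a minimal illustration, not the code from this PR: `validateTags`, `utf16Len`, and the constant names are hypothetical, and only the three limits stated in the description are checked.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// Limits quoted in the PR description.
const (
	maxTags       = 10
	maxKeyUnits   = 128
	maxValueUnits = 256
)

// utf16Len returns the number of UTF-16 code units needed to encode s.
func utf16Len(s string) int {
	return len(utf16.Encode([]rune(s)))
}

// validateTags is a hypothetical helper that checks the tag count and the
// key/value lengths. Key uniqueness needs no check: a Go map cannot hold
// duplicate keys.
func validateTags(tags map[string]string) error {
	if len(tags) > maxTags {
		return fmt.Errorf("too many tags: %d (max %d)", len(tags), maxTags)
	}
	for k, v := range tags {
		if n := utf16Len(k); n > maxKeyUnits {
			return fmt.Errorf("tag key %q is %d UTF-16 code units (max %d)", k, n, maxKeyUnits)
		}
		if n := utf16Len(v); n > maxValueUnits {
			return fmt.Errorf("tag %q value is %d UTF-16 code units (max %d)", k, n, maxValueUnits)
		}
	}
	return nil
}

func main() {
	tags := map[string]string{"team": "data", "env": "prod"}
	fmt.Println(validateTags(tags)) // valid input: prints <nil>
}
```

Because map iteration order is random in Go, a tag set with several violations may report a different one on each run. Since values are templated per record, a check like this would have to run on the resolved values rather than on the raw configuration.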

Documentation Updates

  • Updated the README.md for the file task to document the new tags option, S3 tagging constraints, and provided a configuration example. [1] [2]

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation and I have updated the documentation accordingly.
  • I have added tests to cover my changes.

Adds a `tags` field to the file task that is applied as the
`x-amz-tagging` header on S3 PutObject (including the optional
_SUCCESS marker). Tag values support macro/context templating so
they can be evaluated per record. Validates S3 limits: up to 10
tags per object, key length up to 128 UTF-16 code units, and
value length up to 256 UTF-16 code units.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
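
For readers unfamiliar with the `x-amz-tagging` header mentioned in the commit message above: S3 expects the tag set encoded as URL query parameters. The sketch below shows one way to build that string with the standard library. `encodeTags` is a hypothetical name, and the actual helper in `s3.go` may differ.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeTags is a hypothetical helper that turns resolved tags into the
// URL-query form used for the x-amz-tagging header, e.g. "k1=v1&k2=v2".
func encodeTags(tags map[string]string) string {
	values := url.Values{}
	for k, v := range tags {
		values.Set(k, v)
	}
	// Encode percent-escapes keys and values and sorts the pairs by key,
	// so the result is deterministic despite random map iteration order.
	return values.Encode()
}

func main() {
	fmt.Println(encodeTags(map[string]string{"team": "data eng", "env": "prod"}))
	// prints: env=prod&team=data+eng
}
```

With the AWS SDK for Go v2, a string like this would typically be passed through the `Tagging` field of `s3.PutObjectInput`. Whether this PR uses that field or sets the header another way is not shown in the conversation.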
@wiz-55ccc8b716

wiz-55ccc8b716 Bot commented Apr 20, 2026

Wiz Scan Summary

| Scanner | Findings |
| --- | --- |
| Vulnerabilities | - |
| Sensitive Data | - |
| Secrets | - |
| IaC Misconfigurations | - |
| SAST Findings | 1 Medium |
| Software Management Findings | - |
| Total | 1 Medium |


Contributor

Copilot AI left a comment


Pull request overview

Adds support for applying S3 object tags when writing via the file pipeline task, including config/schema updates and documentation.

Changes:

  • Introduces tags in file task config (map of tag key → templated value).
  • Applies tags to S3 PutObject requests and adds validation for S3 tag limits.
  • Documents the new tags option, limits, and an example configuration.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| internal/pkg/pipeline/task/file/s3.go | Build/encode S3 tagging header per record; validate S3 tag constraints; add UTF-16 length helper. |
| internal/pkg/pipeline/task/file/file.go | Add Tags to task config; validate tags at startup; propagate tags to success marker writer. |
| internal/pkg/pipeline/task/file/README.md | Document tags option, limits, and usage example. |


Comment thread internal/pkg/pipeline/task/file/README.md
Comment thread internal/pkg/pipeline/task/file/file.go Outdated
Comment thread internal/pkg/pipeline/task/file/file.go
Comment thread internal/pkg/pipeline/task/file/s3.go Outdated
- Move tag validation out of task startup into writeS3File so read
  mode and local-scheme writes are unaffected by tag config.
- Fix UTF-16 length accounting to use utf16.Encode (handles surrogate
  code points safely instead of letting RuneLen return -1).
- Substitute an empty record when buildTags is called with nil (the
  _SUCCESS marker case) so unresolved {{ context }} placeholders
  surface as an explicit error rather than being uploaded verbatim.
- Document the success-marker restriction in the file task README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
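
The UTF-16 accounting fix described in the commit above matters because byte length, rune count, and UTF-16 code-unit count can all differ for the same string. The following sketch, using hypothetical helper names, counts code units via `utf16.Encode` and shows that a lone surrogate code point is counted as one unit (it is encoded as U+FFFD) instead of yielding a negative length.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// utf16UnitsOfRunes counts the UTF-16 code units needed for rs.
// utf16.Encode replaces un-encodable runes (such as lone surrogates)
// with U+FFFD, so every rune contributes at least one unit.
func utf16UnitsOfRunes(rs []rune) int {
	return len(utf16.Encode(rs))
}

// utf16Units counts the UTF-16 code units needed for the string s.
func utf16Units(s string) int {
	return utf16UnitsOfRunes([]rune(s))
}

func main() {
	for _, s := range []string{"env", "\u00e9", "\U0001F600"} {
		fmt.Printf("%q bytes=%d runes=%d utf16=%d\n",
			s, len(s), utf8.RuneCountInString(s), utf16Units(s))
	}
	// A lone surrogate code point still counts as one unit.
	fmt.Println(utf16UnitsOfRunes([]rune{0xD800}))
}
```

Here the emoji occupies 4 bytes and 1 rune but 2 UTF-16 code units, which is why comparing `len(s)` or the rune count against the documented limits would not match the limits as this PR states them.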
Contributor

Copilot AI left a comment


Pull request overview

Adds first-class support for applying Amazon S3 object tags when the file pipeline task writes to S3, including per-record tag value evaluation and documented configuration/limits.

Changes:

  • Introduces a tags field on the file task config (map of tag key → templated value).
  • Applies resolved tags on S3 PutObject uploads, including the optional _SUCCESS marker.
  • Adds validation for S3 tag count and UTF-16 key/value length limits, and updates task documentation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| internal/pkg/pipeline/task/file/s3.go | Builds URL-encoded tag strings for S3 uploads and validates tagging constraints. |
| internal/pkg/pipeline/task/file/file.go | Adds Tags to the task configuration and propagates tags to success-marker writes. |
| internal/pkg/pipeline/task/file/README.md | Documents the new tags option, constraints, and usage examples. |


Comment thread internal/pkg/pipeline/task/file/s3.go
Comment thread internal/pkg/pipeline/task/file/s3.go
Comment thread internal/pkg/pipeline/task/file/README.md Outdated
validateS3Tags runs on every writeS3File call, not just the first.
Update the README to match so the described timing matches the code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The `_SUCCESS` marker is not tied to any record, so tag values for the success marker must only use static strings or startup-time templates (`env`, `secret`, `macro`). A tag that references `{{ context "..." }}` will fail at the success-marker write with `context keys were not set: ...`, since there is no record context to resolve against.

If you need record-derived tag values, either drop the context reference from the success-marker tags, or disable `success_file`.
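
The behaviour described in this README excerpt can be illustrated with a small, self-contained Go sketch. It is not the project's template engine: `resolveTag`, the regular expression, and the exact wording after the colon in the error are assumptions made for illustration. The point is only that resolving a `{{ context "..." }}` reference against an empty context (the success-marker case) yields an error instead of uploading the placeholder verbatim.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// contextRef matches placeholders of the form {{ context "key" }}.
var contextRef = regexp.MustCompile(`\{\{\s*context\s+"([^"]+)"\s*\}\}`)

// resolveTag is a hypothetical resolver: it substitutes context references
// in value from ctx and reports any keys that could not be resolved.
func resolveTag(value string, ctx map[string]string) (string, error) {
	var missing []string
	out := contextRef.ReplaceAllStringFunc(value, func(m string) string {
		key := contextRef.FindStringSubmatch(m)[1]
		v, ok := ctx[key] // reading from a nil map is safe in Go
		if !ok {
			missing = append(missing, key)
			return m
		}
		return v
	})
	if len(missing) > 0 {
		sort.Strings(missing)
		return "", fmt.Errorf("context keys were not set: %s", strings.Join(missing, ", "))
	}
	return out, nil
}

func main() {
	tag := `tenant-{{ context "tenant" }}`

	// Data file: the record supplies the context.
	fmt.Println(resolveTag(tag, map[string]string{"tenant": "acme"}))

	// Success marker: no record, so the context is empty and resolution fails.
	fmt.Println(resolveTag(tag, nil))
}
```

Static tag values pass through unchanged in both cases, which is why the README recommends them when `success_file` is enabled.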
Contributor

@prasadlohakpure prasadlohakpure Apr 20, 2026


Nitpick:

"context reference from the success-marker tags"

The same tags are applied to both files, right?
If so, should we change "success-marker tags" to just "tags"?

@divyanshu-tiwari Divyanshu Tiwari (divyanshu-tiwari) merged commit 18da6e8 into main Apr 22, 2026
7 checks passed
@divyanshu-tiwari Divyanshu Tiwari (divyanshu-tiwari) deleted the DATA-8036_s3_tag_support branch April 22, 2026 05:09
3 participants