A GitHub Action to manage AWS Glue catalog partitions. Seamlessly add, delete, and check partition existence in your data lake workflows.
- Add partitions - Register new partitions after data lands in S3
- Delete partitions - Remove old partitions for data lifecycle management
- Check existence - Conditionally execute workflow steps based on partition presence
- Idempotent operations - Safe to run multiple times with
if-not-existsflag - Table inheritance - Automatically inherits storage descriptor from parent table
- Cross-account support - Works with cross-account Glue catalogs
- Custom storage descriptors - Override table defaults when needed
Configure AWS credentials before using this action. We recommend aws-actions/configure-aws-credentials@v4:
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/my-github-actions-role
aws-region: us-east-1Register a new partition after uploading data to S3:
- name: Add partition
uses: predictr-io/aws-glue-partition-manager@v1
with:
operation: 'add'
database: 'analytics'
table: 'events'
partition-values: 'date=2025-11-24'
s3-bucket: 'data-lake'
s3-prefix: 'raw/events/date=2025-11-24'For tables with multiple partition keys:
- name: Add partition with multiple keys
uses: predictr-io/aws-glue-partition-manager@v1
with:
operation: 'add'
database: 'analytics'
table: 'events'
partition-values: 'year=2025;month=11;day=24'
s3-bucket: 'data-lake'
s3-prefix: 'raw/events/year=2025/month=11/day=24'Conditionally execute steps based on whether a partition exists:
- name: Check if partition exists
id: check
uses: predictr-io/aws-glue-partition-manager@v1
with:
operation: 'exists'
database: 'analytics'
table: 'events'
partition-values: 'date=2025-11-24'
- name: Process only if partition exists
if: steps.check.outputs.exists == 'true'
run: |
echo "Partition found at: ${{ steps.check.outputs.location }}"
echo "Created at: ${{ steps.check.outputs.created-at }}"Remove old partitions for data retention:
- name: Delete old partition
uses: predictr-io/aws-glue-partition-manager@v1
with:
operation: 'delete'
database: 'analytics'
table: 'events'
partition-values: 'date=2024-01-01'Ingest data, add partition, validate:
name: Daily Data Pipeline
on:
schedule:
- cron: '0 2 * * *'
workflow_dispatch:
jobs:
ingest:
runs-on: ubuntu-latest
steps:
- name: Configure AWS
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
aws-region: us-east-1
- name: Download data to S3
uses: predictr-io/url-to-s3@v1
with:
url: 'https://api.example.com/daily-export'
s3-bucket: 'data-lake'
s3-key: 'raw/events/date=${{ env.DATE }}/data.json'
- name: Register partition in Glue
id: add-partition
uses: predictr-io/aws-glue-partition-manager@v1
with:
operation: 'add'
database: 'analytics'
table: 'events_raw'
partition-values: 'date=${{ env.DATE }}'
s3-bucket: 'data-lake'
s3-prefix: 'raw/events/date=${{ env.DATE }}'
- name: Verify partition
run: |
echo "Partition added: ${{ steps.add-partition.outputs.exists }}"
echo "Location: ${{ steps.add-partition.outputs.location }}"| Input | Description |
|---|---|
operation |
Operation to perform: add, delete, or exists |
database |
Glue database name |
table |
Glue table name |
partition-values |
Partition values as key=value pairs (semicolon-separated for multiple keys) |
| Input | Description |
|---|---|
s3-bucket |
S3 bucket name for partition location |
s3-prefix |
S3 prefix/path for partition location |
| Input | Description | Default |
|---|---|---|
catalog-id |
AWS account ID for cross-account catalog access | Current account |
if-not-exists |
For "add" operation: skip if partition exists without error | true |
storage-descriptor |
Custom storage descriptor as JSON (overrides table inheritance) | Inherits from table |
| Output | Description |
|---|---|
success |
Whether the operation succeeded (true/false) |
exists |
Whether the partition exists after operation |
partition-values |
Partition values that were operated on (semicolon-separated) |
location |
S3 location of the partition (for "add" and "exists" operations) |
created-at |
Partition creation timestamp ISO 8601 format (for "exists" operation when found) |
partition-values: 'date=2025-11-24'partition-values: 'year=2025;month=11;day=24'Partition keys must match the table's partition key order.
By default, the action inherits the storage descriptor (schema, SerDe, input/output formats) from the parent table. This is the recommended approach for most use cases.
For advanced use cases, provide a custom storage descriptor:
- name: Add partition with custom storage descriptor
uses: predictr-io/aws-glue-partition-manager@v1
with:
operation: 'add'
database: 'analytics'
table: 'events'
partition-values: 'date=2025-11-24'
s3-bucket: 'data-lake'
s3-prefix: 'raw/events/date=2025-11-24'
storage-descriptor: |
{
"Columns": [
{"Name": "event_id", "Type": "string"},
{"Name": "timestamp", "Type": "bigint"}
],
"InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
"OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"SerdeInfo": {
"SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
}
}Access Glue catalogs in different AWS accounts:
- name: Add partition to cross-account catalog
uses: predictr-io/aws-glue-partition-manager@v1
with:
operation: 'add'
database: 'shared_analytics'
table: 'events'
partition-values: 'date=2025-11-24'
s3-bucket: 'cross-account-bucket'
s3-prefix: 'data/events/date=2025-11-24'
catalog-id: '987654321098'Ensure your IAM role has the necessary cross-account permissions.
The action handles common scenarios gracefully:
- Add existing partition with
if-not-exists=true: Succeeds without error - Delete non-existent partition: Succeeds without error
- Check non-existent partition: Returns
exists=falsewithout error - Missing S3 location for "add" operation: Fails with clear error message
- Invalid partition format: Fails with validation error
- AWS permission errors: Fails with AWS SDK error message
Clone and install dependencies:
git clone https://github.com/predictr-io/aws-glue-partition-manager.git
cd aws-glue-partition-manager
npm install# Build the action (compile TypeScript + bundle with dependencies)
npm run build
# Run TypeScript type checking
npm run type-check
# Run ESLint
npm run lint
# Run all checks (type-check + lint)
npm run checkThe build process uses @vercel/ncc to compile TypeScript and bundle all dependencies into a single dist/index.js file:
npm run buildOutput:
dist/index.js- Bundled action (2MB, includes AWS SDK)dist/index.js.map- Source map for debuggingdist/licenses.txt- License information for bundled dependencies
Important: The dist/ directory must be committed to git. GitHub Actions runs the compiled code directly from the repository.
- Edit source files in
src/ - Run checks to validate:
npm run check
- Build to update
dist/:npm run build
- Test locally (optional) - Use act or create a test workflow
- Commit everything including
dist/:git add src/ dist/ git commit -m "Description of changes"
Follow these steps to create a new release:
# Make your changes to src/
# Run checks
npm run check
# Build
npm run build
# Commit source and dist/
git add .
git commit -m "Add new feature"
git push origin main# Create annotated tag (use semantic versioning)
git tag -a v0.2.0 -m "Release v0.2.0: Description of changes"
# Push tag to trigger release workflow
git push origin v0.2.0GitHub Actions automatically:
- ✓ Verifies
dist/is committed - ✓ Verifies
dist/is up-to-date with source - ✓ Creates GitHub Release with auto-generated notes
- ✓ Updates major version tag (e.g.,
v0→v0.2.0)
Users can reference the action:
- Recommended:
predictr-io/aws-glue-partition-manager@v0(floating major version, gets updates) - Pinned:
predictr-io/aws-glue-partition-manager@v0.2.0(specific version, never changes)
Release workflow fails with "dist/ is out of date":
npm run build
git add dist/
git commit -m "Update dist/ for release"
git tag -f v0.2.0 # Re-tag
git push -f origin v0.2.0ESLint errors:
npm run lint # See errors
# Fix issues, then:
npm run check # Verify all checks passTypeScript errors:
npm run type-check # See type errorsMIT
Contributions welcome! Please submit a Pull Request.