# aws-glue-create-table

GitHub Action to create or update AWS Glue Data Catalog tables using JSON metadata.

## Features

- Create new Glue tables or update existing ones
- Accept full table metadata as JSON (TableInput format)
- Automatic table existence detection
- Support for cross-account catalog access
- Comprehensive error reporting
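
The existence detection amounts to a get-then-create-or-update flow against the Glue API. Below is a minimal sketch of that logic — illustrative only, not the action's actual source; the function name and client wiring are assumptions:

```python
# Illustrative create-or-update flow, mirroring the "automatic table
# existence detection" feature. Not the action's actual source.
def upsert_glue_table(glue, database_name, table_input, catalog_id=None):
    """Create the table if it is missing, otherwise update it in place.

    `glue` is a boto3 Glue client (or anything exposing the same interface).
    """
    kwargs = {"DatabaseName": database_name}
    if catalog_id:
        kwargs["CatalogId"] = catalog_id  # cross-account catalog access
    try:
        glue.get_table(Name=table_input["Name"], **kwargs)
    except glue.exceptions.EntityNotFoundException:
        glue.create_table(TableInput=table_input, **kwargs)
        return "created"
    glue.update_table(TableInput=table_input, **kwargs)
    return "updated"

# Real usage would pass a boto3 client:
#   import boto3
#   glue = boto3.client("glue")
#   upsert_glue_table(glue, "my_database", {"Name": "my_table"})
```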
## Usage

```yaml
- name: Create Glue table
  uses: predictr-io/aws-glue-create-table@v0
  with:
    database-name: 'my_database'
    table-name: 'my_table'
    table-input: |
      {
        "Name": "my_table",
        "StorageDescriptor": {
          "Columns": [
            {"Name": "id", "Type": "bigint"},
            {"Name": "name", "Type": "string"},
            {"Name": "timestamp", "Type": "timestamp"}
          ],
          "Location": "s3://my-bucket/my-data/",
          "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
          "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
          "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "Parameters": {
              "field.delim": ","
            }
          }
        },
        "PartitionKeys": [
          {"Name": "year", "Type": "string"},
          {"Name": "month", "Type": "string"}
        ]
      }
```

## AWS Credentials

This action requires AWS credentials to be configured. Use the official `aws-actions/configure-aws-credentials` action:
```yaml
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
    aws-region: us-east-1

- uses: predictr-io/aws-glue-create-table@v0
  with:
    database-name: 'my_database'
    table-name: 'my_table'
    table-input: '{"Name": "my_table", ...}'
```

## Inputs

| Input | Required | Default | Description |
|---|---|---|---|
| `database-name` | Yes | - | Name of the Glue database |
| `table-name` | Yes | - | Name of the table to create/update |
| `table-input` | Yes | - | Table metadata as JSON (TableInput object) |
| `catalog-id` | No | current account | AWS account ID for cross-account access |
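
For cross-account access, pass the owning account's ID via `catalog-id`. A sketch of such a step (the account ID, database, and table names below are placeholders; the assumed role must have Glue permissions on that catalog):

```yaml
- uses: predictr-io/aws-glue-create-table@v0
  with:
    database-name: 'shared_database'
    table-name: 'shared_table'
    catalog-id: '123456789012'
    table-input: '{"Name": "shared_table"}'
```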
## Outputs

| Output | Description |
|---|---|
| `table-name` | Name of the created/updated table |
| `database-name` | Name of the database containing the table |
| `table-arn` | ARN of the created/updated table |
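
Downstream steps can consume these outputs by giving the action step an `id`. For example (the step id and names here are illustrative):

```yaml
- uses: predictr-io/aws-glue-create-table@v0
  id: create-table
  with:
    database-name: 'my_database'
    table-name: 'my_table'
    table-input: '{"Name": "my_table"}'

- name: Print table ARN
  run: echo "Table ARN: ${{ steps.create-table.outputs.table-arn }}"
```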
## Table Input Format

The `table-input` value must be a valid JSON object matching the AWS Glue `TableInput` structure. See the AWS Glue `TableInput` API documentation for full details.

### CSV table

```json
{
  "Name": "my_table",
  "StorageDescriptor": {
    "Columns": [
      {"Name": "col1", "Type": "string"},
      {"Name": "col2", "Type": "int"}
    ],
    "Location": "s3://my-bucket/data/",
    "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
    "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "SerdeInfo": {
      "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "Parameters": {"field.delim": ","}
    }
  }
}
```

### Parquet table

```json
{
  "Name": "parquet_table",
  "StorageDescriptor": {
    "Columns": [
      {"Name": "id", "Type": "bigint"},
      {"Name": "value", "Type": "double"}
    ],
    "Location": "s3://my-bucket/parquet-data/",
    "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    "SerdeInfo": {
      "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }
  }
}
```

## Examples

### Partitioned Parquet table

```yaml
- uses: predictr-io/aws-glue-create-table@v0
  with:
    database-name: 'analytics'
    table-name: 'events'
    table-input: |
      {
        "Name": "events",
        "StorageDescriptor": {
          "Columns": [
            {"Name": "event_id", "Type": "string"},
            {"Name": "user_id", "Type": "string"},
            {"Name": "event_time", "Type": "timestamp"}
          ],
          "Location": "s3://my-bucket/events/",
          "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
          "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
          "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
          }
        },
        "PartitionKeys": [
          {"Name": "date", "Type": "string"}
        ]
      }
```

### Dynamic table metadata

The preparation step needs an `id` and must write the JSON to `$GITHUB_OUTPUT` so the later step can reference it:

```yaml
- name: Prepare table metadata
  id: prep
  run: |
    cat > table.json <<EOF
    {
      "Name": "my_table",
      "StorageDescriptor": {
        "Columns": [{"Name": "id", "Type": "bigint"}],
        "Location": "s3://my-bucket/data/"
      }
    }
    EOF
    {
      echo 'table_json<<JSON'
      cat table.json
      echo 'JSON'
    } >> "$GITHUB_OUTPUT"

- uses: predictr-io/aws-glue-create-table@v0
  with:
    database-name: 'mydb'
    table-name: 'my_table'
    table-input: ${{ steps.prep.outputs.table_json }}
```

## License

MIT