AWS Athena Query Action

GitHub Action to execute SQL queries on AWS Athena and fetch the results, with an automatic LIMIT clause enforced on SELECT queries for safety.

Features

  • Execute SQL queries on AWS Athena
  • Automatic LIMIT clause enforcement on SELECT queries (default: 1000 rows)
  • Fetch and return query results as JSON
  • Wait for query completion with configurable timeout
  • Comprehensive query statistics (data scanned, execution time)
  • Support for multiple catalogs and workgroups

Usage

- name: Query Athena
  uses: predictr-io/aws-athena-query@v0
  with:
    query: 'SELECT * FROM my_table WHERE date = current_date'
    database: 'my_database'
    max-rows: 100

Authentication

This action requires AWS credentials to be configured. Use the official aws-actions/configure-aws-credentials action first:

- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
    aws-region: us-east-1

- uses: predictr-io/aws-athena-query@v0
  with:
    query: 'SELECT * FROM my_table'
    database: 'my_database'
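
If the role is assumed through GitHub's OIDC provider, as in the example above, the job also needs permission to request an ID token:

permissions:
  id-token: write
  contents: read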

Inputs

| Input | Required | Default | Description |
|---|---|---|---|
| query | Yes | - | SQL query to execute (LIMIT enforced on SELECT) |
| database | Yes | - | Athena database name |
| workgroup | No | primary | Athena workgroup name |
| output-location | No | - | S3 location for query results (s3://bucket/path/) |
| max-rows | No | 1000 | Maximum rows to return (enforced via LIMIT) |
| catalog | No | AwsDataCatalog | Data catalog name |
| wait-timeout-seconds | No | 300 | Maximum time to wait for query completion, in seconds |

Outputs

| Output | Description |
|---|---|
| query-execution-id | Athena query execution ID |
| state | Final query state (SUCCEEDED, FAILED, or CANCELLED) |
| data-scanned-bytes | Amount of data scanned by the query, in bytes |
| execution-time-ms | Query execution time in milliseconds |
| results | Query results as a JSON array of objects |
| result-count | Number of rows returned |
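
For example, a query returning two rows of user_id and email would produce a results value shaped like the following (illustrative values; Athena's API returns result cells as strings, so the exact typing depends on how the action deserializes the result set):

[
  { "user_id": "1", "email": "alice@example.com" },
  { "user_id": "2", "email": "bob@example.com" }
]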

LIMIT Clause Enforcement

For safety, this action automatically enforces a LIMIT clause on SELECT queries:

Automatic LIMIT Addition

-- Input query
SELECT * FROM users WHERE active = true

-- Executed query (with max-rows: 1000)
SELECT * FROM users WHERE active = true LIMIT 1000

LIMIT Override Protection

-- Input query with LIMIT 5000
SELECT * FROM users LIMIT 5000

-- Executed query (max-rows: 1000 enforced)
SELECT * FROM users LIMIT 1000

Non-SELECT Queries

CREATE, DROP, INSERT, and other non-SELECT queries are not modified:

-- These queries are executed as-is (no LIMIT added)
CREATE TABLE my_table AS SELECT * FROM source
DROP TABLE old_table
INSERT INTO my_table VALUES (...)

Examples

Basic Query

- name: Query recent events
  id: query
  uses: predictr-io/aws-athena-query@v0
  with:
    query: |
      SELECT event_type, COUNT(*) as count
      FROM events
      WHERE date = current_date
      GROUP BY event_type
    database: 'analytics'
    max-rows: 100

- name: Process results
  env:
    RESULTS: ${{ steps.query.outputs.results }}
  run: |
    echo "Query scanned: ${{ steps.query.outputs.data-scanned-bytes }} bytes"
    echo "Results: $RESULTS"

Query with Custom Workgroup

- name: Query with custom workgroup
  uses: predictr-io/aws-athena-query@v0
  with:
    query: "SELECT * FROM large_table WHERE partition = '2024-01-01'"
    database: 'production'
    workgroup: 'high-memory-workgroup'
    output-location: 's3://my-bucket/athena-results/'
    max-rows: 5000
    wait-timeout-seconds: 600

Process Results in Workflow

- name: Query user data
  id: query
  uses: predictr-io/aws-athena-query@v0
  with:
    query: "SELECT user_id, email FROM users WHERE status = 'active'"
    database: 'users_db'

- name: Parse and use results
  env:
    RESULTS: ${{ steps.query.outputs.results }}
  run: |
    # Pass the JSON through an env var so shell quoting can't break it
    printf '%s' "$RESULTS" > results.json

    # Filter with jq
    jq '.[] | select(.email | contains("@example.com"))' results.json

    # Count results
    echo "Found ${{ steps.query.outputs.result-count }} active users"

Query Multiple Tables

- name: Query customer orders
  uses: predictr-io/aws-athena-query@v0
  with:
    query: |
      SELECT
        c.customer_id,
        c.name,
        COUNT(o.order_id) as order_count,
        SUM(o.total) as total_spent
      FROM customers c
      LEFT JOIN orders o
        ON c.customer_id = o.customer_id
       AND o.order_date >= current_date - interval '30' day
      GROUP BY c.customer_id, c.name
      ORDER BY total_spent DESC
    database: 'sales'
    max-rows: 500

Cross-Catalog Query

- name: Query external data catalog
  uses: predictr-io/aws-athena-query@v0
  with:
    query: 'SELECT * FROM external_table'
    database: 'external_db'
    catalog: 'MyDataCatalog'
    workgroup: 'external-workgroup'

DDL Operations (No LIMIT Enforced)

- name: Create table from query
  uses: predictr-io/aws-athena-query@v0
  with:
    query: |
      CREATE TABLE daily_summary AS
      SELECT
        date,
        category,
        COUNT(*) as event_count
      FROM events
      WHERE date = current_date
      GROUP BY date, category
    database: 'analytics'

Error Handling

The action provides detailed error information:

- name: Query with error handling
  id: query
  continue-on-error: true
  uses: predictr-io/aws-athena-query@v0
  with:
    query: 'SELECT * FROM possibly_missing_table'
    database: 'test_db'

- name: Handle query failure
  if: steps.query.outcome == 'failure'
  run: |
    echo "Query failed"
    echo "Check the logs for detailed error information"

Query Statistics

The action provides useful statistics about query execution:

- name: Query with statistics
  id: query
  uses: predictr-io/aws-athena-query@v0
  with:
    query: 'SELECT * FROM large_table'
    database: 'data_warehouse'

- name: Log statistics
  run: |
    echo "Data scanned: ${{ steps.query.outputs.data-scanned-bytes }} bytes"
    echo "Execution time: ${{ steps.query.outputs.execution-time-ms }} ms"
    echo "Rows returned: ${{ steps.query.outputs.result-count }}"
    echo "Query ID: ${{ steps.query.outputs.query-execution-id }}"

Best Practices

  1. Use Partitions: Always filter on partition columns to minimize data scanned

    SELECT * FROM events WHERE date = '2024-01-01' AND hour = '12'
  2. Limit Columns: Select only the columns you need

    SELECT user_id, name FROM users  -- Good
    SELECT * FROM users              -- Avoid
  3. Set an Appropriate max-rows: Choose a limit that matches your use case

    max-rows: 100  # For small result sets
    max-rows: 10000  # For larger exports (be cautious of GitHub Actions limits)
  4. Use Workgroups: Configure workgroups with output locations and cost controls

    workgroup: 'cost-controlled-workgroup'
    output-location: 's3://query-results-bucket/path/'
  5. Handle Timeouts: Set appropriate timeouts for complex queries

    wait-timeout-seconds: 600  # 10 minutes for complex queries

Limitations

  • Maximum result size is limited by GitHub Actions' step output size limit (about 1 MB per output)
  • For very large result sets, read Athena's S3 output directly instead (see the sketch below)
  • Query results are held in memory, so very large result sets may cause issues
  • The max-rows parameter caps results at the specified limit
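
One way to read the S3 output is to download the CSV that Athena writes, instead of consuming the results output. The following is a sketch, not a feature of this action: it assumes the standard Athena convention of storing results at <output-location>/<query-execution-id>.csv, and it raises max-rows because LIMIT enforcement still applies to the executed SELECT.

- name: Run export query
  id: export
  uses: predictr-io/aws-athena-query@v0
  with:
    query: 'SELECT * FROM big_table'
    database: 'analytics'
    output-location: 's3://my-bucket/athena-results/'
    max-rows: 1000000

- name: Download the full result set from S3
  run: |
    # Athena stores results as CSV under the configured output location,
    # keyed by the query execution ID
    aws s3 cp "s3://my-bucket/athena-results/${{ steps.export.outputs.query-execution-id }}.csv" results.csv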

License

MIT
