Run SQL queries against Delta Lake tables in AWS Glue Data Catalog with DataFusion.
Query your Glue-registered Delta tables without Spark. Fast, lightweight, Rust-native.
# Run a query
cargo run --release --bin sql-runner -- \
--file query.sql \
--database my_database
# With debug tracing
RUST_LOG=glue_query_runner=debug cargo run --release --bin sql-runner -- \
--file query.sql \
--database my_database- Builder API - Ergonomic, fluent configuration
- Automatic caching - Table metadata cached with TTL (5min default)
- Structured tracing - Optional debug logging with
RUST_LOG - Batch processing - Process entire folders of SQL files
- Query plans -
--explainflag for optimization insights
use glue_query_runner::GlueContextExt;
use datafusion::prelude::*;
let aws_config = aws_config::load_defaults(aws_config::BehaviorVersion::latest()).await;
let ctx = SessionContext::new()
.with_glue_database(&aws_config, "my_database")
.await?;
let df = ctx.sql("SELECT * FROM my_table LIMIT 10").await?;
df.show().await?;use glue_query_runner::{GlueSchemaProvider, catalog::GlueCatalog};
use std::sync::Arc;
let glue_catalog = Arc::new(GlueCatalog::new(&aws_config));
let schema = GlueSchemaProvider::builder("my_database")
.catalog(glue_catalog)
.cache_ttl(Duration::from_secs(600)) // 10 min cache
.build()
.await?;# Single file
--file query.sql --database my_db
# Batch folder
--input-folder ./queries --output-folder ./results --database my_db
# With execution plans
--file query.sql --database my_db --explainQueries separated by semicolons, comments with --:
-- Get top customers
SELECT customer_id, SUM(amount) as total
FROM orders
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;
-- Daily revenue
SELECT DATE(order_date), SUM(amount)
FROM orders
GROUP BY DATE(order_date);[features]
default = ["tracing", "cache"]
tracing = ["dep:tracing", "dep:tracing-subscriber"]
cache = ["dep:moka"]Disable with --no-default-features.
RUST_LOG=glue_query_runner=debug cargo run --release -- --file query.sqlOutput:
DEBUG Fetching table list from Glue catalog
INFO Retrieved table names from Glue count=100
DEBUG Table cache miss
INFO Resolved table location location=s3://bucket/table
INFO Delta table loaded files=21
DEBUG Table cached
DEBUG Table cache hit # <-- Second query instant!
- GlueCatalog - Fetches table locations from AWS Glue
- GlueSchemaProvider - Lazy-loads Delta tables, caches results
- DataFusion - Executes SQL with full query optimization
- Moka cache - In-memory TTL cache for table metadata
Uses default AWS credential chain:
export AWS_REGION=us-east-1
export AWS_PROFILE=your_profile
# or AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY- First query: ~3s (Glue lookup + Delta load)
- Cached query: ~instant (cache hit)
- Cache TTL: 5 minutes (configurable)
deltalake 0.28- Delta Lake with S3datafusion 49.0- Query engineaws-sdk-glue 1.73- Glue APImoka 0.12- Async cache (optional)tracing 0.1- Structured logging (optional)
Apache-2.0