Your AI-powered data analyst for AWS Redshift. Explore schemas, run queries, generate DDL, profile data — all read-only, all cross-platform, zero pip install. Works with both provisioned clusters and Redshift Serverless.
Works with any AI coding agent — Claude Code, Cursor, Codex, and more.
🛡️ Read-only 🖥️ Mac + Windows 📦 Zero dependencies 🔌 Any AI agent
- ✨ What can it do?
- 🚀 Quick Start
- 📖 Scripts
- 🔄 Recommended Workflow
- 📈 Business Analysis Patterns
- 🛡️ Safety
- 📂 Output & File Saving
- ⚙️ Configuration
- 🧰 Prerequisites
- 🔐 Security & Connection
- 📜 License
mindmap
root((Redshift Skill))
🔍 Explore
List schemas
Browse tables
Search objects
View columns
📊 Query
Run SQL
Sample data
Profile columns
Local analytics
🏗️ DDL
Tables
Views
Schemas
UDFs
External tables
📈 Analyze
Trend analysis
Root cause
Cohort analysis
Top/Bottom N
npx skills add onsen-ai/redshift-skillOr install globally:
npx skills add onsen-ai/redshift-skill -gSee vercel-labs/skills for more install options.
Run the interactive wizard in your terminal:
python3 scripts/setup.py # macOS / Linux
python scripts/setup.py # WindowsThe wizard walks you through:
flowchart LR
A[🔑 Pick AWS Profile] --> B{Provisioned or Serverless?}
B -->|Provisioned| C[🔍 Discover Clusters]
B -->|Serverless| D[🔍 Discover Workgroups]
C --> E[🗄️ Choose Database]
D --> E
E --> F[✅ Test Connection]
F --> G[💾 Save Config]
Config is saved to ~/.redshift-skill/config.json — re-run anytime to change settings.
python3 scripts/query.py "SELECT count(1) FROM sales.fact_orders"That's it. All scripts auto-detect your saved connection. 🎉
| Script | What it does | Example |
|---|---|---|
schemas.py |
List all schemas and owners | schemas.py |
tables.py |
Browse tables with row counts & sizes | tables.py --schema=sales |
columns.py |
Column types, encoding, dist/sort keys | columns.py --schema=sales --table=fact_orders |
search.py |
Find tables/columns by name pattern | search.py --pattern=revenue |
sample.py |
Peek at actual data values | sample.py --schema=sales --table=dim_products --limit=5 |
| Script | What it does | Example |
|---|---|---|
query.py |
Run any read-only SQL | query.py "SELECT ..." or query.py --sql-file=my.sql |
profile.py |
Per-column stats (nulls, cardinality, min/max) | profile.py --schema=sales --table=dim_customers |
analyze.py |
Local analytics on saved files — no Redshift needed | analyze.py data.csv --describe |
analyze.py data.csv --describe # Per-column statistics
analyze.py data.csv --sum=revenue # Sum a column
analyze.py data.csv --group-by=region --avg=sales # Group by + aggregate
analyze.py data.csv --filter='year=2024' --top=10 # Filter + top N
analyze.py data.csv --hist=price # Text histogramGenerate CREATE statements for 7 object types — full DDL with distkeys, sortkeys, encoding, constraints, and ownership:
| Type | Example |
|---|---|
| 📋 Table | ddl.py --schema=sales --name=fact_orders |
| 👁️ View | ddl.py --type=view --schema=sales --name=v_daily_revenue |
| 📁 Schema | ddl.py --type=schema --name=sales |
| 🗄️ Database | ddl.py --type=database |
| ⚙️ UDF | ddl.py --type=udf --schema=public |
| 🌐 External | ddl.py --type=external --schema=spectrum |
| 👥 Group | ddl.py --type=group |
| Script | What it does | Example |
|---|---|---|
table_info.py |
Size, rows, skew, encoding, sort key stats | table_info.py --schema=sales --table=fact_orders |
space.py |
Largest tables by disk usage | space.py --schema=sales --top=10 |
flowchart TD
A[🔍 schemas.py] -->|Pick a schema| B[📋 tables.py]
B -->|Check row counts & sizes| C{Large table?}
C -->|< 10K rows| D[📊 query.py SELECT *]
C -->|> 10K rows| E[🔬 columns.py + sample.py]
E --> F[📈 Write targeted query]
F --> G[💾 Results auto-saved]
G --> H[🧮 analyze.py locally]
style A fill:#e1f5fe
style B fill:#e1f5fe
style D fill:#c8e6c9
style E fill:#fff9c4
style F fill:#c8e6c9
style G fill:#f3e5f5
style H fill:#f3e5f5
💡 Don't follow this rigidly! If the user knows exactly what they want, skip straight to the query. This is a guide for unfamiliar schemas, not a mandatory checklist.
The skill is built for real-world analyst work — product analysts, commercial analysts, BI developers.
| Pattern | Approach |
|---|---|
| 📊 Trend analysis | GROUP BY month + SUM/COUNT, compare YoY/MoM |
| 👥 Cohort analysis | Group by first purchase date, track retention |
| 📉 Root cause | Decompose metric → slice by dimensions → drill into outliers |
| 🏆 Top/Bottom N | ORDER BY metric DESC LIMIT N |
| 📅 YoY comparison | Self-join shifted by 1 year, or LAG() window function |
| 🔢 Distribution | NTILE(100), percentiles, or analyze.py --hist locally |
| 🎯 Pareto (80/20) | Cumulative SUM() OVER (ORDER BY ...) |
| 🧩 Segmentation | CASE WHEN or NTILE to bucket, then profile each segment |
Two layers of protection — the AI agent validates SQL before sending, and the script itself hard-blocks anything that isn't read-only:
✅ Allowed: SELECT · WITH · SHOW · DESCRIBE · EXPLAIN · SET
❌ Blocked: CREATE · ALTER · DROP · INSERT · UPDATE · DELETE · MERGE · COPY · UNLOAD · GRANT · REVOKE
Multi-statement queries (; followed by another statement) are also blocked.
| Table size | Approach |
|---|---|
| < 10K rows | Explore freely, SELECT * is fine |
| 10K – 1M rows | Add WHERE or LIMIT |
| > 1M rows | Always aggregate or filter, never SELECT * |
All results are automatically saved to ~/redshift-exports/:
- 📄 First 20 rows shown inline (quick preview)
- 💾 Full results saved to file (for follow-up with
analyze.py) - 📍 File path printed so the agent can read it for deeper analysis
--format=txt|csv|json # Output format (default: txt)
--save=PATH # Save to specific location
--no-save # Skip auto-saveProvisioned cluster:
// ~/.redshift-skill/config.json
{
"profile": "my-profile",
"cluster": "my-cluster",
"database": "my-database",
"db_user": "admin",
"region": "us-east-1"
}Serverless workgroup:
// ~/.redshift-skill/config.json
{
"profile": "my-profile",
"workgroup": "my-workgroup",
"database": "my-database",
"region": "us-east-1"
}Serverless doesn't need
db_user— the Data API authenticates as your IAM identity directly.
Edit directly or re-run python3 scripts/setup.py.
- Python 3.8+ — stdlib only, no pip packages needed
- AWS CLI v2 — with a profile that has Redshift Data API access
- Redshift Data API access — no explicit enable toggle needed, but your cluster's security group must allow inbound connections from the Data API service. For provisioned clusters, the
--db-usermust already exist in the database. For serverless, your IAM identity is used directly.
💡 On macOS use
python3, on Windows usepython. The setup wizard saves your Python path so the agent uses the right one automatically.
The skill connects to Redshift entirely through IAM — your AWS CLI profile handles authentication. No database passwords, access keys, or secrets are stored anywhere. The config file (~/.redshift-skill/config.json) contains only connection metadata (cluster name, database, region), never credentials.
sequenceDiagram
participant Skill as Redshift Skill
participant CLI as AWS CLI
participant API as Redshift Data API
participant RS as Redshift
Skill->>CLI: execute-statement (--profile)
Note over CLI: Provisioned: --cluster-identifier + --db-user<br/>Serverless: --workgroup-name
CLI->>API: IAM-authenticated request
API->>API: Generate temporary DB credentials
API->>RS: Connect & execute SQL
RS-->>API: Results
API-->>CLI: Statement ID
Skill->>CLI: get-statement-result
CLI-->>Skill: Query results
The Redshift Data API generates temporary database credentials internally — you never call GetClusterCredentials or manage tokens yourself.
The AWS CLI profile used by the skill needs the following permissions:
Provisioned clusters:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"redshift-data:ExecuteStatement",
"redshift-data:DescribeStatement",
"redshift-data:GetStatementResult"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "redshift:DescribeClusters",
"Resource": "*"
}
]
}Serverless workgroups:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"redshift-data:ExecuteStatement",
"redshift-data:DescribeStatement",
"redshift-data:GetStatementResult"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "redshift-serverless:ListWorkgroups",
"Resource": "*"
}
]
}| Permission | Mode | Purpose |
|---|---|---|
redshift-data:ExecuteStatement |
Both | Submit SQL queries |
redshift-data:DescribeStatement |
Both | Poll query execution status |
redshift-data:GetStatementResult |
Both | Fetch paginated results |
redshift:DescribeClusters |
Provisioned | Discover clusters during setup |
redshift-serverless:ListWorkgroups |
Serverless | Discover workgroups during setup |
With provisioned clusters, the --db-user parameter allows the IAM principal to connect as any database user. To restrict this, scope your IAM policy with a condition:
{
"Effect": "Allow",
"Action": "redshift-data:ExecuteStatement",
"Resource": "arn:aws:redshift:<region>:<account>:cluster:<cluster-id>",
"Condition": {
"StringEquals": {
"redshift:DbUser": ["analyst_readonly"]
}
}
}Serverless doesn't use
--db-user— it authenticates as your IAM identity directly, so this condition doesn't apply.
💡 For defense in depth, combine IAM restrictions with a read-only database user — the skill's application-level SQL validation is the first layer, IAM is the second, and database grants are the third.
Built by the team at Onsen — an AI-powered mental health companion for journaling, emotional wellbeing, and personal growth.
MIT — see LICENSE.
SQL in scripts/sql/ is derived from amazon-redshift-utils (Apache 2.0).