A comprehensive Claude Code skill for working with Google BigQuery from the command line using the bq tool.
This skill provides complete guidance for using the BigQuery CLI (bq) effectively, including:
- Query Operations - Execute SQL, estimate costs with dry-run, parameterized queries (see the example after this list)
- Data Loading - Load from CSV/JSON/Avro/Parquet with schema management
- Data Export - Export tables to GCS with compression and sharding
- Resource Management - Create/list/delete datasets, tables, views
- Cost Optimization - Always dry-run first, partitioning strategies, clustering
- Schema Design - Inline and JSON formats, types, nested records
- Best Practices - Partitioning, clustering, cost controls, performance tips
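The query operations above include parameterized queries, which are not shown in the examples further down; here is a minimal sketch (dataset, column, and parameter names are illustrative):

# Parameterized query (requires standard SQL, hence --use_legacy_sql=false)
bq query --use_legacy_sql=false \
  --parameter='min_date:DATE:2024-01-01' \
  'SELECT user_id, COUNT(*) AS events
   FROM dataset.events
   WHERE event_date >= @min_date
   GROUP BY user_id'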
Copy the skill to your Claude skills directory:
mkdir -p ~/.claude/skills/bigquery-cli
cp SKILL.md ~/.claude/skills/bigquery-cli/

The skill will be automatically available in your next Claude Code session.
You need the BigQuery CLI (bq) tool installed as part of Google Cloud SDK:
# Install Google Cloud SDK
# https://cloud.google.com/sdk/docs/install
# Verify installation
bq version
# Authenticate
gcloud auth login
# Set default project
gcloud config set project YOUR_PROJECT_ID

Complete command reference with flags for:
- Query operations (query, dry-run, parameters)
- Data loading (CSV, JSON, Avro, Parquet)
- Data export (formats, compression, sharding)
- Resource management (list, show, create, delete)
- Job management (cancel, show, filter)
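The resource and job management entries are not demonstrated elsewhere in this README; a few representative commands (dataset, table, and job IDs are placeholders):

# Resource management
bq ls                                        # list datasets in the default project
bq ls dataset                                # list tables in a dataset
bq show --format=prettyjson dataset.table    # table schema and metadata
bq mk --dataset dataset                      # create a dataset
bq rm -t dataset.table                       # delete a table

# Job management
bq ls -j --max_results=10                    # recent jobs
bq show -j JOB_ID                            # details for one job
bq cancel JOB_ID                             # cancel a running job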
# ALWAYS estimate cost first
bq query --dry_run 'SELECT ...'
# Review bytes to be processed
# Calculate: (bytes / 1TB) × $6.25 (see worked example below)
# Run query if acceptable
bq query 'SELECT ...'
# OR add safety limit
bq query --maximum_bytes_billed=10000000000000 'SELECT ...'
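As a worked example of the formula above, the byte count reported by the dry run can be turned into a dollar estimate. This is only a sketch: it assumes the default "will process N bytes of data" message format and mirrors the per-TB formula quoted above.

# Hypothetical helper: parse the dry-run message and estimate cost
SQL='SELECT user_id, COUNT(*) FROM dataset.events GROUP BY user_id'
BYTES=$(bq query --dry_run --use_legacy_sql=false "$SQL" 2>&1 | grep -oE '[0-9]+ bytes' | grep -oE '[0-9]+')
# (bytes / 1TB) × $6.25, e.g. 2,500,000,000,000 bytes ≈ 2.5 TB ≈ $15.6
echo "Estimated cost: \$$(echo "scale=2; $BYTES / 1000000000000 * 6.25" | bc)"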
# Create partitioned table for scale
bq mk --table \
--time_partitioning_type=DAY \
--time_partitioning_field=event_date \
--clustering_fields=user_id,region \
--require_partition_filter=true \
ds.events \
schema.json

- Cost estimation → Query execution
- Data loading pipeline with validation
- Large table export with sharding
- Partitioned table creation for billions of rows
- Always use --dry_run for queries scanning >1TB
- Partition large tables by date/timestamp
- Cluster by common filter columns
- Use wildcards for exports larger than the 1GB per-file limit
- Compress exports with GZIP or SNAPPY
- Check authentication before running commands
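A quick pre-flight check for the last point, with no side effects beyond listing:

gcloud auth list                  # confirm the active account
gcloud config get-value project   # confirm the default project
bq ls --max_results=5             # smoke test: list a few datasets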
| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
| No dry-run for large queries | Unexpected costs | Always run bq query --dry_run first |
| SELECT * on huge tables | Scans all columns | Select only needed columns (dry-run comparison below) |
| Single file for large exports | 1GB limit per file | Use wildcard: gs://bucket/file_*.json.gz |
| No partitioning on large tables | Expensive full scans | Use --time_partitioning_type=DAY |
| Loading without --skip_leading_rows | Header becomes data | Use --skip_leading_rows=1 for CSVs |
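To see the SELECT * penalty from the table above, dry-run both forms and compare the reported bytes (table name is illustrative):

bq query --dry_run --use_legacy_sql=false 'SELECT user_id, event_type FROM dataset.events'
bq query --dry_run --use_legacy_sql=false 'SELECT * FROM dataset.events'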
# Estimate cost before running
bq query --dry_run 'SELECT user_id, COUNT(*) FROM dataset.events GROUP BY user_id'
# Run with safety limit
bq query --maximum_bytes_billed=5000000000000 'SELECT ...'
# Autodetect schema (fast)
bq load --autodetect --skip_leading_rows=1 \
dataset.table \
gs://bucket/data.csv
# Explicit schema (production)
bq load --skip_leading_rows=1 \
dataset.table \
gs://bucket/data.csv \
user_id:STRING,event_type:STRING,timestamp:TIMESTAMP,data:STRING
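Avro and Parquet loads are mentioned above but only CSV is shown; a minimal sketch (paths are illustrative, and these formats carry their own schema):

# Load columnar/self-describing formats
bq load --source_format=PARQUET dataset.table gs://bucket/data.parquet
bq load --source_format=AVRO dataset.table gs://bucket/data.avro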
# Export with compression and sharding
bq extract \
--compression=GZIP \
--destination_format=NEWLINE_DELIMITED_JSON \
dataset.large_table \
'gs://bucket/export_*.json.gz'
# Partition + cluster for scale
bq mk --table \
--time_partitioning_type=DAY \
--time_partitioning_field=event_date \
--clustering_fields=user_id \
dataset.events \
event_id:STRING,user_id:STRING,event_date:DATE,data:JSON
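To benefit from the partitioning and clustering above, filter on event_date (and ideally user_id) so only the relevant partitions are scanned; dry-run first to confirm the reduced byte count:

bq query --dry_run --use_legacy_sql=false \
  'SELECT event_id, data
   FROM dataset.events
   WHERE event_date = DATE "2024-01-01"
     AND user_id = "user_123"'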
This skill was created following Test-Driven Development (TDD) principles for documentation:
- Tested agents WITHOUT the skill
- Identified baseline knowledge: strong command syntax
- Documented gaps: completeness, best practices, common mistakes
- Wrote comprehensive reference skill
- Added quick reference tables
- Included best practices and workflows
- Created common mistakes section
- Verified skill improves completeness
- No loopholes found (reference skill)
- Ready for deployment
See the /docs directory for detailed testing documentation.
Check authentication and project setup before running commands.
Always dry-run queries first to estimate costs before execution.
Comprehensive command reference with all flags and options.
Strategies for optimizing large tables (billions of rows).
Complete guidance for CSV, JSON, Avro, Parquet with compression.
Red flags table to catch issues before they happen.
Cost optimization, performance tuning, schema design tips.
Cost savings:
- Dry-run prevents accidental multi-thousand dollar queries
- Partitioning reduces scan costs by 10-100x
- Clustering adds 20-40% additional savings
Performance:
- Partitioning + clustering: queries 10-100x faster
- Proper schema: faster loads and queries
- Columnar formats: 5-10x faster loads than CSV
Contributions are welcome! If you find issues or have suggestions:
- Test the change following TDD principles
- Document any new baseline failures
- Update the skill to address them
- Verify improvements
- Submit a pull request
MIT License - feel free to use and modify as needed.
Created by Jakob He
- Initial release
- Complete command reference
- Cost estimation workflow
- Partitioning and clustering guide
- Best practices and common mistakes
- Schema format documentation
- Real-world examples