
BigQuery CLI Skill for Claude

A comprehensive Claude Code skill for working with Google BigQuery via the command line using the bq tool.

Overview

This skill provides complete guidance for using the BigQuery CLI (bq) effectively, including:

  • Query Operations - Execute SQL, estimate costs with dry-run, parameterized queries (see the sketch after this list)
  • Data Loading - Load from CSV/JSON/Avro/Parquet with schema management
  • Data Export - Export tables to GCS with compression and sharding
  • Resource Management - Create/list/delete datasets, tables, views
  • Cost Optimization - Always dry-run first, partitioning strategies, clustering
  • Schema Design - Inline and JSON formats, types, nested records
  • Best Practices - Partitioning, clustering, cost controls, performance tips
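
The overview mentions parameterized queries but the examples below do not show one, so here is a minimal, illustrative sketch; the dataset, table, and parameter names are assumptions, and parameters require standard SQL (--use_legacy_sql=false):

# Named parameters are bound with --parameter=name:type:value and referenced as @name
bq query --use_legacy_sql=false \
  --parameter=min_date:DATE:2024-01-01 \
  --parameter=region:STRING:EU \
  'SELECT user_id, COUNT(*) AS events
   FROM dataset.events
   WHERE event_date >= @min_date AND region = @region
   GROUP BY user_id'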

Installation

For Claude Code Users

Copy the skill to your Claude skills directory:

mkdir -p ~/.claude/skills/bigquery-cli
cp SKILL.md ~/.claude/skills/bigquery-cli/

The skill will be automatically available in your next Claude Code session.

Prerequisites

You need the BigQuery CLI (bq) tool, installed as part of the Google Cloud SDK:

# Install Google Cloud SDK
# https://cloud.google.com/sdk/docs/install

# Verify installation
bq version

# Authenticate
gcloud auth login

# Set default project
gcloud config set project YOUR_PROJECT_ID

What This Skill Provides

Quick Reference Tables

Complete command reference with flags for:

  • Query operations (query, dry-run, parameters)
  • Data loading (CSV, JSON, Avro, Parquet)
  • Data export (formats, compression, sharding)
  • Resource management (list, show, create, delete)
  • Job management (cancel, show, filter)
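
As a taste of the job-management commands the reference covers (the job ID below is made up for illustration):

# List recent jobs in the current project
bq ls -j --max_results=10

# Inspect one job's state and statistics
bq show -j bqjob_r1234567890_1

# Ask BigQuery to cancel a running job
bq cancel bqjob_r1234567890_1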

Cost Estimation Workflow

# ALWAYS estimate cost first
bq query --dry_run 'SELECT ...'

# Review bytes to be processed
# Calculate: (bytes / 1TB) × $6.25

# Run query if acceptable
bq query 'SELECT ...'

# OR add safety limit
bq query --maximum_bytes_billed=10000000000000 'SELECT ...'
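
A worked example of the arithmetic above, with made-up numbers:

# Dry run reports: "... will process 2000000000000 bytes of data"
# 2,000,000,000,000 bytes ≈ 2 TB
# Estimated on-demand cost: 2 × $6.25 ≈ $12.50
# The 10 TB --maximum_bytes_billed cap above corresponds to at most 10 × $6.25 = $62.50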

Partitioning and Clustering

# Create partitioned table for scale
bq mk --table \
  --time_partitioning_type=DAY \
  --time_partitioning_field=event_date \
  --clustering_fields=user_id,region \
  --require_partition_filter=true \
  ds.events \
  schema.json
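
schema.json is referenced above but not included in this README; a plausible sketch, with field names assumed to match the partitioning and clustering columns, could look like this:

# Illustrative schema.json for ds.events (field names are assumptions)
cat > schema.json <<'EOF'
[
  {"name": "event_id",   "type": "STRING", "mode": "REQUIRED"},
  {"name": "user_id",    "type": "STRING", "mode": "REQUIRED"},
  {"name": "region",     "type": "STRING", "mode": "NULLABLE"},
  {"name": "event_date", "type": "DATE",   "mode": "REQUIRED"},
  {"name": "payload",    "type": "RECORD", "mode": "NULLABLE", "fields": [
    {"name": "key",   "type": "STRING", "mode": "NULLABLE"},
    {"name": "value", "type": "STRING", "mode": "NULLABLE"}
  ]}
]
EOF

Because --require_partition_filter=true is set, every query against ds.events must include a WHERE clause on event_date.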

Common Workflows

  • Cost estimation → Query execution
  • Data loading pipeline with validation (sketch after this list)
  • Large table export with sharding
  • Partitioned table creation for billions of rows
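
A minimal version of the load-and-validate pipeline, with illustrative dataset, table, and file names:

# 1. Load the CSV (explicit schema, header row skipped)
bq load --skip_leading_rows=1 \
  dataset.raw_events \
  gs://bucket/raw_events.csv \
  user_id:STRING,event_type:STRING,event_ts:TIMESTAMP

# 2. Validate the schema that landed
bq show --schema --format=prettyjson dataset.raw_events

# 3. Sanity-check the row count
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS row_count FROM dataset.raw_events'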

Best Practices

  1. Always use --dry_run for queries scanning >1TB
  2. Partition large tables by date/timestamp
  3. Cluster by common filter columns
  4. Use wildcards for large exports (extracts are capped at 1GB per file)
  5. Compress exports with GZIP or SNAPPY
  6. Check authentication before running commands

Common Mistakes Section

Mistake | Why It's Wrong | Correct Approach
No dry-run for large queries | Unexpected costs | Always bq query --dry_run first
SELECT * on huge tables | Scans all columns | Select only needed columns
Single file for large exports | 1GB limit per file | Use wildcard: gs://bucket/file_*.json.gz
No partitioning on large tables | Expensive full scans | Use --time_partitioning_type=DAY
Loading without --skip_leading_rows | Header becomes data | Use --skip_leading_rows=1 for CSVs

Usage Examples

Query Cost Analysis

# Estimate cost before running
bq query --dry_run 'SELECT user_id, COUNT(*) FROM dataset.events GROUP BY user_id'

# Run with safety limit
bq query --maximum_bytes_billed=5000000000000 'SELECT ...'

Load CSV with Schema

# Autodetect schema (fast)
bq load --autodetect --skip_leading_rows=1 \
  dataset.table \
  gs://bucket/data.csv

# Explicit schema (production)
bq load --skip_leading_rows=1 \
  dataset.table \
  gs://bucket/data.csv \
  user_id:STRING,event_type:STRING,timestamp:TIMESTAMP,data:STRING
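
For the columnar formats mentioned earlier, the load is simpler because Avro and Parquet files carry their own schema (paths are illustrative):

# Parquet: schema is read from the file
bq load --source_format=PARQUET \
  dataset.table \
  gs://bucket/data.parquet

# Avro: same idea
bq load --source_format=AVRO \
  dataset.table \
  gs://bucket/data.avro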

Export Large Table

# Export with compression and sharding
bq extract \
  --compression=GZIP \
  --destination_format=NEWLINE_DELIMITED_JSON \
  dataset.large_table \
  'gs://bucket/export_*.json.gz'
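
To confirm the shards were written, you can list them with gsutil from the same SDK (bucket name is illustrative):

# List the exported shards
gsutil ls 'gs://bucket/export_*.json.gz'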

Create Partitioned Table

# Partition + cluster for scale
bq mk --table \
  --time_partitioning_type=DAY \
  --time_partitioning_field=event_date \
  --clustering_fields=user_id \
  dataset.events \
  event_id:STRING,user_id:STRING,event_date:DATE,data:JSON
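
One way to see the effect of partitioning on the table just created is to compare two dry runs; the date is illustrative:

# Scans user_id across every partition
bq query --dry_run --use_legacy_sql=false \
  'SELECT COUNT(DISTINCT user_id) FROM dataset.events'

# Partition filter prunes the scan to one day
bq query --dry_run --use_legacy_sql=false \
  'SELECT COUNT(DISTINCT user_id) FROM dataset.events
   WHERE event_date = "2024-01-01"'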

Development Process

This skill was created following Test-Driven Development (TDD) principles for documentation:

RED Phase

  • Tested agents WITHOUT the skill
  • Identified baseline knowledge: strong command syntax
  • Documented gaps: completeness, best practices, common mistakes

GREEN Phase

  • Wrote comprehensive reference skill
  • Added quick reference tables
  • Included best practices and workflows
  • Created common mistakes section

REFACTOR Phase

  • Verified skill improves completeness
  • No loopholes found (reference skill)
  • Ready for deployment

See the /docs directory for detailed testing documentation.

Key Features

πŸ” Authentication Workflow

Check authentication and project setup before running commands.
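
A minimal pre-flight check might look like this:

# Which account is active?
gcloud auth list

# Which project will bq run against?
gcloud config get-value project

# Can we reach BigQuery? (lists datasets in the default project)
bq ls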

💰 Cost Estimation

Always dry-run queries first to estimate costs before execution.

📊 Quick Reference Tables

Comprehensive command reference with all flags and options.

⚑ Partitioning & Clustering

Strategies for optimizing large tables (billions of rows).

📦 Data Loading & Export

Complete guidance for CSV, JSON, Avro, Parquet with compression.

🚨 Common Mistakes

Red flags table to catch issues before they happen.

📈 Best Practices

Cost optimization, performance tuning, schema design tips.

Real-World Impact

Cost savings:

  • Dry-run prevents accidental multi-thousand dollar queries
  • Partitioning reduces scan costs by 10-100x
  • Clustering adds 20-40% additional savings

Performance:

  • Partitioning + clustering: queries 10-100x faster
  • Proper schema: faster loads and queries
  • Columnar formats: 5-10x faster loads than CSV

Contributing

Contributions are welcome! If you find issues or have suggestions:

  1. Test the change following TDD principles
  2. Document any new baseline failures
  3. Update the skill to address them
  4. Verify improvements
  5. Submit a pull request

License

MIT License - feel free to use and modify as needed.

Author

Created by Jakob He

Related

Version History

v1.0.0 (2026-01-20)

  • Initial release
  • Complete command reference
  • Cost estimation workflow
  • Partitioning and clustering guide
  • Best practices and common mistakes
  • Schema format documentation
  • Real-world examples
