BenchBox 0.1.2
Added
- DataFrame mode for all benchmarks — complete DataFrame query implementations across all 18 benchmarks including TPC-DS (102 queries), TPC-H (22 queries), SSB, ClickBench, NYC Taxi, TSBS DevOps, H2ODB, AMPLab, CoffeeShop, TPC-H Skew, and Data Vault. DataFrame platforms: Polars, DuckDB, DataFusion, PySpark, Pandas, Modin, Dask, and cuDF (GPU).
- ASCII chart visualizations — terminal-native ASCII rendering replacing Plotly HTML charts. Seven chart types: performance bar, distribution box, query heatmap, comparison bar, diverging bar, summary box, and query latency histogram. ANSI colors, Unicode box-drawing, and best/worst highlighting.
- 14 new SQL platform adapters — PostgreSQL, Trino, PrestoDB, Apache Spark, AWS Athena, Azure Synapse, Microsoft Fabric, Firebolt, MotherDuck, InfluxDB 3.x, TimescaleDB, ClickHouse Cloud, Onehouse Quanton, and managed Spark variants (EMR, Dataproc, Glue, Fabric Spark, Synapse Spark, Dataproc Serverless).
- Open table format support — Delta Lake, Apache Iceberg, Apache Hudi, DuckLake, and Vortex columnar format with format conversion orchestration and manifest v2 for multi-format tracking.
- Physical tuning DDL generation — platform-specific DDL generators for DuckDB, Snowflake, Redshift, BigQuery, ClickHouse, Firebolt, PostgreSQL, TimescaleDB, Trino/Presto/Athena, and Spark family with sort keys, partitioning, clustering, and compression.
- Query plan capture and comparison — plan parsers for DuckDB, PostgreSQL, Redshift, DataFusion, and SQLite. Comparison engine with regression detection, fingerprinting, historical tracking with flapping detection, and CLI visualization.
- Interactive CLI wizard — guided benchmark configuration with platform selection, tuning wizard, scale factor validation, phase/query selection, onboarding, and persistent preferences.
- TPC-DI benchmark — complete implementation across 4 phases: core schema, query suite, ETL pipeline, and validation/testing.
- Cross-platform comparison engine: benchbox compare command with multi-platform analysis, SQL vs DataFrame comparison, and unified visualization.
- Unified tuning configuration: YAML-based tuning system with per-platform DDL generation, write-time physical layout configuration, and dry-run preview.
- Cloud storage and deployment modes for S3/GCS/ADLS/DBFS with credential setup wizard and cost estimation.
- TPC compliance improvements: stream-aware validation, query permutations, warmup/measurement iterations, maintenance operations (RF1/RF2), and --seed for reproducibility.
- New benchmarks: AI/ML Primitives, Metadata Primitives, Write Primitives, Transaction Primitives.
- MCP server: suggest_charts and generate_chart tools; --queries and --validation-mode CLI flags; tiered --help.
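
To illustrate why a seed matters for the query permutations and --seed items above, here is a minimal sketch of a reproducible per-stream query order. The helper name stream_query_order and the way the seed is combined with the stream number are assumptions made for illustration, not BenchBox's implementation; the official TPC toolkits define their own orderings.

```python
import random


def stream_query_order(query_ids, seed, stream):
    """Return a repeatable query order for one stream (hypothetical helper)."""
    # A dedicated RNG per (seed, stream) pair gives each stream its own
    # order while keeping every run with the same seed identical.
    rng = random.Random(seed * 1000 + stream)
    order = list(query_ids)
    rng.shuffle(order)
    return order


queries = list(range(1, 23))  # TPC-H defines 22 queries
first = stream_query_order(queries, seed=42, stream=1)
again = stream_query_order(queries, seed=42, stream=1)
other = stream_query_order(queries, seed=42, stream=2)

print(first == again)            # True: same seed and stream, same order
print(sorted(first) == queries)  # True: a permutation, nothing dropped
```

Using a private random.Random instance rather than the module-level functions keeps the ordering independent of any other random calls in the process.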
Fixed
- TPC-DS data generation reliability — segfaults with fractional scale factors, parallel generation errors, streaming compression, and chunked file handling.
- Cloud platform stability — credential refresh errors, schema creation ordering, UC Volume uploads, S3 key handling, and BigQuery/Snowflake/Redshift/Databricks adapter issues.
- Type safety — 150+ type errors resolved across production code with proper annotations and TYPE_CHECKING imports.
- SQL dialect translation — SQLGlot compatibility for DuckDB, ClickHouse, DataFusion, and Netezza; reserved keyword quoting and identifier case sensitivity.
- Security hardening: SQL injection prevention, parameterized queries, path traversal protection.
- CLI hanging in non-interactive mode, progress display precision, --quiet mode propagation.
- TPC compliance: correct stream permutations, maintenance phase SQL execution, Power@Size calculation parity between SQL and DataFrame modes.
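
For readers unfamiliar with the Power@Size metric named in the last fix, the sketch below computes it as the TPC-H specification defines it: 3600 times the scale factor, divided by the geometric mean of the 22 query timings and 2 refresh timings from the power test. The function name power_at_size is hypothetical, and whether BenchBox computes the metric exactly this way is an assumption. The point of parity is that SQL and DataFrame timings go through the same formula.

```python
import math


def power_at_size(query_seconds, refresh_seconds, scale_factor):
    """TPC-H Power@Size from 22 query and 2 refresh timings (hypothetical helper)."""
    if len(query_seconds) != 22 or len(refresh_seconds) != 2:
        raise ValueError("expected 22 query timings and 2 refresh timings")
    intervals = list(query_seconds) + list(refresh_seconds)
    if any(t <= 0 for t in intervals):
        raise ValueError("timings must be positive")
    # Geometric mean computed in log space to avoid overflow or underflow
    # when multiplying 24 values together.
    log_mean = sum(math.log(t) for t in intervals) / len(intervals)
    return 3600.0 * scale_factor / math.exp(log_mean)


# Every interval takes 1 second at scale factor 1.
result = power_at_size([1.0] * 22, [1.0, 1.0], scale_factor=1)
print(result)  # 3600.0
```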
Changed
- Dropped Plotly HTML charts in favor of ASCII-only rendering.
- Lazy-load cloud platform adapters to speed up CLI startup and test suite.
- Optimized TPC-DS smoke tests with selective table generation.
Full Changelog: https://github.com/joeharris76/BenchBox/blob/main/CHANGELOG.md#012---2026-02-09