ClickHouse for Symfony Developers

This is a demo project for the "ClickHouse for Symfony Developers" talk presented at SymfonyCon 2025.

Overview

This project demonstrates how to integrate ClickHouse with Symfony applications and compares its performance against traditional relational databases (PostgreSQL, MySQL, MariaDB) and Elasticsearch for analytical workloads.

How Benchmark Commands Work

The benchmark suite provides a systematic way to compare query performance across different database engines. Here's how it works:

1. Setup Phase (benchmark:setup-tables)

Creates identical table schemas across all supported databases:

  • PostgreSQL, MariaDB, MySQL: Creates a test_int_32 table with columns for id (auto-increment), datetime (timestamp), and intValue (integer), along with an index on the datetime column
  • Elasticsearch: Creates an index test_int_32 with mappings for datetime and intValue fields
  • ClickHouse: Creates a test_int_32 table using the MergeTree engine, ordered by datetime
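For illustration, the ClickHouse table described above might be created with DDL like the following. This is a sketch, not the project's exact schema; column types are assumed:

```sql
-- Hypothetical sketch of the ClickHouse table created by benchmark:setup-tables.
-- Column names mirror the description above; exact types may differ in the project.
CREATE TABLE test_int_32
(
    datetime DateTime,
    intValue Int32
)
ENGINE = MergeTree
ORDER BY datetime;
```

Note that the MergeTree table needs no auto-increment id column and no separate index: the ORDER BY clause defines the physical sort order, which serves the same role as the datetime index in the relational databases.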

Usage:

bin/console benchmark:setup-tables [--force]

The --force option drops existing tables/indices before creating new ones.

2. Data Generation Phase (benchmark:fill-tables)

Generates and inserts time-series data across all databases:

  • Generates data points evenly distributed across a time range (default: last 24 hours)
  • Each data point contains a timestamp and a random integer value
  • Supports configurable batch sizes for efficient bulk inserts
  • Measures insertion performance (duration and points/second) for each database

Performance optimizations:

  • SQL databases: Drops indexes before insertion, uses multi-value INSERT statements with transactions, then recreates indexes
  • Elasticsearch: Uses the Bulk API with parallel requests (up to 100 concurrent requests)
  • ClickHouse: Uses JSONEachRow format for efficient bulk inserts with parallel requests
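The JSONEachRow insert mentioned above looks roughly like this on the wire: each line of the request body is one JSON object. The field names here are assumptions mirroring the schema description:

```sql
-- Sketch of a ClickHouse bulk insert in JSONEachRow format.
-- Each subsequent line is one row, encoded as a standalone JSON object.
INSERT INTO test_int_32 FORMAT JSONEachRow
{"datetime": "2025-11-12 14:30:00", "intValue": 42}
{"datetime": "2025-11-12 14:30:01", "intValue": 17}
```

Because each row is an independent line, batches can be streamed and sent as parallel HTTP requests without any client-side statement building.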

Usage:

bin/console benchmark:fill-tables \
  --datetime-from="-1 day" \
  --quantity=1_000_000 \
  --batch-size=1_000 \
  --db=clickhouse,postgresql

Options:

  • --datetime-from: Start datetime (e.g., "-1 day", "-2 hours", "2025-11-12 14:30:00")
  • --quantity: Number of data points to insert (supports underscore formatting, max 10,000,000)
  • --batch-size: Number of rows per batch (supports underscore formatting)
  • --db: Specific database(s) to fill (comma-separated). Options: postgresql, mariadb, mysql, elasticsearch, clickhouse

3. Query Benchmarks

Simple Aggregation (benchmark:query-avg)

Tests basic aggregation performance by calculating the average of intValue across all databases:

  • Executes AVG() and COUNT() queries with a date range filter
  • Measures query execution time in milliseconds
  • Compares performance across all engines
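The aggregation query run against each engine is conceptually the same; a sketch (identifiers and the resolved timestamp are assumptions):

```sql
-- Sketch of the simple aggregation benchmark query.
-- The date literal stands in for "--datetime-from" resolved to a timestamp.
SELECT AVG(intValue) AS avg_value,
       COUNT(*)      AS row_count
FROM test_int_32
WHERE datetime >= '2025-11-11 14:30:00';
```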

Usage:

bin/console benchmark:query-avg --datetime-from="-1 day"

Windowed Aggregation (benchmark:query-avg-by-window)

Tests time-series aggregation by calculating averages grouped by time windows:

  • Uses database-specific time bucketing functions:
    • PostgreSQL: to_timestamp(floor(extract(epoch from datetime) / window) * window)
    • MySQL/MariaDB: FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(datetime) / window) * window)
    • Elasticsearch: Date histogram aggregation with fixed_interval
    • ClickHouse: toStartOfInterval(datetime, INTERVAL window SECOND)
  • Groups data into time windows (e.g., 1 hour, 15 minutes)
  • Returns average value and count for each window
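To make the bucketing functions above concrete, here are sketches of the PostgreSQL and ClickHouse variants of the same windowed query for 1-hour (3600-second) windows; identifiers are assumed:

```sql
-- PostgreSQL: bucket by flooring the epoch to the window size (3600 s here)
SELECT to_timestamp(floor(extract(epoch FROM datetime) / 3600) * 3600) AS bucket,
       AVG("intValue") AS avg_value,
       COUNT(*)        AS row_count
FROM test_int_32
GROUP BY bucket
ORDER BY bucket;

-- ClickHouse: the same bucketing with a built-in interval function
SELECT toStartOfInterval(datetime, INTERVAL 3600 SECOND) AS bucket,
       avg(intValue) AS avg_value,
       count()       AS row_count
FROM test_int_32
GROUP BY bucket
ORDER BY bucket;
```

Both queries produce one row per time window; ClickHouse's toStartOfInterval avoids the epoch arithmetic entirely.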

Usage:

bin/console benchmark:query-avg-by-window \
  --datetime-from="-1 day" \
  --window="1 hour"

Window format examples: "1 day", "2 hours", "30 minutes", "15 minutes"

Use -v flag to see detailed results for each window.

4. Performance Summary

After each benchmark, a performance summary table is displayed showing:

  • Engine name
  • Duration (in milliseconds)
  • Throughput (points/second for inserts; rows scanned for queries)
  • vs Fastest (percentage comparison to the fastest engine)

This makes it easy to visually compare the performance characteristics of different databases for analytical workloads.

Additional Features

Real-time Application Monitoring

This project includes a real-time monitoring system that demonstrates ClickHouse's capabilities for observability use cases:

Setup Monitoring (app:setup-monitoring-table)

Creates ClickHouse tables optimized for monitoring data:

  • Main table (monitoring_data): Stores raw request monitoring data (timestamp, duration, memory usage, controller, URI, status code)
  • Aggregation table (monitoring_data_hourly): Uses AggregatingMergeTree engine to store pre-aggregated statistics
  • Materialized view (monitoring_data_hourly_mv): Automatically aggregates data as it's inserted

The materialized view calculates:

  • Request counts
  • Duration statistics (avg, min, max, percentiles: p50, p90, p95, p99)
  • Memory usage statistics (avg, min, max, percentiles: p50, p90, p95, p99)
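A minimal sketch of how such an aggregation table and materialized view fit together in ClickHouse (column names are assumptions; the project's real tables carry more columns, e.g. memory, URI, and status code):

```sql
-- Hypothetical sketch: AggregatingMergeTree stores partial aggregation states.
CREATE TABLE monitoring_data_hourly
(
    hour            DateTime,
    controller      String,
    request_count   AggregateFunction(count),
    duration_avg    AggregateFunction(avg, Float64),
    duration_quants AggregateFunction(quantiles(0.5, 0.9, 0.95, 0.99), Float64)
)
ENGINE = AggregatingMergeTree
ORDER BY (hour, controller);

-- The materialized view converts each inserted batch into aggregation states
-- using the -State combinators, and writes them to the table above.
CREATE MATERIALIZED VIEW monitoring_data_hourly_mv
TO monitoring_data_hourly AS
SELECT toStartOfHour(timestamp)                        AS hour,
       controller,
       countState()                                    AS request_count,
       avgState(duration)                              AS duration_avg,
       quantilesState(0.5, 0.9, 0.95, 0.99)(duration)  AS duration_quants
FROM monitoring_data
GROUP BY hour, controller;
```

The key design point is that the hourly table stores intermediate aggregation states, not final numbers, so rows inserted at different times can still be merged correctly at query time.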

Usage:

bin/console app:setup-monitoring-table [--force]

Query Monitoring Stats (app:query-monitoring-stats)

Queries aggregated monitoring statistics with flexible filtering:

Usage:

bin/console app:query-monitoring-stats \
  --hours=24 \
  --controller="App\Controller\RootController::index" \
  --status-code=200 \
  --limit=50 \
  --format=table

Options:

  • --hours: Number of hours to look back (default: 24)
  • --controller: Filter by controller name
  • --status-code: Filter by HTTP status code
  • --limit: Limit number of results (default: 50)
  • --format: Output format: table or json (default: table)

The monitoring system uses ClickHouse's AggregateFunction types and merge functions (countMerge, avgMerge, quantilesMerge) to efficiently query pre-aggregated data, demonstrating the power of materialized views for real-time analytics.
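A sketch of what such a merge-function query could look like (table and column names are assumptions, not the project's exact schema):

```sql
-- Hypothetical query over a pre-aggregated hourly table: the -Merge
-- combinators finalize the stored aggregation states at read time.
SELECT hour,
       controller,
       countMerge(request_count)                             AS requests,
       avgMerge(duration_avg)                                AS avg_duration,
       quantilesMerge(0.5, 0.9, 0.95, 0.99)(duration_quants) AS duration_percentiles
FROM monitoring_data_hourly
WHERE hour >= now() - INTERVAL 24 HOUR
GROUP BY hour, controller
ORDER BY hour DESC;
```

Because the heavy lifting happened at insert time inside the materialized view, this query only merges a handful of partial states per hour instead of scanning every raw request row.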

How the Monitoring System Works

The monitoring system is built using Symfony's event system and Messenger component to capture and store HTTP request metrics asynchronously. Here's the architecture:

Components

  1. RequestMonitoringListener (src/EventListener/RequestMonitoringListener.php)

    • Listens to KernelEvents::REQUEST and KernelEvents::TERMINATE
    • Captures metrics for each HTTP request:
      • Request duration (in seconds)
      • Peak memory usage (in bytes)
      • Controller name
      • Request URI
      • HTTP status code
    • Dispatches a MonitoringData message to Symfony Messenger
  2. MonitoringData (src/Messenger/Message/MonitoringData.php)

    • A simple readonly DTO (Data Transfer Object) that holds the monitoring metrics
    • Passed through Symfony Messenger for asynchronous processing
  3. MonitoringDataHandler (src/Messenger/MessageHandler/MonitoringDataHandler.php)

    • Message handler that processes MonitoringData messages
    • Implements buffering to optimize ClickHouse inserts:
      • Buffers up to 100 monitoring entries before flushing
      • Also flushes every 30 seconds to prevent stale data
    • Uses ClickHouse's JSONEachRow format for efficient bulk inserts
    • Gracefully handles errors to prevent monitoring from breaking the application
  4. MessengerWorkerSubscriber (src/EventSubscriber/MessengerWorkerSubscriber.php)

    • Ensures buffered data is flushed:
      • When the Messenger worker is stopped (WorkerStoppedEvent)
      • Periodically during worker runtime, every 30 seconds (WorkerRunningEvent)
    • Prevents data loss when the worker is restarted

Data Flow

HTTP Request
    ↓
RequestMonitoringListener (captures metrics)
    ↓
MonitoringData message → Symfony Messenger (async)
    ↓
MonitoringDataHandler (buffers data)
    ↓
ClickHouse (bulk insert via JSONEachRow)
    ↓
Materialized View (auto-aggregation)
    ↓
monitoring_data_hourly table (pre-aggregated stats)

Using the Monitoring System

Step 1: Setup ClickHouse Tables

First, create the necessary tables and materialized views:

bin/console app:setup-monitoring-table --force

This creates:

  • monitoring_data table (stores raw request data)
  • monitoring_data_hourly table (stores aggregated statistics)
  • monitoring_data_hourly_mv materialized view (auto-aggregates on insert)

Step 2: Start the Messenger Worker

The monitoring system requires a running Messenger worker to process monitoring messages asynchronously:

bin/console messenger:consume async -vv

Important: Keep this worker running in the background. You can run it as a systemd service in production.

Step 3: Generate Traffic

Make HTTP requests to your application. The RequestMonitoringListener will automatically capture metrics for every request:

# Example: Make requests to the demo controller
curl http://localhost:8000/
curl http://localhost:8000/some-other-route

Step 4: Query Monitoring Statistics

Once you have collected some data, query the aggregated statistics:

# View all monitoring stats for the last 24 hours
bin/console app:query-monitoring-stats

# Filter by specific controller
bin/console app:query-monitoring-stats \
  --controller="App\Controller\RootController::index"

# Filter by status code
bin/console app:query-monitoring-stats --status-code=200

# Look back further (last 7 days)
bin/console app:query-monitoring-stats --hours=168

# Export as JSON
bin/console app:query-monitoring-stats --format=json

The output shows hourly aggregated statistics including:

  • Total requests per hour
  • Average/min/max request duration
  • Duration percentiles (p50, p95)
  • Average/min/max memory usage
  • Memory percentiles (p50, p95)

Prerequisites

  • PHP 8.2+
  • Symfony 7.3
  • Composer

Installation

  1. Clone the repository:
git clone <repository-url>
cd clickhouse-symfony
  2. Install dependencies:
composer install
  3. Configure environment variables in .env.local:
# Add connection strings for PostgreSQL, MySQL, MariaDB, Elasticsearch, and ClickHouse
# See .env for required variables

Quick Start

Run a complete benchmark test:

# 1. Setup tables
bin/console benchmark:setup-tables --force

# 2. Fill with 1 million data points
bin/console benchmark:fill-tables --quantity=1_000_000

# 3. Run simple aggregation benchmark
bin/console benchmark:query-avg

# 4. Run windowed aggregation benchmark
bin/console benchmark:query-avg-by-window --window="1 hour"