feat: merge-DML batching optimization for binlog apply #1687

Open
dnovitski wants to merge 1 commit into github:master from dnovitski:pr-1378

@dnovitski dnovitski commented May 22, 2026

Related issue: #1378

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

Summary

Port of #1378 (by @shaohk) rebased onto current master with API adaptation, correctness hardening, and comprehensive tests.

What it does

When --is-merge-dml-event is enabled, gh-ost batches binlog DML events instead of applying them one-by-one:

  1. Batching — Collects up to --dml-batch-size events, then emits at most 3 SQL statements (one batched REPLACE for inserts/updates, one batched DELETE, one for key-changing updates)
  2. Deduplication — Within a batch, only the last event per unique-key value survives (e.g., 5 UPDATEs to the same row → 1 REPLACE)
  3. Range filtering — Events targeting rows beyond the current row-copy high-watermark are discarded (they'll be copied by row-copy anyway)
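
The deduplication and range-filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Event` type, the `mergeBatch` function, and the assumption of a single integer unique key copied in ascending order are all hypothetical.

```go
package main

import "fmt"

// Kind is the DML type of a binlog event (hypothetical, for illustration only).
type Kind int

const (
	Insert Kind = iota
	Update
	Delete
)

// Event is a simplified binlog DML event keyed by one integer unique-key value.
type Event struct {
	Kind Kind
	Key  int64
}

// mergeBatch drops events whose key lies beyond the row-copy high-watermark
// (assumed here to mean Key > highWatermark) and keeps only the latest event
// per key. It returns the surviving events plus counts of deduplicated and
// range-filtered events.
func mergeBatch(events []Event, highWatermark int64) (kept []Event, deduped, filtered int) {
	index := make(map[int64]int, len(events)) // key -> position in kept
	for _, ev := range events {
		if ev.Key > highWatermark {
			filtered++ // row-copy will pick this row up later
			continue
		}
		if i, seen := index[ev.Key]; seen {
			kept[i] = ev // latest event wins, so INSERT then DELETE ends as DELETE
			deduped++
			continue
		}
		index[ev.Key] = len(kept)
		kept = append(kept, ev)
	}
	return kept, deduped, filtered
}

func main() {
	batch := []Event{{Update, 1}, {Update, 1}, {Insert, 2}, {Delete, 2}, {Insert, 99}}
	kept, deduped, filtered := mergeBatch(batch, 10)
	fmt.Printf("kept=%d deduped=%d filtered=%d\n", len(kept), deduped, filtered)
}
```

Because each key appears at most once after deduplication, the order of the surviving events no longer affects the final row state, which is what allows them to be grouped into a few batched statements.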

Changes from original PR

  • Adapted to master's builder-pattern query API (NewDMLDeleteQueryBuilder etc.)
  • Security: Strict formatNumericValue type-switch (rejects non-numeric types, prevents SQL injection via interpolated DELETE values)
  • Correctness: INSERT+DELETE within same batch emits DELETE (not no-op) to protect against row-copy race
  • Correctness: Merge auto-disabled when table has >1 unique key (REPLACE semantics unsafe with secondary unique indexes)
  • Correctness: Replaced the strings.Contains("int") check, which also matched "point", with an exact base-type switch
  • Correctness: Key-changing UPDATE saves/restores dmlEvent.DML to avoid mutation
  • Tests: 35+ test cases covering dedup scenarios, batch query structure, range filtering, type detection, and numeric formatting edge cases
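
To illustrate the strict numeric formatting mentioned in the Security item, here is a hedged sketch of how a type switch can guard values interpolated into a batched DELETE. The function bodies below are assumptions written for this example; only the name `formatNumericValue` comes from the PR, and `buildBatchedDelete` is hypothetical. This sketch accepts integer types only, which may be narrower than the PR's implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatNumericValue renders v as a decimal literal, rejecting anything that
// is not a Go integer type. Strings and byte slices are refused, so no
// attacker-controlled text can reach the interpolated SQL.
func formatNumericValue(v interface{}) (string, error) {
	switch n := v.(type) {
	case int:
		return strconv.FormatInt(int64(n), 10), nil
	case int8:
		return strconv.FormatInt(int64(n), 10), nil
	case int16:
		return strconv.FormatInt(int64(n), 10), nil
	case int32:
		return strconv.FormatInt(int64(n), 10), nil
	case int64:
		return strconv.FormatInt(n, 10), nil
	case uint:
		return strconv.FormatUint(uint64(n), 10), nil
	case uint8:
		return strconv.FormatUint(uint64(n), 10), nil
	case uint16:
		return strconv.FormatUint(uint64(n), 10), nil
	case uint32:
		return strconv.FormatUint(uint64(n), 10), nil
	case uint64:
		return strconv.FormatUint(n, 10), nil
	default:
		return "", fmt.Errorf("non-numeric value of type %T rejected", v)
	}
}

// quoteIdent backtick-quotes a MySQL identifier, doubling embedded backticks.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// buildBatchedDelete (hypothetical) builds one DELETE ... IN (...) statement
// for a single-column key, validating every key through formatNumericValue.
func buildBatchedDelete(table, column string, keys []interface{}) (string, error) {
	if len(keys) == 0 {
		return "", fmt.Errorf("no keys to delete")
	}
	literals := make([]string, 0, len(keys))
	for _, k := range keys {
		lit, err := formatNumericValue(k)
		if err != nil {
			return "", err
		}
		literals = append(literals, lit)
	}
	return fmt.Sprintf("DELETE FROM %s WHERE %s IN (%s)",
		quoteIdent(table), quoteIdent(column), strings.Join(literals, ",")), nil
}

func main() {
	q, err := buildBatchedDelete("_t_gho", "id", []interface{}{int64(1), uint32(2), 3})
	fmt.Println(q, err)
	_, err = formatNumericValue("1 OR 1=1")
	fmt.Println(err)
}
```

Rejecting by type rather than by parsing the string form means a value such as "1 OR 1=1" is refused outright instead of being sanitized.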

Benchmark Results

Setup: MySQL 8.0 (Docker), 100K-row table, 8 concurrent writer threads generating ~2900 events/s (60% UPDATE, 25% INSERT, 15% DELETE), --dml-batch-size=100.

| Mode | Wall-clock | Events Applied | Events Ignored | Effective SQL Queries |
| --- | --- | --- | --- | --- |
| Standard | 2m58s | ~411,000 | 0 | ~411,000 |
| Merge | 2m57s | ~265,000 | ~128,000 | ~8,000 |

Wall-clock time is effectively identical (2m58s vs. 2m57s) because this workload is row-copy bound (backlog stays at 0–1/1000). Even so, the optimization delivers:

  • 98% fewer SQL queries to MySQL (411K → 8K)
  • 32% fewer events applied via dedup (redundant updates to same rows eliminated)
  • 33% of events range-filtered (beyond row-copy watermark, would be copied anyway)

A wall-clock improvement is expected on large tables (millions of rows, hours-long migrations), where DML apply becomes the bottleneck and the backlog grows; this benchmark does not exercise that case. The query reduction should translate into less lock contention, less binlog volume, and lower replication lag on the target.

Go-level microbenchmark (query count per 1000 events):

| Metric | Standard | Merge | Reduction |
| --- | --- | --- | --- |
| SQL queries | 1000 | 3 | 333× |

Usage

gh-ost ... --is-merge-dml-event --dml-batch-size=100

Auto-disabled (with log warning) when:

  • Table's unique key contains non-memory-comparable columns (TEXT, BLOB, JSON)
  • Table's unique key has nullable columns (NULL breaks comparison/dedup semantics)
  • Table has more than one unique key (REPLACE semantics conflict)
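
The three auto-disable conditions can be expressed as a small eligibility check. The sketch below is an assumption-laden illustration, not the PR's implementation: `Column`, `baseType`, `isMemoryComparable`, and `mergeEligible` are hypothetical names, and treating only MySQL integer types as memory-comparable is a simplification made here.

```go
package main

import (
	"fmt"
	"strings"
)

// Column is a simplified description of one unique-key column (hypothetical).
type Column struct {
	Name     string
	Type     string // MySQL column type, e.g. "int(11) unsigned"
	Nullable bool
}

// baseType strips length and attributes from a column type, so that
// "int(11) unsigned" yields "int" while "point" stays "point".
func baseType(columnType string) string {
	t := strings.ToLower(strings.TrimSpace(columnType))
	if i := strings.IndexAny(t, "( "); i >= 0 {
		t = t[:i]
	}
	return t
}

// isMemoryComparable matches the base type exactly. A substring test for
// "int" would wrongly accept "point".
func isMemoryComparable(columnType string) bool {
	switch baseType(columnType) {
	case "tinyint", "smallint", "mediumint", "int", "integer", "bigint":
		return true
	}
	return false
}

// mergeEligible reports whether merging is allowed for a table with the given
// unique keys, and a reason suitable for a log warning when it is not.
func mergeEligible(uniqueKeys [][]Column) (bool, string) {
	if len(uniqueKeys) != 1 {
		return false, fmt.Sprintf("table has %d unique keys, need exactly 1", len(uniqueKeys))
	}
	for _, col := range uniqueKeys[0] {
		if col.Nullable {
			return false, "unique key column " + col.Name + " is nullable"
		}
		if !isMemoryComparable(col.Type) {
			return false, "unique key column " + col.Name + " is not memory-comparable"
		}
	}
	return true, ""
}

func main() {
	ok, reason := mergeEligible([][]Column{{{Name: "loc", Type: "point"}}})
	fmt.Println(ok, reason)
}
```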

Credit

Based on the work by @shaohk in #1378. Co-authored-by trailer preserved in commit.

@dnovitski dnovitski changed the title fix: use binlog row images in merge DML path instead of source table reads feat: merge-DML batching optimization for binlog apply May 22, 2026
@dnovitski dnovitski force-pushed the pr-1378 branch 7 times, most recently from 6d0ead2 to fe8e3bb Compare May 22, 2026 04:04
Add --is-merge-dml-event flag that batches and deduplicates binlog DML
events before applying them to the ghost table, significantly reducing
SQL round-trips during high-write migrations.

When enabled and the unique key is memory-comparable (numeric columns):
- Deduplicates DML events by unique key (latest event wins)
- Reduces INSERT+DELETE sequences to DELETE (safe against row-copy races)
- Batches INSERTs/UPDATEs as multi-row REPLACE INTO
- Batches DELETEs as DELETE WHERE (pk) IN (...)
- Skips events beyond migration range (not yet copied by row-copy)
- Disables merge for tables with secondary unique indexes

Safety: strict numeric type validation in formatNumericValue prevents
SQL injection. Type detection uses exact base-type parsing (not substring).
Uses BuildColumnsPreparedValues for proper per-column conversion tokens.

Original implementation by shaohoukun in PR github#1378, adapted to current
master's builder-pattern API with correctness and security hardening.

Co-authored-by: shaohk <shaohoukun@meituan.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>