fix(migration): quote identifiers without breaking expression indexes by rekurt · Pull Request #11 · rekurt/dbdiff

rekurt · 2026-04-07T22:22:53Z

Summary

Add quote_ident(name, dialect) that wraps identifiers in double quotes (PostgreSQL/SQLite) or backticks (MySQL) only when needed — simple lowercase names pass through unquoted for readability
Apply quoting to table names, column names, and index names in all generated migration SQL
Do not quote index column clauses — they are raw SQL fragments from pg_indexes.indexdef (e.g. lower(email), created_at DESC, "MixedCase") and must be emitted verbatim
Fix parse_index_columns bug: rfind('(') matched inner parens in expressions like lower(email), now walks from ON table(...) with proper nesting
Merge dialect-aware quoting and SQL injection escape test from fix: quote SQL identifiers in migration generation #9
Update README, CONTRIBUTING, architecture docs, CHANGELOG, SECURITY for v0.2.0
Add 46 new tests (total: 108) covering expression indexes, sorted indexes, type/nullability/default changes, multi-column indexes, SQL injection escaping, MySQL backticks

Subsumes

Closes #9 — our implementation covers the same quoting surface but avoids the critical bug of quoting index column clauses, which breaks expression and sorted indexes.

Test plan

cargo test — 108 tests pass (83 unit + 22 CLI + 3 config)
cargo clippy -- -D warnings — clean
cargo fmt --check — clean
Manual: cargo run -- tests/fixtures/schema_a.sql --schema tests/fixtures/schema_expr_indexes.sql produces valid SQL

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6530ec964f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-07T22:26:55Z

+            .chars()
+            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
+    if needs_quoting {
+        format!("\"{}\"", name.replace('"', "\"\""))


Use MySQL-safe identifier quoting

quote_ident always emits double-quoted identifiers, but this helper is also used when generating MySQL statements (for example in DROP INDEX ... ON ... and MODIFY COLUMN). In default MySQL/MariaDB SQL mode, "identifier" is not an identifier quote (unless ANSI_QUOTES is enabled), so migrations for names that trigger quoting (e.g., CamelCase or spaces) now produce invalid SQL where previously unquoted forms worked. Please make quoting dialect-aware (backticks for MySQL) or avoid forcing ANSI-style quotes for that dialect.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a61eb474e1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-07T22:55:01Z

+    let upper = indexdef.to_uppercase();
+    let on_pos = match upper.find(" ON ") {
+        Some(p) => p,
+        None => return Vec::new(),
+    };
+
+    let after_on = &indexdef[on_pos..];


Avoid slicing with indices from uppercased SQL text

parse_index_columns computes on_pos from indexdef.to_uppercase() and then reuses that byte offset to slice the original indexdef. For quoted identifiers containing non-ASCII characters whose uppercase mapping changes byte length (for example ı, ſ, or similar), this offset can point to the wrong byte boundary and make &indexdef[on_pos..] panic or parse the wrong substring, causing schema loading to fail on otherwise valid PostgreSQL index definitions.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cac54da7d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-07T23:20:44Z

+        match dialect {
+            SqlDialect::MySql => format!("`{}`", name.replace('`', "``")),
+            _ => format!("\"{}\"", name.replace('"', "\"\"")),


Handle SqlFile dialect when quoting identifiers

In quote_ident, the fallback branch (_) applies PostgreSQL/SQLite-style double-quote escaping to every non-lowercase identifier, but generate_migration is also called with SqlDialect::SqlFile when both inputs are .sql files. That means MySQL-oriented SQL-file comparisons now produce statements like "UserAccounts" and "idx_UserAccounts_email", which are invalid in default MySQL/MariaDB mode unless ANSI_QUOTES is enabled. This is a regression from the previous unquoted output path for SQL-file workflows.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-07T23:20:44Z

+    let after_on = &indexdef[on_pos..];
+    let open = match after_on.find('(') {
+        Some(p) => on_pos + p,


Locate index column list after table token, not first '('

parse_index_columns takes the first ( after ON as the start of the index column list. If a quoted table name itself contains ( (legal identifier text in PostgreSQL), this points into the table name, so the depth scan never balances and the function returns an empty column list. That causes indexes to load with no columns and can later generate invalid SQL like CREATE INDEX ... ON ...(); or spurious index diffs.

Useful? React with 👍 / 👎.

…ismatch on non-ASCII identifiers Replace to_uppercase().find() with as_bytes().windows().eq_ignore_ascii_case() in parse_index_columns and parse_default. The old approach computed byte offsets from the uppercased copy and reused them to slice the original string, which panics or parses incorrectly when non-ASCII characters change byte length under uppercasing (e.g. ı → I). Also adds MySQL backtick quoting integration test.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa5e0ea87c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-07T23:42:00Z

+    // Index columns are raw clauses from pg_indexes.indexdef and may contain
+    // expressions (lower(email)), sort orders (created_at DESC), or
+    // already-quoted identifiers.  They must NOT be wrapped with quote_ident.
+    let cols = idx.columns.join(", ");


Quote index column identifiers for MySQL output

create_index_sql now emits idx.columns verbatim for every dialect, but MySQL index columns loaded from information_schema.statistics are plain identifier strings (not raw SQL clauses). When a MySQL column name needs escaping (for example reserved words like select or names containing spaces), generated SQL becomes CREATE INDEX ... (select) and fails to execute; before this change each column was passed through quote_ident. Restrict raw passthrough to expression-style PostgreSQL clauses, or keep identifier quoting for MySQL/SQLite paths.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-07T23:42:00Z

+        match bytes[pos] {
+            b'(' => depth += 1,
+            b')' => depth -= 1,
+            _ => {}


Ignore quoted text when matching index-list parentheses

The parenthesis-depth scan treats every ( and ) byte as structural, including those inside quoted identifiers or string literals, so valid PostgreSQL definitions like ... USING btree ("a)", b) or expressions containing literal parentheses can terminate early and return truncated columns. This yields incorrect index metadata and malformed follow-up migration SQL. The parser needs quote-aware scanning while finding the closing ) for the column list.

Useful? React with 👍 / 👎.

…ote-aware parser - Restore conditional quoting in quote_ident: only quote identifiers that need it (uppercase, spaces, special chars), fixing SqlFile dialect regression - Quote index columns for MySQL/SQLite (plain identifiers from information_schema) while keeping PostgreSQL columns raw (SQL expressions from pg_get_indexdef) - Make parse_index_columns quote-aware: skip double-quoted content when finding opening paren and matching close paren, so table names or column names containing parentheses don't break the parser

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be72c44c5b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-07T23:55:03Z

+    let needs_quoting = name.is_empty()
+        || !name
+            .chars()
+            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');


Quote reserved and digit-leading identifiers

The new needs_quoting heuristic treats any [a-z0-9_] name as safe, so identifiers like user (reserved keyword) or 123table (invalid unquoted start in PostgreSQL) are emitted unquoted and produce invalid migration SQL. This is a regression from the previous behavior, which always quoted identifiers and therefore handled these valid-but-quoted catalog names correctly when generating CREATE/ALTER/DROP statements.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-07T23:55:03Z

+    let on_pos = match indexdef
+        .as_bytes()
+        .windows(4)
+        .position(|w| w.eq_ignore_ascii_case(b" ON "))
+    {


Ignore quoted index names when finding ON clause

parse_index_columns takes the first ASCII " ON " anywhere in indexdef without quote awareness, so a valid quoted index name containing that token (for example CREATE INDEX "idx ON weird" ON ...) makes on_pos point inside the name. The subsequent scan then fails to find the real column-list opener and returns an empty column list, which can cascade into incorrect index diffs and malformed generated SQL.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed Apr 7, 2026

View reviewed changes

rekurt changed the title ~~Fix/quote ident expression indexes~~ fix(migration): quote identifiers without breaking expression indexes Apr 7, 2026

chatgpt-codex-connector Bot reviewed Apr 7, 2026

View reviewed changes

rekurt force-pushed the fix/quote-ident-expression-indexes branch from 0cac54d to aa5e0ea Compare April 7, 2026 23:38

chatgpt-codex-connector Bot reviewed Apr 7, 2026

View reviewed changes

rekurt merged commit d8fd92b into master Apr 8, 2026

rekurt deleted the fix/quote-ident-expression-indexes branch April 8, 2026 00:10

Conversation

rekurt commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Subsumes

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rekurt commented Apr 7, 2026 •

edited

Loading