Skip to content

fix(migration): quote identifiers without breaking expression indexes#11

Merged
rekurt merged 2 commits into
masterfrom
fix/quote-ident-expression-indexes
Apr 8, 2026
Merged

fix(migration): quote identifiers without breaking expression indexes#11
rekurt merged 2 commits into
masterfrom
fix/quote-ident-expression-indexes

Conversation

@rekurt
Copy link
Copy Markdown
Owner

@rekurt rekurt commented Apr 7, 2026

Summary

  • Add quote_ident(name, dialect) that wraps identifiers in double quotes (PostgreSQL/SQLite) or backticks (MySQL) only when needed — simple lowercase names pass through unquoted for readability
  • Apply quoting to table names, column names, and index names in all generated migration SQL
  • Do not quote index column clauses — they are raw SQL fragments from pg_indexes.indexdef (e.g. lower(email), created_at DESC, "MixedCase") and must be emitted verbatim
  • Fix parse_index_columns bug: rfind('(') matched inner parens in expressions like lower(email), now walks from ON table(...) with proper nesting
  • Merge dialect-aware quoting and SQL injection escape test from fix: quote SQL identifiers in migration generation #9
  • Update README, CONTRIBUTING, architecture docs, CHANGELOG, SECURITY for v0.2.0
  • Add 46 new tests (total: 108) covering expression indexes, sorted indexes, type/nullability/default changes, multi-column indexes, SQL injection escaping, MySQL backticks

Subsumes

Closes #9 — our implementation covers the same quoting surface but avoids the critical bug of quoting index column clauses, which breaks expression and sorted indexes.

Test plan

  • cargo test — 108 tests pass (83 unit + 22 CLI + 3 config)
  • cargo clippy -- -D warnings — clean
  • cargo fmt --check — clean
  • Manual: cargo run -- tests/fixtures/schema_a.sql --schema tests/fixtures/schema_expr_indexes.sql produces valid SQL

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6530ec964f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/migration.rs Outdated
.chars()
.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
if needs_quoting {
format!("\"{}\"", name.replace('"', "\"\""))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use MySQL-safe identifier quoting

quote_ident always emits double-quoted identifiers, but this helper is also used when generating MySQL statements (for example in DROP INDEX ... ON ... and MODIFY COLUMN). In default MySQL/MariaDB SQL mode, "identifier" is not an identifier quote (unless ANSI_QUOTES is enabled), so migrations for names that trigger quoting (e.g., CamelCase or spaces) now produce invalid SQL where previously unquoted forms worked. Please make quoting dialect-aware (backticks for MySQL) or avoid forcing ANSI-style quotes for that dialect.

Useful? React with 👍 / 👎.

@rekurt rekurt changed the title Fix/quote ident expression indexes fix(migration): quote identifiers without breaking expression indexes Apr 7, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a61eb474e1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/loader/postgres.rs Outdated
Comment on lines +167 to +173
let upper = indexdef.to_uppercase();
let on_pos = match upper.find(" ON ") {
Some(p) => p,
None => return Vec::new(),
};

let after_on = &indexdef[on_pos..];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid slicing with indices from uppercased SQL text

parse_index_columns computes on_pos from indexdef.to_uppercase() and then reuses that byte offset to slice the original indexdef. For quoted identifiers containing non-ASCII characters whose uppercase mapping changes byte length (for example ı, ſ, or similar), this offset can point to the wrong byte boundary and make &indexdef[on_pos..] panic or parse the wrong substring, causing schema loading to fail on otherwise valid PostgreSQL index definitions.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cac54da7d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/migration.rs Outdated
Comment on lines +16 to +18
match dialect {
SqlDialect::MySql => format!("`{}`", name.replace('`', "``")),
_ => format!("\"{}\"", name.replace('"', "\"\"")),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle SqlFile dialect when quoting identifiers

In quote_ident, the fallback branch (_) applies PostgreSQL/SQLite-style double-quote escaping to every non-lowercase identifier, but generate_migration is also called with SqlDialect::SqlFile when both inputs are .sql files. That means MySQL-oriented SQL-file comparisons now produce statements like "UserAccounts" and "idx_UserAccounts_email", which are invalid in default MySQL/MariaDB mode unless ANSI_QUOTES is enabled. This is a regression from the previous unquoted output path for SQL-file workflows.

Useful? React with 👍 / 👎.

Comment thread src/loader/postgres.rs Outdated
Comment on lines +180 to +182
let after_on = &indexdef[on_pos..];
let open = match after_on.find('(') {
Some(p) => on_pos + p,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Locate index column list after table token, not first '('

parse_index_columns takes the first ( after ON as the start of the index column list. If a quoted table name itself contains ( (legal identifier text in PostgreSQL), this points into the table name, so the depth scan never balances and the function returns an empty column list. That causes indexes to load with no columns and can later generate invalid SQL like CREATE INDEX ... ON ...(); or spurious index diffs.

Useful? React with 👍 / 👎.

…ismatch on non-ASCII identifiers

Replace to_uppercase().find() with as_bytes().windows().eq_ignore_ascii_case()
in parse_index_columns and parse_default. The old approach computed byte offsets
from the uppercased copy and reused them to slice the original string, which
panics or parses incorrectly when non-ASCII characters change byte length under
uppercasing (e.g. ı → I). Also adds MySQL backtick quoting integration test.
@rekurt rekurt force-pushed the fix/quote-ident-expression-indexes branch from 0cac54d to aa5e0ea Compare April 7, 2026 23:38
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa5e0ea87c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/migration.rs Outdated
// Index columns are raw clauses from pg_indexes.indexdef and may contain
// expressions (lower(email)), sort orders (created_at DESC), or
// already-quoted identifiers. They must NOT be wrapped with quote_ident.
let cols = idx.columns.join(", ");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Quote index column identifiers for MySQL output

create_index_sql now emits idx.columns verbatim for every dialect, but MySQL index columns loaded from information_schema.statistics are plain identifier strings (not raw SQL clauses). When a MySQL column name needs escaping (for example reserved words like select or names containing spaces), generated SQL becomes CREATE INDEX ... (select) and fails to execute; before this change each column was passed through quote_ident. Restrict raw passthrough to expression-style PostgreSQL clauses, or keep identifier quoting for MySQL/SQLite paths.

Useful? React with 👍 / 👎.

Comment thread src/loader/postgres.rs Outdated
Comment on lines +185 to +188
match bytes[pos] {
b'(' => depth += 1,
b')' => depth -= 1,
_ => {}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ignore quoted text when matching index-list parentheses

The parenthesis-depth scan treats every ( and ) byte as structural, including those inside quoted identifiers or string literals, so valid PostgreSQL definitions like ... USING btree ("a)", b) or expressions containing literal parentheses can terminate early and return truncated columns. This yields incorrect index metadata and malformed follow-up migration SQL. The parser needs quote-aware scanning while finding the closing ) for the column list.

Useful? React with 👍 / 👎.

…ote-aware parser

- Restore conditional quoting in quote_ident: only quote identifiers that
  need it (uppercase, spaces, special chars), fixing SqlFile dialect regression
- Quote index columns for MySQL/SQLite (plain identifiers from
  information_schema) while keeping PostgreSQL columns raw (SQL expressions
  from pg_get_indexdef)
- Make parse_index_columns quote-aware: skip double-quoted content when
  finding opening paren and matching close paren, so table names or column
  names containing parentheses don't break the parser
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be72c44c5b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/migration.rs
Comment on lines +124 to +127
let needs_quoting = name.is_empty()
|| !name
.chars()
.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Quote reserved and digit-leading identifiers

The new needs_quoting heuristic treats any [a-z0-9_] name as safe, so identifiers like user (reserved keyword) or 123table (invalid unquoted start in PostgreSQL) are emitted unquoted and produce invalid migration SQL. This is a regression from the previous behavior, which always quoted identifiers and therefore handled these valid-but-quoted catalog names correctly when generating CREATE/ALTER/DROP statements.

Useful? React with 👍 / 👎.

Comment thread src/loader/postgres.rs
Comment on lines +161 to +165
let on_pos = match indexdef
.as_bytes()
.windows(4)
.position(|w| w.eq_ignore_ascii_case(b" ON "))
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ignore quoted index names when finding ON clause

parse_index_columns takes the first ASCII " ON " anywhere in indexdef without quote awareness, so a valid quoted index name containing that token (for example CREATE INDEX "idx ON weird" ON ...) makes on_pos point inside the name. The subsequent scan then fails to find the real column-list opener and returns an empty column list, which can cascade into incorrect index diffs and malformed generated SQL.

Useful? React with 👍 / 👎.

@rekurt rekurt merged commit d8fd92b into master Apr 8, 2026
@rekurt rekurt deleted the fix/quote-ident-expression-indexes branch April 8, 2026 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant