Fix split_arguments to handle single-quoted strings and UTF-8 #156
The split_arguments method did not account for single-quoted string
literals, causing commas inside quoted default values (e.g. ','::character
varying) to be treated as delimiters. This produced broken argument
fragments when building CREATE OR REPLACE scripts for routines with such
defaults.
Extend the character-by-character state machine with in_quote tracking.
When inside a single-quoted string, commas are ignored as delimiters.
Escaped quotes ('') inside a string literal are consumed as a pair
without toggling the in-quote state. Dollar-quoted strings are also
tracked defensively to handle any future occurrences in default
expressions.
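The state machine described above can be sketched roughly as follows. This is a minimal illustration, not the code from `app/src/dump/routine.rs`: the function name `split_arguments_sketch`, its signature, and the trimming of each segment are assumptions made for the example, and dollar-quote tracking is left out.

```rust
/// Hypothetical stand-in for `Routine::split_arguments`; the name, signature,
/// and trimming behaviour are assumptions, not taken from the PR.
fn split_arguments_sketch(args: &str) -> Vec<String> {
    let chars: Vec<char> = args.chars().collect();
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut depth = 0usize; // nesting level of parentheses
    let mut in_quote = false; // true while inside '...'
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if in_quote {
            current.push(c);
            if c == '\'' {
                if i + 1 < chars.len() && chars[i + 1] == '\'' {
                    // Escaped quote (''): consume the pair, stay in the literal.
                    current.push('\'');
                    i += 1;
                } else {
                    in_quote = false;
                }
            }
        } else {
            match c {
                '\'' => {
                    in_quote = true;
                    current.push(c);
                }
                '(' => {
                    depth += 1;
                    current.push(c);
                }
                ')' => {
                    depth = depth.saturating_sub(1);
                    current.push(c);
                }
                // Only a top-level comma outside a literal is a delimiter.
                ',' if depth == 0 => {
                    parts.push(current.trim().to_string());
                    current.clear();
                }
                _ => current.push(c),
            }
        }
        i += 1;
    }
    if !current.trim().is_empty() {
        parts.push(current.trim().to_string());
    }
    parts
}
```

With this sketch, an input such as `sep character varying DEFAULT ','::character varying, n integer` is split into two segments rather than three, because the comma inside the literal is seen while `in_quote` is set.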
Add five unit tests covering: single quoted-comma default, multiple
defaults with a quoted comma, escaped single-quotes, parenthesised type
expressions (regression), and a full arguments_with_defaults round-trip.
Add format_csv_line procedure with DEFAULT ',' to both integration
fixture schemas (schema_a.sql, schema_b.sql) to validate no false diff
is emitted for this case. Update data/test/README.md accordingly.
Closes nettrash#154
Switch char-vector index to char_indices() so that s[byte_i..] slices are always valid byte offsets, preventing panics on multi-byte UTF-8 characters before a dollar-quote tag. Also remove the unnecessary .clone() on in_dollar_quote and add a unit test for dollar-quoted string parsing.
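To illustrate why the byte offsets from `char_indices()` matter here, the following small sketch slices a string at the position of a `$`. The helper `tail_from` is hypothetical and exists only for this example; it is not part of the PR.

```rust
/// Hypothetical helper: returns the remainder of `s` starting at the first
/// occurrence of `needle`, using the byte offset reported by `char_indices()`.
fn tail_from(s: &str, needle: char) -> Option<&str> {
    s.char_indices()
        .find(|&(_, ch)| ch == needle)
        // `byte_i` is always a valid char boundary, so this slice cannot panic.
        .map(|(byte_i, _)| &s[byte_i..])
}
```

For the input `"\u{e9}$tag$"` the `$` is the second character (char index 1) but starts at byte offset 2, because `é` occupies two bytes in UTF-8. Slicing with the char index (`&s[1..]`) would land inside the `é` and panic, whereas the offset from `char_indices()` is safe.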
Dollar-quote handling was added defensively but is not needed for the issue nettrash#154 fix. Remove in_dollar_quote state, the dollar-tag detection block, and the related unit test. Keep only single-quote handling and the char_indices approach required for escaped-quote lookahead.
Pull request overview
This pull request fixes routine argument-default parsing so commas inside single-quoted string literals (e.g., DEFAULT ',') are not treated as argument separators, preventing incorrect routine signatures/diffs (Issue #154).
Changes:
- Enhanced `Routine::split_arguments` to respect single-quoted string literals (including doubled-quote escaping) in addition to parenthesized groups.
- Added unit tests covering quoted commas, escaped quotes, and a full `arguments_with_defaults` round-trip for the `DEFAULT ','` case.
- Added an identical regression-test procedure (`format_csv_line`) to both test schemas and documented the expected no-diff behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| app/src/dump/routine.rs | Improves argument/default splitting logic and adds targeted unit/regression tests. |
| data/test/schema_a.sql | Adds format_csv_line procedure with DEFAULT ',' to reproduce/guard Issue #154. |
| data/test/schema_b.sql | Mirrors format_csv_line procedure so schema comparisons should emit no diff. |
| data/test/README.md | Documents the new regression test and expected comparer behavior. |
… split_arguments
Use a zero-allocation peekable iterator instead of pre-collecting into a Vec for the single lookahead needed for escaped-quote detection. Replace clone+clear with std::mem::take when pushing completed segments.
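The two idioms named in this commit can be shown in isolation. The helpers `read_quoted` and `flush` below are hypothetical names chosen for the example, and how the real `split_arguments` arranges this logic is an assumption.

```rust
use std::iter::Peekable;
use std::mem;
use std::str::Chars;

/// Hypothetical helper: copies the rest of a single-quoted literal into `out`.
/// Assumes the opening quote has already been consumed and pushed by the caller.
fn read_quoted(chars: &mut Peekable<Chars<'_>>, out: &mut String) {
    while let Some(c) = chars.next() {
        out.push(c);
        if c == '\'' {
            // One-character lookahead without collecting into a Vec:
            // a second quote means an escaped '' pair, so keep going.
            if let Some(q) = chars.next_if_eq(&'\'') {
                out.push(q);
            } else {
                return; // closing quote reached
            }
        }
    }
}

/// Hypothetical helper: moves the finished segment into `parts`.
/// `mem::take` leaves an empty `String` behind, so no clone is needed.
fn flush(current: &mut String, parts: &mut Vec<String>) {
    parts.push(mem::take(current));
}
```

`Peekable` buffers at most one item, which is enough for the single lookahead, and `mem::take` transfers ownership of the existing buffer instead of copying it and then clearing the original.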
iBarBuba left a comment:
Re: [#discussion_r3047810206] — Good observation. In practice pg_get_expr(proargdefaults, 0) always deparses string literals via quote_literal(), which emits the standard ''-doubling form and never produces E'...' / \' escapes — so this splitter only needs to handle what that function can actually output. E-string handling would be necessary if parsing arbitrary SQL expressions from other sources, but that's out of scope here.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pull request improves the parsing of routine argument defaults so that a comma inside a single-quoted string, as in `DEFAULT ','`, is not treated as an argument separator. This addresses Issue #154 ("pgc incorrectly parses default string values in procedure parameters (comma delimiter case)"). The change is covered by unit tests and by regression fixtures in the test schemas.

Parsing improvements:
- Enhanced the `split_arguments` function in `routine.rs` to handle single-quoted string literals, including escaped quotes, so that commas inside quoted strings are not treated as delimiters. This prevents incorrect splitting of argument lists when default values contain commas in strings.

Testing and regression coverage:
- Added unit tests for `split_arguments` that verify handling of single-quoted commas, escaped quotes, and parenthesized expressions.
- Added the procedure `format_csv_line`, which has a comma-in-string default value, to both test schemas (`schema_a.sql` and `schema_b.sql`) as a regression test for Issue #154.
- Updated the test documentation (`README.md`) to describe the new regression test and the expected behavior when comparing schemas with procedures containing comma-in-string defaults.