-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Buffer.add_substitute
#12423
Fix Buffer.add_substitute
#12423
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Do we expect breakage with this change?
The commit message should be amended to keep a blank line between the first line and the body.
fa17021
to
94bb6ec
Compare
- clarify documentation - remove Not_found exception - fix implementation
94bb6ec
to
0fcc459
Compare
string pattern, a variable name immediately follows a non-escaped | ||
[$] character and is one of the following: | ||
string pattern, a variable reference is a non-escaped [$] immediately | ||
followed by a variable name, which is one of the following: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the $
character is escaped is inconsistent with escaping in other places (eg. OCaml strings, format strings) as it's impossible to escape the \
character.
It's impossible to have this function write \$
without using a trick but it's possible to write "\\\""
and printf "%%s"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the
$
character is escaped is inconsistent with escaping in other places (eg. OCaml strings, format strings) as it's impossible to escape the\
character.
That's true, it's this way because I wanted to make minimal changes to the behavior of the function.
It's impossible to have this function write
\$
without using a trick
That's in fact fixed by this PR. Look at the test file, the function maps:
\$
to $
\\$
to \$
\\\$
to \\$
(and so on). It's a different quoting convention, but it is surjective, unlike the current version of add_substitute
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, what is impossible without tricks is a backslash followed by a variable reference. We could fix this by switching to the usual quoting convention, but I'm afraid this might entail a lot of breakage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a mostly-backwards compatible that makes two changes to the function:
-
a backslash now always stands for itself, except when it precedes a
$
when the pair stands for a lone$
. Previously,\\$foo
would be mapped to\\<value of foo>
, now it maps to\$foo
. -
no longer raises
Not_found
when an unterminated variable reference is found (eg${foo
), instead it simply echoes the characters back. A quick OPAM grep (https://sherlocode.com/?q=Buffer.add_substitute) shows that most (all?) users of this function do not handleNot_found
anyway. One could argue that, if anything,Buffer.add_substitute
should have raisedinvalid_arg
instead ofNot_found
in this case, but simply returning the untouched characters seems even better.
A second approval by a core dev is needed, as this touches the standard library (note that @MisterDA has approved the PR as well, so a "on-behalf-of" approval could be enough).
I'm satisfied about the new behavior of the function, and the test cases cover all the patterns that were important to me when we filed the original bug. |
Friendly ping; we still need a second review of this PR... volunteers appreciated! Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My impression:
- the spec of this function is somewhat broken, it should not be in Buffer in the first place ( I think everyone agrees, starting with Alain 9 years ago: Buffer.add_substitute is not surjective and its documentation is incomplete. #6732 (comment) )
- the fix makes some cases that looked very broken behave nicer -- but the result remains unsatisfying in some respects
- in jest I would say, semi-seriously, that the main reason to merge this is the sunk cost fallacy: we have put enough effort in the change to want to see it merged, despite the fact that in practice no one cares and the cost of changing the behavior of the function under the user's feet is probably going to outweigh the benefits of the improvements
I deduce from this glowing review that we should go ahead and merge it, sooner rather than later.
Thanks for the review @gasche! Just for future reference, my thought process regarding this PR is as follows: I tend to agree that the API is not ideal, but as pointed out in #6732 (comment), we are stuck with it for the time being, so in general we should try to fix its shortcomings if it can be done without too much trouble. Whether to go ahead with with backwards-incompatible fixes as in this PR is a more delicate question, but my (necessarily subjective) feeling is that it is worth it. |
Not_found
exceptionFixes #6732