Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caller-$/-setting is too invasive #1235

Open
zoffixznet opened this issue Nov 5, 2017 · 10 comments

Comments

Projects
None yet
7 participants
@zoffixznet
Copy link
Contributor

commented Nov 5, 2017

This started with RT#126721 that points out .subst's replacement block has wrong $/ if caller's $/ is read-only. The bug was originally fixed by having .subst throw in such circumstances, just like .match and .trans do (now reverted).

This fix caused breakage in about 20 modules and when I went to fix it I ended up staring at diffs like this, with a dozen changes, all just to rename a variable, all just because a .subst call was used.

This is what the users would regularly have to do in their grammar actions, since $/ is the common name for the parameter. For that reason, I propose we go into the opposite direction and instead don't set caller's $/ in method forms. Off the top of my head, that'd affect .parse, .subparse, .match, .trans (currently throw on read-only $/) and .subst and .subst-mutate (currently silently fail). .comb with regex might be also in the list, except it already appears it does not touch $/.

The operator forms (/.../, s/...//, S/...//, and tr///) on the other hand would set $/. Unlike operators, methods are often chained, and at least to me feel like they shouldn't be messing with variables in my current scope.

So "foo" ~~ /(f)(.)(.)/ would set $/ and the user can obtain captures from it. "foo".match: /(f)(.)(.)/ would not and it'd be assumed the user would be chaining it with something and using the return value.

One obvious implementation would be to add something like :$RAKUDO-INTERNAL-SET-DOLLAR-SLASH parameter to the methods that operators would set, since they're implemented in terms of those methods. The arg could also be made public API (e.g. :$set-match), so users could set it in method forms to affect the $/ . Finally, the back-compat way to do this could be to add a reverse arg: one that makes method forms NOT touch $/ but IMO that feels like a bandaid over the issue and users would still get confusing errors when using these methods in grammar actions, except instead of telling them to rename the parameter we'd be telling them to set an arg.

This change would break 6.c tests, so it's 6.d material.

@zoffixznet zoffixznet added RFC 6.d labels Nov 5, 2017

@jnthn

This comment has been minimized.

Copy link
Member

commented Nov 5, 2017

This is what the users would regularly have to do in their grammar actions, since $/ is the common name for the parameter

I think there's a fairly good argument that using subst inside of an action method would imply re-parsing, which one should think twice about anyway. I agree silent failure is a very bad behavior, though.

On the more general issue, the thing to keep in mind is that the replacement part tends to want to refer to $/. In the case of the s/// syntax, we may well already be compiling the replacement part as taking $/ as a parameter, so it'd work out fine. The thing we'd extensively bust if we changed subst to not set $/ is anything like $foo.subst(/(\d+)/, { $0 * 2 }, :g). I suspect that's extremely common, and would probably break a lot more than adding the error on silent failure to set.

@zoffixznet

This comment has been minimized.

Copy link
Contributor Author

commented Nov 6, 2017

I think a third possibility that's likely 6.c-friendly is possible: with both ops and methods, try to set caller's $/ and if we fail, then assume the user decided they don't want it set. If it's possible for $/ to be used within the machinery of the method, then the method will put the match into their own $/ that's visible within the machinery but does not affect caller's $/:

  • .subst: /(\d)/, { $0 * 2 } sets caller's $/
  • $/ := "meow"; .subst: /(\d)/, { $0 * 2 } can't set caller, so it doesn't bother to, but $/ is still available within the block

Basically, instead of bugging the user to rename their variables or silently failing, we'd be silently using our own $/ that doesn't get propagated to the caller.

@zoffixznet zoffixznet removed the 6.d label Nov 6, 2017

@vendethiel

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2017

That option sounds like the most user-friendly to me.

@TisonKun

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2017

grammar {
  token TOP { <letters> { if $<letters> ~~ /^a/ { make 42; } } }
  token letters { \w+ }
}.parse("abcd").made.say; # OUTPUT: Nil

please take this case into consideration.

make/made might be the most commonly actions, and, user who write grammar quite probably tends to using regexes. this case shows that as things go complex, using regexes may break what we always do($<token-name>) in grammar-actions.

@smls

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2018

@zoffixznet

$/ := "meow"; .subst: /(\d)/, { $0 * 2 } can't set caller, so it doesn't bother to, but $/ is still available within the block

Would the $0 inside the block refer to the caller's $/ (i.e. "meow")?
If so, I think that's worse than throwing an exception - bugs caused by some code suddenly starting to use a different value than the user expected due to a change elsewhere, are among the most annoying to debug.

Or do you mean that the .subst method would somehow set $/ inside the callback block passed to it, before invoking that block?
If that's possible (and doesn't turn out to have negative consequences), I think the best solution would be your original proposal (method forms not setting $/ in their caller's scope), with the addition that the ones that accept a callback do set $/ inside that callback's scope.

@zoffixznet

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2018

Would the $0 inside the block refer to the caller's $/ (i.e. "meow")?

No, the $0 inside the block would contain (\d) match from subst. What you describe is the current, buggy behaviour.

@smls

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2018

@zoffixznet

No, the $0 inside the block would contain (\d) match from subst.

How? Which is to say, what mechanism will pass $/ from .subst to the callback block?

Will .subst "cheat" to accomplish this, or could a user-defined (pure Perl 6) .subst-alternative do the same?

  • Maybe an expansion of the Capture type so it can hold a value for $/ (and $_ and and $!?) in addition to positional and named arguments, which any Block will then set in its scope when it is invoked with the Capture?

  • Or an alternative to .CALL-ME that says "Prepare a call to this block, i.e. resolve any multi-dispatch and bind the parameters and create a callframe, but don't start running its code just yet – instead, give me the callframe so I can mess with it first"?

@zoffixznet

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2018

How?

I don't remember anymore or whether I tried implementing anything at the time.

It's possible that I thought that since we have to do all the fetching of $/ from caller in the method that we'd be able to fully control the $/ inside the block.

Is $/ lexical or dynamic?

@zoffixznet zoffixznet added the 6.e label Jun 28, 2018

@zoffixznet

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2018

Is $/ lexical or dynamic?

This says dynamic:

<Zoffix_> m: say $/.VAR.dynamic
<camelia> rakudo-moar 587cd4f9e: OUTPUT: «True␤»

So I'd expect this:

<Zoffix_> m: use MONKEY; augment class Str { method s (\re, \b) { $/ := "foos"; dd b.() } }; $/ := "meows"; "foo".s: /o(o)/, -> { $/ ~ "g" };
<camelia> rakudo-moar 587cd4f9e: OUTPUT: «"meowsg"␤»

To work like this:

<Zoffix_> m: use MONKEY; augment class Str { method s (\re, \b) { my $*z := "foos"; dd b.() } }; my $*z := "meows"; "foo".s: /o(o)/, -> { $*z ~ "g" };
<camelia> rakudo-moar 587cd4f9e: OUTPUT: «"foosg"␤»

Is the optimizer converting the $/ to lexical overeagerly in this context, perhaps?

@FCO

This comment has been minimized.

Copy link
Member

commented Jun 28, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.