New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scope flattening and regexes #2608

Closed
AlexDaniel opened this Issue Jan 14, 2019 · 8 comments

Comments

Projects
None yet
3 participants
@AlexDaniel
Copy link
Member

AlexDaniel commented Jan 14, 2019

cfa++ reported that URI::Encode is failing. Blin bisected to the same commit 541a4f1.

However, it's a bit interesting. After attempting to golf we found that this code changed its behavior:

my $rr = rx/<[bc]>/; say "abcd".comb.map({ so $rr })

But that golf bisected to (2019-01-03) eb3917c, which is a different commit.

¦2018.12,eb3917c260bc^: «(False True True False)␤» ¦eb3917c,541a4f1628^,541a4f1,HEAD(d6b1bd2): «(False False False False)␤»

Furthermore, that same commit doesn't affect the module! Proof.

Both commits are part of the same work.

Originally posted by @AlexDaniel in #2601 (comment)

@AlexDaniel

This comment has been minimized.

Copy link
Member Author

AlexDaniel commented Jan 14, 2019

Ah! Alright! Here's a code snippet that's golfed a bit less (cfa++):

my $rx = rx/<[a..z]>/; 'abc'.comb.map({ if $rx { 'yes' } else { 'no' } }).say;

And this golf, unlike the one in the comment above, is not affected by eb3917c:

<AlexDaniel`> c: 2018.12,eb3917c260bc^,eb3917c260bc,541a4f1628^,541a4f1628,HEAD my $rx = rx/<[a..z]>/; 'abc'.comb.map({ if $rx { 'yes' } else { 'no' } }).say;
<committable6> AlexDaniel, ¦2018.12,eb3917c260bc^,eb3917c,541a4f1628^: «(yes yes yes)␤» ¦541a4f1,HEAD(d6b1bd2): «(no no no)␤»
@AlexDaniel

This comment has been minimized.

Copy link
Member Author

AlexDaniel commented Jan 14, 2019

Comment from @jnthn:

The regex case, we probably do not want to spec; doing so is in contradiction with the design decision made in 6.d that $_ is no longer dynamically scoped. The best we can probably offer is a migration period.

@cfa

This comment has been minimized.

Copy link

cfa commented Jan 14, 2019

Documenting a trap might be advisable? To be clear, here's what's going on:

(i) 2018.12, dynamic $_:

$_ = 'z'; my $rx = rx/<[a..z]>/; 'abc'.comb.map({ if $rx { $/ } else { 'no' } }).say;
# OUTPUT: «(「a」 「b」 「c」)␤»

(ii) HEAD, lexical $_, $rx captures outer 'z':

> $_ = 'z'; my $rx = rx/<[a..z]>/; 'abc'.comb.map({ if $rx { $/ } else { 'no' } }).say;
# OUTPUT: «(「z」 「z」 「z」)␤»

or for comparison with the (no no no) case:

> $_ = Any; my $rx = rx/<[a..z]>/; 'abc'.comb.map({ if $rx { $/ } else { 'no' } }).say;
(no no no)

(iii) Quickest fix to URI::Encode, create a Regex object inside the map sub:

$_ = 'z'; my $rx = rx/<[a..z]>/; 'abc'.comb.map({ if /$rx/ { $/ } else { 'no' } }).say;
# OUTPUT: «(「a」 「b」 「c」)␤»

RfC @jnthn @AlexDaniel.

@jnthn

This comment has been minimized.

Copy link
Member

jnthn commented Jan 17, 2019

I think we need to decide on:

  1. Exactly what semantics we want to spec
  2. What mitigations for current code we'll perhaps decide to live with until 6.e, or if the answer is that code should change - at a minimum to specify use v6.c.

Having $_ be dynamic frustrates program analysis and so limits optimization opportunities. It shows up all over the place, in all kinds of idiomatic code. In some cases, the writes to $_ entails work (for example, boxing in for ^1000 { }) even though the $_ isn't used. A similar situation happens when with or without are used solely for the test, not the topicalization. If it's dynamic, our hands are pretty tied when it comes to trying to elide that work.

Therefore, in 6.d, it was decided that $_ was no longer dynamic. Current Rakudo is implementing optimizations based on that.

The one case that has shown up as awkward is the Regex in sink/boolean context behavior, which is specified as matching against $_. This case is easy enough to analyze when the regex literal appears in those cases. In all other cases, however, supporting it implies $_ has to be dynamic.

At present, the optimizer looks for scopes where a regex literal appears, and refuses to optimize out $_ in such cases, just in case it is used in sink or boolean context. This handles all the literal cases in sink and boolean context just fine, which is - I believe - what the boolean context and sink context semantics were aimed at.

The other cases get...interesting, and honestly I'm not sure it was an intended feature that you could stick a regex literal in a variable and then use it as a boolean "at a distance". With current Rakudo, you get the first $_ on the stack that was not optimized out. Which could be anything. This is certainly not desirable semantics, so we should pick some.

One option would be for the closure clone of a regex literal to capture the reference to the $_ as part of the clone process. This is quite easy to arrange. Then instead of it doing a dynamic lookup, it just matches against an object attribute. That could actually make the common case a bit faster. It would also be predictable: even if you use the regex in boolean context at a distance, you always know it will refer to the $_ it was captured with. That maybe isn't what you want, but it is at least not going to be seemingly random behavior.

In the meantime, we could perhaps make the assumption that uses of a regex in boolean context will appear lexically beneath the point of the literal, and maybe poison $_ lowering in inner scopes, which isn't the way we normally do these things, but would let us restore the behavior that the module in question relies on. Alas, that'll - until we get rid of it - mean that the mere presence of a regex will make various optimizations not happen in various bits of surrounding code. Maybe for now the least evil choice, and in 6.e we can nail it down as described about.

Thoughts?

jnthn added a commit that referenced this issue Jan 17, 2019

Make `.grep: { /regex/ }` work again
This worked for the slightly dubious reason that `grep` happened to set
`$_`, and boolify the regex that was returned. This was relying on an
implementation detail - there's nothing that says `grep` has to have the
test element in its `$_` - however there's code in the wild that relies
on this construct, and it's hard to argue that it shouldn't DWIM.

It stopped working when the `$_` in `grep` ceased to be dynamically
available, as it no longer needs to be. For now, solve it by making
`grep` explictly handle the case where a Regex is returned to it. This
fixes #2614. If in the future we accept the proposal that a regex will,
when cloned, capture the `$_` in the scope it literally appears in for
the purpose of use in boolean or sink context, as proposed in #2608,
then this change can be removed again. It's also a good sign for that
proposal that it happens to solve this problem neatly.
@AlexDaniel

This comment has been minimized.

Copy link
Member Author

AlexDaniel commented Jan 21, 2019

@jnthn any new thoughts since the day you post your comment? Personally I'm not too concerned about this issue given that you can still get the previous behavior with use v6.c.

@jnthn

This comment has been minimized.

Copy link
Member

jnthn commented Jan 25, 2019

@AlexDaniel I'm still quite keen on the "closure clone of a regex captures $_ and $/" approach, in that it's predictable and gives us an way for .grep({ /foo/ }) to work that is not a total hack. Heck, it lets us having so /foo/ work in a good way too.

I've done an implementation of it to see what happens. The main thing it doesn't do is retain the 6.c behavior; I can make that happen, but it's only really worth it if I know we decide to go this way. I think we should, but I'd like to here a few other opinions. See the notes in the linked commit for details.

The one spectest in question looks like this;

$var = rx/<&abc>/;
ok("aaabccc" ~~ m/aa <{ $var ?? $var !! rx{<.null>} }> cc/, 'Rule block first');

And would have to become something like this:

$var = rx/<&abc>/;
ok("aaabccc" ~~ m/aa <{ $_ ~~ $var ?? $var !! rx{<.null>} }> cc/, 'Rule block first');

Which is still a bit...uh...interesting in terms of how it works (basically, just due to the $_ being bound temporarily by the smartmatch, and restored after it). Since such code is always asking "does this other thing match the whole string somewhere, and if so match it here", it's kind of a weird thing to be doing.

@jnthn

This comment has been minimized.

Copy link
Member

jnthn commented Jan 28, 2019

With a lack of objections here or, to my knowledge, elsewhere, and some amount of agreement on the change, I've now added the previously mentioned commit to master, along with another to preserve 6.c behavior. I've updated the mentioned spectest, and also picked that into 6.d-errata.

@AlexDaniel

This comment has been minimized.

Copy link
Member Author

AlexDaniel commented Jan 31, 2019

Thank you, @jnthn. This can be closed.

@AlexDaniel AlexDaniel closed this Jan 31, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment