'#' literals in Grammars: syntax error #1324

JJ · 2017-12-20T06:37:49Z

The Problem

There does not seem to be a way to include a literal hash (#) in grammars except by using `\c`.

Expected Behavior

Escaping it should work pretty much as it does in regular expressions, like here:

> '#' ~~ /\#/
｢#｣

Actual Behavior

grammar Fails {
    regex TOP { \# <stuff> }
    token stuff { \w+ }
}

my $parsed = Fails.parse( "#stuff");

Fails with:

$ perl6 Fails.p6
===SORRY!===
Missing required term after infix
at /home/jmerelo/Code/perl6/dev.to-code/perl6/Fails.p6:3
------>     token stuff { \w+ ⏏}
    expecting any of:
        prefix
        term
Other potential difficulties:
    Space is not significant here; please use quotes or :s (:sigspace) modifier (or, to suppress this warning, omit the space, or otherwise change the spacing)
    at /home/jmerelo/Code/perl6/dev.to-code/perl6/Fails.p6:3
    ------>     token⏏ stuff { \w+ }

Steps to Reproduce

Save the code above as Fails.p6 and run it with perl6 Fails.p6. It fails the same way if TOP is declared as a token instead of a regex.

Environment

Operating system: Ubuntu
Compiler version (perl6 -v):

This is Rakudo version 2017.11 built on MoarVM version 2017.11
implementing Perl 6.c.


lizmat · 2017-12-20T11:44:30Z

Not sure whether this is a bug or a feature, but there *is* a workaround: regex TOP { '#' <stuff> }
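For reference, here is a minimal sketch of the reproducer rewritten with that workaround, plus a second variant that uses the \x23 hex escape for U+0023 (NUMBER SIGN). The grammar names Quoted and HexEscaped are hypothetical, and the \x23 variant is an assumption not proposed in the thread. Neither form puts a backslash directly before #, so the unspace behavior discussed below should not come into play.

```raku
# Hypothetical grammar names; both match a literal '#' without writing \#.
grammar Quoted {
    regex TOP   { '#' <stuff> }   # quoted literal (the workaround above)
    token stuff { \w+ }
}

grammar HexEscaped {
    regex TOP   { \x23 <stuff> }  # assumption: hex escape for U+0023 NUMBER SIGN
    token stuff { \w+ }
}

say so Quoted.parse("#stuff");      # True
say so HexEscaped.parse("#stuff");  # True
```

Quoting is probably the more readable choice; the hex escape is mainly useful if you want to avoid quotes inside a pattern.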


JJ · 2017-12-20T11:54:51Z


Thanks!

zoffixznet · 2017-12-21T01:36:55Z

Reading the core code, it looks like token/rule/regex fail because \# matches the MAIN braid's <ws> token (as an unspace followed by a comment). In the OP's code, that comment swallows the rest of the line, including the closing brace of TOP, causing a syntax error.

In fact, with a small change, the \# that appears to work as a literal in a / / regex also becomes an unspace plus a comment. It looks like \# becomes an unspace when preceded by whitespace, and a literal # otherwise:

say "ab" ~~ /
    a \#b
/; # OUTPUT: ｢a｣

say "ab" ~~ /
    a \#`(
      meows
    )
    b
/; # OUTPUT: ｢ab｣

say "a#`meowsb" ~~ /
    a\#\`(
      meows
    )
    b
/; # OUTPUT: ｢a#`meowsb｣
   #           0 => ｢meows｣

The same pattern appears to apply to tokens too: if there's no whitespace before the \# and it isn't the first thing in there, then it gets treated as a literal:

say "a#b" ~~ my token {
    a\#
    b
} # OUTPUT: ｢a#b｣

I'm unsure of the exact rules for unspaces, but it seems like this is working as designed, even if it is surprising.

JJ · 2017-12-21T06:11:38Z

💯 Whatever course you decide on, I think it would be best if both behaved the same way.

zoffixznet · 2017-12-21T19:20:18Z

Some comments:

https://irclog.perlgeek.de/perl6-dev/2017-12-21#i_15606474

06:10 moritz m: / \# /
06:10 evalable6 moritz, rakudo-moar bbf95db: OUTPUT: «»
06:11 moritz I thought we had a warning for that? where did that go?

https://irclog.perlgeek.de/perl6-dev/2017-12-21#i_15609321

19:12 TimToady m: say "#" ~~ m/ \# /
19:12 evalable6 TimToady, rakudo-moar e5c38ad: OUTPUT: «｢#｣␤»
19:13 TimToady I distinctly remember disallowing that at one point, though perhaps that was in STD
19:14 TimToady that's why nqp's throw_unspace method is parameterized to take the character that is being disallowed, I think
19:14 TimToady but nothing calls it on behalf of \#
19:15 TimToady bisect: say "#" ~~ m/ \# /
19:15 TimToady hmm...
19:17 TimToady well, maybe I'm misremembering, and it was parameterized to handle other things like tab, but we could easily add # to that
19:18 TimToady except, of course, for backward compat, sigh

JJ · 2017-12-21T19:21:54Z

Thanks for the update!

JJ · 2019-04-19T06:25:02Z

Ping

Altai-man · 2021-07-06T12:49:04Z

I'd say it is clear from the conversation that this is a wontfix / not-a-bug ticket, so it should be closed.
Please reopen if you disagree.

polettix · 2021-08-30T20:53:50Z

Hi! Even if it's OK for this to yield a syntax error, I'd like someone to explain why it works in some cases:

$ raku
Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2021.07.
Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
Built on MoarVM version 2021.07.

To exit type 'exit' or '^D'
> 'Hello # world' ~~ /\#/
｢#｣
> 'Hello # world' ~~ /\s+\#/
｢ #｣
> 'Hello # world' ~~ /\s+\#\s+/
｢ # ｣
> 'Hello # world' ~~ /\s+ \#/
===SORRY!===
Regex not terminated.
at line 2
------> <BOL>⏏<EOL>
Unable to parse regex; couldn't find final '/'
at line 2
------> <BOL>⏏<EOL>
    expecting any of:
        infix stopper

As I see it, it should be all or nothing.

S02, although possibly obsolete, had this to say:

Within a regex, unspace is disallowed as too ambiguous with customary backslashing conventions in surrounding cultures. Hence you must write an explicit whitespace match some other way, such as with quotes or with a \x20 or \c32 escape. On the other hand, while an unspace can start with \# in normal code, \# within a regex is specifically allowed, and is not taken as unspace, but matches a literal U+0023 (NUMBER SIGN). (Within a character class, you may also escape whitespace with a backslash; the restriction on unspace applies only at the normal pattern-matching level.)

The current test is a bit too concise but seems to proceed in the same spirit:

# \# okay within a regex
ok '#' ~~ /\#/, 'Unspace restriction in regex does not apply to \#';

Would you consider re-opening this bug?

Altai-man · 2021-08-31T11:13:00Z

To be honest it is painful to see people spending efforts on something so tiny from a wider programming perspective that most would not really notice the issue, do the '#' thing and continue to write software, but I am also nobody to tell people how to volunteer their free time, so sure thing, enjoy digging in if you wish. :)

polettix · 2021-08-31T12:11:48Z

I feel your pain. I ran into a problem and looked around, to the point of opening a documentation issue after reading this very thread. It turned out that the original intention agreed with the docs, so there is understandable resistance to changing them.

I'm fine with either a change in the code or a change in the spec/tests/docs, whatever is deemed "better" -- including the wider programming perspective. This requires a design decision though. Do you think that it would be better to open an Issue in the Roast repository, adding more tests to better "spec" this behavior?

In the meantime, other people are likely to trip over this 🤷

codesections · 2021-09-01T00:09:20Z

To be honest it is painful to see people spending efforts on something so tiny from a wider programming perspective that most would not really notice the issue, do the '#' thing and continue to write software,

Sorry to cause you pain 😁 But, imo, "make it work → make it right → make it fast" is a good progression to stick to, and we're mostly on the "make it fast" part for Raku, which makes fixing the few remaining correctness bugs worthwhile. Plus, I really like that Raku usually doesn't make me remember inconsistencies like "you can escape non-alphanumeric characters with \, oh, except for #; # is special".

But in any event, I have a fix now, so I don't think any of us will need to spend much more time figuring this out :)

JJ added a commit to JJ/dev.to-code that referenced this issue on Dec 20, 2017: e6c4db9, "Failing code" (reported as rakudo/rakudo#1324)

zoffixznet changed the title from "Hashes in Grammars: syntax error" to "'#' literals in Grammars: syntax error" on Dec 29, 2017

zoffixznet mentioned this issue on Jul 22, 2018 in "Misleading error in tr/// when including a # character" #2118 (closed)

lucasbuchala added the regex label ("Regular expressions, pattern matching, user-defined grammars, tokens and rules") on Mar 15, 2019

p6rt mentioned this issue on Jan 5, 2020 in "Escaped hash character \# is somtimes parsed incorrectly" Raku/old-issue-tracker#6654 (closed)

Altai-man closed this as completed on Jul 6, 2021

polettix mentioned this issue on Aug 30, 2021 in "Using backslash to escape the hash/pound metacharacter for comments" Raku/doc#3947 (open)

Altai-man reopened this on Aug 31, 2021

codesections mentioned this issue on Sep 1, 2021 in "Fix escaping of # in regexes" #4506 (closed)

codesections mentioned this issue on Sep 1, 2021 in "regex { \# } doesn't parse" #4387 (open)
