'#' literals in Grammars: syntax error #1324

Open
JJ opened this issue Dec 20, 2017 · 12 comments
Labels
regex Regular expressions, pattern matching, user-defined grammars, tokens and rules

Comments

@JJ
Collaborator

JJ commented Dec 20, 2017

The Problem

There does not seem to be a way to include hashes (#) in grammars except by using \c.

Expected Behavior

Pretty much as it works in regular expressions: escaping it should work, like here:

> '#' ~~ /\#/
「#」

Actual Behavior

grammar Fails {
    regex TOP { \# <stuff> }
    token stuff { \w+ }
}

my $parsed = Fails.parse( "#stuff");

Fails with:

$ perl6 Fails.p6
===SORRY!===
Missing required term after infix
at /home/jmerelo/Code/perl6/dev.to-code/perl6/Fails.p6:3
------>     token stuff { \w+ ⏏}
    expecting any of:
        prefix
        term
Other potential difficulties:
    Space is not significant here; please use quotes or :s (:sigspace) modifier (or, to suppress this warning, omit the space, or otherwise change the spacing)
    at /home/jmerelo/Code/perl6/dev.to-code/perl6/Fails.p6:3
    ------>     token⏏ stuff { \w+ }

Steps to Reproduce

Save the code above as Fails.p6 and run it with perl6 Fails.p6. It fails the same way if TOP is declared as a token.

Environment

  • Operating system: Ubuntu
  • Compiler version (perl6 -v):
This is Rakudo version 2017.11 built on MoarVM version 2017.11
implementing Perl 6.c.
JJ added a commit to JJ/dev.to-code that referenced this issue Dec 20, 2017
@lizmat
Contributor

lizmat commented Dec 20, 2017 via email

@JJ
Collaborator Author

JJ commented Dec 20, 2017 via email

@zoffixznet
Contributor

zoffixznet commented Dec 21, 2017

Reading the core code, it looks like token/rule/regex fail because \# matches the MAIN braid's <ws> token (an unspace, followed by a comment). In OP's code that comment comments out an important part of the code, causing a syntax error.

In fact, if you wiggle it just a little, the \# that appears to work as a literal in a / / regex also becomes an unspace+comment. It looks like if it's preceded by whitespace, it becomes an unspace; otherwise it becomes a # literal:

say "ab" ~~ /
    a \#b
/; # OUTPUT: 「a」

say "ab" ~~ /
    a \#`(
      meows
    )
    b
/; # OUTPUT: 「ab」

say "a#`meowsb" ~~ /
    a\#\`(
      meows
    )
    b
/; # OUTPUT: 「a#`meowsb」
   #           0 => 「meows」

The same pattern appears to apply to tokens too: if there's no whitespace before the \# and it isn't the first thing in there, then it gets treated as a literal:

say "a#b" ~~ my token {
    a\#
    b
} # OUTPUT: 「a#b」

I'm unsure of the exact rules for unspaces, but it seems like this is working as expected, even if surprising.
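
For anyone hitting this in the meantime, here is a minimal sketch of a workaround, assuming it is acceptable to avoid \# altogether. The grammar names Works and WorksHex are made up for illustration; the first quotes the character, the second uses the \x23 hexadecimal escape for U+0023.

```raku
# Workaround sketch: match a literal number sign without a backslash escape.
grammar Works {
    # A quoted literal cannot be taken as an unspace followed by a comment.
    regex TOP   { '#' <stuff> }
    token stuff { \w+ }
}

grammar WorksHex {
    # \x23 is the hexadecimal escape for U+0023 (NUMBER SIGN).
    regex TOP   { \x23 <stuff> }
    token stuff { \w+ }
}

say Works.parse("#stuff");
say WorksHex.parse("#stuff");
```

Both forms sidestep the question of whether whitespace precedes the escape, so they should behave the same in a / / regex and in a grammar.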

@JJ
Collaborator Author

JJ commented Dec 21, 2017 via email

@zoffixznet
Contributor

zoffixznet commented Dec 21, 2017

Some comments:

https://irclog.perlgeek.de/perl6-dev/2017-12-21#i_15606474

06:10 moritz m: / \# /
06:10 evalable6 moritz, rakudo-moar bbf95db: OUTPUT: «»
06:11 moritz I thought we had a warning for that? where did that go?

https://irclog.perlgeek.de/perl6-dev/2017-12-21#i_15609321

19:12 TimToady m: say "#" ~~ m/ \# /
19:12 evalable6 TimToady, rakudo-moar e5c38ad: OUTPUT: «「#」␤»
19:13 TimToady I distinctly remember disallowing that at one point, though perhaps that was in STD
19:14 TimToady that's why nqp's throw_unspace method is parameterized to take the character that is being disallowed, I think
19:14 TimToady but nothing calls it on behalf of \#
19:15 TimToady bisect: say "#" ~~ m/ \# /
19:15 TimToady hmm...
19:17 TimToady well, maybe I'm misremembering, and it was parameterized to handle other things like tab, but we could easily add # to that
19:18 TimToady except, of course, for backward compat, sigh

@JJ
Collaborator Author

JJ commented Dec 21, 2017 via email

@zoffixznet zoffixznet changed the title Hashes in Grammars: syntax error '#' literals in Grammars: syntax error Dec 29, 2017
@lucasbuchala lucasbuchala added the regex Regular expressions, pattern matching, user-defined grammars, tokens and rules label Mar 15, 2019
@JJ
Collaborator Author

JJ commented Apr 19, 2019

Ping

@Altai-man
Member

I'd say it is clear from the conversation that this is a wontfix / not-a-bug ticket, so it should be closed.
Please re-open if you disagree.

@polettix

polettix commented Aug 30, 2021

Hi! Even if it's OK for this to yield a syntax error, I'd like people to elaborate on the fact that it works in some cases:

$ raku
Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2021.07.
Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
Built on MoarVM version 2021.07.

To exit type 'exit' or '^D'
> 'Hello # world' ~~ /\#/
「#」
> 'Hello # world' ~~ /\s+\#/
「 #」
> 'Hello # world' ~~ /\s+\#\s+/
「 # 」
> 'Hello # world' ~~ /\s+ \#/
===SORRY!===
Regex not terminated.
at line 2
------> <BOL>⏏<EOL>
Unable to parse regex; couldn't find final '/'
at line 2
------> <BOL>⏏<EOL>
    expecting any of:
        infix stopper

As I see it, it should be all or nothing.

S02, although possibly obsolete, had this to say:

Within a regex, unspace is disallowed as too ambiguous with customary backslashing conventions in surrounding cultures. Hence you must write an explicit whitespace match some other way, such as with quotes or with a \x20 or \c32 escape. On the other hand, while an unspace can start with \# in normal code, \# within a regex is specifically allowed, and is not taken as unspace, but matches a literal U+0023 (NUMBER SIGN). (Within a character class, you may also escape whitespace with a backslash; the restriction on unspace applies only at the normal pattern-matching level.)

The current test is a bit too concise but seems to proceed in the same spirit:

# \# okay within a regex
ok '#' ~~ /\#/, 'Unspace restriction in regex does not apply to \#';

Would you consider re-opening this bug?

@Altai-man
Member

To be honest it is painful to see people spending efforts on something so tiny from a wider programming perspective that most would not really notice the issue, do the '#' thing and continue to write software, but I am also nobody to tell people how to volunteer their free time, so sure thing, enjoy digging in if you wish. :)

@Altai-man Altai-man reopened this Aug 31, 2021
@polettix

I feel your pain. I hit a problem and looked around, to the point of opening a documentation issue after reading this very thread. It turned out that the real intention was originally in agreement with the docs, so there's understandable resistance to change them.

I'm fine with either a change in the code or a change in the spec/tests/docs, whatever is deemed "better" -- including the wider programming perspective. This requires a design decision though. Do you think that it would be better to open an Issue in the Roast repository, adding more tests to better "spec" this behavior?

In the meantime, other people are likely to trip over this 🤷

@codesections
Collaborator

To be honest it is painful to see people spending efforts on something so tiny from a wider programming perspective that most would not really notice the issue, do the '#' thing and continue to write software,

Sorry to cause you pain 😁 But, imo, "make it → make it right → make it fast" is a good progression to stick to, and we're mostly on the "make it fast" part for Raku – which makes fixing the few remaining correctness bugs worthwhile. Plus, I really like that Raku usually doesn't make me remember inconsistencies like "you can escape non-alphanumeric characters with \ – oh, except for #; # is special".

But in any event, I have a fix now, so I don't think any of us will need to spend much more time figuring this out :)

7 participants