Tmarks should not be possible on character-set members: -[^'ab'] and ~[^'a'; -'b'] etc. #53

Closed
cmsmcq opened this issue Mar 15, 2022 · 7 comments
Labels
bug Something isn't working specification

Comments

@cmsmcq
Contributor

cmsmcq commented Mar 15, 2022

The grammar of 2022-02-22 (like all of its predecessors, as far as I know) allows tmark both on literals and on inclusions and exclusions. When a literal occurs as a member of a character set, it is thus grammatical to write things like

-[^'ab']
~['a'; -'b'; ^'c']

It is not clear what such constructs might mean.

The simplest solution appears to be to replace literal with ^string in the right-hand side of the rule for member.

It might be thought that we could solve this problem by moving tmark into the rule for terminal, out of its current locations in the definitions of quoted, encoded, inclusion, and exclusion. (At least, that is what I thought.) This will not work: for tmark to be serialized as an attribute in the XML, it must be a descendant of the nonterminal that produces the element.

Proposal: replace

  -member: literal;
           range;
           class.

with

  -member: ^string;
           range;
           class.
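
To make the effect of the proposed rule on individual set members concrete, here is a minimal Python sketch. It is not an ixml parser: the regular expressions are simplified stand-ins for string, range, class, and literal, and the assumption that a tmark is `^` or `-` is taken from the examples above. The names `MEMBER_ORIGINAL`, `MEMBER_PROPOSED`, and `accepts` are hypothetical.

```python
import re

DQ, SQ = '"', "'"

# Simplified token patterns (assumptions for illustration, not the spec grammar).
HEX = r"#[0-9a-fA-F]+"
STRING = rf"(?:{DQ}(?:[^{DQ}]|{DQ}{DQ})+{DQ}|{SQ}(?:[^{SQ}]|{SQ}{SQ})+{SQ})"
CHARACTER = rf"(?:{DQ}(?:[^{DQ}]|{DQ}{DQ}){DQ}|{SQ}(?:[^{SQ}]|{SQ}{SQ}){SQ}|{HEX})"
RANGE = rf"{CHARACTER}\s*-\s*{CHARACTER}"
CLASS = r"[A-Z][a-z]?"
TMARK = r"[\^-]"  # assumed: a terminal mark is '^' or '-'
LITERAL = rf"(?:{TMARK}\s*)?(?:{STRING}|{HEX})"

# member: literal; range; class.
MEMBER_ORIGINAL = re.compile(rf"(?:{LITERAL}|{RANGE}|{CLASS})")
# member: ^string; range; class.  (the '^' marks the nonterminal, not the input)
MEMBER_PROPOSED = re.compile(rf"(?:{STRING}|{RANGE}|{CLASS})")


def accepts(rule: re.Pattern, member: str) -> bool:
    """Return True if the whole member text matches the given member rule."""
    return rule.fullmatch(member.strip()) is not None


for member in ["'a'", "^'ab'", "-'b'", '"0"-"9"', "Zs"]:
    print(f"{member:10} original={accepts(MEMBER_ORIGINAL, member)} "
          f"proposed={accepts(MEMBER_PROPOSED, member)}")
```

Under this model the marked forms `^'ab'` and `-'b'` are accepted by the original rule and rejected by the proposed one, while unmarked strings, ranges, and classes are unaffected.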
@ndw
Contributor

ndw commented Mar 15, 2022

I believe that change would lose hex from within ranges.

@cmsmcq
Contributor Author

cmsmcq commented Mar 15, 2022

The nonterminal hex is not currently reachable from range; hex-encoded range boundaries are reached via character:

        range: from, s, -"-", s, to, s.
        @from: character.
          @to: character.
   -character: -'"', dchar, -'"';
               -"'", schar, -"'";
               "#", hex.

@ndw
Copy link
Contributor

ndw commented Mar 15, 2022

That's only a hex that's part of a range. With your proposed change to member, this does not parse:

s: ["0"-"9"; #20] .

because a range must have the form "x - y", so I get:

<fail xmlns:ixml="http://invisiblexml.org/NS" ixml:state="failed">
   <line>1</line>
   <column>18</column>
   <pos>17</pos>
   <unexpected>]</unexpected>
   <permitted>'\n', '\r', '\t', '{', ["0"-"9";"a"-"f";"A"-"F"], [Zs]</permitted>
</fail>
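
The distinction here (hex is still usable as a range bound, but no longer as a standalone member) can be checked with a small Python sketch. It models only the proposed member rule using simplified regular expressions, and it assumes that members are separated by `;` or `|` and that no separator occurs inside a quoted string. `PROPOSED_MEMBER` and `first_rejected_member` are hypothetical names, and the sketch does not reproduce the error report of any real ixml processor.

```python
import re
from typing import Optional

DQ, SQ = '"', "'"

# Simplified token patterns (assumptions for illustration, not the spec grammar).
HEX = r"#[0-9a-fA-F]+"
STRING = rf"(?:{DQ}(?:[^{DQ}]|{DQ}{DQ})+{DQ}|{SQ}(?:[^{SQ}]|{SQ}{SQ})+{SQ})"
CHARACTER = rf"(?:{DQ}(?:[^{DQ}]|{DQ}{DQ}){DQ}|{SQ}(?:[^{SQ}]|{SQ}{SQ}){SQ}|{HEX})"

# Proposed rule: member: ^string; range; class.
PROPOSED_MEMBER = re.compile(
    rf"(?:{STRING}|{CHARACTER}\s*-\s*{CHARACTER}|[A-Z][a-z]?)"
)


def first_rejected_member(charset: str) -> Optional[str]:
    """Return the first member the proposed rule rejects, or None if all pass."""
    body = charset.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed character set")
    # Naive split: assumes ';' and '|' never appear inside quoted members.
    for member in re.split(r"[;|]", body[1:-1]):
        member = member.strip()
        if member and not PROPOSED_MEMBER.fullmatch(member):
            return member
    return None


print(first_rejected_member('["0"-"9"; #20]'))      # bare hex member
print(first_rejected_member('["0"-"9"; #41-#5A]'))  # hex only as range bounds
```

In this model the first call reports `#20` as the rejected member, while the second set passes because both hex values are reached through range.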

@cmsmcq
Copy link
Contributor Author

cmsmcq commented Mar 15, 2022

You're right; sorry to be so slow on the uptake. I'll need to think about the right way to fix this.

@spemberton
Member

spemberton commented Mar 15, 2022 via email

@cmsmcq cmsmcq changed the title from "Tmarks should be be possible on character-set members: -[^'ab'] and ~[^'a'; -'b'] etc." to "Tmarks should not be possible on character-set members: -[^'ab'] and ~[^'a'; -'b'] etc." on Mar 21, 2022
@cmsmcq
Contributor Author

cmsmcq commented Mar 21, 2022

On 22 March 2022, SP's proposal to resolve this issue was accepted. Issue to be closed once the change is in the published spec.

@ndw ndw added the bug Something isn't working label Apr 3, 2022
@ndw ndw added this to the Version 1.0 milestone Apr 3, 2022
@ndw
Contributor

ndw commented Apr 5, 2022

Steven reports this is resolved.
