wrongly parsed double-quoted .lex names, .lex "foo\\o" #1095

Closed
rurban opened this issue Oct 8, 2014 · 6 comments


rurban commented Oct 8, 2014

Needed by perl6: https://rt.perl.org/Public/Bug/Display.html?id=116643
Reportedly this worked in perl6-p before, but the cause could also be an nqp change that put the statements into blocks, and thus the variables into lexicals.

The perl6 problem is with: .lex "&prefix:<\\o/>", $P103
It works with .lex "&prefix:<\o/>", $P103 and with .lex '&prefix:<\o/>', $P103.
Most PIR strings are not properly unescaped: ' and " are both treated just as '.
The " form is unescaped only when the string is put into a register, or in some other special cases (mk_sub_address_fromc).
So the perl6 workaround is to put such lexicals in single quotes until rurban/lexqnames-gh1095 has landed.
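
To make the intended quoting rule concrete, here is a small Python model (not imcc code). Single-quoted literals are taken as written, double-quoted literals have their escapes processed. The name `literal_value` and the tiny escape table are assumptions of this sketch; the real unescaping in parrot knows more sequences.

```python
# Toy model of PIR string literal quoting; not taken from imcc.
SIMPLE_ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}  # assumed subset


def literal_value(literal: str) -> str:
    """Return the string a quoted PIR-style literal is meant to denote."""
    quote, body = literal[0], literal[1:-1]
    if quote == "'":
        return body  # single quotes: no escape processing
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt not in SIMPLE_ESCAPES:
                # Unknown escapes are a separate question, so refuse them here.
                raise ValueError("escape not covered by this sketch: \\" + nxt)
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


single = literal_value(r"'&prefix:<\o/>'")    # the single-quoted workaround
double = literal_value(r'"&prefix:<\\o/>"')   # the form perl6 emits
print(single == double)  # True
```

Under that rule the failing double-quoted form and the single-quoted workaround name the same lexical, which is what a correct .lex would have to honour.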

parrot:
In these roundtrips only the first 2 names survive:

.sub 'main' :main
    .lex 'bar\o', $P0
    $P1 = box 'ok 1'
    store_lex 'bar\o', $P1
    $P2 = find_lex 'bar\o'
    say $P2

    .lex "foo\\o", $P3       # imcc parses that as "foo\\\\o"
    $P1 = box 'ok 2'
    store_lex "foo\\o", $P1
    $P2 = find_lex "foo\\o"
    say $P2

    .lex "foo\o", $P4        # imcc parses that as "foo\\o"
    $P1 = box 'ok 3'
    store_lex "foo\o", $P1   # parrot str_unescape compresses that to "fooo"
    $P2 = find_lex "foo\o"
    say $P2
.end

=>

ok 1
ok 2
Lexical 'fooo' not found
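
One way to picture the third case, going only by the comments in the listing: the .lex declaration keeps the backslash, while store_lex sees the name after str_unescape has dropped it. The Python below is a toy model of that mismatch, not parrot code. `ToyLexPad` and `lenient_unescape` are hypothetical names, and the sketch does not try to explain why the second case still works.

```python
import re


def lenient_unescape(body: str) -> str:
    # Assumed stand-in for the lenient behaviour: the backslash of every
    # escape pair is dropped. Known escapes such as \n are ignored here.
    return re.sub(r"\\(.)", r"\1", body)


class ToyLexPad:
    def __init__(self):
        self.slots = {}

    def declare(self, body: str) -> None:
        # Models .lex "...": the name is registered as written.
        self.slots[body] = None

    def store(self, body: str, value) -> None:
        # Models store_lex "...": the name is unescaped before the lookup.
        name = lenient_unescape(body)
        if name not in self.slots:
            raise KeyError("Lexical '%s' not found" % name)
        self.slots[name] = value


pad = ToyLexPad()
pad.declare("foo\\o")            # .lex "foo\o", $P4
try:
    pad.store("foo\\o", "ok 3")  # store_lex "foo\o", $P1
    message = "stored"
except KeyError as err:
    message = err.args[0]
print(message)  # Lexical 'fooo' not found
```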

For globals it is simpler, but the problem of ignoring illegal escape sequences remains:

    $S1 = "foo\\o"
    $P1 = box 'ok 2'
    set_global "foo\\o", $P1   # ok, parsed as "foo\\o"
    $P2 = get_global "foo\\o"
    say $P2

    $S2 = "foo\o"
    $P1 = box 'ok 3'
    $S3 = "fooo"
    $P2 = box 'ok 4'
    set_global "foo\o", $P1    # wrong, compressed to "fooo" (Ignored Illegal escape sequence \o)
    $P3 = get_global "foo\o"
    say $P3

    $P3 = get_global "fooo"   # accesses $P1, not $P2
    say $P3
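
The name collision in the last two lookups can be mimicked with a dictionary. This is a Python sketch under the assumption that the old unescape simply drops the backslash of an unknown escape; `toy_globals`, `set_global` and `get_global` here are stand-ins for the ops, not parrot's implementation.

```python
import re


def lenient_unescape(body: str) -> str:
    # Assumed old behaviour: silently drop the backslash of every pair.
    return re.sub(r"\\(.)", r"\1", body)


toy_globals = {}


def set_global(body: str, value) -> None:
    toy_globals[lenient_unescape(body)] = value


def get_global(body: str):
    return toy_globals[lenient_unescape(body)]


set_global("foo\\o", "ok 3")   # written as set_global "foo\o", $P1 in PIR
print(get_global("fooo"))      # ok 3
```

Both spellings land on the key fooo, so a global that was meant to be distinct becomes reachable under the wrong name.
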
rurban self-assigned this Oct 8, 2014
rurban pushed a commit that referenced this issue Oct 8, 2014
in perl6 the correctly double-quoted .lex "foo\\o", $P3 name fails to work.
see https://rt.perl.org/Public/Bug/Display.html?id=116643
this might be caused by the switch from globals to lexicals, as this statement
is now enclosed in a block.

in parrot the binary character \0 causes problems in lexnames with roundtrips.
rurban pushed a commit that referenced this issue Oct 8, 2014
For GH #1095

Global names work fine, only lexicals do not.
Note: I do not know how to reliably get the correct target register index for
declare_lex_preg in pure PIR, or how to initialize it.
Setting it crashes the ctx.

rurban commented Oct 8, 2014

imcc: set_lexical() gets those wrongly quoted names from the parser

 .lex 'bar\o', $P0 => "bar\\o" # correct
 .lex "foo\\o", $P0 => "foo\\\\o" # wrong
 .lex "foo\o", $P0 => "foo\\o" # wrong

and this is also wrongly parsed:

store_lex "foo\o", P1 => "fooo" #wrong

rurban pushed a commit that referenced this issue Oct 8, 2014
rurban changed the title from `double-quoted .lex names, like .lex "foo\o", $P0` to `wrongly parsed quoted names, like .lex 'foo\o', .lex "foo\\o", set_global "foo\o"` Oct 8, 2014

rurban commented Oct 9, 2014

There's more nonsense going on:

  • A parsed name (a STRINGC) can be single- or double-quoted, but both forms are treated as single-quoted (unescaped). The exceptions are:
    • HLL: only " + unescaped
    • only mk_sub_address_fromc (sub, outer, sub_label_op_c) is correct, i.e. it checks for " or ' and unescapes only "
    • loadlib only accepts " and unescapes
    • pmc_const accepts both, strips the quotes and treats both as unescaped (mk_pmc_const_named)
  • All other names are stored as constants with the surrounding quotes, i.e. you can specify 'str' or "str", but they are then two different strings.
  • Unknown escape sequences are ignored, e.g. "\o" will be "o", " will be " and more. I changed this with 940ede1 to die on unknown escape sequences for A-Za-z, but to silently skip the \ with other chars, as before, to allow `", \``` and more. See GH #1103 (Throw throw_illegal_escape, do not skip them).
  • Empty strings cannot occur in imcc, as they are either "" or ''
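
The escape policy described for 940ede1 can be sketched roughly as follows, in Python rather than the actual C. The table of known letter escapes is a deliberately small assumption, and `IllegalEscape` and `strict_unescape` are names made up for the sketch.

```python
import string

KNOWN_LETTER_ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}  # assumed subset


class IllegalEscape(ValueError):
    """Raised for an unknown backslash-letter sequence."""


def strict_unescape(body: str) -> str:
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch != "\\" or i + 1 == len(body):
            out.append(ch)  # ordinary char, or a lone trailing backslash
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in KNOWN_LETTER_ESCAPES:
            out.append(KNOWN_LETTER_ESCAPES[nxt])
        elif nxt in string.ascii_letters:
            # Unknown escape with a letter A-Za-z: die instead of skipping.
            raise IllegalEscape("Illegal escape sequence \\" + nxt)
        else:
            # Any other char: skip the backslash, keep the char (as before).
            out.append(nxt)
        i += 2
    return "".join(out)


print(strict_unescape("foo\\\\o") == "foo\\o")  # True
try:
    strict_unescape("foo\\o")
    raised = False
except IllegalEscape:
    raised = True
print(raised)  # True
```

With this split, "foo\o" can no longer collapse silently into "fooo", while an escaped quote or backslash keeps working through the skip rule.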

The encoding and quoting are resolved later, in IMCC_string_from__STRINGC and IMCC_string_from_reg, with proper unescaping, but by that time the constants are already interned in the symbol table.
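
A toy illustration of why interning before unescaping matters, written in Python and only borrowing the name mk_const for a method: if the symbol table is keyed by the literal text including its quotes, 'str' and "str" become two constants even though they denote the same string. `ToySymbolTable` is hypothetical and much simpler than imcc's table.

```python
class ToySymbolTable:
    def __init__(self):
        self.constants = {}

    def mk_const(self, literal: str) -> int:
        # The key is the literal as written, surrounding quotes included.
        return self.constants.setdefault(literal, len(self.constants))


table = ToySymbolTable()
single = table.mk_const("'str'")
double = table.mk_const('"str"')
again = table.mk_const("'str'")

print(single == again)    # True: same spelling, same constant
print(single == double)   # False: same value, different constants
```

Stripping the quotes afterwards shows both entries carry the same text, but the lookup key was fixed at interning time.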

e.g. annotated t/compilers/imcc/syn/clash_15.pir (added some tracings):

#    mk_const ''bar\o''
#    mk_const ''ok 1''
#    mk_const ''bar\o''
#    mk_const '"foo\\o"'
#    mk_const ''ok 2''
#    mk_const '"foo\\o"'
#    mk_const '"foo\\o"'
#    mk_const '"foo\o"'
#    mk_const ''ok 3''
#    mk_const '"fooo"'
#    mk_const ''ok 4''
#    mk_const '"foo\o"'
#    mk_const '"foo\o"'
#    mk_const '"fooo"'
#    mk_const '"()"'
0000 set S0, "bar\\o"
0003 box P0, "ok 1"
0006 set_global S0, P0
0009 get_global P1, "bar\\o"
000c say P1
ok 1
000e set S1, "foo\\o"
0011 box P0, "ok 2"
0014 set_global "foo\\o", P0
0017 get_global P1, "foo\\o"
001a say P1
ok 2
001c set S2, "fooo"
001f box P0, "ok 3"
0022 set S3, "fooo"
0025 box P1, "ok 4"
0028 set_global "fooo", P0
002b get_global P2, "fooo"
002e say P2
ok 3
0030 get_global P2, "fooo"
0033 say P2
ok 3
0035 set_returns PC0

and the same for the .lex testcase clash_14.pir:

#    mk_const 'bar\o'
#    .lex 'bar\o'
#    mk_const ''ok 1''
#    mk_const ''bar\o''
#    mk_const ''bar\o''
#    mk_const 'foo\\o'
#    .lex 'foo\\o'
#    mk_const ''ok 2''
#    mk_const '"foo\\o"'
#    mk_const '"foo\\o"'
#    mk_const 'foo\o'
#    .lex 'foo\o'
#    mk_const ''ok 3''
#    mk_const '"foo\o"'
#    mk_const '"foo\o"'
#    mk_const '"()"'
0000 box P1, "ok 1"
0003 store_lex "bar\\o", P1
#    store_lex_sc_p 'bar\o'
0006 find_lex P2, "bar\\o"
#    lexpad.get 'bar\o'
#    find_lex_p_sc 'bar\o'
0009 say P2              
ok 1
000b box P1, "ok 2"      
000e store_lex "foo\\o", P1
#    store_lex_sc_p 'foo\o'
0011 find_lex P2, "foo\\o" 
#    lexpad.get 'foo\o'
#    find_lex_p_sc 'foo\o'
0014 say P2                
ok 2
0016 box P1, "ok 3"        
0019 store_lex "fooo", P1  
#    store_lex_sc_p 'fooo'
Lexical 'fooo' not found

rurban added this to the 7.0.0 milestone Oct 14, 2014
rurban pushed a commit that referenced this issue Oct 15, 2014
rurban pushed a commit that referenced this issue Oct 15, 2014
rurban pushed a commit that referenced this issue Oct 15, 2014
rurban pushed a commit that referenced this issue Oct 16, 2014
rurban pushed a commit that referenced this issue Oct 16, 2014
rurban pushed a commit that referenced this issue Oct 16, 2014
rurban pushed a commit that referenced this issue Oct 16, 2014

rurban commented Oct 17, 2014

Clarification, as the fix will need some time: the workaround for perl6 and other HLLs is to use single quotes for .lex names. Single quotes are preferred; only some special ops require double quotes: HLL and loadlib.

rurban modified the milestones: 6.10.0, 7.0.0 Oct 20, 2014
rurban pushed a commit that referenced this issue Oct 24, 2014
rurban pushed a commit that referenced this issue Oct 24, 2014
rurban pushed a commit that referenced this issue Oct 24, 2014
rurban pushed a commit that referenced this issue Oct 25, 2014
Handle GH #1095 .lex corner cases with double-quotes.
Do not use it until refactored.

Add a new -d2 flag to print mk_const strings, useful to find out
about quoting and escaping, or not. Was used for LEXER some years ago.
rurban pushed a commit that referenced this issue Oct 25, 2014

rurban commented Oct 25, 2014

Merged a documentation warning into 6.10.0:

Limitation: For now use only single-quotes for lexical variable
names! Double-quoted lexical names in .lex are treated as already
escaped, i.e. only as single-quoted and are not unescaped.

rurban changed the title from `wrongly parsed quoted names, like .lex 'foo\o', .lex "foo\\o", set_global "foo\o"` to `wrongly parsed double-quoted .lex names, .lex "foo\\o"` Oct 25, 2014
rurban pushed a commit that referenced this issue Oct 26, 2014

rurban commented Nov 11, 2014

Took the easy road and, for now, just unescaped double-quoted .lex string constants in smoke-me/lexqnames-gh1095.

rurban closed this as completed in 7bc5452 Nov 11, 2014

rurban commented Nov 11, 2014

smoked fine on nqp and perl6 also.

rurban pushed a commit that referenced this issue Nov 17, 2014