Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

charnames and \p or \N regex SEGV in SWASHNEW >5.22 #273

Closed
toddr opened this issue Oct 21, 2015 · 13 comments
Closed

charnames and \p or \N regex SEGV in SWASHNEW >5.22 #273

toddr opened this issue Oct 21, 2015 · 13 comments

Comments

@toddr
Copy link
Collaborator

toddr commented Oct 21, 2015

I cannot simplify this code much more than this but I think the code I have shows the problem sufficiently:

package _charnames;

sub foo {
    ($name =~ /^(\p{_Perl_Charname_Begin})/) and return;
}

print "ok\n";

If you change the package name to anything else or remove \p from the regex, the program will not fail.

@toddr
Copy link
Collaborator Author

toddr commented Oct 21, 2015

What I think is happening is that perl tries to autoload some modules when it sees \p in a regex. The fact that _charnames is already there causes it confusion and thus the segfault. While this is a very convoluted example, simply using the module charnames qw/:full/ is enough to set this problem off.

Probably we need some sort of special handling during compile to handle this properly.

What's interesting is that \p does not seem to trigger _charnames

$>perl -e'my $regex = qr/\p{foo}/; print join("\n", sort keys %INC)' 
XSLoader.pm
re.pm
strict.pm
unicore/Heavy.pl
utf8.pm
utf8_heavy.pl

@toddr
Copy link
Collaborator Author

toddr commented Oct 21, 2015

Also note, this was triggered by changing the stash walk into a reverse sort rather than a sort in commit e33aec3. This probably means some special global when walked will trigger the load of certain stashes (modules) automajically. It also means we probably bloated our binaries with the above modules because of it.

@toddr
Copy link
Collaborator Author

toddr commented Oct 21, 2015

We can probably just keep an eye on this until we have a practical case where this comes up.

@toddr toddr changed the title charnames and \p matching causes perlcc binaries to segfault. charnames and \p or \N matching causes perlcc binaries to segfault. Oct 29, 2015
@toddr
Copy link
Collaborator Author

toddr commented Oct 29, 2015

This will cause a segfault:
perlcc -r -e'print "\N{KELVIN SIGN}"'

Unfortunately while not much on CPAN uses \p, much more uses \N. Locale::Maketext::Utils included.

@toddr
Copy link
Collaborator Author

toddr commented Oct 29, 2015

#266 was closed as a duplicate of this case

@toddr
Copy link
Collaborator Author

toddr commented Oct 29, 2015

./t/testc.sh 273

@rurban
Copy link
Owner

rurban commented Oct 29, 2015

SEGV in utf8 SWASHNEW:

(/Users/rurban/Perl/B-C/a.out:0)    Looking for method SWASHNEW in package utf8
(/Users/rurban/Perl/B-C/a.out:0)    nextstate
(/usr/local/lib/perl5/5.22.0/utf8_heavy.pl:48)  padrange($class,$type,$list,$minbits,$none)
(/usr/local/lib/perl5/5.22.0/utf8_heavy.pl:48)  aassign
(/usr/local/lib/perl5/5.22.0/utf8_heavy.pl:48)  nextstate
(/usr/local/lib/perl5/5.22.0/utf8_heavy.pl:49)  const(IV(0))
(/usr/local/lib/perl5/5.22.0/utf8_heavy.pl:49)  padsv($user_defined)
(/usr/local/lib/perl5/5.22.0/utf8_heavy.pl:49)  sassign
(/usr/local/lib/perl5/5.22.0/utf8_heavy.pl:49)  nextstate
(/usr/local/lib/perl5/5.22.0/utf8_heavy.pl:50)  gvsv(main::)  # $^D
(/usr/local/lib/perl5/5.22.0/utf8_heavy.pl:50)  and
(/usr/local/lib/perl5/5.22.0/utf8_heavy.pl:50)  const(IV(0))
(/usr/local/lib/perl5/5.22.0/utf8_heavy.pl:50)  gvsv(main::) # $^D

Program received signal SIGSEGV, Segmentation fault.
0x0000000100326135 in Perl_pp_multideref () at pp_hot.c:2106
2106                assert(isGV_with_GP(sv));

MDEREF_HV_gvhv_helem:
sv = UNOP_AUX_item_sv(++items);

(gdb) bt
#0  0x0000000100326135 in Perl_pp_multideref () at pp_hot.c:2106
#1  0x00000001002bb794 in Perl_runops_debug () at dump.c:2234
#2  0x0000000100407a37 in S_docatch (o=0x102012d20) at pp_ctl.c:3237
#3  0x0000000100410fa5 in Perl_pp_require () at pp_ctl.c:4177
4177        op = DOCATCH(PL_eval_start);
(gdb) p unixname
$6 = 0x1009952e0 "unicore/Heavy.pl"

sub SWASHNEW {
        my ($class, $type, $list, $minbits, $none) = @_;
        my $user_defined = 0;
        local $^D = 0 if $^D;

@toddr
Copy link
Collaborator Author

toddr commented Oct 29, 2015

@rurban If this is a method bug, I might refer you to #298 first?

@rurban rurban changed the title charnames and \p or \N matching causes perlcc binaries to segfault. charnames and \p or \N regex SEGV in SWASHNEW >5.18 Oct 30, 2015
@rurban rurban changed the title charnames and \p or \N regex SEGV in SWASHNEW >5.18 charnames and \p or \N regex SEGV in SWASHNEW >5.22 Oct 30, 2015
@rurban
Copy link
Owner

rurban commented Nov 2, 2015

No, it's not a method bug related to #298. It's a missing GV in a multideref.

@atoomic
Copy link
Collaborator

atoomic commented Nov 3, 2015

Looks like this is coming from the PL_hintgv once referenced in one UNOP_AUX list

idea:

diff --git a/lib/B/C/OverLoad/B/UNOP_AUX.pm b/lib/B/C/OverLoad/B/UNOP_AUX.pm
index d438139..36bc26e 100644
--- a/lib/B/C/OverLoad/B/UNOP_AUX.pm
+++ b/lib/B/C/OverLoad/B/UNOP_AUX.pm
@@ -80,9 +80,12 @@ sub save {
                 }
             }
             else {
-                # gv or other late inits
-                $s .= "\t,{.sv=Nullsv} \t/* $itemsym */\n";
-                init2()->add("unopaux_item${ix}[$i].sv = (SV*)$itemsym;");
+                if ( $itemsym =~ qr{^PL_} ) {
+                    $s .= "\t,{.sv=((SV* const) \&$itemsym)} \t/* $itemsym */\n";
+                } else {
+                    $s .= "\t,{.sv=Nullsv} \t/* $itemsym */\n";
+                    init2()->add("unopaux_item${ix}[$i].sv = (SV*)$itemsym;");
+                }
             }
         }
         $i++;

gdb output

Program received signal SIGSEGV, Segmentation fault.
0x002eb816 in Perl_pp_multideref () at pp_hot.c:2106
2106                assert(isGV_with_GP(sv));

        case MDEREF_HV_gvhv_helem:                  /* $pkg{...} */
            sv = UNOP_AUX_item_sv(++items);
            assert(isGV_with_GP(sv));
            sv = (SV*)GvHVn((GV*)sv);
            goto do_HV_helem;

with items coming from

    UNOP_AUX_item *items = cUNOP_AUXx(PL_op)->op_aux;
    #define cUNOP_AUXx(o)   ((UNOP_AUX*)(o))


(gdb) p PL_op
$1 = (OP *) 0x8308bbc
(gdb) ((UNOP_AUX*) PL_op)->op_aux     
$2 = (UNOP_AUX_item *) 0x8290528
(gdb) p ((UNOP_AUX*) PL_op)->op_aux->uv
$3 = 109   
# 109 & 0xf => 1101 => 13 = MDEREF_HV_gvhv_helem
(gdb) p items->sv
$4 = (SV *) 0x0


(gdb) p ++items
$4 = (UNOP_AUX_item *) 0x8290538
(gdb) p items->sv
$5 = (SV *) 0x35

@atoomic
Copy link
Collaborator

atoomic commented Nov 3, 2015

Here is a sample of a UNOP_AUX where we can notice the problem,
the list contains 3 elements:
0 - action flag
1 - B::GV ( PL_hintgv in our case )
2 - B::PV

Looks like we should point to the second element (ix=1), the GV here and not the flag

rurban pushed a commit that referenced this issue Nov 3, 2015
This needs unop_aux deferred GV symbols being defined already.
Fixes GH #273, t/testc.sh 273 2731
@rurban
Copy link
Owner

rurban commented Nov 3, 2015

No, it's only a timing problem. The regexp compiler is called in init, but the symbol is only set at init2.
Testing commit 3b3e5b7
Author: Reini Urban rurban@cpanel.net
Date: Tue Nov 3 09:35:56 2015 +0100

C 1.52_13: Defer regex compilation to init2 for SWASHNEW

This needs unop_aux deferred GV symbols being defined already.
Fixes GH #273, t/testc.sh 273 2731

This was not enough, as dl_init already needs some compiled regex, split in load_file. See 9ebc799 and 82ba15b

So I had to add a new section $init1. This is complicated by the fact the calling utf8::SWASHNEW itself already needs some compiled regexp, which need to be initialized already, similar to XSLoader::load_file. Done in 6e456de

rurban pushed a commit that referenced this issue Nov 3, 2015
This needs unop_aux deferred GV symbols being defined already.
Fixes GH #273, t/testc.sh 273 2731

new deferred init1 section before dl_init
dl_init needs some regexes already setup, split in load_file

protect dlsym init2 code from being split.
move some init2 parts (after dl_init) to init or init1,
which might be needed for the regex compiler.

We have have to do both, immediate PMOP::save for simple
regex without needing SWASHNEW init, and deferred for
regex which do a SWASHNEW.
The problem is that some regex need to be present earlier, e.g.
XSLoader::load_file has a split, SWASHNEW itself uses some regex.
So we need to save them in $init, but need to defer others, esp.
those needed multideref GVs, such as in SWASHNEW to later.

Also pass down the B::OP::save $level argument, which was prev. ignored.
And accept some optional $fullname args, which are rarely used.

Also do now static init of op_pmreplstart
@rurban
Copy link
Owner

rurban commented Nov 3, 2015

Fixed with cbc7f1f

@rurban rurban closed this as completed Nov 3, 2015
rurban pushed a commit to cpanel/perl-compiler that referenced this issue Nov 4, 2015
This needs unop_aux deferred GV symbols being defined already.
Fixes GH rurban#273, t/testc.sh 273 2731

new deferred init1 section before dl_init
dl_init needs some regexes already setup, split in load_file

protect dlsym init2 code from being split.
move some init2 parts (after dl_init) to init or init1,
which might be needed for the regex compiler.

We have have to do both, immediate PMOP::save for simple
regex without needing SWASHNEW init, and deferred for
regex which do a SWASHNEW.
The problem is that some regex need to be present earlier, e.g.
XSLoader::load_file has a split, SWASHNEW itself uses some regex.
So we need to save them in $init, but need to defer others, esp.
those needed multideref GVs, such as in SWASHNEW to later.

Also pass down the B::OP::save $level argument, which was prev. ignored.
And accept some optional $fullname args, which are rarely used.

Also do now static init of op_pmreplstart

(cherry picked from commit cbc7f1f)
Signed-off-by: Nicolas Rochelemagne <rochelemagne@cpanel.net>
rurban pushed a commit that referenced this issue Nov 4, 2015
deferred regex compiler
rurban pushed a commit to cpanel/perl-compiler that referenced this issue Nov 4, 2015
This needs unop_aux deferred GV symbols being defined already.
Fixes GH rurban#273, t/testc.sh 273 2731

new deferred init1 section before dl_init
dl_init needs some regexes already setup, split in load_file

protect dlsym init2 code from being split.
move some init2 parts (after dl_init) to init or init1,
which might be needed for the regex compiler.

We have have to do both, immediate PMOP::save for simple
regex without needing SWASHNEW init, and deferred for
regex which do a SWASHNEW.
The problem is that some regex need to be present earlier, e.g.
XSLoader::load_file has a split, SWASHNEW itself uses some regex.
So we need to save them in $init, but need to defer others, esp.
those needed multideref GVs, such as in SWASHNEW to later.

Also pass down the B::OP::save $level argument, which was prev. ignored.
And accept some optional $fullname args, which are rarely used.

Also do now static init of op_pmreplstart

(cherry picked from commit cbc7f1f)
Signed-off-by: Nicolas Rochelemagne <rochelemagne@cpanel.net>

Author: Reini Urban <rurban@cpanel.net>
Date:   Tue Nov 3 23:30:33 2015 +0100

    C: fix uninitialized value $level warnings

(cherry picked from commit 9970b03)

adjust known_errors from swash branch
atoomic pushed a commit to cpanel/perl-compiler that referenced this issue Aug 9, 2016
This needs unop_aux deferred GV symbols being defined already.
Fixes GH rurban#273, t/testc.sh 273 2731

new deferred init1 section before dl_init
dl_init needs some regexes already setup, split in load_file

protect dlsym init2 code from being split.
move some init2 parts (after dl_init) to init or init1,
which might be needed for the regex compiler.

We have have to do both, immediate PMOP::save for simple
regex without needing SWASHNEW init, and deferred for
regex which do a SWASHNEW.
The problem is that some regex need to be present earlier, e.g.
XSLoader::load_file has a split, SWASHNEW itself uses some regex.
So we need to save them in $init, but need to defer others, esp.
those needed multideref GVs, such as in SWASHNEW to later.

Also pass down the B::OP::save $level argument, which was prev. ignored.
And accept some optional $fullname args, which are rarely used.

Also do now static init of op_pmreplstart

(cherry picked from commit cbc7f1f)
Signed-off-by: Nicolas Rochelemagne <rochelemagne@cpanel.net>

Author: Reini Urban <rurban@cpanel.net>
Date:   Tue Nov 3 23:30:33 2015 +0100

    C: fix uninitialized value $level warnings

(cherry picked from commit 9970b03)

adjust known_errors from swash branch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants