Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regcomp.c:18417: void S_skip_to_be_ignored_text(RExC_state_t *, char **, const _Bool): Assertion `! UTF || UTF8_IS_INVARIANT(**p) || UTF8_IS_START(**p)' failed. #16035

Closed
p5pRT opened this issue Jun 24, 2017 · 13 comments

Comments

@p5pRT
Copy link
Collaborator

@p5pRT p5pRT commented Jun 24, 2017

Migrated from rt.perl.org#131642 (status was 'resolved')

Searchable as RT131642$

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jun 24, 2017

From @dur-randir

Created by @dur-randir

While fuzzing perl v5.27.1-37-g4c95ee9f29 built with afl and run
under libdislocator, I found the following program (text also attached
to this letter)

00000000 30 2c 73 70 6c 69 74 0d 70 61 63 6b 22 55 75 68 |0,split.pack"Uuh|
00000010 33 22 2c 30 2c 30 2c 27 30 6f 27 |3",0,0,'0o'|

to cause an assertion failure, even when run under -c for a syntax
check. This is a regression between 5.22 and 5.24, bisect points to​:

commit 361446f
Author​: Karl Williamson <khw@​cpan.org>
Date​: Wed Sep 23 14​:06​:23 2015 -0600

  Allow (#...) anywhere white space is under qr//x

  Wherever you can have white space under /x, you can also have a (#...)
  comment (even without /x). Prior to this commit, there were several
  places that allowed the white space but not the comments.

  This resolves [perl #116639].

GDB info about the crash location is​:

(gdb) bt
#0 __GI_raise (sig=sig@​entry=6) at ../sysdeps/unix/sysv/linux/raise.c​:51
#1 0x00007fa5483813fa in __GI_abort () at abort.c​:89
#2 0x00007fa548378e37 in __assert_fail_base (fmt=<optimized out>,
  assertion=assertion@​entry=0x55e2c166e0b0 "! UTF ||
UTF8_IS_INVARIANT(**p) || UTF8_IS_START(**p)",
file=file@​entry=0x55e2c1664f98 "regcomp.c",
  line=line@​entry=18417, function=function@​entry=0x55e2c166f9f0
<__PRETTY_FUNCTION__.19589> "S_skip_to_be_ignored_text") at
assert.c​:92
#3 0x00007fa548378ee2 in __GI___assert_fail (assertion=0x55e2c166e0b0
"! UTF || UTF8_IS_INVARIANT(**p) || UTF8_IS_START(**p)",
  file=0x55e2c1664f98 "regcomp.c", line=18417,
function=0x55e2c166f9f0 <__PRETTY_FUNCTION__.19589>
"S_skip_to_be_ignored_text") at assert.c​:101
#4 0x000055e2c13d5c8e in S_skip_to_be_ignored_text
(pRExC_state=0x7ffcc9ecea20, p=0x7ffcc9ecdd80, force_to_xmod=false) at
regcomp.c​:18417
#5 0x000055e2c13c14af in S_regatom (pRExC_state=0x7ffcc9ecea20,
flagp=0x7ffcc9ece050, depth=4) at regcomp.c​:13494
#6 0x000055e2c13b603b in S_regpiece (pRExC_state=0x7ffcc9ecea20,
flagp=0x7ffcc9ece17c, depth=3) at regcomp.c​:11668
#7 0x000055e2c13b5989 in S_regbranch (pRExC_state=0x7ffcc9ecea20,
flagp=0x7ffcc9ece228, first=1, depth=2) at regcomp.c​:11593
#8 0x000055e2c13b326c in S_reg (pRExC_state=0x7ffcc9ecea20, paren=0,
flagp=0x7ffcc9ece664, depth=1) at regcomp.c​:11331
#9 0x000055e2c139b859 in Perl_re_op_compile (patternp=0x0,
pat_count=1, expr=0x55e2c1f8a280, eng=0x55e2c1901540
<PL_core_reg_engine>, old_re=0x0,
  is_bare_re=0x0, orig_rx_flags=2048, pm_flags=2048) at regcomp.c​:7100
#10 0x000055e2c12b899d in Perl_pmruntime (o=0x55e2c1f8a150,
expr=0x55e2c1f8a280, repl=0x0, flags=2, floor=0) at op.c​:5882
#11 0x000055e2c12d67f2 in Perl_ck_split (o=0x55e2c1f7d120) at op.c​:11268
#12 0x000055e2c12b4f39 in Perl_op_convert_list (type=155, flags=0,
o=0x55e2c1f7d120) at op.c​:4889
#13 0x000055e2c136b09e in Perl_yyparse (gramtype=258) at perly.y​:889
#14 0x000055e2c1519108 in S_doeval_compile (gimme=1 '\001',
outside=0x55e2c1f52340, seq=4294967261, hh=0x0) at pp_ctl.c​:3456
#15 0x000055e2c1520b9a in Perl_pp_entereval () at pp_ctl.c​:4415
#16 0x000055e2c13f4a7d in Perl_runops_debug () at dump.c​:2451
#17 0x000055e2c12eab3d in S_run_body (oldscope=1) at perl.c​:2548
#18 0x000055e2c12ea0bb in perl_run (my_perl=0x55e2c1f50010) at perl.c​:2471
#19 0x000055e2c12a2f3e in main (argc=4, argv=0x7ffcc9ecf7f8,
env=0x7ffcc9ecf820) at perlmain.c​:123
(gdb) f 4
#4 0x000055e2c13d5c8e in S_skip_to_be_ignored_text
(pRExC_state=0x7ffcc9ecea20, p=0x7ffcc9ecdd80, force_to_xmod=false) at
regcomp.c​:18417
18417 assert( ! UTF || UTF8_IS_INVARIANT(**p) || UTF8_IS_START(**p));
(gdb) p **p
$1 = -92 '\244'

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.27.1:

Configured by root at Sun May 28 01:44:41 MSK 2017.

Summary of my perl5 (revision 5 version 26 subversion 0) configuration:
  Derived from: 4c95ee9f298c2edfc1382d540ff89288790e78b6
  Platform:
    osname=linux
    osvers=4.9.0-3-amd64
    archname=x86_64-linux
    uname='linux dorothy 4.9.0-3-amd64 #1 smp debian 4.9.25-1
(2017-05-02) x86_64 gnulinux '
    config_args='-des -Dusedevel -DDEBUGGING -Dcc=afl-clang-fast
-Doptimize=-O0 -g -ggdb3 -fno-omit-frame-pointer'
    hint=previous
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='afl-clang-fast'
    ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe
-fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O0 -g -ggdb3 -fno-omit-frame-pointer'
    cppflags='-DDEBUGGING -fno-strict-aliasing -pipe
-fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='4.2.1 Compatible Clang 3.9.1 (tags/RELEASE_391/rc2)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='afl-clang-fast'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/llvm-3.9/bin/../lib/clang/3.9.1/lib
/usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu
/lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
/usr/local/lib /usr/lib/llvm-3.9/bin/../lib/clang/3.9.1/lib
/usr/include/x86_64-linux-gnu /usr/lib
    libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.24.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.24'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O0 -g -ggdb3 -fno-omit-frame-pointer
-L/usr/local/lib -fstack-protector-strong'

Locally applied patches:
    uncommitted-changes


@INC for perl 5.27.1:
    lib
    /usr/local/lib/perl5/site_perl/5.26.0/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.26.0
    /usr/local/lib/perl5/5.26.0/x86_64-linux
    /usr/local/lib/perl5/5.26.0


Environment for perl 5.27.1:
    HOME=/home/afl
    LANG=en_US.UTF-8
    LANGUAGE=en_US:en
    LC_CTYPE=en_US.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/afl/perlbrew/bin:/home/afl/perlbrew/perls/perl-5.24.1-dbg/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
    PERLBREW_BASHRC_VERSION=0.78
    PERLBREW_HOME=/home/afl/.perlbrew
    PERLBREW_MANPATH=/home/afl/perlbrew/perls/perl-5.24.1-dbg/man
    PERLBREW_PATH=/home/afl/perlbrew/bin:/home/afl/perlbrew/perls/perl-5.24.1-dbg/bin
    PERLBREW_PERL=perl-5.24.1-dbg
    PERLBREW_ROOT=/home/afl/perlbrew
    PERLBREW_VERSION=0.78
    PERL_BADLANG (unset)
    SHELL=/usr/bin/zsh

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jun 24, 2017

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Feb 7, 2018

From @khwilliamson

I looked at this and the assertion that's failing is valid. The problem is that pack is returning malformed UTF-8, which you can see if you add -Dr to the command line options. So I'm unsure how to proceed.
--
Karl Williamson

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Feb 7, 2018

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 2, 2018

From @khwilliamson

It seems to me that the right fix to this is to forbid pack from returning malformed UTF-8. When I change it to do that, various tests in our suite fail. All these look to be deliberate attempts to generate malformed UTF-8, and testing how this is handled, and they use pack to do that generating.

So, it's been known that you can use pack for this, and people have taken advantage of it. No /cpan tests rely on this. But I think we've gotten wise over the years about the perils of malformed UTF-8, and I think this is one that should be fixed.

I don't know if it is too late in the 5.28 development cycle to do so, however. But another way of looking at it, it is just in time to fix bugs that would otherwise occur in 5.28
--
Karl Williamson

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 2, 2018

From @cpansprout

On Sun, 01 Apr 2018 17​:13​:52 -0700, khw wrote​:

It seems to me that the right fix to this is to forbid pack from
returning malformed UTF-8. When I change it to do that, various tests
in our suite fail. All these look to be deliberate attempts to
generate malformed UTF-8, and testing how this is handled, and they
use pack to do that generating.

So, it's been known that you can use pack for this, and people have
taken advantage of it. No /cpan tests rely on this. But I think
we've gotten wise over the years about the perils of malformed UTF-8,
and I think this is one that should be fixed.

I too agree that it should be fixed.

I don't know if it is too late in the 5.28 development cycle to do so,
however. But another way of looking at it, it is just in time to fix
bugs that would otherwise occur in 5.28

I’m a little wary of doing it this close to a stable release if it’s possible people are relying on it.

--

Father Chrysostomos

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 2, 2018

From @xsawyerx

On 04/02/2018 04​:16 AM, Father Chrysostomos via RT wrote​:

On Sun, 01 Apr 2018 17​:13​:52 -0700, khw wrote​:

It seems to me that the right fix to this is to forbid pack from
returning malformed UTF-8. When I change it to do that, various tests
in our suite fail. All these look to be deliberate attempts to
generate malformed UTF-8, and testing how this is handled, and they
use pack to do that generating.

So, it's been known that you can use pack for this, and people have
taken advantage of it. No /cpan tests rely on this. But I think
we've gotten wise over the years about the perils of malformed UTF-8,
and I think this is one that should be fixed.
I too agree that it should be fixed.

I don't know if it is too late in the 5.28 development cycle to do so,
however. But another way of looking at it, it is just in time to fix
bugs that would otherwise occur in 5.28
I’m a little wary of doing it this close to a stable release if it’s possible people are relying on it.

Agreed. I would rather we save it for 5.29.

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jul 2, 2018

From @khwilliamson

On Mon, 02 Apr 2018 12​:28​:46 -0700, xsawyerx@​gmail.com wrote​:

On 04/02/2018 04​:16 AM, Father Chrysostomos via RT wrote​:

On Sun, 01 Apr 2018 17​:13​:52 -0700, khw wrote​:

It seems to me that the right fix to this is to forbid pack from
returning malformed UTF-8. When I change it to do that, various
tests
in our suite fail. All these look to be deliberate attempts to
generate malformed UTF-8, and testing how this is handled, and they
use pack to do that generating.

So, it's been known that you can use pack for this, and people have
taken advantage of it. No /cpan tests rely on this. But I think
we've gotten wise over the years about the perils of malformed UTF-
8,
and I think this is one that should be fixed.
I too agree that it should be fixed.

I don't know if it is too late in the 5.28 development cycle to do
so,
however. But another way of looking at it, it is just in time to
fix
bugs that would otherwise occur in 5.28
I’m a little wary of doing it this close to a stable release if it’s
possible people are relying on it.

Agreed. I would rather we save it for 5.29.

I'm not sure what the best approach is. Attached is a patch that just croaks when the returned SV is UTF-8 and malformed. Another option would be to just turn off the UTF-8 flag. Should this use be deprecated instead?
--
Karl Williamson

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jul 2, 2018

From @khwilliamson

0001-Trial-patch-for-131642.patch
From d8f351f05faf4badc3b80de38390cfac14e56418 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sun, 1 Jul 2018 22:39:47 -0600
Subject: [PATCH] Trial patch for #131642

---
 pp_pack.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/pp_pack.c b/pp_pack.c
index 5e9cc64301..1eed6c4d79 100644
--- a/pp_pack.c
+++ b/pp_pack.c
@@ -3147,6 +3147,15 @@ PP(pp_pack)
 
     packlist(cat, pat, patend, MARK, SP + 1);
 
+    if (SvUTF8(cat)) {
+        STRLEN result_len;
+        const char * result = SvPV_nomg(cat, result_len);
+
+        if (! is_utf8_string((U8 *) result, result_len)) {
+            Perl_croak(aTHX_ "Malformed UTF-8 string returned from pack");
+        }
+    }
+
     SvSETMAGIC(cat);
     SP = ORIGMARK;
     PUSHs(cat);
-- 
2.17.1

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Mar 8, 2019

From @khwilliamson

Fixed by

commit fd879d9
Author​: Karl Williamson <khw@​cpan.org>
Date​: Sun Jul 1 22​:39​:47 2018 -0600

  PATCH​: [perl #131642] pack returning malformed UTF-8
 
  This patch causes pack to die rather than return malformed UTF-8. This
  protects the rest of the core from unexpectedly getting malformed
  inputs.

We'll see if this breaks cpan, with people relying on pcak to create malformed UTF-8. But my current inclination is they should be doing something else to generate it.
--
Karl Williamson

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Mar 8, 2019

@khwilliamson - Status changed from 'open' to 'pending release'

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented May 22, 2019

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.30.0, this and 160 other issues have been
resolved.

Perl 5.30.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.30.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented May 22, 2019

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.