Assertion failed in Perl_reg_numbered_buff_fetch, file regcomp.c, line 7459 #14081

Closed
p5pRT opened this issue Sep 10, 2014 · 50 comments

@p5pRT p5pRT commented Sep 10, 2014

Migrated from rt.perl.org#122747 (status was 'resolved')

Searchable as RT122747$

@p5pRT p5pRT commented Sep 10, 2014

From Mark.Martinec@ijs.si

Created by Mark.Martinec@ijs.si

Have been running 5.20.1-RC2 here under FreeBSD 10.0 for a couple of
days without a problem. The application is a mail content filter
(amavisd-new + SpamAssassin), which means that perl is in heavy use
in a complex situation, involving tainted variables and UTF-8
character strings.

Today one of the forked child processes crashed (SIGABRT)
due to a failed assertion:

  Assertion failed​:
  ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) + i)),
  function Perl_reg_numbered_buff_fetch,
  file regcomp.c, line 7459.

Perl is built with gcc 4.8.4 20140828 with debugging enabled,
with -fstack-protector-strong and jemalloc memory protections
enabled (MALLOC_CONF="abort:true,junk:true,redzone:true").

The extra safeguards were there just in case, to rule out
some potential cases of memory corruption, although this crash
does not seem to be related to memory corruption.

The coredump shows the following:
  (some names (plain ascii) were replaced by xxx to preserve
  privacy; the number of characters was not changed)

# gdb /usr/local/bin/perl /var/coredumps/perl-97654.core
GNU gdb (GDB) 7.8 [GDB v7.8 for FreeBSD]
Copyright [...]
Reading symbols from /usr/local/bin/perl...done.
[New process 101359]
[New Thread 802006800 (LWP 101359)]
Core was generated by `perl'.
Program terminated with signal SIGABRT, Aborted.
#0 0x000000080171026a in thr_kill () from /lib/libc.so.7

(gdb) bt
#0 0x000000080171026a in thr_kill () from /lib/libc.so.7
#1 0x00000008017d7ac9 in abort () from /lib/libc.so.7
#2 0x00000008017bb0b1 in __assert () from /lib/libc.so.7
#3 0x0000000800940240 in Perl_reg_numbered_buff_fetch (r=0x817a602b8, paren=1, sv=0x8173c1180) at regcomp.c​:7459
#4 0x000000080099668a in Perl_magic_get (sv=0x8173c1180, mg=0x8174b2ed0) at mg.c​:805
#5 0x00000008009943a6 in Perl_mg_get (sv=0x8173c1180) at mg.c​:201
#6 0x0000000800a72e74 in Perl_save_scalar (gv=0x8021cd990) at scope.c​:219
#7 0x0000000800967e60 in Perl_save_re_context () at regcomp.c​:16475
#8 0x0000000800b2278c in Perl__core_swash_init (pkg=0x800bf9546 "utf8", name=0x800bf94ff "ToCf", listsv=0x800e273e0 <PL_sv_undef>, minbits=4, none=0,
  invlist=0x0, flags_p=0x0) at utf8.c​:2583
#9 0x0000000800b20cc2 in Perl_to_utf8_case (
  p=0x80c8a2f7e "\342\200\234Intelligence without ambition is a bird without wings.\342\200\235 -Salvador Dali Save a tree. Please don't print this e-mail unless it's really necessary\n", ustrp=0x7fffffffd070 " \345\235\a\b", lenp=0x7fffffffcaa8, swashp=0x800e27a68 <PL_utf8_tofold>, normal=0x800bf94ff "ToCf",
  special=0x800bf9212 "") at utf8.c​:2028
#10 0x0000000800b22024 in Perl__to_utf8_fold_flags (
  p=0x80c8a2f7e "\342\200\234Intelligence without ambition is a bird without wings.\342\200\235 -Salvador Dali Save a tree. Please don't print this e-mail unless it's really necessary\n", ustrp=0x7fffffffd070 " \345\235\a\b", lenp=0x7fffffffcaa8, flags=2 '\002') at utf8.c​:2397
#11 0x0000000800b0aa5a in S_regmatch (reginfo=0x7fffffffd350,
  startpos=0x80c8a2f63 "xxxxx.xxxxxxxx@​outlook.com \342\200\234Intelligence without ambition is a bird without wings.\342\200\235 -Salvador Dali Save a tree. Please don't print this e-mail unless it's really necessary\n", prog=0x81811d030) at regexec.c​:4207
#12 0x0000000800b05846 in S_regtry (reginfo=0x7fffffffd350, startposp=0x7fffffffd1b8) at regexec.c​:3200
#13 0x0000000800b051d5 in Perl_regexec_flags (rx=0x817a602b8,
  stringarg=0x80c8a2e00 "-- _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Xxxxx Xxxxxxxx University of Ljubljana Faculty of Natural Sciences and Engineering Department of Geology A\302\271ker\303\250eva xx or Xxxxxx xx SI-1000 Ljubljana Slovenia tel.​:"...,
  strend=0x80c8a2f70 "k@​outlook.com \342\200\234Intelligence without ambition is a bird without wings.\342\200\235 -Salvador Dali Save a tree. Please don't print this e-mail unless it's really necessary\n",
  strbeg=0x80c8a2e00 "-- _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Xxxxx Xxxxxxxx University of Ljubljana Faculty of Natural Sciences and Engineering Department of Geology A\302\271ker\303\250eva xx or Xxxxxx xx SI-1000 Ljubljana Slovenia tel.​:"..., minend=0, sv=0x8178d82e8, data=0x0, flags=1) at regexec.c​:3058
#14 0x00000008009dc8e1 in Perl_pp_subst () at pp_hot.c​:2130
#15 0x00000008009801c0 in Perl_runops_debug () at dump.c​:2427
#16 0x00000008008938d9 in S_run_body (oldscope=1) at perl.c​:2451
#17 0x0000000800892de7 in perl_run (my_perl=0x802020048) at perl.c​:2372
#18 0x000000000040100c in main (argc=4, argv=0x7fffffffd858, env=0x7fffffffd880) at perlmain.c​:114

This happened during processing of the first MIME part (a rather
short plain text part, ISO-8859-2, 8bit) of an otherwise rather large
mail message with attachment.

The crash occurs within SpamAssassin code (the last debug
log from SpamAssassin was​: SA dbg​: FreeMail​: From address​: ...),
although I can't reproduce the failure when spamassassin is
run from a command line - it only happens (reproducibly) when
the SpamAssassin perl module is spawned from amavisd and given
this particular mail message.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.20.1:

Configured by mark at Mon Sep  8 18:40:33 CEST 2014.

Summary of my perl5 (revision 5 version 20 subversion 1) configuration:
   
  Platform:
    osname=freebsd, osvers=10.0-release-p7, archname=amd64-freebsd
    uname='freebsd dorothy.ijs.si 10.0-release-p7 freebsd 10.0-release-p7 #0: tue jul 8 06:37:44 utc 2014 root@amd64-builder.daemonology.net:usrobjusrsrcsysgeneric amd64 '
    config_args='-sde -Dprefix=/usr/local -Darchlib=/usr/local/lib/perl5/5.20/mach -Dprivlib=/usr/local/lib/perl5/5.20 -Dman3dir=/usr/local/lib/perl5/5.20/perl/man/man3 -Dman1dir=/usr/local/man/man1 -Dsitearch=/usr/local/lib/perl5/site_perl/5.20/mach -Dsitelib=/usr/local/lib/perl5/site_perl/5.20 -Dscriptdir=/usr/local/bin -Dsiteman3dir=/usr/local/lib/perl5/5.20/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Ui_malloc -Ui_iconv -Uinstallusrbinperl -Dcc=gcc48 -Duseshrplib -Dinc_version_list=none -Dccflags=-DAPPLLIB_EXP="/usr/local/lib/perl5/5.20/BSDPAN" -Doptimize=-g -fno-omit-frame-pointer -fstack-protector-strong -DDEBUGGING -Ui_gdbm -Duse64bitint -Dusethreads=n -Dusemymalloc=n'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc48', ccflags ='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.20/BSDPAN" -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
    optimize='-g -fno-omit-frame-pointer -fstack-protector-strong',
    cppflags='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.20/BSDPAN" -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.8.4 20140828 (prerelease)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc48', ldflags ='-pthread -Wl,-E  -fstack-protector -L/usr/local/lib'
    libpth=/usr/lib /usr/local/lib /usr/local/lib /usr/local/lib/gcc48/gcc/x86_64-portbld-freebsd10.0/4.8.4/include-fixed /usr/lib
    libs=-lgdbm -lm -lcrypt -lutil
    perllibs=-lm -lcrypt -lutil
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='  -Wl,-R/usr/local/lib/perl5/5.20/mach/CORE'
    cccdlflags='-DPIC -fPIC', lddlflags='-shared  -L/usr/local/lib -fstack-protector'

Locally applied patches:
    RC2


@INC for perl 5.20.1:
    /usr/local/lib/perl5/5.20/BSDPAN
    /usr/local/lib/perl5/site_perl/5.20/mach
    /usr/local/lib/perl5/site_perl/5.20
    /usr/local/lib/perl5/5.20/mach
    /usr/local/lib/perl5/5.20
    .


Environment for perl 5.20.1:
    HOME=/root
    LANG (unset)
    LANGUAGE (unset)
    LC_ALL=en_US.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/root/bin:/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/bash


@p5pRT p5pRT commented Sep 10, 2014

From Mark.Martinec@ijs.si

This happened during processing of the first MIME part (a rather
short plain text part, ISO-8859-2, 8bit) of an otherwise rather large
mail message with attachment.

The crash occurs within SpamAssassin code (the last debug
log from SpamAssassin was​: SA dbg​: FreeMail​: From address​: ...),
although I can't reproduce the failure when spamassassin is
run from a command line [...]

I made some progress in narrowing this down and can now reproduce it
reliably by running spamassassin from the command line.

The crash involves a s/// operator with a horribly complicated
regexp (not utf8, not tainted), and a string (utf8, tainted).

Will try to narrow it down further...

@p5pRT p5pRT commented Sep 10, 2014

From @khwilliamson

On 09/10/2014 10​:23 AM, Mark Martinec wrote​:

This happened during processing of the first MIME part (a rather
short plain text part, ISO-8859-2, 8bit) of an otherwise rather large
mail message with attachment.

The crash occurs within SpamAssassin code (the last debug
log from SpamAssassin was​: SA dbg​: FreeMail​: From address​: ...),
although I can't reproduce the failure when spamassassin is
run from a command line [...]

Made some progress in narrowing this down, can reproduce it
now reliably by running spamassassin from a command line.

The crash involves a s/// operator with a horribly complicated
regexp (not utf8, not tainted), and a string (utf8, tainted).

Will try to narrow it down further...

That would be helpful. One perhaps easy option is to try it with the
string untainted. If that fixes the problem, it will really narrow down
the possible causes. (But I kinda doubt that will have an effect.)

Something else is to run it with valgrind. This may well be the result
of a wild write or read.

Perhaps this info will aid you in the narrowing. The core dump
indicates it is in the middle of a pattern match and is trying to match
a string with the Dali quote. The pattern match has been made into a
'trie'. The actual position in the utf8 string where the error occurs
is shown in octal in the dump. It resolves to LEFT DOUBLE QUOTATION
MARK U+201C. That all looks ok so far. The match is supposed to be
case-insensitive, and it is the first time in the program's execution
that it has found a caseless match that doesn't have the rules for it
coded in. So it has to go out to disk to read in those rules. It
saves and restores the state around this fetch, using
Perl_save_re_context() to do the save. The assertion fails during the
course of the save.
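
For anyone checking the octal escapes in the backtrace by hand, the three bytes \342\200\234 can be decoded with a few lines of C. This is only a minimal sketch: decode_utf8_3byte is a made-up helper name, not a function from the perl source, and it handles just the three-byte UTF-8 form that appears in the dump.

```c
#include <stdint.h>

/* Hypothetical helper (not part of perl): decode one three-byte UTF-8
 * sequence into a code point. Returns 0 if the bytes are not a
 * well-formed three-byte sequence. */
uint32_t decode_utf8_3byte(const unsigned char *p)
{
    /* Lead byte must look like 1110xxxx, continuation bytes like 10xxxxxx. */
    if ((p[0] & 0xF0) != 0xE0 ||
        (p[1] & 0xC0) != 0x80 ||
        (p[2] & 0xC0) != 0x80)
        return 0;

    /* 4 payload bits from the lead byte, 6 from each continuation byte. */
    return ((uint32_t)(p[0] & 0x0F) << 12)
         | ((uint32_t)(p[1] & 0x3F) << 6)
         |  (uint32_t)(p[2] & 0x3F);
}
```

Calling it on the bytes "\342\200\234" gives 0x201C (LEFT DOUBLE QUOTATION MARK), and "\342\200\235" gives 0x201D, the closing quote that follows the Dali sentence.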

Here is the comment at the beginning of Perl_save_re_context()​:
/* XXX Here's a total kludge. But we need to re-enter for swash
routines. */

That indicates what we're up against. It is failing because of a
problem with the capturing buffers in the pattern (the things that
parentheses enclose). If nothing else, you could send us the s/// text,
to compile here and eyeball for issues.

@p5pRT p5pRT commented Sep 10, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT p5pRT commented Sep 10, 2014

From Mark.Martinec@ijs.si

Thanks Karl for a quick response!

I got it down to a sensible size. I'm sure it can be reduced further,
but I left the regexp in its original form for now (from SpamAssassin).

I realized it also fails on perl 5.20.0 built with clang.

The key seems to be in​:

  use re 'taint';

...removing that avoids the crash.

Here is now the test program​:

#!/usr/bin/perl

use strict;
use re 'taint';

my $tlds = qr/
(?​:X(?​:N--(?​:MGB(?​:A(?​:(?​:3A4F16|YH7GP)A|AM7A8H|B2BD)|ERP4A5D4AR|C0A9AZCG|BH1A71E|X4CD0AB|9AWBF)|F(?​:IQ(?​:(?​:228C5H|S8|Z9)S|64B)|PCRJ9C3D|ZC2C9E2C)|C(?​:LCHC0EA0B2G2A9GCD|ZR(?​:694B|U2D)|G4BKI|1AVG)|X(?​:KC2(?​:DL3A5EE0H|AL3HYE2A)|HQ521B)|(?​:(?​:GEC|H2B)RJ9|Q9JYB4|90A3A)C|80A(?​:S(?​:EHDB|WG)|DXHKS|O21A)|N(?​:QV7F(?​:S00EMA)?|GBC5AZD)|3(?​:E0B707E|BST00M|DS443G)|KP(?​:R(?​:W13|Y57)D|UT3I)|Y(?​:FRO4I67O|GBI2AMMX)|6(?​:QQ986B3XL|FRZ82G)|I(?​:1B6B1A6A2E|O0A7I)|L(?​:GBBAT1AD8J|1ACC)|(?​:D1ACJ3|ZFR164)B|O(?​:GBPF8FL|3CW4H)|S(?​:9BRJ9C|ES554G)|4(?​:5BRJ9C|GBRIM)|J(?​:6W193G|1AMH)|55Q(?​:W42G|X5D)|P(?​:GBS0DH|1AI)|WGB(?​:H1C|L6A)|1QQW23A|RHQV96G|UNUP4Y|VHQUV)|XX|YZ)|C(?​:[CDFGKMNVWXZ]|O(?​:N(?​:S(?​:TRUCTION|ULTING)|(?​:TRACTOR|DO)S)|M(?​:P(?​:UTER|ANY)|MUNITY)?|(?​:L(?​:LEG|OGN)|FFE)E|O(?​:[LP]|KING)|UNTRY|DES)?|A(?​:R(?​:E(?​:ERS?)?|AVAN|DS)|(?​:NCERRESEARC|S)H|P(?​:ETOWN|ITAL)|T(?​:ERING)?|M(?​:ERA|P)|B)?|L(?​:(?​:EAN|OTH)ING|I(?​:NIC|CK)|AIMS|UB)?|R(?​:EDIT(?​:CARD)?|UISES)?|H(?​:RISTMAS|URCH|EAP)?|I(?​:T(?​:IC|Y))?|E(?​:NTER|R
N|O)|U(?
:ISINELLA)?|Y(?​:MRU)?)|S(?​:[BDGJKLMNRTVXZ]|U(?​:PP(?​:L(?​:IES|Y)|ORT)|R(?​:GERY|F)|ZUKI)?|O(?​:L(?​:UTIONS|AR)|FTWARE|CIAL|HU|Y)?|C(?​:[AB]|H(?​:MIDT|ULE)|OT)?|A(?​:ARLAND|RL)?|E(?​:RVICES|XY)?|H(?​:IKSHA|OES)?|P(?​:IEGEL|ACE)|I(?​:NGLES)?|Y(?​:STEMS)?)|B(?​:[BDFGHJSTVWY]|U(?​:ILD(?​:ERS)?|SINESS|ZZ)|A(?​:R(?​:GAINS)?|YERN)?|L(?​:ACK(?​:FRIDAY)?|UE)|E(?​:RLIN|ER|ST)?|I(?​:[DOZ]|KE)?|N(?​:PPARIBAS)?|O(?​:UTIQUE|O)?|R(?​:USSELS)?|MW?|ZH?)|M(?​:[CDGHKLMNPQRSTVWXYZ]|O(?​:(?​:RTGAG)?E|TORCYCLES|NASH|SCOW|BI|DA|V)?|A(?​:N(?​:AGEMENT|GO)|RKET(?​:ING)?|ISON)?|E(?​:(?​:LBOURN|M)E|DIA|ET|NU)?|I(?​:(?​:AM|N)I|L)|U(?​:SEUM)?)|P(?​:[EFGKMNSWY]|R(?​:O(?​:D(?​:UCTIONS)?|PERT(?​:IES|Y))?|AXI|ESS)?|H(?​:OTO(?​:GRAPHY|S)?|YSIO)?|A(?​:R(?​:T(?​:NER)?|I)S)?|I(?​:C(?​:TURE)?S|ZZA|NK)|L(?​:UMBING|ACE)?|(?​:OS)?T|UB)|G(?​:[DFGHNPQSTWY]|R(?​:A(?​:PHIC|TI)S|EEN|IPE)?|U(?​:I(?​:TARS|DE)|RU)?|L(?​:OB(?​:AL|O)|ASS)?|A(?​:L(?​:LERY)?)?|I(?​:FTS?|VES)?|M(?​:AIL|O)?|B(?​:IZ)?|E(?​:NT)?|O[PV])|A(?​:[DFLMNOQWZ]|C(?​:T(?​:IVE|OR)|COUNTANTS|ADEMY)?|U(?​:CTION|DIO|TOS)?|S(?​:SO
CIATES|I
A)?|R(?​:CHI|MY|PA)?|I(?​:RFORCE)?|T(?​:TORNEY)?|G(?​:ENCY)?|E(?​:RO)?|XA?)|F(?​:[JM]|I(?​:NANC(?​:IAL|E)|SH(?​:ING)?|TNESS)?|U(?​:RNITURE|TBOL|ND)|L(?​:IGHTS|ORIST)|O(?​:UNDATION|O)?|R(?​:OGANS|L)?|(?​:EEDBAC)?K|A(?​:IL|RM))|R(?​:E(?​:P(?​:UBLICAN|AIR|ORT)|(?​:CIPE|VIEW)S|S(?​:TAURAN)?T|N(?​:TALS)?|ALTOR|ISEN?|HAB|D)?|O(?​:CKS|DEO)?|I(?​:CH|O)|S(?​:VP)?|U(?​:HR)?|YUKYU|W)|D(?​:[JKMZ]|I(?​:(?​:SCOUN|E)T|RECT(?​:ORY)?|AMONDS|GITAL)|E(?​:NT(?​:IST|AL)|MOCRAT|GREE|ALS|SI)?|A(?​:[DY]|TING|NCE)|O(?​:MAINS)?|URBAN|NP)|T(?​:[CDFGHJKLMNPTVWZ]|O(?​:(?​:OL|Y)S|DAY|KYO|WN|P)?|R(?​:A(?​:INING|VEL|DE))?|A(?​:T(?​:TOO|AR)|X)|I(?​:ENDA|ROL|PS)|E(?​:CHNOLOGY|L))|E(?​:[CEGR]|N(?​:GINEER(?​:ING)?|TERPRISES)|X(?​:P(?​:OSED|ERT)|CHANGE)|(?​:QUIPMEN|A)?T|DU(?​:CATION)?|S(?​:TATE|Q)?|VENTS|MAIL|US?)|V(?​:[CGU]|E(?​:(?​:NTURE|GA)S|RSICHERUNG|T)?|O(?​:T(?​:[EO]|ING)|YAGE|DKA)|I(?​:(?​:AJE|LLA)S|SION)?|(?​:LAANDERE)?N|A(?​:CATIONS)?)|L(?​:[BCKRSVY]|I(?​:M(?​:ITED|O)|GHTING|FE|NK)?|A(?​:CAIXA|WYER|ND)?|O(?​:NDON|ANS|TTO)|U(?​:X(?​:URY|E))?|T(?​:DA)?|EASE|GBT)|H(?​:[KM
NRTU]|O(
?​:L(?​:DINGS|IDAY)|ST(?​:ING)?|[RU]SE|MES|W)|E(?​:(?​:ALTHCA)?RE|LP)|A(?​:MBURG|US)|I(?​:PHOP|V))|I(?​:[DELOQRST]|N(?​:[GK]|(?​:VESTMENT|DUSTRIE)S|T(?​:ERNATIONAL)?|S(?​:TITUT|UR)E|FO)?|M(?​:MO(?​:BILIEN)?)?)|W(?​:E(?​:B(?​:SITE|CAM)|D)|I(?​:LLIAMHILL|EN|KI)|A(?​:LES|TCH|NG)|(?​:ORK)?S|HOSWHO|T[CF]|F)|N(?​:[FLOPUZ]|E(?​:T(?​:WORK)?|USTAR|W)?|A(?​:GOYA|ME|VY)?|I(?​:NJA)?|R[AW]?|GO?|Y?C|HK)|K(?​:[EGHMPWYZ]|I(?​:TCHEN|WI|M)?|(?​:AUFE|OEL)?N|R(?​:E?D)?)|O(?​:(?​:KINAW|TSUK)A|RG(?​:ANIC)?|N[GL]|OO|VH|M)|J(?​:[MP]|O(?​:B(?​:URG|S))?|E(?​:TZT)?|UEGOS)|Y(?​:[ET]|O(?​:KOHAMA|UTUBE)|A(?​:CHTS|NDEX))|U(?​:[AGKSYZ]|N(?​:IVERSITY|O)|OL)|Q(?​:UEBEC|PON|A)|Z(?​:[AMW]|ONE))/ix;

my $email_regex = qr/
  (?=.{0,64}\@​) # limit userpart to 64 chars
(and speed up searching?)
  (?<![a-z0-9!#\$%&'*+\/=?^_`{|}-]) # start boundary
  ( # capture email
  [a-z0-9!#\$%&'*+\/=?^_`{|}
-]+ # no dot in beginning
  (?​:\.[a-z0-9!#\$%&'*+\/=?^_`{|}~-]+)* # no consecutive dots, no ending
dot
  \@​
  (?​:[a-z0-9](?​:[a-z0-9-]{0,59}[a-z0-9])?\.){1,4} # max 4x61 char parts
(should be enough?)
  ${tlds} # ends with valid tld
  )
  (?!(?​:[a-z0-9-]|\.[a-z0-9])) # make sure domain ends here
/xi;

my(@​body) = (
  "<mailto​:xxxx.xxxx\@​outlook.com>",
  "A\x{B9}ker\x{E8}eva xxxx.xxxx\@​outlook.com \x{201D}",
);

for (@​body) {
 
s{<?(?<!mailto​:)${email_regex}(?​:>|\s{1,10}(?!(?​:fa(?​:x|csi)|tel|phone|e?-?mail))[a-z]{2,11}​:)}{
}gi;
}

@p5pRT p5pRT commented Sep 10, 2014

From @khwilliamson

On 09/10/2014 11​:51 AM, Mark Martinec wrote​:

Thanks Karl for a quick response!

Got it down to a sensible size, I'm sure it can be reduced further,
but I left the regexp in its original form for now (from SpamAssassin).

I realized it also fails on perl 5.20.0 build with clang.

This is not a recent regression, as it fails back through at least 5.12.
Neither valgrind nor clang asan give any extra information.

I'm hoping someone with more expertise than I currently have in this
area will look at this.

@p5pRT p5pRT commented Sep 11, 2014

From Mark.Martinec@ijs.si

Got it down to this small test program​:

#!/usr/bin/perl

use strict;
use re 'taint';

my(@body) = (
  "<mailto:xxxx.xxxx\@outlook.com>",
  "A\x{B9}ker\x{E8}eva xxxx.xxxx\@outlook.com \x{201D}",
);

for (@body) {
  s{ <? (?<!mailto:) \b ( [a-z0-9.]+ \@ \S+ ) \b
  (?: > | \s{1,10} (?!phone) [a-z]{2,11} : ) }{ }xgi;
}

perl 5.20.{0,1} :
Assertion failed: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) + i)), function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7455.
Abort trap

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 15​:05, Mark Martinec <Mark.Martinec@​ijs.si> wrote​:

Got it down to this small test program​:

#!/usr/bin/perl

use strict;
use re 'taint';

my(@​body) = (
"<mailto​:xxxx.xxxx\@​outlook.com>",
"A\x{B9}ker\x{E8}eva xxxx.xxxx\@​outlook.com \x{201D}",
);

for (@​body) {
s{ <? (?<!mailto​:) \b ( [a-z0-9.]+ \@​ \S+ ) \b
(?​: > | \s{1,10} (?!phone) [a-z]{2,11} : ) }{ }xgi;
}

perl 5.20.{0,1} :
Assertion failed​: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) + i)),
function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7455.
Abort trap

Are you sure you have the script right? Because that code never fetches
from a capture buffer and does not fail on bleadperl.

I added a "print $1" to the script​:

$ cat rt122747.t
#!/usr/bin/perl

use strict;
use re 'taint';

my(@body) = (
  "<mailto:xxxx.xxxx\@outlook.com>",
  "A\x{B9}ker\x{E8}eva xxxx.xxxx\@outlook.com \x{201D}",
);

for (@body) {
  s{ <? (?<!mailto:) \b ( [a-z0-9.]+ \@ \S+ ) \b
  (?: > | \s{1,10} (?!phone) [a-z]{2,11} : ) }{ }xgi;
  print "matched: >>$1<<\n";
}
__END__

And here is what I get from blead:

$ ./perl -Ilib -T rt122747.t
matched: >>.xxxx@outlook.com<<
matched: >>.xxxx@outlook.com<<

/me confused.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 10 September 2014 20​:55, Karl Williamson <public@​khwilliamson.com> wrote​:

On 09/10/2014 11​:51 AM, Mark Martinec wrote​:

Thanks Karl for a quick response!

Got it down to a sensible size, I'm sure it can be reduced further,
but I left the regexp in its original form for now (from SpamAssassin).

I realized it also fails on perl 5.20.0 build with clang.

This is not a recent regression, as it fails back through at least 5.12.
Neither valgrind nor clang asan give any extra information.

I'm hoping someone with more expertise than I currently have in this area
will look at this.

I can't reproduce it with bleadperl at all. Nor can I reproduce it with 5.14.

So I am pretty confused here. What script did you use to determine it is
an old regression?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 11, 2014

From @Hugmeir

On Thu, Sep 11, 2014 at 4​:57 PM, demerphq <demerphq@​gmail.com> wrote​:

On 11 September 2014 15​:05, Mark Martinec <Mark.Martinec@​ijs.si> wrote​:

Got it down to this small test program​:

#!/usr/bin/perl

use strict;
use re 'taint';

my(@​body) = (
"<mailto​:xxxx.xxxx\@​outlook.com>",
"A\x{B9}ker\x{E8}eva xxxx.xxxx\@​outlook.com \x{201D}",
);

for (@​body) {
s{ <? (?<!mailto​:) \b ( [a-z0-9.]+ \@​ \S+ ) \b
(?​: > | \s{1,10} (?!phone) [a-z]{2,11} : ) }{ }xgi;
}

perl 5.20.{0,1} :
Assertion failed​: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) + i)),
function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7455.
Abort trap

Are you sure you have the script right? Because that code never fetches from
a capture buffer and does not fail on bleadperl.

I added a "print $1" to the script​:

$ cat rt122747.t
#!/usr/bin/perl

use strict;
use re 'taint';

my(@​body) = (
"<mailto​:xxxx.xxxx\@​outlook.com>",
"A\x{B9}ker\x{E8}eva xxxx.xxxx\@​outlook.com \x{201D}",
);

for (@​body) {
s{ <? (?<!mailto​:) \b ( [a-z0-9.]+ \@​ \S+ ) \b
(?​: > | \s{1,10} (?!phone) [a-z]{2,11} : ) }{ }xgi;
print "matched​: >>$1<<\n";
}
__END__

And here is what I get from blead​:

$ ./perl -Ilib -T rt122747.t
matched​: >>.xxxx@​outlook.com<<
matched​: >>.xxxx@​outlook.com<<

/me confused.

Yves

I can reproduce this on 5.10-5.20 but only for debugging builds; maybe
you forgot a -DDEBUGGING?

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

And here is what I get from blead​:

$ ./perl -Ilib -T rt122747.t
matched​: >>.xxxx@​outlook.com<<
matched​: >>.xxxx@​outlook.com<<

/me confused.

Yves

I can reproduce this on 5.10-5.20 but only for debugging builds; maybe
you forgot a -DDEBUGGING?

Bah, thought I was on a DEBUGGING build, but I wasn't. Thanks Brian.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 11, 2014

From @cpansprout

On Thu Sep 11 06​:07​:03 2014, mmartinec wrote​:

Got it down to this small test program​:

#!/usr/bin/perl

use strict;
use re 'taint';

my(@​body) = (
"<mailto​:xxxx.xxxx\@​outlook.com>",
"A\x{B9}ker\x{E8}eva xxxx.xxxx\@​outlook.com \x{201D}",
);

for (@​body) {
s{ <? (?<!mailto​:) \b ( [a-z0-9.]+ \@​ \S+ ) \b
(?​: > | \s{1,10} (?!phone) [a-z]{2,11} : ) }{ }xgi;
}

perl 5.20.{0,1} :
Assertion failed​: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) +
i)), function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7455.
Abort trap

I think what’s happening is that the kludge to localise $1, etc. is executed when the regexp is in an inconsistent state. rx->subbeg is referring to the string from the previous match ('<mailto:xxxx.xxxx@outlook.com>'), but the offsets for $1 extend beyond the end of the 30-character string:

(gdb) p rx->offs[1]
$8 = {
  start = 12,
  end = 33,
  start_tmp = 12
}
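
As a rough illustration of why those numbers trip the assertion, the check can be modelled in a few lines of C. This is only a sketch under stated assumptions: struct capture_state and capture_in_bounds are made-up names, not perl internals, and it assumes that s is rx->subbeg plus the capture's start offset and that i is end minus start (with no sub-offset), so that (s - rx->subbeg) + i reduces to the end offset.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for the fields the assertion reads;
 * the real regexp structure in perl has many more members. */
struct capture_state {
    size_t sublen;  /* length of the saved subject string (rx->sublen) */
    long   start;   /* start offset of the capture (rx->offs[n].start) */
    long   end;     /* end offset of the capture (rx->offs[n].end)     */
};

/* Returns 1 when the capture lies inside the saved string, 0 otherwise.
 * Under the assumptions above, the asserted condition
 *   sublen >= (s - subbeg) + i
 * is equivalent to sublen >= end. Unset or inverted offsets are treated
 * here as "not in bounds". */
int capture_in_bounds(const struct capture_state *st)
{
    if (st->start < 0 || st->end < st->start)
        return 0;
    return st->sublen >= (size_t)st->end;
}
```

With the values from the gdb session (a saved string of 30 characters, start = 12, end = 33) the function returns 0, which corresponds to the failed assertion: the offsets describe a match in the second, longer string while the saved string is still the first one.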

A watchpoint on rx->offs shows that it gets swapped out here in regexec.c​:

2706 swap = prog->offs;
2707 /* do we need a save destructor here for eval dies? */
2708 Newxz(prog->offs, (prog->nparens + 1), regexp_paren_pair);
2709 DEBUG_BUFFERS_r(PerlIO_printf(Perl_debug_log,
2710 "rex=0x%"UVxf" saving offs​: orig=0x%"UVxf" new=0x%"UVxf"\n"

when the backtrace is like this​:

#0 Perl_regexec_flags (my_perl=0x100803200, rx=0x10082fdf8, stringarg=0x10060b658 "A¹kerèeva xxxx.xxxx@​outlook.com ”", strend=0x10060b67d "", strbeg=0x10060b658 "A¹kerèeva xxxx.xxxx@​outlook.com ”", minend=0, sv=0x1008063e8, data=0x0, flags=1) at regexec.c​:2709
#1 0x0000000100247f3f in Perl_pp_subst (my_perl=0x100803200) at pp_hot.c​:2120
#2 0x00000001001b847c in Perl_runops_debug (my_perl=0x100803200) at dump.c​:2231
#3 0x000000010000a8ea in S_run_body (my_perl=0x100803200, oldscope=1) at perl.c​:2416
#4 0x0000000100009905 in perl_run (my_perl=0x100803200) at perl.c​:2339
#5 0x0000000100072698 in main (argc=3, argv=0x7fff5fbffa78, env=0x7fff5fbffa98) at miniperlmain.c​:120

So the ordering of some of this stuff needs to be rethought.

A git bisect points me to this commit​:

commit 44a2ac7
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Fri Dec 29 22​:45​:51 2006 +0100

  Re​: [PATCH] Change implementation of %+ to use a proper tied hash interface and add support for %-
  Message-ID​: <9b18b3110612291245q792fe91cu69422d2b81bb4f0b@​mail.gmail.com>

But I think it’s a false positive.

--

Father Chrysostomos

@p5pRT p5pRT commented Sep 11, 2014

From Mark.Martinec@ijs.si

I can reproduce this on 5.10-5.20 but only for debugging builds.

Indeed, I'm using a -DDEBUGGING perl.

Are you sure you have the script right?

Yes.

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 17​:36, Mark Martinec <Mark.Martinec@​ijs.si> wrote​:

I can reproduce this on 5.10-5.20 but only for debugging builds.

Indeed, I'm using a -DDEBUGGING perl.

Are you sure you have the script right?

Yes.

Hrm, well even on a DEBUGGING build I cannot replicate in blead.

Can you show me your perl -V and the output of MY version of your script?
(Attached)

Brian, if you happen to have your Configure options handy, I would appreciate
knowing what they are.

And again I find it very odd that a script which never reads a capture
buffer dies with this error.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

#!/usr/bin/perl

use strict;
use re 'taint';

my(@body) = (
  "<mailto:xxxx.xxxx\@outlook.com>",
  "A\x{B9}ker\x{E8}eva xxxx.xxxx\@outlook.com \x{201D}",
);

for (@body) {
  s{ <? (?<!mailto:) \b ( [a-z0-9.]+ \@ \S+ ) \b
  (?: > | \s{1,10} (?!phone) [a-z]{2,11} : ) }{ }xgi;
  print "matched: >>$1<<\n";
}

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

#!/usr/bin/perl

use strict;
use re 'taint';

my $tlds = qr/
(?​:X(?​:N--(?​:MGB(?​:A(?​:(?​:3A4F16|YH7GP)A|AM7A8H|B2BD)|ERP4A5D4AR|C0A9AZCG|BH1A71E|X4CD0AB|9AWBF)|F(?​:IQ(?​:(?​:228C5H|S8|Z9)S|64B)|PCRJ9C3D|ZC2C9E2C)|C(?​:LCHC0EA0B2G2A9GCD|ZR(?​:694B|U2D)|G4BKI|1AVG)|X(?​:KC2(?​:DL3A5EE0H|AL3HYE2A)|HQ521B)|(?​:(?​:GEC|H2B)RJ9|Q9JYB4|90A3A)C|80A(?​:S(?​:EHDB|WG)|DXHKS|O21A)|N(?​:QV7F(?​:S00EMA)?|GBC5AZD)|3(?​:E0B707E|BST00M|DS443G)|KP(?​:R(?​:W13|Y57)D|UT3I)|Y(?​:FRO4I67O|GBI2AMMX)|6(?​:QQ986B3XL|FRZ82G)|I(?​:1B6B1A6A2E|O0A7I)|L(?​:GBBAT1AD8J|1ACC)|(?​:D1ACJ3|ZFR164)B|O(?​:GBPF8FL|3CW4H)|S(?​:9BRJ9C|ES554G)|4(?​:5BRJ9C|GBRIM)|J(?​:6W193G|1AMH)|55Q(?​:W42G|X5D)|P(?​:GBS0DH|1AI)|WGB(?​:H1C|L6A)|1QQW23A|RHQV96G|UNUP4Y|VHQUV)|XX|YZ)|C(?​:[CDFGKMNVWXZ]|O(?​:N(?​:S(?​:TRUCTION|ULTING)|(?​:TRACTOR|DO)S)|M(?​:P(?​:UTER|ANY)|MUNITY)?|(?​:L(?​:LEG|OGN)|FFE)E|O(?​:[LP]|KING)|UNTRY|DES)?|A(?​:R(?​:E(?​:ERS?)?|AVAN|DS)|(?​:NCERRESEARC|S)H|P(?​:ETOWN|ITAL)|T(?​:ERING)?|M(?​:ERA|P)|B)?|L(?​:(?​:EAN|OTH)ING|I(?​:NIC|CK)|AIMS|UB)?|R(?​:EDIT(?​:CARD)?|UISES)?|H(?​:RISTMAS|URCH|EAP)?|I(?​:T(?​:IC|Y))?|E(?​:NTER|RN|O)|U(?​:ISINELLA)?|Y(?​:MRU)?)|S(?​:[BDGJKLMNRTVXZ]|U(?​:PP(?​:L(?​:IES|Y)|ORT)|R(?​:GERY|F)|ZUKI)?|O(?​:L(?​:UTIONS|AR)|FTWARE|CIAL|HU|Y)?|C(?​:[AB]|H(?​:MIDT|ULE)|OT)?|A(?​:ARLAND|RL)?|E(?​:RVICES|XY)?|H(?​:IKSHA|OES)?|P(?​:IEGEL|ACE)|I(?​:NGLES)?|Y(?​:STEMS)?)|B(?​:[BDFGHJSTVWY]|U(?​:ILD(?​:ERS)?|SINESS|ZZ)|A(?​:R(?​:GAINS)?|YERN)?|L(?​:ACK(?​:FRIDAY)?|UE)|E(?​:RLIN|ER|ST)?|I(?​:[DOZ]|KE)?|N(?​:PPARIBAS)?|O(?​:UTIQUE|O)?|R(?​:USSELS)?|MW?|ZH?)|M(?​:[CDGHKLMNPQRSTVWXYZ]|O(?​:(?​:RTGAG)?E|TORCYCLES|NASH|SCOW|BI|DA|V)?|A(?​:N(?​:AGEMENT|GO)|RKET(?​:ING)?|ISON)?|E(?​:(?​:LBOURN|M)E|DIA|ET|NU)?|I(?​:(?​:AM|N)I|L)|U(?​:SEUM)?)|P(?​:[EFGKMNSWY]|R(?​:O(?​:D(?​:UCTIONS)?|PERT(?​:IES|Y))?|AXI|ESS)?|H(?​:OTO(?​:GRAPHY|S)?|YSIO)?|A(?​:R(?​:T(?​:NER)?|I)S)?|I(?​:C(?​:TURE)?S|ZZA|NK)|L(?​:UMBING|ACE)?|(?​:OS)?T|UB)|G(?​:[DFGHNPQSTWY]|R(?​:A(?​:PHIC|TI)S|EEN|IPE)?|U(?​:I(?​:TARS|DE)|RU)?|L(?​:OB(?​:AL|O)|ASS)?|A(?​:L(?​:LERY)?)?|I(?​:FTS?|VES)?|M
(?​:AIL|O)?|B(?​:IZ)?|E(?​:NT)?|O[PV])|A(?​:[DFLMNOQWZ]|C(?​:T(?​:IVE|OR)|COUNTANTS|ADEMY)?|U(?​:CTION|DIO|TOS)?|S(?​:SOCIATES|IA)?|R(?​:CHI|MY|PA)?|I(?​:RFORCE)?|T(?​:TORNEY)?|G(?​:ENCY)?|E(?​:RO)?|XA?)|F(?​:[JM]|I(?​:NANC(?​:IAL|E)|SH(?​:ING)?|TNESS)?|U(?​:RNITURE|TBOL|ND)|L(?​:IGHTS|ORIST)|O(?​:UNDATION|O)?|R(?​:OGANS|L)?|(?​:EEDBAC)?K|A(?​:IL|RM))|R(?​:E(?​:P(?​:UBLICAN|AIR|ORT)|(?​:CIPE|VIEW)S|S(?​:TAURAN)?T|N(?​:TALS)?|ALTOR|ISEN?|HAB|D)?|O(?​:CKS|DEO)?|I(?​:CH|O)|S(?​:VP)?|U(?​:HR)?|YUKYU|W)|D(?​:[JKMZ]|I(?​:(?​:SCOUN|E)T|RECT(?​:ORY)?|AMONDS|GITAL)|E(?​:NT(?​:IST|AL)|MOCRAT|GREE|ALS|SI)?|A(?​:[DY]|TING|NCE)|O(?​:MAINS)?|URBAN|NP)|T(?​:[CDFGHJKLMNPTVWZ]|O(?​:(?​:OL|Y)S|DAY|KYO|WN|P)?|R(?​:A(?​:INING|VEL|DE))?|A(?​:T(?​:TOO|AR)|X)|I(?​:ENDA|ROL|PS)|E(?​:CHNOLOGY|L))|E(?​:[CEGR]|N(?​:GINEER(?​:ING)?|TERPRISES)|X(?​:P(?​:OSED|ERT)|CHANGE)|(?​:QUIPMEN|A)?T|DU(?​:CATION)?|S(?​:TATE|Q)?|VENTS|MAIL|US?)|V(?​:[CGU]|E(?​:(?​:NTURE|GA)S|RSICHERUNG|T)?|O(?​:T(?​:[EO]|ING)|YAGE|DKA)|I(?​:(?​:AJE|LLA)S|SION)?|(?​:LAANDERE)?N|A(?​:CATIONS)?)|L(?​:[BCKRSVY]|I(?​:M(?​:ITED|O)|GHTING|FE|NK)?|A(?​:CAIXA|WYER|ND)?|O(?​:NDON|ANS|TTO)|U(?​:X(?​:URY|E))?|T(?​:DA)?|EASE|GBT)|H(?​:[KMNRTU]|O(?​:L(?​:DINGS|IDAY)|ST(?​:ING)?|[RU]SE|MES|W)|E(?​:(?​:ALTHCA)?RE|LP)|A(?​:MBURG|US)|I(?​:PHOP|V))|I(?​:[DELOQRST]|N(?​:[GK]|(?​:VESTMENT|DUSTRIE)S|T(?​:ERNATIONAL)?|S(?​:TITUT|UR)E|FO)?|M(?​:MO(?​:BILIEN)?)?)|W(?​:E(?​:B(?​:SITE|CAM)|D)|I(?​:LLIAMHILL|EN|KI)|A(?​:LES|TCH|NG)|(?​:ORK)?S|HOSWHO|T[CF]|F)|N(?​:[FLOPUZ]|E(?​:T(?​:WORK)?|USTAR|W)?|A(?​:GOYA|ME|VY)?|I(?​:NJA)?|R[AW]?|GO?|Y?C|HK)|K(?​:[EGHMPWYZ]|I(?​:TCHEN|WI|M)?|(?​:AUFE|OEL)?N|R(?​:E?D)?)|O(?​:(?​:KINAW|TSUK)A|RG(?​:ANIC)?|N[GL]|OO|VH|M)|J(?​:[MP]|O(?​:B(?​:URG|S))?|E(?​:TZT)?|UEGOS)|Y(?​:[ET]|O(?​:KOHAMA|UTUBE)|A(?​:CHTS|NDEX))|U(?​:[AGKSYZ]|N(?​:IVERSITY|O)|OL)|Q(?​:UEBEC|PON|A)|Z(?​:[AMW]|ONE))/ix;

my $email_regex = qr/
  (?=.{0,64}\@​) # limit userpart to 64 chars (and speed up searching?)
  (?<![a-z0-9!#\$%&'*+\/=?^_`{|}-]) # start boundary
  ( # capture email
  [a-z0-9!#\$%&'*+\/=?^_`{|}
-]+ # no dot in beginning
  (?​:\.[a-z0-9!#\$%&'*+\/=?^_`{|}~-]+)* # no consecutive dots, no ending dot
  \@​
  (?​:[a-z0-9](?​:[a-z0-9-]{0,59}[a-z0-9])?\.){1,4} # max 4x61 char parts (should be enough?)
  ${tlds} # ends with valid tld
  )
  (?!(?​:[a-z0-9-]|\.[a-z0-9])) # make sure domain ends here
/xi;

my(@​body) = (
  "<mailto​:xxxx.xxxx\@​outlook.com>",
  "A\x{B9}ker\x{E8}eva xxxx.xxxx\@​outlook.com \x{201D}",
);

for (@​body) {
  s{<?(?<!mailto​:)${email_regex}(?​:>|\s{1,10}(?!(?​:fa(?​:x|csi)|tel|phone|e?-?mail))[a-z]{2,11}​:)}{ }gi;
}

@p5pRT p5pRT commented Sep 11, 2014

From Mark.Martinec@ijs.si

Hrm, well even on a DEBUGGING build I cannot replicate in blead.

Can you show me your perl -V and the output of MY version of your
script?
(Attached)

$ ./rt122747.t
matched​: >>.xxxx@​outlook.com<<
Assertion failed​: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) +
i)), function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7455.
Abort trap

This happens on two hosts. One has 5.20.1-RC2, as documented at the top of
this PR (gcc48, FreeBSD 10.0, -DDEBUGGING).

The other host has 5.20.0, built with the clang 3.4.1 compiler,
also on FreeBSD 10.0, as follows:

$ perl -V
Summary of my perl5 (revision 5 version 20 subversion 0) configuration​:

  Platform​:
  osname=freebsd, osvers=10.0-release,
archname=amd64-freebsd-thread-multi
  uname='freebsd 10amd64-ws-default-job-03 10.0-release freebsd
10.0-release amd64 '
  config_args='-sde -Dprefix=/usr/local
-Darchlib=/usr/local/lib/perl5/5.20/mach
-Dprivlib=/usr/local/lib/perl5/5.20
-Dman3dir=/usr/local/lib/perl5/5.20/perl/man/man3
-Dman1dir=/usr/local/man/man1
-Dsitearch=/usr/local/lib/perl5/site_perl/5.20/mach
-Dsitelib=/usr/local/lib/perl5/site_perl/5.20 -Dscriptdir=/usr/local/bin
-Dsiteman3dir=/usr/local/lib/perl5/5.20/man/man3
-Dsiteman1dir=/usr/local/man/man1 -Ui_malloc -Ui_iconv
-Uinstallusrbinperl -Dcc=cc -Duseshrplib -Dinc_version_list=none
-Dccflags=-DAPPLLIB_EXP="/usr/local/lib/perl5/5.20/BSDPAN" -Doptimize=-g
-DDEBUGGING -Ui_gdbm -Duse64bitint -Dusethreads=y -Dusemymalloc=n'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.20/BSDPAN"
-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -DDEBUGGING -fno-strict-aliasing
-pipe -fstack-protector -I/usr/local/include',
  optimize='-g',
  cppflags='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.20/BSDPAN"
-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -DDEBUGGING -fno-strict-aliasing
-pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.2.1 Compatible FreeBSD Clang 3.3
(tags/RELEASE_33/final 183502)', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags ='-pthread -Wl,-E -fstack-protector
-L/usr/local/lib'
  libpth=/usr/lib /usr/local/lib /usr/include/clang/3.3 /usr/lib
  libs=-lm -lcrypt -lutil
  perllibs=-lm -lcrypt -lutil
  libc=, so=so, useshrplib=true, libperl=libperl.so
  gnulibc_version=''
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='
-Wl,-R/usr/local/lib/perl5/5.20/mach/CORE'
  cccdlflags='-DPIC -fPIC', lddlflags='-shared -L/usr/local/lib
-fstack-protector'

Characteristics of this binary (from libperl)​:
  Compile-time options​: DEBUGGING HAS_TIMES MULTIPLICITY PERLIO_LAYERS
  PERL_DONT_CREATE_GVSV
  PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
  PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
  PERL_NEW_COPY_ON_WRITE PERL_PRESERVE_IVUV
  PERL_TRACK_MEMPOOL USE_64_BIT_ALL USE_64_BIT_INT
  USE_ITHREADS USE_LARGE_FILES USE_LOCALE
  USE_LOCALE_COLLATE USE_LOCALE_CTYPE
  USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
  USE_REENTRANT_API
  Built under freebsd
  Compiled at Jun 16 2014 15​:12​:36
  @​INC​:
  /usr/local/lib/perl5/5.20/BSDPAN
  /usr/local/lib/perl5/site_perl/5.20/mach
  /usr/local/lib/perl5/site_perl/5.20
  /usr/local/lib/perl5/5.20/mach
  /usr/local/lib/perl5/5.20
  .

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 17​:30, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Thu Sep 11 06​:07​:03 2014, mmartinec wrote​:

Got it down to this small test program​:

#!/usr/bin/perl

use strict;
use re 'taint';

my(@​body) = (
"<mailto​:xxxx.xxxx\@​outlook.com>",
"A\x{B9}ker\x{E8}eva xxxx.xxxx\@​outlook.com \x{201D}",
);

for (@​body) {
s{ <? (?<!mailto​:) \b ( [a-z0-9.]+ \@​ \S+ ) \b
(?​: > | \s{1,10} (?!phone) [a-z]{2,11} : ) }{ }xgi;
}

perl 5.20.{0,1} :
Assertion failed​: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) +
i)), function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7455.
Abort trap

I think what’s happening is that the kludge to localise $1, etc. is
executed while the regexp is in an inconsistent state. rx->subbeg still
refers to the string from the previous match
('<mailto:xxxx.xxxx@outlook.com>'), but the offsets for $1 extend beyond
the end of that 30-character string:

(gdb) p rx->offs[1]
$8 = {
start = 12,
end = 33,
start_tmp = 12
}
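
The arithmetic behind the failed assertion can be sketched as follows. This is a minimal stand-alone model, not the code in regcomp.c: the names capture_pair and capture_fits are hypothetical, and it assumes that s points start bytes past rx->subbeg and that i is end - start, so the asserted condition reduces to sublen >= end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of rx->offs (byte offsets). */
typedef struct {
    ptrdiff_t start;
    ptrdiff_t end;
} capture_pair;

/* Returns 1 when the capture lies inside a saved buffer of `sublen` bytes,
 * 0 otherwise. Models the asserted condition
 *     sublen >= (s - subbeg) + i
 * under the assumption s = subbeg + start and i = end - start. */
int capture_fits(size_t sublen, capture_pair p)
{
    /* This sketch only handles captures that are actually set. */
    assert(p.start >= 0 && p.end >= p.start);

    size_t offset_of_s = (size_t)p.start;          /* s - subbeg */
    size_t length_i = (size_t)(p.end - p.start);   /* i */
    return sublen >= offset_of_s + length_i;
}
```

With the values from the gdb session (start = 12, end = 33) and the 30-character string left over from the previous match, capture_fits(30, (capture_pair){12, 33}) returns 0, which is the situation the assertion reports.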

A watchpoint on rx->offs shows that it gets swapped out here in regexec.c​:

2706 swap = prog->offs;
2707 /* do we need a save destructor here for eval dies? */
2708 Newxz(prog->offs, (prog->nparens + 1), regexp_paren_pair);
2709 DEBUG_BUFFERS_r(PerlIO_printf(Perl_debug_log,
2710 "rex=0x%"UVxf" saving offs​: orig=0x%"UVxf"
new=0x%"UVxf"\n"

when the backtrace is like this​:

#0 Perl_regexec_flags (my_perl=0x100803200, rx=0x10082fdf8,
stringarg=0x10060b658 "A¹kerèeva xxxx.xxxx@​outlook.com ”",
strend=0x10060b67d "", strbeg=0x10060b658 "A¹kerèeva xxxx.xxxx@​outlook.com
”", minend=0, sv=0x1008063e8, data=0x0, flags=1) at regexec.c​:2709
#1 0x0000000100247f3f in Perl_pp_subst (my_perl=0x100803200) at
pp_hot.c​:2120
#2 0x00000001001b847c in Perl_runops_debug (my_perl=0x100803200) at
dump.c​:2231
#3 0x000000010000a8ea in S_run_body (my_perl=0x100803200, oldscope=1) at
perl.c​:2416
#4 0x0000000100009905 in perl_run (my_perl=0x100803200) at perl.c​:2339
#5 0x0000000100072698 in main (argc=3, argv=0x7fff5fbffa78,
env=0x7fff5fbffa98) at miniperlmain.c​:120

So the ordering of some of this stuff needs to be rethought.
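
To illustrate why the ordering matters, here is a toy model of the kind of inconsistency described above. It is only a sketch under stated assumptions and does not reproduce what regexec.c or pp_subst actually do: toy_regex and its functions are hypothetical names, the initial offsets (12, 29) are illustrative, and the 37-byte length is taken from strend - strbeg in the backtrace (0x10060b67d - 0x10060b658).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the fields discussed above; all names are hypothetical. */
typedef struct {
    size_t sublen;  /* length in bytes of the saved subject copy */
    size_t start;   /* start offset of capture group 1 */
    size_t end;     /* end offset of capture group 1 */
} toy_regex;

/* A reader such as a $1 fetch is only safe when the capture ends
 * inside the saved copy. */
int toy_is_consistent(const toy_regex *rx)
{
    assert(rx != NULL);
    return rx->start <= rx->end && rx->end <= rx->sublen;
}

/* Step 1 of a new match: record the offsets found in the new subject. */
void toy_set_offsets(toy_regex *rx, size_t start, size_t end)
{
    rx->start = start;
    rx->end = end;
}

/* Step 2 of a new match: refresh the saved copy of the subject. */
void toy_set_subject(toy_regex *rx, size_t sublen)
{
    rx->sublen = sublen;
}
```

Starting from { 30, 12, 29 }, calling toy_set_offsets(&rx, 12, 33) makes toy_is_consistent return 0 until toy_set_subject(&rx, 37) has run. Anything that reads $1 between the two steps sees new offsets against the old buffer.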

A git bisect points me to this commit​:

commit 44a2ac7
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Fri Dec 29 22​:45​:51 2006 +0100

Re: [PATCH] Change implementation of %+ to use a proper tied hash interface and add support for %-
Message-ID: <9b18b3110612291245q792fe91cu69422d2b81bb4f0b@mail.gmail.com>

But I think it’s a false positive.

Yes it almost definitely is. That is the patch where
Perl_reg_numbered_buff_fetch()
was added so it could be reused. Prior to that I bet there was no assert.

I still do not see where this function is called. Can you show me the
backtrace from where the assert fires? I am as yet unable to replicate on
bleadperl.

Yves

@p5pRT p5pRT commented Sep 11, 2014

From @Hugmeir

On Thu, Sep 11, 2014 at 5​:51 PM, demerphq <demerphq@​gmail.com> wrote​:

On 11 September 2014 17​:36, Mark Martinec <Mark.Martinec@​ijs.si> wrote​:

I can reproduce this on 5.10-5.20 but only for debugging builds.

Indeed, I'm using a -DDEBUGGING perl.

Are you sure you have the script right?

Yes.

Hrm, well even on a DEBUGGING build I cannot replicate in blead.

Can you show me your perl -V and the output of MY version of your script?
(Attached)

Brian, if you happen to have your Configure options handy, I would appreciate
knowing what they are.

And again I find it very odd that a script which never reads a capture
buffer dies with this error.

ml99299​:perl-blead brfraser$ ./perl -Ilib ~/Downloads/rt122747.t
matched​: >>.xxxx@​outlook.com<<
Assertion failed​: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) +
i)), function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7552.
Abort trap​: 6

ml99299​:perl-blead brfraser$ ./perl -Ilib ~/Downloads/rt122747_2.t
Assertion failed​: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) +
i)), function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7552.
Abort trap​: 6

$ ./perl -Ilib -V
Summary of my perl5 (revision 5 version 21 subversion 4) configuration​:
  Local Commit​: ff5975c78387030c95c7f997ee4755c6256d8360
  Ancestor​: 2febb45
  Platform​:
  osname=darwin, osvers=13.3.0, archname=darwin-2level
  uname='darwin ml99299 13.3.0 darwin kernel version 13.3.0​: tue jun
3 21​:27​:35 pdt 2014; root​:xnu-2422.110.17~1release_x86_64 x86_64 '
  config_args='-des -Dusedevel -DDEBUGGING'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-fno-common -DPERL_DARWIN -DDEBUGGING
-fno-strict-aliasing -pipe -fstack-protector',
  optimize='-O3 -g',
  cppflags='-fno-common -DPERL_DARWIN -DDEBUGGING
-fno-strict-aliasing -pipe -fstack-protector'
  ccversion='', gccversion='4.2.1 Compatible Apple LLVM 5.1
(clang-503.0.40)', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define,
longdblsize=16, longdblkind=3
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -fstack-protector'
  libpth=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/5.1/lib
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib
/usr/lib
  libs=-ldbm -ldl -lm -lutil -lc
  perllibs=-ldl -lm -lutil -lc
  libc=, so=dylib, useshrplib=false, libperl=libperl.a
  gnulibc_version=''
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
  cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup
-fstack-protector'

Characteristics of this binary (from libperl)​:
  Compile-time options​: DEBUGGING HAS_TIMES PERLIO_LAYERS
  PERL_DONT_CREATE_GVSV
  PERL_HASH_FUNC_ONE_AT_A_TIME_HARD PERL_MALLOC_WRAP
  PERL_NEW_COPY_ON_WRITE PERL_PRESERVE_IVUV
  PERL_USE_DEVEL USE_64_BIT_ALL USE_64_BIT_INT
  USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE
  USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME
  USE_PERLIO USE_PERL_ATOF
  Locally applied patches​:
ff5975c78387030c95c7f997ee4755c6256d8360
  Built under darwin
  Compiled at Sep 11 2014 18​:10​:30
  %ENV​:
  PERL5LIB="/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all/lib/perl5​:/Volumes/git_tree/main/lib"
  PERLBREW_BASHRC_VERSION="0.67"
  PERLBREW_HOME="/Users/brfraser/.perlbrew"
  PERLBREW_LIB="all"
  PERLBREW_MANPATH="/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all/man​:/Users/brfraser/perl5/perlbrew/perls/perl-5.18.2/man"
  PERLBREW_PATH="/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all/bin​:/Users/brfraser/perl5/perlbrew/bin​:/Users/brfraser/perl5/perlbrew/perls/perl-5.18.2/bin"
  PERLBREW_PERL="perl-5.18.2"
  PERLBREW_ROOT="/Users/brfraser/perl5/perlbrew"
  PERLBREW_VERSION="0.67"
  PERL_LOCAL_LIB_ROOT="/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all"
  PERL_MB_OPT="--install_base /Users/brfraser/.perlbrew/libs/perl-5.18.2@​all"
  PERL_MM_OPT="INSTALL_BASE=/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all"
  @​INC​:
  lib
  /Users/brfraser/.perlbrew/libs/perl-5.18.2@​all/lib/perl5/darwin-2level
  /Users/brfraser/.perlbrew/libs/perl-5.18.2@​all/lib/perl5
  /Volumes/git_tree/main/lib
  /usr/local/lib/perl5/site_perl/5.21.4/darwin-2level
  /usr/local/lib/perl5/site_perl/5.21.4
  /usr/local/lib/perl5/5.21.4/darwin-2level
  /usr/local/lib/perl5/5.21.4
  .

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 18​:13, Brian Fraser <fraserbn@​gmail.com> wrote​:

On Thu, Sep 11, 2014 at 5​:51 PM, demerphq <demerphq@​gmail.com> wrote​:

On 11 September 2014 17​:36, Mark Martinec <Mark.Martinec@​ijs.si> wrote​:

I can reproduce this on 5.10-5.20 but only for debugging builds.

Indeed, I'm using a -DDEBUGGING perl.

Are you sure you have the script right?

Yes.

Hrm, well even on a DEBUGGING build I cannot replicate in blead.

Can you show me your perl -V and the output of MY version of your
script?
(Attached)

Brian, if you happen to have your Configure options handy, I would
appreciate knowing what they are.

And again I find it very odd that a script which never reads a capture
buffer dies with this error.

ml99299​:perl-blead brfraser$ ./perl -Ilib ~/Downloads/rt122747.t
matched​: >>.xxxx@​outlook.com<<
Assertion failed​: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) +
i)), function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7552.
Abort trap​: 6

ml99299​:perl-blead brfraser$ ./perl -Ilib ~/Downloads/rt122747_2.t
Assertion failed​: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) +
i)), function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7552.
Abort trap​: 6

Well, this doesn't make any sense to me.

Can you show me the exact Configure you used, because so far every one I
have tried does not fail?

For instance I used this​:

./Configure -Doptimize=-g -d -Dusedevel -Dusethreads -Dcc=ccache\ gcc
-Dld=gcc -DDEBUGGING -Accflags="-msse2 -mssse3 -maes"

and this​:

./Configure -Doptimize=-g -d -Dusedevel -Dcc=ccache\ gcc -Dld=gcc
-DDEBUGGING -Accflags="-msse2 -mssse3 -maes"

and neither produces the failure you describe.

I would very much like to help fix this, but if I can't replicate it I can't
help.

$ ./perl -Ilib -V
Summary of my perl5 (revision 5 version 21 subversion 4) configuration​:
Local Commit​: ff5975c78387030c95c7f997ee4755c6256d8360
Ancestor​: 2febb45
Platform​:
osname=darwin, osvers=13.3.0, archname=darwin-2level
uname='darwin ml99299 13.3.0 darwin kernel version 13.3.0​: tue jun
3 21​:27​:35 pdt 2014; root​:xnu-2422.110.17~1release_x86_64 x86_64 '
config_args='-des -Dusedevel -DDEBUGGING'
hint=recommended, useposix=true, d_sigaction=define
useithreads=undef, usemultiplicity=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler​:
cc='cc', ccflags ='-fno-common -DPERL_DARWIN -DDEBUGGING
-fno-strict-aliasing -pipe -fstack-protector',
optimize='-O3 -g',
cppflags='-fno-common -DPERL_DARWIN -DDEBUGGING
-fno-strict-aliasing -pipe -fstack-protector'
ccversion='', gccversion='4.2.1 Compatible Apple LLVM 5.1
(clang-503.0.40)', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define,
longdblsize=16, longdblkind=3
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries​:
ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags ='
-fstack-protector'

libpth=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/5.1/lib

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib
/usr/lib
libs=-ldbm -ldl -lm -lutil -lc
perllibs=-ldl -lm -lutil -lc
libc=, so=dylib, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking​:
dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup
-fstack-protector'

Characteristics of this binary (from libperl)​:
Compile-time options​: DEBUGGING HAS_TIMES PERLIO_LAYERS
PERL_DONT_CREATE_GVSV
PERL_HASH_FUNC_ONE_AT_A_TIME_HARD PERL_MALLOC_WRAP
PERL_NEW_COPY_ON_WRITE PERL_PRESERVE_IVUV
PERL_USE_DEVEL USE_64_BIT_ALL USE_64_BIT_INT
USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE
USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME
USE_PERLIO USE_PERL_ATOF
Locally applied patches​:
ff5975c78387030c95c7f997ee4755c6256d8360
Built under darwin
Compiled at Sep 11 2014 18​:10​:30
%ENV​:
PERL5LIB="/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all
/lib/perl5​:/Volumes/git_tree/main/lib"
PERLBREW_BASHRC_VERSION="0.67"
PERLBREW_HOME="/Users/brfraser/.perlbrew"
PERLBREW_LIB="all"
PERLBREW_MANPATH="/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all
/man​:/Users/brfraser/perl5/perlbrew/perls/perl-5.18.2/man"
PERLBREW_PATH="/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all
/bin​:/Users/brfraser/perl5/perlbrew/bin​:/Users/brfraser/perl5/perlbrew/perls/perl-5.18.2/bin"
PERLBREW_PERL="perl-5.18.2"
PERLBREW_ROOT="/Users/brfraser/perl5/perlbrew"
PERLBREW_VERSION="0.67"
PERL_LOCAL_LIB_ROOT="/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all"
PERL_MB_OPT="--install_base
/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all"

PERL_MM_OPT="INSTALL_BASE=/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all"

You have a lot of environment references to Perl 5.18.2. I wonder if that
is relevant?

@​INC​:
lib
/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all/lib/perl5/darwin-2level
/Users/brfraser/.perlbrew/libs/perl-5.18.2@​all/lib/perl5
/Volumes/git_tree/main/lib
/usr/local/lib/perl5/site_perl/5.21.4/darwin-2level
/usr/local/lib/perl5/site_perl/5.21.4
/usr/local/lib/perl5/5.21.4/darwin-2level
/usr/local/lib/perl5/5.21.4
.

Here is my perl -V​:

$ ./perl -Ilib -V
Summary of my perl5 (revision 5 version 21 subversion 4) configuration​:
  Derived from​: d6f85a5
  uname='linux shire 3.8.0-19-generic #30-ubuntu smp wed may 1 16​:35​:23
utc 2013 x86_64 x86_64 x86_64 gnulinux '
  config_args='-Doptimize=-g -d -Dusedevel -Dusethreads -Dcc=ccache gcc
-Dld=gcc -DDEBUGGING -Accflags=-msse2 -mssse3 -maes'
  hint=previous, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='ccache gcc', ccflags ='-msse2 -mssse3 -msse4 -maes -fwrapv
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -msse2 -mssse3 -maes -msse2
-mssse3 -maes',
  optimize='-O2 -g',
  cppflags='-msse2 -mssse3 -msse4 -maes -fwrapv -fno-strict-aliasing
-pipe -fstack-protector -I/usr/local/include -msse2 -mssse3 -msse4 -maes
-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -msse2 -mssse3 -maes -msse2
-mssse3 -msse4 -maes -fwrapv -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -msse2
-mssse3 -maes -msse2 -mssse3 -maes'
  ccversion='', gccversion='4.7.3', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16,
longdblkind=3
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='gcc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.7/include-fixed
/usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /lib64 /usr/lib64
/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.7/include-fixed
/usr/include/x86_64-linux-gnu /usr/lib /usr/local/lib
/usr/lib/gcc/x86_64-linux-gnu/4.7/include-fixed
/usr/include/x86_64-linux-gnu /usr/lib
  libs=-lnsl -ldl -lm -lcrypt -lutil -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=libc-2.17.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.17'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib
-fstack-protector'

Characteristics of this binary (from libperl)​:
  Compile-time options​: HAS_TIMES PERLIO_LAYERS PERL_DONT_CREATE_GVSV
  PERL_HASH_FUNC_ONE_AT_A_TIME_HARD PERL_MALLOC_WRAP
  PERL_NEW_COPY_ON_WRITE PERL_PRESERVE_IVUV
  PERL_USE_DEVEL USE_64_BIT_ALL USE_64_BIT_INT
  USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE
  USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME
  USE_PERLIO USE_PERL_ATOF
  Locally applied patches​:
uncommitted-changes
  Built under linux
  Compiled at Sep 11 2014 18​:11​:23
  %ENV​:
  PERLBREW_BASHRC_VERSION="0.67"
  PERLBREW_CONFIGURE_FLAGS="-de -Dcc=ccache\ gcc -Dld=gcc"
  PERLBREW_HOME="/home/yorton/.perlbrew"
  PERLBREW_MANPATH=""
  PERLBREW_PATH="/home/yorton/perl5/perlbrew/bin"
  PERLBREW_ROOT="/home/yorton/perl5/perlbrew"
  PERLBREW_VERSION="0.67"
  @​INC​:
  lib
  /usr/local/lib/perl5/site_perl/5.21.4/x86_64-linux
  /usr/local/lib/perl5/site_perl/5.21.4
  /usr/local/lib/perl5/5.21.4/x86_64-linux
  /usr/local/lib/perl5/5.21.4

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 11, 2014

From Mark.Martinec@ijs.si

I have repeated the exercise with a fresh install of perl-blead
under perlbrew - with same results.

$ perlbrew install blead -DDEBUGGING

$ perlbrew use perl-blead

$ perl -V
Summary of my perl5 (revision 5 version 21 subversion 4) configuration​:
  Snapshot of​: 2febb45
  Platform​:
  osname=freebsd, osvers=10.0-stable, archname=amd64-freebsd
  uname='freebsd neli.ijs.si 10.0-stable freebsd 10.0-stable #1
r269624m​: wed aug 6 15​:31​:56 cest 2014
mark@​neli.ijs.si​:usrobjusrsrcsysneli amd64 '
  config_args='-de -Dprefix=/home/mark/perl5/perlbrew/perls/perl-blead
-DDEBUGGING -Dusedevel'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -DDEBUGGING
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-D_FORTIFY_SOURCE=2',
  optimize='-O -g',
  cppflags='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -DDEBUGGING
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.2.1 Compatible FreeBSD Clang 3.4.1
(tags/RELEASE_34/dot1-final 208032)', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16,
longdblkind=3
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags ='-Wl,-E -fstack-protector -L/usr/local/lib'
  libpth=/usr/lib /usr/local/lib /usr/include/clang/3.4.1 /usr/lib
  libs=-lgdbm -lm -lcrypt -lutil -lc
  perllibs=-lm -lcrypt -lutil -lc
  libc=, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version=''
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
  cccdlflags='-DPIC -fPIC', lddlflags='-shared -L/usr/local/lib
-fstack-protector'

Characteristics of this binary (from libperl)​:
  Compile-time options​: DEBUGGING HAS_TIMES PERLIO_LAYERS
  PERL_DONT_CREATE_GVSV
  PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
PERL_MALLOC_WRAP
  PERL_NEW_COPY_ON_WRITE PERL_PRESERVE_IVUV
  PERL_USE_DEVEL USE_64_BIT_ALL USE_64_BIT_INT
  USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE
  USE_LOCALE_CTYPE USE_LOCALE_NUMERIC
USE_LOCALE_TIME
  USE_PERLIO USE_PERL_ATOF
  Built under freebsd
  Compiled at Sep 11 2014 19​:40​:55
  %ENV​:
  PERLBREW_BASHRC_VERSION="0.59"
  PERLBREW_HOME="/home/mark/.perlbrew"
  PERLBREW_MANPATH="/home/mark/perl5/perlbrew/perls/perl-blead/man"
 
PERLBREW_PATH="/home/mark/perl5/perlbrew/bin​:/home/mark/perl5/perlbrew/perls/perl-blead/bin"
  PERLBREW_PERL="perl-blead"
  PERLBREW_ROOT="/home/mark/perl5/perlbrew"
  PERLBREW_VERSION="0.59"
  @​INC​:
 
/home/mark/perl5/perlbrew/perls/perl-blead/lib/site_perl/5.21.4/amd64-freebsd
  /home/mark/perl5/perlbrew/perls/perl-blead/lib/site_perl/5.21.4
  /home/mark/perl5/perlbrew/perls/perl-blead/lib/5.21.4/amd64-freebsd
  /home/mark/perl5/perlbrew/perls/perl-blead/lib/5.21.4
  .

$ type perl
perl is hashed (/home/mark/perl5/perlbrew/perls/perl-blead/bin/perl)

$ perl ~/rt122747.t
matched​: >>.xxxx@​outlook.com<<
Assertion failed​: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) +
i)), function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7552.
Abort trap
$

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

Thanks. I am still working out why I have not been able to get Perl to
build such that assert traps do anything. I think it is something stupid on
my part.

I changed the code to not use an assert but rather a simple if and I am
able to trigger the bug.
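
Swapping the assert for a plain if is a handy way to observe the condition in a build where assertions are compiled out. The difference can be sketched in standard C as below. This is not the change Yves made and it does not use Perl's own macros. The function names are hypothetical, and it relies only on the standard behaviour that assert() expands to nothing when NDEBUG is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Variant 1: the check vanishes when NDEBUG is defined at compile time,
 * so a build without assertions never notices the bad offsets. */
void bounds_check_assert(size_t sublen, size_t offset, size_t len)
{
    assert(sublen >= offset + len);
    (void)sublen; (void)offset; (void)len;  /* avoid unused warnings under NDEBUG */
}

/* Variant 2: an ordinary `if`, present in every build.
 * Returns 0 when the capture fits, -1 (after logging) when it does not. */
int bounds_check_if(size_t sublen, size_t offset, size_t len)
{
    if (sublen < offset + len) {
        fprintf(stderr, "capture out of bounds: sublen=%zu, need=%zu\n",
                sublen, offset + len);
        return -1;
    }
    return 0;
}
```

For the values reported in this ticket, bounds_check_if(30, 12, 21) logs a message and returns -1 whether or not assertions are enabled, while bounds_check_assert(30, 12, 21) aborts only in a build that keeps assertions.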

I am going to try to get to the bottom of this today.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 20​:23, demerphq <demerphq@​gmail.com> wrote​:

Thanks. I am still working out why I have not been able to get Perl to
build such that assert traps do anything. I think it is something stupid on
my part.

Which was that you need to do a git clean -dfX to wipe some things created
by Configure in a previous run.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 17​:51, demerphq <demerphq@​gmail.com> wrote​:

On 11 September 2014 17​:36, Mark Martinec <Mark.Martinec@​ijs.si> wrote​:

I can reproduce this on 5.10-5.20 but only for debugging builds.

Indeed, I'm using a -DDEBUGGING perl.

Are you sure you have the script right?

Yes.

Hrm, well even on a DEBUGGING build I cannot replicate in blead.

Now I can. Configure being overly "helpful" meant I was not building with
DEBUGGING even though I thought I was.

Now that I can replicate it, I have determined that the use re 'taint'; is
apparently unnecessary; the script I posted, which prints $1, will trigger it
as well. Reattached in this mail...

Since the re taint is not necessary this means the relation to the utf8
loading and save_re_context and things like that is irrelevant.

Yves


@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

#!/usr/bin/perl

use strict;

my(@body) = (
  "<mailto:xxxx.xxxx\@outlook.com>",
  "A\x{B9}ker\x{E8}eva xxxx.xxxx\@outlook.com \x{201D}",
);

for (@body) {
  s{ <? (?<!mailto:) \b ( [a-z0-9.]+ \@ \S+ ) \b
  (?: > | \s{1,10} (?!phone) [a-z]{2,11} : ) }{ }xgi;
  print "matched: >>$1<<\n";
}


@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 20​:41, demerphq <demerphq@​gmail.com> wrote​:

Now that I can replicate it, I have determined that the use re 'taint'; is
apparently unnecessary; the script I posted, which prints $1, will trigger it
as well. Reattached in this mail...

Since the re taint is not necessary this means the relation to the utf8
loading and save_re_context and things like that is irrelevant.

I was misreading things. In fact this is relevant​:

#0 0x00007ffff70e9037 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff70ec698 in __GI_abort () at abort.c:90
#2 0x00007ffff70e1e03 in __assert_fail_base (fmt=0x7ffff7239158 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7ce6b8 "(STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) + i)", file=file@entry=0x7cb018 "regcomp.c", line=line@entry=7552, function=function@entry=0x7d5430 <__PRETTY_FUNCTION__.17671> "Perl_reg_numbered_buff_fetch") at assert.c:92
#3 0x00007ffff70e1eb2 in __GI___assert_fail (assertion=0x7ce6b8 "(STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) + i)", file=0x7cb018 "regcomp.c", line=7552, function=0x7d5430 <__PRETTY_FUNCTION__.17671> "Perl_reg_numbered_buff_fetch") at assert.c:101
#4 0x000000000051689b in Perl_reg_numbered_buff_fetch (my_perl=0xa95010, r=0xac4b68, paren=1, sv=0xacd388) at regcomp.c:7552
#5 0x00000000005721e6 in Perl_magic_get (my_perl=0xa95010, sv=0xacd388, mg=0xad36f8) at mg.c:789
#6 0x000000000056fc1f in Perl_mg_get (my_perl=0xa95010, sv=0xacd388) at mg.c:199
#7 0x000000000066352c in Perl_save_scalar (my_perl=0xa95010, gv=0xacd370) at scope.c:206
#8 0x0000000000542044 in Perl_save_re_context (my_perl=0xa95010) at regcomp.c:16814
#9 0x00000000007183cb in Perl__core_swash_init (my_perl=0xa95010, pkg=0x862a4e "utf8", name=0x862a09 "ToCf", listsv=0xa95138, minbits=4, none=0, invlist=0x0, flags_p=0x0) at utf8.c:2346
#10 0x0000000000716945 in Perl_to_utf8_case (my_perl=0xa95010, p=0xabf0ba "”", ustrp=0x7fffffffd390 "\f", lenp=0x7fffffffd338, swashp=0xa958c8, normal=0x862a09 "ToCf", special=0x86272a "") at utf8.c:1800
#11 0x0000000000717c24 in Perl__to_utf8_fold_flags (my_perl=0xa95010, p=0xabf0ba "”", ustrp=0x7fffffffd390 "\f", lenp=0x7fffffffd338, flags=2 '\002') at utf8.c:2161
#12 0x0000000000721be7 in Perl_foldEQ_utf8_flags (my_perl=0xa95010, s1=0xacc974 "phone", pe1=0x0, l1=5, u1=false, s2=0xabf0ba "”", pe2=0x7fffffffd500, l2=0, u2=true, flags=0) at utf8.c:4044
#13 0x0000000000701990 in S_regmatch (my_perl=0xa95010, reginfo=0x7fffffffddf0, startpos=0xabf0a4 "xxxx.xxxx@outlook.com ”", prog=0xacc8c8) at regexec.c:4561
#14 0x00000000006fb104 in S_regtry (my_perl=0xa95010, reginfo=0x7fffffffddf0, startposp=0x7fffffffdc50) at regexec.c:3231
#15 0x00000000006fa9fc in Perl_regexec_flags (my_perl=0xa95010, rx=0xac4b68, stringarg=0xabf098 "A¹kerèeva xxxx.xxxx@outlook.com ”", strend=0xabf0ac "x@outlook.com ”", strbeg=0xabf098 "A¹kerèeva xxxx.xxxx@outlook.com ”", minend=0, sv=0xa98078, data=0x0, flags=1) at regexec.c:3090
#16 0x00000000005bd269 in Perl_pp_subst (my_perl=0xa95010) at pp_hot.c:2120
#17 0x000000000055ad69 in Perl_runops_debug (my_perl=0xa95010) at dump.c:2353
#18 0x000000000045e9a2 in S_run_body (my_perl=0xa95010, oldscope=1) at perl.c:2416
#19 0x000000000045dd66 in perl_run (my_perl=0xa95010) at perl.c:2339
#20 0x000000000041b35d in main (argc=3, argv=0x7fffffffe3f8, env=0x7fffffffe418) at perlmain.c:114

Do we really need to use the regex engine for swash init? Wouldn't the
sanest way to solve this class of bugs be to change how we store and
represent swashes?
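
To make the assertion in frame #4 concrete, here is a toy C model of the check. It is not perl's actual code. It assumes that `s` points at the start of the capture inside the saved copy of the string and that `i` is the capture length, so the assertion amounts to "the capture must end within `sublen` bytes". The struct `toy_rx` and both function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the fields the assertion mentions; NOT perl's struct regexp. */
struct toy_rx {
    size_t    sublen;  /* length of the saved copy of the matched string */
    ptrdiff_t start;   /* offset where the capture begins, -1 if unset */
    ptrdiff_t end;     /* offset just past the end of the capture */
};

/* Returns 1 when the capture lies inside the saved buffer, 0 otherwise. */
int toy_capture_in_bounds(const struct toy_rx *rx)
{
    if (rx->start < 0 || rx->end < rx->start)
        return 0;                                  /* unset or nonsensical offsets */
    size_t s_off = (size_t)rx->start;              /* models (s - rx->subbeg) */
    size_t i     = (size_t)(rx->end - rx->start);  /* models i */
    return rx->sublen >= s_off + i;
}

/* Mirrors the shape of the failing check: abort if the invariant is broken. */
size_t toy_capture_len(const struct toy_rx *rx)
{
    assert(toy_capture_in_bounds(rx));
    return (size_t)(rx->end - rx->start);
}
```

If, as the backtrace suggests, $1 is read while the outer match is only partly done, the offsets and the saved buffer can disagree. A `toy_rx` with `sublen` 1 and offsets 9 to 30 models that state: `toy_capture_in_bounds` returns 0, and `toy_capture_len` would abort on a build with assertions enabled.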

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 21​:32, demerphq <demerphq@​gmail.com> wrote​:

Do we really need to use the regex engine for swash init? Wouldn't the
sanest way to solve this class of bugs be to change how we store and
represent swashes?

Or maybe trigger the swash_init() during regex *compile*.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


@p5pRT p5pRT commented Sep 11, 2014

From @cpansprout

On Thu Sep 11 12​:32​:57 2014, demerphq wrote​:

I was misreading things. In fact this is relevant:

So you’ve beaten me to the backtrace.

Do we really need to use the regex engine for swash init? Wouldn't the
sanest way to solve this class of bugs be to change how we store and
represent swashes?

Short of that, could we just stop the init code from using regexps itself?

That said, is it even necessary at present to localise $1 et al.? As long as the swash init code does not access $1 after a failed match, would it matter that the current (outer) regexp is in an inconsistent state? Or could we make PL_curpm null during the swash init?

--

Father Chrysostomos


@p5pRT p5pRT commented Sep 11, 2014

From @cpansprout

On Thu Sep 11 11​:26​:03 2014, demerphq wrote​:

Which was that you need to do a git clean -dfX to wipe some things
created
by Configure in a previous run.

‘rm config.sh Policy.sh’ usually works for me, but I may be missing something. At least I don’t have to rebuild everything, though.

--

Father Chrysostomos


@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 21​:40, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

So you’ve beaten me to the backtrace.

Yes, sorry about that. Configure was driving me bonkers.

Do we really need to use the regex engine for swash init? Wouldn't the
sanest way to solve this class of bugs be to change how we store and
represent swashes?

Short of that, could we just stop the init code from using regexps itself?

That is what I meant.

That said, is it even necessary at present to localise $1 et al.? As long
as the swash init code does not access $1 after a failed match, would it
matter that the current (outer) regexp is in an inconsistent state? Or
could we make PL_curpm null during the swash init?

The latter is an amazingly good idea and I am testing a patch that does
exactly that as I type.

Yves


@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 21​:53, demerphq <demerphq@​gmail.com> wrote​:

The latter is an amazingly good idea and I am testing a patch that does
exactly that as I type.

It passed basic regex tests. I am running a full test now, but at the same
time I pushed smoke-me/rt_122747 which includes

commit 55b10d6
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Thu Sep 11 21​:55​:08 2014 +0200

  perl #122747​: localize PL_curpm to null in _core_swash_init

  This is a naive patch to set PL_curpm to null before we do any
  swash intialization. This "hides" the current regop from the swash
  code, with the intent of prevent weird reentrancy bugs.

  Thanks to FC for the suggestion!

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 22​:00, demerphq <demerphq@​gmail.com> wrote​:

It passed basic regex tests. I am running a full test now, but at the same
time I pushed smoke-me/rt_122747 which includes

commit 55b10d6
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Thu Sep 11 21​:55​:08 2014 +0200

  perl #122747: localize PL_curpm to null in _core_swash_init

  This is a naive patch to set PL_curpm to null before we do any
  swash intialization. This "hides" the current regop from the swash
  code, with the intent of prevent weird reentrancy bugs.

  Thanks to FC for the suggestion!

And now I just pushed to blead​:

commit 2c1f00b
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Thu Sep 11 21​:55​:08 2014 +0200

  perl #122747​: localize PL_curpm to null in _core_swash_init

  Set PL_curpm to null before we do any swash intialization
  in _core_swash_init(). This "hides" the current regop from the
  swash code, with the intent of prevent weird reentrancy bugs
  when the swashes are initialized.

  Long term you could argue that we should just not use the regex
  engine to initialize a swash, and then this would be unnecessary.

  Thanks to FC for the suggestion!
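
The idea in that commit message, hiding the "current match" pointer while reentrant init code runs and putting it back afterwards, can be sketched in plain C. This is only an illustration of the pattern, not the code in 2c1f00b. Every name below (`toy_match`, `toy_cur_match`, `toy_table_init`, and so on) is hypothetical, and `toy_cur_match` merely plays the role that PL_curpm plays in perl.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are perl's real names or types. */
struct toy_match { const char *capture1; };

struct toy_match *toy_cur_match = NULL;  /* plays the role of PL_curpm */
int toy_init_saw_outer_match = 0;        /* records what the init code observed */

/* Init code that, like the swash loader, may look at "the current match". */
void toy_table_init_body(void)
{
    if (toy_cur_match != NULL)
        toy_init_saw_outer_match = 1;    /* would be reading half-finished state */
}

/* Hide the in-progress outer match during the init, then restore it. */
void toy_table_init(void)
{
    struct toy_match *saved = toy_cur_match;
    toy_cur_match = NULL;
    toy_table_init_body();
    toy_cur_match = saved;
    assert(toy_cur_match == saved);      /* the caller sees its state unchanged */
}
```

A caller that sets `toy_cur_match` to its own match and then calls `toy_table_init()` finds the pointer unchanged afterwards, while the init body only ever saw NULL. The sketch ignores non-local exits; code inside perl would also have to restore the pointer if the init code dies.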

I believe that this ticket can be closed.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


@p5pRT p5pRT commented Sep 11, 2014

From @cpansprout

On Thu Sep 11 13​:49​:31 2014, demerphq wrote​:

And now I just pushed to blead​:

commit 2c1f00b
...
I believe that this ticket can be closed.

Wait. Two things​:

• Let’s commit a test.
• Can we remove the $1 localisation?

That localisation doesn’t make much sense to me, even without your PL_curpm change. Saving and restoring the value of something that is just a proxy for a value stored elsewhere is weird. Can we just delete save_re_context?
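
A small C sketch of why saving a proxy is awkward. It assumes, as frames #5 to #7 of the backtrace earlier in this thread suggest, that saving $1 has to read it through its magic getter. The types and names here (`toy_scalar`, `toy_magic_get`, `toy_save_value`) are invented for illustration and are not perl's API.

```c
#include <assert.h>
#include <stddef.h>

/* A scalar that is either plain or a proxy whose value lives elsewhere. */
struct toy_scalar {
    int (*get)(void);  /* "magic" getter; NULL for a plain scalar */
    int value;         /* used only when get is NULL */
};

int toy_fetch_count = 0;     /* how many times the getter ran */
int toy_backing_store = 42;  /* where the proxy's value really lives */

int toy_magic_get(void)
{
    toy_fetch_count++;
    return toy_backing_store;
}

/* Saving a proxy's value has to read it, which runs the getter. */
int toy_save_value(const struct toy_scalar *sv)
{
    assert(sv != NULL);
    return sv->get ? sv->get() : sv->value;
}
```

The save both triggers a fetch at a moment the caller did not choose and yields a copy that goes stale as soon as the backing store changes. Restoring that copy later does nothing useful for a proxy.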

(I guess the latter issue doesn’t need to keep the ticket open.)

--

Father Chrysostomos


@p5pRT p5pRT commented Sep 11, 2014

From @demerphq

On 11 September 2014 23​:42, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Thu Sep 11 13​:49​:31 2014, demerphq wrote​:

And now I just pushed to blead​:

commit 2c1f00b
...
I believe that this ticket can be closed.

Wait. Two things​:

• Let’s commit a test.

Ah, ahem. Good catch. :-)

• Can we remove the $1 localisation?

That localisation doesn’t make much sense to me, even without your
PL_curpm change. Saving and restoring the value of something that is just
a proxy for a value stored elsewhere is weird. Can we just delete
save_re_context?

(I guess the latter issue doesn’t need to keep the ticket open.)

Agreed. Let's open a different ticket for that.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


@p5pRT p5pRT commented Sep 11, 2014

From Mark.Martinec@ijs.si

I believe that this ticket can be closed.

Thank you all for the fast resolution!

Will this be able to make it into 5.20.1-RC3 ?


@p5pRT p5pRT commented Sep 11, 2014

From @cpansprout

On Thu Sep 11 14​:59​:25 2014, mmartinec wrote​:

I believe that this ticket can be closed.

Thank you all for the fast resolution!

Will this be able to make it into 5.20.1-RC3 ?

Seeing that this is a crashing bug, it meets the policy. It gets my vote. (2c1f00b, that is, which alone is sufficient for maint.)

--

Father Chrysostomos


@p5pRT p5pRT commented Sep 12, 2014

From @iabyn

On Thu, Sep 11, 2014 at 03​:12​:12PM -0700, Father Chrysostomos via RT wrote​:

On Thu Sep 11 14​:59​:25 2014, mmartinec wrote​:

I believe that this ticket can be closed.

Thank you all for the fast resolution!

Will this be able to make it into 5.20.1-RC3 ?

Seeing that this is a crashing bug, it meets the policy. It gets my vote. (2c1f00b, that is, which alone is sufficient for maint.)

On the other hand, it's a long-standing (and clearly rare) issue, and
squeezing it in at the very last gasp into RC3 when it's had no time to
settle or be BBCed seems like a really good way to inadvertently break
5.20.1. There's always 5.20.2.

--
A walk of a thousand miles begins with a single step...
then continues for another 1,999,999 or so.

@p5pRT p5pRT commented Sep 12, 2014

From @jkeenan

On 09/12/2014 06​:27 AM, Dave Mitchell wrote​:

On Thu, Sep 11, 2014 at 03​:12​:12PM -0700, Father Chrysostomos via RT wrote​:

On Thu Sep 11 14​:59​:25 2014, mmartinec wrote​:

I believe that this ticket can be closed.

Thanks you all for the fast resolution!

Will this be able to make it into 5.20.1-RC3 ?

Seeing that this is a crashing bug, it meets the policy. It gets my vote. (2c1f00b, that is, which alone is sufficient for maint.)

On the other hand, its a long-standing (and clearly rare) issue, and
squeezing it in at the very last gasp into RC3 when it's had no time to
settle or be BBCed seems like a really good way to inadvertently break
5.20.1. There's always 5.20.2.

What-Dave-said++

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 13​:08, James E Keenan <jkeen@​verizon.net> wrote​:

On 09/12/2014 06​:27 AM, Dave Mitchell wrote​:

On Thu, Sep 11, 2014 at 03​:12​:12PM -0700, Father Chrysostomos via RT
wrote​:

On Thu Sep 11 14​:59​:25 2014, mmartinec wrote​:

I believe that this ticket can be closed.

Thanks you all for the fast resolution!

Will this be able to make it into 5.20.1-RC3 ?

Seeing that this is a crashing bug, it meets the policy. It gets my
vote. (2c1f00b, that is, which alone is sufficient for maint.)

On the other hand, its a long-standing (and clearly rare) issue, and
squeezing it in at the very last gasp into RC3 when it's had no time to
settle or be BBCed seems like a really good way to inadvertently break
5.20.1. There's always 5.20.2.

What-Dave-said++

FWIW, I think delaying for a minor release is a good idea. If Mark really
needs this he can cherry-pick the patch and build a custom perl.

Sorry for any inconvenience Mark, but this patch has characteristics (such
as being almost *too* easy) which make me think a bit of time to cook in
blead is a good idea.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 12, 2014

From Mark.Martinec@ijs.si

Seeing that this is a crashing bug, it meets the policy. It gets my
vote. (2c1f00b, that is, which alone is sufficient for maint.)

On the other hand, it's a long-standing (and clearly rare) issue, and
squeezing it in at the very last gasp into RC3 when it's had no time to
settle or be BBCed seems like a really good way to inadvertently break
5.20.1. There's always 5.20.2.

What-Dave-said++

FWIW, I think delaying for a minor release is a good idea. If Mark really
needs this he can cherry-pick the patch and build a custom perl.

Sorry for any inconvenience Mark, but this patch has characteristics (such
as being almost *too* easy) which make me think a bit of time to cook in
blead is a good idea.

Sure, understood. It is indeed uncomfortably close to a 5.20.1 release.

On the other hand, it's a long-standing (and clearly rare) issue

If the issue is otherwise harmless (for example, it causes no memory
corruption) and the assert failure can be safely ignored, perhaps there can
just be a warning in perldelta that -DDEBUGGING must not be used in regular use.

The event is not so rare (e.g. one such case per several days of mail
filtering), but goes by unnoticed as the system-installed perl is
usually not built with debugging. I had debugging enabled because of
trying out a fresh release candidate.
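
To illustrate why the failure goes unnoticed on most installations: a debug-only assertion is compiled out of a normal build, so the same violated invariant aborts a debugging build and passes silently otherwise. The following is a minimal C sketch of that mechanism, not perl's actual code. `DEBUG_ASSERT` and `span_fits` are hypothetical names, and the checked condition only mirrors the shape of the assertion reported in this ticket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a debug-only assertion macro; perl's real
 * macros differ. Compile with -DDEBUGGING to enable the check. */
#ifdef DEBUGGING
#  define DEBUG_ASSERT(cond) assert(cond)
#else
#  define DEBUG_ASSERT(cond) ((void)0)
#endif

/* Returns 1 if the span [offset, offset + len) fits inside a buffer of
 * sublen bytes, 0 otherwise. In a DEBUGGING build a span that does not
 * fit aborts the process before the function can return. */
static int span_fits(size_t sublen, size_t offset, size_t len)
{
    DEBUG_ASSERT(sublen >= offset + len);
    return sublen >= offset + len;
}
```

Compiled without -DDEBUGGING, `span_fits(10, 8, 5)` simply returns 0. Compiled with -DDEBUGGING (and without NDEBUG), the same call aborts, which is the kind of difference between a debugging perl and a system-installed one described above. Whether the unchecked case is actually harmless in perl is a separate question that this sketch does not answer.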

@p5pRT p5pRT commented Sep 12, 2014

From @demerphq

On 12 September 2014 17​:46, Mark Martinec <Mark.Martinec@​ijs.si> wrote​:

Seeing that this is a crashing bug, it meets the policy. It gets my
vote. (2c1f00b, that is, which alone is sufficient for maint.)

On the other hand, its a long-standing (and clearly rare) issue, and
squeezing it in at the very last gasp into RC3 when it's had no time to
settle or be BBCed seems like a really good way to inadvertently break
5.20.1. There's always 5.20.2.

What-Dave-said++

FWIW, I think delaying for a minor release is a good idea. If Mark really
needs this he can cherry-pick the patch and build a custom perl.

Sorry for any inconvenience Mark, but this patch has characteristics (such
as being almost *too* easy) which make me think a bit of time to cook in
blead is a good idea.

Sure, understood. It is indeed uncomfortably close to a 5.20.1 release.

On the other hand, its a long-standing (and clearly rare) issue

If the issue is otherwise not harmful (like causing memory corruption)
and the assert failure can be safely ignored, perhaps there can just be
a warning in perldelta that -DDEBUGGING must not be used in regular use.

I think that is correct. I would need to audit to be sure.

The event is not so rare (e.g. one such case per several days of mail
filtering),

Hrm. That is annoying.

but goes by unnoticed as the system-installed perl is
usually not built with debugging. I had debugging enabled because of
trying out a fresh release candidate.

I see.

I don't know if that changes the balance of issues enough to justify it
going into 5.20.1. I will leave that to others to decide.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 14, 2014

From @demerphq

On 11 September 2014 23​:42, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Thu Sep 11 13​:49​:31 2014, demerphq wrote​:

And now I just pushed to blead​:

commit 2c1f00b
...
I believe that this ticket can be closed.

Wait. Two things​:

• Let’s commit a test.

Done in​: 409c647

• Can we remove the $1 localisation?

That localisation doesn’t make much sense to me, even without your
PL_curpm change. Saving and restoring the value of something that is just
a proxy for a value stored elsewhere is weird. Can we just delete
save_re_context?

Did you already follow up on this?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 14, 2014

From @cpansprout

On Sun Sep 14 09​:57​:14 2014, demerphq wrote​:

On 11 September 2014 23​:42, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

• Can we remove the $1 localisation?

That localisation doesn’t make much sense to me, even without your
PL_curpm change. Saving and restoring the value of something that is just
a proxy for a value stored elsewhere is weird. Can we just delete
save_re_context?

Did you already follow up on this?

Yes, in these commits (in reverse order)​:

2018906 pp_ctl.c​: Remove junk from #endif
0ddd4a5 Mathomise save_re_context
e32ff4e pp_ctl.c​: Remove PL_curcop assignment
1a419e6 utf8.c​: Move an #ifndef for clarity
1ca1bae Remove obsolete comment from utf8.c
d28a925 Don’t call save_re_context
b4fa55d Gut Perl_save_re_context

--

Father Chrysostomos

@p5pRT p5pRT commented Sep 14, 2014

From @demerphq

On 14 September 2014 21​:49, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Sun Sep 14 09​:57​:14 2014, demerphq wrote​:

On 11 September 2014 23​:42, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

• Can we remove the $1 localisation?

That localisation doesn’t make much sense to me, even without your
PL_curpm change. Saving and restoring the value of something that is just
a proxy for a value stored elsewhere is weird. Can we just delete
save_re_context?

Did you already follow up on this?

Yes, in these commits (in reverse order)​:

2018906 pp_ctl.c​: Remove junk from #endif
0ddd4a5 Mathomise save_re_context
e32ff4e pp_ctl.c​: Remove PL_curcop assignment
1a419e6 utf8.c​: Move an #ifndef for clarity
1ca1bae Remove obsolete comment from utf8.c
d28a925 Don’t call save_re_context
b4fa55d Gut Perl_save_re_context

Great. Thanks. BTW, did you dig into the history of the function to see why
it was added in the first place? Did it ever make sense?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 14, 2014

From @cpansprout

On Sun Sep 14 14​:15​:18 2014, demerphq wrote​:

On 14 September 2014 21​:49, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Sun Sep 14 09​:57​:14 2014, demerphq wrote​:

On 11 September 2014 23​:42, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

• Can we remove the $1 localisation?

That localisation doesn’t make much sense to me, even without your
PL_curpm change. Saving and restoring the value of something that is just
a proxy for a value stored elsewhere is weird. Can we just delete
save_re_context?

Did you already follow up on this?

Yes, in these commits (in reverse order)​:

2018906 pp_ctl.c​: Remove junk from #endif
0ddd4a5 Mathomise save_re_context
e32ff4e pp_ctl.c​: Remove PL_curcop assignment
1a419e6 utf8.c​: Move an #ifndef for clarity
1ca1bae Remove obsolete comment from utf8.c
d28a925 Don’t call save_re_context
b4fa55d Gut Perl_save_re_context

Great. Thanks. BTW, did you dig into the history of the function to see why
it was added in the first place?

7d75537 explains the history.

Did it ever make sense?

Yes, originally save_re_context saved a whole list of global variables used by the regexp engine (which no longer exist).

The localisation of $1 etc. was added later. I’m not sure that part ever made sense. I’m not sure either that it’s worth trying to figure it out. It was commit ada6e8a that did it, to fix bug #18107.
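
The point about $1 being "a proxy for a value stored elsewhere" can be pictured with a toy model in C. This is only an illustrative sketch under stated assumptions: `toy_match`, `toy_curpm`, and `toy_fetch_group1` are hypothetical names, and the real interpreter's data structures are considerably more involved. The global pointer plays a role loosely similar to the PL_curpm mentioned earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: type, field, and function names are hypothetical,
 * not perl's. */
struct toy_match {
    const char *subbeg;  /* saved copy of the matched string */
    size_t start;        /* offset where group 1 begins */
    size_t end;          /* offset just past the end of group 1 */
};

/* Points at the last successful match, or NULL if there is none. */
static const struct toy_match *toy_curpm = NULL;

/* The "proxy": copies group 1 of whatever toy_curpm points at into out
 * as a NUL-terminated string, truncating if needed. Returns the number
 * of bytes copied (0 if there is no match or no room). */
static size_t toy_fetch_group1(char *out, size_t outsize)
{
    if (toy_curpm == NULL || outsize == 0)
        return 0;
    size_t len = toy_curpm->end - toy_curpm->start;
    if (len >= outsize)
        len = outsize - 1;
    memcpy(out, toy_curpm->subbeg + toy_curpm->start, len);
    out[len] = '\0';
    return len;
}
```

In this model the fetch function holds no value of its own. What a caller sees depends entirely on which match `toy_curpm` points at when the read happens, so saving and restoring "the value of group 1" separately from the match pointer has nothing meaningful to save.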

--

Father Chrysostomos

@p5pRT p5pRT commented Sep 14, 2014

From @demerphq

On 14 September 2014 23​:30, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Sun Sep 14 14​:15​:18 2014, demerphq wrote​:

On 14 September 2014 21​:49, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

On Sun Sep 14 09​:57​:14 2014, demerphq wrote​:

On 11 September 2014 23​:42, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

• Can we remove the $1 localisation?

That localisation doesn’t make much sense to me, even without your
PL_curpm change. Saving and restoring the value of something that is just
a proxy for a value stored elsewhere is weird. Can we just delete
save_re_context?

Did you already follow up on this?

Yes, in these commits (in reverse order)​:

2018906 pp_ctl.c​: Remove junk from #endif
0ddd4a5 Mathomise save_re_context
e32ff4e pp_ctl.c​: Remove PL_curcop assignment
1a419e6 utf8.c​: Move an #ifndef for clarity
1ca1bae Remove obsolete comment from utf8.c
d28a925 Don’t call save_re_context
b4fa55d Gut Perl_save_re_context

Great. Thanks. BTW, did you dig into the history of the function to see why
it was added in the first place?

7d75537 explains the history.

Did it ever make sense?

Yes, originally save_re_context saved a whole list of global variables
used by the regexp engine (which no longer exist).

The localisation of $1 etc. was added later. I’m not sure that part ever
made sense. I’m not sure either that it’s worth trying to figure it out.
It was commit ada6e8a that did it, to fix bug #18107.

Cool thanks, that was very educational.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT p5pRT commented Sep 15, 2014

@cpansprout - Status changed from 'open' to 'pending release'

@p5pRT p5pRT commented Sep 15, 2014

From @iabyn

On Sun, Sep 14, 2014 at 02​:30​:41PM -0700, Father Chrysostomos via RT wrote​:

The localisation of $1 etc. was added later. I’m not sure that part
ever made sense. I’m not sure either that it’s worth trying to figure
it out. It was commit ada6e8a that did it, to fix bug #18107.

I've always had the strong suspicion that the $1 localization was an
incorrect bug-fix, and it's been on my list of things to look at. Thanks
for sorting it out.

--
The optimist believes that he lives in the best of all possible worlds.
As does the pessimist.

@p5pRT p5pRT commented Jun 2, 2015

From @khwilliamson

Thanks for submitting this ticket.

The issue should be resolved with the release today of Perl v5.22. If you find that the problem persists, feel free to reopen this ticket.

--
Karl Williamson for the Perl 5 porters team

@p5pRT p5pRT commented Jun 2, 2015

@khwilliamson - Status changed from 'pending release' to 'resolved'
