Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index segfaults on 2G strings in 64bit perl #13700

Closed
p5pRT opened this issue Apr 2, 2014 · 14 comments

Comments

@p5pRT
Copy link
Collaborator

@p5pRT p5pRT commented Apr 2, 2014

Migrated from rt.perl.org#121562 (status was 'resolved')

Searchable as RT121562$

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 2, 2014

From mike.chamberlain@pirum.com

Created by mike.chamberlain@pirum.com

This is a bug report for perl from mike.chamberlain@​pirum.com,
generated with the help of perlbug 1.39 running under perl 5.14.2.

-----------------------------------------------------------------

Hi

We're attemping to parse a large file over 2G in size, and it's segfaulting. We are scanning
the file to identify key points within it. Ultimately the simplest implementation of the bug is​:

chambm@​wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
Segmentation fault (core dumped)
chambm@​wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-2); $x .="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
END​: 2147483646

Regards

Mike

Perl Info

Flags:
    category=core
    severity=high

Site configuration information for perl 5.14.2:

Configured by Debian Project at Mon Sep 30 03:43:22 UTC 2013.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=3.2.0-4-amd64, archname=x86_64-linux-gnu-thread-multi
    uname='linux babin 3.2.0-4-amd64 #1 smp debian 3.2.46-1+deb7u1 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Dldflags= -Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.14 -Darchlib=/usr/lib/perl/5.14 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.14.2 -Dsitearch=/usr/local/lib/perl/5.14.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.14.2 -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.7.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=, so=so, useshrplib=true, libperl=libperl.so.5.14.2
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'

Locally applied patches:
    DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
    DEBPKG:fixes/respect_umask - Respect umask during installation
    DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
    DEBPKG:debian/extutils_set_libperl_path - EU:MM: Set location of libperl.a to /usr/lib
    DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
    DEBPKG:debian/prefix_changes - Fiddle with *PREFIX and variables written to the makefile
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/517938 http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
    DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
    DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option
    DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
    DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
    DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
    DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/580034 Point users to Debian packages of deprecated core modules
    DEBPKG:fixes/hurd-ccflags - [a190e64] http://bugs.debian.org/587901 [perl #92244] Make hints/gnu.sh append to $ccflags rather than overriding them
    DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
    DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
    DEBPKG:fixes/extutils-cbuilder-cflags - [011e8fb] http://bugs.debian.org/624460 [perl #89478] Append CFLAGS and LDFLAGS to their Config.pm counterparts in EU::CBuilder
    DEBPKG:fixes/module-build-home-directory - http://bugs.debian.org/624850 [rt.cpan.org #67893] Fix failing tilde test when run under a UID without a passwd entry
    DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.14.2-21+deb7u1 in patchlevel.h
    DEBPKG:fixes/h2ph-multiarch - [e7ec705] http://bugs.debian.org/625808 [perl #90122] Make h2ph correctly search gcc include directories
    DEBPKG:fixes/index-tainting - [3b36395] http://bugs.debian.org/291450 [perl #64804] RT 64804: tainting with index() of a constant
    DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
    DEBPKG:fixes/sys-syslog-socket-timeout-kfreebsd.patch - http://bugs.debian.org/627821 [rt.cpan.org #69997] Use a socket timeout on GNU/kFreeBSD to catch ICMP port unreachable messages
    DEBPKG:fixes/hurd-hints - http://bugs.debian.org/636609 Improve general GNU hints, needed for GNU/Hurd.
    DEBPKG:fixes/pod_fixes - [7698aed] http://bugs.debian.org/637816 Fix typos in several pod/perl*.pod files
    DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
    DEBPKG:fixes/digest_eval_hole - http://bugs.debian.org/644108 Close the eval "require $module" security hole in Digest->new($algorithm)
    DEBPKG:fixes/hurd-ndbm - [f0d0a20] [perl #102680] http://bugs.debian.org/645989 Add GNU/Hurd hints for NDBM_File
    DEBPKG:fixes/sysconf.t-posix - [8040185] [perl #102888] http://bugs.debian.org/646016 Fix hang in ext/POSIX/t/sysconf.t on GNU/Hurd
    DEBPKG:fixes/hurd-largefile - [1fda587] [perl #103014] http://bugs.debian.org/645790 enable LFS on GNU/Hurd
    DEBPKG:debian/hurd_test_todo_syslog - http://bugs.debian.org/650093 Disable failing GNU/Hurd tests in cpan/Sys-Syslog/t/syslog.t
    DEBPKG:fixes/hurd_skip_itimer_virtual - [rt.cpan.org #72754] http://bugs.debian.org/650094 Skip interval timer tests in Time::HiRes on GNU/Hurd
    DEBPKG:debian/hurd_test_skip_sigdispatch - http://bugs.debian.org/650188 Disable failing GNU/Hurd tests op/sigdispatch.t
    DEBPKG:debian/hurd_test_skip_stack - http://bugs.debian.org/650175 Disable failing GNU/Hurd tests dist/threads/t/stack.t
    DEBPKG:debian/hurd_test_skip_pipe - http://bugs.debian.org/650187 Disable failing GNU/Hurd tests io/pipe.t
    DEBPKG:debian/hurd_test_skip_io_pipe - http://bugs.debian.org/650096 Disable failing GNU/Hurd tests dist/IO/t/io_pipe.t
    DEBPKG:fixes/manpage_name_CPAN - http://bugs.debian.org/650448 [rt.cpan.org #73396] cpan/CPAN: add NAME headings in modules with POD
    DEBPKG:fixes/manpage_name_CPANPLUS - http://bugs.debian.org/650450 [rt.cpan.org #73398] cpan/CPANPLUS: add NAME headings in modules with POD
    DEBPKG:fixes/manpage_name_Test-Harness - http://bugs.debian.org/650451 [rt.cpan.org #73399] cpan/Test-Harness: add NAME headings in modules with POD
    DEBPKG:fixes/manpage_name_Term-UI - http://bugs.debian.org/650452 [rt.cpan.org #73400] cpan/Term-UI: add NAME headings in modules with POD
    DEBPKG:fixes/podlators_ae_ligature_fallback - http://bugs.debian.org/652851 Fix the ASCII fallback string for AE
    DEBPKG:fixes/fsf_postal_address - [de89470] Update references to the FSF's postal address
    DEBPKG:fixes/cpan_module_pod_fixes - [perl #106870] [rt.cpan.org #73447] [rt.cpan.org #73446] Fix POD formatting in Term-Cap and Pod-Parser
    DEBPKG:fixes/cgi_no_shellwords_pl - Use Text::ParseWords instead of shellwords.pl
    DEBPKG:fixes/path_max_fallback - [perl #109262] http://bugs.debian.org/656869 Don't use _POSIX_PATH_MAX as a fallback PATH_MAX
    DEBPKG:debian/makemaker-pasthru - http://bugs.debian.org/660195 [rt.cpan.org #28632] Make EU::MM pass LD through to recursive Makefile.PL invocations
    DEBPKG:fixes/propagate_tainted_errors.patch - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=663158 [perl #111654] properly propagate tainted errors
    DEBPKG:debian/perl5db-x-terminal-emulator.patch - http://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
    DEBPKG:fixes/socket_cache_propagate - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=659075 [rt.cpan.org #61577] [perl #112736] sockdomain and socktype undef on newly accepted sockets
    DEBPKG:fixes/ipc_open3 - [perl #114454] http://bugs.debian.org/683894 IPC::Open3::open3(..., '-') broken
    DEBPKG:fixes/string_repeat_overrun - http://bugs.debian.org/689314 [b675304] avoid calling memset with a negative count
    DEBPKG:debian/cpan-missing-site-dirs - http://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
    DEBPKG:fixes/kfreebsd-overrides - http://bugs.debian.org/689713 [perl #115324] [7dc6565] Remove unnecessary overrides in gnukfreebsd and gnuknetbsd hints.
    DEBPKG:fixes/tainted-smartmatch - [be88a5c] http://bugs.debian.org/690571 [perl #93590] $tainted ~~ [...] failing
    DEBPKG:fixes/regexp-matching-starter - [6e634c5] http://bugs.debian.org/690975 [perl #101710] Regression with /i, latin1 chars.
    DEBPKG:fixes/regexp-matching-fold - [399fb9c] http://bugs.debian.org/690976 regexec.c: Fix "\x{FB01}\x{FB00}" =~ /ff/i
    DEBPKG:fixes/regexp-matching-opposite-case - [dc91d5a] http://bugs.debian.org/690979 [perl #101970] /[[:lower:]]/i matches upper case
    DEBPKG:fixes/reading-glob-copy-handle - [fd1564b] http://bugs.debian.org/629363 [perl #92258] <$fh> hangs on a glob copy
    DEBPKG:fixes/smartmatch-rhs-precedence - http://bugs.debian.org/691102 [011be0b] Enforce Any ~~ Object smartmatch precedence
    DEBPKG:fixes/perlcheat-update - http://bugs.debian.org/691112 [ab0ae0a] Update PerlCheat to 5.14
    DEBPKG:fixes/cgi-cr-escaping - http://bugs.debian.org/693420 CR escaping for P3P and Set-Cookie headers
    DEBPKG:fixes/maketext-code-execution - [1735f6f] http://bugs.debian.org/695224 Fix misparsing of maketext strings.
    DEBPKG:fixes/storable-security-warning - [664f237] http://bugs.debian.org/695223 add a note about security concerns in Storable
    DEBPKG:fixes/digest-sha-doublefree - [rt.cpan.org #82655] http://bugs.debian.org/698172 [a8c6ff7] Fix a double-free bug in Digest::SHA
    DEBPKG:fixes/64bitint-signedness-wraparound - http://bugs.debian.org/698320 [94e529c] Avoid wraparound when casting unsigned size_t to signed ssize_t.
    DEBPKG:fixes/stdin-sigchld - http://bugs.debian.org/700171 [perl #116621] [be48bbe] add a couple missing LEAVEs in perlio_async_run()
    DEBPKG:fixes/hsplit-rehash - [d59e31f] http://bugs.debian.org/702296 Prevent premature hsplit() calls, and only trigger REHASH after hsplit()
    DEBPKG:fixes/encode-memleak - http://bugs.debian.org/702416 [5814803] Encode: Fixed a memory leak that occurred in the UTF-8 encoding.
    DEBPKG:fixes/threads_shared_elements_crash - [perl #119089] http://bugs.debian.org/718438 threads::shared should not crash if shared elements outlive their aggregate.
    DEBPKG:fixes/perlbug-patchlist - [3541c11] http://bugs.debian.org/710842 [perl #118433] Make perlbug look up the list of local patches at run time
    DEBPKG:fixes/digest_sha_double_free - [ee8c6f4] [rt.cpan.org #86295] http://bugs.debian.org/711206 maint-5.18: Digest-SHA crash fix in 5.85
    DEBPKG:fixes/pl_eval_start_use_after_free - [eae139f] [perl #115992] PL_eval_start use-after-free
    DEBPKG:fixes/regcomp_fix_segv - [ebb390a] [perl #115994] fix segv in regcomp.c:S_join_exact()
    DEBPKG:fixes/list_util_off_by_two - [623a911] fix off-by-two error in List::Util
    DEBPKG:fixes/sdbm_off_by_one - [7f5f08b] [perl #111586] sdbm.c: fix off-by-one access to global ".dir"
    DEBPKG:fixes/socket_unpack_sockaddr_un_heap_buffer_overflow - [e508642] [perl #111594] Socket::unpack_sockaddr_un heap-buffer-overflow


@INC for perl 5.14.2:
    ..
    /usr/local/pirum/lib
    /usr/local/pirum/
    /etc/perl
    /usr/local/lib/perl/5.14.2
    /usr/local/share/perl/5.14.2
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.14
    /usr/share/perl/5.14
    /usr/local/lib/site_perl
    .


Environment for perl 5.14.2:
    HOME=/home/chambm
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/pirum/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/local/mysql/bin
    PERL5LIB=..:/usr/local/pirum/lib:/usr/local/pirum/
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 3, 2014

From @tonycoz

On Wed Apr 02 02​:18​:05 2014, mike.chamberlain@​pirum.com wrote​:

We're attemping to parse a large file over 2G in size, and it's
segfaulting. We are scanning
the file to identify key points within it. Ultimately the simplest
implementation of the bug is​:

chambm@​wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-1); $x
.="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
Segmentation fault (core dumped)
chambm@​wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-2); $x
.="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
END​: 2147483646

Reproduced in blead​:

[tonyc@​dromedary-001 perl]$ ./perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
Segmentation fault
[tonyc@​dromedary-001 perl]$ ./perl -v

This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.10-34-g6447043)) built for x86_64-linux-thread-multi

It looks like pp_index suffers from the I32 bug, assuming fbm_instr() and rninstr() are safe it should be easy to fix.

Tony

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 3, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 9, 2014

From @tonycoz

On Thu Apr 03 15​:44​:40 2014, tonyc wrote​:

On Wed Apr 02 02​:18​:05 2014, mike.chamberlain@​pirum.com wrote​:

We're attemping to parse a large file over 2G in size, and it's
segfaulting. We are scanning
the file to identify key points within it. Ultimately the simplest
implementation of the bug is​:

chambm@​wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-1); $x
.="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
Segmentation fault (core dumped)
chambm@​wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-2); $x
.="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
END​: 2147483646

It looks like pp_index suffers from the I32 bug, assuming fbm_instr()
and rninstr() are safe it should be easy to fix.

Fix attached, for 5.21, though perhaps it should be in 5.20.

Tony

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 9, 2014

From @tonycoz

0001-perl-121562-fix-the-I32-bug-for-index-and-rindex.patch
From eca5fe3ca0af1af58bcf4ae90fb24ede7c9d4c56 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 9 Apr 2014 06:15:08 +0200
Subject: [perl #121562] fix the I32 bug for index() and rindex()

---
 MANIFEST         |    1 +
 pp.c             |    6 +++---
 t/bigmem/index.t |   26 ++++++++++++++++++++++++++
 3 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 t/bigmem/index.t

diff --git a/MANIFEST b/MANIFEST
index 9652cd5..190315f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -4920,6 +4920,7 @@ t/base/rs.t			See if record-read works
 t/base/term.t			See if various terms work
 t/base/while.t			See if while work
 t/benchmark/rt26188-speed-up-keys-on-empty-hash.t	Benchmark if keys on empty hashes is fast enough
+t/bigmem/index.t		Check that index() handles large offsets
 t/bigmem/pos.t			Check that pos() handles large offsets
 t/bigmem/read.t			Check read() handles large offsets
 t/bigmem/regexp.t		Test regular expressions with large strings
diff --git a/pp.c b/pp.c
index 4ec6887..071b4f0 100644
--- a/pp.c
+++ b/pp.c
@@ -3197,8 +3197,8 @@ PP(pp_index)
     SV *temp = NULL;
     STRLEN biglen;
     STRLEN llen = 0;
-    I32 offset;
-    I32 retval;
+    SSize_t offset;
+    SSize_t retval;
     const char *big_p;
     const char *little_p;
     bool big_utf8;
@@ -3287,7 +3287,7 @@ PP(pp_index)
     }
     if (offset < 0)
 	offset = 0;
-    else if (offset > (I32)biglen)
+    else if (offset > (SSize_t)biglen)
 	offset = biglen;
     if (!(little_p = is_index
 	  ? fbm_instr((unsigned char*)big_p + offset,
diff --git a/t/bigmem/index.t b/t/bigmem/index.t
new file mode 100644
index 0000000..0c3658c
--- /dev/null
+++ b/t/bigmem/index.t
@@ -0,0 +1,26 @@
+#!perl
+BEGIN {
+    chdir 't';
+    unshift @INC, "../lib";
+}
+
+use strict;
+require './test.pl';
+use Config qw(%Config);
+
+# some copying means we end up using 4GB, checked with top
+$ENV{PERL_TEST_MEMORY} >= 4
+    or skip_all("Need ~4GB for this test");
+$Config{ptrsize} >= 8
+    or skip_all("Need 64-bit pointers for this test");
+
+plan(tests => 2);
+
+my $space = " "; # avoid constant folding from doubling memory usage
+my $work = $space x 0x80000000 . "\n\n";
+
+# this would SEGV
+is(index($work, "\n"), 0x80000000, "test index() over 2G mark");
+
+# this would simply fail
+is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark");
-- 
1.7.1

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 10, 2014

From @ilmari

"Tony Cook via RT" <perlbug-followup@​perl.org> writes​:

On Thu Apr 03 15​:44​:40 2014, tonyc wrote​:

On Wed Apr 02 02​:18​:05 2014, mike.chamberlain@​pirum.com wrote​:

We're attemping to parse a large file over 2G in size, and it's
segfaulting. We are scanning
the file to identify key points within it. Ultimately the simplest
implementation of the bug is​:

chambm@​wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-1); $x
.="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
Segmentation fault (core dumped)
chambm@​wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-2); $x
.="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
END​: 2147483646

It looks like pp_index suffers from the I32 bug, assuming fbm_instr()
and rninstr() are safe it should be easy to fix.

Fix attached, for 5.21, though perhaps it should be in 5.20.

It needs to be switched to use sv_pos_(u2b|b2u)_flags as well, to handle
UTF-8 strings properly​:

pp.c​: In function ‘Perl_pp_index’​:
pp.c​:3284​:6​: warning​: passing argument 2 of ‘Perl_sv_pos_u2b’ from incompatible pointer type [enabled by default]
  sv_pos_u2b(big, &offset, 0);
  ^
In file included from perl.h​:5012​:0,
  from pp.c​:28​:
proto.h​:4266​:20​: note​: expected ‘I32 * const’ but argument is of type ‘ssize_t *’
PERL_CALLCONV void Perl_sv_pos_u2b(pTHX_ SV *const sv, I32 *const offsetp, I32 *const lenp)
  ^
pp.c​:3301​:6​: warning​: passing argument 2 of ‘Perl_sv_pos_b2u’ from incompatible pointer type [enabled by default]
  sv_pos_b2u(big, &retval);
  ^
In file included from perl.h​:5012​:0,
  from pp.c​:28​:
proto.h​:4256​:20​: note​: expected ‘I32 * const’ but argument is of type ‘ssize_t *’
PERL_CALLCONV void Perl_sv_pos_b2u(pTHX_ SV *const sv, I32 *const offsetp)
  ^

And adding utf8​::upgrade($work) to t/bigmem/index.t, gives​:

ilmari@​nurket​:~/src/perl/t$ PERL_TEST_MEMORY=4 ../perl -I../lib bigmem/index.t
1..2
panic​: sv_pos_b2u​: bad byte offset, blen=2147483650, byte=18446744071562067968 at bigmem/index.t line 23.
# Looks like you planned 2 tests but ran 0.

--
"A disappointingly low fraction of the human race is,
at any given time, on fire." - Stig Sandbeck Mathisen

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 10, 2014

From @karenetheridge

On Thu, Apr 03, 2014 at 03​:44​:41PM -0700, Tony Cook via RT wrote​:

Reproduced in blead​:

[tonyc@​dromedary-001 perl]$ ./perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
Segmentation fault
[tonyc@​dromedary-001 perl]$ ./perl -v

This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.10-34-g6447043)) built for x86_64-linux-thread-multi

FWIW, I cannot reproduce this on 64-bit darwin​:

$; perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
END​: 2147483647

$; perl -v
This is perl 5, version 19, subversion 10 (v5.19.10) built for darwin-2level
$; uname -a
Darwin bourbon 12.5.0 Darwin Kernel Version 12.5.0​: Sun Sep 29 13​:33​:47 PDT 2013; root​:xnu-2050.48.121/RELEASE_X86_64 x86_64
$; perl -V | grep 32
$; perl -V | grep 64
  uname='darwin bourbon 12.5.0 darwin kernel version 12.5.0​: sun sep 29 13​:33​:47 pdt 2013; root​:xnu-2050.48.12
1release_x86_64 x86_64 '
  use64bitint=define, use64bitall=define, uselongdouble=undef
  PERL_USE_DEVEL USE_64_BIT_ALL USE_64_BIT_INT

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 10, 2014

From @shlomif

On Wed Apr 09 17​:32​:07 2014, perl@​froods.org wrote​:

On Thu, Apr 03, 2014 at 03​:44​:41PM -0700, Tony Cook via RT wrote​:

Reproduced in blead​:

[tonyc@​dromedary-001 perl]$ ./perl -e 'my $x = " " x ((2**31 )-1); $x
.="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
Segmentation fault
[tonyc@​dromedary-001 perl]$ ./perl -v

This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.10-34-
g6447043)) built for x86_64-linux-thread-multi

For what it’s worth, I can reproduce this on Mageia Linux x86-64 5/Cauldron with both the Mageia perl-5.18.2-4.mga5 ( /usr/bin/perl ) and bleadperl​:

[SHELL]
shlomif@​telaviv1​:~$ /home/shlomif/apps/perl/bleadperl/bin/perl5.19.11 -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
Segmentation fault
shlomif@​telaviv1​:~$ /home/shlomif/apps/perl/bleadperl/bin/perl5.19.11 -v

This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.9-308-g935db47*)) built for x86_64-linux
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2014, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http​://www.perl.org/, the Perl Home Page.

shlomif@​telaviv1​:~$ perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
Segmentation fault
shlomif@​telaviv1​:~$ perl -v

This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-thread-multi
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2013, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http​://www.perl.org/, the Perl Home Page.

shlomif@​telaviv1​:~$
[/SHELL]

FWIW, I cannot reproduce this on 64-bit darwin​:

$; perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x,
"\n", 0); print "END​: $end\n"'
END​: 2147483647

Does valgrind complain about something when doing that?

Regards,

-- Shlomi Fish

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 14, 2014

From @tonycoz

On Wed Apr 09 17​:02​:58 2014, ilmari wrote​:

"Tony Cook via RT" <perlbug-followup@​perl.org> writes​:

On Thu Apr 03 15​:44​:40 2014, tonyc wrote​:

On Wed Apr 02 02​:18​:05 2014, mike.chamberlain@​pirum.com wrote​:

We're attemping to parse a large file over 2G in size, and it's
segfaulting. We are scanning
the file to identify key points within it. Ultimately the simplest
implementation of the bug is​:

chambm@​wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-1);
$x
.="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
Segmentation fault (core dumped)
chambm@​wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-2);
$x
.="\n"; my $end = index($x, "\n", 0); print "END​: $end\n"'
END​: 2147483646

It looks like pp_index suffers from the I32 bug, assuming
fbm_instr()
and rninstr() are safe it should be easy to fix.

Fix attached, for 5.21, though perhaps it should be in 5.20.

It needs to be switched to use sv_pos_(u2b|b2u)_flags as well, to
handle
UTF-8 strings properly​:

pp.c​: In function ‘Perl_pp_index’​:
pp.c​:3284​:6​: warning​: passing argument 2 of ‘Perl_sv_pos_u2b’ from
incompatible pointer type [enabled by default]
sv_pos_u2b(big, &offset, 0);
^
In file included from perl.h​:5012​:0,
from pp.c​:28​:
proto.h​:4266​:20​: note​: expected ‘I32 * const’ but argument is of type
‘ssize_t *’
PERL_CALLCONV void Perl_sv_pos_u2b(pTHX_ SV *const sv, I32 *const
offsetp, I32 *const lenp)
^
pp.c​:3301​:6​: warning​: passing argument 2 of ‘Perl_sv_pos_b2u’ from
incompatible pointer type [enabled by default]
sv_pos_b2u(big, &retval);
^
In file included from perl.h​:5012​:0,
from pp.c​:28​:
proto.h​:4256​:20​: note​: expected ‘I32 * const’ but argument is of type
‘ssize_t *’
PERL_CALLCONV void Perl_sv_pos_b2u(pTHX_ SV *const sv, I32 *const
offsetp)

Thanks, I looked for warnings when I built it, but must have missed them.

And adding utf8​::upgrade($work) to t/bigmem/index.t, gives​:

ilmari@​nurket​:~/src/perl/t$ PERL_TEST_MEMORY=4 ../perl -I../lib
bigmem/index.t
1..2
panic​: sv_pos_b2u​: bad byte offset, blen=2147483650,
byte=18446744071562067968 at bigmem/index.t line 23.
# Looks like you planned 2 tests but ran 0.

I should have tested the unicode path too.

Tony

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 14, 2014

From @rjbs

* Tony Cook via RT <perlbug-followup@​perl.org> [2014-04-09T00​:30​:36]

Fix attached, for 5.21, though perhaps it should be in 5.20.

Putting aside the subsequent amendment to the patch, do we want this in 5.20?
It's not a regression, but if we are quite confident in the fix, ti would be a
good thing to fix!

--
rjbs

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 15, 2014

From @tonycoz

On Mon Apr 14 04​:22​:05 2014, tonyc wrote​:
...

Thanks, I looked for warnings when I built it, but must have missed them.

...

I should have tested the unicode path too.

Here's a new patch.

Tony

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Apr 15, 2014

From @tonycoz

0001-perl-121562-fix-the-I32-bug-for-index-and-rindex.patch
From ab05702adb71d31562278ae9c3946d757c09d534 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Tue, 15 Apr 2014 03:57:57 +0200
Subject: [PATCH] [perl #121562] fix the I32 bug for index() and rindex()

---
 MANIFEST         |    1 +
 pp.c             |   10 +++++-----
 t/bigmem/index.t |   37 +++++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+), 5 deletions(-)
 create mode 100644 t/bigmem/index.t

diff --git a/MANIFEST b/MANIFEST
index 9652cd5..190315f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -4920,6 +4920,7 @@ t/base/rs.t			See if record-read works
 t/base/term.t			See if various terms work
 t/base/while.t			See if while work
 t/benchmark/rt26188-speed-up-keys-on-empty-hash.t	Benchmark if keys on empty hashes is fast enough
+t/bigmem/index.t		Check that index() handles large offsets
 t/bigmem/pos.t			Check that pos() handles large offsets
 t/bigmem/read.t			Check read() handles large offsets
 t/bigmem/regexp.t		Test regular expressions with large strings
diff --git a/pp.c b/pp.c
index 4ec6887..04c1f29 100644
--- a/pp.c
+++ b/pp.c
@@ -3197,8 +3197,8 @@ PP(pp_index)
     SV *temp = NULL;
     STRLEN biglen;
     STRLEN llen = 0;
-    I32 offset;
-    I32 retval;
+    SSize_t offset = 0;
+    SSize_t retval;
     const char *big_p;
     const char *little_p;
     bool big_utf8;
@@ -3281,13 +3281,13 @@ PP(pp_index)
 	offset = is_index ? 0 : biglen;
     else {
 	if (big_utf8 && offset > 0)
-	    sv_pos_u2b(big, &offset, 0);
+	    offset = sv_pos_u2b_flags(big, offset, 0, SV_CONST_RETURN);
 	if (!is_index)
 	    offset += llen;
     }
     if (offset < 0)
 	offset = 0;
-    else if (offset > (I32)biglen)
+    else if (offset > (SSize_t)biglen)
 	offset = biglen;
     if (!(little_p = is_index
 	  ? fbm_instr((unsigned char*)big_p + offset,
@@ -3298,7 +3298,7 @@ PP(pp_index)
     else {
 	retval = little_p - big_p;
 	if (retval > 0 && big_utf8)
-	    sv_pos_b2u(big, &retval);
+	    retval = sv_pos_b2u_flags(big, retval, SV_CONST_RETURN);
     }
     SvREFCNT_dec(temp);
  fail:
diff --git a/t/bigmem/index.t b/t/bigmem/index.t
new file mode 100644
index 0000000..fdd502c
--- /dev/null
+++ b/t/bigmem/index.t
@@ -0,0 +1,37 @@
+#!perl
+BEGIN {
+    chdir 't';
+    unshift @INC, "../lib";
+}
+
+use strict;
+require './test.pl';
+use Config qw(%Config);
+
+# memory usage checked with top
+$ENV{PERL_TEST_MEMORY} >= 2
+    or skip_all("Need ~2GB for this test");
+$Config{ptrsize} >= 8
+    or skip_all("Need 64-bit pointers for this test");
+
+plan(tests => 4);
+
+my $space = " "; # avoid constant folding from doubling memory usage
+# concatenation here increases memory usage significantly
+my $work = $space x 0x80000002;
+substr($work, 0x80000000) = "\n\n";
+
+# this would SEGV
+is(index($work, "\n"), 0x80000000, "test index() over 2G mark");
+
+# this would simply fail
+is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark");
+
+utf8::upgrade($work);
+
+# this would SEGV
+is(index($work, "\n"), 0x80000000, "test index() over 2G mark (utf8-ish)");
+
+# this would simply fail
+is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark (utf8-ish)");
+
-- 
1.7.1

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented May 28, 2014

From @tonycoz

On Mon Apr 14 19​:01​:04 2014, tonyc wrote​:

On Mon Apr 14 04​:22​:05 2014, tonyc wrote​:
...

Thanks, I looked for warnings when I built it, but must have missed them.

...

I should have tested the unicode path too.

Here's a new patch.

Applied as b464e2b.

Tony

@p5pRT p5pRT closed this May 28, 2014
@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented May 28, 2014

@tonycoz - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.