Comparing changes
base fork: schacon/perl
head fork: castaway/perl
This comparison is big! We're only showing the most recent 250 commits
Commits on Aug 26, 2012
Karl Williamson utf8.c: collapse a function parameter
Now that we have a flags parameter, we can pass this parameter as
just another flag, giving a cleaner interface to this internal-only
function.  This also renames the flag parameter to <flag_p> to indicate
it needs to be dereferenced.
Karl Williamson utf8.c: Shorten hash key for speed
Experiments have shown that longer hash keys impact performance.  See
the thread at

This patch shortens a key used very frequently.  There are other keys in
this hash which are used frequently in some circumstances, but I expect
to use fewer of them in the future, so I am not changing them now.
Karl Williamson utf8.c: Add comment about speed-up attempt
This might keep someone from later attempting the speed-up, which didn't
actually help, so I didn't commit it.
Karl Williamson utf8.c: Bypass a subroutine wrapper
The public swash initialization function merely wraps the core one, and
since we are the core here, we might as well call the core one directly.
Karl Williamson utf8.c: Prefer binsearch over swash hash for small swashes
A binary swash is a hash of bitmaps used to cache the results of looking
up if a code point matches a Unicode property or regex bracketed
character class.  An inversion list is a data structure that also holds
information about what code points match a Unicode property or character
class.  It is implemented as an SV* to a sorted C array, and hence can
be searched using a binary search.

This patch converts to using a binary search of an inversion list
instead of a hash look-up for inversion lists that are no more than 512
elements (9 iterations of the search loop).  That number can be easily
adjusted, if necessary.
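A minimal sketch of the lookup described above, assuming a simplified inversion list stored as a plain sorted `uint32_t` array whose elements alternately start "matching" and "non-matching" ranges, beginning with a matching one. perl's real inversion lists live inside an SV and use their own helpers; the names `invlist` and `invlist_contains` are hypothetical. For a 512-element list the loop halves the window about 9 times.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: list[0] starts a matching range, list[1] starts a
 * non-matching range, list[2] a matching one, and so on.  For example,
 * {0x41, 0x5B} means U+0041..U+005A match and U+005B upward does not. */
typedef struct {
    const uint32_t *list;
    size_t len;
} invlist;

/* True if cp is in the set: find the largest index i with list[i] <= cp;
 * cp matches exactly when that index is even. */
static bool invlist_contains(const invlist *inv, uint32_t cp)
{
    size_t lo = 0, hi = inv->len;

    if (inv->len == 0 || cp < inv->list[0])
        return false;                 /* before the first range */

    /* Invariant: list[lo] <= cp, and every index >= hi holds a value > cp. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (inv->list[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }
    return (lo % 2) == 0;
}
```

Unlike a hash of bitmaps, nothing is cached or populated here, so the cost per lookup stays fixed no matter how many distinct code points the program sees.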

Theoretically, a hash is faster than a binary lookup over a very long
period.  So this may negatively impact long-running servers.  But in the
short run, where most programs reside, the binary search is
significantly faster.

A swash is filled as necessary as time goes on, caching each new
distinct code point it is called with.  If it is called with many, many
such code points, its performance can degrade as collisions increase.  A
binary search does not have that drawback.  However, most real-world
scenarios do not have a program being called with huge numbers of
distinct code points.  Mostly, the program will be called with code
points from just one or a few of the world's scripts, so will remain
sparse.  The bitmaps in a swash are each 64 bits long (except for ASCII,
where it is 128).  That means that when the swash is populated, a lookup
of a single code point that hasn't been checked before will have to
look up the 63 adjoining code points as well, increasing its startup
overhead.  Of course, if one of those 63 code points is later accessed,
further population is needed.  This is the typical case, since a language's
code points are all near each other.
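The start-up cost of a 64-bit bitmap can be seen in a small sketch. It assumes a single cached block rather than perl's hash of many blocks, and the names `bitmap_block`, `block_lookup` and `is_ascii_upper` are hypothetical: the first lookup in a block evaluates the property for all 64 code points, and later lookups in the same block cost nothing extra.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*cp_predicate)(uint32_t cp);

/* One cached block covering 64 consecutive code points. */
typedef struct {
    bool     valid;   /* has any block been filled yet? */
    uint32_t key;     /* cp >> 6 identifies which block is cached */
    uint64_t bits;    /* bit n set => code point (key << 6) + n matches */
} bitmap_block;

static bool block_lookup(bitmap_block *blk, cp_predicate pred, uint32_t cp)
{
    uint32_t key = cp >> 6;

    if (!blk->valid || blk->key != key) {
        /* Miss: evaluate all 64 code points of the block, including the
         * 63 neighbours that nobody has asked about yet. */
        uint64_t bits = 0;
        for (uint32_t n = 0; n < 64; n++)
            if (pred((key << 6) + n))
                bits |= (uint64_t)1 << n;
        blk->bits  = bits;
        blk->key   = key;
        blk->valid = true;
    }
    return (blk->bits >> (cp & 63)) & 1;
}

/* Example property with a call counter, to make the up-front cost visible. */
static unsigned long pred_calls = 0;

static bool is_ascii_upper(uint32_t cp)
{
    pred_calls++;
    return cp >= 'A' && cp <= 'Z';
}
```

Looking up 'A' fills the block for U+0040..U+007F at a cost of 64 predicate calls; a following lookup of 'a' lands in the same block and costs none.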

The bottom line, though, is in the short term, this patch speeds up the
processing of \X regex matching about 35-40%, with modern Korean (which
has uniquely complicated \X processing) closer to 40%, and other scripts
closer to 35%.

The 512 boundary means that over 90% of the official Unicode properties
are handled using binary search.  I settled on that number by
experimenting with several properties besides \X and with various
powers-of-2 limits.  Until I got that high, performance kept improving
when the property went from being a swash to a binary search.  \X
improved even up to 2048, which encompasses 100% of the official Unicode
properties.

The implementation changes so that an inversion list instead of a swash
is returned by swash_init() when the input flags allow it to do so, for
all inversion lists shorter than the compiled in constant of 512
(actually <= 512).  The other functions that access swashes have added
intelligence to deal with an object of either type.  Should someone in
CPAN be using the public swash_init() interface, they will not see any
difference, as the option to get an inversion list is not available to
Karl Williamson utf8.c: indent in new block: White space-only 971d486
Karl Williamson regex: Speed up \X processing
For most Unicode releases, GCB=prepend matches absolutely nothing.  And
that appears to be the case going forward, as they added things to it,
and removed them later based on field experience.

An earlier commit has improved the performance of this significantly by
using a binary search of an empty array instead of a swash hash.
However, that search requires several layers of function calls to
discover that it is empty, which this commit avoids.

This patch will use whatever swash_init() returns unless it is empty,
preserving backwards compatibility with older Unicode releases.  But if
it is empty, the routine sets things up so that future calls will always
fail without further testing.
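One way to make "future calls will always fail without further testing" concrete is to pick the lookup routine once, when the table is loaded. This is a sketch under that assumption only; the `prop` struct and the functions below are hypothetical and not the actual regexec.c code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct prop prop;
typedef bool (*prop_lookup_fn)(const prop *p, uint32_t cp);

struct prop {
    const uint32_t *list;   /* sorted inversion list, possibly empty */
    size_t          len;
    prop_lookup_fn  lookup; /* chosen once, at initialization */
};

/* Used when the table is empty (e.g. GCB=Prepend in most Unicode
 * releases): nothing can match, so no search is needed. */
static bool lookup_never(const prop *p, uint32_t cp)
{
    (void)p;
    (void)cp;
    return false;
}

/* General case.  A linear scan keeps the sketch short: the number of
 * entries <= cp is odd exactly when cp is inside a matching range. */
static bool lookup_list(const prop *p, uint32_t cp)
{
    size_t i = 0;
    while (i < p->len && p->list[i] <= cp)
        i++;
    return (i % 2) == 1;
}

static void prop_init(prop *p, const uint32_t *list, size_t len)
{
    p->list   = list;
    p->len    = len;
    p->lookup = (len == 0) ? lookup_never : lookup_list;
}
```

The empty check happens once in prop_init rather than on every call, which is the saving the commit describes.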
Karl Williamson regexec.c: White-space only
Indent inside newly formed block
Father Chrysostomos Banish boolkeys
Since 6ea72b3, rv2hv and padhv have had the ability to return booleans
in scalar context, instead of bucket stats, if flagged the right
way.  sub { %hash || ... } is optimised to take advantage of this.  If
the || is in unknown context at compile time, the %hash is flagged as
being maybe a true boolean.  When flagged that way, it returns a boolean
if block_gimme() returns G_VOID.

If rv2hv and padhv can already do this, then we don’t need the
boolkeys op any more.  We can just flag the rv2hv to return a boolean.
In all the cases where boolkeys was used, we know at compile time that
it is true boolean context, so we add a new flag for that.
Father Chrysostomos Correct typo in flag name adc42c3
Tux Add Configure probe for ip_mreq_source
Needed to upgrade Socket from CPAN
steve-m-hay Upgrade Socket from 2.004 to 2.006 aff163d
Karl Williamson perldelta for Unicode property performance gains a3d5177
Karl Williamson Revert "Experimentally Use Unicode 6.2 beta"
This reverts commit 5435c37.
A new beta has been released, and so we should use that instead.
Karl Williamson lib/unicore/README.perl: Make usable as a shell script
This adds comment symbols and redirects error messages to /dev/null for
things that are likely to fail.
Karl Williamson mktables: Correct generated table comment f0fd993
Karl Williamson mktables: Re-order some code, change comments
Unicode 6.2 is changing some of these things; this re-ordering will make
that more convenient.
Karl Williamson Prepare for Unicode 6.2
This changes code to be able to handle Unicode 6.2, while continuing to
handle all previous releases.

The major change was a new definition of \X, which adds a property to
its calculation.  Unfortunately \X is hard-coded into regexec.c, and so
has to be revised whenever there is a change of this magnitude in Unicode,
which fortunately isn't all that often.  I refactored the code in
mktables to make it easier next time there is a change like this one.
Karl Williamson Use new Unicode 6.2 beta
These supposedly are the final data files for 6.2.  Earlier changes
originally proposed for 6.2 have been deferred until a later release.
Thus there is no change in the general category of ASCII characters in
these files from what they were in 6.1 and earlier, unlike what had been

Unlike the previous experimental beta, code is now in place in Perl to
handle the revised definition of \X in 6.2.  The current working draft
of that definition is at
Father Chrysostomos Restore ‘Can’t localize through ref’ to lv subs
In commit 40c94d1, I put an if statement inside an if statement,
skipping the else that followed if the outer if was true:

  if (...) {
  }
  else if (...) {
  }

became

  if (...) {
     if (...) {
     }
  }
  else if (...) {
  }

The result was that ‘Can’t localize through a reference’ no longer
happened if the operator (%{} or @{}) was the last thing in an lvalue
sub, if the lvalue sub was not called in lvalue context.

$ perl5.14.0 -e 'sub foo :lvalue { local %{\%foo} } foo(); print "ok\n"'
Can't localize through a reference at -e line 1.
$ perl5.16.0 -e 'sub foo :lvalue { local %{\%foo} } foo(); print "ok\n"'

If the sub is called in lvalue context, the bug exists there, too, but
is much older (probably 82d0398):

$ perl5.6.2 -e 'sub f :lvalue { local %{\%foo} } (f()) =3; print "ok\n"'
Can't localize through a reference at -e line 1.
$ perl5.8.1 -e 'sub f :lvalue { local %{\%foo} } (f()) =3; print "ok\n"'

The simplest solution is to change the order of the conditions.  If
the rv2hv or rv2av op is passed a reference, and has ‘local’ in front
of it (OPf_MOD && OPpLVAL_INTRO), that should die, regardless of
whether it is the last thing in an lvalue sub.
Father Chrysostomos Croak for \local %{\%foo}
See the previous commit.

When I moved the check for local %$ref earlier, I didn’t move it
early enough.
Father Chrysostomos pp_hot.c: pp_rv2av: Squash repetitive code
The LVRET that I removed (in the if(SvTYPE(sv) == type) block) actually
never evaluates to true, because that block is only entered for
%hash->{elem} or @array->[elem], in which the parent op is helem or
aelem, not leavesublv or return.  LVRET only returns true if the current
op is the last op in an lvalue sub.  Likewise, the OPpMAYBE_LVSUB
flag is never set in that case, so checking it now is harmless (the
cases that used to enter the if(SvTYPE(sv)==type) block now fall
through to the OPpMAYBE_LVSUB check).

(Using LVRET in pp_rv2av is actually incorrect, and I corrected most instances in 40c94d1, but this one remained.)
Father Chrysostomos Remove boolkeys op 605fa6b
Father Chrysostomos Increase $Opcode::VERSION to 1.24 440292d
Father Chrysostomos wrap long pod lines be1d34d
Commits on Aug 27, 2012
Karl Williamson regen/ Comment out obsolete code
Tricky folds have been removed from the code, so the removed #defines
are obsolete.  I'm leaving this in so it can conveniently be
referred to in case we ever need it again.
Karl Williamson Add utility and .h for character's UTF-8
This adds regen/, which takes Unicode characters and generates
utf8_strings.h, containing #defines for macros that translate from the
name to the UTF-8.  This is needed in a few places, where previously
things were manually figured out and hard-coded in.  Doing this instead
makes this easier, and removes EBCDIC dependencies/bugs, as the file
would simply be regen'd on an EBCDIC platform.
Commits on Aug 28, 2012
Father Chrysostomos [perl #114070] Fix line nums after <<foo
The line numbers for operators after a here-doc marker on the same
line were off by the length of the here-doc.

This is because the here-doc parser would artificially increase the
line number as it went, because it was stealing lines out of the
input stream.

Instead, we can record the number of lines in the here-doc, and add it
to the line number the next time we need to increment it.
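The deferred counting can be sketched with two counters. The `lexpos` struct and both helpers are hypothetical stand-ins for the parser fields, assuming the consumed count includes the terminator line: code after the `<<foo` marker keeps the marker's line number, and the pending here-doc lines are added at the next newline.

```c
/* Hypothetical lexer position state illustrating the herelines idea. */
typedef struct {
    long line;       /* line number given to ops being compiled now */
    long herelines;  /* here-doc lines consumed but not yet counted */
} lexpos;

/* A here-doc body of n lines (terminator included) was stolen from the
 * input.  Defer the increment so the rest of this line keeps its number. */
static void heredoc_consumed(lexpos *lp, long n)
{
    lp->herelines += n;
}

/* The lexer reached a newline in the main input stream. */
static void newline_seen(lexpos *lp)
{
    lp->line += 1 + lp->herelines;
    lp->herelines = 0;
}
```

For `print <<END, __LINE__;` on line 1 followed by two body lines and `END`, consuming those three lines leaves the current line at 1, and the next newline moves it straight to 5.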

This also fixes the line numbers after s//<<END/e to the end of the
file, which were off because the line number adjusted by the <<END was
localised to the s///.

Since herelines is visible to inner lexing scopes, the outer lexing
scope can see changes made by the inner one.

The lack of localisation does cause problems with line numbers inside
quote-like operators (but they were off by one already), which will be
addressed in subsequent commits.
Father Chrysostomos Stop unterminated here-docs from leaking memory 2d85e41
Father Chrysostomos toke.c: Merge KEY_tr and KEY_y 8ce4b50
Father Chrysostomos Stop invalid y/// ranges from leaking 4dc843b
Father Chrysostomos Add PL_parser->lex_shared struct; move herelines into it
PL_parser->herelines needs to be visible to inner lexing scopes, which
also need to have their own copy of it, so that the here-doc parser
can modify the right herelines variable corresponding to the
PL_linestr from which it is stealing its body.  (A subsequent commit
will take care of that.)
Father Chrysostomos op.c: newSTATEOP: don’t check PL_parser after using it
If it is null, we would already have crashed when reaching this point.
Father Chrysostomos Fix line numbers inside here-docs
A previous commit put the number of lines in a here-doc in a separate
parser field, which was added on to the line number at the next
CopLINE_inc (actually COPLINE_INC_WITH_HERELINES, now used throughout
toke.c).

Code interpolated inside the here-doc was picking up that value,
throwing line numbers off.

Before that, they were already off by one.

This commit fixes both.

I removed the CLINE from S_scan_heredoc and stopped using TERM (which
uses CLINE) for here-docs.  CLINE sets PL_copline, which is used to
pass a specific line number to newSTATEOP, which may or may not be the
same number as CopLINE(PL_curcop).  newSTATEOP grabs that number and
sets PL_copline to -1 (aka NOLINE).  I assume this was used to make
the statement containing the <<foo marker have the right line number.
But it didn’t fully work out, as subsequent statements on the same
line had the wrong number.  That I fixed a few commits ago when I
introduced herelines, making CopLINE(PL_curcop) have the right line
number for that line.  So the CLINE is not actually necessary anymore.
It was causing a problem also with the first statement inside the
heredoc (in ${...}), which would ‘steal’ the line number of the
<<foo marker.

This also means that <FH> and <.*> no longer do CLINE, but it is not
necessary, as they cannot span multiple lines.
Father Chrysostomos parser.h: Document copline with more detail
It took me a while to figure this out, so here it is for
future readers.
Father Chrysostomos Revert "smoke-me diag"
This reverts commit 372a31d.

I missed this when I was merging that branch.  It should never have
made its way into blead.  It was to find out why the Windows smokes
were temporarily failing, by dumping in the logs.
This was what led to 0ee3649.
Father Chrysostomos Fix eval 'q;;'
The parser expects a semicolon at the end of every statement, so the
lexer provides one.  It was not doing so for evals ending with a
semicolon, even if that semicolon was not a statement-terminating
Father Chrysostomos Stop (caller $n)[6] from including final "\n;"
String eval appends "\n;" to the string before evaluating it.
(caller $n)[6], which returns the text of the eval, was giving the
modified string, rather than the original.

In fact, it was returning the actual string buffer that the parser
uses.  This commit changes it to create a new mortal SV from that
string buffer, but without the last two characters.
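Returning a trimmed copy instead of the parser's own buffer can be sketched in plain C. `eval_text_without_suffix` is a hypothetical helper; perl itself builds a new mortal SV here, not a malloc'd string. Unlike the commit, which drops the last two characters unconditionally, this sketch checks for the `"\n;"` suffix first.

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of the eval buffer without the "\n;"
 * that was appended before parsing.  The caller frees the result. */
static char *eval_text_without_suffix(const char *buf, size_t len)
{
    size_t n = len;
    char *copy;

    if (n >= 2 && buf[n - 2] == '\n' && buf[n - 1] == ';')
        n -= 2;                      /* drop only the appended suffix */

    copy = malloc(n + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, buf, n);
    copy[n] = '\0';
    return copy;
}
```

Because the caller gets a copy, modifying it (as the JAPH does) can no longer alter the text the parser is working on.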

It unfortunately breaks this JAPH:

eval'BEGIN{${\(caller 2)[6]}=~y< !"$()+\-145=ACHMT^acfhinrsty{}>
<nlrhta"o Pe e,\nkrcrJ uthspeia">}say if+chr(1) -int"145"!=${^MATCH}'
Father Chrysostomos Stop here-docs from gutting (caller $n)[6]
(caller $n)[6] returns the text of the eval.  Actually, it would
return, not the text of the eval, but the text with all the here-doc
bodies missing.

In this commit, I’m abusing the SvSCREAM flag to indicate that the
eval text stored in the context stack is refcounted.
Father Chrysostomos caller.t: Fix ‘Caller’ test
This string eval was always failing, leaving @c with its previous
value, which just happened to be what we were expecting.
Father Chrysostomos Use PL_parser->lex_shared instead of Sv[IN]VX(PL_linestr)
Unfortunately, PL_parser->linestr and PL_parser->bufptr are both
part of the API, so we can’t just move them to PL_parser->lex_shared.
Instead, we have to copy them in sublex_push, to make them visible to
inner lexing scopes.

This allows the SvIVX(PL_linestr) and SvNVX(PL_linestr) hack to
be removed.

It should also speed things up slightly.  We are already allocating
PL_parser->lex_shared in sublex_push, so there should be no need to
upgrade PL_linestr to SvNVX as well.

I was pleasantly surprised to see how the here-doc code seemed to
shrink all by itself when modified to account for this.

PL_sublex_info.super_bufptr is also superseded by the addition of
->ls_bufptr to the LEXSHARED struct.  Its old values when localised
were not visible, being stashed away on the savestack, so it was
harder to use.
Father Chrysostomos op.c: Two more boolean %hash optimisations
In commit c8fe3bd I used the wrong flag for ?:, causing it to slow
down unless the ?: was in void context.

OP_NOT has been sensitive to void context all along, which was never

These two should be just as fast.  The second should not be slower:

sartak "op-entry" DTrace probe fe83c36
sartak "loading-file" and "loaded-file" DTrace probes 32aeab2
Father Chrysostomos Add t/run/ to MANIFEST d2301c2
Father Chrysostomos Add another address for Shawn Moore to a3d8b84
Father Chrysostomos perldtrace.pod: Remove a stray =item 52f3623
Father Chrysostomos perldtrace.pod: typo 42b5c62
Father Chrysostomos note CPAN pod link target; regen pod issues bf71573
steve-m-hay Fix File::Copy test failure on Windows
Failure was introduced by 43ddfa5 which looks for a warning message from
code that isn't run on Windows.
steve-m-hay Revert File::Copy::copy() to fail when copying a file onto itself
Copying a file onto itself was made a fatal error by 96a91e0.
This was changed in 754f2cd from an undesirable croak() to return 1,
but the documentation was never changed from it being a fatal error.
It should probably have remained an error as per the documentation (but
updated not to say fatal) for consistency with cases of copying a file
onto itself via symbolic links or hard links.
steve-m-hay perldelta for 43ddfa5 and 39b80fd. 846de5c
Karl Williamson regexec.c: Remove no longer needed comments
These comments gave the derivation from the published Unicode algorithm
for determining what goes into \X to how it is actually implemented.

The new version of the Unicode text will be much more like what we've
implemented, so the derivation is no longer necessary; and is about to
be obsolete because of the Unicode document, and some changes to how we
Karl Williamson Refactor \X regex handling to avoid a typical case table lookup
Prior to this commit 98.4% of Unicode code points that went through \X
had to be looked up to see if they begin a grapheme cluster; then looked
up again to find that they didn't require special handling.  This commit
refactors things so only one look-up is required for those 98.4%.  It
changes the table generated by mktables to accomplish this, and hence
the name of it, and references to it are changed to correspond.
Karl Williamson Avoid duplicate table look ups.
These two spots both match 'c+', where 'c' is some character, against
a Unicode table.  Prior to this patch, if it matched a single 'c', it
would fall into a while loop, where it matched that same 'c' again.
Instead, simply increment the pointer past the first match, so that the
while loop starts looking for succeeding matches at the next character
in the input.
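The pattern can be sketched as follows; `match_plus`, `char_class_fn` and the counting predicate are hypothetical, with a digit test standing in for the Unicode table. After the mandatory first match the pointer is advanced before the loop, so every input character is tested exactly once.

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool (*char_class_fn)(unsigned char c);

/* Match 'c+' at s (not reading past end).  Returns the position just after
 * the run, or NULL if not even one character matched. */
static const unsigned char *match_plus(const unsigned char *s,
                                       const unsigned char *end,
                                       char_class_fn in_class)
{
    if (s >= end || !in_class(*s))
        return NULL;            /* 'c+' needs at least one match */
    s++;                        /* first match consumed; don't re-test it */
    while (s < end && in_class(*s))
        s++;
    return s;
}

/* Example class with a call counter, to show each byte is tested once. */
static unsigned long class_calls = 0;

static bool is_digit_counted(unsigned char c)
{
    class_calls++;
    return c >= '0' && c <= '9';
}
```

On the input "123x" this makes four class tests ('1', '2', '3', 'x'); re-testing the first character inside the loop would make five.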
Karl Williamson regexec.c: White-space only
This outdents a block whose enclosing braces have been removed, and
reflows things to correspond.
Commits on Aug 29, 2012
Nicholas Clark Refactor t/porting/filenames.t to shrink the code and the TAP generated.
Fold the function validate_file_name() into its only caller. Put the tested
pathname into each test description to avoid a call to note() - this halves
the size of the TAP generated. Fold the chained tests into a chained
if/elsif/else sequence. Eliminate the use of File::Spec, as all platforms
can cope internally with F<../MANIFEST>.
Nicholas Clark t/porting/checkcase.t now passes no_chdir to File::Find::find().
This avoids the test occasionally aborting due to File::Find::find() calling
warnings::warnif(), which in turn attempts to lazy load Carp, which doesn't work
for a test using relative paths in @INC with the current directory changed.
Nicholas Clark t/porting/exec-bit.t isn't using File::{Basename,Find,Spec::Functions}.
No point loading modules that it uses nothing from.
Nicholas Clark t/porting/dual-life.t now passes no_chdir to File::Find::find().
File::Find::find() can call warnings::warnif(), which in turn attempts to lazy
load Carp, which doesn't work for a test using relative paths in @INC with
the current directory changed.
Nicholas Clark t/porting/podcheck.t now passes no_chdir to File::Find::find().
File::Find::find() can call warnings::warnif(), which in turn attempts to lazy
load Carp, which doesn't work for a test using relative paths in @INC with
the current directory changed.
Nicholas Clark Add /\.gif\z/ files to the non-Pod exceptions in t/porting/podcheck.t 51e1fe8
perlDreamer Refactor t/re/no_utf8_pt.t to use instead of making TAP by hand. 09afc3e
perlDreamer Refactor t/porting/checkcase.t to use instead of making TAP by hand.
perlDreamer Refactor t/uni/ to use instead of making TAP by hand. be250cb
perlDreamer Update t/op/lop.t to use instead of making TAP by hand. 73ee8f5
perlDreamer Document the last five tests of t/op/lop.t 6f02a29
Nicholas Clark Remove a no-longer needed lexical from t/op/lop.t
Jim Keenan spotted the commented out code referencing the variable $test.
Turns out that it is completely redundant, so its declaration can go too.
Commits on Aug 30, 2012
Jerry D. Hedden Fix Cygwin build warnings
Fixes the following build warnings under Cygwin:

cygwin.c: In function 'do_spawn':
cygwin.c:132:5: warning: assignment from incompatible pointer type
cygwin.c: In function 'XS_Cygwin_posix_to_win_path':
cygwin.c:346:9: warning: 'err' may be used uninitialized in this function
cygwin.c: In function 'XS_Cygwin_win_to_posix_path':
cygwin.c:257:9: warning: 'err' may be used uninitialized in this function
perlDreamer Refactor t/op/die.t to use instead of making TAP by hand.
[With a few whitespace tweaks]
Nicholas Clark Refactor t/op/die.t to re-use the same $SIG{__DIE__} handler where po…

Restore testing that the $SIG{__DIE__} handler is called for the case of
C<die bless [ 7 ], "Error";> which was removed by the previous refactoring.
Re-using the same $SIG{__DIE__} handler results in 4 more tests of isa_ok()
for an 'ARRAY' - this isn't going to hurt anyone.
Commits on Aug 31, 2012
Father Chrysostomos toke.c: S_scan_heredoc: prune dead code
This incorrect code (using a pointer after finding it to be null)
is the result of the refactoring in 60f40a3.  It was trying to
account for a string eval with no line break in it.  But that can’t
happen as of 1107659 (if it could it would crash).

So remove it and add an assertion, along with a comment explaining the
Father Chrysostomos Avoid uninit warning for qq|${\<<FOO}|
If a here-doc occurs inside a single-line quote-like operator inside
a file (as opposed to an eval), it produces an uninitialized warning.
The goto I added in commit 99bd9d9 went to the wrong place.
Father Chrysostomos toke.c:S_scan_heredoc: Put stream-based parser in else block
We currently have the code laid out like this:

    if (peek) {
        ... peek inside the parent linestr buffer ...
    }
    else if (eval) {
        ... grab the heredoc body from linestr ...
    }
    else {
        ... start with an empty string for the heredoc body ...
    }

    ... parse the body of the heredoc from the input stream ...

The final bit is inside a while loop whose condition is never true
after either of the first two branches of the if/else has executed.
But the code is very hard to read, and it is difficult to fix bugs, as
code cannot be added before the while loop, and the while loop condition
cannot change, without affecting heredocs in string eval.

So put the final parser inside the else.  Future commits will
depend on this.
Father Chrysostomos Finish fixing here-docs in re-evals
This commit fixes here-docs in single-line re-evals in files (as
opposed to evals) and here-docs in single-line quote-like operators
inside re-evals.

In both cases, the here-doc parser has to look into an outer
lexing scope to find the here-doc body.  And in both cases it
was stomping on PL_linestr (the current line buffer) while
PL_sublex_info.re_eval_start was pointing to an offset in that buffer.
(re_eval_start is used to construct the string to include in the
regexp’s stringification once the lexer reaches the end of the

Fixing this entails moving re_eval_start and re_eval_str to
PL_parser->lex_shared, making the pre-localised values visible.
This is so that the code that peeks into an outer linestr buffer to
steal the here-doc body can set up re_eval_str in the right scope.
(re_eval_str is used to store the re-eval text when the here-doc
parser has no choice but to modify linestr; see also commit

It also entails making the stream-based parser (i.e., that reads from
an input stream) leave PL_linestr alone, instead of clobbering it and
then reconstructing part of it afterwards.
Father Chrysostomos Fix here-doc body extraction in eval 's//<<END/'
Outside of string eval, this:

s//<<END/e; print "a

prints this:


But when we have string eval involved,

eval 's//<<END/e; print "a

we get this:




The buggy code in question goes back to commit 0244c3a.

Since PL_linestr only contains the contents of the replacement
("<<END"), it peeks into the outer lexing scope’s linestr buffer, mod-
ifying it in place to remove the here-doc body, by copying everything
after the here-doc back to the spot where the body begins.

It was off by one, however, and left an extra line break.

When the code in question is reached, the variables are set as follows:

bufptr = "; print \"a"...  (just after the s///)
s      = "\nb\\n\""        (newline after the heredoc terminator)

The herewas variable already contains everything after the quote-
like operator containing the <<heredoc marker to the end of the line
including the \n ("; print \"a\n").

But then we concatenate everything from s onwards.  So we end up with
the \n before the here-doc body and the \n from after the here-doc
terminator juxtaposed.

So after using s to extract the re-eval string, we increment s so it
points after the final newline.
Father Chrysostomos lex.t: Mangle obscenity (albeit euphemistic)
It is harder to hack on perl with someone looking over one’s shoulder
when there are comments like this, even when it is euphemistic in its
use of voiced dental stops instead of the voiceless kind.
Father Chrysostomos Make eval "s//<<END/e" slightly faster
The code that peeks into an outer linestr buffer to find the heredoc
body has to modify that buffer and remove the heredoc body from it.

It copies the text after the quote-like operator up to the end of the
line into a new SV, concatenates the text after the heredoc body into
a new SV, and then copies it back to linestr right after the quote-like
operator.

So, in this example:

eval "s//<<END/e; # jiggles\nfoo\nEND\ndie;"

It ends up copying this:

               "; # jiggles\ndie;\n;"

into this at the position shown:

eval "s//<<END/e; # jiggles\nfoo\nEND\ndie;\n;"

There is no need for two copies.  And there is no need to copy the
rest of the line where the heredoc marker is.
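The single-copy approach amounts to sliding the text after the here-doc terminator left over the body with one `memmove`. The helper `remove_span` is hypothetical and works on a plain NUL-terminated C buffer rather than perl's linestr SV.

```c
#include <stddef.h>
#include <string.h>

/* Remove buf[body_start, body_end) in place.  len is the current string
 * length, not counting the NUL; the new length is returned.  The "+ 1"
 * moves the terminating NUL along with the tail. */
static size_t remove_span(char *buf, size_t len,
                          size_t body_start, size_t body_end)
{
    memmove(buf + body_start, buf + body_end, len - body_end + 1);
    return len - (body_end - body_start);
}
```

Applied to the example buffer `"s//<<END/e; # jiggles\nfoo\nEND\ndie;\n;"`, removing `"foo\nEND\n"` leaves `"s//<<END/e; # jiggles\ndie;\n;"`, and the rest of the marker's line is never copied.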
Father Chrysostomos toke.c:S_scan_heredoc: put the croaking code in one spot 932d0cf
Father Chrysostomos toke.c:scan_heredoc: less pointer fiddling; one less SV
The loop for reading lines of input to find the end of a here-doc has
always checked to see whether the cursor (s) was at the end of the
current buffer:

    while (s >= PL_bufend) {	/* multiple line string? */

(Actually, when it was added in perl 3.000, it was in scanstr and
that loop was not specific to here-docs, but also applied to multi-line
strings.)

The code inside the loop ends up fiddling with s by setting it explicitly
to the end of the buffer or the end of the here-doc marker, minus
one to make sure it does not coincide with the end of the buffer.

This doesn’t make any sense, and it makes the rest of this function
more complicated.

Because the loop used to be outside the else block, it was also
reached for a here-doc inside a string eval, but the code for that
ensured the condition for the while loop was never true.

Since the while loop set s to one less than it needed to be set to,
in order to break out of it, it had to have s++ just after the loop.
That s++ was reached also by the eval code, which, consequently, had
to adjust its value of s.

That adjustment actually took place farther up in the function, where
the herewas SV was assigned to.  (herewas contains the text after the
here-doc marker to the end of the line.)  The beginning of herewas
would point to the last character of the here-doc marker inside an
eval, so that subtracting SvCUR(herewas) from the buffer end would
result in an adjusted pointer.

herewas is currently not actually used, except for the length.  Until
recently, the text inside it would be copied back into PL_linestr to
recreate where the lexer needed to continue (because PL_linestr was
being clobbered).  That no longer happens.

So we can get rid of herewas altogether.  Since it is in an else
block, the stream-based parser does not need to fiddle pointers to
exit the loop.  It can just break explicitly.  So the s++ can also
go, requiring changes (and simplifications) to the eval code.  The
comment about it being a multiline string is irrelevant and can go,
too.  It dates from when that line was actually in scanstr and applied
to quoted strings containing line breaks.
Father Chrysostomos toke.c:scan_heredoc: Remove unnecessary assignment
Updating PL_bufend after lex_next_chunk is not necessary, as
lex_next_chunk itself does it.
Father Chrysostomos toke.c:scan_heredoc: Merge two adjacent #ifdefs 6ac5e9b
Father Chrysostomos toke.c:scan_heredoc: Remove incorrect part of comment
I missed this in 60f40a3 when I stopped abusing IVX and NVX.
Father Chrysostomos toke.c:scan_heredoc: Merge similar code
The code for looking in outer lexing scopes was mostly identical to
the code for looking in PL_linestr.
Father Chrysostomos toke.c:scan_heredoc: comments, comments 19bbc0d
Father Chrysostomos toke.c: PL_in_eval purge
Many uses of PL_in_eval in toke.c are redundant.

PL_in_eval indicates not that we are parsing a string eval, but that
we are being called from an eval, whether stringy or not.  Even if
PL_in_eval were only for string eval, it would still not indicate that
we are parsing a string eval, because of eval 'require'.

This commit removes redundant uses of it (making things theoretically
slightly faster).
Father Chrysostomos Fix two minor s//.../e parsing bugs
It may be an odd place to allow comments, but s//"" # hello/e has
always worked, *unless* there happens to be a null before the first #.

scan_subst in toke.c wraps the replacement text in do { ... } when the
/e flag is present.

It was adding a line break before the final } if the replacement text
contained #, because otherwise the } would be commented out.

But to find the # it was using strchr, which stops at the first null.
So eval "s//'\0'#/e" would fail.
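The pitfall is easy to reproduce in C: `strchr` treats its argument as a NUL-terminated string, while `memchr` takes an explicit length and scans every byte. The two helper names are hypothetical, and this only illustrates the bug; the commit ultimately fixed it by no longer searching for `#` at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stops at the first NUL, so a '#' after an embedded "\0" is missed. */
static bool has_hash_strchr(const char *s)
{
    return strchr(s, '#') != NULL;
}

/* Scans exactly len bytes, embedded NULs included. */
static bool has_hash_memchr(const char *s, size_t len)
{
    return memchr(s, '#', len) != NULL;
}
```

For the four-byte replacement `'`, NUL, `'`, `#` from the eval above, the strchr version reports no `#` and the memchr version finds it.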

It makes little sense to me to check whether the replacement contains
# before adding the line break.  It would be faster just to add the
line break without checking.

But then I discovered this bug:

s//"#" . <<END/e;
Can't find string terminator "END" anywhere before EOF at - line 1.

So now I have two bugs to fix.

The easiest solution seems to be to omit the line break and make the
comment parser skip the } at the end of a s///e replacement.
Father Chrysostomos Break s//3}->{3/e
This should never have worked:

%_=(_,"Just another ");
$_="Perl hacker,\n";
Father Chrysostomos Add skip_without_dynamic_extension f12ade2
Father Chrysostomos utf8cache.t: Skip only the XS-dependent test 4785469
Father Chrysostomos [perl #114410] Reset utf8 pos cache on get
If a scalar is gmagical, then the string buffer could change without
the utf8 pos cache being updated.

So it should respond to get-magic, not just set-magic.  Actually adding
get-magic to the utf8 magic vtable would cause all scalars with
this magic to be flagged gmagical.  Instead, in magic_get, we can call
Father Chrysostomos Stop substr($utf8) from calling get-magic twice
By calling get-magic twice, it could cause its string buffer to be
reallocated, resulting in incorrect and random return values.
Father Chrysostomos Stop calling get-magic twice for lvalue pos($utf8) 92cf669
Father Chrysostomos Stop calling get-magic twice when reading lvalue substr($utf8) a4036ec
Father Chrysostomos Stop calling get-magic twice when reading lvalue substr($utf8) ab445a1
Father Chrysostomos Stop calling get-magic twice in pack "u", $utf8 3f63b0e
Father Chrysostomos Stop calling get-magic twice in sprintf "%1s", $utf8 4b8b610
Father Chrysostomos Stop calling get-magic twice in sprintf "%.1s", $utf8 d8f2f09
@tonycoz tonycoz [perl #112776] TODO test for warning 3f305fa
@tonycoz tonycoz [perl #112776] avoid warning on an initialized non-parameter
An initialized non-parameter in the parameter block would warn
when $^W was set, and Module::Build sets $^W.
@tsee tsee Silence ParseXS warning about abusing the CODE section
See RT #114198. DynaLoader was warning about somewhat dubious use of
RETVAL with a CODE section but without an OUTPUT section. This fixes
that problem, but I have obviously not been able to test on all affected
operating systems.
Nicholas Clark Remove the VM/ESA port.
VM/ESA was a mainframe OS. IBM ended service on it in June 2003. It was
superseded by z/VM.
@tonycoz tonycoz correct -Dmad skip count for tests introduced in 2d85e41 and 4dc843b a121486
@ap ap [perl #114498] Document (0)[1,2] better f51152e
@craigberry craigberry Files ending in .eg are also non-pod. 29a4534
@craigberry craigberry Make new File::Copy test case insensitive.
On VMS with default settings, the filename is reported as copy.t,
not Copy.t, so make the regex allow that.
@SBECK-github SBECK-github Bump Locale-Codes from 3.22 to 3.23 94814ff
Father Chrysostomos Commit 6b00f56 broke s/${\%x}{3}//e
It was meant to check whether it was inside the replacement part of
s///e, but it only checked that it was inside s///e.  PL_lex_repl is
set on both sides, but is only equal to PL_linestr on the rhs.
Father Chrysostomos s/${foo#}//e should be an error
See also the previous commit.

This one was caused by 9c74ccc.

Again, we can’t just check whether PL_lex_repl has the SvEVALED
flag set (which means we are in s///e), but must also check whether
PL_lex_repl == PL_linestr (which means we are in the replacement part
of s///e).
Father Chrysostomos Document cmdline switches 09b6b4f
Father Chrysostomos Revert "toke.c: PL_in_eval purge"
This reverts commit 5c49e90.

This change broke line numbers under mad when the last statement in the main program lacks a semicolon.

I was mistaken in thinking that PL_rsfp would always be true when
PL_in_eval is false.

But the use of PL_in_eval is still wrong.  Under a mad build, we get
this inconsistency in line numbers:

$ perl -e 'print "\n-e undef\n"' > foo
$ ./miniperl foo
Use of uninitialized value in -e at foo line 2.
$ ./miniperl -we 'require "foo"'
Use of uninitialized value in -e at foo line 3.
foo did not return a true value at -e line 1.
Jerry D. Hedden Fix skip_without_dynamic_extension to just skip
skip_without_dynamic_extension() mistakenly ends with skip_all()
instead of skip().
@steve-m-hay steve-m-hay Upgrade DB_File to 1.827 82c92bb
@steve-m-hay steve-m-hay perldelta for 94814ff and 5e56f3f11f 2a527d3
Commits on Sep 01, 2012
@karenetheridge karenetheridge RT#114312: prevent ls from colourizing output
ANSI colour codes in the `ls -l /dev` output were preventing some substitutions
from matching, causing a subsequent test to fail when 'stdout' or 'stderr' was
not properly removed from $DEV.
@craigberry craigberry Add Karen Etheridge to AUTHORS. 4a7d38e
Commits on Sep 03, 2012
Nicholas Clark Test that the line number for a "sub redefined" warning is for the st…

The Perl interpreter is careful to use the line number of the start of a
subroutine's redefinition for the warning, but there were no tests for this.
Nicholas Clark Test that the warning for "Found = in conditional" is for the start l…

The Perl interpreter is careful to use the line number of the start of
the "Found = in conditional", but there were no tests for this.
Nicholas Clark Test that the warning for "can be 0, test with defined" is for the st…

The Perl interpreter is careful to use the line number of the start of
the 'Value of %s can be "0"; test with defined()' warning, but there were no
tests for this.
Nicholas Clark newXS_len_flags() shouldn't change the line number on PL_curcop when …

This can actually generate incorrect line numbers in runtime warnings, when
XSUBs are redefined from calls made from BEGIN blocks, and the line number
from the opening brace of the BEGIN block is mashed with the filename of the
current line. For compile-time warnings, PL_curcop == &PL_compiling, so the
line numbers will be correct whether taken from PL_compiling or PL_parser.

This code dates back to perl-5.000, when it was added to newXS(). It appears
to be a copy of code present in newSUB() since alpha 2.
@maddingue maddingue Upgrade to XSLoader 0.16 681a49b
@rgs rgs Make XSLoader's UPSTREAM as undef
The upstream is supposed to be "blead", but the CPAN version of
XSLoader 0.16 and the one that has shipped with perl 5.17.3 are
different (doc changes only). The problem (cmp_version.t failing)
should disappear after the next perl release.
Commits on Sep 04, 2012
@maddingue maddingue Make dual-lived work on 5.8 again
Before releasing the version of from bleadperl to the CPAN,
I tested it with the versions of Perl I have by hand, and it appears
that the current code fails to compile on 5.8:

  Bareword "_DOWNGRADE" not allowed while "strict subs" in use at
  lib/ line 142.

Added by bd8cb55

Removing the short-circuit return allows the code to compile and the
tests to pass on all stable Perl from 5.8.2 to 5.16.1.
Nicholas Clark Under -DPERL_DEBUG_READONLY_OPS don't work around glibc 2.2.5 _moddi3 bugs.

The work around involves a runtime check and substituting OP pointers based
on the result. The substitution fails if the optree is mapped read-only.
Nicholas Clark With -DPERL_DEBUG_READONLY_OPS, changing a slab refcnt shouldn't make it r/w.

Perl_op_refcnt_inc() and Perl_op_refcnt_dec() now both take care to leave the
slab in the same state as they found it. Previously both would
unconditionally make the slab read-write.
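The "leave it as you found it" rule can be sketched as follows. This is a simplified model under stated assumptions: the struct and function names are hypothetical, and the protection state is only a flag here, whereas the real read-only slabs are protected by the OS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an op slab; the real build enforces the
 * read-only state with page protection, this sketch only records it. */
struct slab {
    bool readonly;   /* current protection state */
    int  refcnt;     /* the field being modified */
};

static void slab_to_rw(struct slab *s) { s->readonly = false; }
static void slab_to_ro(struct slab *s) { s->readonly = true; }

/* Bump the refcount, then restore whatever protection state the slab had
 * on entry, instead of unconditionally leaving it read-write. */
static void slab_refcnt_inc(struct slab *s)
{
    const bool was_ro = s->readonly;

    if (was_ro)
        slab_to_rw(s);
    assert(!s->readonly);            /* must be writable while we modify it */
    s->refcnt++;
    if (was_ro)
        slab_to_ro(s);
}
```

The same save/modify/restore pattern is what the breakpoint change in Perl_magic_setdbline() needs: make the slab writable just long enough to flip the flag.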
Nicholas Clark In op.c, change S_Slab_to_rw() from an OP * parameter to an OPSLAB *.
This makes it consistent with Perl_Slab_to_ro(), which takes an OPSLAB *.
Nicholas Clark Perl_magic_setdbline() should clear and set read-only OP slabs.
The debugger implements breakpoints by setting/clearing OPf_SPECIAL on
OP_DBSTATE ops. This means that it is writing to the optree at runtime,
and it falls foul of the enforced read-only OP slabs when debugging with

Avoid this by removing static from Slab_to_rw(), and using it and Slab_to_ro()
in Perl_magic_setdbline() to temporarily make the slab read-write whilst
changing the breakpoint flag.

With this all tests pass with -DPERL_DEBUG_READONLY_OPS (on this system)
Nicholas Clark In Perl_cv_forget_slab(), simplify the conditionally compiled code.
This refactoring reduces the line count and makes it clear that the basic
logic is the same with or without -DPERL_DEBUG_READONLY_OPS. It makes no
change to the generated assembler on a normal build.
Nicholas Clark Merge improvements to -DPERL_DEBUG_READONLY_OPS into blead.
All tests pass with -Dusethreads -DPERL_DEBUG_READONLY_OPS (on this system)
Nicholas Clark Document the reason for the early return in Perl_newPROG() for OP_STUB. 22e660b
Andy Dougherty Avoid garbled sed command in hints/
Solaris sed does not understand the GNU /i flag.
Andy Dougherty Collapse duplicate settings in hints/ 39234ce
Jerry D. Hedden Fix compiler warning about empty if body
This is meant to correct the following 'blead' build warning:

op.c: In function 'Perl_op_free':
op.c:713:30: warning: suggest braces around empty body in an 'if' statement
Commits on Sep 05, 2012
Shlomi Fish perl5db: fix an accidental effect of strictures 32050a6
Shlomi Fish perl5db: more tests
This patch adds more tests for lib/ on lib/perl5db.t. One note
is that I'm a bit uncomfortable about the test for ".", which did
not initially work exactly as I expected, due to debugger quirks.

This patch also fixes a bug where the /pattern/ command (and possibly
the ?pattern? command as well) got broken due to the addition of "use
strict;", and adds tests for them.
@bingos bingos Update Archive-Tar to CPAN version 1.90

  * important changes in version 1.90 05/09/2012 (Tom Jones)
  - documentation fixes
Commits on Sep 07, 2012
Andy Dougherty Fix alignment for darwin with -Dusemorebits.
By default, the darwin build assumes a "multiarchitecture" build.
Configure has a hardwired default of '8' for alignbytes (and then
proceeds to ignore it with another hard-wired '8' in config.h).
That '8' was supposed to be a safe value, in case perl was built
on one architecture but run on another with a stricter constraint.
With darwin and -Dusemorebits, however, the alignment should be on
16-byte boundaries.  We don't want to penalize all darwin builds for
this unlikely configuration, but we do want to allow it.

This patch causes Configure to compute alignbytes even for multiarch
builds, but if the result is less than 8, it sets it to 8 (which preserves
the previous behavior).  If, however, alignbytes is 16, Configure won't
decrease it.  Then, this patch also fixes config_h.SH so that it uses
the value determined by Configure instead of the previous hardwired value.
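A rough C sketch of the two halves of this change: probing the alignment the compiler actually uses, then applying the multiarch floor of 8. The struct-padding probe is a common technique and is an assumption here, not a copy of Configure's test; likewise, treating `long double` as the widest floating type under -Dusemorebits is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Padding inserted after the char reveals the alignment of the member.
 * Assumption: long double is the widest NV type with -Dusemorebits. */
struct align_probe {
    char        c;
    long double d;
};

static size_t probed_alignbytes(void)
{
    return offsetof(struct align_probe, d);
}

/* Multiarch rule described above: never go below the old safe value
 * of 8, but keep a larger probed value such as 16. */
static size_t multiarch_alignbytes(size_t probed)
{
    return probed < 8 ? 8 : probed;
}
```

So a probe result of 4 becomes 8 (the previous behaviour), while 16 on darwin with -Dusemorebits is kept.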
Commits on Sep 08, 2012
Jerry D. Hedden Upgrade to threads::shared 1.41 2d28267
@iabyn iabyn document args to regexec_flags and API
Document in the API, and clarify in the source code, what the arguments
to Perl_regexec_flags are.

NB: this info is based on code inspection, not any real knowledge on my
@iabyn iabyn PL_sawampersand: use 3 bit flags rather than bool
Set a separate flag for each of $`, $& and $'.
It still works fine in boolean context.

This will allow us to have more refined control over what parts
of a match string to copy (we currently copy the whole string).
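A minimal sketch of replacing a bool with one bit per variable while keeping boolean tests working. The flag and function names are hypothetical and chosen for illustration only; they are not taken from perl.h.

```c
#include <assert.h>

/* Hypothetical flag names: one bit per variable seen in the program. */
enum {
    SAW_PREMATCH  = 1 << 0,   /* $` */
    SAW_MATCH     = 1 << 1,   /* $& */
    SAW_POSTMATCH = 1 << 2    /* $' */
};

static unsigned sawampersand = 0;

/* Record which of the three variables the compiler has just seen. */
static void note_var_seen(char which)
{
    switch (which) {
    case '`':  sawampersand |= SAW_PREMATCH;  break;
    case '&':  sawampersand |= SAW_MATCH;     break;
    case '\'': sawampersand |= SAW_POSTMATCH; break;
    default:   assert(!"unexpected variable");
    }
}
```

Any nonzero value is still true, so existing `if (sawampersand)` checks keep working, while new code can test individual bits.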
@iabyn iabyn regexec_flags(): simplify length calculation
The code to calculate the length of the string to copy was

    PL_regeol - startpos + (stringarg - strbeg);

This is a hangover from the original (perl 3) regexp implementation
that under //i, copied and folded the original buffer: so startpos might
not equal stringarg. These days it always is (except under a match failure
with (*COMMIT), and the code we're interested in is only executed on success).

So simplify to just PL_regeol - strbeg.
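The simplification is plain pointer arithmetic. In this sketch the function names are hypothetical and the parameters only mirror the variable names in the message. When startpos == stringarg, the two startpos/stringarg terms cancel:

```c
#include <assert.h>
#include <stddef.h>

/* Old formula: bytes from startpos to the end, plus the bytes skipped
 * between strbeg and stringarg. */
static ptrdiff_t copy_len_old(const char *strbeg, const char *stringarg,
                              const char *startpos, const char *regeol)
{
    return regeol - startpos + (stringarg - strbeg);
}

/* New formula, valid because startpos == stringarg on a successful match:
 * (regeol - startpos) + (startpos - strbeg) == regeol - strbeg. */
static ptrdiff_t copy_len_new(const char *strbeg, const char *regeol)
{
    return regeol - strbeg;
}
```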
@iabyn iabyn Separate handling of ${^PREMATCH} from $` etc
Currently the handling of getting the value, length etc of ${^PREMATCH}
etc is identical to that of $` etc.

Handle them separately, by adding RX_BUFF_IDX_CARET_PREMATCH etc
constants to the existing RX_BUFF_IDX_PREMATCH set.

This allows us, when retrieving them, to always return undef if the current
match didn't use //p. Previously the result depended on stuff such
as whether the (non-//p) pattern included captures or not.

The documentation for ${^PREMATCH} etc states that it's only guaranteed to
return a defined value when the last pattern was //p.

As well as making things more consistent, this is a necessary
prerequisite for the following commit, which may not always copy the
whole string during a non-//p match.
@iabyn iabyn Don't copy all of the match string buffer
When a pattern matches, and that pattern contains captures (or $`, $&, $'
or /p are present), a copy is made of the whole original string, so
that $1 et al continue to hold the correct value even if the original
string is subsequently modified. This can have severe performance
penalties; for example, this code causes a 1Mb buffer to be allocated,
copied and freed a million times:

    $x = 'x' x 1_000_000;
    1 while $x =~ /(.)/g;

This commit changes this so that, where possible, only the needed
substring of the original string is copied: in the above case, only a
1-byte buffer is copied each time. Also, it now reuses or reallocs the
buffer, rather than freeing and mallocing each time.
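Reusing the buffer rather than freeing and re-mallocing it can be sketched like this. The struct and function are hypothetical. The buffer grows with realloc() only when a longer substring is needed, so repeated 1-byte copies reuse the same allocation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reusable copy buffer. */
struct copybuf {
    char  *buf;
    size_t cap;   /* bytes allocated, including room for the NUL */
};

/* Copy len bytes from src into cb, NUL-terminated, growing the buffer
 * only if needed. Returns 0 on success, -1 on allocation failure. */
static int copybuf_set(struct copybuf *cb, const char *src, size_t len)
{
    if (len + 1 > cb->cap) {
        char *p = realloc(cb->buf, len + 1);   /* realloc(NULL, n) == malloc(n) */
        if (p == NULL)
            return -1;
        cb->buf = p;
        cb->cap = len + 1;
    }
    memcpy(cb->buf, src, len);
    cb->buf[len] = '\0';
    return 0;
}
```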

Now that PL_sawampersand is a 3-bit flag indicating separately whether
$`, $& and $' have been seen, they each contribute only their own
individual penalty; which ones have been seen will limit the extent to
which we can avoid copying the whole buffer.

Note that the above code *without* the $& is not currently slow, but only
because the copying is artificially disabled to avoid the performance hit.
The next but one commit will remove that hack, meaning that it will still
be fast, but will now be correct in the presence of a modified original

We achieve this by adding suboffset and subcoffset fields to the
existing subbeg and sublen fields of a regex, to indicate how many bytes
and characters have been skipped from the logical start of the string till
the physical start of the buffer. To avoid copying stuff at the end, we
just reduce sublen. For example, in this:

    "abcdefgh" =~ /(c)d/

subbeg points to a malloced buffer containing "c\0"; sublen == 1,
and suboffset == 2 (as does subcoffset).

while if $& has been seen,

subbeg points to a malloced buffer containing "cd\0"; sublen == 2,
and suboffset == 2.

If in addition $' has been seen, then

subbeg points to a malloced buffer containing "cdefgh\0"; sublen == 6,
and suboffset == 2.
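The range selection in the three "abcdefgh" cases above can be sketched as a small function. The names are hypothetical, and only byte offsets are modelled (subcoffset is omitted). [cap_start, cap_end) spans the captures and [m_start, m_end) is the $& range:

```c
#include <assert.h>
#include <stddef.h>

enum { SAW_PRE = 1, SAW_AMP = 2, SAW_POST = 4 };   /* hypothetical names */

struct copy_range {
    size_t suboffset;   /* bytes skipped before the copied buffer */
    size_t sublen;      /* bytes copied */
};

/* Compute the slice of a len-byte target string that must be copied. */
static struct copy_range
needed_range(size_t len, size_t m_start, size_t m_end,
             size_t cap_start, size_t cap_end, unsigned saw)
{
    size_t lo = cap_start, hi = cap_end;

    assert(cap_start <= cap_end && m_start <= m_end && m_end <= len);
    if (saw & SAW_PRE)                  /* $` needs everything before */
        lo = 0;
    if (saw & SAW_AMP) {                /* $& needs the whole match */
        if (m_start < lo) lo = m_start;
        if (m_end > hi)   hi = m_end;
    }
    if (saw & SAW_POST)                 /* $' needs everything after */
        hi = len;

    struct copy_range r = { lo, hi - lo };
    return r;
}
```

For "abcdefgh" =~ /(c)d/ (match [2,4), capture [2,3)) this yields sublen 1, 2 and 6 for the three cases described, each with suboffset 2. A later commit in this log (the $+[0] fix) makes the $& range the minimum copy, which here corresponds to always setting SAW_AMP.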

The regex engine won't do this by default; there are two new flag bits,
REXEC_COPY_SKIP_PRE and REXEC_COPY_SKIP_POST, which in conjunction with
REXEC_COPY_STR, request that the engine skip the start or end of the
buffer (it will still copy in the presence of the relevant $`, $&, $',

Only pp_match has been enhanced to use these extra flags; substitution
can't easily benefit, since the usual action of s///g is to copy the
whole string first time round, then perform subsequent matching iterations
against the copy, without further copying. So you still need to copy most
of the buffer.
@iabyn iabyn rationalise t/re/pat_psycho.t
Do some cleanup of this file, without changing its functionality.

Once upon a time, the psycho tests were scattered throughout a single
pat.t file, before being moved into their own file. Now that they're all
in a single file, make the $PERL_SKIP_PSYCHO_TEST test a single "skip_all"
test at the beginning of the file, rather than testing it separately in
each code block.

Also, make some of the test descriptions more useful, and add a bit of
debugging output.
@iabyn iabyn stop $foo =~ /(bar)/g skipping copy
Normally in the presence of captures, a successful regex execution
makes a copy of the matched string, so that $1 et al give the right
value even if the original string is changed; i.e.

    $foo =~ /(123)/g;
    $foo = "bar";
    is("$1", "123");

Until now that test would fail, because perl used to skip the copy for
the scalar /(...)/g case (but not the C<$&; //g> case). This was to
avoid a huge slowdown in code like the following:

    $x = 'x' x 1_000_000;
    1 while $x =~ /(.)/g;

which would otherwise end up copying a 1Mb string a million times.

Now that (with the last commit but one) we copy only the required
substring of the original string (a 1-byte substring in the above
example), we can remove this fast-but-incorrect hack.
@iabyn iabyn tidy up pattern match copying code
(no functional changes).

1. Remove some dead code from pp_split; it's protected by an assert
that it could never be called.

2. Simplify the flags settings for the call to CALLREGEXEC() in
pp_substcont: on subsequent matches we always set REXEC_NOT_FIRST,
which forces the regex engine not to copy anyway, so passing the
REXEC_COPY_STR is pointless, as is the conditional code to set it.

3. (whitespace change): split a conditional expression over 2 lines
for easier reading.
@iabyn iabyn m// and s///; don't copy TEMP/AMAGIC strings
Currently pp_match and pp_subst make a copy of the match string if it's
SvTEMP(), and in the case of pp_match, also if it's SvAMAGIC().

This is no longer necessary, as the code will always copy the string
anyway if it's actually needed after the match, i.e. if it detects the
presence of $1, $& or //p etc. Until a few commits ago, this wasn't the
case for pp_match: it would sometimes skip copying even in the presence of
$1 et al for efficiency reasons. Now that that's fixed, we can remove the
SvTEMP() and SvAMAGIC() tests.

As to why pp_subst did the SvTEMP test, I don't know: but removing it
didn't make any tests fail!
@iabyn iabyn fix a bug in handling $+[0] and unicode
The code to decide what substring of a pattern target to copy for the
sake of $1, $& etc, would, in the absence of $&, only copy the minimum
range needed to cover $1,$2,...., which might be a shorter range than
what $& covers. This is fine most of the time, but, when calculating
$+[0] on a unicode string, it needs a copy of the whole part of the string
covered by $&, since it needs to convert the byte offset into a char offset.
So to fix this, always copy, as a minimum, the $& range.
I suppose we could be more clever about this: detect the presence
of @+ in the code, only do it for UTF8 etc; but this is simple
and non-fragile.
@iabyn iabyn [MERGE] only copy bits of regex match string
When making a copy of the string being matched against (so that $1, $&
et al continue to show the correct value even if the original string is
subsequently modified), only copy that substring of the original string
needed for the capture variables, rather than copying the whole string.

This is a big win for code like

    $_ = 'x' x 1_000_000;
    1 while /(.)/;

Also, when pessimizing if the code contains $`, $& or $', record
the presence of each variable separately, so that the determination of the
substring range is based on each variable separately. So performance-wise,

   $&; /x/

is now roughly equivalent to


whereas previously it was like



   $&; $'; /x/

is now roughly equivalent to



Finally, this code (when not in the presence of $& etc)

    $_ = 'x' x 1_000_000;
    1 while /(.)/;

used to skip the buffer copy for performance reasons, but suffered from $1
etc changing if the original string changed. That's now been fixed too.
@iabyn iabyn fix s/(.)/die/e
Commit 6502e08 introduced copying just
the parts of the regex string that were needed; but piggy-backing on that
commit was a temporary change I made that I forgot to undo, which - it
turns out - causes SEGVs and similar when the replacement part of a
substitution dies.

This commit reverts that change.

Spotted as
    Bleadperl v5.17.3-255-g6502e08 breaks GAAS/URI-1.60.tar.gz
(not assigned an RT ticket number yet)
@craigberry craigberry Out of memory message should not allocate memory.
This fixes [perl #40595].  When Perl_malloc reports an out of
memory error, it should not make calls to PerlIO functions that
may turn around and allocate memory using Perl_malloc.  A simple
write() should be ok, though.  Inspired by S_write_no_mem() from
util.c.  Also replaces the local write2 function, which did the
same thing slightly differently.
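The idea of reporting out-of-memory without touching the allocator can be sketched with a single write(2) of a static buffer. The function name is hypothetical and the exact message text is an assumption. Nothing here goes through stdio or PerlIO buffering, so nothing can call back into malloc:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: emit a fixed message with one write(2) call.
 * The message lives in static storage, so no allocation is needed. */
static ssize_t write_no_mem(int fd)
{
    static const char msg[] = "Out of memory!\n";   /* assumed wording */
    assert(fd >= 0);
    return write(fd, msg, sizeof msg - 1);
}
```

A real caller would pass 2 (stderr) and then exit; the test below uses a pipe so the output can be checked.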

Under -DDEBUGGING, there are other calls to PerlIO_printf that are
also likely unsafe, but that problem is not addressed here.
Commits on Sep 10, 2012
@craigberry craigberry Fix C++, MYMALLOC, sdbm combination.
The prototypes for the home-grown malloc replacements were not
protected with extern "C" declarations, so linking the SDBM_File
extension failed when configuring with -Dusemymalloc=y and building
with C++.
Colin Kuskie (via RT) Refactor t/porting/customized to use instead of making TAP by hand
Colin Kuskie (via RT) Refactor t/op/cond.t to use instead of making TAP by hand 63811f1
@perlDreamer perlDreamer Refactor t/op/my.t to use instead of making TAP by hand 8a7eb8f
Andy Dougherty Fix [perl #114812] Configure not finding isblank().
Configure would not find isblank() when run with g++ because
the probe used exit() without including <stdlib.h>.  The simplest fix
is to have the probe use return instead.
@jmdh jmdh Correct obvious typos in acknowledgements list 9e53330
@rafl rafl Stop CPAN from indexing mad/ a326526
@rafl rafl Perldelta up to 9e53330 5faa50e
@rafl rafl Remove some set but unused variables
Thanks, gcc, for letting me know.
Commits on Sep 11, 2012
Shlomi Fish Add more tests, Revert back to C-style for loops
This patch to lib/ and lib/perl5db.t adds more tests for the L
and S commands and reverts some changes from C-style for loops to
while+continue loops which were not very popular.
@amenonsen amenonsen Add changelog entry for 2.38 6ac1779
@amenonsen amenonsen Bump version to 2.39 because I botched the 2.38 release 7a950fe
@craigberry craigberry Identify MallocCfg* globals as variables, not functions.
Otherwise building on VMS with -Dusemymalloc=y fails because we
enter them as procedures in the linker options file and the linker
knows we're lying and will have none of it.
@steve-m-hay steve-m-hay Forward declare static functions in win32/win32.c
This makes calling them easier without worrying about the order of
@steve-m-hay steve-m-hay ANSIfy output from invalid parameter handler, and write it to stderr
The function, file and expression are very unlikely to contain anything
requiring UTF-16 output, and the output is less likely to interfere with
anything when written to stderr rather than stdout.

Note that the function doesn't currently do anything without hacking the
makefiles because we don't currently build with _DEBUG and the debug CRT.
I haven't changed that yet (other than locally) because there is actually
some output from it which causes a couple of tests to fail.
@steve-m-hay steve-m-hay Silence invalid parameter messages from win32_signal
This is the first step towards enabling the invalid parameter handler
without it causing undue noise. In this case the invalid parameters are
intentional, so provide a means to silence messages about them.

There is still noise from win32_close() and win32_select() which needs
resolving by some means too before the handler can be switched on without
its output causing test failures.
@steve-m-hay steve-m-hay Update perldelta entry for [perl #114496].
Improved text by Tony C, from the bug report.
@rafl rafl Tell about the Storable 2.39 upgrade c715af8
@rafl rafl autouse has synchronised to CPAN adac38d
@rafl rafl B::Lint has been synchronised to CPAN b58c5ae
@rafl rafl Automatically create core-cpan-diff cache dir 05bdd68
@rafl rafl Synchronise bignum with CPAN 993386a
@rafl rafl Dumpvalue has been synchronised to CPAN f6e46c4
@rafl rafl ExtUtils::Manifest has been synchronised to CPAN bd78550
@rafl rafl Term::ReadLine has been synchronised to CPAN 5152317
@rafl rafl Text::Abbrev has been synchronised to CPAN 5e96eee
@rafl rafl Perldelta up to 5e96eee eebee32
@dagolden dagolden Updated Search::Dict to 1.07 as on CPAN 1d04412
Commits on Sep 12, 2012
Father Chrysostomos pad.c: Share pad name lists between clones
Pad names are immutable once the sub is compiled.  They are shared
between clones.  Instead of creating a new array containing the same
pad name SVs, just share the whole array.

cv_undef does not need to modify the pad name list when removing an
anonymous sub, so we can just delete that code.  That was the only
thing modifying them between compilation and freeing, as far as I
could tell.
Father Chrysostomos Unify CvDEPTH for formats and subs
As Dave Mitchell pointed out, while putting the CvDEPTH field for
formats in the SvCUR slot might save memory for formats, it slows down
sub calls because CvDEPTH is used on subs in very hot code paths.
Checking the SvTYPE to determine which field to use should not be
@perlDreamer perlDreamer Refactor to use instead of making TAP by hand. Add test names. 18b94ad
@iabyn iabyn update docs for $`, $&, $' changes
mention that they're now detected individually, and mention in reapi
the new RX_BUFF_IDX_* symbolic constants.
@iabyn iabyn perldelta: add recent regex API changes 050862b
@iabyn iabyn add test for 6502e08, s/(.)/die/e
Forgot to add a test along with the commit that fixed this
@iabyn iabyn stop ""-overloaded Regex recursing
There was code to detect this, but it checked for the returned value being
the same as before, but in this case it was returning a *new* temporary
reference to the same Regexp object; so check for that too.
@Leont Leont Eradicate race condition in t/op/sigsystem.t (#114562) c5e8d5c
@PeterMartini PeterMartini Add PL_subname to the save stack
Otherwise, PL_subname is left as utf8::SWASHNEW after isIDFIRST_lazy_if
(etc) is called in UTF context
Father Chrysostomos toke.c: Under -DT, dump complement properly
Before:

$ ./miniperl -DT -e '~foo'
### <== ?? 126

After:

$ ./miniperl -DT -e '~foo'
### <== '~'
Father Chrysostomos Fix listop-hash-infix parsing
With some list operators, this happens:

$ ./miniperl -e 'warn({$_ => 1} + 1) if 0'
syntax error at -e line 1, near "} +"
Execution of -e aborted due to compilation errors.

Putting + before the { or changing warn to print makes the problem
go away.

The lexer is losing track of what token it expects next, so it ends
up interpreting the + as a unary plus, instead of an infix plus.  The
parser doesn’t like that.

It happens because of this logic under case '{' (aka leftbracket:) in
the lexer:

	switch (PL_expect) {
	case XTERM:
	    if (PL_oldoldbufptr == PL_last_lop)
		PL_lex_brackstack[PL_lex_brackets++] = XTERM;
	    else
		PL_lex_brackstack[PL_lex_brackets++] = XOPERATOR;

The value we put on the brackstack is what we expect to find after the
closing brace (case '}' pops it off).

This particular if/else goes all the way back to ef6361f (perl
5.000), or at least that was when it moved inside the XTERM case.
Before that, we had this:

	if (oldoldbufptr == last_lop)
	    lex_brackstack[lex_brackets++] = XTERM;
	else
	    lex_brackstack[lex_brackets++] = XOPERATOR;
	if (expect == XTERM)

So it appears that the XTERM/XOPERATOR distinction, based on last_lop
was the ‘old’ (and wrong) way of doing it, but it had to be changed in
perl 5.000 for cases other than XTERM.  That it remained for XTERM was
probably an oversight, which is easy to understand, since I seem to be
the first one to stumble across this after 18 years (what’s the relevant
Klortho number?).

Removing this last_lop check causes no tests to fail.  And it makes
sense, since anything coming right after an anonymous hash that could
be either an infix or prefix operator must be infix.
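The brackstack idea and the fix can be modelled in a few lines. This is a hypothetical, much-simplified sketch, not toke.c. Opening a brace pushes what the lexer should expect after the matching close, closing pops it, and after the fix an anonymous hash in term position always pushes "expect an operator":

```c
#include <assert.h>

/* Hypothetical, simplified model of the lexer's bracket stack. */
enum expect { XTERM, XOPERATOR };

#define MAX_BRACKETS 64
static enum expect brackstack[MAX_BRACKETS];
static int brackets = 0;

/* '{' starting an anonymous hash in term position: whatever follows the
 * matching '}' and could be infix or prefix (such as '+') must be infix,
 * so push XOPERATOR without consulting the last list operator. */
static void open_anon_hash(void)
{
    assert(brackets < MAX_BRACKETS);
    brackstack[brackets++] = XOPERATOR;
}

/* '}' pops what the lexer should expect next. */
static enum expect close_bracket(void)
{
    assert(brackets > 0);
    return brackstack[--brackets];
}
```

With this model, the `+` in `warn({$_ => 1} + 1)` is lexed while expecting an operator, so it is read as infix plus.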
Father Chrysostomos op.c: Document newGIVENOP(..., 0)
Something I missed in b5a6481.
Commits on Sep 13, 2012
@doy doy whoops, move this back where it was
apparently utf8->SWASHNEW calls "require 'unicore/'" too
@steve-m-hay steve-m-hay Avoid POSIX::close when closing files by descriptor in IPC::Open3
Closing a file descriptor with POSIX::close bypasses PerlIO's ref-counting
of file descriptors and leads to MSVC++'s invalid parameter handler being
triggered when the PerlIO stream is closed later because that attempts to
close the underlying file descriptor again, but it's already closed.

So instead, we effectively fdopen() a new PerlIO stream and then close it
again to effect the closure of the file descriptor.
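A C stdio analogue of closing a descriptor through a stream rather than with a raw close(). This is an assumption-laden sketch: IPC::Open3 is Perl and PerlIO's descriptor reference counting is not modelled. It shows only that fdopen() followed by fclose() releases the underlying descriptor through the I/O layer:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* True if fd currently refers to an open file description. */
static bool fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/* Close fd by wrapping it in a stdio stream and closing the stream,
 * so the I/O layer, not a bare close(2), releases the descriptor.
 * Returns 0 on success, -1 or EOF on failure. */
static int close_via_stream(int fd, const char *mode)
{
    FILE *fp = fdopen(fd, mode);
    if (fp == NULL)
        return -1;
    return fclose(fp);   /* also closes fd */
}
```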
@steve-m-hay steve-m-hay Fix a couple of headings in perlgit.pod which look to be the wrong level 99cd8e4
@steve-m-hay steve-m-hay Document how to create and use smoke-me branches
The instructions are based on the following helpful email from Tony Cook:
after I tested them myself in course of commit 8700fd3.
@steve-m-hay steve-m-hay perldelta for 8700fd3 5f877a7
Commits on Sep 14, 2012
Karl Williamson regcomp.c: Wrap some long lines 98e1e01
Karl Williamson Fix \X handling for Unicode 5.1 - 6.0
Commit 27d4fc3 neglected to include a
change required for a few Unicode releases where the \X prepend property
is not empty.  This does that, and suppresses a mktables warning for
Unicode releases prior to 6.2
Karl Williamson Unicode/ Clarify pod c865229
Karl Williamson /, Add guard to .h
Future commits will have other headers #include the headers generated by
these programs.  It is best to guard against the preprocessor
trying to process these twice.
Karl Williamson regen/ Copy empty input lines to output
This allows the generated .h to look better.
Karl Williamson regen/ Allow explicit default on input
An input line without a command is considered to be a request for the
UTF-8 encoded string of the code point.  This allows an explicit
'string' to be used.
Karl Williamson regen/ Add ability to get native charset
This adds a new capability to this program: to input a Unicode code point and
create a macro that expands to the platform's native value for it.

This will allow removal of a bunch of EBCDIC dependencies in the core.
Karl Williamson Rename regen'd hdr to reflect expanded capabilities
The recently added utf8_strings.h has been expanded to include more than
just strings.  I'm renaming it to avoid confusion.
Karl Williamson Remove some EBCDIC dependencies
A new regen'd header file has been created that contains the native
values for certain characters.  By using those macros, we can eliminate
EBCDIC dependencies.
Karl Williamson ext/B/B.xs: Remove EBCDIC dependency
These are unnecessary EBCDIC dependencies: It uses isPRINT() on EBCDIC,
and an expression on ASCII, but isPRINT() is defined to be precisely
that expression on ASCII platforms.
Karl Williamson utf8.h: Correct improper EBCDIC conversion
These macros were incorrect for EBCDIC.  The relationships are based on
I8, the intermediate-utf8 defined for UTF-EBCDIC, not the final encoding.
I was the culprit who did this originally; I was confused by the names of
the conversion macros.  I'm adding names that are clearer to me; which
have already been defined in utfebcdic.h, but weren't defined for
non-EBCDIC platforms.
Karl Williamson utf8.h: White-space only
This reflows some lines to fit into 80 columns
Karl Williamson utf8.h: Save a branch in a macro
By adding a mask, we can save a branch.  The two expressions match the
exact same code points.
Karl Williamson regen/ Handle ranges, \p{}
Instead of having to list all code points in a class, you can now use
\p{} or a range.

This changes some classes to use the \p{}, so that any changes Unicode
makes to the definitions don't have to manually be done here as well.
Karl Williamson regen/ Remove Encode:: dependency
Newer options to unpack alleviate the need for Encode, and run faster.
Karl Williamson regen/ Work on EBCDIC platforms
This will now automatically generate macros for non-ASCII platforms,
by mapping the Unicode input to native output.

Doing this will allow several cases of EBCDIC dependencies in other code
to be removed, and fixes the bug that this previously had with non-ASCII
Karl Williamson regen/ Fix bug for character '0'
The character '0' could be omitted from some generated macros due to
its testing the value of a hash entry (getting 0, which is false) instead
of whether it exists.
Karl Williamson regen/ Change to work on an empty class
Future commits will add Unicode properties for this to generate macros,
and some of them may be empty in some Unicode releases.  This just
causes such a generated macro to evaluate to 0.
Karl Williamson regen/ Generate macros for \X processing
\X is implemented in regexec.c as a complicated series of property
look-ups.  It turns out that many of those are for just a few code
points, and so can be more efficiently implemented with a macro than a
swash.  This generates those.
Karl Williamson regexec.c: Use new macros instead of swashes
A previous commit has caused macros to be generated that will match
Unicode code points of interest to the \X algorithm.  This patch uses
them.  This speeds up modern Korean processing by 15%.

Together with recent previous commits, the throughput of modern Korean
under \X has more than doubled, and is now comparable to that of other
languages (whose throughput has itself increased by 35%).
Karl Williamson Move 2 functions from utf8.c to regexec.c
One of these functions is currently commented out.  The other is called
only in regexec.c in one place, and was recently revised to no longer
require the static function in utf8.c that it formerly called.  They can
be made static inline.
Karl Williamson regen/ Add name parameter
A future commit will want to use the first surrogate code point's UTF-8
value.  Add this to the generated macros, and give it a name, since
there is no official one.  The program has to be modified to cope with
Karl Williamson regen/ Allow comments in input
Lines whose first non-blank character is a '#' are now considered to be
comments, and ignored.  This allows the moving of some lines that have
been commented out back to after the __DATA__ where they really belong.
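The rule for recognizing comment lines is simple enough to state in a few lines. The program itself is Perl; this C helper, whose name is hypothetical, just sketches the rule as described, treating spaces and tabs as blanks.

```c
#include <stdbool.h>

/* True if the first non-blank character of line is '#'.
 * Blanks are assumed to be spaces and tabs. */
static bool is_comment_line(const char *line)
{
    while (*line == ' ' || *line == '\t')
        line++;
    return *line == '#';
}
```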
Karl Williamson regen/ Error check input better
This makes sure that the modifiers specified in the input are known to
the program.
Karl Williamson regen/ Add documentation cc08b31
Karl Williamson regen/ Add new output macro type
The new type 'high' is used only on above-Latin1 code points.  It is
designed for code that already knows the tested code point is not
Latin1, and so avoids unnecessary tests.
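A hedged sketch of the difference, using a hypothetical class (U+00A0 plus U+2000..U+200A) rather than anything regen/ actually generates: the general form must first decide whether the Latin1 part applies, while the 'high' form relies on the caller's guarantee that the code point is above 255.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical class: U+00A0 (Latin1) plus U+2000..U+200A. */

/* General form: must branch on whether cp is in Latin1. */
static bool is_space_like(uint32_t cp)
{
    if (cp < 256)
        return cp == 0xA0;
    return cp >= 0x2000 && cp <= 0x200A;
}

/* 'high' form: the caller guarantees cp > 255, so the Latin1 test and
 * the branch that selects it are omitted. */
static bool is_space_like_high(uint32_t cp)
{
    return cp >= 0x2000 && cp <= 0x200A;
}
```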
Karl Williamson Use macro not swash for utf8 quotemeta
The rules for matching whether an above-Latin1 code point should be
quoted are now saved in a macro generated from a trie by regen/, and
this macro is now used by pp.c to test these cases.  This allows removal
of a wrapper subroutine, and also removes the need to load the rules
into a swash at run time.

This macro is about as big as I'm comfortable compiling in, but it
saves the building of a hash that can grow over time, and removes a
subroutine and interpreter variables.  Indeed, performance benchmarks
show that it is about the same speed as a hash, but it does not require
having to load the rules in from disk the first time it is used.
Karl Williamson regen/ Pass options deeper into call stack
This is to prepare for future commits which will act differently at the
deep level depending on some of the options.
Karl Williamson regen/ Rename a variable
I find it confusing that the array element name is the same as the full array
Karl Williamson regen/ Add an optimization
Branches can be eliminated from the macros that are generated here
by using a mask where applicable.  This adds a check for whether this
optimization is possible, and applies it if so.
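One way such a check could work, sketched in C under stated assumptions (the real check lives in the Perl generator and may differ): a byte range lo..hi can be tested as `(c & mask) == lo` exactly when its size is a power of two and lo is a multiple of that size. The function name is hypothetical.

```c
#include <stdbool.h>

/* If the byte range lo..hi can be tested as (c & *mask) == lo, store the
 * mask and return true. That holds when the range size is a power of two
 * and lo is aligned on it, so the range is precisely the set of bytes
 * sharing lo's high-order bits. */
static bool range_to_mask(unsigned lo, unsigned hi, unsigned *mask)
{
    if (lo > hi || hi > 0xFF)
        return false;
    unsigned size = hi - lo + 1;
    if ((size & (size - 1)) != 0)   /* size is not a power of two */
        return false;
    if ((lo & (size - 1)) != 0)     /* lo is not aligned on size */
        return false;
    *mask = ~(size - 1) & 0xFF;
    return true;
}
```

For example, 0xC0..0xDF yields the mask 0xE0, while 0x30..0x39 (ten bytes) cannot use this form.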
Karl Williamson regen/ Rmv always true components from gen'd macro
This adds a test that makes a subroutine return 1 if the condition will
always match, and a check in the caller that then omits that condition
from the generated macro.
Karl Williamson regen/ Extend previously added optimization
A previous commit added an optimization to save a branch in the
generated code at the expense of an extra mask when the input class has
certain characteristics.  This extends that to the case where
sub-portions of the class have similar characteristics.  The first
optimization for the entire class is moved to right before the new loop
that checks each range in it.
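A hedged illustration of applying the mask per range, with a hypothetical class made of 0x30..0x37, 0x40..0x4F and 0x61: the class as a whole is not one aligned block, but two of its ranges are, so each of those gets one mask and one comparison. This shows the shape of the resulting test, not regen/'s actual output.

```c
#include <stdbool.h>

/* Naive form: a pair of comparisons per range. */
static bool in_class_naive(unsigned char c)
{
    return (c >= 0x30 && c <= 0x37)
        || (c >= 0x40 && c <= 0x4F)
        || c == 0x61;
}

/* Per-range mask form. */
static bool in_class_masked(unsigned char c)
{
    return (c & 0xF8) == 0x30     /* 0x30..0x37: 8 bytes, aligned  */
        || (c & 0xF0) == 0x40     /* 0x40..0x4F: 16 bytes, aligned */
        || c == 0x61;             /* single byte, left as is       */
}
```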
Karl Williamson regen/ White-space only
Indent a newly-formed block
Karl Williamson regen/ Add optimization
On UTF-8 input known to be valid, continuation bytes must be in the
range 0x80 .. 0xBF.  Therefore, any tests for being within those bounds
will always be true, and may be omitted.
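A small hedged example of the omission: the two-byte sequences C3 80 .. C3 BF encode U+00C0..U+00FF. On arbitrary input the second byte has to be checked, but on input already known to be valid UTF-8 a lead byte of 0xC3 is always followed by a continuation byte, so that check can be dropped. Function names are hypothetical.

```c
#include <stdbool.h>

/* Defensive form: also checks that the second byte is a continuation byte. */
static bool is_c0_to_ff_checked(const unsigned char *s)
{
    return s[0] == 0xC3 && s[1] >= 0x80 && s[1] <= 0xBF;
}

/* Form for input known to be valid UTF-8: after a 0xC3 lead byte, the
 * next byte is necessarily in 0x80..0xBF, so that test is omitted. */
static bool is_c0_to_ff_valid(const unsigned char *s)
{
    return s[0] == 0xC3;
}
```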
Karl Williamson utf8.h: Remove some EBCDIC dependencies
regen/ has been enhanced in previous commits so that it
generates code as good as these hand-defined macro definitions for
various UTF-8 constructs.  And it should be able to generate EBCDIC
ones as well.  By using its definitions, we can remove the EBCDIC
dependencies for them.  It is quite possible that the EBCDIC versions
were wrong, since they have never been tested.  Even if the generated
code has bugs under EBCDIC, it is easier to find and fix those in one
place than in all the sundry definitions.
Karl Williamson regen/ Add ability to restrict platforms
This adds the capability to skip definitions if they are for other than
a desired platform.
Karl Williamson utf8.h: Use machine generated IS_UTF8_CHAR()
This takes the output of regen/ for all the 1-4 byte
UTF-8 representations of Unicode code points, and replaces the current
hand-rolled definition there.  It does this only for ASCII platforms,
leaving EBCDIC to be machine generated when run on such a platform.

I would rather have both versions regenerated each time they are
needed, to save an EBCDIC dependency, but it takes more than 10 minutes
on my computer to process the 2 billion code points that have to be
checked for on ASCII platforms, and currently t/porting/regen.t runs
this program every time; that slowdown would be unacceptable.  If
this is ever run under EBCDIC, the macro should be machine computed
(very slowly).  So, even though there is an EBCDIC dependency, it has
essentially been solved.
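To give a sense of what a hand-rolled 1-4 byte check has to express, here is a stand-alone C sketch that returns the length of one well-formed UTF-8 character, or 0. It follows the Unicode Standard's table of well-formed UTF-8 byte sequences (no overlongs, no surrogates, nothing above U+10FFFF); Perl's own IS_UTF8_CHAR() may accept a different set, so this illustrates the shape of the checks, not a drop-in replacement. The function name is hypothetical.

```c
#include <stddef.h>

/* Length of the well-formed UTF-8 character at s (1-4), or 0 if the
 * bytes are not well formed or len is too short. */
static size_t utf8_char_len(const unsigned char *s, size_t len)
{
    if (len == 0)
        return 0;
    unsigned char b0 = s[0];
    if (b0 <= 0x7F)
        return 1;

    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */
    size_t n;
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        n = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        n = 3;
        if (b0 == 0xE0) lo = 0xA0;        /* reject overlongs   */
        else if (b0 == 0xED) hi = 0x9F;   /* reject surrogates  */
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        n = 4;
        if (b0 == 0xF0) lo = 0x90;        /* reject overlongs   */
        else if (b0 == 0xF4) hi = 0x8F;   /* cap at U+10FFFF    */
    } else {
        return 0;                         /* C0, C1, F5..FF, or a stray continuation */
    }

    if (len < n || s[1] < lo || s[1] > hi)
        return 0;
    for (size_t i = 2; i < n; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return n;
}
```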
Karl Williamson Merge branch for mostly regen/ into blead
I started this work planning to enhance regen/ to accept
Unicode properties as input so that some small properties used in \X
could be compiled in, instead of having to be read from disk.  In doing
so, I saw some opportunities to move some EBCDIC dependencies down to a
more basic level, thus replacing quite a few existing ones with just a
couple at the lower levels.  This also led to my enhancing the macros
output by to be at least as good (in terms of numbers of
branches, etc) as the hand-coded ones it replaces.

I also spotted a few bugs in existing code that hadn't been triggered
perlDreamer Refactor t/op/exists_sub.t to use instead of making TAP by hand. 09b1b8c
perlDreamer Refactor t/op/overload_integer.t to use instead of making TAP by hand.

With minor change from committer: Always assign $@ asap after an eval.
perlDreamer Refactor t/run/switch0.t to use instead of making TAP by hand. 54138b1
perlDreamer Refactor t/op/push.t to use instead of making TAP by hand. e589c1f
Nicholas Clark Restore the build under -DPERL_OLD_COPY_ON_WRITE
This was broken as a side effect of commit 6502e08, recently merged
to blead.
Nicholas Clark Fix buggy -DPERL_POISON code in S_rxres_free(), exposed by a recent t…

The code had been buggily attempting to overwrite just-freed memory since
PERL_POISON was added by commit 94010e7 in June 2005. However, no
regression test exercised this code path until recently.

Also fix the offset in the array of UVs used by PERL_OLD_COPY_ON_WRITE to
store RX_SAVED_COPY(). It now uses p[2]. Previously it had used p[1],
directly conflicting with the use of p[1] to store RX_NPARENS().

The code is too intertwined to meaningfully do these as separate commits.
Nicholas Clark Fix compilation for -DPERL_POISON and -DPERL_OLD_COPY_ON_WRITE together.
These have been present since PERL_POISON was added in June 2005 by commit
94010e7. It seems that no-one has tried compiling with both defined
iabyn eliminate PL_reginput
PL_reginput (which is actually #defined to PL_reg_state.re_state_reginput)
is, to all intents and purposes, state that is only used within

The only other places it is referenced are in S_regtry() and S_regrepeat(),
where it is used to pass the current match position back and forth between
the subs.

Do this passing instead via function args, and bingo! PL_reginput is now
just a local var of S_regmatch().
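A rough sketch of the idea, not the actual regexec.c code: instead of a helper updating a shared "current position" variable, it receives the position as an argument and returns how far the match advanced, and the caller moves its own local pointer. The name repeat_char() and its parameters are hypothetical, loosely modeled on what S_regrepeat() does.

```c
#include <stddef.h>

/* Count how many times c occurs consecutively starting at pos, stopping
 * at end or after max matches. No global state is read or written; the
 * caller learns the new position from the return value. */
static size_t repeat_char(const char *pos, const char *end, char c, size_t max)
{
    size_t n = 0;
    while (pos + n < end && n < max && pos[n] == c)
        n++;
    return n;
}
```

A caller would then advance its own pointer, for example `pos += repeat_char(pos, end, 'a', max);`, so the current position never has to live outside the function that owns it.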
iabyn regmatch(): make PUSH_STATE_GOTO dest explicit
Currently, the string position from where matching continues after a PUSH
is implicitly specified by the value of reginput, which is usually just
equal to locinput. Make this explicit by adding an extra argument to

This is part of a campaign to eliminate the reginput variable.
iabyn regmatch(): remove reginput from TRIE_next_fail:
It was being used essentially as a temporary var within the branch,
so replace it with a temp var in a new block scope.

This is part of a campaign to eliminate the reginput variable.
iabyn regmatch(): remove reginput from IFMATCH etc
It was being used essentially as a temporary var within the branch,
so replace it with a temp var in a new block scope.

On return in IFMATCH_A / IFMATCH_A_fail, there's no need to set reginput
any more, so don't. The SUSPEND case used to set locinput = reginput, but
at that point, the two variables already always had the same value anyway.

This is part of a campaign to eliminate the reginput variable.
iabyn regmatch(): remove reginput from CURLYM
reginput, locinput and st->locinput were being used in a little
ballet to determine the length of the first match.
This is now simply locinput - st->locinput, or its unicode equivalent;
so the code can be simplified.

Elsewhere in the block: where reginput was being used, locinput and/or
nextchr already contain the same info, so use them instead.

This is part of a campaign to eliminate the reginput variable.
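The byte length of the first match is just the pointer difference; in UTF-8 strings the character count differs whenever multi-byte characters are involved. A hedged, stand-alone sketch of that "unicode equivalent" (the helper name is hypothetical and assumes valid UTF-8): count every byte that is not a continuation byte.

```c
#include <stddef.h>

/* Number of characters (not bytes) between start and end, assuming valid
 * UTF-8: every byte that is not a continuation byte (10xxxxxx) starts a
 * new character. */
static size_t utf8_char_count(const unsigned char *start, const unsigned char *end)
{
    size_t n = 0;
    for (const unsigned char *p = start; p < end; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}
```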
iabyn regmatch(): remove reginput from CURLY etc
reginput mostly tracked locinput, except when regrepeat() was called.
With a bit of jiggling, it could be eliminated for these blocks of code.

This is part of a campaign to eliminate the reginput variable.
iabyn regmatch(): remove remaining reads of reginput
In the remaining place where the value of reginput is used, its value
should always be equal to locinput, so it can be eliminated there.

This is part of a campaign to eliminate the reginput variable.
iabyn regmatch(): eliminate reginput variable
The remaining uses of reginput are all assignments; its value is
never used. So eliminate it.

Also, update the description of S_regrepeat(), which was woefully out of
date (but mentioned reginput).
iabyn [MERGE] eliminate PL_reginput
The variable PL_reginput (which is actually part of the
global/per-interpreter variable PL_reg_state) is mainly used just
locally within the S_regmatch() function. In this role, it effectively
competes with the local-to-regmatch() variable locinput, as a pointer
that tracks the current match position.

Having two variables that do this is less efficient, and makes the code
harder to understand. So this series of commits:

1) removes PL_reginput, and replaces it with a var, reginput, local to
2) successively removes more and more uses of the reginput variable, until
3) it is eliminated altogether, leaving locinput as the sole 'here we are'

Looking at the CPU usage of running the t/re/*.t tests on a -O2,
non-threaded build, running each test suite 3 times, gives:

before: 55.35 55.66 55.69
after:  55.10 55.13 55.33

which indicates a small performance improvement of around 0.5%.

(The CPU usage of a single run of the whole perl test suite dropped from
783.31s to 777.23s).
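The percentages can be rechecked from the timings quoted above. This small C sketch (helper names are illustrative) computes the reduction in mean CPU time over the three t/re/*.t runs, which comes to about 0.68%. Comparing only the fastest run of each set gives about 0.45%, and the whole-suite totals give about 0.78%. All are below 1%, consistent with a small improvement.

```c
/* Arithmetic mean of n samples. */
static double mean(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s / n;
}

/* Percentage reduction going from before to after. */
static double pct_drop(double before, double after)
{
    return (before - after) / before * 100.0;
}
```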