… be allowed to access the corresponding string buffer byte
In amagic_call(), the 'method' arg comes from the overload enum in overload.h, but is expected to match the bit set from %overloading::numbers::names. Its values wrongly start at 1, differing by 1 from the enum indexes. This didn't show up in the tests because 'method' was reduced modulo 7 instead of 8.
…ched (and run "make regen")
Consider what currently happens when the tokenizer is scanning a string. It looks through it byte-by-byte until it finds a character that forces it to switch to utf8. It then calls sv_utf8_upgrade() with the portion of the string scanned so far. sv_utf8_upgrade() starts over from the beginning and scans the string byte-by-byte until it finds a character that varies between non-utf8 and utf8. It then calls bytes_to_utf8(). bytes_to_utf8() allocates a new string that can handle the worst-case expansion, 2n+1, of the entire string, starts over from the beginning, and scans the input string byte-by-byte, converting each character and copying it to the output string as it goes. It doesn't return the size of the new string, so sv_utf8_upgrade() assumes it is only as big as what actually got converted, throwing away knowledge of any spare space. Control then returns to the tokenizer, which immediately does a grow to get space for the unparsed input. This is likely to cause a new string to be allocated and copied from the one just created, even if that string actually had enough space in it. Thus the invariant head portion of the string is scanned 3 times, and probably 2 strings are allocated and copied.
I cut this down in several ways. First, I added an extra flag to sv_utf8_upgrade that says: don't bother checking whether the string needs to be converted to utf8, just assume it does. This eliminates one of the passes. I also added a new parameter to sv_utf8_upgrade that says: when you return, I want this much unused space in the string. That eliminates the extra grow. This was done by renaming the current work-horse function from sv_utf8_upgrade_flags() to sv_utf8_upgrade_flags_grow(), and making the current function name a macro that calls the revised one with a grow parameter of 0.
I also improved the internal efficiency of sv_utf8_upgrade. When it does scan the string, it no longer calls bytes_to_utf8 but does the conversion itself: it uses a fast memory copy instead of the byte-oriented one for the invariant header, it uses that header to get a better estimate of the size needed for the new string, and it doesn't throw away the knowledge of the allocated size. And if it is clear without scanning the whole string that the conversion will fit in the already allocated string, it just uses that string instead of allocating and copying a new one, using the algorithm I copied from the tokenizer. (In this case it does have to finish scanning the whole string to get the correct size.) The comments have details. The conversion is still byte-oriented; vectorization et al. could yield further performance improvements, and one idea for that is in the comments. The patch also includes a new synonym I created, which is a more accurate name than NATIVE_TO_ASCII.
This fixes the following problem: -e 'my $re = qr/x/; $re |= "y"' triggers an assertion failure under 5.10.0, 10-maint and blead, but not under 5.8.8.
Plus a comment by Nicholas
… for threads, so have to be written longhand as Perl_sv_catpvf(aTHX_ ...) :-(
… process
The net result of this patch is to make the git version info we have available for the build visible via Config.pm and -v/-V. When built within a git repository, git is queried directly. When built from a snapshot or bundle, it is assumed that the source is unchanged and that the required details are available in a file called .patch, whose format is currently a four-field string: "$branchname $date.$time $sha1 $describe". The generator of these files currently resides on camel.booking.com.
* git-describe is now used more directly with -v. When the prefix of the git-describe output matches the version number as determined by the defines in patchlevel.h, we use ONLY the git-describe output; otherwise we include the git-describe output in parentheses after the version number. Either way, the describe text is optionally followed by a star if there are uncommitted changes. E.g.:
  This is perl, v5.11.0 (GitLive-blead-136-g58ca560) built for i686-linux
  or:
  This is perl, v5.11.0-1-g58ca560 built for i686-linux
  or:
  This is perl, v5.11.0 built for i686-linux
* include the SHA1 in the perl -V summary, and automatically include unpushed commits in the registered patches list
* include various git/version/.patch details in %Config, as follows:
  git_commit_id            # sha1 of HEAD
  git_ancestor             # ancestor in $remote/$branch (presumably canonical)
  git_describe             # git describe
  git_branch               # current branch
  git_uncommitted_changes  # "true" if there are any, empty otherwise
  git_unpushed_commits     # list of sha1's of unpushed commits
  git_commit_id_title      # used to make the perl -V summary output
Additionally, one more value is added depending on the build process used: when building from an rsynced snapshot (or any dist including a file called .patch), the second field will be used to populate the "git_snapshot_date" field.
Otherwise, if built in a git directory (as is hopefully recommended these days), the field will be "git_commit_date", which will be the commit date of HEAD. This patch introduces two new files (on top of .patchnum) that will be generated by make_patchnum.sh: "lib/Config_git.pl" and "unpushed.h". The former is used to make git data available to Config.pm/%Config without rebuilding everything else, and the latter is used to expose unpushed commits (if any) via the registered patch facility of patchlevel.h.
… mro_alg when generating an SV containing the name.
…never contain more than "dfs", and even if C3 is loaded, 2 buckets are fewer than the default of 8.