Update pcre from 8.39 to 8.44 #2246

Merged
merged 3 commits into from Jun 11, 2021

@patrikjuvonen patrikjuvonen commented May 30, 2021

Summary

Tests

TBD

Validation

To help validate the integrity of the update, I have created the following Bash script, which diffs my PR branch against the official package provided on the PCRE website.

#!/bin/bash
# Compares the vendored PCRE tree in the PR branch against the official tarball.

PCRE_UPDATE_VERSION=8.44
PCRE_PATH_NAME="pcre-$PCRE_UPDATE_VERSION"

GIT_REPO_BRANCH="vendor/pcre-$PCRE_UPDATE_VERSION"
GIT_REPO_URL=https://github.com/patrikjuvonen/mtasa-blue.git
GIT_REPO_PCRE_PATH=vendor/pcre/

echo "1. Download and extract $PCRE_PATH_NAME..."
curl "https://ftp.pcre.org/pub/pcre/$PCRE_PATH_NAME.tar.gz" | tar -xz

echo "2. Fetch and checkout the vendor update branch $GIT_REPO_BRANCH from $GIT_REPO_URL..."
git fetch "$GIT_REPO_URL" "$GIT_REPO_BRANCH:$GIT_REPO_BRANCH"
git checkout "$GIT_REPO_BRANCH"

echo "3. Start checking integrity..."
# Recursive diff that ignores CRLF vs. LF line-ending differences.
diff -r --strip-trailing-cr "$GIT_REPO_PCRE_PATH" "$PCRE_PATH_NAME"

echo "4. Completed."
# Replace this process with an interactive shell (keeps the terminal open after the run).
exec "$SHELL"
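
Step 3 of the script above prints any differences but does not report an explicit pass or fail. As a minimal sketch (not part of this PR), the same `diff` invocation could be wrapped so that its exit status drives the message. The function name `check_vendor_tree` and the throwaway demo directories are hypothetical, and the sketch assumes GNU diff, which is what provides `--strip-trailing-cr`.

```shell
#!/bin/bash
# Hypothetical helper, not part of the PR: runs the same diff as step 3
# and turns its exit status into a short verdict.
check_vendor_tree() {
    local vendored_dir="$1"
    local upstream_dir="$2"
    # diff exits 0 when the trees match and non-zero otherwise.
    if diff -r --strip-trailing-cr "$vendored_dir" "$upstream_dir" > /dev/null; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Self-contained demo on temporary directories instead of the real trees.
demo_root=$(mktemp -d)
mkdir "$demo_root/vendored" "$demo_root/upstream"
printf 'int x;\r\n' > "$demo_root/vendored/a.c"   # CRLF line ending
printf 'int x;\n'   > "$demo_root/upstream/a.c"   # LF line ending

# The two files differ only in line endings, which the diff flags ignore.
check_vendor_tree "$demo_root/vendored" "$demo_root/upstream"
```

In the real script, the two arguments would be `$GIT_REPO_PCRE_PATH` and `$PCRE_PATH_NAME`. Discarding the diff output is a choice made for this sketch only; keeping it visible, as the original script does, is more useful when reviewing an actual mismatch.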

Past PCRE updates in MTA

Date            From  To              Link
September 2016  7.9   8.39 (current)  0d19a05

Changelog

Version 8.44 12-February-2020
-----------------------------

1. Setting --enable-jit=auto for an out-of-tree build failed because the
source directory wasn't in the search path for AC_TRY_COMPILE always. Patch
from Ross Burton.

2. Applied a patch from Michael Shigorin to fix 8.43 build on e2k arch
with lcc compiler (EDG frontend based); the problem it fixes is:

  lcc: "pcrecpp.cc", line 74: error: declaration aliased to undefined entity
       "_ZN7pcrecpp2RE6no_argE" [-Werror]

3. Change 2 for 8.43 omitted (*LF) from the list of start-of-pattern items. Now
added.

4. Fix ARMv5 JIT improper handling of labels right after a constant pool.

5. Small patch to pcreposix.c to set the erroroffset field to -1 immediately
after a successful compile, instead of at the start of matching to avoid a
sanitizer complaint (regexec is supposed to be thread safe).

6. Check the size of the number after (?C as it is read, in order to avoid
integer overflow.

7. Tidy up left shifts to avoid sanitize warnings; also fix one NULL dereference
in pcretest.


Version 8.43 23-February-2019
-----------------------------

1. Some time ago the config macro SUPPORT_UTF8 was changed to SUPPORT_UTF
because it also applies to UTF-16 and UTF-32. However, this change was not made
in the pcre2cpp files; consequently the C++ wrapper has from then been compiled
with a bug in it, which would have been picked up by the unit test except that
it also had its UTF8 code cut out. The bug was in a global replace when moving
forward after matching an empty string.

2. The C++ wrapper got broken a long time ago (version 7.3, August 2007) when
(*CR) was invented (assuming it was the first such start-of-pattern option).
The wrapper could never handle such patterns because it wraps patterns in
(?:...)\z in order to support end anchoring. I have hacked in some code to fix
this, that is, move the wrapping till after any existing start-of-pattern
special settings.

3. "pcre2grep" (sic) was accidentally mentioned in an error message (fix was
ported from PCRE2).

4. Typo LCC_ALL for LC_ALL fixed in pcregrep.

5. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
negative class with no characters less than 0x100 followed by a positive class
with only characters less than 0x100, the first class was incorrectly being
auto-possessified, causing incorrect match failures.

6. If the only branch in a conditional subpattern was anchored, the whole
subpattern was treated as anchored, when it should not have been, since the
assumed empty second branch cannot be anchored. Demonstrated by test patterns
such as /(?(1)^())b/ or /(?(?=^))b/.

7. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
a greater than 1 fixed quantifier. This issue was found by Yunho Kim.

8. If a pattern started with a subroutine call that had a quantifier with a
minimum of zero, an incorrect "match must start with this character" could be
recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to
be the first character of a match.

9. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel.


Version 8.42 20-March-2018
--------------------------

1.  Fixed a MIPS issue in the JIT compiler reported by Joshua Kinard.

2.  Fixed outdated real_pcre definitions in pcre.h.in (patch by Evgeny Kotkov).

3.  pcregrep was truncating components of file names to 128 characters when
processing files with the -r option, and also (some very odd code) truncating
path names to 512 characters. There is now a check on the absolute length of
full path file names, which may be up to 2047 characters long.

4.  Using pcre_dfa_exec(), in UTF mode when UCP support was not defined, there
was the possibility of a false positive match when caselessly matching a "not
this character" item such as [^\x{1234}] (with a code point greater than 127)
because the "other case" variable was not being initialized.

5. Although pcre_jit_exec checks whether the pattern is compiled
in a given mode, it was also expected that at least one mode is available.
This is fixed and pcre_jit_exec returns with PCRE_ERROR_JIT_BADOPTION
when the pattern is not optimized by JIT at all.

6. The line number and related variables such as match counts in pcregrep
were all int variables, causing overflow when files with more than 2147483647
lines were processed (assuming 32-bit ints). They have all been changed to
unsigned long ints.

7. If a backreference with a minimum repeat count of zero was first in a
pattern, apart from assertions, an incorrect first matching character could be
recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set
as the first character of a match.

8. Fix out-of-bounds read for partial matching of /./ against an empty string
when the newline type is CRLF.

9. When matching using the REG_STARTEND feature of the POSIX API with a
non-zero starting offset, unset capturing groups with lower numbers than a
group that did capture something were not being correctly returned as "unset"
(that is, with offset values of -1).

10. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string
containing multi-code-unit characters caused bad behaviour and possibly a
crash. This issue was fixed for other kinds of repeat in release 8.37 by change
38, but repeating character classes were overlooked.

11. A small fix to pcregrep to avoid compiler warnings for -Wformat-overflow=2.

12. Added --enable-jit=auto support to configure.ac.

13. Fix misleading error message in configure.ac.


Version 8.41 05-July-2017
-------------------------

1.  Fixed typo in CMakeLists.txt (wrong number of arguments for
PCRE_STATIC_RUNTIME; affects MSVC only).

2.  Issue 1 for 8.40 below was not correctly fixed. If pcregrep in multiline
mode with --only-matching matched several lines, it restarted scanning at the
next line instead of moving on to the end of the matched string, which can be
several lines after the start.

3.  Fix a missing else in the JIT compiler reported by 'idaifish'.

4.  A (?# style comment is now ignored between a basic quantifier and a
following '+' or '?' (example: /X+(?#comment)?Y/).

5.  Avoid use of a potentially overflowing buffer in pcregrep (patch by Petr
Pisar).

6.  Fuzzers have reported issues in pcretest. These are NOT serious (it is,
after all, just a test program). However, to stop the reports, some easy ones
are fixed:

    (a) Check for values < 256 when calling isprint() in pcretest.
    (b) Give an error for too big a number after \O.

7.  In the 32-bit library in non-UTF mode, an attempt to find a Unicode
property for a character with a code point greater than 0x10ffff (the Unicode
maximum) caused a crash.

8. The alternative matching function, pcre_dfa_exec() misbehaved if it
encountered a character class with a possessive repeat, for example [a-f]{3}+.

9. When pcretest called pcre_copy_substring() in 32-bit mode, it set the buffer
length incorrectly, which could result in buffer overflow.

10. Remove redundant line of code (accidentally left in ages ago).

11. Applied C++ patch from Irfan Adilovic to guard 'using std::' directives
with namespace pcrecpp (Bugzilla #2084).

12. Remove a duplication typo in pcre_tables.c.

13. Fix returned offsets from regexec() when REG_STARTEND is used with a
starting offset greater than zero.


Version 8.40 11-January-2017
----------------------------

1.  Using -o with -M in pcregrep could cause unnecessary repeated output when
    the match extended over a line boundary.

2.  Applied Chris Wilson's second patch (Bugzilla #1681) to CMakeLists.txt for
    MSVC static compilation, putting the first patch under a new option.

3.  Fix register overwrite in JIT when SSE2 acceleration is enabled.

4.  Ignore "show all captures" (/=) for DFA matching.

5.  Fix JIT unaligned accesses on x86. Patch by Marc Mutz.

6.  In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode),
    without PCRE_UCP set, a negative character type such as \D in a positive
    class should cause all characters greater than 255 to match, whatever else
    is in the class. There was a bug that caused this not to happen if a
    Unicode property item was added to such a class, for example [\D\P{Nd}] or
    [\W\pL].

7.  When pcretest was outputting information from a callout, the caret indicator
    for the current position in the subject line was incorrect if it was after
    an escape sequence for a character whose code point was greater than
    \x{ff}.

8.  A pattern such as (?<RA>abc)(?(R)xyz) was incorrectly compiled such that
    the conditional was interpreted as a reference to capturing group 1 instead
    of a test for recursion. Any group whose name began with R was
    misinterpreted in this way. (The reference interpretation should only
    happen if the group's name is precisely "R".)

9.  A number of bugs have been mended relating to match start-up optimizations
    when the first thing in a pattern is a positive lookahead. These all
    applied only when PCRE_NO_START_OPTIMIZE was *not* set:

    (a) A pattern such as (?=.*X)X$ was incorrectly optimized as if it needed
        both an initial 'X' and a following 'X'.
    (b) Some patterns starting with an assertion that started with .* were
        incorrectly optimized as having to match at the start of the subject or
        after a newline. There are cases where this is not true, for example,
        (?=.*[A-Z])(?=.{8,16})(?!.*[\s]) matches after the start in lines that
        start with spaces. Starting .* in an assertion is no longer taken as an
        indication of matching at the start (or after a newline).

@patrikjuvonen patrikjuvonen added the enhancement and upstream labels May 30, 2021
@patrikjuvonen patrikjuvonen added this to the Next Release (1.5.9) milestone May 30, 2021
@patrikjuvonen patrikjuvonen added this to In progress in Vendor upgrades via automation May 30, 2021
@patrikjuvonen patrikjuvonen marked this pull request as ready for review May 30, 2021 11:51
@Dutchman101

As discussed, merging this so the planned testing cycle can start on the upcoming nightly build.

@Dutchman101 Dutchman101 merged commit 36a4ff7 into multitheftauto:master Jun 11, 2021
Vendor upgrades automation moved this from In progress to Done Jun 11, 2021
@Dutchman101

Dutchman101 commented Jun 11, 2021

Tests

TBD

Stability results will be posted as a comment in 5 days. This one is a little trickier than the other PRs that just got merged for a testing cycle.

UPD: fully stable

@patrikjuvonen patrikjuvonen deleted the vendor/pcre-8.44 branch September 16, 2021 11:56