110 changes: 104 additions & 6 deletions ext/pcre/pcrelib/ChangeLog
@@ -1,6 +1,104 @@
ChangeLog for PCRE
------------------

Version 8.36 26-September-2014
------------------------------

1. Got rid of some compiler warnings in the C++ modules that were shown up by
-Wmissing-field-initializers and -Wunused-parameter.

2. The tests for quantifiers being too big (greater than 65535) were being
applied after reading the number, and stupidly assuming that integer
overflow would give a negative number. The tests are now applied as the
numbers are read.

3. Tidy code in pcre_exec.c where two branches that used to be different are
now the same.

4. The JIT compiler did not generate match limit checks for certain
bracketed expressions with quantifiers. This may lead to exponential
backtracking, instead of returning with PCRE_ERROR_MATCHLIMIT. This
issue should be resolved now.

5. Fixed an issue that occurs when nested alternatives are optimized
with table jumps.

6. Inserted two casts and changed some ints to size_t in the light of some
reported 64-bit compiler warnings (Bugzilla 1477).

7. Fixed a bug concerned with zero-minimum possessive groups that could match
an empty string, which sometimes were behaving incorrectly in the
interpreter (though correctly in the JIT matcher). This pcretest input is
an example:

'\A(?:[^"]++|"(?:[^"]*+|"")*+")++'
NON QUOTED "QUOT""ED" AFTER "NOT MATCHED

the interpreter was reporting a match of 'NON QUOTED ' only, whereas the
JIT matcher and Perl both matched 'NON QUOTED "QUOT""ED" AFTER '. The test
for an empty string was breaking the inner loop and carrying on at a lower
level, when possessive repeated groups should always return to a higher
level as they have no backtrack points in them. The empty string test now
occurs at the outer level.

8. Fixed a bug that was incorrectly auto-possessifying \w+ in the pattern
^\w+(?>\s*)(?<=\w) which caused it not to match "test test".

9. Give a compile-time error for \o{} (as Perl does) and for \x{} (which Perl
doesn't).

10. Change 8.34/15 introduced a bug that caused the amount of memory needed
to hold a pattern to be incorrectly computed (too small) when there were
named back references to duplicated names. This could cause "internal
error: code overflow" or "double free or corruption" or other memory
handling errors.

11. When named subpatterns had the same prefixes, back references could be
confused. For example, in this pattern:

/(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/

the reference to 'Name' was incorrectly treated as a reference to a
duplicate name.

12. A pattern such as /^s?c/mi8 where the optional character has more than
one "other case" was incorrectly compiled such that it would only try to
match starting at "c".

13. When a pattern starting with \s was studied, VT was not included in the
list of possible starting characters; this should have been part of the
8.34/18 patch.

14. If a character class started [\Qx]... where x is any character, the class
was incorrectly terminated at the ].

15. If a pattern that started with a caseless match for a character with more
than one "other case" was studied, PCRE did not set up the starting code
unit bit map for the list of possible characters. Now it does. This is an
optimization improvement, not a bug fix.

16. The Unicode data tables have been updated to Unicode 7.0.0.

17. Fixed a number of memory leaks in pcregrep.

18. Avoid a compiler warning (from some compilers) for a function call with
a cast that removes "const" from an lvalue by using an intermediate
variable (to which the compiler does not object).

19. Incorrect code was compiled if a group that contained an internal recursive
back reference was optional (had quantifier with a minimum of zero). This
example compiled incorrect code: /(((a\2)|(a*)\g<-1>))*/ and other examples
caused segmentation faults because of stack overflows at compile time.

20. A pattern such as /((?(R)a|(?1)))+/, which contains a recursion within a
group that is quantified with an indefinite repeat, caused a compile-time
loop which used up all the system stack and provoked a segmentation fault.
This was not the same bug as 19 above.

21. Add PCRECPP_EXP_DECL declaration to operator<< in pcre_stringpiece.h.
Patch by Mike Frysinger.
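
Item 2 in the list above describes applying the "quantifier too big" test while the digits are being read, rather than after the whole number has been accumulated. The following is a minimal sketch of that idea, not PCRE's actual code: the helper name read_repeat_count, the MAX_REPEAT macro, and the -1 error return are all assumptions made for illustration, and the sketch assumes an int of at least 32 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Limit quoted in ChangeLog item 2; the macro name is hypothetical. */
#define MAX_REPEAT 65535

/* Hypothetical helper, not taken from PCRE: reads a decimal repeat count
   starting at *pp. On success it returns the value (0..MAX_REPEAT) and
   advances *pp past the digits. It returns -1 as soon as the running value
   exceeds the limit, leaving *pp unchanged. */
static int read_repeat_count(const char **pp)
{
    const char *p;
    int n = 0;

    assert(pp != NULL && *pp != NULL);
    p = *pp;

    while (*p >= '0' && *p <= '9') {
        n = n * 10 + (*p++ - '0');
        /* Checked after every digit: n is at most 655359 here, so a
           32-bit int cannot overflow however long the digit string is. */
        if (n > MAX_REPEAT)
            return -1;
    }

    *pp = p;
    return n;
}
```

Because the check runs per digit, the result does not depend on what integer overflow would have produced for a very long digit string, which is the assumption the ChangeLog entry says the old code relied on.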


Version 8.35 04-April-2014
--------------------------

@@ -27,9 +125,9 @@ Version 8.35 04-April-2014

6. Improve character range checks in JIT. Characters are read by an imprecise
function now, which returns with an unknown value if the character code is
above a certain treshold (e.g: 256). The only limitation is that the value
must be bigger than the treshold as well. This function is useful, when
the characters above the treshold are handled in the same way.
above a certain threshold (e.g: 256). The only limitation is that the value
must be bigger than the threshold as well. This function is useful when
the characters above the threshold are handled in the same way.

7. The macros whose names start with RAWUCHAR are placeholders for a future
mode in which only the bottom 21 bits of 32-bit data items are used. To
@@ -1544,7 +1642,7 @@ Version 8.10 25-Jun-2010

7. Minor change to pcretest.c to avoid a compiler warning.

8. Added four artificial Unicode properties to help with an option to make
8. Added four artifical Unicode properties to help with an option to make
\s etc use properties (see next item). The new properties are: Xan
(alphanumeric), Xsp (Perl space), Xps (POSIX space), and Xwd (word).

@@ -4169,7 +4267,7 @@ Version 4.3 21-May-03
(i) The utf8_table... variables are now declared "const".

(ii) The code for \cx, which used the "case flipping" table to upper case
lower case letters, now just subtracts 32. This is ASCII-specific,
lower case letters, now just substracts 32. This is ASCII-specific,
but the whole concept of \cx is ASCII-specific, so it seems
reasonable.
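
The \cx behaviour mentioned in (ii) can be sketched as follows. This is an illustration rather than the library's source: the function name control_char is hypothetical, and the final step of inverting bit 6 (hex 40) is taken from the description of \cx in the PCRE pattern documentation, not from this ChangeLog entry. Like \cx itself, the sketch is ASCII-specific.

```c
#include <assert.h>

/* Hypothetical helper, not PCRE's code: maps the character x that follows
   \c to the control character it denotes, assuming ASCII. */
static int control_char(int x)
{
    assert(x >= 0 && x <= 127);

    /* Upper-case a lower-case letter by subtracting 32, as in (ii). */
    if (x >= 'a' && x <= 'z')
        x -= 32;

    /* Invert bit 6, so 'A' (hex 41) becomes hex 01, and so on. */
    return x ^ 0x40;
}
```

Under these assumptions control_char('a') and control_char('A') both give 1, and control_char('[') gives hex 1B (escape).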

@@ -5431,7 +5529,7 @@ by an auxiliary program - but can then be edited by hand if required. There are
now no calls to isalnum(), isspace(), isdigit(), isxdigit(), tolower() or
toupper() in the code.

7. Turn the malloc/free functions variables into pcre_malloc and pcre_free and
7. Turn the malloc/free funtions variables into pcre_malloc and pcre_free and
make them global. Abolish the function for setting them, as the caller can now
set them directly.

2 changes: 1 addition & 1 deletion ext/pcre/pcrelib/HACKING
@@ -360,7 +360,7 @@ reference number if the reference is to a unique capturing group (either by
number or by name). When named groups are used, there may be more than one
group with the same name. In this case, a reference by name generates OP_DNREF
or OP_DNREFI. These are followed by two counts: the index (not the byte offset)
in the group name table of the first entry for the required name, followed by
in the group name table of the first entry for the requred name, followed by
the number of groups with the same name.


7 changes: 7 additions & 0 deletions ext/pcre/pcrelib/NEWS
@@ -1,6 +1,13 @@
News about PCRE releases
------------------------

Release 8.36 26-September-2014
------------------------------

This is primarily a bug-fix release. However, in addition, the Unicode data
tables have been updated to Unicode 7.0.0.


Release 8.35 04-April-2014
--------------------------

20 changes: 11 additions & 9 deletions ext/pcre/pcrelib/README
@@ -45,14 +45,16 @@ the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. The distribution also
includes a set of C++ wrapper functions (see the pcrecpp man page for details),
courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
C++.
C++. Other C++ wrappers have been created from time to time. See, for example:
https://github.com/YasserAsmi/regexp, which aims to be simple and similar in
style to the C API.

In addition, there is a set of C wrapper functions (again, just for the 8-bit
library) that are based on the POSIX regular expression API (see the pcreposix
man page). These end up in the library called libpcreposix. Note that this just
provides a POSIX calling interface to PCRE; the regular expressions themselves
still follow Perl syntax and semantics. The POSIX API is restricted, and does
not give full access to all of PCRE's facilities.
The distribution also contains a set of C wrapper functions (again, just for
the 8-bit library) that are based on the POSIX regular expression API (see the
pcreposix man page). These end up in the library called libpcreposix. Note that
this just provides a POSIX calling interface to PCRE; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE's facilities.

The header file for the POSIX-style functions is called pcreposix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
@@ -392,7 +394,7 @@ library. They are also documented in the pcrebuild man page.
avoided by linking with libedit (which has a BSD licence) instead.

Enabling libreadline causes the -lreadline option to be added to the pcretest
build. In many operating environments with a system-installed readline
build. In many operating environments with a sytem-installed readline
library this is sufficient. However, in some environments (e.g. if an
unmodified distribution version of readline is in use), it may be necessary
to specify something like LIBS="-lncurses" as well. This is because, to quote
@@ -988,4 +990,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 17 January 2014
Last updated: 24 October 2014
10 changes: 5 additions & 5 deletions ext/pcre/pcrelib/config.h
@@ -262,7 +262,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_NAME "PCRE"

/* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE 8.35"
#define PACKAGE_STRING "PCRE 8.36"

/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre"
@@ -271,7 +271,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_URL ""

/* Define to the version of this package. */
#define PACKAGE_VERSION "8.35"
#define PACKAGE_VERSION "8.36"

/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
parentheses (of any kind) in a pattern. This limits the amount of system
@@ -322,7 +322,7 @@ sure both macros are undefined; an emulation function will then be used. */
/* #undef STDC_HEADERS */

/* Define to any value to enable support for Just-In-Time compiling. */
#define SUPPORT_JIT
/* #undef SUPPORT_JIT */

/* Define to any value to allow pcregrep to be linked with libbz2, so that it
is able to handle .bz2 files. */
@@ -348,7 +348,7 @@ sure both macros are undefined; an emulation function will then be used. */
/* #undef SUPPORT_PCRE8 */

/* Define to any value to enable JIT support in pcregrep. */
#define SUPPORT_PCREGREP_JIT
/* #undef SUPPORT_PCREGREP_JIT */

/* Define to any value to enable support for Unicode properties. */
/* #undef SUPPORT_UCP */
@@ -363,7 +363,7 @@ sure both macros are undefined; an emulation function will then be used. */
/* #undef SUPPORT_VALGRIND */

/* Version number of package */
#define VERSION "8.35"
#define VERSION "8.36"

/* Define to empty if `const' does not conform to ANSI C. */
/* #undef const */
70 changes: 39 additions & 31 deletions ext/pcre/pcrelib/doc/pcre.txt
@@ -1242,7 +1242,7 @@ PCRETEST OPTION FOR LIBREADLINE SUPPORT
pcretest linked in this way, there may be licensing issues.

Setting this option causes the -lreadline option to be added to the
pcretest build. In many operating environments with a system-installed
pcretest build. In many operating environments with a sytem-installed
libreadline this is sufficient. However, in some environments (e.g. if
an unmodified distribution version of readline is in use), some extra
configuration may be necessary. The INSTALL file for libreadline says
@@ -5326,21 +5326,25 @@ BACKSLASH
Those that are not part of an identified script are lumped together as
"Common". The current list of scripts is:

Arabic, Armenian, Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo,
Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma,
Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic,
Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira-
gana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian,
Lydian, Malayalam, Mandaic, Meetei_Mayek, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Mongolian, Myanmar, New_Tai_Lue, Nko,
Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic,
Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samari-
tan, Saurashtra, Sharada, Shavian, Sinhala, Sora_Sompeng, Sundanese,
Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet,
Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai,
Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali,
Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Car-
ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei-
form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero-
glyphs, Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha,
Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana,
Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Lin-
ear_A, Linear_B, Lisu, Lycian, Lydian, Mahajani, Malayalam, Mandaic,
Manichaean, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Myanmar, Nabataean,
New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_North_Arabian,
Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya,
Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician,
Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha-
vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac,
Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu,
Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi,
Yi.

Each character has exactly one Unicode general category property, spec-
@@ -7777,21 +7781,25 @@ PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P

SCRIPT NAMES FOR \p AND \P

Arabic, Armenian, Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo,
Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma,
Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic,
Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira-
gana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian,
Lydian, Malayalam, Mandaic, Meetei_Mayek, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Mongolian, Myanmar, New_Tai_Lue, Nko,
Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic,
Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samari-
tan, Saurashtra, Sharada, Shavian, Sinhala, Sora_Sompeng, Sundanese,
Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet,
Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai,
Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali,
Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Car-
ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei-
form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero-
glyphs, Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha,
Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana,
Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Lin-
ear_A, Linear_B, Lisu, Lycian, Lydian, Mahajani, Malayalam, Mandaic,
Manichaean, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Myanmar, Nabataean,
New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_North_Arabian,
Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya,
Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician,
Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha-
vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac,
Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu,
Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi,
Yi.


8 changes: 4 additions & 4 deletions ext/pcre/pcrelib/pcre.h
@@ -5,7 +5,7 @@
/* This is the public header file for the PCRE library, to be #included by
applications that call the PCRE functions.

Copyright (c) 1997-2015 University of Cambridge
Copyright (c) 1997-2014 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
/* The current PCRE version information. */

#define PCRE_MAJOR 8
#define PCRE_MINOR 35
#define PCRE_PRERELEASE
#define PCRE_DATE 2014-04-04
#define PCRE_MINOR 36
#define PCRE_PRERELEASE
#define PCRE_DATE 2014-09-26
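
One way calling code might use the version macros shown in this hunk is a compile-time version test. The sketch below is an assumption-laden illustration: PCRE_VERSION_AT_LEAST and has_unicode_7_tables are hypothetical names that are not part of PCRE, and the #ifndef block only stands in for values that a real program would get from #include <pcre.h>. The 8.36 cut-off for the Unicode 7.0.0 tables comes from the NEWS entry in this change.

```c
#include <assert.h>

/* Stand-ins for the values pcre.h defines after this change. A real
   program would #include <pcre.h> instead of defining these itself. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 8
#define PCRE_MINOR 36
#endif

/* Hypothetical helper macro, not provided by PCRE: true when the header
   version is at least maj.min. */
#define PCRE_VERSION_AT_LEAST(maj, min) \
    (PCRE_MAJOR > (maj) || (PCRE_MAJOR == (maj) && PCRE_MINOR >= (min)))

/* Hypothetical helper: reports whether the headers are new enough to
   come with the Unicode 7.0.0 data tables (8.36 and later). */
static int has_unicode_7_tables(void)
{
    return PCRE_VERSION_AT_LEAST(8, 36) ? 1 : 0;
}
```

Note that this tests the headers the program was compiled against, not the library loaded at run time, which may differ.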

/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE, the appropriate