Final file tidies for 10.32

rurban · Sep 11, 2018 · ebc9edb
1 parent e1215b2
commit ebc9edb

Showing 14 changed files with 680 additions and 641 deletions.
diff --git a/ChangeLog b/ChangeLog
@@ -155,37 +155,37 @@ bcopy() doesn't return a result. This feature is now refactored always to call
 an emulation function when there is no memmove(). The emulation makes use of
 bcopy() when available.
 
-34. When serializing a pattern, set the memctl, executable_jit, and tables 
-fields (that is, all the fields that contain pointers) to zeros so that the 
-result of serializing is always the same. These fields are re-set when the 
+34. When serializing a pattern, set the memctl, executable_jit, and tables
+fields (that is, all the fields that contain pointers) to zeros so that the
+result of serializing is always the same. These fields are re-set when the
 pattern is deserialized.
 
 35. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
 negative class with no characters less than 0x100 followed by a positive class
-with only characters less than 0x100, the first class was incorrectly being 
+with only characters less than 0x100, the first class was incorrectly being
 auto-possessified, causing incorrect match failures.
 
-36. Removed the character type bit ctype_meta, which dates from PCRE1 and is 
+36. Removed the character type bit ctype_meta, which dates from PCRE1 and is
 not used in PCRE2.
 
 37. Tidied up unnecessarily complicated macros used in the escapes table.
 
-38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted 
-from distribution tarballs, owing to a typo in Makefile.am which had 
+38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted
+from distribution tarballs, owing to a typo in Makefile.am which had
 testoutput8-16-3 twice. Now fixed.
 
-39. If the only branch in a conditional subpattern was anchored, the whole 
-subpattern was treated as anchored, when it should not have been, since the 
-assumed empty second branch cannot be anchored. Demonstrated by test patterns 
+39. If the only branch in a conditional subpattern was anchored, the whole
+subpattern was treated as anchored, when it should not have been, since the
+assumed empty second branch cannot be anchored. Demonstrated by test patterns
 such as /(?(1)^())b/ or /(?(?=^))b/.
 
-40. A repeated conditional subpattern that could match an empty string was 
+40. A repeated conditional subpattern that could match an empty string was
 always assumed to be unanchored. Now it it checked just like any other
-repeated conditional subpattern, and can be found to be anchored if the minimum 
+repeated conditional subpattern, and can be found to be anchored if the minimum
 quantifier is one or more. I can't see much use for a repeated anchored
 pattern, but the behaviour is now consistent.
 
-41. Minor addition to pcre2_jit_compile.c to avoid static analyzer complaint 
+41. Minor addition to pcre2_jit_compile.c to avoid static analyzer complaint
 (for an event that could never occur but you had to have external information
 to know that).
 
@@ -194,7 +194,7 @@ there was a line that was sufficiently long to cause the input buffer to be
 expanded, the variable holding the location of the end of the previous match
 was being adjusted incorrectly, and could cause an overflow warning from a code
 sanitizer. However, as the value is used only to print pending "after" lines
-when the next match is reached (and there are no such lines in this case) this 
+when the next match is reached (and there are no such lines in this case) this
 bug could do no damage.
 
 

diff --git a/LICENCE b/LICENCE
@@ -4,11 +4,11 @@ PCRE2 LICENCE
 PCRE2 is a library of functions to support regular expressions whose syntax
 and semantics are as close as possible to those of the Perl 5 language.
 
-Release 10 of PCRE2 is distributed under the terms of the "BSD" licence, as
-specified below, with one exemption for certain binary redistributions. The
-documentation for PCRE2, supplied in the "doc" directory, is distributed under
-the same terms as the software itself. The data in the testdata directory is
-not copyrighted and is in the public domain.
+Releases 10.00 and above of PCRE2 are distributed under the terms of the "BSD"
+licence, as specified below, with one exemption for certain binary
+redistributions. The documentation for PCRE2, supplied in the "doc" directory,
+is distributed under the same terms as the software itself. The data in the
+testdata directory is not copyrighted and is in the public domain.
 
 The basic library functions are written in C and are freestanding. Also
 included in the distribution is a just-in-time compiler that can be used to

diff --git a/NEWS b/NEWS
@@ -1,11 +1,12 @@
 News about PCRE2 releases
 -------------------------
 
-Version 10.32 13-August-2018
-----------------------------
+
+Version 10.32 10-September-2018
+-------------------------------
 
 This is another mainly bugfix and tidying release with a few minor
-enhancements.
+enhancements. These are the main ones:
 
 1. pcre2grep now supports the inclusion of binary zeros in patterns that are
 read from files via the -f option.
@@ -22,7 +23,7 @@ parameter now applies to pcre2_dfa_match().
 
 5. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
 
-6. Added support for \N{U+dddd}, but not in EBCDIC environments.
+6. Added support for \N{U+dddd}, but only in Unicode mode.
 
 7. Added support for (?^) to unset all imnsx options.
 

diff --git a/configure.ac b/configure.ac
@@ -10,8 +10,8 @@ dnl be defined as -RC2, for example. For real releases, it should be empty.
 
 m4_define(pcre2_major, [10])
 m4_define(pcre2_minor, [32])
-m4_define(pcre2_prerelease, [-RC1])
-m4_define(pcre2_date, [2018-08-13])
+m4_define(pcre2_prerelease, [])
+m4_define(pcre2_date, [2018-09-10])
 
 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.
@@ -839,7 +839,7 @@ AC_SUBST(EXTRA_LIBPCRE2_POSIX_LDFLAGS)
 
 # When we run 'make distcheck', use these arguments. Turning off compiler
 # optimization makes it run faster.
-DISTCHECK_CONFIGURE_FLAGS="CFLAGS='' CXXFLAGS='' --enable-pcre2-16 --enable-pcre2-32 --enable-jit --enable-utf"
+DISTCHECK_CONFIGURE_FLAGS="CFLAGS='' CXXFLAGS='' --enable-pcre2-16 --enable-pcre2-32 --enable-jit"
 AC_SUBST(DISTCHECK_CONFIGURE_FLAGS)
 
 # Check that, if --enable-pcre2grep-libz or --enable-pcre2grep-libbz2 is

diff --git a/doc/html/pcre2api.html b/doc/html/pcre2api.html
@@ -1804,7 +1804,8 @@ <h1>pcre2api man page</h1>
 the use of this option provokes an error. Details of how PCRE2_UTF changes the
 behaviour of PCRE2 are given in the
 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
-page.
+page. In particular, note that it changes the way PCRE2_CASELESS handles
+characters with code points greater than 127.
 <a name="extracompileoptions"></a></P>
 <br><b>
 Extra compile options
@@ -2776,7 +2777,7 @@ <h1>pcre2api man page</h1>
 pattern are never changed. That is, if a pattern contains <i>n</i> capturing
 parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
 <b>pcre2_match()</b>. The other elements retain whatever values they previously
-had.
+had. After a failed match attempt, the contents of the ovector are unchanged.
 <a name="matchotherdata"></a></P>
 <br><a name="SEC30" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
 <P>
@@ -3192,6 +3193,12 @@ <h1>pcre2api man page</h1>
 allocate memory for the compiled code.
 </P>
 <P>
+If an external <i>match_data</i> block is provided, its contents afterwards
+are those set by the final call to <b>pcre2_match()</b>, which will have
+ended in a matching error. The contents of the ovector within the match data
+block may or may not have been changed.
+</P>
+<P>
 The <i>outlengthptr</i> argument must point to a variable that contains the
 length, in code units, of the output buffer. If the function is successful, the
 value is updated to contain the length of the new string, excluding the
@@ -3658,7 +3665,7 @@ <h1>pcre2api man page</h1>
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 August 2018
+Last updated: 07 September 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>
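
Note on the ovector wording in the pcre2api.html hunk above: the text says that for a pattern with n capturing parentheses, at most ovector[0] to ovector[2n+1] are set. The index arithmetic can be sketched as follows. This is an illustrative Python sketch only, not PCRE2 code; the function names are hypothetical and chosen for this note.

```python
from typing import Tuple


def ovector_slots_for_group(k: int) -> Tuple[int, int]:
    """Return the ovector indices holding the (start, end) offsets of group k.

    Group 0 is the whole match; offsets are stored as consecutive pairs.
    """
    if k < 0:
        raise ValueError("group number must be non-negative")
    return (2 * k, 2 * k + 1)


def highest_slot_set(n: int) -> int:
    """Highest ovector index that can be set for n capturing parentheses."""
    if n < 0:
        raise ValueError("group count must be non-negative")
    return 2 * n + 1


if __name__ == "__main__":
    n = 3  # hypothetical pattern with three capturing groups
    for k in range(n + 1):
        print(k, ovector_slots_for_group(k))
    print("highest slot:", highest_slot_set(n))
```

Elements beyond that highest index keep whatever values they had before the call, as the documentation text states.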

diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
@@ -399,14 +399,15 @@ <h1>pcre2pattern man page</h1>
   \ddd        character with octal code ddd, or backreference
   \o{ddd..}   character with octal code ddd..
   \xhh        character with hex code hh
-  \x{hhh..}   character with hex code hhh.. (default mode)
-  \N{U+hhh..} character with Unicode code point hhh..
+  \x{hhh..}   character with hex code hhh..
+  \N{U+hhh..} character with Unicode hex code point hhh..
   \uhhhh      character with hex code hhhh (when PCRE2_ALT_BSUX is set)
 </pre>
+The \N{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
+is set, that is, when PCRE2 is operating in a Unicode mode. Perl also uses
+\N{name} to specify characters by Unicode name; PCRE2 does not support this.
 Note that when \N is not followed by an opening brace (curly bracket) it has
 an entirely different meaning, matching any character that is not a newline.
-Perl also uses \N{name} to specify characters by Unicode name; PCRE2 does not
-support this.
 </P>
 <P>
 The precise effect of \cx on ASCII characters is as follows: if x is a lower
@@ -530,7 +531,8 @@ <h1>pcre2pattern man page</h1>
 Invalid Unicode code points are all those in the range 0xd800 to 0xdfff (the
 so-called "surrogate" code points). The check for these can be disabled by the
 caller of <b>pcre2_compile()</b> by setting the option
-PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
+PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES. However, this is possible only in UTF-8
+and UTF-32 modes, because these values are not representable in UTF-16.
 </P>
 <br><b>
 Escape sequences in character classes
@@ -3595,13 +3597,16 @@ <h1>pcre2pattern man page</h1>
 an immediate backtrack.
 </P>
 <P>
-(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
-the subroutine match to fail.
+(*COMMIT), (*SKIP), and (*PRUNE) cause the subroutine match to fail when
+triggered by being backtracked to in a subpattern called as a subroutine. There
+is then a backtrack at the outer level.
 </P>
 <P>
-(*THEN) skips to the next alternative in the innermost enclosing group within
-the subpattern that has alternatives. If there is no such group within the
-subpattern, (*THEN) causes the subroutine match to fail.
+(*THEN), when triggered, skips to the next alternative in the innermost
+enclosing group within the subpattern that has alternatives (its normal
+behaviour). However, if there is no such group within the subroutine
+subpattern, the subroutine match fails and there is a backtrack at the outer
+level.
 </P>
 <br><a name="SEC28" href="#TOC1">SEE ALSO</a><br>
 <P>
@@ -3619,7 +3624,7 @@ <h1>pcre2pattern man page</h1>
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 August 2018
+Last updated: 04 September 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>
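
Note on the surrogate wording in the pcre2pattern.html hunk above: the added sentence says code points 0xd800 to 0xdfff are not representable in UTF-16. A minimal sketch of why, assuming the standard UTF-16 encoding rules: those values are reserved as the two halves of a surrogate pair, so a lone one cannot stand for a character. This is illustrative Python, not PCRE2 code, and the function names are hypothetical.

```python
from typing import List


def is_surrogate(cp: int) -> bool:
    """True for code points in the surrogate range 0xD800..0xDFFF."""
    return 0xD800 <= cp <= 0xDFFF


def encode_utf16_units(cp: int) -> List[int]:
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if is_surrogate(cp):
        # A surrogate value is only meaningful as half of a pair.
        raise ValueError("surrogate code point has no UTF-16 encoding")
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    # High (lead) surrogate carries the top 10 bits, low (trail) the bottom 10.
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


if __name__ == "__main__":
    print([hex(u) for u in encode_utf16_units(0x1F600)])
    try:
        encode_utf16_units(0xD800)
    except ValueError as exc:
        print("rejected:", exc)
```

In UTF-32 the same value fits in a single code unit, and the UTF-8 bit layout can physically carry it, which is consistent with the diff restricting the option to those two modes.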

diff --git a/doc/html/pcre2syntax.html b/doc/html/pcre2syntax.html
@@ -70,7 +70,7 @@ <h1>pcre2syntax man page</h1>
   \ddd       character with octal code ddd, or backreference
   \o{ddd..}  character with octal code ddd..
   \U         "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
-  \N{U+hh..} character with Unicode code point hh..
+  \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
   \uhhhh     character with hex code hhhh (if PCRE2_ALT_BSUX is set)
   \xhh       character with hex code hh
   \x{hh..}   character with hex code hh..
@@ -634,7 +634,7 @@ <h1>pcre2syntax man page</h1>
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 01 August 2018
+Last updated: 02 September 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>

diff --git a/doc/html/pcre2unicode.html b/doc/html/pcre2unicode.html
@@ -26,7 +26,8 @@ <h1>pcre2unicode man page</h1>
 with the PCRE2_UTF option flag, or the pattern must start with the sequence
 (*UTF). When either of these is the case, both the pattern and any subject
 strings that are matched against it are treated as UTF strings instead of
-strings of individual one-code-unit characters.
+strings of individual one-code-unit characters. There are also some other
+changes to the way characters are handled, as documented below.
 </P>
 <P>
 If you do not need Unicode support you can build PCRE2 without it, in which
@@ -59,6 +60,11 @@ <h1>pcre2unicode man page</h1>
 also recognized; larger ones can be coded using \o{...}.
 </P>
 <P>
+The escape sequence \N{U+&#60;hex digits&#62;} is recognized as another way of
+specifying a Unicode character by code point in a UTF mode. It is not allowed
+in non-UTF modes.
+</P>
+<P>
 In UTF modes, repeat quantifiers apply to complete UTF characters, not to
 individual code units.
 </P>
@@ -294,9 +300,9 @@ <h1>pcre2unicode man page</h1>
 REVISION
 </b><br>
 <P>
-Last updated: 17 May 2017
+Last updated: 02 September 2018
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.