Import from TRE 0.7.2 CVS tree.

I'm starting to use darcs for revision control in the TRE project.
This patch contains the CVS tree of the 0.7.2 release of TRE (more or
less).

Previously I was using CVS, which was OK, but darcs gets rid of
some problems I've been having with CVS:

  - Renaming or moving files and directories does not work in CVS.

  - With CVS it's difficult to commit patches as a logical whole,
    because there's no way to choose which parts of the changes to
    a file to include in a commit.

  - With CVS, I used to use a ChangeLog file and leave all CVS
    commit logs empty.  I've now realized that was stupid.  I
    intend to get rid of the ChangeLog file in favor of patch
    summary logs generated by darcs.

  - Making a CVS repository public is a big hassle compared to
    doing the same with a darcs repository.

darcs-hash:20050328151501-ced27-833e33c1850657e8f1abe8a1bdc38f83edc84502.gz
commit 68a5a91b4367abe5e9d1bb0d6be492aa833ac1a8 0 parents
authored March 28, 2005

Showing 84 changed files with 17,951 additions and 0 deletions.

  1. 1  AUTHORS
  2. 815  ChangeLog
  3. 12  LICENSE
  4. 25  Makefile.am
  5. 166  NEWS
  6. 226  README
  7. 18  THANKS
  8. 50  TODO
  9. 247  config.h.in
  10. 526  configure.ac
  11. 6  doc/Makefile.am
  12. 167  doc/agrep.1.in
  13. 35  doc/default.css
  14. 799  doc/tre-api.html
  15. 403  doc/tre-syntax.html
  16. 37  lib/Makefile.am
  17. 69  lib/README
  18. 141  lib/regcomp.c
  19. 86  lib/regerror.c
  20. 250  lib/regex.h
  21. 374  lib/regexec.c
  22. 238  lib/tre-ast.c
  23. 139  lib/tre-ast.h
  24. 2,336  lib/tre-compile.c
  25. 38  lib/tre-compile.h
  26. 43  lib/tre-config.h.in
  27. 85  lib/tre-filter.c
  28. 19  lib/tre-filter.h
  29. 299  lib/tre-internal.h
  30. 837  lib/tre-match-approx.c
  31. 667  lib/tre-match-backtrack.c
  32. 529  lib/tre-match-parallel.c
  33. 215  lib/tre-match-utils.h
  34. 167  lib/tre-mem.c
  35. 78  lib/tre-mem.h
  36. 1,709  lib/tre-parse.c
  37. 61  lib/tre-parse.h
  38. 119  lib/tre-stack.c
  39. 80  lib/tre-stack.h
  40. 358  lib/xmalloc.c
  41. 87  lib/xmalloc.h
  42. 7  m4/Makefile.am
  43. 55  m4/ac_libtool_tags.m4
  44. 32  m4/ax_check_funcs_comp.m4
  45. 39  m4/ax_check_sign.m4
  46. 38  m4/ax_decl_wchar_max.m4
  47. 30  m4/tre_prog_cc_optimizations.m4
  48. 128  m4/vl_prog_cc_warnings.m4
  49. 2  po/LINGUAS
  50. 41  po/Makevars
  51. 7  po/POTFILES.in
  52. 246  po/fi.po
  53. 189  po/tre.pot
  54. 13  python/example.py
  55. 50  python/setup.py.in
  56. 558  python/tre-python.c
  57. 15  src/Makefile.am
  58. 749  src/agrep.c
  59. 54  tests/Makefile.am
  60. 478  tests/bench.c
  61. 1  tests/build-hosts/ahma
  62. 1  tests/build-hosts/hemuli
  63. 2  tests/build-hosts/hutcs
  64. 14  tests/build-hosts/jolly
  65. 47  tests/build-on-hosts.sh
  66. 23  tests/build-run.sh
  67. 32  tests/build-tests.sh
  68. 88  tests/randtest.c
  69. 1,481  tests/retest.c
  70. 133  tests/test-str-source.c
  71. 10  tre.pc.in
  72. 106  tre.spec.in
  73. 1  utils/Makefile.am
  74. 21  utils/autogen.sh
  75. 16  utils/build-release.sh
  76. 31  utils/build-rpm.sh
  77. 24  utils/build-sources.sh
  78. 30  utils/replace-vars.sh
  79. 178  win32/config.h
  80. 94  win32/retest.dsp
  81. 52  win32/tre-config.h.in
  82. 23  win32/tre.def
  83. 212  win32/tre.dsp
  84. 43  win32/tre.dsw
1  AUTHORS
@@ -0,0 +1 @@
+Ville Laurikari <vl@iki.fi>
815  ChangeLog
@@ -0,0 +1,815 @@
+Fri Dec 10 21:15:14 2004  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.7.2.
+
+Sat Dec  4 12:04:29 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c (tre_expand_ast): Bugfix.  If a back reference
+	occurred after {m,n} in a regexp, its position was not updated
+	causing incorrect match results.
+
+	* lib/tre-compile.c (tre_make_trans): Bugfix.  If a back reference
+	was immediately followed by $ or ^, it triggered an assertion
+	failure when compiling the regexp.
+
+	* tre/retest.c: Added regression tests to catch the above bugs.
+
+	* lib/tre-compile.c (tre_version): Changed to return a better
+	human-readable string instead of just the version number.
+
+	* src/agrep.c (tre_agrep_handle_file): Bugfix.  The read buffer
+	must be reset when starting to read a new file because the read
+	loop may bail out without reading the full file.
+
+Sun Nov 21 18:22:26 2004  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.7.1.
+
+Sat Nov 20 10:10:12 2004  Ville Laurikari  <vl@iki.fi>
+
+	* src/agrep.c: Added the --delimiter-after command line option.
+	It can be used to output the record delimiter after the matching
+	record when a custom delimiter regex has been given instead of
+	before the matching record, which is the default.
+
+	* src/agrep.c: Added the --color (and --colour) command line
+	option.  It highlights the matching part of the text with a color
+	code from the GREP_COLOR environment variable, or red by default.
+
+	* src/agrep.c: Made some changes which hopefully make agrep faster
+	in certain conditions.
+
+	* win32/tre.def: Added reguexec.
+
+Sun Nov  7 17:26:54 2004  Ville Laurikari  <vl@iki.fi>
+
+	* Makefile.am: Fixed to include all files under the python
+	directory to distributions.
+
+	* lib/*: Divided tre-compile.c to several smaller files, to make
+	things easier to maintain.
+
+	* doc/agrep.1.in: Added this man page for agrep.
+
+Fri Sep 10 21:47:23 2004  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.7.0.
+
+Sat Sep  4 14:55:00 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c (tre_parse): Added support for the \x1B and
+	\x{263a} extensions for entering 8 bit and wide characters in
+	hexadecimal.
+
+	* tests/retest.c: Added tests for the above.
+
+	* python/{tre-python.c, setup.py.in, example.py}, configure.ac:
+	Added Python language bindings contributed by Nikolai SAOUKH.
+	Thanks!
+
+	* lib/regex.c (tre_have_backrefs, tre_have_approx): Added these
+	functions to query from a compiled regexp whether it uses back
+	references or approximate matching, respectively.
+
+Sun Aug 29 19:30:01 2004  Ville Laurikari  <vl@iki.fi>
+
+	* Added the reguexec() function.  It can be used to match regexps
+	over arbitrary data structures, since characters are fed to the
+	matcher loop one by one with a user specified function.  Unless
+	the backtracking matcher is used, the user specified function does
+	not even need to keep the whole string in memory at once.
+
+	* tests/test-str-source.c: Test program for the above.
+
+Tue Aug  3 12:59:15 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/regex.h: Added the REG_APPROX_MATCHER and
+	REG_BACKTRACKING_MATCHER execution flags to force using the
+	approximate matcher and backtracking matcher, respectively.
+
+	* tests/retest.c: Rewrote to run tests with different pmatch[] and
+	nmatch arguments, different compilation flags and different
+	matcher loops.
+
+	* lib/tre-match-approx.c (tre_tnfa_run_approx): Fixed to work
+	correctly in multibyte mode.  Before the approximate matcher did
+	not find matches if there were characters more than one byte long
+	in the string.
+
+Sun Aug  1 19:42:59 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c (tre_parse): Added support for \Q and \E for
+	turning REG_LITERAL on for parts of a regexp.
+
+Mon Jul  5 16:22:11 2004  Ville Laurikari  <vl@iki.fi>
+
+	* configure.ac: Fixed to prepend "-lgnugetopt" to LIBS if
+	gnugetopt is needed for getopt_long().
+
+Sat Jul  3 12:47:51 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c, lib/regex.h: Added a new compilation flag
+	REG_RIGHT_ASSOC.  It can be used to change concatenation
+	associativity from left associative (the default) to right
+	associative.
+
+	* lib/tre-compile.c (tre_parse): Added support for (?inr-inr)
+	and (?inr-inr:regex) extensions which work like in Perl.
+	The (?inr-inr) extension allows turning the REG_ICASE,
+	REG_NEWLINE, and REG_RIGHT_ASSOC flags on and off for chosen parts
+	of a regexp, and the (?:regex) extension can be used to
+	parenthesize a subexpression without capturing a submatch for it.
+
+	* lib/Makefile.am, lib/regexec.c, tests/Makefile.am: Fixed to
+	compile with --disable-approx.
+
+	* lib/tre-match-approx.c: Bugfix.  There was a place where the
+	tags array was modified even if it was empty, causing semi-random
+	crashes.
+
+Mon Jun 28 16:51:38 2004  Ville Laurikari  <vl@iki.fi>
+
+	* configure.ac: Added AC_SYS_LARGEFILE so that large files work
+	with agrep.
+
+	* configure.ac: Added a call to AC_FUNC_ALLOCA unless
+	--without-alloca is used.
+
+	* tests/retest.c: Changed to run all tests with all matcher
+	backends when possible.  This makes the tests a lot more
+	comprehensive, and already caught a crashbug in the approximate
+	matcher.
+
+	* tre/win32/tre-config.h: Added version information so compilation
+	on Windows will work again.
+
+Wed May 26 20:11:15 2004  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.6.8.
+
+Tue May 25 20:34:12 2004  Ville Laurikari  <vl@iki.fi>
+
+	* configure.ac: Define _GNU_SOURCE so all GNU extensions get
+	detected (such as iswblank()).
+
+	* lib/tre-internal.h: Removed an "#undef TRE_USE_SYSTEM_WCTYPE"
+	which was not meant to be left in the release version.
+
+Mon May 10 19:45:11 2004  Ville Laurikari  <vl@iki.fi>
+
+	* m4/ax_check_funcs_comp.m4, m4/ax_check_sign.m4: Fixed to use
+	"tr [a-z] [A-Z]" which works with Solaris /bin/tr.
+
+Sun May  9 19:37:10 2004  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.6.7.
+
+Sat May  8 21:50:49 2004  Ville Laurikari  <vl@iki.fi>
+
+	* src/agrep.c: Added the command line option -y.  It does nothing,
+	but is needed for compatibility with the non-free agrep.
+
+	* lib/tre-compile.c (tre_parse): Fixed a bug which caused memory
+	to be used exponentially with the number of macros (e.g. \s or \d)
+	in a regexp.
+
+Sat May  1 11:47:27 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-utils.h (tre_neg_char_classes_match): Fixed to
+	handle null bytes in multibyte strings (when string length
+	explicitly given).  The previous version did not advance in the
+	input string if a null byte was encountered, effectively leaving
+	the matchers in an infinite loop.
+
+Sat Apr 24 12:58:19 2004  Ville Laurikari  <vl@iki.fi>
+
+	* configure.ac: Added --with-libutf8 and --without-libutf8 and
+	checks for libutf8.  Now libutf8 is searched for if mbrtowc is not
+	found elsewhere.  This means wide character support can be used on
+	any platform where libutf8 works.
+
+	* m4/ax_check_funcs_comp.m4 (AX_CHECK_FUNCS_COMP): New macro
+	working very much like AC_CHECK_FUNCS, but can be used to check
+	for the existence of functions which are renamed with macros
+	(libutf8 does this).
+
+	* m4/vl_*.m4: Renamed most of these to ax_*.m4.
+
+	* lib/tre-compile.c: Removed wide L"..." string constants, which
+	may not work with libutf8.  Replaced with 8 bit "..." strings and
+	code to convert to wide character strings when needed.
+
+	* lib/tre-compile.c (tre_config): New function to check which
+	optional features have been compiled into the library.  Useful
+	especially when linking dynamically with libtre.
+
+	* lib/tre-compile.c (tre_version): New function to get the version
+	of the library.  This is just a convenience function for
+	tre_config().
+
+Thu Apr 15 07:36:27 2004  Ville Laurikari  <vl@iki.fi>
+
+	* Changed to use iswalpha(), iswalnum(), etc. if iswctype() and
+	wctype() are not available.  Now wide character support should
+	work on systems where wctype() and/or iswctype() are not
+	available, but the other functions are (old FreeBSD versions).
+
+	* configure.ac: Changed accordingly (iswctype and wctype no longer
+	a requirement for wchar support).
+
+	* tests/Makefile.am: Use LTLIBINTL for linking the test programs.
+	Now the tests should compile on hosts which have a separate
+	libintl installed and a non-GNU C library.
+
+	* src/agrep.c: Fixed not to always print the filenames.
+
+Sun Apr  4 21:02:57 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c (tre_expand_ast): Fixed yet more bugs.  Sigh.
+
+Sun Mar 21 16:39:58 2004  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.6.6.
+
+Sun Mar 21 14:08:39 2004  Ville Laurikari  <vl@iki.fi>
+
+	* src/agrep.c: Added the command line option -H (--with-filename)
+	to always print the filename for each match.
+
+Sun Mar 21 12:24:11 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c (tre_expand_ast): Fixed bugs which occurred
+	sometimes when *, +, or ? repeats were used after {m,n} repeats in
+	a regexp.
+
+	* tests/retest.c: Added some regression tests which catch the bug
+	fixed above.
+
+	* tre.pc.in: Include @LIBINTL@ in the Libs field.
+
+Fri Mar  5 23:49:38 2004  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.6.5.
+
+Fri Mar  5 23:16:57 2004  Ville Laurikari  <vl@iki.fi>
+
+	* tests/retest.c: Changed to run all regexec tests also with a
+	NULL pmatch[] array.
+
+	* lib/tre-match-*.c: Fixed bugs related to NULL pmatch[] arrays.
+
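The NULL pmatch[] case mentioned in the entry above can be sketched as follows. This is only an illustration of the calling convention: it uses the system's POSIX <regex.h> rather than TRE's own header, and the helper name matches_without_pmatch is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of TRE.
   Returns 1 on match, 0 on no match, -1 if the pattern fails to compile. */
static int matches_without_pmatch(const char *pattern, const char *text)
{
  regex_t preg;
  int status;

  assert(pattern != NULL && text != NULL);
  if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
    return -1;
  /* nmatch == 0 and pmatch == NULL: the caller only wants to know
     whether there is a match, not where the submatches are. */
  status = regexec(&preg, text, 0, NULL, 0);
  regfree(&preg);
  return status == 0 ? 1 : 0;
}
```

A matcher that writes submatch offsets unconditionally would dereference the NULL array here, which is the kind of bug a test run with a NULL pmatch[] is meant to expose.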
+Fri Mar  5 20:40:25 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c (tre_expand_ast): Fixed a bug which caused too
+	large indexes to be used for states if more than one (non-nested)
+	{m,n} repeats were used in a regexp.
+
+	* doc/tre-syntax.html: Merged in additions from Dominick Meglio,
+	thank you!
+
+	* tre/m4/ac_libtool_tags.m4: Backport of AC_LIBTOOL_TAGS from
+	Libtool 1.6 to Libtool 1.5.x.
+
+	* configure.ac: Added AC_LIBTOOL_TAGS([]) so TRE can be compiled
+	without working C++ and Fortran compilers.
+
+Mon Jan  5 13:34:19 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Fixed
+	bugs that caused crashes if REG_NOSUB was used.
+
+Fri Jan  2 16:53:10 2004  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.6.4.
+
+Fri Jan  2 16:14:37 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Fixed to
+	compile if TRE_DEBUG is defined but TRE_WCHAR is not.  Thanks to
+	Dominick Meglio for pointing this out.
+
+Fri Jan  2 09:53:02 2004  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c (tre_copy_ast): Bugfix.  Did not recurse when
+	handling an iteration node, and everything under the node was not
+	genuinely copied but only referenced.  This caused things like
+	"(a+){5}" not to work correctly; this example was basically
+	equivalent to "a*" but in a very very slow way.
+
+Mon Dec 22 10:31:16 2003  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.6.3.
+
+Sun Dec 14 11:22:58 2003  Ville Laurikari  <vl@iki.fi>
+
+	* Bugfix.  Back references did not work if REG_NOSUB was used when
+	compiling the regexp.
+
+Tue Dec  2 16:06:37 2003  Ville Laurikari  <vl@iki.fi>
+
+	* Small fixes and changes here and there to avoid compiler
+	warnings.
+
+	* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Fixed to
+	compile if TRE_WCHAR is not defined.
+
+	* lib/tre-mem.c (tre_mem_alloc_impl): Bugfix.  When a new block was
+	allocated with malloc() the next tre_mem_alloc() returned a
+	possibly unaligned pointer.
+
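The alignment problem described in the tre_mem_alloc_impl entry above can be illustrated with a small bump allocator. This is a sketch under assumptions, not TRE's actual code: arena_t, arena_init, arena_alloc, and ARENA_ALIGN are hypothetical names, and the alignment target is a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment target; the real allocator may use another. */
#define ARENA_ALIGN sizeof(long)

/* Hypothetical bump allocator over a caller-supplied block. */
typedef struct {
  unsigned char *ptr;  /* next free byte in the block */
  size_t remaining;    /* bytes left in the block */
} arena_t;

/* Padding needed to round addr up to a multiple of align. */
static size_t align_padding(uintptr_t addr, size_t align)
{
  size_t rem;

  assert(align != 0);
  rem = addr % align;
  return rem == 0 ? 0 : align - rem;
}

static void arena_init(arena_t *a, void *block, size_t size)
{
  a->ptr = block;
  a->remaining = size;
}

/* Returns an ARENA_ALIGN-aligned pointer, or NULL if the block is full. */
static void *arena_alloc(arena_t *a, size_t size)
{
  size_t pad = align_padding((uintptr_t)a->ptr, ARENA_ALIGN);
  void *result;

  if (a->remaining < pad || a->remaining - pad < size)
    return NULL;
  result = a->ptr + pad;
  a->ptr += pad + size;
  a->remaining -= pad + size;
  return result;
}
```

Because the padding is computed from the current pointer on every call, the first allocation from a fresh block is aligned in the same way as every later one.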
+Sun Nov 23 21:52:58 2003  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.6.2.
+
+Sun Nov 23 18:40:57 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Bugfix.
+	If the TNFA has a loop with an empty back reference, the matcher
+	went to an infinite loop.   This happened e.g. with the regexp
+	"()(\1)*".
+
+	* lib/regexec.c (tre_match_approx): Fixed to return error if the
+	regexp has back references.
+
+	* lib/tre-compile.c (tre_parse): Bugfix in parsing empty
+	expressions and missing closing parentheses.
+
+Sun Nov 16 20:27:17 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-approx.c (tre_tnfa_run_approx): Fixed a bug which
+	caused non-optimal matches to be returned in some cases.
+
+	* tests/retest.c: Added a couple of tests.
+
+Sat Nov 15 15:21:23 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c (tre_expand_ast): Fixed to handle nested
+	repeats correctly.
+
+Wed Nov 12 21:45:44 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c: Fixed to compile if REG_LITERAL is not
+	defined.
+
+	* lib/tre-match-backtrack.c: Fixed to compile without wide
+	character support.
+
+Thu Nov  6 22:23:03 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-parallel.c: Bugfix.  If pmatch[] was null, the
+	matcher loop referred past an array, sometimes crashing.
+
+Mon Nov  3 17:41:37 2003  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.6.0.
+
+Sun Nov  2 20:27:37 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c (tre_parse): Implemented support for
+	REG_LITERAL.  If REG_LITERAL is used, the entire regexp is
+	interpreted as a literal word.
+
+Tue Oct 14 21:20:35 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-compile.c: Changed the parser to use a context object
+	which contains all parse state instead of passing each state
+	variable separately.  Fixed bug that caused `have_backrefs' to be
+	reset if macros were used (this caused the wrong matcher to be
+	used and back references not to work).
+
+Mon Sep 29 21:43:35 2003  Ville Laurikari  <vl@iki.fi>
+
+	* Added a "tre_" prefix to all functions that did not yet have it.
+
+	* lib/tre-compile.c: Separated regexp compilation from regcomp.c
+	to this file.  Now all actual functionality is implemented in
+	lib/tre-*.c, and lib/reg*.c have the POSIX API wrappers.
+
+	* Implemented new syntax to control approximate matching
+	parameters dynamically during matching.  Thanks to Bill Yerazunis
+	for the suggestions!
+
+Wed Sep  3 19:41:40 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Bugfix.  Now
+	matching back references works correctly in wide character mode.
+
+Thu Jul  3 19:47:33 2003  Ville Laurikari  <vl@iki.fi>
+
+	* configure.ac: Made --disable-system-abi the default.
+
+	* configure.ac: alloca() is no longer required.  Unless
+	--without-alloca is specified, alloca() will be used if found.
+
+Fri May 15 09:39:34 2003  Ville Laurikari  <vl@iki.fi>
+
+	* tre/tre-match-approx.c (tre_tnfa_run_approx): Bugfix in handling
+	insertions.  If an insertion was found that had better cost than a
+	previous path, the tag values were not copied resulting in
+	incorrect match and submatch positions being reported.
+
+Wed May 14 21:14:47 2003  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.5.3.
+
+Wed May 14 20:11:43 2003  Ville Laurikari  <vl@iki.fi>
+
+	* tre/tre-mem.c (tre_mem_alloc_impl): Bugfix.  The returned
+	pointer was not always properly aligned.
+
+Tue May 13 19:55:30 2003  Ville Laurikari  <vl@iki.fi>
+
+	* configure.ac, lib/regex.h, lib/tre-internal.h: Fixed to compile
+	if --disable-system-abi is used.
+
+	* lib/Makefile.am: Fixed to use $(LTLIBINTL), so gettext is found
+	on systems where it is not in libc (e.g. FreeBSD has it in
+	libintl).
+
+	Thanks to Dominick Meglio <codemstr@ptdprolog.net> for the above!
+
+Thu May  8 21:23:30 2003  Ville Laurikari  <vl@iki.fi>
+
+	* win32/config.h, win32/tre-config.h: Updated and fixed
+	compilation errors on Windows.  Enabled wide character and
+	multibyte support.
+
+	* win32/tre.dsp, win32/retest.dsp: Link against msvcprt.lib to get
+	wide character functions.
+
+	* src/retest.c: Don't try to call setlocale() on Windows (it seems
+	to crash).
+
+	* lib/regcomp.c (parse_re): Fixed bugs in the regexp parser when
+	wide character support is not used.  Also fixed some references
+	past the end of the input string.
+
+	* lib/regex.h: regcomp, regexec, regerror, and regfree weren't
+	defined if TRE_WCHAR was not defined.  Fixed.
+
+Tue Apr 15 22:37:48 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-approx.c (tre_tnfa_run_approx): Fixed bugs.  A
+	match starting earlier was sometimes preferred over a match with a
+	smaller cost, and insertions were not handled correctly.
+
+	* src/agrep.c: Implemented the -B (best match) mode.  It scans the
+	input files twice; first to find out what is the cost of the best
+	matching record(s), and another time to output all records that
+	match with that cost.
+
+	* test/test-approx.c: Added some simple test cases.
+
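The two-pass scheme of the -B mode described above can be sketched like this. The sketch makes a loud simplification: it scores each record by plain Levenshtein distance against the whole record instead of calling TRE's approximate regexp matcher, and the names edit_distance, best_cost, select_best, and MAX_LEN are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 255  /* assumed upper bound on string length */

static size_t min3(size_t a, size_t b, size_t c)
{
  size_t m = a < b ? a : b;
  return m < c ? m : c;
}

/* Levenshtein distance, computed with a single rolling row. */
static size_t edit_distance(const char *s, const char *t)
{
  size_t row[MAX_LEN + 1];
  size_t slen = strlen(s), tlen = strlen(t);
  size_t i, j;

  assert(slen <= MAX_LEN && tlen <= MAX_LEN);
  for (j = 0; j <= tlen; j++)
    row[j] = j;
  for (i = 1; i <= slen; i++) {
    size_t diag = row[0];          /* previous row, column j - 1 */
    row[0] = i;
    for (j = 1; j <= tlen; j++) {
      size_t above = row[j];       /* previous row, column j */
      size_t subst = diag + (s[i - 1] != t[j - 1]);
      row[j] = min3(above + 1, row[j - 1] + 1, subst);
      diag = above;
    }
  }
  return row[tlen];
}

/* Pass 1: find the lowest cost over all records. */
static size_t best_cost(const char *pattern,
                        const char *const *records, size_t n)
{
  size_t best = (size_t)-1, i;

  for (i = 0; i < n; i++) {
    size_t cost = edit_distance(pattern, records[i]);
    if (cost < best)
      best = cost;
  }
  return best;
}

/* Pass 2: store the indexes of records with that cost in out[];
   returns how many were stored.  out[] must have room for n entries. */
static size_t select_best(const char *pattern, const char *const *records,
                          size_t n, size_t *out)
{
  size_t best = best_cost(pattern, records, n), count = 0, i;

  for (i = 0; i < n; i++)
    if (edit_distance(pattern, records[i]) == best)
      out[count++] = i;
  return count;
}
```

The second pass recomputes the costs instead of storing them, which mirrors the trade-off in the entry above: the input is read twice so that nothing per record has to be kept in memory between passes.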
+Sun Apr 13 13:53:00 2003  Ville Laurikari  <vl@iki.fi>
+
+	* doc/tre-api.html, doc/tre-syntax.html: Beginnings of
+	API and regexp syntax documentation.
+
+	* lib/tre-mem.c: Changed to allocate blocks bigger than
+	TRE_MEM_BLOCK_SIZE if the requested amount is large.  This fixes
+	REG_ESPACE problems when trying to compile large regexps,
+	especially ones with a lot of "|".
+
+	* Released tre-0.5.2.
+
+Mon Apr  7 18:39:05 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/regcomp.c, lib/tre-match-parallel.c: Added support for
+	non-greedy repetition operators "*?", "+?", "??", and "{m,n}?".
+	They work similarly to the ones in Perl.
+
+	* tests/retest.c: Added tests for minimal repetition operators.
+
+	* tre.pc.in: Added pkgconfig file.
+
+Tue Apr  1 20:20:37 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-parallel.c, lib/tre-match-approx.c: Fixed
+	alignment bugs when allocating pointers from a buffer.
+
+Sat Mar 15 20:35:11 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/regcomp.c (parse_re): Fixed to allow the empty regexp.
+	These were already allowed inside parentheses (e.g. "(a|)"), but
+	e.g. "a|" caused REG_EPAREN to be returned.  Now "", "a|", "|a",
+	"*", "?", etc. work as expected.
+
+Thu Mar 13 19:49:20 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-backtrack.c: Bugfix.  Stopped too early when
+	scanning asciiz strings.
+
+	* configure.in: System ABI support: added checks for absolute path
+	to regex.h and a field in the system defined regex_t suitable for
+	storing a pointer to a TNFA.  TRE is now configured by default to
+	be compatible with the system regex ABI, unless
+	--disable-system-abi is used.
+
+	* lib/regex.h, lib/tre-config.h.in: System ABI support: if
+	TRE_USE_SYSTEM_REGEX_H is defined, include system regex.h instead
+	of defining everything here.
+
+	* lib/regcomp.c, lib/regexec.c, lib/tre-config.h.in:
+	System ABI support: use the configured field in regex_t struct for
+	getting and setting the pointer to the TNFA.
+
+Thu Feb 27 19:32:43 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-*.[ch]: Fixed several references past the end of
+	the input string.
+
+	* lib/tre-match-approx.c: Fixed bugs in submatch tracking.
+
+	* configure.in: Added flag --disable-agrep to disable building and
+	installing agrep.
+
+	* Released tre-0.5.1.
+
+Sun Feb 23 14:43:06 2003  Ville Laurikari  <vl@iki.fi>
+
+	* Released tre-0.5.0.
+
+Fri Feb 21 19:10:30 2003  Ville Laurikari  <vl@iki.fi>
+
+	* win32/: New directory, contains project and workspace files for
+	compiling TRE and `retest' for Windows with MS Visual C++.
+	Original version contributed by Aymeric Moizard <jack@atosc.org>,
+	thank you!
+
+Sun Feb 16 12:55:13 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/regcomp.c: Rewrote code that adds tags in the AST for
+	submatch addressing.  Changes include:
+	  - Submatch boundaries now all have a tag with offset zero.  This
+	    makes it possible to get correct submatches for approximate
+	    matches.
+	  - Removed marker and boundary tags.  Now nested submatches are
+	    tracked and that information is used to reset submatches from
+	    old repetitions.
+	  - Bounded iterations are now expanded after adding tags instead
+	    of at parse time.  This makes the code a lot cleaner.
+
+	* lib/regexec.c, lib/tre-internal.h: Related changes (no more
+	marker and boundary tags).
+
+	* tests/test-approx.c: Small test program for approximate
+	matcher.
+
+Mon Jan 13 20:28:52 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/tre-match-approx.c: Now returns submatches of approximate
+	matches in the `pmatch[]' array of the `regamatch_t' struct.
+
+Sun Jan 12 13:52:37 2003  Ville Laurikari  <vl@iki.fi>
+
+	* lib/regexec.c, lib/regex.h: Changed API of approximate matching
+	functions.  This API is easier to extend without having to change
+	the applications using the API at all.
+
+	* src/agrep.c: New command line option --show-cost (-s) to prefix
+	the cost of the match found to each output line.
+
+	* tests/retest.c: Added tests for back referencing.
+
+	* lib/*: Rearranged stuff.  Split all three matchers (parallel,
+	approximate, backtracking) into separate files.  Put tre-mem into
+	its own files.
+
+Mon Jan  6 21:18:12 2003  Ville Laurikari  <vl@iki.fi>
+
+	* utils/autogen.sh: Fixed (must run aclocal before automake).
+
+	* m4/vl_check_sign.m4, m4/vl_decl_wchar_max.m4, configure.in:
+	Updated for new autoconf style (AC_TRY_COMPILE ->
+	AC_COMPILE_IFELSE, AC_ERROR -> AC_MSG_ERROR, etc.)
+
+	* lib/regcomp.c, lib/regexec.c, lib/regexec-bt.c, lib/tre.h:
+	Implemented support for back references.  A backtracking routine
+	implemented in `regex-bt.c' is used instead of the parallel
+	matcher if back references are used in the regexp.
+
+Fri Nov 29 20:57:52 2002  Ville Laurikari  <vl@iki.fi>
+
+	* configure.in: New options --disable-wchar and
+	--disable-multibyte that disable wchar_t support (and requirement)
+	and multibyte character support, respectively.
+
+	* lib/regcomp.c, lib/regex.h, lib/regexec.c, lib/tre.h: Related
+	changes.
+
+Sun Oct 20 21:55:56 2002  Ville Laurikari  <vl@iki.fi>
+
+	* configure.in: Check getopt_long support.
+
+	* lib/regcomp.c (ast_compute_tag_info): Merged into ast_add_tags
+	and removed this function.
+
+	* lib/regcomp.c (ast_add_tags, parse_re, parse_bound): Bugfixes.
+	Range repetition did not work correctly when applied to a
+	parenthesized subexpression.  For example, "a{5,6}" worked correctly,
+	but "(a|b){5,6}" did not.
+
+Sun Oct 20 20:27:24 2002  Ville Laurikari  <vl@iki.fi>
+
+	* Changed the name of the package from "libtre" to just "TRE".
+
+	* lib/regexec.c (tnfa_execute_approx): Implemented approximate
+	regexp matching.
+
+	* lib/regex.h (regaexec, reganexec, regawexec, regawnexec):
+	Added approximate matching API.
+
+	* src/agrep.c: First version of agrep (approximate grep).  Uses
+	the new approximate matching feature in the matcher library.
+
+	* lib/regexec.c (tnfa_execute): Added a loop to quickly skip over
+	characters that cannot possibly be the first character of a
+	match.
+
+	* lib/regcomp.c (regwncomp): Related changes.
+
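The skip loop mentioned in the tnfa_execute entry above can be sketched with a byte lookup table. This is an assumption about the general technique, not a copy of TRE's code: first_set_t, first_set_init, and skip_to_candidate are hypothetical names, and the sketch only handles single-byte characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: first_ok[c] is nonzero if byte c can start a match. */
typedef struct {
  unsigned char first_ok[256];
} first_set_t;

/* Marks every byte in `starters' as a possible first character. */
static void first_set_init(first_set_t *fs, const char *starters)
{
  assert(fs != NULL && starters != NULL);
  memset(fs->first_ok, 0, sizeof fs->first_ok);
  for (; *starters != '\0'; starters++)
    fs->first_ok[(unsigned char)*starters] = 1;
}

/* Returns the offset of the first position where a match could begin,
   or len if no such position exists. */
static size_t skip_to_candidate(const first_set_t *fs,
                                const char *str, size_t len)
{
  size_t pos = 0;

  while (pos < len && !fs->first_ok[(unsigned char)str[pos]])
    pos++;
  return pos;
}
```

For a regexp such as "(a|b)c+" the table would be initialized with "ab", and the full matcher would only be started at offsets returned by skip_to_candidate.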
+Sat Aug  3 23:42:29 2002  Ville Laurikari  <vl@iki.fi>
+
+	* Moved the library part from src/ to lib/, and changed the name of
+	macros/ to m4/.
+
+	* lib/regex.c: Split into `regcomp.c', `regexec.c', and
+	`regerror.c'.
+
+	* lib/regerror.c, lib/regex.h: Threw away regwerror() since it was
+	pretty useless.
+
+	* lib/regerror.c: Internationalized.  The error messages returned
+	by regerror() are now localized through gettext() if found.
+	Note that libintl is *not* included in the TRE package.
+
+	* po/fi.po: Finnish translation.
+
+	* lib/regcomp.c (parse_re): Fixed bugs (there were references to
+	before the start of the regexp string).
+
+	* lib/regcomp.c (parse_re): Fixed bugs in parsing BREs.
+
+	* tests/retest.c: Added test cases for the BRE stuff I fixed.
+
+	* lib/regexec.c (tnfa_execute): Fixed to work when the length of
+	the input string is given (e.g. with regnexec()).
+
+Sun Jul 28 00:19:04 2002  Ville Laurikari  <vl@iki.fi>
+
+	* lib/Makefile.am: Changed to install the header file `regex.h' to
+	$(includedir)/tre to avoid accidental inclusion with
+	"#include <regex.h>".
+
+Mon Jun 24 19:34:50 2002  Ville Laurikari  <vl@iki.fi>
+
+	* src/regex.c (ast_compute_nfl): Bugfix, did not mark an iteration
+	node as nullable if the minimum number of iterations was above
+	zero and the child was nullable.  As a result, e.g. "(a*)+" did not
+	match the empty string.
+
+	* src/regex.c (ast_compute_tag_info): Bugfix, the tree was
+	traversed in the wrong order resulting in incorrect num_tags
+	counts for nodes in some cases.  The results ranged from missing
+	submatches to segfaults.
+
+	* src/regex.c (make_transitions): Bugfix, if a transition between
+	two states was already handled then the code aborted the loop when
+	it should have just skipped to the next iteration.  The result was
+	that sometimes some transitions were not added to the NFA and
+	matches were not found.
+
+	* src/regex.c (parse_bracket_items): Bugfix, referred one or two
+	characters past the end of the string in several places.  E.g.
+	compiling the regex "[a-" could cause a segfault.
+
+	* src/regex.c (fill_pmatch): Bugfix.  If the marker boundary tag
+	number is bigger than tnfa->num_tags, the marker boundary tag
+	does not exist (or rather it does, but is the same as the match
+	end point).  The code here still used marker_boundary even if it
+	was bigger than num_tags, causing either segfaults, missing
+	submatches, or no symptoms.
+
+	* src/regex.c (tnfa_execute): Bugfix.  When matches were found,
+	the first tag value was checked to be smaller than for the
+	previous match.  Firstly, this was a redundant and useless check.
+	Secondly, it caused a segfault if REG_NOSUB was used when
+	compiling the regexp since there are no tags in that case and the
+	array is NULL.
+
+	* tests/retest.c: Added tests for all of the above.  Thanks to
+	Glenn Fowler for running into the bugs and providing the test
+	cases.
+
+	* Released libtre-0.3.2.
+
  706
+Wed Mar 27 21:48:48 2002  Ville Laurikari  <vl@iki.fi>
  707
+
  708
+	* src/regex.c: Added support for new zero-width assertions \b, \B,
  709
+	\<, and \>.  Fixed a bug in ^ and $.
  710
+
  711
+	* src/regex.c (parse_bound): Bugfix, had forgotten to handle
  712
+	boundaries of the form "{12,}" altogether.
  713
+
  714
+	* src/regex.c (ast_add_tags): Bugfix, set the direction of the
  715
+	current tag to MAXIMIZE at ADDTAGS_POST_CATENATION, but should not
  716
+	have.
  717
+
  718
+	* src/regex.c (parse_re): A `)' is now interpreted as an ordinary
  719
+	character in the absence of a matched `('.
  720
+
  721
+	* src/regex.c (regwncomp): Bugfix, did not set preg->re_nsub to
  722
+	the number of parenthesized subexpressions.
  723
+
  724
+	* tests/retest.c: Added tests for all of the above.
  725
+
  726
+	* src/regex.c: Fixed to be completely thread safe.  A single
  727
+	compiled regexp can now be used simultaneously in several
  728
+	contexts, e.g. in main() and a signal handler, or multiple
  729
+	threads.
  730
+	
  731
+Wed Mar 20 19:50:37 2002  Ville Laurikari  <vl@iki.fi>
  732
+
  733
+	* src/regex.c: Added support for Perl-compatible syntax
  734
+	extensions: \t, \n, \r, \f, \a, \e, \w, \W, \s, \S, \d, \D.
  735
+
  736
+	* src/regex.c: Now expands character classes when using 8 bit
  737
+	character sets so that iswctype() calls are avoided during
  738
+	matching.
  739
+
  740
+Sun Mar  3 17:50:14 2002  Ville Laurikari  <vl@iki.fi>
  741
+
  742
+	* macros/*: Updated all macros (they were renamed from AC_* to
  743
+	VL_*).
  744
+
  745
+	* macros/vl_check_sign.m4, macros/vl_decl_wchar_max.m4: Added.
  746
+
  747
+	* src/regex.c: Memory management cleanups.  Much of the small
  748
+	memory blocks, like AST nodes, are now allocated in large blocks
  749
+	instead of one by one using the `tre_mem_t' allocator.  This got
  750
+	rid of hundreds of lines of confusing memory management code.
  751
+	
  752
+Sat Feb 16 23:47:01 2002  Ville Laurikari  <vl@iki.fi>
  753
+
  754
+	* macros/ac_prog_cc_warnings.m4: Updated to version 1.3.
  755
+
  756
+	* macros/ac_decl_wchar_max.m4: Added this macro for checking
  757
+	whether WCHAR_MAX is defined, and defining it if it isn't.
  758
+
  759
+	* configure.in: Added some checks for wide character stuff.
  760
+
  761
+	* src/regex.c: Added `tre_' prefix to all local type names to
  762
+	avoid conflicts.
  763
+	
  764
+Mon Feb 11 21:48:03 2002  Ville Laurikari  <vl@iki.fi>
  765
+
  766
+	* src/regex.c (parse_bound, parse_re): Enabled support for
  767
+	bound expressions.  The iterated atom is duplicated by parsing it
  768
+	many times -- this seemed to be the simplest way to do it.
  769
+
  770
+	* src/Makefile.am: Changed library name from `libregx' to
  771
+	`libtre'.
  772
+
  773
+	* tests/retest.c: Added tests for bound expressions.
  774
+	
  775
+Sun Feb 10 18:47:23 2002  Ville Laurikari  <vl@iki.fi>
  776
+
  777
+	* src/regex.c (set_union): Bugfix, had `set1[s2].neg_classes' where
  778
+	should have been `set2[s2].neg_classes' (this caused crashes).
  779
+
  780
+	* src/regex.c (ast_to_efree_tnfa): Bugfix, didn't check for
  781
+	infinite maximum iteration count before making transitions for
  782
+	them.
  783
+
  784
+	* src/regex.c: Added interfaces regncomp() and regwncomp() which
  785
+	look only at the first `n' characters of the regexp pattern.  Null
  786
+	characters are allowed in the regexps when using these functions.
  787
+	
  788
+Sat Feb  9 22:39:08 2002  Ville Laurikari  <vl@iki.fi>
  789
+
  790
+	* src/regex.c: Added wide character interface: regwcomp(),
  791
+	regwexec(), and regwerror().  They work exactly like regcomp(),
  792
+	regexec() and regerror() except that the strings are
  793
+	`wchar_t *'.  Also added support for multibyte character sets.
  794
+	Fixed a lot of bugs (memory leaks, crashes) here and there.
  795
+
  796
+	* tests/retest.c: Added tests for multibyte character sets and
  797
+	regcomp() error reporting.
  798
+
  799
+	* tests/randtest.c: Makes random strings and tries to compile them
  800
+	with regcomp().  This can be used to find memory leaks and crashes
  801
+	in the regexp compiler.
  802
+	
  803
+Sun Jan 27 21:42:06 2002  Ville Laurikari  <vl@iki.fi>
  804
+
  805
+	* src/regex.c: Added support for bracket expressions
  806
+        (e.g. "[abc]").  Multicharacter collating elements are not
  807
+	supported, neither are equivalence classes.
  808
+
  809
+	* test: Renamed this directory to `tests'.
  810
+
  811
+Sun Dec  2 20:20:12 2001  Ville Laurikari  <vl@iki.fi>
  812
+
  813
+	* First public release.
  814
+	
  815
+
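The Jun 24 2002 ChangeLog entry above describes a nullability bug: an iteration node with a minimum count above zero was not marked nullable even when its child was, so "(a*)+" failed to match the empty string. The sketch below illustrates the corrected rule on a miniature AST. The node type and the function name are hypothetical and are not taken from TRE's sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature AST; these names are illustrative only and are
   not TRE's internal types. */
typedef enum {
  NODE_LITERAL, NODE_EMPTY, NODE_CATENATION, NODE_UNION, NODE_ITERATION
} node_kind;

typedef struct node {
  node_kind kind;
  const struct node *left;   /* child of ITERATION, left operand otherwise */
  const struct node *right;  /* right operand of CATENATION and UNION */
  int min;                   /* minimum repeat count, used by ITERATION */
} node;

/* Returns true if the subexpression rooted at `n` can match the empty
   string. */
static bool nullable(const node *n)
{
  assert(n != NULL);
  switch (n->kind) {
  case NODE_LITERAL:
    return false;
  case NODE_EMPTY:
    return true;
  case NODE_CATENATION:
    return nullable(n->left) && nullable(n->right);
  case NODE_UNION:
    return nullable(n->left) || nullable(n->right);
  case NODE_ITERATION:
    /* Nullable when zero repeats are allowed, or when every required
       repeat can itself match the empty string. */
    return n->min == 0 || nullable(n->left);
  }
  return false;
}
```

With this rule, an iteration with min = 1 wrapped around "a*" is nullable, while "a+" on its own is not.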
12  LICENSE
@@ -0,0 +1,12 @@
  1
+
  2
+This software is free; you can redistribute it and/or modify it under
  3
+the terms of the GNU General Public License version 2 (June 1991) as
  4
+published by the Free Software Foundation.  See the file `COPYING' for
  5
+the complete license.
  6
+
  7
+If you cannot accept the GNU GPL, it may be possible that you can
  8
+persuade me to give or sell you a version which is licensed
  9
+differently.  Please contact me for more information.
  10
+
  11
+  Ville Laurikari <vl@iki.fi>
  12
+  http://www.iki.fi/vl/
25  Makefile.am
@@ -0,0 +1,25 @@
  1
+## Process this file with automake to produce Makefile.in
  2
+
  3
+if TRE_AGREP
  4
+agrep_dirs = src doc
  5
+else
  6
+agrep_dirs =
  7
+endif
  8
+
  9
+SUBDIRS = lib $(agrep_dirs) tests utils po m4
  10
+
  11
+EXTRA_DIST = utils/config.rpath  \
  12
+	LICENSE \
  13
+	win32/tre-config.h win32/config.h \
  14
+	win32/tre.dsw \
  15
+	win32/tre.dsp win32/tre.def \
  16
+	win32/retest.dsp \
  17
+	python/tre-python.c \
  18
+	python/setup.py \
  19
+	python/example.py
  20
+
  21
+ACLOCAL_AMFLAGS = -I m4
  22
+AC_CONFIG_AUX_DIR = utils
  23
+
  24
+pkgconfigdir = $(libdir)/pkgconfig
  25
+pkgconfig_DATA = tre.pc
166  NEWS
@@ -0,0 +1,166 @@
  1
+Version 0.7.2
  2
+  - Bug fixes.
  3
+
  4
+Version 0.7.1
  5
+  - New command line options to agrep: --delimiter-after, --color
  6
+
  7
+  - Man page for agrep added.
  8
+
  9
+  - Some bugs fixed.
  10
+
  11
+Version 0.7.0
  12
+  - reguexec() added.
  13
+
  14
+  - tre_have_backrefs() and tre_have_approx() added.
  15
+
  16
+  - New syntax: \Q, \E, \x1B, \x{263a}, (?inr-inr), (?inr-inr:regex).
  17
+
  18
+  - New compilation flag REG_RIGHT_ASSOC.
  19
+
  20
+  - New execution flags REG_APPROX_MATCHER and REG_BACKTRACKING_MATCHER.
  21
+
  22
+  - Included Python language bindings contributed by Nikolai SAOUKH.
  23
+
  24
+  - Several bugs and compilation problems fixed.
  25
+
  26
+Version 0.6.8
  27
+  - Fixed to use iswctype() if found instead of always using
  28
+    iswalpha() and friends.  Now [[:blank:]] should work again on
  29
+    systems where iswctype() is available.
  30
+
  31
+Version 0.6.7
  32
+  - Fixed the -h (--no-filename) option of agrep to work again.
  33
+
  34
+  - Added the -y option to agrep.  It does nothing, and exists only
  35
+    for compatibility with the non-free version of agrep.
  36
+
  37
+  - Fixed bugs: handling null bytes in multibyte mode, exponential
  38
+    memory usage problem when using lots of macros (e.g. \s or \d) in
  39
+    a regexp, and bugs in expanding {m,n} repeats (still!).
  40
+
  41
+  - wctype() and iswctype() are no longer required for wchar support,
  42
+    iswalpha() and friends will be used instead if wctype() and
  43
+    iswctype() are not found.
  44
+
  45
+  - Added support for compiling against libutf8.
  46
+
  47
+  - Added the tre_config() function to get information about the
  48
+    optional features compiled in the TRE library.  Also added
  49
+    tre_version().
  50
+
  51
+  - Some documentation updates.
  52
+
  53
+Version 0.6.6
  54
+  - Fixed bugs which occurred sometimes when "{m,n}" repeats were used
  55
+    in conjunction with "*", "+", or "?".
  56
+
  57
+  - Added the -H (--with-filename) option to agrep.
  58
+
  59
+Version 0.6.5
  60
+  - Fixed bug which occurred when several "{m,n}" repeats were used
  61
+    in one regex.
  62
+
  63
+  - Fixed several bugs related to REG_NOSUB or NULL pmatch[] arrays
  64
+    being used for regexec().
  65
+
  66
+  - C++ or Fortran compilers no longer checked by the configure
  67
+    script.
  68
+
  69
+  - Some documentation additions.
  70
+
  71
+Version 0.6.4
  72
+  - Fixed bug in handling iterations (like "+" and "*") inside "{m,n}"
  73
+    repeats.  This should get rid of performance problems and
  74
+    incorrect results with certain regexps involving "{m,n}" repeats.
  75
+
  76
+Version 0.6.3
  77
+  - Fixed back references when REG_NOSUB is used.
  78
+
  79
+  - Compilation errors and warnings fixed.  Now this should compile on
  80
+    systems that don't have wide character support, like OpenBSD, and
  81
+    work on 64-bit machines.
  82
+
  83
+Version 0.6.2
  84
+  - Bug fixes.
  85
+
  86
+Version 0.6.1
  87
+  - Bug fixes.
  88
+
  89
+  - Some documentation updates.
  90
+
  91
+Version 0.6.0
  92
+  - The doc/ directory is now actually included in source
  93
+    distributions (oops).
  94
+
  95
+  - Bug fixes.
  96
+
  97
+  - alloca() is no longer a requirement.  The configure script still
  98
+    looks for it, and it is used if found.
  99
+
  100
+  - New approximate matching syntax.  The new syntax allows
  101
+    approximate matching to be done even using the standard regex API
  102
+    (match costs are only available when the regaexec() API is used).
  103
+
  104
+  - REG_LITERAL implemented.
  105
+
  106
+Version 0.5.3
  107
+  - Bug fixes and compilation fixes.
  108
+
  109
+  - Best match mode (-B) for agrep.
  110
+
  111
+Version 0.5.2
  112
+  - System ABI support.  TRE is now by default configured to be
  113
+    compatible with the system regex binary interface (by including
  114
+    the system regex.h and using the definitions there instead of
  115
+    TRE's own).  This can be disabled with --disable-system-abi.
  116
+
  117
+  - Added a pkg-config file `tre.pc'.
  118
+
  119
+  - Added support for minimal (non-greedy) repetition operators
  120
+    "*?", "+?", "??", and "{m,n}?".  They work similarly to the ones
  121
+    in Perl, except the number of characters matched is minimized
  122
+    instead of the number of repetitions.
  123
+
  124
+  - Added some documentation in the doc/ subdirectory.
  125
+
  126
+  - Bug fixes.
  127
+
  128
+Version 0.5.1
  129
+  - Bug fixes.
  130
+
  131
+Version 0.5.0
  132
+  - Approximate matching functions now fill the pmatch[] array of
  133
+    submatches if wanted.
  134
+
  135
+  - Support for back referencing (not for approximate matching).
  136
+
  137
+  - Changed approximate matching API to be more easily extendible in
  138
+    the future.  The match cost is now returned.
  139
+
  140
+  - Bug fixes.
  141
+
  142
+  - Windows project files (original versions contributed by Aymeric
  143
+    Moizard <jack@atosc.org>, thanks!).
  144
+
  145
+Version 0.4.1
  146
+  - Fixed installed headers.
  147
+
  148
+  - Fixed compilation problems.
  149
+
  150
+Version 0.4.0
  151
+  - The name of the package changed to TRE.
  152
+
  153
+  - New API for approximate regexp matching.
  154
+
  155
+  - New command line utility `agrep' for approximate regexp matching
  156
+    in the style of grep.
  157
+
  158
+  - New translation for Finnish (fi) has been added.
  159
+
  160
+  - Optimizations in regexec.
  161
+
  162
+  - Wide character support and multibyte character set support can be
  163
+    turned off with --disable-wchar and --disable-multibyte,
  164
+    respectively.
  165
+
  166
+  - Lots of bugfixes.
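Several NEWS items above concern REG_NOSUB and NULL pmatch[] arrays passed to regexec(). The following is a minimal sketch of that calling pattern using only the standard POSIX interface. It includes the system <regex.h>; which header to include when building against an installed TRE depends on the setup and is not shown here. The helper name `matches` is made up for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `text` matches the extended regexp `pattern`, 0 if it
   does not, and -1 if the pattern fails to compile. */
static int matches(const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  assert(pattern != NULL && text != NULL);

  /* REG_NOSUB: only a yes/no answer is wanted, no submatch offsets. */
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  /* With REG_NOSUB, nmatch may be 0 and pmatch may be NULL. */
  rc = regexec(&re, text, 0, NULL, 0);
  regfree(&re);
  return rc == 0 ? 1 : 0;
}
```

A caller that needs submatch offsets would drop REG_NOSUB and pass a regmatch_t array of at least re_nsub + 1 elements instead.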
226  README
@@ -0,0 +1,226 @@
  1
+Introduction
  2
+
  3
+   TRE is a lightweight, robust, and efficient POSIX compliant regexp
  4
+   matching library with some exciting features such as approximate
  5
+   matching.
  6
+
  7
+   At the core of TRE is a new algorithm for regular expression
  8
+   matching with submatch addressing. The algorithm uses linear
  9
+   worst-case time in the length of the text being searched, and
  10
+   quadratic worst-case time in the length of the used regular
  11
+   expression. In other words, the time complexity of the algorithm is
  12
+   O(M^2N), where M is the length of the regular expression and N is
  13
+   the length of the text. The used space is also quadratic in the
  14
+   length of the regex, but does not depend on the searched
  15
+   string. This quadratic behaviour occurs only in pathological cases
  16
+   which are probably very rare in practice.
  17
+
  18
+Features
  19
+
  20
+   TRE is not just yet another regexp matcher. TRE has some features
  21
+   which are not there in most free POSIX compatible implementations.
  22
+   Most of these features are not present in non-free implementations
  23
+   either, for that matter.
  24
+
  25
+Approximate matching
  26
+
  27
+   Approximate pattern matching allows matches to be approximate, that
  28
+   is, allows the matches to be close to the searched pattern under
  29
+   some measure of closeness. TRE uses the edit-distance measure (also
  30
+   known as the Levenshtein distance) where characters can be
  31
+   inserted, deleted, or substituted in the searched text in order to
  32
+   get an exact match. Each insertion, deletion, or substitution adds
  33
+   to the distance, or cost, of the match. TRE can report the matches
  34
+   which have a cost lower than some given threshold value. TRE can
  35
+   also be used to search for matches with the lowest cost.
  36
+
  37
+   TRE includes a version of the agrep command line tool for
  38
+   approximate regexp matching in the style of grep.  Unlike other
  39
+   agrep implementations (like the one by Sun Wu and Udi Manber from
  40
+   the University of Arizona), TRE agrep allows full regexps of
  41
+   any length, any number of errors, and non-uniform costs for
  42
+   insertion, deletion and substitution.
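The edit-distance measure with non-uniform costs described above can be illustrated for two plain strings. This is a sketch of the textbook dynamic-programming recurrence only, not TRE's matcher, which matches regexps against substrings of the text. The `edit_costs` type and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cost triple, for illustration only. */
typedef struct { int ins, del, subst; } edit_costs;

static int min3(int a, int b, int c)
{
  int m = a < b ? a : b;
  return m < c ? m : c;
}

/* Weighted edit distance for turning `a` into `b`: removing a character
   of `a` costs c.del, adding a character of `b` costs c.ins, replacing
   one character by a different one costs c.subst.  Returns -1 if memory
   cannot be allocated. */
static int edit_distance(const char *a, const char *b, edit_costs c)
{
  size_t la, lb, i, j;
  int *prev, *cur, *tmp, result;

  assert(a != NULL && b != NULL);
  la = strlen(a);
  lb = strlen(b);

  /* Two rows of the DP table are enough. */
  prev = malloc((lb + 1) * sizeof *prev);
  cur = malloc((lb + 1) * sizeof *cur);
  if (prev == NULL || cur == NULL) {
    free(prev);
    free(cur);
    return -1;
  }

  for (j = 0; j <= lb; j++)
    prev[j] = (int)j * c.ins;       /* empty prefix of a: insert all of b */

  for (i = 1; i <= la; i++) {
    cur[0] = (int)i * c.del;        /* empty prefix of b: delete all of a */
    for (j = 1; j <= lb; j++) {
      int sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : c.subst);
      cur[j] = min3(sub, prev[j] + c.del, cur[j - 1] + c.ins);
    }
    tmp = prev; prev = cur; cur = tmp;
  }

  result = prev[lb];
  free(prev);
  free(cur);
  return result;
}
```

With unit costs this gives the ordinary Levenshtein distance; raising one of the three costs makes that kind of error more expensive than the others.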
  43
+
  44
+Strict standard conformance
  45
+
  46
+   POSIX defines the behaviour of regexp functions precisely.  TRE
  47
+   attempts to conform to these specifications as strictly as
  48
+   possible.  TRE always returns the correct matches for subpatterns,
  49
+   for example.  Very few other implementations do this correctly. In
  50
+   fact, the only other implementations besides TRE that I am aware of
  51
+   (free or not) that get it right are Rx by Tom Lord, Regex++ by John
  52
+   Maddock, and the AT&T ast regex by Glenn Fowler and Doug McIlroy.
  53
+
  54
+   The standard TRE tries to conform to is the IEEE Std 1003.1-2001,
  55
+   or Open Group Base Specifications Issue 6, commonly referred to as
  56
+   "POSIX".  The relevant parts are the base specifications on regular
  57
+   expressions (and the rationale) and the description of the
  58
+   regcomp() API.
  59
+
  60
+   For an excellent survey on POSIX regexp matchers, see the testregex
  61
+   pages by Glenn Fowler of AT&T Labs Research.
  62
+
  63
+Predictable matching speed
  64
+
  65
+   Because of the novel matching algorithm used in TRE, the maximum
  66
+   time consumed by any regexec() call is always directly proportional
  67
+   to the length of the searched string.  There is one exception: if
  68
+   back references are used, the matching may take time that grows
  69
+   exponentially with the length of the string.  This is because
  70
+   matching back references is an NP-complete problem, and almost
  71
+   certainly requires exponential time to match in the worst case.
  72
+
  73
+Predictable and modest memory consumption
  74
+
  75
+   A regexec() call never allocates memory from the heap. TRE
  76
+   allocates all the memory it needs during a regcomp() call, and some
  77
+   temporary working space from the stack frame for the duration of
  78
+   the regexec() call. The amount of temporary space needed is
  79
+   constant during matching and does not depend on the searched
  80
+   string. For regexps of reasonable size TRE needs less than 50K of
  81
+   dynamically allocated memory during the regcomp() call, less than
  82
+   20K for the compiled pattern buffer, and less than two kilobytes of
  83
+   temporary working space from the stack frame during a regexec()
  84
+   call. There is no time/memory tradeoff. TRE is also small in code
  85
+   size; statically linking with TRE increases the executable size
  86
+   by less than 30K (gcc-3.2, x86, GNU/Linux).
  87
+
  88
+Wide character and multibyte character set support
  89
+
  90
+   TRE supports multibyte character sets. This makes it possible to
  91
+   use regexps seamlessly with, for example, Japanese locales. TRE
  92
+   also provides a wide character API.
  93
+
  94
+Binary pattern and data support
  95
+
  96
+   TRE provides APIs which allow binary zero characters both in
  97
+   regexps and searched strings. The standard API cannot be easily
  98
+   used to, for example, search for printable words from binary data
  99
+   (although it is possible with some hacking).  Searching for
  100
+   patterns which contain binary zeroes embedded is not possible at
  101
+   all with the standard API.
  102
+
  103
+Completely thread safe
  104
+
  105
+   TRE is completely thread safe.  All the exported functions are
  106
+   re-entrant, and a single compiled regexp object can be used
  107
+   simultaneously in multiple contexts; e.g.  in main() and a signal
  108
+   handler, or in many threads of a multithreaded application.
  109
+
  110
+Portable
  111
+
  112
+   TRE is portable across multiple platforms. Here's a table of
  113
+   platforms and compilers that have been successfully used to compile
  114
+   and run TRE:
  115
+
  116
+     Platform(s)                       | Compiler(s)
  117
+     ----------------------------------+------------
  118
+     Various GNU/Linux systems         | GCC 2.95.3, GCC 3.2.1
  119
+     Solaris 2.8 sparc                 | GCC 3.2.1, Sun Workshop 6 compilers
  120
+     Solaris 2.8 x86                   | Sun Workshop 6 compilers
  121
+     AIX 4.3.2                         | C for AIX compiler version 5, GCC 3.2.1
  122
+     Windows 2000, Windows 98          | Microsoft Visual C++ 6.0
  123
+     Cygwin 1.3.20-1 (under Windows 98)| GCC 3.2.1
  124
+     Compaq Tru64 UNIX V5.1A           | Compaq C V6.4-014
  125
+     Compaq Tru64 UNIX V5.1B           | Compaq C V6.5-011
  126
+     Digital UNIX V4.0                 | DEC C V5.9-005
  127
+     HP-UX 10.20                       | HP C Compiler A.10.32.30
  128
+     HP-UX 11                          | HP C Compiler A.11.00.00
  129
+     IRIX 6.5                          | MIPSpro Compilers 7.3.1.3m, GCC 3.0.4
  130
+     NetBSD 1.5.2                      | egcs-1.1.2
  131
+
  132
+   TRE 0.5.3 should compile without changes on all of the above
  133
+   platforms.  Tell me if you are using TRE on a platform that is
  134
+   not listed above, and I'll add it to the list.
  135
+
  136
+Free
  137
+
  138
+   TRE is free; you can redistribute it and/or modify it under the
  139
+   terms of the GNU General Public License version 2 (June 1991) as
  140
+   published by the Free Software Foundation. See the file COPYING,
  141
+   included in the distribution packages, for the complete license.
  142
+
  143
+   If you cannot accept the GNU GPL, it may be possible that you can
  144
+   persuade me to give or sell you a version which is licensed
  145
+   differently.  Please contact me for more information.
  146
+
  147
+Roadmap
  148
+
  149
+   There are currently two features, both related to collating
  150
+   elements, missing from 100% POSIX compliance. These are:
  151
+
  152
+     * Support for collating elements (e.g. [[.<X>.]], where <X> is a
  153
+       collating element). It is not possible to support
  154
+       multi-character collating elements portably, since POSIX does
  155
+       not define a way to determine whether a character sequence is a
  156
+       multi-character collating element or not.
  157
+
  158
+     * Support for equivalence classes, for example [[=<X>=]], where
  159
+       <X> is a collating element. An equivalence class matches any
  160
+       character which has the same primary collation weight as
  161
+       <X>. Again, POSIX provides no portable mechanism for
  162
+       determining the primary collation weight of a collating
  163
+       element.
  164
+
  165
+   Note that other portable regexp implementations don't support
  166
+   collating elements either.  The single exception is Regex++, which
  167
+   comes with its own database for collating elements for different
  168
+   locales.
  169
+
  170
+   These are other features I'm planning to implement:
  171
+
  172
+     * All the missing GNU extensions enabled in GNU regex, such as
  173
+       [[:<:]] and [[:>:]]
  174
+
  175
+     * New syntax for approximate matching to control the approximate
  176
+       matching parameters (deletion/insertion/substitution costs,
  177
+       maximum cost) in different parts of the regexp.
  178
+
  179
+     * A REG_SHORTEST regexec() flag for returning the shortest match
  180
+       instead of the longest match.
  181 </