
initial import of upstream libre

From: cvs -d:pserver:anonymous@libre.cvs.sourceforge.net:/cvsroot/libre
avsm committed Dec 21, 2011
0 parents commit 594c50761808dd42a5658e9901dc7d9acd65823e
Showing with 5,754 additions and 0 deletions.
  1. +12 −0 Changes
  2. +14 −0 INSTALL
  3. +504 −0 LICENSE
  4. +3 −0 META
  5. +68 −0 Makefile
  6. +70 −0 README
  7. +39 −0 TODO.txt
  8. +657 −0 automata.ml
  9. +78 −0 automata.mli
  10. +120 −0 cset.ml
  11. +23 −0 cset.mli
  12. +24 −0 depend
  13. +953 −0 re.ml
  14. +127 −0 re.mli
  15. +122 −0 re_emacs.ml
  16. +32 −0 re_emacs.mli
  17. +138 −0 re_glob.ml
  18. +32 −0 re_glob.mli
  19. +226 −0 re_perl.ml
  20. +36 −0 re_perl.mli
  21. +154 −0 re_posix.ml
  22. +74 −0 re_posix.mli
  23. +279 −0 re_str.ml
  24. +187 −0 re_str.mli
  25. +9 −0 tests/.cvsignore
  26. +19 −0 tests/CVS/Entries
  27. +1 −0 tests/CVS/Repository
  28. +1 −0 tests/CVS/Root
  29. +36 −0 tests/Input
  30. +24 −0 tests/Makefile
  31. +20 −0 tests/env.ml
  32. +38 −0 tests/longest.c
  33. +42 −0 tests/pcre_match.ml
  34. +8 −0 tests/pcre_scan.ml
  35. +4 −0 tests/perl_scan.pl
  36. +56 −0 tests/re_match.ml
  37. +10 −0 tests/re_scan.ml
  38. +150 −0 tests/scan.ml
  39. +95 −0 tests/test_emacs.ml
  40. +161 −0 tests/test_perl.ml
  41. +397 −0 tests/test_re.ml
  42. +164 −0 tests/test_str.ml
  43. +180 −0 tests/unison.ml
  44. +163 −0 tests/unison2.ml
  45. +204 −0 tests/unison3.ml
12 Changes
@@ -0,0 +1,12 @@
+- Improved API for accessing substring information.
+- The search can now be bounded to a given length.
+- The function "execp" returns a boolean indicating whether the match
+ was successful.
+- The "leol" assertion is fully implemented.
+- The "stop" assertion matches the end of the searched part of the
+ string.
+- "nest" operator: when matching against "nest e", only the group
+ contained in the last match of e will be considered as matching.
+- The semantics of nested matches in Posix regular expressions
+ now follows the standard.
+- Str-compatibility interface
14 INSTALL
@@ -0,0 +1,14 @@
+
+Requirements
+
+ The installation procedure defined in the Makefile requires findlib
+ (http://www.ocaml-programming.de/packages/documentation/findlib/).
+
+Installation
+
+- Compile with "make all".
+
+- If you have ocamlopt, also run "make opt".
+
+- Become super-user if necessary and do "make install"
+ (A "make uninstall" removes the library.)
504 LICENSE

Large diffs are not rendered by default.

3 META
@@ -0,0 +1,3 @@
+version = "0.1"
+archive(byte) = "re.cma"
+archive(native) = "re.cmxa"
68 Makefile
@@ -0,0 +1,68 @@
+
+NAME = re
+
+OCAMLC = ocamlfind ocamlc -g
+OCAMLOPT = ocamlfind ocamlopt -unsafe
+OCAMLDEP = ocamldep
+
+INCFLAGS =
+OBJECTS = cset.cmo automata.cmo \
+ re.cmo re_posix.cmo re_emacs.cmo re_perl.cmo re_glob.cmo re_str.cmo
+XOBJECTS = $(OBJECTS:cmo=cmx)
+INTFS = re.mli re_posix.mli re_emacs.mli re_perl.mli re_glob.mli re_str.mli
+
+ARCHIVE = $(NAME).cma
+XARCHIVE = $(NAME).cmxa
+
+REQUIRES =
+PREDICATES =
+
+all: $(ARCHIVE)
+opt: $(XARCHIVE)
+
+$(ARCHIVE): $(OBJECTS)
+ $(OCAMLC) -a -o $(ARCHIVE) -package "$(REQUIRES)" -linkpkg \
+ -predicates "$(PREDICATES)" $(OBJECTS)
+$(XARCHIVE): $(XOBJECTS)
+ $(OCAMLOPT) -a -o $(XARCHIVE) -package "$(REQUIRES)" -linkpkg \
+ -predicates "$(PREDICATES)" $(XOBJECTS)
+
+.SUFFIXES: .cmo .cmi .cmx .ml .mli
+
+.ml.cmo:
+ $(OCAMLC) -package "$(REQUIRES)" -predicates "$(PREDICATES)" \
+ $(INCFLAGS) -c $<
+.mli.cmi:
+ $(OCAMLC) -package "$(REQUIRES)" -predicates "$(PREDICATES)" \
+ $(INCFLAGS) -c $<
+.ml.cmx:
+ $(OCAMLOPT) -package "$(REQUIRES)" -predicates "$(PREDICATES)" \
+ $(INCFLAGS) -c $<
+
+depend: *.ml *.mli
+ $(OCAMLDEP) $(INCFLAGS) *.ml *.mli util/*.ml util/*.mli > depend
+include depend
+
+install: all
+ { test ! -f $(XARCHIVE) || extra="$(XARCHIVE) "`basename $(XARCHIVE) .cmxa`.a; }; \
+ ocamlfind install $(NAME) $(INTFS) $(INTFS:mli=cmi) $(ARCHIVE) META $$extra
+
+uninstall:
+ ocamlfind remove $(NAME)
+
+clean::
+ rm -f *.cmi *.cmo *.cmx *.cma *.cmxa *.a *.o
+ rm -f util/*.cmi util/*.cmo util/*.cmx util/*.o
+
+clean::
+ cd tests && $(MAKE) clean
+
+realclean: clean
+ rm -f *~ util/*~
+
+distrib: realclean
+ cd ..; tar zcvf re.tar.gz --exclude CVS re
+
+check: $(ARCHIVE)
+ fort $(ARCHIVE) -env tests/env.ml \
+ tests/test_re.ml tests/test_emacs.ml tests/test_perl.ml
70 README
@@ -0,0 +1,70 @@
+
+DESCRIPTION
+===========
+
+RE is a regular expression library for OCaml. It is still under
+development, but is already rather usable.
+
+CONTACT
+=======
+
+This library has been written by Jerome Vouillon (Jerome.Vouillon@inria.fr).
+It can be downloaded from http://libre.sourceforge.net
+
+Bug reports, suggestions and contributions are welcome.
+
+FEATURES
+========
+
+The following styles of regular expressions are supported:
+- Perl-style regular expressions (module Re_perl);
+- Posix extended regular expressions (module Re_posix);
+- Emacs-style regular expressions (module Re_emacs);
+- Shell-style file globbing (module Re_glob).
+
+It is also possible to build regular expressions by combining simpler
+regular expressions (module Re)
+
+The most notable missing features are back-references and
+look-ahead/look-behind assertions.
+
+PERFORMANCES
+============
+
+The matches are performed by lazily building a DFA (deterministic
+finite automaton) from the regular expression. As a consequence,
+matching takes linear time in the length of the matched string.
+
+The compilation of patterns is slower than with libraries using
+backtracking, such as PCRE. But, once a large enough part of the
+DFA is built, matching is extremely fast.
+
+Of course, for some combinations of regular expression and string, the
+part of the DFA that needs to be built is so large that this point is
+never reached, and matching will be slow. This is not expected to
+happen often in practice, and actually a lot of expressions that
+behave badly with a backtracking implementation are very efficient
+with this implementation.
+
+The library is at the moment entirely written in OCaml. As a
+consequence, regular expression matching is much slower when the
+library is compiled to bytecode than when it is compiled to native
+code.
+
+Here are some timing results (Pentium III 500 MHz):
+* Scanning a 1 MB string containing only 'a's, except for the last
+ character which is a 'b', searching for the pattern "aa?b"
+ (repeated 100 times).
+ - RE: 2.6s
+ - PCRE: 68s
+* Regular expression example from http://www.bagley.org/~doug/shootout/.
+ - RE: 0.43s
+ - PCRE: 3.68s
+* The large regular expression (about 2000 characters long) that
+ Unison uses with my preference file to decide whether a file should
+ be ignored or not. This expression is matched against a filename
+ about 20000 times.
+ - RE: 0.31s
+ - PCRE: 3.7s
+ However, RE is only faster than PCRE when there are more than about
+ 300 filenames.
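
The README's performance notes describe building a DFA lazily, so that each state is computed only when the input first reaches it. As a rough illustration of that idea, here is a minimal self-contained sketch in plain OCaml. It is not the code in automata.ml: it swaps in Brzozowski derivatives as the way to compute the next state, and the type `re` and the functions `seq`, `alt`, `nullable`, `deriv`, `step` and `matches` are all hypothetical names made up for this example. It checks whether a whole string matches, rather than searching within it.

```ocaml
(* Illustrative only: a tiny lazy DFA whose states are regular expressions
   and whose transitions are Brzozowski derivatives. This is an assumption
   made for the example, not the construction used by the library. *)
type re =
  | Empty              (* matches nothing *)
  | Eps                (* matches only the empty string *)
  | Chr of char
  | Seq of re * re
  | Alt of re * re
  | Star of re

(* Smart constructors simplify on the fly, which keeps the number of
   distinct states small. *)
let seq a b = match a, b with
  | Empty, _ | _, Empty -> Empty
  | Eps, r | r, Eps -> r
  | _ -> Seq (a, b)

let alt a b = match a, b with
  | Empty, r | r, Empty -> r
  | _ when a = b -> a
  | _ -> Alt (a, b)

(* Does the expression accept the empty string? *)
let rec nullable = function
  | Empty | Chr _ -> false
  | Eps | Star _ -> true
  | Seq (a, b) -> nullable a && nullable b
  | Alt (a, b) -> nullable a || nullable b

(* The expression matching what may follow once [c] has been read. *)
let rec deriv c = function
  | Empty | Eps -> Empty
  | Chr d -> if c = d then Eps else Empty
  | Seq (a, b) ->
    let left = seq (deriv c a) b in
    if nullable a then alt left (deriv c b) else left
  | Alt (a, b) -> alt (deriv c a) (deriv c b)
  | Star a as r -> seq (deriv c a) r

(* Transition cache: each (state, character) pair is computed at most
   once; afterwards a step is a single table lookup. *)
let cache : (re * char, re) Hashtbl.t = Hashtbl.create 64

let step state c =
  match Hashtbl.find_opt cache (state, c) with
  | Some next -> next
  | None ->
    let next = deriv c state in
    Hashtbl.add cache (state, c) next;
    next

(* Whole-string match: one cached step per input character. *)
let matches r s =
  let state = ref r in
  String.iter (fun c -> state := step !state c) s;
  nullable !state

(* The pattern "aa?b" from the timing notes, written with this AST. *)
let aab = Seq (Chr 'a', Seq (Alt (Chr 'a', Eps), Chr 'b'))
```

With this sketch, `matches aab "aab"` walks three transitions and fills three cache entries; later inputs that follow the same path only pay for table lookups. That is the trade-off the README describes: the first matches are slower while states are being built, and later ones are fast.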
39 TODO.txt
@@ -0,0 +1,39 @@
+
+High priority (before next release)
+=============
+* Improve the Perl regular expressions parser
+* Character classes (in the three regular expression parsers)
+
+* Reduce memory usage
+ - More compact representation of character sequences
+ - Special notation for "anything but this set of characters"
+ (more generally, optimize the compilation of regular expressions)
+* Simple optimisations
+ - alt containing alt
+ - epsilon elimination
+ - Seq (Seq (x,y), z) => Seq (x, Seq (y, z)) under some circumstances
+ (x or y has a fixed length)
+ ...
+
+* Test suite
+
+Medium priority
+===============
+* Implement back-references
+* Implement look-ahead and look-behind assertions
+
+Low priority
+============
+* Optimize the main loop for processors that are not register starved
+* Rewrite the main loops in C
+ (but keep the option to compile a pure OCaml version)
+* Limit the size of the cached DFAs by removing states that have not
+ been used recently
+* Documentation
+
+Other ideas
+===========
+* It would be great to have a more generic interface (parameterized
+ over some abstract tokens).
+* Str compatibility module
+ (should we implement string_partial_match?)
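
The "Simple optimisations" entries in TODO.txt (alt containing alt, epsilon elimination, and re-associating Seq) can be pictured as rewrites on an expression tree. The following is only a sketch under stated assumptions: the type `re` and the functions `fixed_length` and `simplify` are hypothetical and do not come from re.ml, and the fixed-length guard simply mirrors the condition the TODO mentions without claiming to capture the library's actual requirements.

```ocaml
(* Hypothetical AST for illustration; the library's own type may differ. *)
type re =
  | Eps
  | Chr of char
  | Seq of re * re
  | Alt of re list
  | Star of re

(* [Some n] when every string matched by the expression has length n. *)
let rec fixed_length = function
  | Eps -> Some 0
  | Chr _ -> Some 1
  | Seq (a, b) ->
    (match fixed_length a, fixed_length b with
     | Some m, Some n -> Some (m + n)
     | _ -> None)
  | Alt [] -> None
  | Alt (r :: rs) ->
    let l = fixed_length r in
    if l <> None && List.for_all (fun r' -> fixed_length r' = l) rs
    then l
    else None
  | Star _ -> None

let rec simplify = function
  | (Eps | Chr _) as r -> r
  | Star r -> Star (simplify r)
  | Alt rs ->
    (* "alt containing alt": splice nested alternatives into the parent. *)
    let flatten r = match simplify r with Alt inner -> inner | r' -> [r'] in
    (match List.concat_map flatten rs with
     | [r] -> r
     | rs' -> Alt rs')
  | Seq (a, b) ->
    (match simplify a, simplify b with
     (* Epsilon elimination: an empty factor contributes nothing. *)
     | Eps, r | r, Eps -> r
     (* Seq (Seq (x, y), z) => Seq (x, Seq (y, z)), only when x or y
        has a fixed length, as the TODO entry requires. *)
     | Seq (x, y), z when fixed_length x <> None || fixed_length y <> None ->
       Seq (x, simplify (Seq (y, z)))
     | a', b' -> Seq (a', b'))
```

For instance, `simplify (Seq (Seq (Chr 'a', Chr 'b'), Chr 'c'))` re-associates to the right because `Chr 'a'` has a fixed length, while a sequence whose first two factors are both `Star` expressions is left as it is.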
