sre_yield
=========

Quick Start
===========

The goal of ``sre_yield`` is to generate all values that can match a given
regular expression, or to count the possible matches, efficiently. It uses the
parsed regular expression, so you get a much more accurate result than trying
to just split strings.

.. code-block:: pycon

    >>> s = 'foo|ba[rz]'
    >>> s.split('|') # bad
    ['foo', 'ba[rz]']
    >>> import sre_yield
    >>> list(sre_yield.AllStrings(s)) # better
    ['foo', 'bar', 'baz']

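Counting is supported as well; for instance, a four-digit pattern:

.. code-block:: pycon

    >>> len(sre_yield.AllStrings('[0-9]{4}'))
    10000
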
It does this by walking the tree constructed by ``sre_parse`` (the same parser
used internally by the ``re`` module) and building chained/repeating iterators
as appropriate. There may be duplicate results, though, depending on your
input pattern -- these are cases that ``sre_parse`` did not optimize away.

.. code-block:: pycon

    >>> import sre_yield
    >>> list(sre_yield.AllStrings('.|a', charset='ab'))
    ['a', 'b', 'a']

...and this happens in simpler cases too:

.. code-block:: pycon

    >>> list(sre_yield.AllStrings('a|a'))
    ['a', 'a']

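To peek at the tree being walked, you can ask ``sre_parse`` to dump a pattern
directly; this is illustrative only (the dump format is CPython's, so it isn't
shown as a doctest):

.. code-block:: pycon

    # (Illustrative; doesn't run as a doctest.)
    > import sre_parse
    > sre_parse.parse('foo|ba[rz]').dump()
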
Quirks
======

The membership check, ``'abc' in values_obj``, is by necessity fullmatch -- it
must cover the entire string. Imagine that it has ``^(...)$`` around it.
Because ``re.search`` can match anywhere in an arbitrarily long string,
emulating this would produce a large number of junk matches -- probably not
what you want. (If that is what you want, add a ``.*`` on either side.)

Here's a quick example, using the presidents regex from http://xkcd.com/1313/

.. code-block:: pycon

    >>> s = 'bu|[rn]t|[coy]e|[mtg]a|j|iso|n[hl]|[ae]d|lev|sh|[lnd]i|[po]o|ls'
    >>> import re
    >>> re.search(s, 'kennedy') is not None # note .search
    True
    >>> v = sre_yield.AllStrings(s)
    >>> v.__len__()
    23
    >>> 'bu' in v
    True
    >>> v[:5]
    ['bu', 'rt', 'nt', 'ce', 'oe']

If you do want to emulate search, you end up with a large number of matches
quickly. Limiting the repetition a bit helps, but it's still a very large
number.

.. code-block:: pycon

    >>> v2 = sre_yield.AllStrings('.{,30}(' + s + ').{,30}')
    >>> el = v2.__len__() # too big for int
    >>> print(str(el).rstrip('L'))
    57220492262913872576843611006974799576789176661653180757625052079917448874638816841926032487457234703154759402702651149752815320219511292208238103
    >>> 'kennedy' in v2
    True

Capturing Groups
================

If you're interested in extracting what would match during generation of a
value, you can use ``AllMatches`` instead to get ``Match`` objects.

.. code-block:: pycon

    >>> v = sre_yield.AllMatches(r'a(\d)b')
    >>> m = v[0]
    >>> m.group(0)
    'a0b'
    >>> m.group(1)
    '0'

This even works for simplistic backreferences -- in this case, requiring matching quotes.

.. code-block:: pycon

    >>> v = sre_yield.AllMatches(r'(["\'])([01]{3})\1')
    >>> m = v[0]
    >>> m.group(0)
    '"000"'
    >>> m.groups()
    ('"', '000')
    >>> m.group(1)
    '"'
    >>> m.group(2)
    '000'

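Assuming ``AllMatches`` iterates the same way ``AllStrings`` does, the whole
set of matches can be consumed in one pass; a sketch (not run as a doctest):

.. code-block:: pycon

    # (Illustrative sketch.)
    > [(m.group(0), m.group(1)) for m in sre_yield.AllMatches(r'a([01])b')]
    [('a0b', '0'), ('a1b', '1')]
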
Anchors and Lookaround
======================

Some very simple anchors are supported out of the box, to enable parsing
patterns where the anchors are actually redundant with the fullmatch behavior,
such as ``^foo$``. More complex anchoring should raise an exception at parse
time, as will any use of lookaround. (``\b`` is supported at the beginning/end
despite this being not quite correct.)

.. code-block:: pycon

    >>> list(sre_yield.AllStrings('foo$'))
    ['foo']
    >>> list(sre_yield.AllStrings('^$'))
    ['']
    >>> list(sre_yield.AllStrings('.\\b.')) # doctest: +IGNORE_EXCEPTION_DETAIL
    Traceback (most recent call last):
    ...
    ParseError: Non-end-anchor None found at END state

Supporting these in a limited way is possible, but I haven't found the time in
6+ years to implement it, because I don't have a need. Instead, if you need
these and don't mind (potentially many) extra matches, pass ``relaxed=True`` to
pretend they don't exist; the values returned by ``sre_yield`` will then be a
superset of the true matches, and you can postprocess them yourself.

.. code-block:: pycon

    >>> pattern = '.\\b.'
    >>> values = list(sre_yield.AllStrings(pattern, charset='a-', relaxed=True))
    >>> values
    ['aa', '-a', 'a-', '--']
    >>> [v for v in values if re.fullmatch(pattern, v)]
    ['-a', 'a-']

Reporting Bugs, etc.
====================

We welcome bug reports -- see our issue tracker on `GitHub
<https://github.com/google/sre_yield/issues>`_ to see if it's been reported before.
If you'd like to discuss anything, we have a `Google Group
<https://groups.google.com/group/sre_yield>`_ as well.


Related Modules
===============

We're aware of three similar modules, but each has a different goal.


xeger
-----

Xeger was originally written `in Java <https://code.google.com/p/xeger/>`_ and
ported `to Python <https://bitbucket.org/leapfrogdevelopment/rstr>`_. This
generates random entries, which may suffice if you want to get just a few
matching values. This module and ``xeger`` differ statistically in the way
they handle repetitions:

.. code-block:: pycon

    >>> import random
    >>> v = sre_yield.AllStrings('[abc]{1,4}')
    >>> len(v)
    120
    # Now random.choice(v) has a 3/120 chance of choosing a single letter.
    >>> len([x for x in v if len(x) == 1])
    3
    # xeger(v) has ~25% chance of choosing a single letter, because the length
    # and match are chosen independently.
    # (This doesn't run, so the doctests don't require xeger)
    > from rstr import xeger
    > sum([1 if len(xeger('[abc]{1,4}')) == 1 else 0 for _ in range(120)])
    26

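If uniformly distributed random matches are what you want, ``random.choice``
over an ``AllStrings`` object works, since it supports ``len()`` and item
access; a sketch (not run as a doctest, since the result is random):

.. code-block:: pycon

    # (Illustrative; the result is random.)
    > import random
    > v = sre_yield.AllStrings('[abc]{1,4}')
    > random.choice(v)  # each of the 120 values is equally likely
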
In addition, ``xeger`` differs in that its default matching for ``'.'`` is the
printable characters (which you can get in ``sre_yield`` by setting
``charset=string.printable``, if desired).


sre_dump
--------

Another module that walks ``sre_parse``'s tree is ``sre_dump``, although it
does not generate matches, only reconstructs the string pattern (useful
primarily if you hand-generate a tree). If you're interested in the space,
it's a good read. http://www.dalkescientific.com/Python/sre_dump.html


jpetkau1
--------

It can find matches using randomization, so it sort of handles anchors. Results
aren't guaranteed, but it's another good look at the internals.
http://web.archive.org/web/20071024164712/http://www.uselesspython.com/jpetkau1.py
(and a slightly older version in the announcement on `python-list
<https://mail.python.org/pipermail/python-list/2001-August/104757.html>`_).


Differences between sre_yield and the re module
===============================================

There are certainly valid regular expressions which ``sre_yield`` does not
handle. These include things like lookarounds and backreferences, but there
are also a few other differences:

- The maximum value for repeats is system-dependent -- in CPython's ``sre``
  module there is a special value which is treated as infinite (either 2**16-1
  or 2**32-1 depending on the build). In ``sre_yield``, this is taken as a
  literal rather than infinite, thus (on a 2**16-1 platform):

  .. code-block:: pycon

      >>> len(sre_yield.AllStrings('a*')[-1])
      65535
      >>> import re
      >>> len(re.match('.*', 'a' * 100000).group(0))
      100000

- The ``re`` `module docs <http://docs.python.org/2/library/re.html#regular-expression-syntax>`_
  say "Regular expression pattern strings may not contain null bytes" yet this
  appears to work fine.
- Order does not depend on greediness (see the sketch after this list).
- The regex is treated as fullmatch.
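
As a sketch of the greediness point above (not run as a doctest; the ascending
order is consistent with the ``'a*'`` example earlier):

.. code-block:: pycon

    # (Illustrative sketch.)
    > list(sre_yield.AllStrings('a{1,2}'))
    ['a', 'aa']
    > list(sre_yield.AllStrings('a{1,2}?'))
    ['a', 'aa']
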
ARCHIVED: See https://github.com/sre-yield/sre-yield for continuing development.
