
Splitter Classes

linesep provides a set of classes, called splitters, for splitting strings into chunks. They are inspired by the codecs.IncrementalEncoder and codecs.IncrementalDecoder classes of the codecs module. Input is fed to a splitter instance one piece at a time. Depending on the methods used, the segments split off from the input so far are either returned immediately or retrieved from the splitter afterwards. This is useful when the data source is neither a string nor a filehandle.
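The incremental pattern that splitters borrow can be seen in the standard library's own incremental decoder. The snippet below uses only codecs, not linesep. A UTF-8 decoder is fed a string in pieces that cut a multibyte character in half. It buffers the incomplete bytes and emits the character once the rest arrives, and final=True marks the end of input. A splitter presumably treats a segment that has not yet been delimited the same way.

```python
import codecs

# An incremental UTF-8 decoder from the standard library: it buffers an
# incomplete multibyte sequence until the rest of it arrives.
decoder = codecs.getincrementaldecoder("utf-8")()

pieces = [b"caf", b"\xc3", b"\xa9!"]  # "café!" cut in the middle of "é"
decoded = [decoder.decode(piece) for piece in pieces]
# final=True signals end of input; an incomplete trailing sequence would
# raise UnicodeDecodeError here instead of staying buffered.
decoded.append(decoder.decode(b"", final=True))

print(decoded)  # ['caf', '', 'é!', '']
```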

If the input is in the form of an iterable, a splitter can be used to iterate over it and yield each segment:

>>> import linesep
>>> splitter = linesep.SeparatedSplitter("twofour' 'two' '' 'four' '' '' '|' 'six'
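Conceptually, iterating a splitter over an iterable means carrying any incomplete segment over from one element to the next. The following is a minimal sketch that assumes str.split-style separated semantics and no retain option. iter_separated is a hypothetical pure-Python helper and is not part of linesep.

```python
from typing import Iterable, Iterator


def iter_separated(chunks: Iterable[str], sep: str) -> Iterator[str]:
    """Hypothetical helper: lazily yield the sep-separated segments of the
    concatenation of ``chunks`` (str.split semantics)."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last separator in the buffer is complete;
        # the tail after it may still grow with the next chunk.
        *complete, buffer = buffer.split(sep)
        yield from complete
    # Input exhausted: the remainder is the last segment, even if empty.
    yield buffer


chunks = ["foo|ba", "r||b", "az"]
print(list(iter_separated(chunks, "|")))  # ['foo', 'bar', '', 'baz']
```

Because the held-back tail is re-split together with each new chunk, a multi-character separator that straddles two chunks is still recognized. The sketch favors clarity over efficiency.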

Alternatively, input can be provided to the splitter one piece at a time by passing it to the Splitter.split() method, which returns all newly split-off items. Passing final=True marks the last piece of input:

>>> splitter = linesep.TerminatedSplitter("0", retain=False)
>>> splitter.split("foo0bar0baz")
['foo', 'bar']
>>> splitter.split("0quux0gnusto0", final=True)
['baz', 'quux', 'gnusto']
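The bookkeeping behind split() can be sketched for the terminated case. This is a simplified stand-in, not linesep's class. It assumes retain=False and also assumes that, at the end of input, a trailing unterminated segment is returned as a final item. MiniTerminatedSplitter is hypothetical. It reproduces the two calls in the example above.

```python
from typing import List


class MiniTerminatedSplitter:
    """Hypothetical, simplified terminated splitter (retain=False only)."""

    def __init__(self, terminator: str) -> None:
        self.terminator = terminator
        self.buffer = ""  # text after the last terminator seen so far

    def split(self, data: str, final: bool = False) -> List[str]:
        self.buffer += data
        # Every segment followed by a terminator is complete.
        *items, self.buffer = self.buffer.split(self.terminator)
        if final:
            # Assumption: a non-empty unterminated tail still counts as an
            # item at end of input; an empty tail does not.
            if self.buffer:
                items.append(self.buffer)
            self.buffer = ""
        return items


splitter = MiniTerminatedSplitter("0")
print(splitter.split("foo0bar0baz"))                # ['foo', 'bar']
print(splitter.split("0quux0gnusto0", final=True))  # ['baz', 'quux', 'gnusto']
```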

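At a lower level, input can be provided to the Splitter.feed() method, and the output can be retrieved with Splitter.get() or Splitter.getall(). The nonempty attribute reports whether any items are ready to be retrieved. Splitter.close() marks the end of input, which makes any remaining unterminated segment available: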

>>> splitter = linesep.UniversalNewlineSplitter(retain=True, translate=True)
>>> splitter.feed("foo\nbar\r\nbaz")
>>> splitter.nonempty
True
>>> splitter.get()
'foo\n'
>>> splitter.nonempty
True
>>> splitter.get()
'bar\n'
>>> splitter.nonempty
False
>>> splitter.get()
Traceback (most recent call last):
    ...
SplitterEmptyError: No items available in splitter
>>> splitter.close()
>>> splitter.nonempty
True
>>> splitter.get()
'baz'
>>> splitter.nonempty
False
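The subtle part of incremental universal-newline splitting is a "\r" at the very end of a piece of input. It may be the first half of a "\r\n" that arrives in the next piece, so it cannot be emitted yet. The sketch below holds such a "\r" back until more input or close() arrives. It assumes retain=True, translate=True semantics, where each line keeps its ending, rewritten to "\n". MiniUniversalNewlineSplitter, SketchEmptyError, and SketchClosedError are hypothetical names. Raising a closed error from feed() after close() is an assumption about the role of SplitterClosedError.

```python
from collections import deque
from typing import Deque, List


class SketchEmptyError(Exception):
    """Hypothetical stand-in for SplitterEmptyError."""


class SketchClosedError(Exception):
    """Hypothetical stand-in for SplitterClosedError."""


class MiniUniversalNewlineSplitter:
    r"""Hypothetical sketch: split on "\n", "\r", or "\r\n", keeping each line
    ending but translating it to "\n"."""

    def __init__(self) -> None:
        self._buffer = ""                  # text not yet split off
        self._items: Deque[str] = deque()  # split-off items awaiting get()
        self.closed = False

    @property
    def nonempty(self) -> bool:
        return bool(self._items)

    def feed(self, data: str) -> None:
        if self.closed:
            raise SketchClosedError("Splitter is closed")
        self._buffer += data
        self._scan(final=False)

    def close(self) -> None:
        self.closed = True
        self._scan(final=True)

    def get(self) -> str:
        if not self._items:
            raise SketchEmptyError("No items available in splitter")
        return self._items.popleft()

    def getall(self) -> List[str]:
        items = list(self._items)
        self._items.clear()
        return items

    def _scan(self, final: bool) -> None:
        buf, start, i = self._buffer, 0, 0
        while i < len(buf):
            if buf[i] == "\n":
                end = i + 1
            elif buf[i] == "\r":
                if i + 1 < len(buf):
                    end = i + 2 if buf[i + 1] == "\n" else i + 1
                elif final:
                    end = i + 1
                else:
                    break  # lone trailing "\r": wait for a possible "\n"
            else:
                i += 1
                continue
            self._items.append(buf[start:i] + "\n")  # translate ending to "\n"
            start = i = end
        self._buffer = buf[start:]
        if final and self._buffer:
            self._items.append(self._buffer)  # unterminated last line
            self._buffer = ""


splitter = MiniUniversalNewlineSplitter()
splitter.feed("foo\nbar\r\nbaz")
print(splitter.getall())  # ['foo\n', 'bar\n']
splitter.close()
print(splitter.get())     # baz
```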

Like the *_preceded, *_separated, and *_terminated functions, splitters accept input that is either binary (bytes) or text (str). However, all input to a single splitter instance must be of the same type, and the output will be of that type as well.
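One way to enforce the one-type-per-instance rule is to lock the type on the first piece of input. The following is a hedged sketch. TypeLockedBuffer is hypothetical, and raising TypeError on mixed input is an assumption, since the exception linesep actually raises is not stated here.

```python
from typing import List, Optional, Union


class TypeLockedBuffer:
    """Hypothetical sketch: accepts str or bytes, but only one of them per instance."""

    def __init__(self) -> None:
        self.kind: Optional[type] = None  # set to str or bytes by the first feed()
        self.chunks: List[Union[str, bytes]] = []

    def feed(self, data: Union[str, bytes]) -> None:
        if isinstance(data, str):
            kind = str
        elif isinstance(data, bytes):
            kind = bytes
        else:
            raise TypeError(f"expected str or bytes, not {type(data).__name__}")
        if self.kind is None:
            self.kind = kind
        elif kind is not self.kind:
            raise TypeError(f"cannot mix {self.kind.__name__} and {kind.__name__} input")
        self.chunks.append(data)

    def contents(self) -> Union[str, bytes, None]:
        if self.kind is None:
            return None  # nothing fed yet, so the output type is still unknown
        # "".join or b"".join: the output type follows the input type.
        return self.kind().join(self.chunks)


buf = TypeLockedBuffer()
buf.feed(b"foo|")
buf.feed(b"bar")
print(buf.contents())  # b'foo|bar'
try:
    buf.feed("baz")
except TypeError as exc:
    print(exc)  # cannot mix bytes and str input
```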

Splitters

Splitter

ParagraphSplitter

PrecededSplitter

SeparatedSplitter

TerminatedSplitter

UnicodeNewlineSplitter

UniversalNewlineSplitter
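The preceded, separated, and terminated conventions named by these classes differ mainly in how a delimiter at the very start or end of the input is treated. The sketch below is hedged. It assumes the splitters mirror the *_preceded, *_separated, and *_terminated functions and that those follow the rules given in the comments. The functions preceded, separated, and terminated are hypothetical one-shot helpers, not linesep's API.

```python
from typing import List


def separated(s: str, sep: str) -> List[str]:
    # Separators sit between segments: exactly str.split behavior.
    return s.split(sep)


def preceded(s: str, sep: str) -> List[str]:
    # Each segment is introduced by sep, so a leading sep does not
    # produce an empty first segment.
    parts = s.split(sep)
    return parts[1:] if parts[0] == "" else parts


def terminated(s: str, sep: str) -> List[str]:
    # Each segment is ended by sep, so a trailing sep does not
    # produce an empty last segment.
    parts = s.split(sep)
    return parts[:-1] if parts[-1] == "" else parts


text = "|a|b|"
print(separated(text, "|"))   # ['', 'a', 'b', '']
print(preceded(text, "|"))    # ['a', 'b', '']
print(terminated(text, "|"))  # ['', 'a', 'b']
```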

Utilities

get_newline_splitter

SplitterState()

SplitterClosedError

SplitterEmptyError