linesep
linesep
provides a set of classes (called splitters
) for splitting strings in chunks, inspired by the ~codecs.IncrementalEncoder and ~codecs.IncrementalDecoder classes of the codecs module. Input is fed to a splitter instance one piece at a time, and the segments split from the input so far are (depending on the methods used) either returned immediately or else retrieveable from the splitter afterwards. This is useful when you have a data source that is neither a string nor a filehandle.
If the input is in the form of an iterable, a splitter can be used to iterate over it and yield each segment:
>>> import linesep >>> splitter = linesep.SeparatedSplitter("twofour' 'two' '' 'four' '' '' '|' 'six'
Alternatively, input can be provided to the splitter one piece at a time by passing it to the ~Splitter.split() method, which returns all newly-split off items:
>>> splitter = linesep.TerminatedSplitter("0", retain=False) >>> splitter.split("foo0bar0baz") ['foo', 'bar'] >>> splitter.split("0quux0gnusto0", final=True) ['baz', 'quux', 'gnusto']
At a lower level, input can be provided to the ~Splitter.feed() method, and the output can be retrieved with ~Splitter.get() or `~Splitter.getall()`:
>>> splitter = linesep.UniversalNewlineSplitter(retain=True, translate=True) >>> splitter.feed("foonbarrnbaz") >>> splitter.nonempty True >>> splitter.get() 'foon' >>> splitter.nonempty True >>> splitter.get() 'barn' >>> splitter.nonempty False >>> splitter.get() Traceback (most recent call last): ... SplitterEmptyError: No items available in splitter >>> splitter.close() >>> splitter.nonempty True >>> splitter.get() 'baz' >>> splitter.nonempty False
Like the *_preceded
, *_separated
, and *_terminated
functions, strings passed to splitters may be either binary or text. However, the input to a single instance of a splitter must be either all binary or all text, and the output type will match.
Splitter
ParagraphSplitter
PrecededSplitter
SeparatedSplitter
TerminatedSplitter
UnicodeNewlineSplitter
UniversalNewlineSplitter
get_newline_splitter
SplitterState()
SplitterClosedError
SplitterEmptyError