A Python library to manipulate and transform sequences

SeqTools

SeqTools facilitates the manipulation of datasets and the evaluation of transformation pipelines. Some of the provided functionalities include: element-wise mapping, reordering, reindexing, concatenation, joining, slicing, minibatching, etc.

To improve ease of use, SeqTools assumes that datasets are objects that implement a list-like sequence interface: a container object with a length, whose elements are accessible via indexing or slicing. All SeqTools functions take and return objects compatible with this simple and convenient interface.
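
For instance, a plain list can serve as input to seqtools.smap (the element-wise mapping used throughout this README), and the result exposes the same length, indexing and slicing; a minimal sketch:

>>> import seqtools
>>> data = list(range(10))                     # any list-like container works as input
>>> squares = seqtools.smap(lambda x: x ** 2, data)
>>> len(squares)                               # outputs expose the same interface
10
>>> squares[3]
9
>>> list(squares[:3])
[0, 1, 4]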

Sometimes manipulating a whole dataset with transformations or combinations can be slow and resource intensive; a transformed dataset might not even fit into memory! To circumvent this issue, SeqTools implements on-demand execution under the hood, so that computations only run when needed, and only for the elements that are actually required, ignoring the rest of the dataset. This keeps memory usage to a bare minimum and reduces the time it takes to access any arbitrary result. This on-demand strategy makes it quick to define dataset-wide transformations and probe a few results for debugging or prototyping purposes, yet it remains transparent to users, who still benefit from a simple and convenient list-like interface.

>>> import seqtools
>>> def do(x):
...     print("-> computing now")
...     return x + 2
...
>>> a = [1, 2, 3, 4]
>>> m = seqtools.smap(do, a)
>>> # nothing printed because evaluation is delayed
>>> m[0]
-> computing now
3
>>> for v in m[:-2]:
...     print(v)
-> computing now
3
-> computing now
4

When the time comes to move from prototyping to execution, the list-like container interface facilitates serial evaluation. In addition, SeqTools provides simple helpers to dispatch work between multiple background workers (threads or processes) and thereby maximize execution speed and resource usage.
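
As a rough sketch of worker-based evaluation, the prefetch helper listed further below can dispatch computations to background threads; the parameter names used here (nworkers, max_buffered, method) are assumptions, so check the documentation for the exact signature:

>>> import time
>>> import seqtools
>>> def slow_op(x):
...     time.sleep(.01)
...     return x ** 2
...
>>> m = seqtools.smap(slow_op, list(range(100)))
>>> # parameter names below are assumed; see the documentation for the exact signature
>>> p = seqtools.prefetch(m, nworkers=4, max_buffered=10, method='thread')
>>> total = sum(p)  # items are evaluated ahead of time by background threads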

SeqTools originally targets data science, and more precisely the preprocessing stages of a dataset. Because this usage is often experimental, on-demand execution is made as transparent as possible to users through fault-tolerant functions and insightful error reporting. Moreover, the internal code is kept concise and clearly commented to facilitate error tracing through a failing transformation pipeline.

Nevertheless, this project purposely keeps a generic interface and only requires minimal dependencies in order to facilitate reusability beyond this scope of application.

Example

>>> import time
>>> def f1(x):
...     return x + 1
...
>>> def f2(x):  # slow and memory heavy transformation
...     time.sleep(.01)
...     return [x for _ in range(500)]
...
>>> def f3(x):
...     return sum(x) / len(x)
...
>>> data = list(range(1000))

Without delayed evaluation, defining the pipeline and reading values looks like this:

>>> tmp1 = [f1(x) for x in data]
>>> tmp2 = [f2(x) for x in tmp1]  # takes 10 seconds and a lot of memory
>>> res = [f3(x) for x in tmp2]
>>> print(res[2])
3.0
>>> print(max(tmp2[2]))  # requires storing 499,500 useless values along the way
3

With seqtools:

>>> tmp1 = seqtools.smap(f1, data)
>>> tmp2 = seqtools.smap(f2, tmp1)
>>> res = seqtools.smap(f3, tmp2)  # no computations so far
>>> print(res[2])  # takes 0.01 seconds
3.0
>>> print(max(tmp2[2]))  # easy access to intermediate results
3

Batteries included!

The library comes with a set of functions to manipulate sequences:

- concatenate
- batch
- gather
- prefetch
- interleaving
and others (suggestions are also welcome).
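
A rough sketch of how a few of these helpers compose; the argument order and whether concatenate takes a list of sequences are assumptions here, so refer to the documentation for exact signatures:

>>> import seqtools
>>> a = seqtools.smap(lambda x: x + 1, list(range(5)))
>>> b = seqtools.smap(lambda x: -x, list(range(5)))
>>> both = seqtools.concatenate([a, b])        # chain several sequences
>>> len(both)
10
>>> subset = seqtools.gather(both, [0, 2, 4])  # reindex/reorder with explicit indices
>>> list(subset)
[1, 3, 5]
>>> minibatches = seqtools.batch(both, 4)      # group items into minibatches of size 4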

Installation

pip install seqtools

Documentation

The documentation is hosted at https://seqtools-doc.readthedocs.io.

Contributing and Support

Use the issue tracker to request features, propose improvements or report issues. For questions regarding usage, please send an email.

Related libraries

Joblib proposes low-level functions with many settings to optimize pipelined transformations. That library notably provides advanced caching mechanisms, which are not the primary concern of SeqTools. SeqTools uses a simpler container-oriented interface with multiple utility functions in order to assist fast prototyping. On-demand evaluation is its default behaviour and applies at all layers of a transformation pipeline. In particular, parallel evaluation can be inserted in the middle of a transformation pipeline and won't block execution to wait for the computation of all elements of the dataset.

SeqTools is designed to connect nicely to the data loading pipelines of machine learning libraries such as PyTorch's torch.utils.data and torchvision.transforms, or Tensorflow's tf.data. The interfaces of these libraries focus on iterators to access transformed elements, contrary to SeqTools, which also provides arbitrary reads via indexing.
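
For example, since SeqTools outputs implement __len__ and __getitem__, they can in principle be consumed as a map-style dataset by PyTorch's DataLoader; the following is a hedged sketch (the toy transformation and variable names are illustrative):

>>> import seqtools
>>> import torch
>>> from torch.utils.data import DataLoader
>>> raw = list(range(100))                                   # stand-in for raw samples
>>> dataset = seqtools.smap(lambda x: torch.tensor([x]), raw)
>>> # map-style datasets only require __getitem__ and __len__, which SeqTools provides
>>> loader = DataLoader(dataset, batch_size=8, shuffle=True)
>>> first_batch = next(iter(loader))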