pipelib

pipel is a deliberately simple Python library for parallelizing map-like functions, inspired by MapReduce. Full disclaimer: I published this small package because I found it helpful, but do not expect any guarantees or maintenance. There are surely better libraries than pipel, but this approach worked for me.

Install

pipel supports Python >= 3.6.

pip install pipel

Use case

Parallelize map-like functions that must be applied to generators rather than lists because the data do not fit in memory. Callable objects can be used instead of plain functions, even if the objects are not picklable. Process-safe logging is provided by the multiprocessing-logging library.
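To make these two constraints concrete, here is a small sketch that does not use pipel at all. The names read_lines, Lowercaser and is_picklable are hypothetical and invented for this illustration. It only shows that a generator yields items lazily, and that a callable object holding a lambda cannot be serialized with the standard pickle module, which is the kind of object the paragraph above refers to.

```python
import pickle
from typing import Iterable, Iterator


def read_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical streamer: yields one item at a time instead of building a list."""
    for line in lines:
        yield line.rstrip("\n")


class Lowercaser:
    """Hypothetical map-like callable holding state that pickle cannot serialize."""

    def __init__(self) -> None:
        # A lambda stored on the instance makes the whole object unpicklable.
        self._normalize = lambda s: s.lower()

    def __call__(self, text: str) -> str:
        return self._normalize(text)


def is_picklable(obj) -> bool:
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


mapper = Lowercaser()
stream = read_lines(["Hello\n", "WORLD\n"])
print([mapper(item) for item in stream])  # ['hello', 'world']
print(is_picklable(mapper))               # False
```

Objects like this cannot be sent to worker processes through pickle, which is presumably why the pipeline takes a factory that builds the mappers rather than the mappers themselves. That reading is an assumption; the README does not state it.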

Usage

The code is documented, and there is one example in the examples/ directory. For real-world usage, see https://github.com/TeMU-BSC/CorpusCleaner.

from pipel import Pipeline
pipeline = Pipeline(streamers=streamers,  # List of generators
                    mappers_factory=mappers_factory,  # Function that returns a list of functions/callables that will be
                                                      # consecutively applied to each element in each generator
                    output_reducer=output_reducer,  # Function that receives batches of the objects processed by the
                                                    # mappers; typically, it will write them to disk.
                    batch_size=batch_size,  # Number of objects yielded from the generators that will be simultaneously
                                            # instantiated in memory
                    parallel=True,
                    logger=logger,
                    log_every_iter=1)
pipeline.run()
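The snippet above leaves streamers, mappers_factory and output_reducer undefined. The sketch below shows one plausible shape for each, following only the comments in the snippet. The exact signatures pipel expects are not stated in this README, so treat them as assumptions and check the examples/ directory. To keep the sketch runnable without pipel installed, a hypothetical run_sequentially helper stands in for the pipeline. It illustrates the data flow only and is not pipel's implementation.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def numbers(start: int, stop: int) -> Iterator[int]:
    """Hypothetical streamer: a generator over a range of integers."""
    yield from range(start, stop)


def mappers_factory() -> List[Callable[[int], int]]:
    """Return the callables applied, in order, to every element."""
    return [lambda x: x + 1, lambda x: x * 2]


results: List[int] = []


def output_reducer(batch: Iterable[int]) -> None:
    """Receive a batch of mapped objects; here they are collected in memory
    instead of being written to disk."""
    results.extend(batch)


def run_sequentially(streamers, mappers_factory, output_reducer, batch_size):
    """Sequential stand-in for illustration only; NOT pipel's implementation."""
    mappers = mappers_factory()
    for streamer in streamers:
        while True:
            # Pull at most batch_size items, so only one batch is in memory.
            batch = list(islice(streamer, batch_size))
            if not batch:
                break
            mapped = []
            for item in batch:
                for mapper in mappers:
                    item = mapper(item)
                mapped.append(item)
            output_reducer(mapped)


streamers = [numbers(0, 3), numbers(10, 12)]
run_sequentially(streamers, mappers_factory, output_reducer, batch_size=2)
print(results)  # [2, 4, 6, 22, 24]
```

With the real library, the same three objects would be passed to Pipeline as in the snippet above, assuming the signatures match.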
