Pypelines-ETL

A simple library for building pipelines and ETLs

Installation

$ pip install pypelines-etl

Usage

pypelines allows you to build ETL pipelines. To do so, you simply combine an Extractor, some Transformers or Filters, and a Loader.

Extractor

Making an extractor is fairly easy. Simply decorate a function that returns the data with Extractor:

import pandas
from pypelines import Extractor

@Extractor
def read_iris_dataset(filepath: str) -> pandas.DataFrame:
    return pandas.read_csv(filepath)
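
Calling the decorated function does not return the DataFrame directly: it starts a pipeline that composes with the | operator. A short usage sketch, assuming (as the combination example below suggests) that the pipeline exposes the current data on its value attribute:

pipeline = read_iris_dataset('filepath.csv')
print(pipeline.value)  # the extracted pandas.DataFrame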

Transformer or Filter

The Transformer and Filter decorators are equivalent.

Making a Transformer or a Filter is even easier:

import pandas
from pypelines import Filter, Transformer

@Filter
def keep_setosa(df: pandas.DataFrame) -> pandas.DataFrame:
    return df[df['class'] == 'Iris-setosa']


@Filter
def keep_petal_length(df: pandas.DataFrame) -> pandas.Series:
    return df['petallength']


@Transformer
def mean(series: pandas.Series) -> float:
    return series.mean()

Note that Transformers and Filters can be combined with the | operator to shorten the pipeline syntax. For example:

new_transformer = keep_setosa | keep_petal_length | mean
pipeline = read_iris_dataset('filepath.csv') | new_transformer
print(pipeline.value)
# 1.464

Loader

To build a Loader, it suffices to decorate a function that takes at least one data parameter:

import json
from pypelines import Loader

@Loader
def write_to_json(output_filepath: str, data: float) -> None:
    with open(output_filepath, 'w') as file:
        json.dump({'mean-petal-length': {'value': data, 'units': 'cm'}}, file)

A Loader can be called without the data parameter, in which case it only stores its other arguments (like a URL or a path). For example, calling write_to_json('output.json') will not execute the function, but will store the output_filepath argument until the Loader is executed in a pipeline. The standard execution of the function (with the data argument) is still available: write_to_json('output.json', data=1.464).
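
Both call styles side by side (the 1.464 value is the mean computed in the earlier example):

# Deferred: stores output_filepath; the function runs once a pipeline reaches it
write_to_json('output.json')

# Standard: providing data executes the function immediately
write_to_json('output.json', data=1.464)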

ETL pipeline

To build and run the pipeline, simply combine the Extractor with the Transformers, the Filters, and the Loader:

read_iris_dataset('filepath.csv') | keep_setosa | keep_petal_length | mean | write_to_json('output.json')
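
For intuition only, here is a minimal sketch of how this pipe-style composition could be implemented with Python's __or__ operator. This is not pypelines' actual implementation; the Stage and Pipeline classes are hypothetical, written to mirror the behaviour shown above:

import functools


class Stage:
    """A single pipeline step; stages compose left-to-right with |."""

    def __init__(self, func):
        self.func = func

    def __call__(self, value):
        return self.func(value)

    def __or__(self, other):
        # keep_setosa | keep_petal_length applies keep_setosa first
        return Stage(lambda value: other(self.func(value)))


class Pipeline:
    """Holds the current value; | feeds it through the next stage."""

    def __init__(self, value):
        self.value = value

    def __or__(self, stage):
        return Pipeline(stage(self.value))


def Extractor(func):
    # Calling the decorated function starts a Pipeline
    @functools.wraps(func)
    def start(*args, **kwargs):
        return Pipeline(func(*args, **kwargs))
    return start


def Transformer(func):
    return Stage(func)


Filter = Transformer  # the two decorators are equivalent


def Loader(func):
    # Without data, store the other arguments and defer execution;
    # with data, execute immediately
    @functools.wraps(func)
    def maybe_defer(*args, **kwargs):
        if 'data' in kwargs:
            return func(*args, **kwargs)
        return Stage(lambda data: func(*args, data=data, **kwargs))
    return maybe_defer

Because both stages and pipelines define __or__, the same | syntax composes transformers with each other and feeds extracted data through them to the loader.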
