pandas-predicates

A library for building predicate expressions that filter pandas dataframes.

Status

This is a work in progress. Be mindful that:

there's no released version yet
the API is not stable
the documentation is sparse

Why

I often found myself writing unwieldy code to build filtering masks for pandas dataframes. Inspired by libraries like polars and pyspark, which use static expressions to build queries, I decided to create a simple API for expressions that can be used to filter a pandas dataframe.

Goals

For now, the goal is simply to provide a small API for creating predicates that filter a pandas dataframe. Initially I don't intend to support generic expressions, but that may change in the future.

Stated goals:

follow the pandas API as closely as possible
cover all methods that can be used to create a boolean mask
implement useful combinators to combine predicates
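
As a rough illustration of the combinator goal, the idea can be sketched in plain pandas by treating a predicate as any callable that maps a dataframe to a boolean mask. The names `all_of`, `any_of`, and `negate` below are hypothetical and are not part of this library's API; this is only one possible shape for such helpers.

```python
from functools import reduce
from typing import Callable

import pandas as pd

# In this sketch a predicate is any callable mapping a dataframe to a boolean Series.
Predicate = Callable[[pd.DataFrame], pd.Series]


def all_of(*preds: Predicate) -> Predicate:
    """Hypothetical combinator: a row passes only if every predicate holds."""
    return lambda df: reduce(
        lambda acc, p: acc & p(df), preds, pd.Series(True, index=df.index)
    )


def any_of(*preds: Predicate) -> Predicate:
    """Hypothetical combinator: a row passes if at least one predicate holds."""
    return lambda df: reduce(
        lambda acc, p: acc | p(df), preds, pd.Series(False, index=df.index)
    )


def negate(pred: Predicate) -> Predicate:
    """Hypothetical combinator: invert a predicate's mask."""
    return lambda df: ~pred(df)


df = pd.DataFrame({"a": [1, 4, 7], "b": ["blabla", "bleble", "marco"]})

a_gt_1 = lambda d: d["a"] > 1
b_has_bl = lambda d: d["b"].str.contains("bl")

# Combine two predicates, then apply the resulting mask to the dataframe.
mask = all_of(a_gt_1, b_has_bl)(df)
filtered = df[mask]
```

Because each combinator returns another predicate, the results can be nested freely, for example `all_of(a_gt_1, negate(b_has_bl))`.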

Progress tracking

We are implementing the methods in this order:

Syntax

For now, the expression language has the following syntax:

    import pandas as pd
    from pandas_predicates import col

    df = pd.DataFrame([
        {"a": 1, "b": "blabla", "c": 3},
        {"a": 4, "b": "bleble", "c": 6},
        {"a": 7, "b": "marco", "c": 9},
    ])

    # simple predicates

    # select rows where column a is equal to 1
    a_is_1 = col("a") == 1
    # DataFrame.equals returns a single bool; `==` would return a dataframe
    assert a_is_1.filter(df).equals(df[df.a == 1])

    # select rows where column a is not equal to 1
    a_is_not_1 = col("a") != 1
    assert a_is_not_1.filter(df).equals(df[df.a != 1])

    # select rows where column a is greater than 4 and b matches the regex "(bl.)+"
    a_gt_4_and_b_matches = (col("a") > 4) & col("b").str.contains(r"(bl.)+")
    assert a_gt_4_and_b_matches.filter(df).equals(df[(df.a > 4) & df.b.str.contains(r"(bl.)+")])

    # select rows where column a is greater than 4 or b matches the regex "(bl.)+"
    a_gt_4_or_b_matches = (col("a") > 4) | col("b").str.contains(r"(bl.)+")
    assert a_gt_4_or_b_matches.filter(df).equals(df[(df.a > 4) | df.b.str.contains(r"(bl.)+")])
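
To give an idea of how a syntax like the one above can work, here is a minimal sketch of deferred predicates built with operator overloading. It is not this library's implementation: `Pred`, `ColumnRef`, and `col_sketch` are hypothetical names made up for the illustration, and only a few comparison operators are covered.

```python
import operator
from typing import Any, Callable

import pandas as pd


class Pred:
    """Hypothetical deferred predicate: wraps a function from dataframe to boolean mask."""

    def __init__(self, fn: Callable[[pd.DataFrame], pd.Series]):
        self._fn = fn

    def mask(self, df: pd.DataFrame) -> pd.Series:
        # Nothing is evaluated until a dataframe is supplied.
        return self._fn(df)

    def filter(self, df: pd.DataFrame) -> pd.DataFrame:
        return df[self.mask(df)]

    def __and__(self, other: "Pred") -> "Pred":
        return Pred(lambda df: self.mask(df) & other.mask(df))

    def __or__(self, other: "Pred") -> "Pred":
        return Pred(lambda df: self.mask(df) | other.mask(df))

    def __invert__(self) -> "Pred":
        return Pred(lambda df: ~self.mask(df))


class ColumnRef:
    """Hypothetical column reference: comparisons return a Pred instead of a bool."""

    def __init__(self, name: str):
        self.name = name

    def _compare(self, op: Callable[[Any, Any], Any], value: Any) -> Pred:
        return Pred(lambda df: op(df[self.name], value))

    def __eq__(self, value: Any) -> Pred:  # type: ignore[override]
        return self._compare(operator.eq, value)

    def __ne__(self, value: Any) -> Pred:  # type: ignore[override]
        return self._compare(operator.ne, value)

    def __gt__(self, value: Any) -> Pred:
        return self._compare(operator.gt, value)

    def __lt__(self, value: Any) -> Pred:
        return self._compare(operator.lt, value)


def col_sketch(name: str) -> ColumnRef:
    """Stand-in for a `col`-style entry point (an assumption, not the real API)."""
    return ColumnRef(name)


df = pd.DataFrame({"a": [1, 4, 7], "c": [3, 6, 9]})

# The predicate is built once, without touching any data, and applied later.
pred = (col_sketch("a") > 1) & (col_sketch("c") != 9)
result = pred.filter(df)
```

As with pandas masks, the parentheses around each comparison are required because `&` and `|` bind more tightly than `>` or `!=` in Python.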

rcalsaverini/pandas-predicates

Folders and files

Latest commit

History

Repository files navigation

pandas-predicates

Status

Why

Goals

Progress tracking

Syntax

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages