Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement GT.pipe() and GT.pipes() #363

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

jrycw
Copy link
Contributor

@jrycw jrycw commented May 24, 2024

Related to #353.

After reviewing the pipe() methods from Pandas and Polars, I propose adding GT.pipe() and GT.pipes() in this PR.

GT.pipe()

GT.pipe() will function similarly to Pandas and Polars, receiving a function along with optional positional and keyword arguments. An example is provided below:

import polars as pl
from great_tables import GT, loc, style
from great_tables.data import towny


towny_mini = pl.from_pandas(towny).head(10)
columns = ["land_area_km2", "density_2021"]
colors = ["lightgray", "lightblue"]


def tbl_style(gtbl: GT, columns: list[str], colors: list[str]) -> GT:
    for column, color in zip(columns, colors):
        gtbl = gtbl.tab_style(
            style=style.fill(color=color),
            locations=loc.body(columns=column, rows=pl.col(column).eq(pl.col(column).max())),
        )
    return gtbl


(
    GT(
        towny_mini[["name", "land_area_km2", "density_2021"]],
        rowname_col="name",
    ).pipe(tbl_style, columns, colors)
)

GT.pipes()

GT.pipe() serves as a convenient helper method for chaining multiple pre-built functions (or using partial). An example is provided below:

from functools import partial
import polars as pl
from great_tables import GT, loc, style
from great_tables.data import towny


towny_mini = pl.from_pandas(towny).head(10)
columns = ["land_area_km2", "density_2021"]
colors = ["lightgray", "lightblue"]


def tbl_style(gtbl: GT, column: str, color: str) -> GT:
    return gtbl.tab_style(
        style=style.fill(color=color),
        locations=loc.body(columns=column, rows=pl.col(column).eq(pl.col(column).max())),
    )


(
    GT(
        towny_mini[["name", "land_area_km2", "density_2021"]],
        rowname_col="name",
    ).pipes(
        *[partial(tbl_style, column=column, color=color) for column, color in zip(columns, colors)]
    )
)

Table

Both methods yield the table as follows:
image

Assistance Required

I hope the API design makes sense, and could the team please review my docs to ensure they render correctly? I'm relatively new to quarto.

@codecov-commenter
Copy link

codecov-commenter commented May 24, 2024

Codecov Report

Attention: Patch coverage is 87.50000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 86.07%. Comparing base (e3649d2) to head (7e4a167).
Report is 12 commits behind head on main.

Current head 7e4a167 differs from pull request most recent head 3d7152d

Please upload reports for the commit 3d7152d to get more accurate results.

Files Patch % Lines
great_tables/_pipe.py 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #363      +/-   ##
==========================================
+ Coverage   85.90%   86.07%   +0.17%     
==========================================
  Files          41       42       +1     
  Lines        4328     4331       +3     
==========================================
+ Hits         3718     3728      +10     
+ Misses        610      603       -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jrycw
Copy link
Contributor Author

jrycw commented May 24, 2024

I'm not sure if it's a good idea to support passing a list into pipes(). If we add two lines to pipes(), like:

def pipes(self: "GT", *funcs: Callable["GT", "GT"] | list[Callable["GT", "GT"]]) -> "GT":
    if isinstance(funcs[0], list) and len(funcs) == 1:  # edited
        funcs = funcs[0]
    for func in funcs:
        self = pipe(self, func)
    return self

Then, we can do:

(
    GT(
        towny_mini[["name", "land_area_km2", "density_2021"]],
        rowname_col="name",
    ).pipes(
        [partial(tbl_style, column=column, color=color) for column, color in zip(columns, colors)]
    )
)

@machow and @rich-iannone , could you share your thoughts on this idea with me?

@jrycw
Copy link
Contributor Author

jrycw commented May 25, 2024

I've updated .pipes() to accept a list of functions in the latest commit. We can decide which commit to use later.

Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the wait! Thanks for taking the time to work through this, and look at what pandas and polars do. I'm noticing pandas and polars have only the one .pipe() method. Maybe we start with this, and then if people are asking for it, consider a .pipes() method?

Packages like toolz have a function---called compose()---that makes it easy to send multiple things to a single .pipe() method. For example...

from toolz.functoolz import compose, compose_left

from functools import partial
import polars as pl
from great_tables import GT, loc, style
from great_tables.data import towny


towny_mini = pl.from_pandas(towny).head(10)
columns = ["land_area_km2", "density_2021"]
colors = ["lightgray", "lightblue"]


def tbl_style(gtbl: GT, column: str, color: str) -> GT:
    return gtbl.tab_style(
        style=style.fill(color=color),
        locations=loc.body(columns=column, rows=pl.col(column).eq(pl.col(column).max())),
    )


(
    GT(
        towny_mini[["name", "land_area_km2", "density_2021"]],
        rowname_col="name",
    ).pipe(
        compose_left(*[partial(tbl_style, column=column, color=color) for column, color in zip(columns, colors)])
    )
)

@jrycw
Copy link
Contributor Author

jrycw commented May 29, 2024

@machow thanks for introducing me to toolz. It's new to me, and it looks like it has a lot of great features. Sure, let's remove the .pipes() method. I've reviewed the preview docs and made some adjustments. Other than that, it looks good to me. What do you think?

Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking really good, I left a quick thought on how we might be able to annotate pipe using ParamSpec. But overall, it seems basically ready to go! @rich-iannone you'll probably know the right section for .pipe in the reference

from .gt import GT


def pipe(self: "GT", func: Callable[..., "GT"], *args: Any, **kwargs: Any) -> "GT":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could use ParamSpec here? Something like...

from typing import ParamSpec, Callable

P = ParamSpec("P")
def punt(f: Callable[P, int], *args: P.args, **kwargs: P.kwargs) -> int:
    return f(*args, **kwargs)


def add(x: int, y: int) -> int:
    return x + y

# flagged in static check as z is not an arg to add
punt(add, 1, y = 2, z = 3)

https://mypy-play.net/?mypy=latest&python=3.12&gist=23dd2d435071c6e9f5639cfcadf7dd16

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! This is somewhat new to me, so please feel free to correct me if I've misunderstood anything.

@@ -174,6 +174,13 @@ quartodoc:
contents:
- GT.save
- GT.as_raw_html
- title: Pipeline
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about putting pipe in the helper functions section. @rich-iannone WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants