Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(python): More efficient handling of *args/**kwargs #7026

Merged
merged 1 commit into from Feb 20, 2023

Conversation

stinodego
Copy link
Member

Inspired by #7019

Applying the same logic to similar methods.

I will create a centralized parsing module for this, but for now I just added it everywhere as I think it's a good improvement.

@github-actions github-actions bot added performance Performance issues or improvements python Related to Python Polars labels Feb 19, 2023
@ritchie46
Copy link
Member

I will create a centralized parsing module for this

Wouldn't this add another function call?

@stinodego
Copy link
Member Author

stinodego commented Feb 19, 2023

I will create a centralized parsing module for this

Wouldn't this add another function call?

Well, depends on how we approach it.

I was thinking of moving selection_to_pyexpr_list and expr_to_lit_or_expr to a 'parsing' module, and include other things there as well like _datetime_to_pl_timestamp which currently lives in the utils module.

When we have it centralized like that, we can make things more verbose to increase performance. For example, we could unpack expr_to_lit_or_expr into selection_to_pyexpr_list to save a whole bunch of function calls.

Because it's a dedicated module, that doesn't harm readability too much as you expect extensive parsing logic there. And it could make all methods that use that parsing logic cleaner and possibly more efficient.

I have to admit that up until now I haven't thought too much about this type of optimization, as I believe the power of Polars isn't in the few nanoseconds we can save in dispatching the Python call to Rust, but in the actual computation that happens there. But it's good to keep this in mind more.

@jakob-keller
Copy link
Contributor

I have been experimenting with applying functools.singledispatch to optimize selection_to_pyexpr_list. Results are promising, but there are circular import issues beyond my grasp of the codebase.

Maybe this could be combined somehow?

@functools.singledispatch
def selection_to_pyexpr_list(
    exprs: IntoExpr | Iterable[IntoExpr],
    structify: bool = False,
) -> list[PyExpr]:
    if isinstance(exprs, Iterable):
        return _iterable_selection_to_pyexpr_list(exprs, structify=structify)

    return _single_item_selection_to_pyexpr_list(exprs, structify=structify)


@selection_to_pyexpr_list.register(type(None))
def _none_selection_to_pyexpr_list(
    exprs,
    structify: bool = False,
) -> list[PyExpr]:
    return []


@selection_to_pyexpr_list.register(str)
@selection_to_pyexpr_list.register(Expr)
@selection_to_pyexpr_list.register(pli.Series)  # fails due to circular import
@selection_to_pyexpr_list.register(pli.WhenThen)  # fails due to circular import
@selection_to_pyexpr_list.register(pli.WhenThenThen)  # fails due to circular import
def _single_item_selection_to_pyexpr_list(
    exprs,
    structify: bool = False,
) -> list[PyExpr]:
    return [expr_to_lit_or_expr(exprs, str_to_lit=False, structify=structify)._pyexpr]


@selection_to_pyexpr_list.register(dict)
@selection_to_pyexpr_list.register(list)
@selection_to_pyexpr_list.register(set)
@selection_to_pyexpr_list.register(tuple)
def _iterable_selection_to_pyexpr_list(
    exprs,
    structify: bool = False,
) -> list[PyExpr]:
    return [
        expr_to_lit_or_expr(e, str_to_lit=False, structify=structify)._pyexpr
        for e in exprs  # type: ignore[union-attr]
    ]

@ritchie46 ritchie46 merged commit 7cdb328 into pola-rs:master Feb 20, 2023
@stinodego
Copy link
Member Author

@jakob-keller Thanks for the suggestion of using @functools.singledispatch... I'll put this on my to-do list to look at it later.

@jakob-keller
Copy link
Contributor

@jakob-keller Thanks for the suggestion of using @functools.singledispatch... I'll put this on my to-do list to look at it later.

Check out #7044 for progress on applying single dispatch to sequence_to_pydf

josemasar pushed a commit to josemasar/polars that referenced this pull request Feb 21, 2023
@stinodego stinodego deleted the perf-args-kwargs branch February 22, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance issues or improvements python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants