mutate function #1226

Open
samukweku opened this issue Dec 18, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@samukweku
Collaborator

samukweku commented Dec 18, 2022

Brief Description

I would like to propose a mutate function, similar to pandas' assign function but more flexible. It would also serve as a replacement for the transform functions.

Example API

df.mutate(y='sum', n=lambda df: df.nth(1))
df.mutate(y='sum', n=lambda df: df.nth(1), by='x')


# replicate dplyr's across
# https://stackoverflow.com/q/63200530/7175713
# select_columns syntax can fit in nicely here
mtcars.mutate(("*t", "mean"), ("*p", "sum"), {"cyl": lambda df: df + 1, "new_col": lambda df: df.select_columns("*t").sum(axis=1)})
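`mutate` does not exist yet, so for reference here is roughly what the first two example calls could translate to in plain pandas today. This is a sketch under assumptions the proposal does not spell out: that `y='sum'` means "replace column `y` with its sum, broadcast to every row", and that `n` holds the second value of `y` (standing in for `df.nth(1)`), computed per group when `by='x'` is given.

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "a", "b", "b"], "y": [1, 2, 3, 4]})

# Roughly df.mutate(y='sum', n=...) with no grouping.
# Assumption: 'sum' applies to column y and n is the second value of y;
# assign broadcasts both scalar results to every row.
ungrouped = df.assign(y=df["y"].sum(), n=df["y"].iloc[1])

# Roughly df.mutate(y='sum', n=..., by='x').
# groupby(...).transform broadcasts each group's result back onto that group's rows.
grouped_y = df.groupby("x")["y"]
grouped = df.assign(
    y=grouped_y.transform("sum"),
    n=grouped_y.transform(lambda s: s.iloc[1]),
)
print(grouped)
```

The grouped case is where `mutate(..., by=...)` would save the most typing, since today you have to build the groupby object and call `transform` once per output column.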
@samukweku
Collaborator Author

  • For multiple columns, we use a tuple of three arguments (cols, func, names): cols selects the columns, func is a function or a list/tuple of functions, and names controls how the output columns are renamed, either by flattening a MultiIndex or by adding a prefix/suffix. For names, we'll use something similar to f-string formatting. The idea is borrowed from the across function in R's dplyr.
  • For single columns, we can pass a dictionary, or use pandas named aggregation for more control over how the output is renamed.
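A minimal sketch of how one `(cols, func, names)` tuple could be expanded into new columns, loosely modelled on dplyr's `across()`. Everything here is hypothetical: `expand_across` is an illustrative helper name, not pyjanitor API; `fnmatch.fnmatchcase` stands in for the `select_columns` glob syntax; and the names template is assumed to use `{col}` and `{fn}` placeholders filled with `str.format`.

```python
import fnmatch

import pandas as pd


def expand_across(df: pd.DataFrame, cols: str, funcs, names: str = "{col}_{fn}") -> dict:
    """Hypothetical helper: expand one (cols, func, names) spec into a dict
    mapping new column names to results, ready to pass to DataFrame.assign."""
    # Stand-in for select_columns-style globbing such as "*t".
    selected = [c for c in df.columns if fnmatch.fnmatchcase(c, cols)]
    # Accept a single function or a list/tuple of functions.
    if isinstance(funcs, str) or callable(funcs):
        funcs = [funcs]
    out = {}
    for col in selected:
        for fn in funcs:
            fn_name = fn if isinstance(fn, str) else fn.__name__
            series = df[col]
            # Strings name Series methods ("mean", "sum"); callables receive the Series.
            result = getattr(series, fn)() if isinstance(fn, str) else fn(series)
            out[names.format(col=col, fn=fn_name)] = result
    return out


# A tiny mtcars-like frame, for illustration only.
cars = pd.DataFrame(
    {"wt": [2.62, 2.875, 2.32], "drat": [3.9, 3.9, 3.85],
     "hp": [110, 110, 93], "disp": [160.0, 160.0, 108.0]}
)
specs = {
    **expand_across(cars, "*t", "mean"),
    **expand_across(cars, "*p", ["sum", "max"], names="{fn}_of_{col}"),
}
result = cars.assign(**specs)
print(result)
```

One design wrinkle this exposes: a lambda's `__name__` is `"<lambda>"`, so with several anonymous functions the names template alone cannot tell the outputs apart. Named functions, or an explicit per-function name, would be needed.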

@samukweku samukweku mentioned this issue Jan 9, 2023
@thatlittleboy thatlittleboy added the enhancement New feature or request label Jan 22, 2023
@samukweku
Collaborator Author

So far, the tests I've run show that building a dictionary and passing it to pandas' transform/apply/agg/assign is faster than what I came up with for mutate. Maybe someone else can come up with a cleaner API that is also fast.
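For comparison, the dictionary-based route in plain pandas looks roughly like the sketch below. The columns and functions are illustrative only, and no timings are shown. `DataFrame.transform` takes a dict of column to function, `groupby(...).agg` supports named aggregation for output renaming, and for grouped results that keep the original row count you can build a dict of Series and unpack it into `assign`.

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "a", "b"], "y": [1, 2, 3], "z": [10.0, 20.0, 30.0]})

# Dict passed to DataFrame.transform: each function must keep the input length.
transformed = df.transform({"y": lambda s: s + 1, "z": "cumsum"})

# Named aggregation: output names are the keyword arguments (one row per group).
summary = df.groupby("x").agg(y_sum=("y", "sum"), z_max=("z", "max"))

# Grouped, row-preserving results: build a dict of Series, then unpack into assign.
g = df.groupby("x")
new_cols = {"y_sum": g["y"].transform("sum"), "z_rank": g["z"].rank()}
out = df.assign(**new_cols)
print(out)
```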

2 participants