Example request #59
Comments
You can use custom functions for this. I've tried it and it works well.

Pass the parameters into the function; they will arrive as vectors (pandas Series). Debug your custom function until it takes and returns vectors as well as scalars. If you have trouble finding the right API calls to operate on vectors, just loop over the inputs.

Remember to annotate your function with @make_symbolic; see the section "Extending dfply with custom functions" in the dfply readme.

If you run into speed problems with huge, multi-million-row dataframes, you can rewrite it to use straight pandas with Numba for JIT compiling.

I would provide a code sample, but I am not at my PC right now. Good luck!
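A minimal sketch of this pattern (the price_per_carat function and the use of dfply's bundled diamonds data are illustrative, not from the thread):

```python
from dfply import X, diamonds, make_symbolic, mutate

@make_symbolic
def price_per_carat(price, carat):
    # Inside a pipe, price and carat arrive as pandas Series,
    # so vectorised operations (or a loop over the inputs) both work.
    return price / carat

result = diamonds >> mutate(ppc=price_per_carat(X.price, X.carat))
```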
|
Numba will take a function written for scalars and, after an annotation, produce a function that accepts vectors. It JIT-compiles the function in the background to machine code via LLVM, so it runs at roughly the speed of equivalent C code.

To use this with dfply, you need two functions: the outer one is annotated with @make_symbolic and calls the inner one, which has been vectorised with Numba.

To use this with plain pandas, you can call the vectorised function directly, which is somewhat faster. However, dfply is so much cleaner and easier to code with that it's not worth doing this unless profiling shows a speed bottleneck.
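A sketch of that two-function pattern, again with illustrative names and the bundled diamonds data (assumptions: Numba's @vectorize with an explicit float64 signature, and .values/.astype to hand plain NumPy arrays to the compiled ufunc):

```python
from numba import vectorize
from dfply import X, diamonds, make_symbolic, mutate

# Inner function: written for scalars, compiled by Numba into a ufunc
# that accepts whole arrays.
@vectorize(["float64(float64, float64)"])
def _price_per_carat(price, carat):
    return price / carat

# Outer function: makes the compiled ufunc usable inside a dfply pipe.
@make_symbolic
def price_per_carat(price, carat):
    return _price_per_carat(price.values.astype(float),
                            carat.values.astype(float))

# dfply usage:
piped = diamonds >> mutate(ppc=price_per_carat(X.price, X.carat))

# Plain pandas usage, skipping the symbolic layer:
direct = _price_per_carat(diamonds.price.values.astype(float),
                          diamonds.carat.values.astype(float))
```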
|
Thanks for this. I saw that this would work, but it is somewhat cumbersome in my mind compared to native dplyr in R. Essentially, with this approach much of the nicety and ease of flow of writing dplyr goes away. Would there be a possibility of an explicit "Intention"-based object that does exactly this, i.e. a wrapper (EF, for element_function) that wraps a function taking single elements of a vector as arguments and automatically iterates over the Series elements? It is just that any requirement to write decorators breaks the natural piping flow of dplyr, and having to prepare it all beforehand breaks the reading flow of the code.
|
In my experience, 90% of dfply use does not need custom functions. When custom functions are required, they are quick to write - just copy, paste and modify another working function you already have.

In my opinion, dfply is just as clean and usable as dplyr.
|
Hi,
I am an avid dplyr user in R and somewhat new to Python. I had been looking for a dplyr-like package in Python for a while when I came across dfply, which looks pretty close to what I was looking for.
Please excuse me if this is not quite the right forum, but I am looking for some help, and would like to request some documentation or possibly a feature.
My use case is essentially that I have a function that operates on single elements of data-frame columns, e.g.
my_func(a, b)
where both a and b are single elements from columns of a data frame. I have found a Stack Overflow post that shows this for an operation on a single column only:
https://stackoverflow.com/questions/42671168/dfply-mutating-string-column-typeerror
The solution shown there, using X.file.apply for the column X.file in the data frame, seems to work only when you have a single column to operate on.
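For reference, a sketch of that single-column pattern from the linked Stack Overflow answer (the file column and the extension-extracting function are illustrative):

```python
import os
import pandas as pd
from dfply import X, mutate

df = pd.DataFrame({"file": ["report.txt", "data.csv"]})

# X.file.apply(...) works because the Intention defers the method call,
# but this pattern only covers a function of one column.
result = df >> mutate(ext=X.file.apply(lambda f: os.path.splitext(f)[1]))
```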
What I was essentially wondering is: how do you recommend using dfply in this context? Could you add some documentation on how best to use functions that don't natively understand Series objects?
E.g., could there be an "Intention"-like object that takes a function operating on several parameters, each of which is intended to be a single element from a column, "vectorizes" this function, and then, when passed an Intention object representing a Series, applies it appropriately?
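For what it's worth, a sketch of what such a wrapper could look like today, built from @make_symbolic plus np.vectorize (the name element_function is hypothetical and not part of dfply):

```python
import numpy as np
from dfply import X, diamonds, make_symbolic, mutate

def element_function(func):
    """Hypothetical wrapper: lift a scalar function into one usable
    directly inside a dfply pipe, iterating over Series elements."""
    vectorised = np.vectorize(func)

    @make_symbolic
    def wrapped(*args, **kwargs):
        return vectorised(*args, **kwargs)

    return wrapped

# my_func takes single elements a and b
my_func = element_function(lambda a, b: round(a / b, 2))

result = diamonds >> mutate(ratio=my_func(X.price, X.carat))
```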
Thanks for your help!