ENH: parallel support in .apply #13111
So here is an example of how to do a parallel apply using dask. This could be baked into
impl and timings:
Now for some caveats.
People want to parallelize a poor implementation. Generally you proceed through the following steps first:
You always want to make code simpler, not more complex. It's hard to know a priori where bottlenecks are. People think
Ok my 2c about optimizing things.
In order for parallelization to actually matter, the function you are computing should take some non-trivial amount of time compared to things like:
If these criteria are met, then sure, give it a try.
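A rough way to check this before reaching for anything parallel is to time one call of the function against the cost of moving its input and output around. The sketch below uses only the standard library; `estimate_costs`, `cheap`, and `expensive` are hypothetical names made up for illustration, and a pickle round trip is only an assumed stand-in for the real overheads of whatever parallel backend ends up being used.

```python
import pickle
import timeit


def estimate_costs(func, sample, repeats=50):
    """Return (compute_seconds, transfer_seconds) per call for one sample input.

    transfer_seconds is a crude proxy for inter-process overhead: the time to
    pickle and unpickle both the input and the result once.
    """
    result = func(sample)

    # Average time of a single call to the function itself.
    compute = timeit.timeit(lambda: func(sample), number=repeats) / repeats

    def round_trip():
        pickle.loads(pickle.dumps(sample))
        pickle.loads(pickle.dumps(result))

    # Average time to serialize and deserialize the input and the output.
    transfer = timeit.timeit(round_trip, number=repeats) / repeats
    return compute, transfer


def cheap(row):
    # Trivial work: overhead is likely to be comparable to or larger than this.
    return sum(row)


def expensive(row):
    # Deliberately slow pure-Python loop standing in for "real" work.
    total = 0.0
    for _ in range(20):
        for value in row:
            total += value ** 0.5
    return total


row = list(range(1000))
cheap_compute, cheap_transfer = estimate_costs(cheap, row)
slow_compute, slow_transfer = estimate_costs(expensive, row)

print(f"cheap:     compute={cheap_compute:.2e}s  transfer={cheap_transfer:.2e}s")
print(f"expensive: compute={slow_compute:.2e}s  transfer={slow_transfer:.2e}s")
```

If the compute time is not clearly larger than the transfer time, parallelizing that function is unlikely to pay off, and the actual numbers will vary by machine and backend.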
I think giving pandas a first-class way to parallelize things, even though people will just naively use it, is probably not a bad thing.
Further extensions to this are:
I too worry about premature parallelization, but I'm not sure how you prevent that. I think I'd rather try and find ways to encourage Numba compilation (and if there are missing Numba features preventing that from being effective, address them). At least then you could engage