IPython.parallel issue with pushing pandas TimeSeries #2793
Comments
Pinging @wesm on this one. There are really two issues here:
So the direct question to @wesm: Can you detail serialization of pandas objects? What, in addition to the arrays themselves, is necessary to reconstruct a pandas object? What pandas objects actually subclass ndarray (it is an
I don't think this is a problem with pandas, I think it is more general than that because any class that subclasses numpy.ndarray will break under the current implementation. Therefore, I think it makes sense for the IPython.parallel code to only use the efficient numpy path if the object is
rather than
Making DirectView.push efficient for pandas.TimeSeries objects is a nice-to-have, but I would have thought it is ultimately less important than having DirectView.push work as expected. As an interim workaround, the pandas.TimeSeries object can be wrapped in a pandas.DataFrame to stop the efficient numpy path from being used. Many thanks.
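To illustrate the difference between the two checks, here is a minimal stdlib-only sketch. `Array`, `LabeledSeries`, and `can` are hypothetical stand-ins (for numpy.ndarray, a subclass such as pandas.TimeSeries, and the canning step respectively); this is not the actual IPython.parallel code, only the shape of the problem.

```python
import copy


class Array:
    """Hypothetical stand-in for numpy.ndarray."""

    def __init__(self, values):
        self.values = list(values)


class LabeledSeries(Array):
    """Hypothetical stand-in for a subclass that carries extra state."""

    def __init__(self, values, index):
        super().__init__(values)
        self.index = list(index)


def fast_path_isinstance(obj):
    # Matches Array and every subclass of Array.
    return isinstance(obj, Array)


def fast_path_exact(obj):
    # Matches Array only; subclasses fall through to the general path.
    return type(obj) is Array


def can(obj, use_fast_path):
    """Hypothetical canning step with a pluggable fast-path check."""
    if use_fast_path(obj):
        # Fast path: only the raw values survive, subclass state is dropped.
        return Array(obj.values)
    # General path: deepcopy stands in for a full pickle round trip.
    return copy.deepcopy(obj)


series = LabeledSeries([1.0, 2.0], ["a", "b"])
lossy = can(series, fast_path_isinstance)
exact = can(series, fast_path_exact)
print(type(lossy).__name__, type(exact).__name__)
```

With the `isinstance` check the subclass is silently reduced to the base type; with the exact check it takes the slower general path and keeps its index.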
An excellent point. I will change the check from `isinstance` to an exact type comparison. I would still love to hear from @wesm about efficient serialization of pandas objects.
The type-check change is in #2800.
Sorry it's taken me a while to have a look. This is probably another case of "Series probably shouldn't be an ndarray (subclass)". I would recommend using pickle whenever the class is not exactly
@wesm - yes, that's the right answer in general, and what is done in #2800. But I would still like to give pandas the special treatment we do for numpy. So any time you can write up (or code up, if necessary) a representation of pandas data structures that is buffers + metadata, so that we can add it to our zero-copy stuff, that would be great.
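As a rough idea of what "buffers + metadata" could look like, here is a stdlib-only sketch. The function names, the JSON header layout, and the use of `array.array` in place of a numpy array are all assumptions made for illustration; this is neither pandas' serialization format nor IPython's zero-copy machinery.

```python
import json
from array import array


def to_buffers(values, index, name):
    """Split a labeled series into a small JSON header plus raw buffers."""
    data = array("d", values)  # contiguous float64 storage
    metadata = {
        "name": name,
        "index": list(index),
        "typecode": data.typecode,
    }
    header = json.dumps(metadata).encode("utf-8")
    # memoryview exposes the array's memory without copying it.
    return header, [memoryview(data)]


def from_buffers(header, buffers):
    """Rebuild (name, index, values) from the header and the raw buffers."""
    metadata = json.loads(header.decode("utf-8"))
    data = array(metadata["typecode"])
    # This sketch copies on receipt; a real zero-copy path would wrap the buffer.
    data.frombytes(bytes(buffers[0]))
    return metadata["name"], metadata["index"], data.tolist()


header, buffers = to_buffers([1.5, 2.5], ["2013-01-01", "2013-01-02"], "price")
name, index, values = from_buffers(header, buffers)
print(name, index, values)
```

The point of the split is that the header is small and cheap to pickle or encode, while the bulk data travels as plain memory.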
use `type(obj) is cls` as switch when canning: `isinstance(obj, cls)` would trigger the canning shortcuts for subclasses, which can be inappropriate (e.g. pandas.TimeSeries). closes #2793
A pandas.TimeSeries is cast to a numpy.ndarray when pushed using IPython.parallel.
Here is an example:
This seems to be a result of dview.push handling numpy arrays differently: http://ipython.org/ipython-doc/dev/parallel/parallel_details.html
and pandas.TimeSeries using numpy arrays:
Is there a way to cast back to pandas.TimeSeries on the ipcluster engines?
Many thanks.