savez method for DataFrame, Series: porting data between python2 and python3 #3151

cottrell · 2013-03-23T20:11:18Z

I find that I am sometimes working between python2 and 3 installs and using pickle for passing data is problematic. I am having a look at adding some simple functions like:

pandas.loadnpz
pandas.obj.savenpz (where obj would be DataFrame, Series, Panel etc ...)

Any opinions on this?
Is it already there and I haven't found it?
Is there a natural (efficient) place to do this? It seems the save/load are fairly generic and are attached to all PandasObjects. Maybe at the level of NDFrame would make most sense?

Also, supposing there is interest in this (or at least no objection), is there any way to add a test for it under the current framework, given that the full functionality would span both python2.* and python3.* pandas?

ghost · 2013-03-23T20:15:36Z

pandas uses 2to3, and tests are cross-python by using

if PY3:
    foo
else:
    bar

That shouldn't be a problem.
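A minimal sketch of that branching pattern, for illustration only. The `PY3` flag is derived here from `sys.version_info`, and `to_text` is a hypothetical helper; neither is taken from the pandas test suite.

```python
import sys

# Assumption: PY3 is computed locally; the thread does not show where
# pandas defines its own flag.
PY3 = sys.version_info[0] >= 3


def to_text(raw):
    """Hypothetical helper: return text on either major Python version."""
    if PY3:
        # On Python 3, bytes must be decoded; str is already text.
        return raw.decode("utf-8") if isinstance(raw, bytes) else raw
    else:
        # On Python 2, plain str is a byte string; unicode is already text.
        return raw.decode("utf-8") if isinstance(raw, str) else raw
```

A test can then call `to_text` the same way on both interpreters and assert on the result.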

I don't think pickle across pythons has been raised as an issue before, so thanks for that. (edit: #686)
It would probably be a hard sell to merge a new binary serialization format into pandas core.
Perhaps HDFStore can serve as a de facto storage format? @jreback moved it
light years ahead in the last couple of releases.

jreback · 2013-03-23T20:18:10Z

Have you considered

http://pandas.pydata.org/pandas-docs/dev/io.html#hdf5-pytables

It offers all of the savez-type functionality, is faster, has compression options, and offers tables (optional) as another storage layout

only downside is a couple of additional dependencies
py3k should be coming soon for PyTables btw

jreback · 2013-03-23T20:54:56Z

See this PyTables issue, which provides a savez/PyTables comparison:
PyTables/PyTables#185

Here is HDFStore export capability to R table format:
http://pandas.pydata.org/pandas-docs/dev/io.html#external-compatibility

Here (see 6), is something that could be useful: #2391

I could see adding an export method to HDFStore with a format specified, one of these could be npz (and R table format too). (Think of HDFStore as managing the binary file save formats for pandas)

cottrell · 2013-03-23T21:38:54Z

Is pytables necessary for running pandas? I thought it was optional.

cottrell · 2013-03-23T21:40:11Z

Also, npz and npy are not new serializations. They are part of numpy, which pandas is built upon. Serializing the object and serializing the data are two fundamentally different things.

jreback · 2013-03-23T22:14:06Z

pytables is optional, but highly recommended, esp when dealing with data of any non-trivial size

you can simply do this I believe
np.savez(file, series.index, series.values)
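A minimal sketch of the full round trip, assuming a numeric index and numeric values. The variable names and the temporary file path are illustrative; keyword arguments are used so the arrays get explicit names inside the archive instead of the default `arr_0`, `arr_1`.

```python
import os
import tempfile

import numpy as np
import pandas as pd

series = pd.Series([1.5, 2.5, 3.5], index=[10, 20, 30])

# Illustrative location; any writable path ending in .npz would do.
path = os.path.join(tempfile.mkdtemp(), "series.npz")

# Store the index and the values as two named arrays in one archive.
np.savez(path, index=series.index.values, values=series.values)

# Rebuild the Series from the two arrays on the reading side.
with np.load(path) as archive:
    restored = pd.Series(archive["values"], index=archive["index"])
```

Note that only the raw arrays are stored: the Series name, index name, and any other metadata would have to be saved separately. Object-dtype arrays (for example a string index) are pickled inside the archive, so they would bring back the pickle problem this thread is about.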

what I think you are talking about is supporting this method officially. I have no problem with it, but it's essentially deprecated as it's a numpy-only format.

just because something is in numpy does not mean pandas should support it, after all, just my 2c

wesm · 2013-03-23T22:41:35Z

Reminder that we need to implement a binary data format, using msgpack or some such, that is not dependent on pickle (and preferably not dependent on too many internal details of pandas objects).

cottrell · 2013-03-24T00:05:47Z

Had a go at getting PyTables working with python3 ... still a lot of work, I think. It looks like PyTables depends on numexpr, which is not yet py3k'd. I've had moderate success hacking away at these kinds of conversions, but I don't really know what I'm doing, which makes me less than an ideal contributor.

HDFStore looks like a great option. But it would be much better if it was a required dependency of pandas. On the other hand this would make pandas harder to install.

jreback · 2013-03-24T00:13:42Z

looks like both Numexpr and pytables are going to be py3 very soon in any event (the branches are merged )

the dependency doesn't matter, the user can install if they want. in fact for 0.11 we made Numexpr a highly recommended dependency in order to use internally (but all that means is doc warnings!)
if u want extra performance then the user would install it

another really good option is read_csv/to_csv; they are quite fast
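A minimal sketch of the CSV route, using an in-memory buffer in place of a file that would be written under one Python version and read under the other. The frame contents are made up for illustration.

```python
import io

import pandas as pd

frame = pd.DataFrame(
    {"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]},
    index=["x", "y", "z"],
)

# Write the frame as text; a real workflow would pass a file path instead.
buffer = io.StringIO()
frame.to_csv(buffer)

# Rewind and read it back, telling pandas the first column is the index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
```

Because CSV is plain text, dtypes are re-inferred on load, so columns such as dates or mixed types may need explicit parsing options when read back.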

jreback · 2013-03-24T00:14:50Z

fwiw both Numexpr and pytables are maintained by same team, so should be released together

cottrell · 2013-03-24T15:24:00Z

Sounds great! I'll try to find the dev branches and try it out ...

jreback · 2013-09-20T18:10:19Z

PyTables 3.0.0 and numexpr 2.1 solve this problem to a large extent

ghost mentioned this issue Mar 24, 2013

Create efficient binary storage format alternative to pickle #686

Closed

jreback closed this as completed Sep 20, 2013
