Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN deprecate save&load in favour of to_pickle&read_pickle #3787

Merged
merged 2 commits into from
Jun 15, 2013

Conversation

hayd
Copy link
Contributor

@hayd hayd commented Jun 7, 2013

Add read_pickle to top-level and to_pickle as instance methods, deprecation warning til 0.12 for save and load everwhere. See lower down for how it's working.

Both read_pickle and to_pickle are in io.pickle, save&load remain in core.common (but call to_pickle, read_pickle resp).

cc #3782.

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

though you were going to move to io/pickle.py ? I guess not much code....

need to add in io.rst / v0.11.1 the to_pickle/from_pickle refs....

also...the pickling section in the docs, I think in basics.rst should be moved to io.rst

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

I haven't managed to do that yet (getting some nasty circular import errors...)

I will move the pickle docs to io anyway.

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

you can leave save/load where they are (but import to_pickle/from_pickle from io/pickle) I think will fix
so core.common is not dependent on anything, and io/pickle can import it

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

I'm pretty sure I tried that exactly, but it wasn't working. Will stash apply and push it here so we can have a look.

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

@jreback something obvious?

@@ -25,6 +20,8 @@

from pandas.core.config import get_option
from pandas.core import array as pa
import pandas.io.pickle as pkl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the problem, do this IN the save routine (and load routine), so you don't get circular inputs (e.g. to_hdf in core/generic.py)

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

Darn, this means I can't do my clever @set_doc thing... (Thanks it was indeed that!)

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

ok, updated (Still WIP).

Should I rename the method from_pickle from inside the generics class (in line with others). I wasn't sure if it should force it to be a DataFrame...

Related, while messing around... odd behaviour with the type checking of from_csv, should be a separate issue bit... huh? from_csv is surely not working correctly.

In [9]: s = pd.Series([1,2,3])

In [10]: s.to_csv('a')

In [11]: df = pd.DataFrame.from_csv('a')

In [12]: df
Out[12]:
            1
0
2013-06-01  2
2013-06-02  3

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

didn't realized there was a from_pickle in DF, yes get rid of it (well put a warning, then call read_pickle), we'll take it out in 0.12 (same with from_csv), but that's a bit more involved so its a separate issue (as its used internally)

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

as to your from_csv question, it shouldn't be attached to a particular class (and I think that might be the issue here, some of the default options are 'wrong') as they assume certain things

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

Nono, there isn't. I was going to call it to_pickle read_pickle inside the DataFrame, but lots* of other variables use the from_ prefix:

In [2]: pd.DataFrame.fr
pd.DataFrame.from_csv      pd.DataFrame.from_items
pd.DataFrame.from_dict     pd.DataFrame.from_records

There's no read_ counterparts inside the DataFrame object...

* well, 4.

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

maybe I misundestaood....read_pickle is equiv of the from_*.....(of which from_csv will go way)...from_dict to stay thought IMHO it is useless clutter too

to_pickle is replacing save....and fyi....just put in core/generic nothing added to DataFrame (put it in PandasObject) as to_pickle....

in 0.12...also fixing the inheritance so that things like Index will inherti from PandasObject (which is changed as well....its really the top-level object in 0.12).....but doesn't matter for now

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

Yeah that's where it is in the pr :), just giving example.

Currently load is also inside the DataFrame, but returns whatever the pickle was (i.e. could be a series). This is the bit I think is a little odd. (Am I making any sense?)

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

where is load inside Frame?

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

pd.DataFrame.load/read_pickle inherits from generic like you say... which does explain the behaviour. But I still kinda think it's confusing...


@classmethod #@set_doc(pkl.read_pickle)
def read_pickle(cls, path):
'''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this (its already in core.common)

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

so take both read_pickle and load out of generic? I think that's reasonable.

Ruthless, but reasonable... :)

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

yes....read_xxx ONLY operate from the top level (which makes sense because you a) don't want to have to add to every conceivable object, and b) you could have different return objects based on your options (e.g. the squeeze option in pd.read_csv)

the to_xxx exist in core/generic only

so load/save you are leaving on core.common until 0.12 (with a warning)....

(and save with a warning in core/generic as well to be replace by to_pickle)

the idea was to completely presever back compat with the old names (but provide warnings)

then break on 0.12

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

@hayd

as an aside.....there is a pandas.utils.decorators '@deprecate'...but of course I didn't use it anywhere.....

you want to change the warnings to use this? (i think also io.parsers.ExcelFile would need changing too)

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

Ok, your inline comments have confused me.

Just to confirm:

  • include read_pickle in common, but not in generic.
  • include to_pickle in common & generic
  • save/load as they are in generic & common (with warning these will be removed in 0.12)

last two are already current behaviour of pr.

If that's right, I also should remove references to Series.read_pickle and friends from the api rsts.

(yeah I'll go through the warnings.)

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

  • read_pickle should be only in io/pickle.py (and picked up by the io api)
  • to_pickle only in core/generic.py (under PandasObject so it picks up Series (we are sort of breaking the ability to pickle explicity Index, but I don't think that matters)

save/load are screwed up......so

  • save: remove from core/common.py, with warnings in core/generic.py
  • load as it exists in core/common.py but with a warning, remove from core/generic.py

end up with 1 save, 1 load (in the right places, just deprecated)

also remove save/load from top-level api (and instead temporarily import them from the io/api.py``
just for consistency (and put a note to remove in 0.12)

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

I didn't realize save/load were in so many places.....bad code :)

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

In [48]: s1.dump?
Type:       builtin_function_or_method
String Form:<built-in method dump of Series object at 0x10411d2a8>
Docstring:
a.dump(file)

Dump a pickle of the array to the specified file.
The array can be read back with pickle.load or numpy.load.

Parameters
----------
file : str
    A string naming the dump file.

@hayd
Copy link
Contributor Author

hayd commented Jun 7, 2013

That comes from numpy... hmmm even if we delete load in pandas does this mean it'll still be there?

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

I don't think that is actually necessary
can leave....doesn't really matter

@cpcloud
Copy link
Member

cpcloud commented Jun 7, 2013

i can't wait for msgpack...!

@jreback
Copy link
Contributor

jreback commented Jun 7, 2013

yep....if some is insterest in another project....I think that pandas needs to incorporate the msgpack cython code directly (I think the license is ok) to make it more efficient

@wesm
Copy link
Member

wesm commented Jun 10, 2013

I'll wait to merge. This is going to make some people grumpy but easy to fix their code.

Btw guys, I will pull msgpack directly into pandas for customization...

@jreback
Copy link
Contributor

jreback commented Jun 10, 2013

@hayd andy this is easy to make completely back compat; just import load/save into top-level name space from com (and then have those versions show the warnings), then call the appropriate to / from pickle....so just like with other io API, providing warning if you use the original method in 0.11.1, but still works, removing in 0.12 (in favor or new method)

@hayd
Copy link
Contributor Author

hayd commented Jun 10, 2013

So keep all instances of save and load (but with deprecate warning), basically back to our earlier iteration (?) but to_pickle not in top level...

@jreback
Copy link
Contributor

jreback commented Jun 10, 2013

yes on save/load

to_pickle is an instance method, and also a top-level method (so should be in io.api AND core/generic)

@hayd
Copy link
Contributor Author

hayd commented Jun 10, 2013

Really? I thought we were killing that and just going to have read_pickle top level (and save and load)?

to_pickle (and save and load) instance methods.

@jreback
Copy link
Contributor

jreback commented Jun 10, 2013

yes (I am reversing myself).....I guess save/load exist as imports from top-level and com now, so let's leave all those cases (with a warning) for now (also load on instance methods? )

then fix all in 0.12

@hayd
Copy link
Contributor Author

hayd commented Jun 10, 2013

How it's working:

In [1]: df = pd.DataFrame()

In [2]: df.save('path')
pandas/core/generic.py:42: FutureWarning: save is deprecated and will be removed in v0.12, use to_pickle
  warnings.warn("save is deprecated and will be removed in v0.12, use to_pickle", FutureWarning)

In [3]: pd.save(df, 'path')
pandas/core/common.py:2098: FutureWarning: save is deprecated and will be removed in v0.12, use obj.to_pickle
  warnings.warn("save is deprecated and will be removed in v0.12, use obj.to_pickle", FutureWarning)

In [4]: df.to_pickle('path')  # the new way

In [9]: pd.to_pickle  # not a way
AttributeError: 'module' object has no attribute 'to_pickle'
In [5]: pd.load('path')
pandas/core/common.py:2083: FutureWarning: load is deprecated and will be removed in v0.12, use obj.read_pickle
  warnings.warn("load is deprecated and will be removed in v0.12, use read_pickle", FutureWarning)
Out[5]:
Empty DataFrame
Columns: []
Index: []

In [6]: df.load('path')
pandas/core/generic.py:48: FutureWarning: load is deprecated and will be removed in v0.12, use pd.read_pickle
  warnings.warn("load is deprecated and will be removed in v0.12, use pd.read_pickle", FutureWarning)
Out[6]:
Empty DataFrame
Columns: []
Index: []

In [7]: pd.read_pickle('path')  # the new way
Out[7]:
Empty DataFrame
Columns: []
Index: []

In [8]: df.read_pickle('path')  # not a way
AttributeError: 'DataFrame' object has no attribute 'read_pickle'

I still have to fiddle around with the docs a little (I've probably missed some things), and then will squash.

@jreback
Copy link
Contributor

jreback commented Jun 10, 2013

that looks good

I am +1 on merging (when you r done with docs)

@wesm ?

@hayd
Copy link
Contributor Author

hayd commented Jun 10, 2013

@jreback As we discussed, save and load have remained in common, but I'm not sure how to do the main api.rst when functions are across two (core.common and io.pickle)... should I just drop save and load from the api.rst?

@jreback
Copy link
Contributor

jreback commented Jun 10, 2013

yes I would they r deprecated

@hayd
Copy link
Contributor Author

hayd commented Jun 10, 2013

Ok, I think that's it, have rebased and squashed. I don't think I've missed any save/load calls anywhere (tests/travis seem happy).

@jreback
Copy link
Contributor

jreback commented Jun 12, 2013

@wesm ok to merge on this, should provide complete back-compat (with a warning), and introduce new conformat to_pickle / read_pickle syntax

@wesm
Copy link
Member

wesm commented Jun 13, 2013

okay merge away. By the way, going to have to leave in the deprecated save/load methods for at least a year because of the book and stuff. We should probably start making a page that says "what has changed in pandas since Python for Data Analysis was published" to avoid making people angry (especially less hackerish folk)

@hayd
Copy link
Contributor Author

hayd commented Jun 13, 2013

Will delete bits saying "and will be removed in v0.12" then :)

That is an excellent idea.

@wesm
Copy link
Member

wesm commented Jun 15, 2013

ok pls let me know when ready to merge

@hayd
Copy link
Contributor Author

hayd commented Jun 15, 2013

Ready!

hayd added a commit that referenced this pull request Jun 15, 2013
CLN deprecate save&load in favour of to_pickle&read_pickle
@hayd hayd merged commit 4e02cae into pandas-dev:master Jun 15, 2013
@twmr
Copy link

twmr commented Jun 15, 2013

Does it make sense to add support for reading from file like objects (StringIO) and writing to such objects to read_pickle/to_pickle ? I can come up with a PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants