Pickle incompatibility between 0.25 and 1.0 when saving a MultiIndex dataframe #34535

naspli · 2020-06-02T20:44:11Z

This seems to have caused the problem described here:

https://stackoverflow.com/questions/61641738/pandas-1-0-cannot-pickle-load-dict-containing-dataframe-with-multiindex

which I'm now also experiencing. I dumped a MultiIndex dataframe containing ndarrays to a pickle on disk under pandas 0.25.x in Python 3.6, and now I'm getting:

AttributeError: Can't get attribute 'FrozenNDArray' on <module 'pandas.core.indexes.frozen'

when trying to load it in pandas 1.0.3 (still on Python 3.6). Any suggestions/workarounds? Should I instead open up a new issue?

This solved issue seems related, but is for Python 2.7:

#31988

This comment in the original rationale for getting rid of FrozenNDArray mentions pandas.compat.pickle_compat.py, which seems relevant:

#9031 (comment)

Originally posted by @mspacek in #29335 (comment)
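For readers unfamiliar with how a compatibility shim for renamed or removed classes can work, here is a minimal sketch of the general technique: subclass `pickle.Unpickler` and redirect lookups in `find_class`. It is not pandas' actual code. The rename table, the `old_pkg.shapes.Point` name, and the helper `loads_with_renames` are all hypothetical and chosen only so the example runs with the standard library.

```python
import io
import pickle

# Hypothetical rename table: (old module, old name) -> (new module, new name).
RENAMES = {
    ("old_pkg.shapes", "Point"): ("fractions", "Fraction"),
}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects references to classes that have moved."""

    def find_class(self, module, name):
        # Swap in the new location if this reference is in the table.
        module, name = RENAMES.get((module, name), (module, name))
        return super().find_class(module, name)


def loads_with_renames(data):
    """Hypothetical helper: unpickle bytes using the rename table."""
    return RenamingUnpickler(io.BytesIO(data)).load()


# Hand-written protocol-0 pickle that refers to old_pkg.shapes.Point,
# a class that does not exist in this environment.
payload = b"cold_pkg.shapes\nPoint\n."
resolved = loads_with_renames(payload)
```

A plain `pickle.loads(payload)` would fail here because `old_pkg` cannot be imported, which is the same kind of failure as the missing `FrozenNDArray` attribute above.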


naspli · 2020-06-02T20:47:48Z

@WillAyd #29335 (comment) suggested that @mspacek open a new issue, but I did not find one. I have run into the same problem.

Froskekongen · 2020-08-05T09:01:11Z

Are there any current efforts to fix this issue, or is the recommended approach to find a workaround?

jreback · 2020-08-05T11:01:03Z

you would have to show an actual example

pd.read_pickle is the way to load as it ensures compatibility (pickle.load will not work)

we test loading older pickles explicitly so likely this is your setup

Froskekongen · 2020-08-05T11:49:16Z

Thanks, @jreback. The issue is of course related to having containers that contain dataframes. Here is an example with dataclasses:

Run first with pandas==0.25.2 and then with 1.1 to reproduce.

from dataclasses import dataclass
import pandas as pd
import numpy as np
import pickle as pkl


@dataclass
class ContainerWithDataframe:
    name: str
    frame: pd.DataFrame

    def save(self):
        with open(f"{self.name}_{pd.__version__}.pkl", 'wb') as ww:
            pkl.dump(self, ww)


def load_container(name, pdversion):
    fname = f"{name}_{pdversion}.pkl"
    try:
        with open(fname, 'rb') as ff:
            return pkl.load(ff)
    except AttributeError as e:
        print(f"Can't load {fname}: {e}")


if __name__ == "__main__":
    df1 = pd.DataFrame(data=np.ones((2, 2)))
    df2 = pd.DataFrame(data=2*np.ones((2, 2)), index=pd.MultiIndex.from_arrays([[1, 2], ['a', 'a']]))
    df3 = pd.DataFrame(data=np.ones((2,2)), columns=pd.MultiIndex.from_product([[1], ['A', 'B']]))

    cont1 = ContainerWithDataframe('regular', df1)
    cont2 = ContainerWithDataframe('index_multindex', df2)
    cont3 = ContainerWithDataframe('columns_multiindex', df3)

    cont1.save()
    cont2.save()
    cont3.save()

    cont1_loaded = load_container('regular', "0.25.2")
    cont2_loaded = load_container('index_multindex', "0.25.2")
    cont3_loaded = load_container('columns_multiindex', "0.25.2")

Is there a way to modify the container to share the logic used by pd.read_pickle?

jreback · 2020-08-05T12:30:24Z

you should simply use pd.read_pickle

Froskekongen · 2020-08-05T13:20:08Z

@jreback: What exactly do you mean by this (you should simply use pd.read_pickle)?

The example above is a toy example showing the problem. In our systems we have a computational graph where each node can itself be a computational graph. The primary building blocks of the graph may contain dataframes, and what we do now is pickle these computational graphs so that we can run them elsewhere.

jreback · 2020-08-05T13:23:35Z

exactly what i said
pickle.load does not handle backward compatibility
never has never will

pd.read_pickle does

veneto-maggio · 2020-08-16T03:21:50Z

This seems like a valid use case: the dataframe(s) are part of a larger container that is pickled as a whole. In this case pd.read_pickle does not apply, although one can work around this by defining a new class that holds the dataframe and whose __setstate__ calls pd.read_pickle. I'd hope that pickle.load of a pandas dataframe would include some backward compatibility support like that.
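A minimal sketch of that wrapper idea, under stated assumptions: `FrameHolder` is a hypothetical name, the frame is stored as a bytes blob on dump, and `pd.read_pickle` is assumed to accept a file-like object (true of recent pandas releases, not necessarily of every older one). It only helps for pickles written with the wrapper in place; it does not rescue files that were already dumped without it.

```python
import io
import pickle

import numpy as np
import pandas as pd


class FrameHolder:
    """Hypothetical wrapper that controls how its DataFrame is pickled."""

    def __init__(self, frame):
        self.frame = frame

    def __getstate__(self):
        # Store the frame as an opaque bytes blob instead of a nested object.
        return {"frame_bytes": pickle.dumps(self.frame)}

    def __setstate__(self, state):
        # Decode the blob through pd.read_pickle so that pandas' own loading
        # path is used. Assumes read_pickle accepts file-like objects.
        self.frame = pd.read_pickle(io.BytesIO(state["frame_bytes"]))


df = pd.DataFrame(
    2 * np.ones((2, 2)),
    index=pd.MultiIndex.from_arrays([[1, 2], ["a", "a"]]),
)
holder = FrameHolder(df)

# Exercise the two hooks directly, in the order pickle calls them
# on dump and on load.
state = holder.__getstate__()
restored = FrameHolder.__new__(FrameHolder)
restored.__setstate__(state)
```

With this in place, an outer container holding `FrameHolder` instances could be dumped and loaded with plain `pickle`, since the frame bytes are decoded by pandas inside `__setstate__`.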

Corrects brain-score#11. I encountered a FrozenIndices error when trying to load a pandas dataframe after updating my pandas version. This commit introduces backward compatibility, i.e. you are fine if you pickle.dump() using an old pandas version and load using a new pandas version (pandas-dev/pandas#34535). pickle.load() is still called for all non-DataFrame objects; see https://github.com/pandas-dev/pandas/blob/f2c8480af2f25efdbd803218b9d87980f416563e/pandas/io/pickle.py#L203


pseudotensor · 2021-05-03T22:17:17Z

@jreback Ya, the idea that one only ever loads isolated pandas frames is quite simplified. As @veneto-maggio said, a pandas frame is often part of a larger pickle file, so direct support for pickle is the most general solution.

fredrikw · 2024-05-29T08:41:26Z

Old issue, I know, but if anyone else finds this by googling: pd.read_pickle will handle any pickled object, not just pickled DataFrames! I believe that is what jreback is referring to.
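A minimal sketch of that point, assuming a plain dict as the container (the file name and dict keys are made up for illustration). It dumps the container with `pickle.dump` and reads it back with `pd.read_pickle`. Because it runs under a single pandas version, it only shows that `pd.read_pickle` returns the whole container; it does not by itself demonstrate cross-version loading.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.ones((2, 2)),
    columns=pd.MultiIndex.from_product([[1], ["A", "B"]]),
)
# The container is an ordinary dict, not a DataFrame.
container = {"name": "columns_multiindex", "frame": frame}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "container.pkl")
    # Written with the standard library, as in the example earlier in the thread.
    with open(path, "wb") as fh:
        pickle.dump(container, fh)
    # Read back through pandas instead of pickle.load.
    loaded = pd.read_pickle(path)
```

Applied to the earlier reproduction script, this would amount to replacing the `pkl.load(ff)` call in `load_container` with `pd.read_pickle(fname)`.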

naspli changed the title ~~Pickle incompatibility between 0.25 and 1.0 when using saving a MultiIndex dataframe~~ Pickle incompatibility between 0.25 and 1.0 when saving a MultiIndex dataframe Jun 2, 2020

jbrockmendel added the IO Pickle read_pickle, to_pickle label Jun 5, 2020

jreback closed this as completed Aug 5, 2020

jreback added this to the No action milestone Aug 5, 2020

p-mc-grath mentioned this issue Apr 1, 2021

backward compatibility for caching done with old pandas indices brain-score/result_caching#11

Merged

p-mc-grath mentioned this issue Apr 3, 2021

backward compatibility for caching done with old pandas indices brain-score/result_caching#13

Merged
