Fails for multilevel columns pandas DataFrames #346

Closed
antoinecollet5 opened this issue Jan 22, 2021 · 1 comment

Comments

@antoinecollet5
Contributor

antoinecollet5 commented Jan 22, 2021

jsonpickle does not support pandas DataFrames with multilevel columns.

Here is an example that fails. As you can see, a MultiIndex on the rows is supported, but multilevel columns are not.

import pandas as pd
import numpy as np
import jsonpickle as jp
import jsonpickle.ext.pandas as jsonpickle_pd

# Register the pandas handlers from jsonpickle.ext.pandas; importing the
# module alone does not enable them.
jsonpickle_pd.register_handlers()

### This works ###
midx = pd.MultiIndex(levels=[["zero", "one"], ["x", "y"]],
                     codes=[[1, 1, 0, 0], [1, 0, 1, 0]])
df = pd.DataFrame(np.random.randn(4, 2), index=midx, columns=['col1', 'col2'])
frozen = jp.encode(df)
thawed = jp.decode(frozen)

### This does not ###
iterables = [['inj', 'prod'], ['hourly', 'cum']]
names = ['first', 'second']
# build a two-level column index from the product of the iterables
columns = pd.MultiIndex.from_product(iterables, names=names)
# use it as the columns of the DataFrame
df2 = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"],
                   columns=columns)
frozen2 = jp.encode(df2)
thawed2 = jp.decode(frozen2)
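The failure can be reproduced with pandas alone. A minimal sketch, assuming the DataFrame handler serializes the frame with `to_csv(index=False)` (as the `flatten` method in the patch does) and reads it back with `read_csv`'s default `header=0`: the CSV gets one header row per column level, the level names `first`/`second` are not written at all, and a default `read_csv` turns the second header row into data. Integer data replaces `np.random.randn` so the output is deterministic.

```python
from io import StringIO

import pandas as pd

# Same column layout as df2 in the report, with integer data for determinism.
columns = pd.MultiIndex.from_product(
    [["inj", "prod"], ["hourly", "cum"]], names=["first", "second"]
)
df2 = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
    index=["A", "B", "C"],
    columns=columns,
)

# Write the frame without its row index: pandas emits one header row per
# column level and drops the level names ("first", "second").
csv = df2.reset_index(drop=True).to_csv(index=False)
print(csv)

# Reading it back with the default header=0 keeps only the first level
# (deduplicated as inj, inj.1, ...) and the second header row becomes data.
naive = pd.read_csv(StringIO(csv))
print(naive.columns.tolist())
print(naive.head(1))
```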

Here is a modification of jsonpickle/ext/pandas.py that seems to fix the issue. I am new to git, so I need to figure out how to make a clean pull request. I'll try to add an appropriate unit test.

# Excerpt from jsonpickle/ext/pandas.py with the proposed changes.
# PandasProcessor is defined elsewhere in that module.
from io import StringIO

import pandas as pd

from jsonpickle import decode, encode
from jsonpickle.handlers import BaseHandler


def make_read_csv_params(meta):
    meta_dtypes = meta.get('dtypes', {})
    # [None] keeps this compatible with objects serialized before
    # column_level_names was introduced.
    column_level_names = meta.get('column_level_names', [None])
    # The header selects the CSV rows from which the column names are
    # retrieved: one row per column level ([0] for ordinary columns).
    header = meta.get('header', [0])
    parse_dates = []
    converters = {}
    dtype = {}
    timedeltas = []
    for k, v in meta_dtypes.items():
        if v.startswith('datetime'):
            parse_dates.append(k)
        elif v.startswith('complex'):
            converters[k] = complex
        elif v.startswith('timedelta'):
            timedeltas.append(k)
            dtype[k] = 'object'
        else:
            dtype[k] = v

    return dict(dtype=dtype, header=header, parse_dates=parse_dates,
                converters=converters), timedeltas, column_level_names


class PandasDfHandler(BaseHandler):
    pp = PandasProcessor()

    def flatten(self, obj, data):
        dtype = obj.dtypes.to_dict()

        # Besides the dtypes and the index, record the column level names
        # (to_csv(index=False) does not write them) and one header row per
        # column level so that restore() can rebuild multilevel columns.
        meta = {'dtypes': {k: str(dtype[k]) for k in dtype},
                'index': encode(obj.index),
                'column_level_names': obj.columns.names,
                'header': list(range(len(obj.columns.names)))}

        data = self.pp.flatten_pandas(
            obj.reset_index(drop=True).to_csv(index=False), data, meta
        )
        return data

    def restore(self, data):
        csv, meta = self.pp.restore_pandas(data)
        params, timedeltas, column_level_names = make_read_csv_params(meta)

        df = (
            pd.read_csv(StringIO(csv), **params)
            if data['values'].strip()
            else pd.DataFrame()
        )
        for col in timedeltas:
            df[col] = pd.to_timedelta(df[col])
        df.set_index(decode(meta['index']), inplace=True)
        # restore the column level name(s)
        df.columns.names = column_level_names
        return df
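The core of the patch is that `read_csv` rebuilds a column MultiIndex when `header` lists one row per level, while the level names, which the CSV no longer carries, are stored and reassigned separately. A pandas-only sketch of that round trip; `roundtrip_csv` is a hypothetical helper, not part of jsonpickle, and it reattaches the original row index directly instead of encoding it as the handler does.

```python
from io import StringIO

import pandas as pd


def roundtrip_csv(df):
    """Hypothetical helper: CSV round trip that preserves multilevel columns."""
    # What the patched flatten() records in its meta dict.
    column_level_names = list(df.columns.names)
    header = list(range(len(column_level_names)))  # one header row per level

    csv = df.reset_index(drop=True).to_csv(index=False)

    # What the patched restore() does: read every header row, then put the
    # row index and the column level names back.
    out = pd.read_csv(StringIO(csv), header=header)
    out.index = df.index
    out.columns.names = column_level_names
    return out


columns = pd.MultiIndex.from_product(
    [["inj", "prod"], ["hourly", "cum"]], names=["first", "second"]
)
df2 = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
    index=["A", "B", "C"],
    columns=columns,
)
restored = roundtrip_csv(df2)
print(restored.columns.tolist())
print(list(restored.columns.names))
```

For ordinary single-level columns the header is `[0]`, which behaves like the default, so existing frames keep round-tripping as before.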
@antoinecollet5
Contributor Author

I just opened a pull request.

davvid closed this as completed in 565c299 on Jan 31, 2021
davvid added a commit to davvid/jsonpickle that referenced this issue Jan 31, 2021
ujson is needed in order to pass the pandas tests, so add it to
the general "testing" section.  We should work to eliminate this.

ujson is now fully python3 compatible and does not need to be
blocked on python3.8 anymore.

Related-to: jsonpickle#346 jsonpickle#347
Signed-off-by: David Aguilar <davvid@gmail.com>
davvid added a commit to davvid/jsonpickle that referenced this issue Jan 31, 2021
The ujson module was narrowed down as the reason why the tests
were passing on python2 and failing on newer python3 versions.

Re-enable the multilevel columns test now that ujson is present.

Related-to: jsonpickle#346 jsonpickle#347
Signed-off-by: David Aguilar <davvid@gmail.com>
davvid added a commit to davvid/jsonpickle that referenced this issue Jan 31, 2021
Flatten the dtypes meta dictionary before handing it off to the backend
to ensure that special types, such as tuples in dicts, are handled properly.

Related-to: jsonpickle#346 jsonpickle#347
Signed-off-by: David Aguilar <davvid@gmail.com>
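The last commit addresses a side effect of multilevel columns: `obj.dtypes.to_dict()` is then keyed by tuples such as `('inj', 'hourly')`, and JSON objects only allow string keys. A sketch of the problem, assuming the standard-library `json` module as the backend (jsonpickle supports several backends). The list-of-pairs encoding at the end is one hypothetical workaround, not what the commit does (it flattens the dict with jsonpickle before handing it to the backend).

```python
import json

import pandas as pd

columns = pd.MultiIndex.from_product(
    [["inj", "prod"], ["hourly", "cum"]], names=["first", "second"]
)
df2 = pd.DataFrame([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]], columns=columns)

# The same dtypes mapping the handler puts in its meta dict.
dtypes = {k: str(v) for k, v in df2.dtypes.to_dict().items()}
print(dtypes)  # keys are tuples, e.g. ('inj', 'hourly')

# The stdlib json module refuses tuple keys outright.
try:
    json.dumps(dtypes)
    rejected = False
except TypeError as exc:
    rejected = True
    print("json.dumps failed:", exc)

# Hypothetical workaround: store [key, value] pairs, so each tuple key becomes
# a JSON array, and rebuild the tuples when decoding.
encoded = json.dumps([[list(k), v] for k, v in dtypes.items()])
decoded = {tuple(k): v for k, v in json.loads(encoded)}
print(decoded == dtypes)
```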