DataFrame constructor acts differently with lists and Numpy arrays #9131

scls19fr · 2014-12-22T17:38:25Z

Hello,

I noticed a strange behavior when a NumPy array is passed to the pandas DataFrame constructor.
I don't really know whether it should be considered an issue (or a feature)...
but either way, a tip to work around it would be nice.

import pandas as pd
import numpy as np
lst = [{'a':1, 'b':2}, {'a':3, 'b':2, 'c':3}]

In []: pd.DataFrame(lst)
Out[]:
   a  b   c
0  1  2 NaN
1  3  2   3

but with a NumPy array:

In []: arr=np.array(lst)
In []: arr
Out[]: array([{'a': 1, 'b': 2}, {'a': 3, 'c': 3, 'b': 2}], dtype=object)

In []: pd.DataFrame(arr)
Out[]:
                             0
0           {u'a': 1, u'b': 2}
1  {u'a': 3, u'c': 3, u'b': 2}

I was expecting the same result: a DataFrame with columns named 'a', 'b', 'c', as
when I feed DataFrame a standard list.

I can "fix" this using

In []: pd.DataFrame(list(arr))
Out[]:
   a  b   c
0  1  2 NaN
1  3  2   3

I don't think that pd.DataFrame(list(arr)) is a nice idea... (with a big array it will probably be very slow)
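
For what it's worth, the conversion back to a list may be cheaper than it looks: an object array only stores references to the original dicts, so arr.tolist() (or list(arr)) copies references rather than the dicts themselves. A minimal sketch, assuming a recent pandas and NumPy; it is worth timing on real data before drawing conclusions:

```python
import numpy as np
import pandas as pd

lst = [{'a': 1, 'b': 2}, {'a': 3, 'b': 2, 'c': 3}]
arr = np.array(lst, dtype=object)  # 1-D object array holding references to the dicts

# tolist() builds a plain Python list of the same dict objects (no deep copy)
as_list = arr.tolist()
same_objects = all(x is y for x, y in zip(as_list, lst))

# Given a list again, pandas infers the columns from the dict keys
df = pd.DataFrame(as_list)
print(same_objects)
print(df)
```

The identity check only shows that no dict is duplicated; it says nothing about how long DataFrame construction itself takes.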

Any ideas?

Kind regards


jreback · 2014-12-23T00:09:31Z

If you are starting with a Python structure (a list), I am not sure what the issue is here. Why would you convert to a NumPy array first? What are you trying to do?

scls19fr · 2014-12-23T07:43:57Z

I'm getting data from a JSON response. This is a list of nested dictionaries, which first need to be flattened. I could do this:

data = [flatten_dict(d) for d in data]

but I think it's better for performance to work with NumPy arrays:

f_flatten_dict = np.vectorize(flatten_dict)
a_data = np.array(data)
a_data = f_flatten_dict(a_data)

and then I build a DataFrame from the result.

Here is my flatten_dict function:

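import collections.abc

def flatten_dict(d, parent_key=''):
    """Recursively flatten a dict, joining nested keys with '_'."""
    items = []
    for k, v in d.items():
        new_key = parent_key + '_' + k if parent_key else k
        if isinstance(v, collections.abc.MutableMapping):
            # Nested dict: recurse and merge its flattened items
            items.extend(flatten_dict(v, new_key).items())
        elif isinstance(v, list):
            # List: suffix each element's key with its position
            for n, elem in enumerate(v):
                mykey = "%s_%d" % (new_key, n)
                if isinstance(elem, collections.abc.MutableMapping):
                    items.extend(flatten_dict(elem, mykey).items())
                else:
                    # Non-dict list elements are stored as-is
                    items.append((mykey, elem))
        else:
            items.append((new_key, v))
    return dict(items)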

shoyer · 2014-12-23T07:54:54Z

@scls19fr I would encourage you to profile your code to test your theories about performance (IPython makes this easy with the %timeit magic). In this case, I am pretty sure that a NumPy array would not be faster than a Python list. Using non-native types in your array or relying on np.vectorize is usually a sign that NumPy will not speed things up.
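
Outside IPython, the same comparison can be sketched with the standard timeit module. The flatten function and the sample data below are made-up stand-ins (not the flatten_dict from this thread), and the timings will vary by machine, so treat this as a template for profiling rather than a result:

```python
import timeit
import numpy as np

def flatten(d, parent_key=''):
    """Hypothetical stand-in for flatten_dict: flattens nested dicts only."""
    out = {}
    for k, v in d.items():
        key = parent_key + '_' + k if parent_key else k
        if isinstance(v, dict):
            out.update(flatten(v, key))
        else:
            out[key] = v
    return out

# Made-up sample data, roughly shaped like a nested JSON response
data = [{'id': i, 'main': {'temp': 280.0 + i, 'wind': {'speed': 3.5}}}
        for i in range(2000)]
arr = np.array(data, dtype=object)

# otypes=[object] makes np.vectorize return an object array of dicts
vec_flatten = np.vectorize(flatten, otypes=[object])

def with_list():
    return [flatten(d) for d in data]

def with_vectorize():
    return vec_flatten(arr)

t_list = timeit.timeit(with_list, number=20)
t_vec = timeit.timeit(with_vectorize, number=20)
print("list comprehension: %.4f s" % t_list)
print("np.vectorize:       %.4f s" % t_vec)
```

Both variants call the Python function once per element, which is why no large difference should be expected from np.vectorize here.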

To give a little more context on the design here: pandas only runs its inference steps about how to lay out the data when the data is provided as a list, and this is the reason why. It's usually not a good idea to nest dictionaries in NumPy arrays.

scls19fr · 2014-12-23T10:43:59Z

Thanks. I understand your point of view. But is there any tip for creating columns from dict keys automatically?

jreback · 2014-12-23T10:49:50Z

@scls19fr you might want to have a look here: http://pandas.pydata.org/pandas-docs/stable/io.html#normalization
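
A minimal sketch of what that normalization looks like, assuming pandas 1.0 or later, where the function is exposed as pd.json_normalize (older releases had it under pandas.io.json). The records below are made up for illustration:

```python
import pandas as pd

# Made-up nested records, standing in for a decoded JSON response
data = [
    {'id': 1, 'main': {'temp': 280.0, 'humidity': 81}},
    {'id': 2, 'main': {'temp': 285.5}, 'wind': {'speed': 3.5}},
]

# sep='_' joins nested keys with an underscore, like flatten_dict does
df = pd.json_normalize(data, sep='_')
print(sorted(df.columns))
print(df)
```

Keys missing from a record come out as NaN. As far as I know, lists nested inside the records are left as list values unless record_path is used, so this is not a drop-in replacement for the list handling in flatten_dict.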

closing as a usage issue

shoyer added the Usage Question label Dec 22, 2014

jreback closed this as completed Dec 23, 2014

scls19fr mentioned this issue Dec 24, 2014

Python Pandas DataFrame output marians/openweather#9

Open
