Empty DataFrame constructor disregards dtype kwarg #10106

Closed
mrocklin opened this issue May 11, 2015 · 7 comments
Labels
API Design, Dtype Conversions (unexpected or buggy dtype conversions)

Comments

@mrocklin
Contributor

This is new in 0.16.1

In [1]: import pandas as pd

In [2]: pd.__version__ 
Out[2]: '0.16.1'

In [3]: f = pd.DataFrame({'a': [0, 10, 20, 30, 40], 'b': [5, 4, 3, 2, 1]},
                          index=[1, 2, 3, 4, 4])

In [4]: f.dtypes 
Out[4]: 
a    int64
b    int64
dtype: object

In [6]: pd.DataFrame(columns=['a', 'b'], dtype=f.dtypes).dtypes
Out[6]: 
a    object
b    object
dtype: object

Other mechanisms for conveying dtype information also yield uninformative errors:

In [7]: pd.DataFrame(columns=['a', 'b'], dtype={'a': 'i4', 'b': 'f4'}).dtypes
ValueError: entry not a 2- or 3- tuple

In [8]: pd.DataFrame(columns=['a', 'b'], dtype=[('a', 'i4'), ('b', 'f4')]).dtypes
NotImplementedError: compound dtypes are not implementedin the DataFrame constructor

Is there a way to create an empty DataFrame with given dtypes in 0.16.1?

@mrocklin
Contributor Author

Oh, this is on Python 3.4

In [11]: sys.version_info 
Out[11]: sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0)

@jreback
Contributor

jreback commented May 11, 2015

This is #4464: passing a compound dtype is not implemented (regardless of the error message).

The workaround is to build the frame from empty, typed Series:

In [2]: pd.DataFrame({'a': pd.Series(dtype='i4'), 'b': pd.Series(dtype='f4')})
Out[2]: 
Empty DataFrame
Columns: [a, b]
Index: []

In [3]: pd.DataFrame({'a': pd.Series(dtype='i4'), 'b': pd.Series(dtype='f4')}).dtypes
Out[3]: 
a      int32
b    float32
dtype: object
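
The same workaround can be generalized to any column-to-dtype mapping. What follows is only a minimal sketch, not a pandas API: `empty_frame` is a hypothetical helper name, and it assumes one dtype per column, given either as a dict or as the `dtypes` Series of an existing frame.

```python
import pandas as pd


def empty_frame(dtypes):
    """Build an empty DataFrame with one typed column per entry.

    `dtypes` maps column name -> dtype (a dict, or the `.dtypes` Series
    of an existing frame). Hypothetical helper, not part of pandas.
    """
    # Both dict and pandas Series expose .items() yielding (name, dtype).
    return pd.DataFrame({name: pd.Series(dtype=dt) for name, dt in dtypes.items()})


# Explicit mapping, as in the workaround above.
df = empty_frame({'a': 'i4', 'b': 'f4'})

# Mirror the dtypes of an existing frame.
f = pd.DataFrame({'a': [0, 10, 20], 'b': [5, 4, 3]})
g = empty_frame(f.dtypes)
```

Here `df` should come out empty with `int32` and `float32` columns, and `g` should be empty with the same column dtypes as `f`.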

@mrocklin
Contributor Author

Hrm, that's odd. Tests started failing for me only recently. This appeared to be the reason. Looks like I'll have to dig more deeply. Thanks for the rapid response. Closing.

@mrocklin
Contributor Author

And thanks for the suggestions. Those look like sane workarounds.

jreback added the Dtype Conversions and API Design labels on May 11, 2015

@jiffyclub

I also had tests on empty DataFrames start failing with 0.16.1. For me it comes down to the two constructor calls below producing different column dtypes, even though both create empty DataFrames:

In [2]: pd.DataFrame(columns=['x']).x
Out[2]: Series([], Name: x, dtype: object)

In [3]: pd.DataFrame({'x': []}).x
Out[3]: Series([], Name: x, dtype: float64)

Note the different dtypes of the columns.

@jreback
Contributor

jreback commented May 16, 2015

@jiffyclub these are not the same: the default for a list (even an empty one, when no dtype is specified) is a float dtype

In [2]: np.array([]).dtype
Out[2]: dtype('float64')

In the first case (your [2]), however, you haven't specified anything at all, so object is correct.
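
To make the distinction concrete, here is a small sketch. It only checks the NumPy default for an empty list and the explicitly typed case, since the dtype pandas infers for empty, untyped input may differ between pandas versions. The variable name `x` is just for illustration.

```python
import numpy as np
import pandas as pd

# NumPy's default for an empty list with no dtype given is float64.
empty_default = np.array([]).dtype

# Passing a dtype explicitly sidesteps inference entirely.
x = pd.DataFrame({'x': pd.Series([], dtype='float64')}).x

print(empty_default)  # float64
print(x.dtype)        # float64
```

Stating the dtype up front is the safer choice whenever tests depend on the column type of an empty frame.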

@mrocklin
Contributor Author

Yeah, I think that the new behavior is appropriate.

Both @jiffyclub and I were surprised by the change, but that's a small price to pay for progress.
