BUG/ENH: Bad columns dtype when creating empty DataFrame #22858

Closed
araraonline opened this issue Sep 27, 2018 · 3 comments

@araraonline

commented Sep 27, 2018

Code Sample

>>> df = pd.DataFrame(columns=list('ABC'), dtype='int64')
>>> df
Empty DataFrame
Columns: [A, B, C]
Index: []
>>> df.dtypes
A    float64
B    float64
C    float64
dtype: object

Problem description

When creating a DataFrame with no rows, passing an integer dtype argument can silently produce float64 columns instead. The problem does not occur if the DataFrame has one or more rows:

>>> df = pd.DataFrame([[1, 2, 3]], columns=list('ABC'), dtype='int64')
>>> df
   A  B  C
0  1  2  3
>>> df.dtypes
A    int64
B    int64
C    int64
dtype: object

Expected Output

>>> df = pd.DataFrame(columns=list('ABC'), dtype='int64')
>>> df.dtypes
A    int64
B    int64
C    int64
dtype: object
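
In the meantime, one possible workaround is to build the empty frame from empty Series that already carry the desired dtype. This is only a sketch: the helper name `empty_typed_frame` is made up for illustration, and it assumes `pd.Series(dtype=...)` returns an empty Series of the requested dtype.

```python
import pandas as pd


def empty_typed_frame(columns, dtype):
    """Hypothetical helper: build a zero-row DataFrame whose columns all have `dtype`."""
    # Each column is an empty Series that already has the requested dtype,
    # so no NaN fill value is involved when the frame is assembled.
    return pd.DataFrame({col: pd.Series(dtype=dtype) for col in columns})


df = empty_typed_frame(list("ABC"), "int64")
print(df.dtypes)
```

The dtype is attached per column here rather than passed to the DataFrame constructor, which sidesteps the code path described in this issue.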

Output of pd.show_versions()

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.5-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.8.0
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: 0.9.0
xarray: 0.10.8
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: 0.4.0
matplotlib: 2.2.3
openpyxl: 2.5.5
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: 0.1.6
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None

@JustinZhengBC

commented Oct 2, 2018

This seems to be intended behaviour, as demonstrated by the following test in pandas/tests/frame/test_constructor.py::TestDataFrameConstructors::test_constructor_corner:

df = DataFrame(index=lrange(10), columns=['a', 'b'], dtype=int)
assert df.values.dtype == np.dtype('float64')

The code responsible for this behaviour is found in pandas/core/dtypes/cast.py, on line 1223. Commenting out these two lines causes the above test, and no others, to fail in the pytest suite.

if is_integer_dtype(dtype) and isna(value):
    dtype = np.float64
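
To make the effect of that check concrete, here is a rough stand-alone sketch of the same rule. `dtype_for_fill` is a hypothetical name, not the pandas function that contains these two lines; only `is_integer_dtype` and `isna` are real pandas calls.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer_dtype


def dtype_for_fill(value, dtype):
    """Hypothetical helper: pick a dtype able to hold the scalar fill `value`."""
    dtype = np.dtype(dtype)
    # Integers cannot represent a missing value, so an integer request
    # combined with a NaN fill is upcast to float64.
    if is_integer_dtype(dtype) and pd.isna(value):
        return np.dtype("float64")
    return dtype


print(dtype_for_fill(np.nan, "int64"))  # upcast because the fill value is missing
print(dtype_for_fill(0, "int64"))       # a real integer fill keeps the integer dtype
```

Note that the rule looks only at the fill value and the dtype, not at how many rows will actually be filled.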

@araraonline

commented Oct 3, 2018

I don't think this is intended behavior; it looks more like a rough edge produced by the code you mentioned.

In the issue sample, the columns are empty, so there is no need to upcast to float:

>>> df = pd.DataFrame(columns=list('ABC'), dtype='int64')
>>> df
Empty DataFrame
Columns: [A, B, C]
Index: []

In the test case you mentioned, though, the DataFrame must be filled with NaN and therefore float is needed:

>>> df = pd.DataFrame(index=range(10), columns=['a', 'b'], dtype=int)
>>> df
    a   b
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 NaN NaN
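
The difference between the two cases can also be seen at the NumPy level. This is just an illustration with plain arrays, under the assumption that the frame's values end up in a 2-D block; it is not how pandas builds the frame internally.

```python
import numpy as np

# Zero rows: no cell needs a fill value, so an integer block is valid.
empty_block = np.empty((0, 3), dtype="int64")

# Ten rows with no data: every cell must hold NaN, which int64 cannot represent,
# so the block ends up as float64.
nan_block = np.full((10, 2), np.nan)

print(empty_block.dtype, empty_block.shape)
print(nan_block.dtype, nan_block.shape)
```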

@JustinZhengBC

commented Oct 3, 2018

Good point. In principle, it could be fixed by casting int to float only when a non-empty index is specified, as with the lrange in that test. I can try it out later and submit a PR if the tests pass.
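
Roughly, the idea would look something like the sketch below. `dtype_for_missing_frame` and its `n_rows` parameter are hypothetical names used only to show the rule; the actual change in pandas may look quite different.

```python
import numpy as np


def dtype_for_missing_frame(n_rows, dtype):
    """Hypothetical rule: upcast integer dtypes only when rows must be NaN-filled."""
    dtype = np.dtype(dtype)
    # With zero rows there is nothing to fill, so the requested dtype is kept.
    if n_rows > 0 and np.issubdtype(dtype, np.integer):
        return np.dtype("float64")
    return dtype


print(dtype_for_missing_frame(0, "int64"))   # empty frame keeps int64
print(dtype_for_missing_frame(10, "int64"))  # rows to fill with NaN, so float64
```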
