Dataframe column dtype changed from int8 to int64 when setting complete column #11638

cpaulik · 2015-11-18T12:17:46Z

The following example should explain:

Python 2.7.10 |Continuum Analytics, Inc.| (default, Oct 19 2015, 18:04:42) 
Type "copyright", "credits" or "license" for more information.

IPython 4.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-52-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.4
Cython: None
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None

In [4]: df = pd.DataFrame({'one': np.full(10, 0, dtype=np.int8)})

In [5]: df
Out[5]: 
   one
0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
9    0

In [6]: df.dtypes
Out[6]: 
one    int8
dtype: object

In [7]: df.loc[1, 'one'] = 6

In [8]: df
Out[8]: 
   one
0    0
1    6
2    0
3    0
4    0
5    0
6    0
7    0
8    0
9    0

In [9]: df.dtypes
Out[9]: 
one    int8
dtype: object

In [10]: df.one = np.int8(7)

In [11]: df.dtypes
Out[11]: 
one    int64
dtype: object

In [12]: df
Out[12]: 
   one
0    7
1    7
2    7
3    7
4    7
5    7
6    7
7    7
8    7
9    7

So the dtype is preserved when a single element of the column is assigned, but assigning to the whole column changes the dtype to int64, even when the assigned value is explicitly an np.int8.
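One possible workaround, sketched here under the assumption that the same `df` layout as in the session above is used, is to assign an array that already carries the intended dtype, so that no dtype has to be inferred from a bare scalar:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': np.full(10, 0, dtype=np.int8)})

# Assign a full-length int8 array instead of a bare scalar, so the
# column dtype comes from the array rather than from scalar inference.
df['one'] = np.full(len(df), 7, dtype=np.int8)

print(df.dtypes)
```

This only sidesteps the scalar path; it does not change how a scalar assignment such as `df.one = np.int8(7)` is handled.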

jreback · 2015-11-18T12:20:24Z

hmm, I recall seeing almost the same issue, but can't locate ATM. yep, looks buggy. pull-requests to fix are welcome.

varunkumar-dev · 2015-11-19T00:08:50Z

@jreback In common.py, it is upcasting int8 to int64:

    # provide implicit upcast on scalars
    elif is_integer(val):
        dtype = np.int64

    elif is_float(val):
        dtype = np.float64

If there is no specific requirement for upcasting, I can do a PR.
My proposed solution is to use np.issubdtype inside the is_integer and is_float functions to support all kinds of int and float types.
Please suggest.

jreback · 2015-11-19T00:13:08Z

this is actually tricky. do lib.isscalar on the val first; if it's a scalar, then use the defaults, else you should 'assume' it's a zero-dim scalar and use val.dtype

alternatively you can do

if isinstance(val, ndarray):
    dtype = val.dtype
else:
    dtype = np.int64

etc (this might be better)

varunkumar-dev · 2015-11-19T01:17:44Z

We are calling _infer_dtype_from_scalar(val), which is already doing this job. How does that solve the problem? Am I missing something?

jreback · 2015-11-19T01:20:03Z

right......so must be someplace else then (because _infer_dtype_from_scalar is doing the correct job)

varunkumar-dev · 2015-11-19T01:25:51Z

As pointed out earlier, how about modifying _infer_dtype_from_scalar(val) and using np.issubdtype inside the is_integer(val) and is_float(val) functions to support all kinds of int and float types? I am assuming that the same problem occurs for float types as well.

jreback · 2015-11-19T01:40:47Z

how would that help?
these are already caught above

varunkumar-dev · 2015-11-19T03:14:37Z

If we modify the integer and float conditions in _infer_dtype_from_scalar(val) like this:

    elif is_integer(val):
        if isinstance(val, int):
            dtype = np.int64
        else:
            dtype = type(val)

    elif is_float(val):
        if isinstance(val, float):
            dtype = np.float64
        else:
            dtype = type(val)

it will resolve the discrepancy without breaking anything. Please suggest.
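To make the idea concrete, here is a standalone sketch of scalar dtype inference along these lines. It is not the pandas implementation: the function name `infer_dtype_from_scalar_sketch` is hypothetical, and using `np.generic` plus `val.dtype` (rather than `type(val)`) is an assumption made for illustration.

```python
import numpy as np


def infer_dtype_from_scalar_sketch(val):
    """Hypothetical stand-in for scalar dtype inference (not pandas' code)."""
    # Check booleans first, because Python's bool is a subclass of int.
    if isinstance(val, (bool, np.bool_)):
        return np.dtype(np.bool_)
    # NumPy scalars (np.int8, np.float32, ...) already carry their dtype.
    if isinstance(val, np.generic):
        return val.dtype
    # Plain Python numbers fall back to the 64-bit defaults.
    if isinstance(val, int):
        return np.dtype(np.int64)
    if isinstance(val, float):
        return np.dtype(np.float64)
    # Anything else is treated as a generic Python object.
    return np.dtype(object)


print(infer_dtype_from_scalar_sketch(np.int8(7)))
print(infer_dtype_from_scalar_sketch(7))
```

With this ordering, a typed NumPy scalar keeps its own width, while an untyped Python `int` or `float` is still upcast to the default.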

jreback · 2015-11-19T03:17:35Z

ahh, the problem is this:

In [1]: x = np.int8(7)

In [2]: isinstance(x, np.ndarray)
Out[2]: False

In [9]: pd.core.common.is_integer(x)
Out[9]: 1

you can try that change that you are suggesting above and see what breaks (and of course add a test for this behavior).

It looks like it should work.
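The checks above can be reproduced with plain NumPy. This small sketch shows why an `isinstance(val, np.ndarray)` test misses `np.int8(7)`, and that a NumPy scalar still exposes a `.dtype`. Testing against `np.generic` is an assumption about a suitable check, not necessarily what pandas uses.

```python
import numpy as np

x = np.int8(7)

# A NumPy scalar is not an ndarray, so an ndarray check does not catch it.
is_array = isinstance(x, np.ndarray)

# It is, however, an instance of NumPy's scalar base classes.
is_np_scalar = isinstance(x, np.generic)
is_np_integer = isinstance(x, np.integer)

# The scalar carries its own dtype, which can be reused directly.
scalar_dtype = x.dtype

print(is_array, is_np_scalar, is_np_integer, scalar_dtype)
```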

jreback · 2015-11-20T00:59:32Z

closed by #11644

cpaulik · 2015-11-20T15:13:51Z

Wow, thank you for the quick fix. I was going to try but got lost in the pandas internals. Maybe next time 😄

jreback · 2015-11-20T15:18:49Z

thanks @varun-kr!

Version 0.17.1

* tag 'v0.17.1': (168 commits)
  add nbviewer link
  Revert "DOC: fix sponsor notice"
  DOC: a few touchups
  DOC: fix sponsor notice
  DOC: warnings and remove HTML
  COMPAT: compat of scalars on all platforms, xref pandas-dev#11638
  DOC: fix build errors/warnings
  DOC: whatsnew edits
  DOC: fix link syntax
  DOC: update release.rst / whatsnew edits
  BUG: fix col iteration in DataFrame.round, pandas-dev#11611
  DOC: Clarify foramtting
  BUG: pandas-dev#11638 return correct dtype for int and float
  BUG: pandas-dev#11637 fix to_csv incorrect output.
  DOC: sponsor notice
  BUG: indexing with a range , pandas-dev#11652
  Fix link to numexpr
  ENH: fixup tilde expansion, xref pandas-dev#11438
  ENH: tilde expansion for write output formatting functions, pandas-dev#11438
  DOC: fix up doc-string creations in generic.py
  ...

jreback added the Bug, Indexing, and Dtype Conversions labels Nov 18, 2015

jreback added this to the Next Major Release milestone Nov 18, 2015

varunkumar-dev mentioned this issue Nov 19, 2015

BUG #11638 return correct dtype for int and float #11644

Closed

jreback modified the milestones: 0.17.1, Next Major Release Nov 19, 2015

varunkumar-dev added a commit to varunkumar-dev/pandas that referenced this issue Nov 19, 2015

BUG pandas-dev#11638 return correct dtype for int and float

c61020e

Added test case TestInferDtype

jreback pushed a commit to jreback/pandas that referenced this issue Nov 20, 2015

BUG: pandas-dev#11638 return correct dtype for int and float

24cdf45

jreback closed this as completed Nov 20, 2015

jreback added a commit to jreback/pandas that referenced this issue Nov 20, 2015

COMPAT: compat of scalars on all platforms, xref pandas-dev#11638

b52b7fc

jreback added a commit that referenced this issue Nov 20, 2015

Merge pull request #11662 from jreback/scalar

a3fd834

COMPAT: compat of scalars on all platforms, xref #11638
