Dataframe column dtype changed from int8 to int64 when setting complete column #11638

Closed
cpaulik opened this Issue Nov 18, 2015 · 12 comments

@cpaulik
Contributor

cpaulik commented Nov 18, 2015

The following example should explain:

Python 2.7.10 |Continuum Analytics, Inc.| (default, Oct 19 2015, 18:04:42) 
Type "copyright", "credits" or "license" for more information.

IPython 4.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-52-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.4
Cython: None
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None

In [4]: df = pd.DataFrame({'one': np.full(10, 0, dtype=np.int8)})

In [5]: df
Out[5]: 
   one
0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
9    0

In [6]: df.dtypes
Out[6]: 
one    int8
dtype: object

In [7]: df.loc[1, 'one'] = 6

In [8]: df
Out[8]: 
   one
0    0
1    6
2    0
3    0
4    0
5    0
6    0
7    0
8    0
9    0

In [9]: df.dtypes
Out[9]: 
one    int8
dtype: object

In [10]: df.one = np.int8(7)

In [11]: df.dtypes
Out[11]: 
one    int64
dtype: object

In [12]: df
Out[12]: 
   one
0    7
1    7
2    7
3    7
4    7
5    7
6    7
7    7
8    7
9    7

So the dtype is preserved when a single element of the column is changed, but assigning to the whole column changes the dtype to int64, even when the assigned value is explicitly an np.int8.
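The session above can be condensed into a short script for checking a given installation. This is only a sketch, assuming pandas and NumPy are importable; the helper name `column_dtype_after_scalar_assignment` is hypothetical and not part of pandas. On 0.17.0 it would report int64 as shown above, while a version that includes the fix should report int8.

```python
import numpy as np
import pandas as pd


def column_dtype_after_scalar_assignment(scalar):
    """Assign `scalar` to a whole int8 column and return the resulting dtype.

    Hypothetical helper for illustration; not part of pandas.
    """
    df = pd.DataFrame({"one": np.full(10, 0, dtype=np.int8)})
    df["one"] = scalar  # whole-column assignment, equivalent to In [10] above
    return df["one"].dtype


result = column_dtype_after_scalar_assignment(np.int8(7))
print(result)
```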

@jreback

Contributor

jreback commented Nov 18, 2015

Hmm, I recall seeing almost the same issue, but can't locate it at the moment. Yep, looks buggy. Pull requests to fix it are welcome.

@jreback jreback added this to the Next Major Release milestone Nov 18, 2015

@varun-kr

Contributor

varun-kr commented Nov 19, 2015

@jreback In common.py it is upcasting int8 to int64:

 # provide implicity upcast on scalars
    elif is_integer(val):
        dtype = np.int64

    elif is_float(val):
        dtype = np.float64

If there is no specific requirement for upcasting, I can open a PR.
My proposed solution is to use np.issubdtype inside the is_integer and is_float functions to support all int and float types.
Please advise.
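For readers unfamiliar with `np.issubdtype`, here is a minimal sketch of how it classifies scalars of any width. The helper `scalar_kind` is hypothetical and is not the pandas `is_integer` or `is_float` implementation; it only illustrates the NumPy call mentioned in the proposal.

```python
import numpy as np


def scalar_kind(val):
    """Classify a scalar as 'integer', 'float', or 'other' (hypothetical helper)."""
    dtype = np.asarray(val).dtype  # dtype NumPy would pick for this value
    if np.issubdtype(dtype, np.integer):
        return "integer"
    if np.issubdtype(dtype, np.floating):
        return "float"
    return "other"


kinds = [scalar_kind(v) for v in (7, np.int8(7), np.float32(1.5), 2.5, "x")]
print(kinds)
```

Note that classification alone says only that a value is an integer or a float; it does not by itself decide which width the resulting column should have.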

@jreback

Contributor

jreback commented Nov 19, 2015

This is actually tricky. Call lib.isscalar on val first; if it's a scalar, use the defaults, otherwise you should 'assume' it's a zero-dim scalar and use val.dtype.

alternatively you can do

if isinstance(val, ndarray):
    dtype = val.dtype
else:
    dtype = np.int64

etc (this might be better)

@varun-kr

Contributor

varun-kr commented Nov 19, 2015

We are calling _infer_dtype_from_scalar(val), which is already doing this job. How does this solve the problem? Am I missing something?

@jreback

Contributor

jreback commented Nov 19, 2015

Right, so it must be somewhere else then (because _infer_dtype_from_scalar is doing the correct job).

@varun-kr

Contributor

varun-kr commented Nov 19, 2015

As pointed out earlier, how about modifying _infer_dtype_from_scalar(val) and using np.issubdtype inside the is_integer(val) and is_float(val) functions to support all int and float types? I am assuming that the same problem occurs for float types as well.

@jreback

Contributor

jreback commented Nov 19, 2015

how would that help?
these are already caught above

@varun-kr

Contributor

varun-kr commented Nov 19, 2015

If we modify the integer and float conditions in _infer_dtype_from_scalar(val) like this:

elif is_integer(val):
    if isinstance(val, int):
        dtype = np.int64
    else:
        dtype = type(val)

elif is_float(val):
    if isinstance(val, float):
        dtype = np.float64
    else:
        dtype = type(val)

it will resolve the discrepancy without breaking anything. Please advise.
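The branches proposed above can be tried outside pandas with a small standalone function. This is a simplified sketch, not the actual pandas code: the name `infer_scalar_dtype` is hypothetical, the `isinstance` checks against NumPy's abstract scalar types stand in for the pandas `is_integer` and `is_float` helpers, and the bool and object branches are added assumptions so the function is complete.

```python
import numpy as np


def infer_scalar_dtype(val):
    """Hypothetical stand-in for the integer/float branches proposed above."""
    # bool is a subclass of int in Python, so handle it before the int branch.
    if isinstance(val, (bool, np.bool_)):
        return np.dtype(bool)
    if isinstance(val, (int, np.integer)):
        # Plain Python ints get the int64 default; NumPy ints keep their width.
        return np.dtype(np.int64) if isinstance(val, int) else np.dtype(type(val))
    if isinstance(val, (float, np.floating)):
        # Plain Python floats get the float64 default; NumPy floats keep their width.
        return np.dtype(np.float64) if isinstance(val, float) else np.dtype(type(val))
    return np.dtype(object)


inferred = {
    "python int": infer_scalar_dtype(7),
    "np.int8": infer_scalar_dtype(np.int8(7)),
    "python float": infer_scalar_dtype(1.5),
    "np.float32": infer_scalar_dtype(np.float32(1.5)),
}
print(inferred)
```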

@jreback

Contributor

jreback commented Nov 19, 2015

ahh, the problem is this:

In [1]: x = np.int8(7)

In [2]: isinstance(x, np.ndarray)
Out[2]: False

In [9]: pd.core.common.is_integer(x)
Out[9]: 1

You can try the change you are suggesting above and see what breaks (and of course add a test for this behavior).

It looks like it should work.
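The distinction behind this can be checked with NumPy alone. The sketch below assumes only that NumPy is importable: a NumPy scalar such as `np.int8(7)` is not an `ndarray`, so an `isinstance(val, ndarray)` test does not catch it, yet it still carries a `dtype`, and `np.generic` is the common base class of NumPy scalar types.

```python
import numpy as np

x = np.int8(7)                  # NumPy scalar
z = np.array(7, dtype=np.int8)  # zero-dimensional array holding the same value

checks = {
    "scalar_is_ndarray": isinstance(x, np.ndarray),
    "scalar_is_generic": isinstance(x, np.generic),
    "zero_dim_is_ndarray": isinstance(z, np.ndarray),
    "scalar_dtype": x.dtype,
    "zero_dim_dtype": z.dtype,
}
print(checks)
```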

@jreback jreback modified the milestones: 0.17.1, Next Major Release Nov 19, 2015

varun-kr added a commit to varun-kr/pandas that referenced this issue Nov 19, 2015

jreback added a commit to jreback/pandas that referenced this issue Nov 20, 2015

@jreback

Contributor

jreback commented Nov 20, 2015

closed by #11644

@jreback jreback closed this Nov 20, 2015

jreback added a commit to jreback/pandas that referenced this issue Nov 20, 2015

@cpaulik

Contributor

cpaulik commented Nov 20, 2015

Wow, thank you for the quick fix. I was going to try but got lost in the pandas internals. Maybe next time 😄

jreback added a commit that referenced this issue Nov 20, 2015

Merge pull request #11662 from jreback/scalar
COMPAT: compat of scalars on all platforms, xref #11638
@jreback

Contributor

jreback commented Nov 20, 2015

thanks @varun-kr!

yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 3, 2015

Merge tag 'v0.17.1' into debian
Version 0.17.1

* tag 'v0.17.1': (168 commits)
  add nbviewer link
  Revert "DOC: fix sponsor notice"
  DOC: a few touchups
  DOC: fix sponsor notice
  DOC: warnings and remove HTML
  COMPAT: compat of scalars on all platforms, xref pandas-dev#11638
  DOC: fix build errors/warnings
  DOC: whatsnew edits
  DOC: fix link syntax
  DOC: update release.rst / whatsnew edits
  BUG: fix col iteration in DataFrame.round, pandas-dev#11611
  DOC: Clarify foramtting
  BUG: pandas-dev#11638 return correct dtype for int and float
  BUG: pandas-dev#11637 fix to_csv incorrect output.
  DOC: sponsor notice
  BUG: indexing with a range , pandas-dev#11652
  Fix link to numexpr
  ENH: fixup tilde expansion, xref pandas-dev#11438
  ENH: tilde expansion for write output formatting functions, pandas-dev#11438
  DOC: fix up doc-string creations in generic.py
  ...