Dataframe column dtype changed from int8 to int64 when setting complete column #11638

Closed
cpaulik opened this issue Nov 18, 2015 · 12 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves

@cpaulik

cpaulik commented Nov 18, 2015

The following example should explain:

Python 2.7.10 |Continuum Analytics, Inc.| (default, Oct 19 2015, 18:04:42) 
Type "copyright", "credits" or "license" for more information.

IPython 4.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-52-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.4
Cython: None
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None

In [4]: df = pd.DataFrame({'one': np.full(10, 0, dtype=np.int8)})

In [5]: df
Out[5]: 
   one
0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
9    0

In [6]: df.dtypes
Out[6]: 
one    int8
dtype: object

In [7]: df.loc[1, 'one'] = 6

In [8]: df
Out[8]: 
   one
0    0
1    6
2    0
3    0
4    0
5    0
6    0
7    0
8    0
9    0

In [9]: df.dtypes
Out[9]: 
one    int8
dtype: object

In [10]: df.one = np.int8(7)

In [11]: df.dtypes
Out[11]: 
one    int64
dtype: object

In [12]: df
Out[12]: 
   one
0    7
1    7
2    7
3    7
4    7
5    7
6    7
7    7
8    7
9    7

So the dtype is preserved when a single element of the column is set through .loc, but assigning to the whole column changes the dtype to int64, even when the assigned value is explicitly an np.int8.
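For anyone who wants to check this without an interactive session, the steps above can be condensed into a short script. This is only a sketch: the function name `dtypes_after_each_step` is made up for illustration, and `df["one"] = ...` is used in place of the attribute form `df.one = ...` from the session. On a pandas release that includes the fix referenced later in this thread, all three dtypes should be int8; on 0.17.0 the last one was int64.

```python
import numpy as np
import pandas as pd


def dtypes_after_each_step():
    """Return the column dtype at creation, after an element set, and after a column set."""
    df = pd.DataFrame({"one": np.full(10, 0, dtype=np.int8)})
    created = df["one"].dtype

    # Element-wise assignment through .loc (In [7] in the session above).
    df.loc[1, "one"] = 6
    after_element = df["one"].dtype

    # Whole-column assignment with a NumPy scalar (In [10] in the session above).
    df["one"] = np.int8(7)
    after_column = df["one"].dtype

    return created, after_element, after_column


if __name__ == "__main__":
    labels = ("created", "after element set", "after column set")
    for label, dtype in zip(labels, dtypes_after_each_step()):
        print(label, dtype)
```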

@jreback
Contributor

jreback commented Nov 18, 2015

hmm, I recall seeing almost the same issue, but can't locate ATM. yep, looks buggy. pull-requests to fix are welcome.

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves Dtype Conversions Unexpected or buggy dtype conversions labels Nov 18, 2015
@jreback jreback added this to the Next Major Release milestone Nov 18, 2015
@varunkumar-dev
Contributor

@jreback In common.py, the scalar is being upcast from int8 to int64:

    # provide implicity upcast on scalars
    elif is_integer(val):
        dtype = np.int64

    elif is_float(val):
        dtype = np.float64

If there is no specific requirement for upcasting, I can do a PR.
My proposed solution is to use np.issubdtype inside the is_integer and is_float functions to support all kinds of int and float types.
Please suggest.
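As a rough illustration of the np.issubdtype check mentioned here, the sketch below classifies a scalar by its type. The helper name `scalar_kind` is hypothetical and is not part of pandas. Note that this only identifies the kind of a value (integer or float), not which width should be used for the resulting column.

```python
import numpy as np


def scalar_kind(val):
    """Classify a scalar as 'integer', 'float', or 'other' based on its type."""
    val_type = type(val)
    # np.issubdtype accepts Python types as well as NumPy scalar types.
    if np.issubdtype(val_type, np.integer):
        return "integer"
    if np.issubdtype(val_type, np.floating):
        return "float"
    return "other"


if __name__ == "__main__":
    for value in (7, np.int8(7), 1.5, np.float32(1.5), "x"):
        print(repr(value), scalar_kind(value))
```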

@jreback
Contributor

jreback commented Nov 19, 2015

this is actually tricky. do lib.isscalar on the val first; if it's a scalar, then use the defaults, else you should 'assume' it's a zero-dim scalar and use val.dtype

alternatively you can do

if isinstance(val, np.ndarray):
    dtype = val.dtype
else:
    dtype = np.int64

etc (this might be better)

@varunkumar-dev
Contributor

We are calling _infer_dtype_from_scalar(val), which already does this job. How does that solve the problem? Am I missing something?

@jreback
Contributor

jreback commented Nov 19, 2015

right......so must be someplace else then (because _infer_dtype_from_scalar is doing the correct job)

@varunkumar-dev
Contributor

As pointed out earlier, how about modifying _infer_dtype_from_scalar(val) and using np.issubdtype inside the is_integer(val) and is_float(val) functions to support all kinds of int and float types? I am assuming that the same problem occurs for float types as well.

@jreback
Contributor

jreback commented Nov 19, 2015

how would that help?
these are already caught above

@varunkumar-dev
Contributor

If we modify integer and float conditions in _infer_dtype_from_scalar(val) like this

elif is_integer(val):
    if isinstance(val, int):
        dtype = np.int64
    else:
        dtype = type(val)

elif is_float(val):
    if isinstance(val, float):
        dtype = np.float64
    else:
        dtype = type(val)

It will resolve the discrepancy without breaking anything. Please suggest.
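The two branches above can be tried in isolation with a standalone sketch. The function `infer_scalar_dtype` below is a hypothetical stand-in, not the actual pandas `_infer_dtype_from_scalar`, and the plain `isinstance` checks are an assumption replacing pandas' own `is_integer` and `is_float` helpers. Booleans are handled first because Python's `bool` is a subclass of `int`.

```python
import numpy as np


def infer_scalar_dtype(val):
    """Hypothetical stand-in: pick a dtype for a scalar, keeping NumPy scalar widths."""
    if isinstance(val, (bool, np.bool_)):
        return np.dtype(bool)

    if isinstance(val, (int, np.integer)):
        if isinstance(val, int):
            # Plain Python int: fall back to the default 64-bit integer.
            return np.dtype(np.int64)
        # NumPy integer scalar: keep its own width (int8, uint16, ...).
        return np.dtype(type(val))

    if isinstance(val, (float, np.floating)):
        if isinstance(val, float):
            # Plain Python float (np.float64 also lands here, as it subclasses float).
            return np.dtype(np.float64)
        # Other NumPy float scalars: keep their own width (float16, float32, ...).
        return np.dtype(type(val))

    return np.dtype(object)


if __name__ == "__main__":
    for value in (7, np.int8(7), 1.5, np.float32(1.5)):
        print(repr(value), infer_scalar_dtype(value))
```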

@jreback
Contributor

jreback commented Nov 19, 2015

ahh, the problem is this:

In [1]: x = np.int8(7)

In [2]: isinstance(x, np.ndarray)
Out[2]: False

In [9]: pd.core.common.is_integer(x)
Out[9]: 1

you can try that change that you are suggesting above and see what breaks (and of course add a test for this behavior).

It looks like it should work.
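The distinction can be reproduced with NumPy alone. In this sketch, `describe` is a made-up helper that reports how a value looks to the checks discussed in this thread: a NumPy scalar such as np.int8(7) is not an ndarray and (on Python 3) not a Python int, yet it still carries its own dtype, whereas a zero-dimensional array is an ndarray.

```python
import numpy as np


def describe(val):
    """Report which of the type checks discussed in the thread a value passes."""
    return {
        "is_ndarray": isinstance(val, np.ndarray),
        "is_numpy_scalar": isinstance(val, np.generic),
        "is_python_int": isinstance(val, int),
        # Plain Python numbers have no dtype attribute, so default to None.
        "dtype": getattr(val, "dtype", None),
    }


if __name__ == "__main__":
    candidates = {
        "np.int8(7)": np.int8(7),
        "zero-dim array": np.array(7, dtype=np.int8),
        "python int": 7,
    }
    for name, value in candidates.items():
        print(name, describe(value))
```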

@jreback jreback modified the milestones: 0.17.1, Next Major Release Nov 19, 2015
varunkumar-dev added a commit to varunkumar-dev/pandas that referenced this issue Nov 19, 2015
jreback pushed a commit to jreback/pandas that referenced this issue Nov 20, 2015
@jreback
Contributor

jreback commented Nov 20, 2015

closed by #11644

@jreback jreback closed this as completed Nov 20, 2015
jreback added a commit to jreback/pandas that referenced this issue Nov 20, 2015
@cpaulik
Author

cpaulik commented Nov 20, 2015

Wow, thank you for the quick fix. I was going to try but got lost in the pandas internals. Maybe next time 😄

jreback added a commit that referenced this issue Nov 20, 2015
COMPAT: compat of scalars on all platforms, xref #11638
@jreback
Contributor

jreback commented Nov 20, 2015

thanks @varun-kr!

yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 3, 2015
Version 0.17.1

* tag 'v0.17.1': (168 commits)
  add nbviewer link
  Revert "DOC: fix sponsor notice"
  DOC: a few touchups
  DOC: fix sponsor notice
  DOC: warnings and remove HTML
  COMPAT: compat of scalars on all platforms, xref pandas-dev#11638
  DOC: fix build errors/warnings
  DOC: whatsnew edits
  DOC: fix link syntax
  DOC: update release.rst / whatsnew edits
  BUG: fix col iteration in DataFrame.round, pandas-dev#11611
  DOC: Clarify foramtting
  BUG: pandas-dev#11638 return correct dtype for int and float
  BUG: pandas-dev#11637 fix to_csv incorrect output.
  DOC: sponsor notice
  BUG: indexing with a range , pandas-dev#11652
  Fix link to numexpr
  ENH: fixup tilde expansion, xref pandas-dev#11438
  ENH: tilde expansion for write output formatting functions, pandas-dev#11438
  DOC: fix up doc-string creations in generic.py
  ...