combine_first throws ValueError: Cannot convert NA to integer #14687

Closed
Dmitrii-I opened this issue Nov 18, 2016 · 1 comment · Fixed by #14886
Labels: Bug, Regression (functionality that used to work in a prior pandas version)

Dmitrii-I commented Nov 18, 2016

I do not understand why NA needs to be converted to integer when the result contains no NAs. Perhaps the combine_first algorithm does this under the hood?

A small, complete example of the issue

from pandas import DataFrame
DataFrame({'a': [0, 1, 3, 5]}).combine_first(DataFrame({'a': [1, 4]}))

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 3066, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-16-12b973b1b150>", line 1, in <module>
    pd.DataFrame({'a': [0, 1, 3, 5]}).combine_first(pd.DataFrame({'a': [1, 4]}))
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py", line 3787, in combine_first
    return self.combine(other, combiner, overwrite=False)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py", line 3714, in combine
    otherSeries = otherSeries.astype(new_dtype)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py", line 3054, in astype
    raise_on_error=raise_on_error, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 3168, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 3035, in apply
    applied = getattr(b, f)(**kwargs)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 462, in astype
    values=values, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 505, in _astype
    values = _astype_nansafe(values.ravel(), dtype, copy=True)
  File "/usr/local/lib/python3.4/dist-packages/pandas/types/cast.py", line 531, in _astype_nansafe
    raise ValueError('Cannot convert NA to integer')
ValueError: Cannot convert NA to integer
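A plausible answer to the question above, read off the `combine` frame in the traceback: the result of `combine_first` uses the union of both indexes, so `other` has no values for rows 2 and 3. Assuming `combine` aligns the two frames before the `otherSeries.astype(new_dtype)` call shown in the traceback (an assumption, the traceback does not show that step), the padded `other` column is float64 with NaN, and casting it to an integer dtype fails. The sketch below reproduces only that step with public APIs (`DataFrame.align`, `Series.astype`); it illustrates the failure mode, not the actual 0.19 code path.

```python
import pandas as pd

this = pd.DataFrame({'a': [0, 1, 3, 5]})
other = pd.DataFrame({'a': [1, 4]})

# Outer alignment on the union index 0..3; `other` has nothing for rows 2 and 3.
this_aligned, other_aligned = this.align(other)

print(this_aligned['a'].dtype)   # still an integer dtype: no rows were added to `this`
print(other_aligned['a'].dtype)  # float64: NaN had to be inserted for rows 2 and 3

# Casting the padded column to an integer dtype raises, as in the traceback.
try:
    other_aligned['a'].astype('int64')
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print(type(exc).__name__, exc)
```

The exact error message differs between pandas versions, but it is a `ValueError` in each case.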

Expected Output

   a
0  0
1  1
2  3
3  5

It does work when at least one item is a float:

DataFrame({'a': [0.0, 1, 3, 5]}).combine_first(DataFrame({'a': [1, 4]}))

     a
0  0.0
1  1.0
2  3.0
3  5.0

I am aware that an integer Series cannot hold NAs, but there is no need to introduce NAs here. I do like that the Series is not silently upcast to float, though.
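Until a fix is available, the float case above suggests a workaround: upcast the integer columns of the calling frame, run `combine_first`, then restore the integer dtype for every column that came out free of NaN. A minimal sketch, assuming the float64 round trip is acceptable for your data (float64 represents integers exactly only up to 2**53); `combine_first_keep_int` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def combine_first_keep_int(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical workaround helper, not part of pandas.

    Upcasts the integer columns of `left` to float64 (the case shown to work),
    runs combine_first, then casts back every such column that has no NaN.
    """
    # Remember the original integer dtype of each integer column in `left`.
    int_cols = {col: left[col].dtype for col in left.columns if is_integer_dtype(left[col])}
    upcast = left.astype({col: "float64" for col in int_cols}) if int_cols else left
    result = upcast.combine_first(right)
    for col, dtype in int_cols.items():
        # Only cast back when it is safe: an integer column cannot hold NaN.
        if not result[col].isna().any():
            result[col] = result[col].astype(dtype)
    return result


out = combine_first_keep_int(pd.DataFrame({"a": [0, 1, 3, 5]}), pd.DataFrame({"a": [1, 4]}))
print(out)
```

Columns that still contain NaN after the combine stay float64, since an integer column cannot represent the missing values.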

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-66-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: None
pip: 1.5.4
setuptools: 3.3
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.9.2
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None

jorisvandenbossche added the Bug and Regression labels Nov 18, 2016
jorisvandenbossche (Member) commented Nov 18, 2016

This seems to be a regression from 0.18, as this worked before:

In [1]: DataFrame({'a': [0, 1, 3, 5]}).combine_first(DataFrame({'a': [1, 4]}))
Out[1]: 
   a
0  0
1  1
2  3
3  5

In [2]: pd.__version__
Out[2]: u'0.18.1'

@Dmitrii-I Thanks for the report! You are always welcome to look into what caused this change.
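Since this worked in 0.18.1, a regression test pinned to the reporter's example would guard against it coming back. A minimal pytest-style sketch; the test name is hypothetical and this is not necessarily the test added in #14886. It compares values only, because which pandas versions keep the integer dtype here is not established in this thread.

```python
import pandas as pd
import pandas.testing as tm


def test_combine_first_int_no_na_introduced():
    # Hypothetical test mirroring the reporter's minimal example.
    result = pd.DataFrame({'a': [0, 1, 3, 5]}).combine_first(pd.DataFrame({'a': [1, 4]}))
    expected = pd.DataFrame({'a': [0, 1, 3, 5]})
    # check_dtype=False: only the values and labels are asserted, not int vs float.
    tm.assert_frame_equal(result, expected, check_dtype=False)
```

Switching to `check_dtype=True` would additionally encode the reporter's preference that the column not be upcast to float.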
