combine_first throws ValueError: Cannot convert NA to integer #14687

Closed
Dmitrii-I opened this Issue Nov 18, 2016 · 1 comment

Dmitrii-I commented Nov 18, 2016 edited by jorisvandenbossche

I do not understand why NA needs to be converted to integer when the result does not contain any NAs. Perhaps combine_first needs to do this under the hood?

A small, complete example of the issue

from pandas import DataFrame
DataFrame({'a': [0, 1, 3, 5]}).combine_first(DataFrame({'a': [1, 4]}))

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 3066, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-16-12b973b1b150>", line 1, in <module>
    pd.DataFrame({'a': [0, 1, 3, 5]}).combine_first(pd.DataFrame({'a': [1, 4]}))
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py", line 3787, in combine_first
    return self.combine(other, combiner, overwrite=False)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py", line 3714, in combine
    otherSeries = otherSeries.astype(new_dtype)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py", line 3054, in astype
    raise_on_error=raise_on_error, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 3168, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 3035, in apply
    applied = getattr(b, f)(**kwargs)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 462, in astype
    values=values, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 505, in _astype
    values = _astype_nansafe(values.ravel(), dtype, copy=True)
  File "/usr/local/lib/python3.4/dist-packages/pandas/types/cast.py", line 531, in _astype_nansafe
    raise ValueError('Cannot convert NA to integer')
ValueError: Cannot convert NA to integer
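
The failing frame is `otherSeries.astype(new_dtype)`, which suggests the shorter frame picks up NAs before the cast. The sketch below reproduces that step with public API only. It assumes that `combine` aligns both frames on the union of their indexes first; this thread does not confirm that, and the variable names are illustrative.

```python
import pandas as pd

left = pd.DataFrame({'a': [0, 1, 3, 5]})
right = pd.DataFrame({'a': [1, 4]})

# Outer-align both frames on the union of their indexes (assumed to mirror
# what combine does internally before combining column by column).
left_aligned, right_aligned = left.align(right)

# The shorter frame gains NaN in rows 2 and 3, so its column becomes float.
print(right_aligned['a'].dtype)  # float64
na_mask = right_aligned['a'].isna().tolist()
print(na_mask)

# Casting that aligned column back to an integer dtype is what fails.
cast_failed = False
try:
    right_aligned['a'].astype('int64')
except ValueError as exc:
    cast_failed = True
    print(type(exc).__name__, exc)
```

If that assumption holds, the NAs come from aligning the second frame, not from the combined result, which would explain why the error appears even though the expected output has no NAs.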

Expected Output

   a
0  0
1  1
2  3
3  5

It does work when at least one item is a float:

DataFrame({'a': [0.0, 1, 3, 5]}).combine_first(DataFrame({'a': [1, 4]}))

     a
0  0.0
1  1.0
2  3.0
3  5.0

I am aware that an integer series cannot hold NAs, but there is no need to introduce NAs here. I do like that the series is not silently upcast to float, though.
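
Building on the float observation above, one possible workaround is to do the combination in float and cast back explicitly. This is only a sketch and was not proposed in this thread; the final cast is safe only when the combined result is known to contain no NAs.

```python
import pandas as pd

left = pd.DataFrame({'a': [0, 1, 3, 5]})
right = pd.DataFrame({'a': [1, 4]})

# Combine in float, where NAs introduced along the way are representable.
combined = left.astype('float64').combine_first(right.astype('float64'))

# Cast back to integer explicitly; this raises if any NA is left over.
result = combined.astype('int64')
print(result)
```

The explicit cast keeps the upcast visible in the calling code instead of relying on pandas to choose the result dtype.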

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-66-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: None
pip: 1.5.4
setuptools: 3.3
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.9.2
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None

Owner

jorisvandenbossche commented Nov 18, 2016 edited

This seems to be a regression from 0.18, as this worked before:

In [1]: DataFrame({'a': [0, 1, 3, 5]}).combine_first(DataFrame({'a': [1, 4]}))
Out[1]: 
   a
0  0
1  1
2  3
3  5

In [2]: pd.__version__
Out[2]: u'0.18.1'

@Dmitrii-I Thanks for the report! You are always welcome to look into what could have caused this change.

jorisvandenbossche added this to the 0.19.2 milestone Nov 18, 2016
