pd.to_numeric produces misleading results on DataFrame #11776

Closed
mortada opened this Issue Dec 6, 2015 · 2 comments

Contributor

mortada commented Dec 6, 2015

When pd.to_numeric is called with errors='coerce' on a DataFrame, it doesn't raise; it silently returns the original DataFrame unchanged.

This may be related to the discussion in pydata#11221, since this function currently doesn't support anything beyond 1-d input.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [1, 2, 'foo'], 'b': [2.3, -1, 'bar']})

In [3]: df
Out[3]:
     a    b
0    1  2.3
1    2   -1
2  foo  bar

In [4]: pd.to_numeric(df)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-9febd95a7c0a> in <module>()
----> 1 pd.to_numeric(df)

/Users/mortada_mehyar/code/github/pandas/pandas/tools/util.py in to_numeric(arg, errors)
     94         conv = lib.maybe_convert_numeric(arg,
     95                                          set(),
---> 96                                          coerce_numeric=coerce_numeric)
     97     except:
     98         if errors == 'raise':

/Users/mortada_mehyar/code/github/pandas/pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:52369)()
    518 cdef int64_t iINT64_MIN = <int64_t> INT64_MIN
    519
--> 520 def maybe_convert_numeric(object[:] values, set na_values,
    521                           bint convert_empty=True, bint coerce_numeric=False):
    522     '''

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

In [5]: pd.to_numeric(df, errors='coerce')
Out[5]:
     a    b
0    1  2.3
1    2   -1
2  foo  bar

Note that the last call (errors='coerce') doesn't raise, while the previous one (default errors='raise') does.
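The traceback hints at why: the conversion call sits inside a bare except, and the handler only re-raises when errors == 'raise'. Here is a minimal pure-Python sketch of that pattern. Note that convert_1d and to_numeric_sketch are hypothetical stand-ins written for illustration, not the pandas source, and they model only the error-handling path (real coercion would turn unparsable values into NaN).

```python
def convert_1d(values):
    """Hypothetical stand-in for a converter that only accepts 1-d input."""
    if any(isinstance(v, (list, tuple)) for v in values):
        raise ValueError(
            "Buffer has wrong number of dimensions (expected 1, got 2)"
        )
    return [float(v) for v in values]


def to_numeric_sketch(arg, errors="raise"):
    """Illustrates how a broad except can hide a dimensionality error."""
    try:
        return convert_1d(arg)
    except Exception:
        if errors == "raise":
            raise
        # For any other mode the error is swallowed and the input
        # is handed back untouched, which is the misleading result.
        return arg


nested = [[1, 2], [3, 4]]  # plays the role of the 2-d DataFrame
print(to_numeric_sketch(nested, errors="coerce") is nested)
```

With errors='coerce' the caller gets the same object back with no indication that nothing was converted, which matches Out[5] above.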

Seems like we should either

  1. make pd.to_numeric work with DataFrame or NDFrame in general
  2. simply raise here too if a DataFrame or something more than 1-d is passed
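Roughly, the two options could look like the sketch below. The helper names to_numeric_1d_only and coerce_columns are hypothetical, as is the choice of TypeError and its message; this is only meant to show the shape of each option, not what a fix in pandas would look like.

```python
import pandas as pd


def to_numeric_1d_only(arg, errors="raise"):
    """Option 2 (hypothetical wrapper): reject anything beyond 1-d up front."""
    # Objects without an ndim attribute (lists, tuples) are treated as 1-d here.
    if getattr(arg, "ndim", 1) > 1:
        raise TypeError("arg must be 1-d; got ndim=%d" % arg.ndim)
    return pd.to_numeric(arg, errors=errors)


def coerce_columns(df):
    """Option 1 (hypothetical helper): convert a DataFrame column by column."""
    # DataFrame.apply forwards the keyword argument to pd.to_numeric,
    # which then sees one Series (1-d) at a time.
    return df.apply(pd.to_numeric, errors="coerce")


df = pd.DataFrame({'a': [1, 2, 'foo'], 'b': [2.3, -1, 'bar']})
print(coerce_columns(df))
```

The column-wise version also works as a workaround today, since each column is passed to pd.to_numeric as a 1-d Series.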
Contributor

jreback commented Dec 6, 2015

Best to raise for non-1-d input
(and check pd.to_datetime / pd.to_timedelta for the same).

jreback added this to the 0.18.0 milestone Dec 6, 2015

Contributor

mortada commented Dec 7, 2015

sounds good, I'll send a PR

jreback closed this in #11780 Dec 10, 2015

jreback added a commit that referenced this issue Dec 10, 2015

Merge pull request #11780 from mortada/to_numeric_should_raise_on_df
BUG: to_numeric should raise if input is more than one dimension #11776
fbb09f4