ENH: Support downcasting of nullable dtypes in to_numeric #33013

Closed
jorisvandenbossche opened this issue Mar 25, 2020 · 3 comments · Fixed by #38746
Labels: Enhancement, ExtensionArray, NA - MaskedArrays, Numeric Operations

@jorisvandenbossche
Member

This currently does not work:

In [5]: pd.to_numeric(pd.Series([1, 2, 3], dtype="Int64"), downcast='integer') 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-64a04efaaac0> in <module>
----> 1 pd.to_numeric(pd.Series([1, 2, 3], dtype="Int64"), downcast='integer')

~/scipy/pandas/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast)
    179             for dtype in typecodes:
    180                 if np.dtype(dtype).itemsize <= values.dtype.itemsize:
--> 181                     values = maybe_downcast_to_dtype(values, dtype)
    182 
    183                     # successful conversion

~/scipy/pandas/pandas/core/dtypes/cast.py in maybe_downcast_to_dtype(result, dtype)
    141         dtype = np.dtype(dtype)
    142 
--> 143     converted = maybe_downcast_numeric(result, dtype, do_round)
    144     if converted is not result:
    145         return converted

~/scipy/pandas/pandas/core/dtypes/cast.py in maybe_downcast_numeric(result, dtype, do_round)
    234                     return new_result
    235             else:
--> 236                 if np.allclose(new_result, result, rtol=0):
    237                     return new_result
    238 

<__array_function__ internals> in allclose(*args, **kwargs)

~/miniconda3/envs/dev/lib/python3.7/site-packages/numpy/core/numeric.py in allclose(a, b, rtol, atol, equal_nan)
   2169 
   2170     """
-> 2171     res = all(isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan))
   2172     return bool(res)
   2173 

<__array_function__ internals> in isclose(*args, **kwargs)

~/miniconda3/envs/dev/lib/python3.7/site-packages/numpy/core/numeric.py in isclose(a, b, rtol, atol, equal_nan)
   2268 
   2269     xfin = isfinite(x)
-> 2270     yfin = isfinite(y)
   2271     if all(xfin) and all(yfin):
   2272         return within_tol(x, y, atol, rtol)

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I suppose this is because a conversion to an object-dtype array is happening somewhere under the hood.
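To illustrate that guess, here is a minimal sketch using only NumPy. It assumes the nullable values have already ended up in an object-dtype ndarray by the time the comparison runs; the helper `isfinite_error` is hypothetical and exists only for this illustration. It shows that `np.isfinite`, the call at the bottom of the traceback, rejects object arrays while accepting a plain `int8` array.

```python
import numpy as np

# Assumption: somewhere in the downcast path the nullable integer values
# have been converted to an object-dtype ndarray.
as_object = np.array([1, 2, 3], dtype=object)
downcast = np.array([1, 2, 3], dtype=np.int8)


def isfinite_error(arr):
    """Hypothetical helper: return the TypeError raised by np.isfinite(arr),
    or None if the call succeeds."""
    try:
        np.isfinite(arr)
    except TypeError as exc:
        return exc
    return None


# A plain integer array is fine; the object array triggers the TypeError.
print(isfinite_error(downcast))
print(type(isfinite_error(as_object)).__name__)
```

If the assumption holds, any comparison that reaches `np.isfinite` with the object array fails in this way, regardless of the values it contains.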

@jorisvandenbossche added the NA - MaskedArrays label Mar 25, 2020
@jorisvandenbossche added this to the Contributions Welcome milestone Mar 25, 2020
@yixinxiao7

take

This was referenced Apr 9, 2020
@jreback added the ExtensionArray and Numeric Operations labels Apr 10, 2020
@languitar

Interestingly, this worked with pandas versions before 1.0. I recently upgraded our existing code base from 0.2x to 1.1, and the unmodified code now fails with exactly this error, so this is a regression:

Before 1.0:

In [7]: pd.to_numeric(pd.Series([42], dtype="Int64"), downcast="integer")
Out[7]: 
0    42
dtype: int8

In [8]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.8.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.7.12-arch1-1
machine          : x86_64
processor        : 
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.3
numpy            : 1.19.1
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 47.1.1
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 7.17.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None

After:

In [2]: pd.to_numeric(pd.Series([42], dtype="Int64"), downcast="integer")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-0969c270a2a4> in <module>
----> 1 pd.to_numeric(pd.Series([42], dtype="Int64"), downcast="integer")

/usr/lib/python3.8/site-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast)
    180             for dtype in typecodes:
    181                 if np.dtype(dtype).itemsize <= values.dtype.itemsize:
--> 182                     values = maybe_downcast_to_dtype(values, dtype)
    183 
    184                     # successful conversion

/usr/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in maybe_downcast_to_dtype(result, dtype)
    149         dtype = np.dtype(dtype)
    150 
--> 151     converted = maybe_downcast_numeric(result, dtype, do_round)
    152     if converted is not result:
    153         return converted

/usr/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in maybe_downcast_numeric(result, dtype, do_round)
    241                     return new_result
    242             else:
--> 243                 if np.allclose(new_result, result, rtol=0):
    244                     return new_result
    245 

<__array_function__ internals> in allclose(*args, **kwargs)

/usr/lib/python3.8/site-packages/numpy/core/numeric.py in allclose(a, b, rtol, atol, equal_nan)
   2187 
   2188     """
-> 2189     res = all(isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan))
   2190     return bool(res)
   2191 

<__array_function__ internals> in isclose(*args, **kwargs)

/usr/lib/python3.8/site-packages/numpy/core/numeric.py in isclose(a, b, rtol, atol, equal_nan)
   2286 
   2287     xfin = isfinite(x)
-> 2288     yfin = isfinite(y)
   2289     if all(xfin) and all(yfin):
   2290         return within_tol(x, y, atol, rtol)

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

In [3]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : d9fff2792bf16178d4e450fe7384244e50635733
python           : 3.8.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.7.12-arch1-1
Version          : #1 SMP PREEMPT Fri, 31 Jul 2020 17:38:22 +0000
machine          : x86_64
processor        : 
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.0
numpy            : 1.19.1
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 49.6.0
Cython           : 0.29.21
pytest           : 6.0.1
hypothesis       : 5.25.0
sphinx           : 3.2.0
blosc            : None
feather          : None
xlsxwriter       : 1.3.2
lxml.etree       : 4.5.2
html5lib         : 1.1
pymysql          : None
psycopg2         : 2.8.5 (dt dec pq3 ext lo64)
jinja2           : 2.11.2
IPython          : 7.17.0
pandas_datareader: 0.10.0dev0
bs4              : 4.9.1
bottleneck       : 1.3.2
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.1
numexpr          : 2.7.1
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.2
sqlalchemy       : 1.3.18
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

@jorisvandenbossche
Member Author

Note that it didn't fully work before either (although it didn't raise an error): the resulting dtype is no longer a nullable dtype. And when missing values were present, no downcasting happened at all:

In [2]: pd.to_numeric(pd.Series([42, None], dtype="Int64"), downcast="integer")   
Out[2]: 
0     42
1    NaN
dtype: Int64
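The behaviour being asked for, downcasting the data while keeping the missing-value information, can be sketched with plain NumPy. This is only an illustration under stated assumptions, not how pandas implements it: the hypothetical helper `downcast_masked` takes a separate integer values array and a boolean mask (`True` = missing), and the candidate dtypes are limited to signed integers.

```python
import numpy as np


def downcast_masked(values, mask):
    """Hypothetical helper: cast `values` to the smallest signed integer
    dtype that holds every unmasked entry. `mask` (True = missing) is
    returned unchanged, so the missing-value information is preserved."""
    valid = values[~mask]
    # Neutralise placeholders at masked positions so they cannot overflow.
    filled = np.where(mask, 0, values)
    for candidate in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(candidate)
        if valid.size == 0 or (valid.min() >= info.min and valid.max() <= info.max):
            return filled.astype(candidate), mask
    # Nothing fits (e.g. very large unsigned values): leave the input as-is.
    return values, mask


values = np.array([42, 0], dtype=np.int64)  # second entry is a placeholder
mask = np.array([False, True])              # second entry is missing

small_values, small_mask = downcast_masked(values, mask)
print(small_values.dtype)  # int8
print(small_mask)
```

Only the unmasked entries decide the target dtype, so a missing value no longer prevents the downcast, and the mask can be reattached afterwards to rebuild a nullable array.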

@arw2019 assigned arw2019 and unassigned yixinxiao7 Nov 18, 2020
@jreback modified the milestones: Contributions Welcome, 1.3 Dec 30, 2020