BUG: pd.to_numeric doesn't return numeric dtype #47408

Open
2 of 3 tasks
quentinblampey opened this issue Jun 17, 2022 · 5 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions

Comments

@quentinblampey

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

s = pd.Series([3, np.array(2.1), "a"])
print(pd.to_numeric(s, errors="coerce"))

# 0      3
# 1    2.1
# 2      a
# dtype: object

Issue Description

pd.to_numeric returns a Series with dtype object, although the documentation says it should return a float or int dtype. In addition, "a" is not converted to NaN despite errors="coerce".

I know that having a np.array() value inside a Series is uncommon. I was actually trying to validate the inputs that users of my package may provide and I wanted to make sure I get a numeric dtype, but I found out that pd.to_numeric is actually not completely bulletproof.

Note that it works as expected with pd.Series([3, np.array(2)]) and pd.Series([3, "a"]) though.
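For the input-validation use case described above, one possible workaround is to unwrap 0-dimensional arrays into plain Python scalars before calling pd.to_numeric. This is a minimal sketch, not a fix in pandas itself; the helper name to_numeric_unwrapping_scalars is hypothetical, and it assumes only 0-dimensional arrays need special handling.

```python
import numpy as np
import pandas as pd


def to_numeric_unwrapping_scalars(values):
    """Hypothetical helper: unwrap 0-d arrays, then coerce to a numeric dtype."""
    # ndarray.item() turns a 0-dimensional array into a plain Python scalar.
    cleaned = [
        v.item() if isinstance(v, np.ndarray) and v.ndim == 0 else v
        for v in values
    ]
    # Keep the original index when the input is already a Series.
    index = values.index if isinstance(values, pd.Series) else None
    unwrapped = pd.Series(cleaned, index=index, dtype=object)
    return pd.to_numeric(unwrapped, errors="coerce")


result = to_numeric_unwrapping_scalars([3, np.array(2.1), "a"])
print(result)
```

With the values from the report, this yields a float64 Series in which "a" becomes NaN. Arrays with one or more dimensions are passed through unchanged, so they would still need separate handling.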

Expected Behavior

It should return a Series with a numeric dtype, i.e. either the following if pandas considers that np.array(2.1) can be converted into a number

0    3.0
1    2.1
2    NaN
dtype: float64

Or this otherwise

0    3.0
1    NaN
2    NaN
dtype: float64
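Until the behavior is settled, a caller that relies on a numeric result could guard against an object-dtype return explicitly. The sketch below uses pandas.api.types.is_numeric_dtype; the helper name ensure_numeric is hypothetical and only illustrates the check.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def ensure_numeric(result: pd.Series) -> pd.Series:
    """Hypothetical guard: raise if a conversion result is not numeric."""
    if not is_numeric_dtype(result.dtype):
        raise TypeError(f"expected a numeric dtype, got {result.dtype}")
    return result


# A case the report describes as working: "a" is coerced to NaN.
converted = ensure_numeric(pd.to_numeric(pd.Series([3, "a"]), errors="coerce"))
print(converted.dtype)  # float64
```

The guard fails loudly instead of letting an object-dtype Series propagate into later computations.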

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.9.7.final.0
python-bits : 64
OS : Darwin
OS-release : 21.5.0
Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.4.2
numpy : 1.21.5
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3
setuptools : 58.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.30.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.0.1
matplotlib : 3.4.3
numba : 0.55.1
numexpr : 2.8.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
snappy : None
sqlalchemy : None
tables : 3.7.0
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

@quentinblampey quentinblampey added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 17, 2022
@dmashley

The problem is in the function maybe_convert_numeric:

File "pandas\_libs\lib.pyx", line 2291, in pandas._libs.lib.maybe_convert_numeric
TypeError: len() of unsized object

@phofl
Member

phofl commented Jun 18, 2022

Hi, thanks for your report.

It does not work as expected even when you remove the "a". Calling np.array(2.1) creates a 0-dimensional NumPy array, which does not seem to support len().
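The 0-dimensional behavior can be checked with NumPy alone. This small sketch only illustrates the array properties mentioned above; it does not touch the pandas internals.

```python
import numpy as np

arr = np.array(2.1)
print(arr.ndim, arr.shape)  # 0 ()

# len() is undefined for an array without a first axis.
try:
    len(arr)
    len_error = None
except TypeError as exc:
    len_error = str(exc)
print(len_error)  # len() of unsized object

# item() extracts the value as a plain Python float.
scalar = arr.item()
print(scalar)  # 2.1
```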

@ivan-afonichkin

Hello, I'm a newcomer and would like to take this as my first issue if that's OK :).

@ivan-afonichkin

take

@simonjayhawkins simonjayhawkins added Numeric Operations Arithmetic, Comparison, and Logical operations and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 20, 2022
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Jun 20, 2022
@simonjayhawkins
Member

> Hello, I'm a newcomer and would like to take this as my first issue if that's OK :).

Go for it.

It appears that the issue could be in compiled code. If you're comfortable with that, great.

Otherwise, feel free to look for issues labelled "good first issue" or bug reports that reference the Python code.

@simonjayhawkins simonjayhawkins added Dtype Conversions Unexpected or buggy dtype conversions and removed Numeric Operations Arithmetic, Comparison, and Logical operations labels Jun 20, 2022
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022