BUG: conversion of float32 to string shows too much precision #36451

Closed
jorisvandenbossche opened this issue Sep 18, 2020 · 4 comments · Fixed by #36464
Labels
Bug Regression Functionality that used to work in a prior pandas version Strings String extension data type and string data

@jorisvandenbossche
Member

On master (but also on 1.1):

In [4]: pd.Series([0.1], dtype="float64").astype("string")
Out[4]: 
0    0.1
dtype: string

In [5]: pd.Series([0.1], dtype="float32").astype("string")
Out[5]: 
0    0.10000000149011612
dtype: string

When converting to the object-dtype string, it works as expected:

In [6]: pd.Series([0.1], dtype="float64").astype("str")
Out[6]: 
0    0.1
dtype: object

In [7]: pd.Series([0.1], dtype="float32").astype("str")
Out[7]: 
0    0.1
dtype: object

cc @topper-123

@jorisvandenbossche
Member Author

jorisvandenbossche commented Sep 18, 2020

Correction: the behaviour above is from the released 1.1.0. On 1.1.2 and master, astype(str) has the same bug:

In [1]: pd.Series([0.1], dtype="float64").astype("string")  
Out[1]: 
0    0.1
dtype: string

In [2]: pd.Series([0.1], dtype="float32").astype("string")
Out[2]: 
0    0.10000000149011612
dtype: string

In [3]: pd.Series([0.1], dtype="float64").astype("str")
Out[3]: 
0    0.1
dtype: object

In [4]: pd.Series([0.1], dtype="float32").astype("str") 
Out[4]: 
0    0.10000000149011612
dtype: object

@jorisvandenbossche jorisvandenbossche added Regression Functionality that used to work in a prior pandas version Strings String extension data type and string data labels Sep 18, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.2 milestone Sep 18, 2020
@jreback jreback modified the milestones: 1.2, 1.1.3 Sep 19, 2020
@dsaxton
Member

dsaxton commented Sep 19, 2020

ref #35519 which I think caused the regression for str

@jorisvandenbossche jorisvandenbossche modified the milestones: 1.1.3, 1.2 Sep 19, 2020
@topper-123
Contributor

topper-123 commented Sep 19, 2020

This is a bit odd. A straight str conversion (as is done in #35519) seems like it should work:

>>> str(np.float64(0.1))
'0.1'  # ok
>>> str(np.float32(0.1))
'0.1'  # ok

But I see the function added in #35519 gives a different string:

>>> f64 = np.array([0.1], dtype="float64")
>>> pd._libs.lib.ensure_string_array(f64)
array(['0.1'], dtype=object)  # ok
>>> f32 = np.array([0.1], dtype="float32")
>>> pd._libs.lib.ensure_string_array(f32)
array(['0.10000000149011612'], dtype=object)  # not ok

So something is going wrong in ensure_string_array. I'll look into it.

EDIT: I can see @dsaxton has already figured it out and pushed a PR, Thanks!

@jorisvandenbossche
Member Author

Ah, so the difference is:

In [38]: str(np.float32(0.1))
Out[38]: '0.1'

In [39]: str(float(np.float32(0.1)))
Out[39]: '0.10000000149011612'

(because converting to object dtype converts to builtin float type)
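The object-dtype effect can be seen with plain NumPy, without touching pandas internals. This is only a minimal sketch of the mechanism described here; the variable names are illustrative and it is not meant to mirror what ensure_string_array does line by line.

```python
import numpy as np

f32 = np.array([0.1], dtype="float32")

# Iterating the float32 array yields np.float32 scalars,
# so str() formats each value at float32 precision.
direct = [str(x) for x in f32]
print(type(f32[0]).__name__, direct)

# Casting to object dtype boxes each element as a builtin Python float
# (a float64), so str() now formats the widened value.
boxed = f32.astype(object)
via_object = [str(x) for x in boxed]
print(type(boxed[0]).__name__, via_object)
```

Any code path that goes through object dtype before calling str would therefore be expected to show the longer string.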

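So I suppose the extra decimals appear because the float32 value, once widened to the builtin float (a float64), is no longer the closest double to 0.1, so the stdlib float prints as many digits as it needs to round-trip that value. numpy.float32 prints only the digits needed at float32 precision, which is the behaviour we want.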
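To make the precision argument concrete, here is a small check, assuming only NumPy and the builtin float. The names x32 and x64 are illustrative.

```python
import numpy as np

x32 = np.float32(0.1)
x64 = float(x32)  # widening float32 -> float64 is exact, the value is unchanged

# The widened value is not the double closest to 0.1 ...
same_as_double = (x64 == 0.1)

# ... so the builtin float needs 17 significant digits to round-trip it.
text64 = repr(x64)
round_trips_64 = (float(text64) == x64)

# At float32 precision, the short string "0.1" already identifies the value.
text32 = str(x32)
round_trips_32 = bool(np.float32(text32) == x32)

print(same_as_double, text64, round_trips_64)
print(text32, round_trips_32)
```

Both strings are correct shortest round-trip representations; they differ only in which precision they are round-tripping against.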
