Different precision calling .astype(str) on float numbers #11302

Closed
marcomayer opened this Issue Oct 12, 2015 · 18 comments

marcomayer commented Oct 12, 2015

With pandas 0.16.2:

import pandas as pd
pd.DataFrame([1.12345678901234567890]).astype(str)
               0
0  1.12345678901

With pandas 0.17:

import pandas as pd
pd.DataFrame([1.12345678901234567890]).astype(str)
                    0
0  1.1234567890123457

I read the 0.17 release notes but couldn't figure out why this changed. Is it a bug or a new feature, and if it's a new feature, how can I restore the old behavior?

Contributor

jreback commented Oct 12, 2015

what version of numpy?

marcomayer commented Oct 12, 2015

numpy 1.10.0

Contributor

jreback commented Oct 12, 2015

in both cases?

marcomayer commented Oct 12, 2015

in both cases yes. I updated with conda update pandas, which also updated numpy. Then I downgraded pandas with conda install pandas=0.16.2 and it worked again.

Contributor

jreback commented Oct 12, 2015

This might just be a printing thing, e.g. display.precision changed in 0.17.0.

marcomayer commented Oct 12, 2015

0.16.2:

pd.DataFrame([1.12345678901234567890]).astype(str).to_dict()
{0: {0: '1.12345678901'}}

0.17:

pd.DataFrame([1.12345678901234567890]).astype(str).to_dict()
{0: {0: '1.1234567890123457'}}

Contributor

jreback commented Oct 12, 2015

No, check whether the actual numbers are the same, e.g. df.at[0,0]

marcomayer commented Oct 12, 2015

0.16.2:

pd.DataFrame([1.12345678901234567890]).at[0,0]
1.1234567890123457
pd.DataFrame([1.12345678901234567890]).astype(str).at[0,0]
'1.12345678901'

0.17:

pd.DataFrame([1.12345678901234567890]).at[0,0]
1.1234567890123457
pd.DataFrame([1.12345678901234567890]).astype(str).at[0,0]
'1.1234567890123457'
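
The stored float is the same in both versions; only its string rendering differs. A minimal sketch in plain Python (no pandas involved) of how one float can yield both strings. That the shorter string corresponds to 12-significant-digit formatting is an assumption made for illustration, not something established in this thread; the code only shows that such formatting happens to reproduce it.

```python
# The literal from the report; a float64 cannot hold all 21 digits.
x = 1.12345678901234567890

# Shortest string that parses back to the same float (Python 3 repr).
full = repr(x)

# 12 significant digits; assumed here only because it reproduces the
# shorter string seen in the thread.
short = "%.12g" % x

print(full)   # 1.1234567890123457
print(short)  # 1.12345678901

# Only the full form round-trips to exactly the same float.
print(float(full) == x)   # True
print(float(short) == x)  # False
```

So the shorter string drops digits that the longer one keeps, which is why a test that compares stored strings notices the change even though the numbers are equal.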

Contributor

jreback commented Oct 12, 2015

0.16.2

In [2]: pd.__version__
Out[2]: '0.16.2'

In [3]: np.__version__
Out[3]: '1.10.0'

In [4]: pd.DataFrame([1.12345678901234567890]).astype(str)
Out[4]: 
               0
0  1.12345678901

0.17.0

In [1]: pd.__version__
Out[1]: u'0.17.0'

In [2]: np.__version__
Out[2]: '1.10.0'

In [3]: pd.DataFrame([1.12345678901234567890]).astype(str)
Out[3]: 
               0
0  1.12345678901

This is Python 2.7 on Mac OS X. Please be more specific about your Python version and OS.

marcomayer commented Oct 12, 2015

do you get the same when using .to_dict()?

Also, I used the plain Python console instead of IPython/notebook to make sure it's not a display issue caused by IPython.

I'm running Python 3.4.3 :: Anaconda 2.3.0 (x86_64) on macosx.

Contributor

jreback commented Oct 12, 2015

Python 3.4.3 |Continuum Analytics, Inc.| (default, Mar  6 2015, 12:07:41) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.16.2'
>>> import numpy as np
>>> np.__version__
'1.10.1'
>>> pd.DataFrame([1.12345678901234567890]).astype(str)
               0
0  1.12345678901
>>> pd.DataFrame([1.12345678901234567890]).astype(str).to_dict()
{0: {0: '1.12345678901'}}
>>> quit()

(py3.4_1)bash-3.2$ source deactivate
discarding /Users/jreback/miniconda/envs/py3.4_1/bin from PATH
bash-3.2$ source activate py3.4_2
discarding /Users/jreback/miniconda/bin from PATH
prepending /Users/jreback/miniconda/envs/py3.4_2/bin to PATH
(py3.4_2)bash-3.2$ python
Python 3.4.3 |Continuum Analytics, Inc.| (default, Mar  6 2015, 12:07:41) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.17.0'
>>> import numpy as np
>>> np.__version__
'1.10.1'
>>> pd.DataFrame([1.12345678901234567890]).astype(str)
                    0
0  1.1234567890123457
>>> pd.DataFrame([1.12345678901234567890]).astype(str).to_dict()
{0: {0: '1.1234567890123457'}}

(numpy 1.10.1 was just released, but it has nothing to do with this.)

Contributor

jreback commented Oct 12, 2015

So it looks like this only happens on py3.

Contributor

jreback commented Oct 12, 2015

So this goes through a slightly different code path than in 0.16.2, but I'm not really sure why this would have changed.

I'll mark it as a bug, though it's odd that you actually rely on this behavior.

@jreback jreback added this to the 0.17.1 milestone Oct 12, 2015

marcomayer commented Oct 13, 2015

Thank you. I'm not sure about the "output-formatting" label, though. Isn't this more of a type-conversion/casting issue (float to str)?

I rely on astype(str) for two things:

  • To cast decimal.Decimal values to strings before saving them to HDF5 files, which is faster than having HDF5 store them as non-optimized objects (at least it was in the past). This still works; the issue only appears with floats.
  • I've built hundreds of unit tests that take DataFrames and call astype(str).to_dict(), then pickle the resulting dicts to files. When a test runs, I load those pickles and compare the contents of each DataFrame. There is probably a better way to do this, but that's what I came up with at some point. Because of this I also had issues with the new date format, since it prints differently, but that was documented in the release notes, so I could adjust by doing data['date'] = pd.to_datetime(data.date).map(lambda x: str(x.to_datetime64()).replace('NaT','nan')). Once I have verified that the results are fine, I'll be able to rewrite the pickle files without those conversions, but first I have to make sure that no number differs at any decimal place (or figure out and understand why it does).

So I'll now try to find a way to get through the unit tests with 0.17, since I'd like to upgrade for the new features and optimizations. If you have an idea for a quick workaround, let me know...

marcomayer commented Oct 13, 2015

Regarding a workaround, this helps me for now to get through the unit-tests:

df.applymap(lambda x: str(x)).to_dict() instead of df.astype(str).to_dict()

Another difference I noticed is when np.NaN is converted to strings:

pd.__version__
'0.16.2'
np.__version__
'1.10.1'
pd.DataFrame([np.NaN]).astype(str).to_dict()
{0: {0: 'nan'}}

pd.__version__
'0.17.0'
np.__version__
'1.10.1'
pd.DataFrame([np.NaN]).astype(str).to_dict()
{0: {0: ''}}
{0: {0: ''}}

To be honest I wonder if it wouldn't be a good idea to get the same results with astype(str) as with the standard python str() function? For me there's a significant difference between an empty string and np.NaN.
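
For readers who want the str()-based workaround spelled out, here is a minimal sketch. It applies Python's built-in str() to each column with Series.map instead of using applymap, which is deprecated in recent pandas releases. The sample frame is made up for illustration, np.nan is used as the spelling that current NumPy accepts, and the result reflects Python 3 behavior rather than a guarantee for any particular pandas version.

```python
import numpy as np
import pandas as pd

# Illustrative data: the float from this issue plus a missing value.
df = pd.DataFrame([1.12345678901234567890, np.nan])

# Apply Python's built-in str() to every element, column by column.
as_text = df.apply(lambda col: col.map(str))

result = as_text.to_dict()
print(result)  # {0: {0: '1.1234567890123457', 1: 'nan'}}
```

Because every element goes through str(), NaN becomes the string 'nan' rather than an empty string, and floats keep their full round-trip representation.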

Contributor

jreback commented Oct 13, 2015

@marcomayer ok, should be fixed in #11309

a better way to compare things is just to use np.allclose (or array_equivalent).
converting to string to compare is not generally a good idea
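
A rough sketch of that suggestion, assuming a test that compares a computed DataFrame against a stored expected one. The column name, the sample values and the tolerance are hypothetical. It uses np.allclose with equal_nan=True and the public pandas.testing.assert_frame_equal helper; array_equivalent is left out because it is assumed here to be an internal pandas function.

```python
import numpy as np
import pandas as pd

# Hypothetical expected and actual results; they differ only by
# floating-point noise in the last few bits.
expected = pd.DataFrame({"price": [1.12345678901234567890, np.nan]})
actual = pd.DataFrame({"price": [1.1234567890123457 + 1e-15, np.nan]})

# Numeric comparison with a relative tolerance; NaNs in the same
# positions count as equal.
same = np.allclose(
    actual.to_numpy(), expected.to_numpy(),
    rtol=1e-9, atol=0.0, equal_nan=True,
)
print(same)  # True

# pandas' own helper also checks dtypes, index and columns, and
# raises AssertionError on a mismatch.
pd.testing.assert_frame_equal(actual, expected)

# Comparing string renderings instead reports a difference.
text_equal = (
    actual["price"].map(str).tolist() == expected["price"].map(str).tolist()
)
print(text_equal)  # False
```

The numeric comparison tolerates differences far below the chosen tolerance, while the string comparison fails on them, which is the fragility the comment above warns about.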

marcomayer commented Oct 13, 2015

that fixed it for me! thanks a lot! I'll also consider np.allclose() for the future.

Marco

@marcomayer marcomayer closed this Oct 13, 2015

jreback added a commit that referenced this issue Oct 13, 2015

Merge pull request #11309 from jreback/astype
REGR: change in output formatting for long floats/nan, #11302