Use shorter float repr when possible #45921
Comments
The current float repr() always calculates the first 17 digits of the … This patch implements an algorithm for finding the shortest string that … The patch also adds a test case, which takes a long list of floating …
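To make "the shortest string that round-trips" concrete, here is a brute-force sketch. It assumes a Python whose float() and format() are already correctly rounded (CPython 3.1+). It is not the algorithm in the patch, which generates digits directly. It simply tries 1, 2, … 17 significant digits and keeps the first string that reads back as the same double. `shortest_repr` is a hypothetical name.

```python
def shortest_repr(x: float) -> str:
    """Return the shortest '%.<n>g' string that reads back as x (hypothetical helper)."""
    # 17 significant digits always identify an IEEE 754 double uniquely,
    # so for finite x (and infinities) the loop returns by n == 17.
    for n in range(1, 18):
        candidate = f"{x:.{n}g}"
        if float(candidate) == x:
            return candidate
    # Only reached for NaN, which never compares equal to anything.
    raise ValueError(f"no round-tripping string for {x!r}")
```

For example, `shortest_repr(1.1)` gives `'1.1'`, while the fixed 17-digit format `f"{1.1:.17g}"` gives `'1.1000000000000001'`.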
I like this, but I don't have time for a complete, thorough review. If Tim has no time, I propose that if it works correctly without leaks … Crys, can you test this on Windows and see if it needs any project file …
Applied in r59457. I had to add checks for _M_X64 and _M_IA64 to doubledigits.c. The rest …
Noam, perhaps you can help with this? We checked this in but found a …
I don't know; for me it works fine, even after downloading a fresh SVN …
I've disabled the new repr() in trunk and py3k until somebody has sorted …
Linux on x86. It seems fine on PPC OSX. This suggests it could be a byte-order bug.
I also use Linux on x86. I think that byte order would cause different …
There is nothing you can do to repr() that's sufficient by itself to … The other necessary part is correctly rounded float /input/ routines. The 754 standard does not require correctly rounding input or output … Again, this cannot be improved cross-platform without Python supplying … http://mail.python.org/pipermail/python-list/2004-July/272167.html Clinger also endorses Gay's code: ftp://ftp.ccs.neu.edu/pub/people/will/retrospective.pdf However, as the Python list link says, Gay's code is "mind-numbingly … There is no easy cross-platform solution, where "cross-platform" means …
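"Correctly rounded input" can be stated as a check: float(s) must be at least as close to the exact decimal value of s as either neighbouring double. A minimal sketch, assuming Python 3.9+ (for `math.nextafter`) and finite, in-range input; `is_correctly_rounded` is a hypothetical name. It uses exact rational arithmetic from `fractions.Fraction` as the reference.

```python
import math
from fractions import Fraction

def is_correctly_rounded(s: str) -> bool:
    """Check that float(s) is a nearest double to the decimal value of s (hypothetical helper)."""
    x = float(s)
    exact = Fraction(s)                 # exact rational value of the decimal string
    error = abs(Fraction(x) - exact)    # Fraction(x) is the exact value of the double
    # No neighbouring double may be strictly closer to the exact value.
    for neighbour in (math.nextafter(x, math.inf), math.nextafter(x, -math.inf)):
        if abs(Fraction(neighbour) - exact) < error:
            return False
    return True
```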
I've written a small C program for auto config that checks the … Tim, does Python run on a non-IEEE-754 platform? Should the new repr() …
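The C program mentioned above isn't shown, but a running Python can make a similar check on its own C double. A minimal sketch; `double_format` is a hypothetical helper. It compares `sys.float_info` against the IEEE 754 binary64 parameters and inspects the native-byte-order encoding of 1.0 (`0x3FF0000000000000`).

```python
import struct
import sys

def double_format() -> str:
    """Classify the platform's C double (hypothetical helper)."""
    fi = sys.float_info
    # IEEE 754 binary64: radix 2, 53-bit significand, DBL_MAX_EXP 1024, DBL_MIN_EXP -1021.
    binary64 = (fi.radix == 2 and fi.mant_dig == 53
                and fi.max_exp == 1024 and fi.min_exp == -1021)
    # Native ('@', the default) mode packs in native byte order.
    native_one = struct.pack("d", 1.0)
    if binary64 and native_one == bytes.fromhex("000000000000f03f"):
        return "IEEE 754, little-endian"
    if binary64 and native_one == bytes.fromhex("3ff0000000000000"):
        return "IEEE 754, big-endian"
    return "unknown"
```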
Again, without replacing float input routines too, this is /not/ good … It is good enough to ensure (well, assuming the code is 100% correct) …
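Round-tripping needs both halves: repr() must emit enough digits, and float() must read them back exactly. On an interpreter where both are correctly rounded (CPython 3.1+ with the short repr), the property can be spot-checked on random bit patterns. A minimal sketch; the function names, trial count and seed are arbitrary. It compares bit patterns so that -0.0 is checked too.

```python
import math
import random
import struct

def random_finite_double(rng: random.Random) -> float:
    """Draw a double uniformly over bit patterns, skipping infinities and NaNs."""
    while True:
        bits = rng.getrandbits(64)
        x = struct.unpack("<d", bits.to_bytes(8, "little"))[0]
        if math.isfinite(x):
            return x

def check_round_trip(trials: int = 10_000, seed: int = 0) -> int:
    """Assert that float(repr(x)) reproduces x bit for bit; return the number checked."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = random_finite_double(rng)
        back = float(repr(x))
        assert struct.pack("<d", back) == struct.pack("<d", x), x
    return trials
```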
Oh, this is sad. Now I know why Tcl has also implemented a decimal to … Perhaps we can simply use both their routines? If I am not mistaken, …
The Tcl code can be found here: … What Tim says gives another reason for using that code: it means that … Just to make sure: IEEE does require that operations on doubles will do …
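IEEE 754 does require the basic operations (+, -, *, / and square root) to be correctly rounded. A minimal sketch that checks a few cases against exact rational arithmetic. It assumes CPython, where `float(Fraction)` rounds to the nearest double, and hardware that computes in plain double precision (e.g. SSE2), since x87 extended precision can double-round. `correctly_rounded` is a hypothetical name.

```python
import operator
from fractions import Fraction

def correctly_rounded(op, a: float, b: float) -> bool:
    """Compare the hardware result of op(a, b) with the correctly rounded exact result."""
    exact = op(Fraction(a), Fraction(b))  # exact rational result of the operation
    return op(a, b) == float(exact)       # float(Fraction) rounds to the nearest double
```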
It's really a shame. It was a nice idea... Could we at least use the new formatting for str(float) and the display …
>>> repr(11./5)
'2.2'
>>> 11./5
2.2000000000000002
>>> str(11./5)
'2.2'
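For comparison with the session above: older CPython formatted floats with a fixed number of significant digits, 17 for repr() (always enough to round-trip) and 12 for str() (shorter, but not always round-tripping). A sketch of just that digit step, assuming those historical precisions; the real functions also appended ".0" to integral values, which is omitted here. Both helper names are hypothetical.

```python
def legacy_repr(x: float) -> str:
    # Old-style repr(): 17 significant digits ('%.17g').
    return f"{x:.17g}"

def legacy_str(x: float) -> str:
    # Old-style str(): 12 significant digits ('%.12g').
    return f"{x:.12g}"
```

With `x = 0.1 + 0.2`, `legacy_str(x)` gives `'0.3'`, which reads back as a different double, whereas the short repr gives `'0.30000000000000004'`.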
I think that for str(), the current method is better: using the new … But I actually think that we should also use Tcl's decimal to binary …
If I think about it some more, why not get rid of all the float … I think that it means: …
This is basically what Tcl did, if I understand correctly; see item 6 …
Noam Raphael wrote: …
No, that's not correct. The standard defines that nan is always unequal …
False
>>> float("inf") == float("inf")
True
>>> float("inf") == -1*float("-inf")
True
The float module could gain three singletons nan, inf and neginf so you …
True
Christian
That's right, but the standard also defines that 0.0/0 -> nan, and …
I propose that we add three singletons to the float implementation: PyFloat_NaN … The singletons are returned from PyFloat_FromString() for "nan", "inf" … I've already started to work on a way to create nan, inf and -inf cross … signum(float) -> 1/-1, finity(float) -> bool, infinite(float), nan(float) … Here is my work in progress:
/* Reinterpret the bits of an ieee_double compound literal as a double
   (type punning through a pointer cast). */
#define Py_IEEE_DBL *(double*)&(ieee_double)
#if defined(LITTLE_ENDIAN_IEEE_DOUBLE) && !defined(BIG_ENDIAN_IEEE_DOUBLE)
/* Little-endian: low mantissa bits first, sign bit last.
   Note that bit-field allocation order is implementation-defined in C. */
typedef struct {
    unsigned int m4 : 16;
    unsigned int m3 : 16;
    unsigned int m2 : 16;
    unsigned int m1 : 4;
    unsigned int exp : 11;
    unsigned int sign : 1;
} ieee_double;
/* NaN: exponent and mantissa all ones. +/-inf: exponent all ones, mantissa zero. */
#define Py_IEEE_NAN Py_IEEE_DBL{0xffff, 0xffff, 0xffff, 0xf, 0x7ff, 0}
#define Py_IEEE_INF Py_IEEE_DBL{0, 0, 0, 0, 0x7ff, 0}
#define Py_IEEE_NEGINF Py_IEEE_DBL{0, 0, 0, 0, 0x7ff, 1}
#elif !defined(LITTLE_ENDIAN_IEEE_DOUBLE) && defined(BIG_ENDIAN_IEEE_DOUBLE)
/* Big-endian: sign bit first, low mantissa bits last. */
typedef struct {
    unsigned int sign : 1;
    unsigned int exp : 11;
    unsigned int m1 : 4;
    unsigned int m2 : 16;
    unsigned int m3 : 16;
    unsigned int m4 : 16;
} ieee_double;
#define Py_IEEE_NAN Py_IEEE_DBL{0, 0x7ff, 0xf, 0xffff, 0xffff, 0xffff}
#define Py_IEEE_INF Py_IEEE_DBL{0, 0x7ff, 0, 0, 0, 0}
#define Py_IEEE_NEGINF Py_IEEE_DBL{1, 0x7ff, 0, 0, 0, 0}
#else
#error Unknown or no IEEE double implementation
#endif
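The same three bit patterns can be built from Python to see what they decode to. A minimal sketch; `double_from_bits`, the `IEEE_*` constants and `signum` are hypothetical names, and `signum` follows the proposed `signum(float) -> 1/-1`. Python's `math` module already provides `isnan`, `isinf` and `isfinite`, which play the roles of the proposed `nan()`, `infinite()` and `finity()`.

```python
import math
import struct

def double_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 double (hypothetical helper)."""
    # '>d' is struct's standard big-endian IEEE 754 binary64 format.
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]

# Same patterns as Py_IEEE_NAN, Py_IEEE_INF and Py_IEEE_NEGINF in the C code:
# 1 sign bit, 11 exponent bits (all set), 52 mantissa bits.
IEEE_NAN = double_from_bits(0x7FFF_FFFF_FFFF_FFFF)     # sign 0, mantissa all ones
IEEE_INF = double_from_bits(0x7FF0_0000_0000_0000)     # sign 0, mantissa zero
IEEE_NEGINF = double_from_bits(0xFFF0_0000_0000_0000)  # sign 1, mantissa zero

def signum(x: float) -> int:
    """Return 1 or -1 according to the sign bit of x (hypothetical helper)."""
    # copysign reads the sign bit, so -0.0 and -inf both give -1.
    return int(math.copysign(1.0, x))
```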
(1) Despite Tim's grave language, I don't think we'll need to write our …
(1a) Perhaps it's better to only do this for Python 3.0, which has a …
(2) Have you two (Christian and Noam) figured out yet why repr(1e5) is …
(3) Detecting NaNs and infs in a platform-independent way is tricky, …
(4) Looks like we've been a bit too hasty checking this in. Let's be …
Guido van Rossum wrote: …
+1
I wasn't able to reproduce the problem on my workstations. Can you give …
In fact, detecting NaNs and Infs in a platform-independent way is dead easy IFF … The form (normalized or denormalized) is irrelevant for the detection of … We can also use the much slower version and check if x > DBL_MAX for …
Christian
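Christian's two detection tests, sketched in Python under IEEE 754 comparison semantics: a NaN is the only value unequal to itself, and anything larger in magnitude than DBL_MAX (`sys.float_info.max`) must be an infinity. In C, the `x != x` test can be broken by optimizations such as GCC's `-ffast-math`. `is_nan` and `is_inf` are hypothetical names; `math.isnan` and `math.isinf` serve as the reference.

```python
import math
import sys

def is_nan(x: float) -> bool:
    # A NaN compares unequal to every value, including itself.
    return x != x

def is_inf(x: float) -> bool:
    # The "slower version": beyond DBL_MAX in magnitude means infinite.
    # Every ordered comparison with a NaN is false, so NaN is not reported.
    return x > sys.float_info.max or x < -sys.float_info.max
```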
No, traditionally Python has just used whatever C's double provides. There are some places that benefit from IEEE 754, but few that require …
So far I have only one box where it is broken (even after make clobber … It's an x86 Linux box running in mixed 64-32 bit mode. From /etc/lsb-release: "Ubuntu 6.06 LTS" … I'm afraid I'll have to debug this myself, but not today.
Only for IEEE 754 though.
Of course the latter isn't guaranteed to help for non-IEEE-754 …
ISTM that years of toying with Infs and NaNs has not … I recommend punting on the subject of NaNs. Attempting … The decimal module provides full support for NaNs.
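For contrast with the float behaviour discussed in this thread, the decimal module distinguishes quiet and signaling NaNs. A short sketch under the default context, where the InvalidOperation trap is enabled; `signals_invalid` is a hypothetical helper.

```python
from decimal import Decimal, InvalidOperation

def signals_invalid(operation) -> bool:
    """Return True if calling operation() raises decimal.InvalidOperation."""
    try:
        operation()
    except InvalidOperation:
        return True
    return False

quiet = Decimal("NaN")       # quiet NaN: propagates through arithmetic silently
signaling = Decimal("sNaN")  # signaling NaN: arithmetic on it signals InvalidOperation
```

Under the default context, `quiet + 1` is another quiet NaN, `signaling + 1` raises, and an ordering comparison such as `quiet < 1` also raises, while `quiet != quiet` is simply `True`.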
[Raymond]
Works as intended in 2.5; this is Windows output: …
1.#INF
>>> nan = inf - inf
>>> nan # really is a NaN
-1.#IND
>>> nan is nan # of course this is true
True
>>> nan == nan # but not equal anyway
False
+1 on the fallback strategy for platforms we don't know how to handle.
Eric and I have set up a branch of py3k for work on this issue. URL for … http://svn.python.org/projects/python/branches/py3k-short-float-repr
My changes on the py3k-short-float-repr branch include: …
Remaining to be done: …
So work on the py3k-short-float-repr branch is nearing completion, and … A proposal: I propose that the short float representation should be … Eric's summarized his changes above. Here are mine (mostly---some …
>>> x = 2e16+8.  # 2e16+8. is exactly representable as a float
>>> x
20000000000000010.0
There's no way that this padding with bogus digits can happen for …
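Why "20000000000000010.0" is misleading: the stored double is exactly 20000000000000008, and the trailing "10" are not its digits. A short check, assuming Python 3.9+ (for `math.ulp`); the variable names are illustrative.

```python
import math

x = 2e16 + 8.0                   # exactly representable
ulp = math.ulp(x)                # spacing of doubles at this magnitude: 4.0
exact_digits = str(int(x))       # the double's exact value: '20000000000000008'
seventeen = f"{x:.17g}"          # old-style 17 significant digits: '20000000000000008'
padded = "20000000000000010.0"   # shortest digits padded with zeros, as printed above
```

`float(padded) == x` still holds, but only because 20000000000000010 lies exactly halfway between x and the next double up (x + 4), and round-half-even picks x, whose significand is even.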
Changing target Python versions. I'll upload a patchset to Rietveld sometime soon (later today, I hope).
I've uploaded the current version to Rietveld: …
The Rietveld patch set doesn't show the three new files, which are: Python/dtoa.c …
Those three missing files have now been added to Rietveld. Just for reference, in case anyone else encounters this: the reason those … svn revert Python/dtoa.c (and similarly for the other two files) fixed this.
On Tue, Apr 7, 2009 at 3:10 AM, Mark Dickinson <report@bugs.python.org> wrote: …
In principle that's fine with me.
Historically, we've had a stronger requirement: if you print repr(x) … Now that pickle and marshal no longer use repr() for floats I think … In order to make progress I recommend that we just note this and don't …
I think ANY attempt to rely on eval(repr(x))==x is asking for trouble, … Example: The following C code can vary *even* on an IEEE 754 platform, …
#include <stdio.h>
int main(void) {
    double x, y;
    x = 3.0/7.0;
    y = x;
    /* ... code that doesn't touch/read x or y ... */
    printf(" x==y: %s\n", (x==y) ? "true" : "false");
    return 0;
}
So, how can we hope that eval(repr(x))==x is EVER stable? Equality and … (Code above based on …
Hmm. With the py3k-short-float-repr stuff, we should be okay moving … But for a CPython-generated short repr to give the correct value on some … For safety's sake, I'll make sure that marshal (version 1) and pickle …
I disagree. I've read the paper you refer to; nevertheless, it's still …
It *is* true that the correctness of Gay's code depends on the FPU being …
The process that you describe in msg85741 is a way of ensuring … I'd be interested to see if you could say that the Python object … My pedantic mind would strip any and all references to floating-point …
The py3k-short-float-repr branch has been merged to py3k in two parts: r71663 is mostly concerned with the inclusion of David Gay's code into the … r71665 contains Eric's *mammoth* rewrite and upgrade of all the float … Note: the new code doesn't give short float repr on *all* platforms, …
So: …
So on non-Windows x86 platforms that *aren't* using gcc and *do* exhibit … The most prominent platform that I can think of that's affected by this … Note that if any of the above heuristics is wrong and we end up using …
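Since the short repr is not available on every platform, CPython 3.1+ (and 2.7+) records which style a build uses in `sys.float_repr_style`: `'short'` or `'legacy'`. A minimal sketch that reports it; `describe_repr_style` is a hypothetical name.

```python
import sys

def describe_repr_style() -> str:
    """Report which float repr() this interpreter was built with (hypothetical helper)."""
    # 'short': shortest round-tripping repr; 'legacy': the old 17-significant-digit repr.
    style = sys.float_repr_style
    example = repr(0.1)  # '0.1' under 'short', '0.10000000000000001' under 'legacy'
    return f"{style}: repr(0.1) == {example!r}"
```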
Hello folks, IIUC, autoconf tries to enable SSE2 by default without asking. Isn't it … Or am I misunderstanding what the changes are?
Yes, I think you're right. Perhaps the SSE2 support should be turned into an --enable-sse2 configure … Disabling SSE2 would just mean that all x86/gcc systems would end up using …
Perhaps better to drop the SSE2 bits completely. Anybody who … CC="gcc -msse2 -mfpmath=sse" configure && ... Unless there are objections, I'll drop everything involving SSE2 from … It's a bit of a shame, though: it's definitely desirable to be using …
SSE2 detection and flags removed in r71723. We'll see how the buildbots …
Is there a way to use SSE when available and x86 when it's not? IIRC, …
Probably, but I don't think there is any point in doing so. The main … The situation is different for specialized packages like numpy, but …
The advantage is accuracy. No double rounding. This will also help the …
[Raymond] …
I guess it's possible in theory, but I don't know of any way to do this in … Antoine: as Raymond said, the advantage of SSE2 for numeric work is … Those difficulties can be *mostly* dealt with by setting the x87 rounding … There's a very nice paper by David Monniaux that covers all this: http://hal.archives-ouvertes.fr/hal-00128124/en/ An example: in Python (any version), try this:
>>> 1e16 + 2.9999
10000000000000002.0
On OS X, Windows and FreeBSD you'll get the answer above. On 32-bit Linux/x86 or Solaris/x86 you'll likely get the answer 10000000000000004.0 instead, because Linux doesn't (usually?) change the Intel default … </standard x87 rant>
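The two answers can be reproduced with exact arithmetic. A minimal sketch using `fractions.Fraction`: single rounding goes straight to the nearest double, while the x87 default (64-bit significand) is simulated by first rounding to the extended-precision grid and then to double. The grid spacing 2**-10 is hard-coded and only valid for results in [2**53, 2**54), which covers this example; the variable names are illustrative.

```python
from fractions import Fraction

a, b = 1e16, 2.9999
exact = Fraction(a) + Fraction(b)     # exact sum of the two doubles

# Single rounding (SSE2, or x87 set to 53-bit precision): nearest double.
single = float(exact)

# Double rounding (x87 default precision), simulated: round to the 64-bit-significand
# grid first, then to double. round() on a Fraction rounds half to even.
step = Fraction(1, 2**10)
extended = round(exact / step) * step  # 10000000000000003 exactly
double_rounded = float(extended)       # a tie, resolved to the even significand
```

The exact sum is just below 10000000000000003, so it is nearer to 10000000000000002; the intermediate rounding to 10000000000000003 creates a tie that round-half-even resolves upward to 10000000000000004.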
Just came across this bug. I don't want to reopen this or anything, but regarding the SSE2 code I couldn't help wondering: why can't you just detect the presence of SSE2 when the interpreter starts up and then switch implementations based on that? I think that's what liboil does (http://liboil.freedesktop.org/wiki/).
This will probably look counterintuitive, but the underlying floating-point arithmetic is binary, not decimal. Thus, round(x, n) will sometimes pick up a decimal representation with more than n digits. This was common for CPython < 2.7 and < 3.1:
Python 2.6.6 (r266:84292, Aug 12 2014, 07:57:07) [GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> round(0.42754287856598971, 2)
0.42999999999999999
And with this patch as well:
Python 3.12.5+ (heads/3.12:0181aa2e3e, Aug 29 2024, 14:55:08) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mpmath import *
>>> round(mp.mpf(0.42754287856598971), 2)
mpf('0.42999999999999999')
>>> mp.pretty = True
>>> round(mp.mpf(0.42754287856598971), 2)
0.43
See also python/cpython#45921
Closes mpmath#455
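Both outputs above describe the same double; only the formatting differs. A short illustration using the standard library, assuming CPython 3.1+ with the short repr: `round()` returns the double nearest to 0.43, the short repr shows `'0.43'`, the old 17-digit format shows `'0.42999999999999999'`, and `decimal.Decimal` reveals the exact stored value, slightly below 0.43.

```python
from decimal import Decimal

x = 0.42754287856598971
r = round(x, 2)        # nearest double to 0.43; no double equals 0.43 exactly

shortest = repr(r)     # shortest round-tripping string: '0.43'
legacy = f"{r:.17g}"   # 17 significant digits, as pre-2.7/3.1 repr() showed
exact = Decimal(r)     # exact binary value of the double
```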
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.