Inconsistent handling of integer overflow between Windows and Linux. #8433
While contributing to scikit-learn, I've uncovered inconsistent behavior between Windows and Linux.
Running this line produces different results on Windows and Linux, regardless of whether Python is 32- or 64-bit:
np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32).trace()
Obviously it would be good to test the exact same Python versions for completeness, but I doubt the result will change. I don't have consistent access to a Windows machine, but I can install 3.6 on Ubuntu and confirm that the result doesn't magically change to -2.
This behavior is not limited to the trace function:
np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32).sum(axis=0)
shows similar behavior.
Somehow on Ubuntu the result is automatically upcast, but on Windows the result overflows and remains an int32. It seems that one of these behaviors should be preferred (ideally the upcasting Ubuntu version).
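A minimal sketch of a portable workaround (assuming only NumPy): passing an explicit `dtype` to `trace()` or `sum()` fixes the accumulator width, sidestepping the platform-dependent default entirely.

```python
import numpy as np

# 2x2 array of int32 maxima; summing any two entries overflows int32.
a = np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32)

# An explicit 64-bit accumulator gives the same result on every platform.
print(a.trace(dtype=np.int64))        # 4294967294
print(a.sum(axis=0, dtype=np.int64))  # [4294967294 4294967294]
```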
The default integer precision varies from platform to platform and looks to be that of a C long. We may want to change it to be 32/64 bits depending on the platform bitness.
@charris, OK, this makes sense. Can you clarify what defines the platform in this context? Is the platform at the level of the OS (e.g. Windows 7, Windows 10, Ubuntu 16.06, Gentoo 2.1)? Does it depend on whether the processor is 64- or 32-bit, and/or on the C/Fortran compiler the NumPy libraries were built with? I don't think I've ever fully understood this topic, and I'd like to patch up that hole in my knowledge once and for all.
The documentation at https://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html seems to imply that the platform is determined by whether the processor is 32- or 64-bit. However, the behavior above suggests it depends on the operating system as well: otherwise the test on 64-bit Windows would produce the same result as 64-bit Linux, whereas it actually agrees with 32-bit Windows.
It's how many bits there are in a C long in the C compiler that numpy was compiled with. In practice though this is always the same for a given (OS, bitness) pair. Basically: long is 32 bits on 32 bit builds, and 64 bits on 64 bit builds, except on Windows long is always 32 bits. (Google "LP64" and "LLP64" for lots and lots more details.) The original motivation for this is that Python int objects are traditionally the same as a C long. (Though Python 3 got rid of this and Python int objects are now arbitrary precision.)
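This is easy to verify directly. A small sketch, assuming CPython with `ctypes` available and a NumPy from the era of this thread (where the default integer `np.int_` maps to C long):

```python
import ctypes
import numpy as np

# Width of C long in the compiler this Python was built with:
# 64 on LP64 platforms (64-bit Linux/macOS), 32 on Windows (LLP64)
# and on 32-bit builds generally.
print(ctypes.sizeof(ctypes.c_long) * 8)

# NumPy's default integer type, which is what trace()/sum()
# accumulate in when no dtype is given.
print(np.dtype(np.int_).itemsize * 8)
```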
I think the simplest change would be to always use 32 bits on 32 bit platforms and 64 bits on 64 bit platforms. It would be a small compatibility break, but seeing as things are already incompatible across platforms that doesn't bother me much.
The big hammer would be to make all accumulations default to 64 bits, but I don't think that is really necessary.
I would be interested to see how much stuff broke if we were to do this (e.g. try enabling it and then run a few projects test suites). Keeping 32-on-32 and 64-on-64 would certainly not be the end of the world, but just eliminating this source of tricky breakage entirely would be nice if we could get away with it...
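The two proposals above can be approximated today by pinning the accumulator dtype explicitly; a hedged sketch (the dtype choices below are illustrations, not NumPy's actual defaults):

```python
import numpy as np

a = np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32)

# "32-on-32, 64-on-64": accumulate in the pointer-sized integer,
# which matches the build's bitness even on Windows.
consistent = a.sum(axis=0, dtype=np.intp)

# The "big hammer": always accumulate in 64 bits.
big_hammer = a.sum(axis=0, dtype=np.int64)

print(consistent)
print(big_hammer)  # [4294967294 4294967294] on every platform
```

On a 64-bit build the two agree; on a 32-bit build the `np.intp` version would still overflow, which is the compatibility trade-off being discussed.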