Inconsistent handling of integer overflow between Windows and Linux. #8433

Open
Erotemic opened this Issue Dec 31, 2016 · 6 comments

3 participants

Erotemic commented Dec 31, 2016

While contributing to scikit-learn, I've uncovered inconsistent behavior between Windows and Linux.
The PR this was uncovered in can be found here: scikit-learn/scikit-learn#8094 (comment)

Running this line produces different results on Windows and Linux, regardless of whether Python is 32- or 64-bit.

np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32).trace()
  • The result is -2 on Windows 10 (64bit) using both Python 3.6-64 and Python 3.6-32
  • The result is 4294967294 on Ubuntu 16.04 (64bit) using Python3.5-64 and Python 2.7-64

Obviously it would be good to test the exact same Python versions for completeness, but I doubt the result would change. I don't have consistent access to a Windows machine, but I can install 3.6 on Ubuntu and verify that the result doesn't magically change to -2.

This behavior is not limited to the trace function:

np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32).sum(axis=0)

shows similar behavior: array([4294967294, 4294967294]) on Ubuntu and array([-2, -2]) on Windows.
However, this function returns an array instead of a scalar, so we can inspect the dtype. On Ubuntu the dtype is int64, but on Windows the dtype is int32.

Somehow, on Ubuntu the result is automatically upcast, but on Windows the result overflows and remains an int32. It seems that one of these behaviors should be preferred (ideally the upcasting Ubuntu version).
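A compact reproduction of both observations (which dtype you see depends on the machine, but on any platform the accumulator should match NumPy's default integer, np.int_):

```python
import numpy as np

a = np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32)

# The accumulator dtype follows the platform's default integer (np.int_),
# which is why the result differed between the Linux and Windows runs above.
result = a.sum(axis=0)
print(result, result.dtype)

# The scalar case behaves the same way.
print(a.trace())
```

On the machines reported above, this prints [4294967294 4294967294] with dtype int64 on 64-bit Ubuntu, and [-2 -2] with dtype int32 on Windows.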

Erotemic added a commit to Erotemic/numpy that referenced this issue Dec 31, 2016

charris (Member) commented Dec 31, 2016

The default integer precision varies from platform to platform and looks to be that of C long. We may want to change that to be 32/64 bits depending on the platform.

dtype : dtype, optional
    Determines the data-type of the returned array and of the accumulator
    where the elements are summed. If dtype has the value None and `a` is
    of integer type of precision less than the default integer
    precision, then the default integer precision is used. Otherwise,
    the precision is the same as that of `a`.
charris (Member) commented Dec 31, 2016

The way to deal with that precision problem is to explicitly specify the dtype of the accumulator.
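Concretely, a sketch of that workaround (assuming a 64-bit accumulator is what's wanted): passing dtype to sum or trace fixes the accumulator width and makes the result identical on every platform.

```python
import numpy as np

a = np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32)

# Explicitly request a 64-bit accumulator instead of relying on the
# platform's default integer.
total = a.sum(axis=0, dtype=np.int64)
print(total, total.dtype)   # [4294967294 4294967294] int64, on any platform

# trace() accepts the same parameter.
print(a.trace(dtype=np.int64))  # 4294967294
```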

Erotemic closed this Jan 2, 2017

Erotemic (Author) commented Jan 2, 2017

@charris, OK, this makes sense. Can you clarify what defines the platform in this context? Is the platform at the level of the OS (e.g. Windows 7, Windows 10, Ubuntu 16.04, Gentoo 2.1)? Does it depend on whether the processor is 64- or 32-bit? And/or does it also depend on the C/Fortran compiler the NumPy libraries were built with? I don't think I've ever really fully understood this topic, and I'd like to patch up that hole in my knowledge once and for all.

Looking at the documentation at https://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html, it seems to imply that the platform is determined by whether the processor is 32- or 64-bit. However, this behavior seems to indicate that it depends on the operating system as well; otherwise the test on 64-bit Windows would output the same result as 64-bit Linux, whereas it instead agrees with 32-bit Windows.
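One way to see what "platform" means here: per the comment above, the default integer tracks the C long of the compiler NumPy was built with, and on 64-bit Windows (an LLP64 ABI) C long is still 32 bits, while on 64-bit Linux (LP64) it is 64 bits. A quick inspection sketch (output depends on the machine running it):

```python
import ctypes
import sys

import numpy as np

# Size of the C long in the platform ABI: 4 bytes on Windows
# (even 64-bit Windows, which is LLP64), 8 bytes on 64-bit Linux.
print(ctypes.sizeof(ctypes.c_long))

# NumPy's default integer type (historically tied to C long).
print(np.dtype(np.int_).itemsize)

# Pointer-sized integer, which matches the 32/64-bitness of Python itself.
print(np.dtype(np.intp).itemsize * 8, sys.maxsize.bit_length() + 1)
```

So it is not the OS version or the processor alone, but the C ABI the interpreter and NumPy were compiled against.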

njsmith (Member) commented Jan 2, 2017

charris (Member) commented Jan 2, 2017

I think the simplest change would be to always use 32 bits on 32 bit platforms and 64 bits on 64 bit platforms. It would be a small compatibility break, but seeing as things are already incompatible across platforms that doesn't bother me much.

The big hammer would be to make all accumulations default to 64 bits, but I don't think that is really necessary.

njsmith (Member) commented Jan 2, 2017

> The big hammer would be to make all accumulations default to 64 bits, but I don't think that is really necessary.

I would be interested to see how much stuff broke if we were to do this (e.g. try enabling it and then run a few projects' test suites). Keeping 32-on-32 and 64-on-64 would certainly not be the end of the world, but just eliminating this source of tricky breakage entirely would be nice if we could get away with it...
