Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

astype(int) silently returns wrong answer under Windows #17640

Closed
Rik-de-Kort opened this issue Oct 26, 2020 · 21 comments
Closed

astype(int) silently returns wrong answer under Windows #17640

Rik-de-Kort opened this issue Oct 26, 2020 · 21 comments

Comments

@Rik-de-Kort
Copy link

Rik-de-Kort commented Oct 26, 2020

Casting an array of float64 to int using astype with as argument int will yield the value -2147483648 under windows.

Reproducing code example:

x = 2384351503.0
np.testing.assert_array_equal(np.array([x]).astype(int), np.array([int(x)]))

I expect this test to go through, but it fails with the following message:

Traceback (most recent call last):
  File "C:\Users\koRR\AppData\Local\conda\conda\envs\LRE\lib\site-packages\IPython\core\interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-725c5d2f7abb>", line 1, in <module>
    np.testing.assert_array_equal(np.array([x]).astype(int), np.array([int(x)]))
  File "C:\Users\koRR\AppData\Local\conda\conda\envs\LRE\lib\site-packages\numpy\testing\_private\utils.py", line 931, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "C:\Users\koRR\AppData\Local\conda\conda\envs\LRE\lib\site-packages\numpy\testing\_private\utils.py", line 840, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 4531835151
Max relative difference: 1.90065733

Using dtypes np.int64 works fine, already np.int32 returns the wrong answer (although that's 0 and the above result occurs only with the builtin int or np.int).

NumPy/Python version information:

Numpy version: 1.19.1
System version: '3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]'
Bug is taking place under Windows 10. Cannot reproduce on my Arch system with numpy 1.19.2 and python 3.8.5 built using GCC.

@eric-wieser
Copy link
Member

eric-wieser commented Oct 26, 2020

Your example is not the same as your error message - the test uses np.array([2384351503], dtype=int), while the error message shows you are using np.array([238435103]). At least on 1.17, the former raises an OverflowError, so I can't even run your example code.

@eric-wieser
Copy link
Member

Bug is taking place under Windows 10. Cannot reproduce on my Arch system with numpy 1.19.2 and python 3.8.5 built using GCC.

int gives you a C long - and long is 32-bit on windows but 64-bit everywhere else.

@Rik-de-Kort
Copy link
Author

Sorry about that, I was typing the message from outside the VM where I got it. Very sloppy on my part. I updated the OP with a proper example.

and long is 32-bit on windows but 64-bit everywhere else.

Thanks for the info. int has unbounded precision in Python so this really took me by surprise. Also it's weird that it didn't raise an overflow error.

@charris
Copy link
Member

charris commented Oct 27, 2020

so this really took me by surprise.

Python 2 had two types of integer: int (C long) and long (unbounded precision). Python 3 only kept the second.

@finoptimal-dev
Copy link

Is there a workaround for this?

@BvB93
Copy link
Member

BvB93 commented Jun 1, 2021

Is there a workaround for this?

Since the issue is caused by an overflow of np.int32 (i.e. the default int-type on windows) you can use np.int64 instead.

@finoptimal-dev
Copy link

Thanks, BvB93. Is there a way to map it globally? I have a codebase that works on linux with astype(int) all over it; now I have a Windows developer. Before I consider changing it everywhere it exists, I'm wondering if there's a solution that spares needing to take that route.

@BvB93
Copy link
Member

BvB93 commented Jun 2, 2021

Thanks, BvB93. Is there a way to map it globally?

I think you might be out of luck here. As far as I'm aware there is no way of manually setting the default integer type to something different from what is specified by the platform in question.

@charris
Copy link
Member

charris commented Jun 3, 2021

There has been some discussion about changing the default type. I believe the use of c_long is inherited from early python.

@seberg
Copy link
Member

seberg commented Jun 3, 2021

I had a PR once to make it np.intp at least, so that 64bit windows would at least use 64bits (making the rule "64bit on 64bit systems").

That was mainly for discussion, but I am more and more seeing a NumPy 2.0 in any case, and this might be important enough to fold in if it happens. (Even if I would prefer to not do too many of such changes at once and rather make this type of "breaking a bit" releases every few years so that the number of such changes are small each time.)

@mattip
Copy link
Member

mattip commented Jun 3, 2021

Is this done correctly in the NEP 47 array API?

@seberg
Copy link
Member

seberg commented Jun 3, 2021

@asmeurer can you check this? From what I remember of the pass I did just today, it probably isn't.

@asmeurer
Copy link
Member

asmeurer commented Jun 3, 2021

I think this issue is relevant data-apis/array-api#151

@seberg
Copy link
Member

seberg commented Jun 3, 2021

@asmeurer the point is that the standard probably wants a well defined "default integer"? But NumPy's default integer is currently not well defined:

  • It differs for windows 64bit compared to linux 64 bit (because long is defined differently) and is also 32bit on 32bit platforms.
  • We automatically "spill" into long long or unsigned long long as noted here. EDIT np.array(2**63)

If we add a new, clean namespace we should try to make good use of it and fix both of these. (unless we are sure we fix both of them in NumPy proper, which I would like to try but is more difficult.)

@asmeurer
Copy link
Member

asmeurer commented Jun 3, 2021

I actually mean this change specifically https://github.com/data-apis/array-api/pull/167/files#diff-7e75cfe3133de16126433bc962f9fe14f216bd682e3870efa965bab40d08322fR9. Based on what is there, I guess we should make the "default" integer dtype int64 on 64-bit Windows. Note that in the array API itself the default integer dtype is only relevant in a couple of places.

@seberg
Copy link
Member

seberg commented Jun 3, 2021

@asmeurer good that it is spelled out nicely!

But your PR does not conform to this for asarra([1, 2, 3, 4]). We could probably make it conform from within NumPy at least with an abstract DType right now. Or you would have to write a light-weight asarray yourself, I guess.

@asmeurer
Copy link
Member

asmeurer commented Jun 3, 2021

You mean specifically on Windows? Or does asarray also use some value based casting in some cases?

What is an abstract DType? If there is some trick that would avoid having to reimplement asarray, that would be ideal.

@seberg
Copy link
Member

seberg commented Jun 3, 2021

We automatically "spill" into long long or unsigned long long as noted here. EDIT np.array(2**63)

This part is not windows specific.

What is an abstract DType

A DType class as per NEP 42. That could customize the dtype discovery during array coercion as per the NEP. But, it we may have to fix casting to/for abstract DTypes first. (Shouldn't be super hard, it should have an additional "common DType" based path, I think – that is probably already mentioned in NEP 42.)

@Rik-de-Kort
Copy link
Author

Thanks, BvB93. Is there a way to map it globally? I have a codebase that works on linux with astype(int) all over it; now I have a Windows developer. Before I consider changing it everywhere it exists, I'm wondering if there's a solution that spares needing to take that route.

In my codebase this is exactly what I have done. Explicit is better than implicit after all. No problems changing over, no problems after. Would recommend!

@seberg
Copy link
Member

seberg commented Nov 3, 2023

Closing, we are trying to switch to 64bit on 64bi tplatforms for NumPy 2.0. See the dev release notes for example. Which will change the default on windows specifically.
(And if we may have to undo that, there is probably not much we can do about this.)

@seberg seberg closed this as completed Nov 3, 2023
@seberg
Copy link
Member

seberg commented Nov 3, 2023

Note that for now, NumPy will still happily return uints and object for out of bound ints, this also doesn't affect 32bit platforms, just because it would be a bigger, more difficult to deal with change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants