Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: numpy.savez_compressed ignores arrays named "like" #23365

Closed
derekelkins opened this issue Mar 9, 2023 · 9 comments
Closed

BUG: numpy.savez_compressed ignores arrays named "like" #23365

derekelkins opened this issue Mar 9, 2023 · 9 comments
Labels

Comments

@derekelkins
Copy link

Describe the issue:

If you call numpy.savez_compressed with the keyword like, i.e. if one of the named arrays is named "like", it does not get saved to the npz file. Nothing like this is documented in the savez_compressed documentation. I didn't see an issue that discussed this.

You can unzip the npz to verify that the array isn't being saved.

This seems to have started happening somewhere between numpy==1.19.5 and numpy==1.21.6. That is, numpy==1.19.5 would correctly save the array named "like". Below I used numpy==1.23.2 and I've also tried numpy==1.24.2 which behave the same as each other.

I suspect savez_compressed isn't the only affected function and "like" isn't the only relevant keyword, but I haven't explored this.

While I'd prefer input to savez_compressed to not just be silently ignored, I'd also be okay with just a documentation update to the relevant functions that explained which keywords couldn't be used.

Reproduce the code example:

import numpy as np

print(f"numpy.version.full_version={np.version.full_version}")
inputs = {
    "foo": np.array([1], dtype=np.float32),
    "like": np.array([2], dtype=np.float32),
}
print(inputs)
np.savez_compressed("example.npz", **inputs)
loaded_inputs = np.load("example.npz", allow_pickle=True)
print(dict(loaded_inputs))
if inputs != dict(loaded_inputs):
    print("Unexpected mismatch")

Error message:

No response

Runtime information:

1.23.2
3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0]

Context for the issue:

I would say silently and unexpectedly failing to save a user's output is a compelling enough reason. That said, in my case I can't simply use a different name as the names are being provided by a third party piece of code.

@rkern
Copy link
Member

rkern commented Mar 9, 2023

I think this happened because of the NEP 35 implementation. But that doesn't seem appropriate for the saving functions. While they might need to be overridden to allow dispatching to other API's implementations of them, these only consume arrays and do not create them, so they should not go through the code path that adds like= keyword arguments.

@pentschev @shoyer From the git blame, it seems like you two did much of the relevant implementations here. What do you think? This is a bug that should be fixed.

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Mar 9, 2023
@pentschev
Copy link
Contributor

I think this happened because of the NEP 35 implementation. But that doesn't seem appropriate for the saving functions. While they might need to be overridden to allow dispatching to other API's implementations of them, these only consume arrays and do not create them, so they should not go through the code path that adds like= keyword arguments.

Agreed, this seems very much like a side-effect from NEP-35 and shouldn't be happening. The like= kwarg is not explicitly implemented for np.savez_compressed though, as you rightly mentioned it wouldn't make sense there, the functions that actively implement like= are listed in

_array_tests = [
('array', *func_args((1,))),
('asarray', *func_args((1,))),
('asanyarray', *func_args((1,))),
('ascontiguousarray', *func_args((2, 3))),
('asfortranarray', *func_args((2, 3))),
('require', *func_args((np.arange(6).reshape(2, 3),),
requirements=['A', 'F'])),
('empty', *func_args((1,))),
('full', *func_args((1,), 2)),
('ones', *func_args((1,))),
('zeros', *func_args((1,))),
('arange', *func_args(3)),
('frombuffer', *func_args(b'\x00' * 8, dtype=int)),
('fromiter', *func_args(range(3), dtype=int)),
('fromstring', *func_args('1,2', dtype=int, sep=',')),
('loadtxt', *func_args(lambda: StringIO('0 1\n2 3'))),
('genfromtxt', *func_args(lambda: StringIO('1,2.1'),
dtype=[('int', 'i8'), ('float', 'f8')],
delimiter=',')),
]
, thus my guess is this is happening due to the use of one of those in the savez_compressed implementation. I'll take a look at the implementation to see if I can make sense of what is happening.

@pentschev @shoyer From the git blame, it seems like you two did much of the relevant implementations here. What do you think? This is a bug that should be fixed.

@shoyer has worked on the NEP-18, but NEP-35 is probably all me.

@pentschev
Copy link
Contributor

I don't know what may have changed but I can indeed confirm the issue with NumPy 1.23.5, but it seems to be working correctly with upstream:

In [1]: import numpy as np
   ...:
   ...: print(f"numpy.version.full_version={np.version.full_version}")
   ...: inputs = {
   ...:     "foo": np.array([1], dtype=np.float32),
   ...:     "like": np.array([2], dtype=np.float32),
   ...: }
   ...: print(inputs)
   ...: np.savez_compressed("example.npz", **inputs)
   ...: loaded_inputs = np.load("example.npz", allow_pickle=True)
   ...: print(dict(loaded_inputs))
   ...: if inputs != dict(loaded_inputs):
   ...:     print("Unexpected mismatch")
   ...:
numpy.version.full_version=1.25.0.dev0+882.gea0b1708b
{'foo': array([1.], dtype=float32), 'like': array([2.], dtype=float32)}
{'foo': array([1.], dtype=float32), 'like': array([2.], dtype=float32)}

Could you confirm the same @derekelkins @rkern ?

@seberg
Copy link
Member

seberg commented Mar 10, 2023

Ah, that would make sense: I refactored the dispatching for speed recently. Previously, we had a kwargs.pop("like") unconditionally, I think. Thinking that it should never be hit.

Now I believe I refactored that away, thus fixing the issue "unintentionally".

Might be nice to test, but I suppose it is fixed then, unless we want a backport fix a lot.

@pentschev
Copy link
Contributor

Thanks @seberg , love how you normally fix various issues in one go like that! 😄

@seberg
Copy link
Member

seberg commented Mar 10, 2023

Lets see if it is a very clear and simple fix (I can do that). In that case, a backport-only for 1.24.x is maybe reasonable considering it is silently losing data :/.

@derekelkins unfortunately, the only decent work-around I can offer to is to use

np.savez_compressed._implementation

You could even do:

import numpy as np
np.savez_compressed = np.savez_compress._implementation

Small chance that it goes away at some point, since it is "private", but I not really anytime soon. There is also NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 which disables the whole thing as an env-variable. (That may well go away, but then, its fixed on main ;), that env-variable would mean that _implementation doesn't exist probably).

@seberg
Copy link
Member

seberg commented Mar 10, 2023

The best I can come up with is explicitly checking whether we got savez_compressed (I will suspect it is the only truly **kwargs function, I guess some recfunctions might exist but they are even more niche).

Refactoring in the information that like= was passed is not hard, but doesn't seem like it will turn out really minimal.

EDIT: Well, and savez of course...

@derekelkins
Copy link
Author

@pentschev Yep. Running from main (1.25.0.dev0+884.g89d64415e) worked for me as well.

@seberg Both of those work-arounds seem to work in my actual use-case. I'll have to see if there are any other adverse effects, but I suspect not.

Thanks everyone for your quick response.

@seberg
Copy link
Member

seberg commented Mar 10, 2023

Closing, its fixed on main, and will be in the next 1.24.x release (I guess there should be one more). For other versions, have to use those workaround unfortunately...

The workaround is safe. The np.savez_compressed._implementation one is also targeted precisely, so there is basically no chance of it doing something unexpected.

@seberg seberg closed this as completed Mar 10, 2023
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Nov 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants