Add append and update keywords to savez #22961

Closed
wants to merge 4 commits into from
Conversation

@mcuntz mcuntz commented Jan 8, 2023

This PR adds the keywords append and update to numpy.savez and numpy.savez_compressed.

I like to store the results of lengthy calculations in files such as NumPy's npz files and use them in plotting scripts, which I rerun often while refining them. If I have to redo some of the calculations but not all, then only some of the results in the output files need updating.

So I implemented appending new arrays to existing npz files and updating arrays in existing npz files. Updating uses a temporary file, just as the zip utility does with -u, because Python's zipfile.ZipFile only allows the modes 'r', 'w', 'a', and 'x' and so cannot replace a member in place.
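A minimal sketch of the two operations described above, written as standalone helpers rather than as keywords of savez. This is not the code from this PR: the names `append_npz` and `update_npz` are hypothetical, and the sketch assumes uncompressed or already-compressed members are simply copied as they are.

```python
import io
import os
import tempfile
import zipfile

import numpy as np


def _npy_bytes(arr):
    """Serialize one array to the bytes of a .npy file."""
    buf = io.BytesIO()
    np.save(buf, np.asanyarray(arr), allow_pickle=False)
    return buf.getvalue()


def append_npz(path, **arrays):
    """Hypothetical helper: add new arrays to an existing npz file."""
    with zipfile.ZipFile(path, mode="a", allowZip64=True) as zf:
        existing = set(zf.namelist())
        for name, arr in arrays.items():
            member = name + ".npy"
            if member in existing:
                raise ValueError(f"{name!r} already exists in {path}")
            zf.writestr(member, _npy_bytes(arr))


def update_npz(path, **arrays):
    """Hypothetical helper: replace or add arrays via a temporary file."""
    replaced = {name + ".npy" for name in arrays}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(suffix=".npz", dir=directory)
    os.close(fd)
    try:
        with zipfile.ZipFile(path) as src, \
                zipfile.ZipFile(tmp, mode="w", allowZip64=True) as dst:
            # Copy every member that is not being replaced.
            for info in src.infolist():
                if info.filename not in replaced:
                    dst.writestr(info, src.read(info.filename))
            # Write the new versions of the replaced members.
            for name, arr in arrays.items():
                dst.writestr(name + ".npy", _npy_bytes(arr))
        os.replace(tmp, path)  # swap in the rewritten archive
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

Appending can use zip mode 'a' directly, while updating has to rewrite the archive because a zip member cannot be overwritten in place.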

I am a first-time contributor and do not know the dispatch mechanism, so I followed how it is done in the save function with the _save_dispatcher.

I added tests. However, python runtests.py --coverage already reported the same error 18 times before I started the PR:
E AssertionError: Got warnings: [<warnings.WarningMessage object at 0x1a93b92d0>]
in numpy/core/tests/test_umath.py::TestSpecialFloats::test_unary_spurious_fpexception
This is still the case.

@rkern
Member

rkern commented Jan 8, 2023

Because this introduces a new constraint on the names of arrays, it can break existing code.

I would prefer that we not add this feature to numpy. We intentionally keep the capabilities of the NPY/NPZ formats fairly minimal. These kinds of features are about the point where I'd really recommend using another format like HDF5 or zarr or the like.

There's no real reason this functionality has to be placed into savez(), even if you do want to stick with the NPZ format. You could have a separate function that does it, and then you wouldn't need to add the keyword arguments that would conflict with array names. The choice to use the other function conveys the intent of those boolean flags. But I think those functions are outside of the scope of the features that we want to maintain in numpy.
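To illustrate the name conflict discussed above: savez stores each keyword argument as an array under the keyword's name, so `append` and `update` are valid array names today. The following sketch only uses the existing savez behaviour; the in-memory buffer is just a convenience for the example.

```python
import io

import numpy as np

# Existing code may already use "append" or "update" as array names.
buf = io.BytesIO()
np.savez(buf, append=np.arange(3), update=np.ones(2))

buf.seek(0)
with np.load(buf) as data:
    names = sorted(data.files)
    appended = data["append"]
```

If savez started to interpret these two keywords as options, a call like this one would no longer store the arrays under those names.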

@mcuntz
Author

mcuntz commented Jan 8, 2023

@rkern Yes, you cannot have arrays anymore with the names append or update. But this is already the case for file.

I could also check if append and update are simple boolean scalars and if they are not then they are arrays that will be written to the file.
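A sketch of what such a value-dependent check could look like, and of the case it would still change. The helper name `is_plain_bool_flag` is hypothetical and not taken from this PR; the savez call shows current behaviour only.

```python
import io

import numpy as np


def is_plain_bool_flag(value):
    """Hypothetical check: only bool scalars count as option flags."""
    return isinstance(value, (bool, np.bool_))


# Arrays are clearly not flags, plain booleans are.
is_flag_for_scalar = is_plain_bool_flag(True)
is_flag_for_array = is_plain_bool_flag(np.array([True, False]))

# Today, a keyword whose value is a bool scalar is stored as a
# 0-dimensional boolean array under that name.
buf = io.BytesIO()
np.savez(buf, append=True, data=np.arange(3))
buf.seek(0)
with np.load(buf) as npz:
    stored = npz["append"]
```

With the check in place, `append=True` would be read as an option instead of being stored, which is the remaining case where existing code could behave differently.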

@rkern
Member

rkern commented Jan 8, 2023

That restriction for file has been there from the beginning. This is adding a new restriction which could break existing code.

@mcuntz
Author

mcuntz commented Jan 8, 2023

@rkern The checking for simple boolean scalars does not seem to have convinced you :-( although this would avoid 99.99% of all possible code breaks.

If this is the "official" answer of numpy then I will organise myself differently and you can close this PR so that it is not hanging around (or do I have to do this?).

@rkern
Member

rkern commented Jan 8, 2023

Correct, I'm not convinced by the value-dependent reinterpretation of those arguments. That's a little too much magic for something that could be easily avoided by having a separate update_npz() function. It's why we have savez_compressed() instead of savez(..., compressed=True) after all.

I still don't particularly want to have the feature in numpy, regardless of the spelling. While I think I have accurately represented the design goals of the project for this functionality, if there is someone else on the numpy team that wants to step in and champion update_npz(), I won't veto it.

@mcuntz
Author

mcuntz commented Jan 10, 2023

@rkern O.K. I restored the savez/savez_compressed code and added updatez/updatez_compressed/_updatez functions, mirroring savez et al., because I want the functionality anyway. I will leave it open for a few days, and if no one else from the numpy team fancies the feature by then, I will close this PR and put the code somewhere else in my own repo.

@mcuntz mcuntz closed this by deleting the head repository Mar 6, 2023
Projects
Status: Needs decision