Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store with h5io #181

Merged
merged 22 commits into from
Jan 30, 2024
Merged

Store with h5io #181

merged 22 commits into from
Jan 30, 2024

Conversation

liamhuber
Copy link
Member

@liamhuber liamhuber commented Jan 26, 2024

by default instead of tinybase.storage.h5io. Depends on all three of these PRs over on h5io, so like the root PR here (#160), there's no way for the tests to run on the CI.

After a bunch of upstream PRs here getting the __get/setstate__ in better shape, saving is just:

h5io.write_hdf5(
    fname=self._h5io_storage_file_path,
    data=self,
    title=self.label,
    use_state=True,
    overwrite=True,  # Don't worry about efficiency or updating yet
)

And loading is just:

inst = h5io.read_hdf5(fname=self._h5io_storage_file_path, title=self.label)
self.__setstate__(inst.__getstate__())

Which is nice and clean, but can fail if the data channels hold values that don't play nicely with h5io.

@pmrv and @jan-janssen, at some point I think I'll need a meeting with both of you together to help me work through the strengths and weaknesses of the different solutions available here. I'm slugging along steadily, but I feel a bit out of my depth.

Overall what I like about the h5io solution is that it is leveraging __get/setstate__, so it's largely enough for me to get (cloud)pickle happy with those and then I can just pass the whole objects to write_hdf5 or get them back with read_hdf5 (and use the returned object to set state). So it's extremely concise.

On the other hand tinybase.storage feels much more transparent on what it's storing where, and it is both more clear when it falls back to a pickle solution and much more powerful when it does it -- e.g. h5io.write_hdf5 borks on instances ASE's EMT because of some random and innocuous seeming method on the parent Calculator class; so I can't even store the atomistics workflows with this tool. Also, as much as it's more write, the fact tinybase.storage forces me to use something other than the state methods means that I have more control over the storage path, so nodes can be stored on their semantic path, e.g. my_wf/my_macro/some_node instead of my_wf/nodes/my_macro/nodes/some_node.

Finally, in the medium term, it will become important to resolve the versions of the stored object more vigorously, and ultimately have a framework that supports upgrading stored data to new (compatible) versions. I'm sure we can get this done, but the path forward is not yet super clear for me on this front.

TODO:

  • Refactor; I just slapped this together in a rush and it's rather ugly (Not perfect, but good enough for while we're supporting two frameworks)
  • Fix the docstrings so they apply to both solutions and explain the difference
  • Re-run the deep-dive to make sure the example still works
  • And of course updates once the dependent packages actually exist This is a wontfix unless the upstream out-of-repo PRs get finished and merged before I want to stack something on top of this

Copy link

Binder 👈 Launch a binder notebook on branch pyiron/pyiron_workflow/store_with_h5io

@liamhuber liamhuber mentioned this pull request Jan 26, 2024
The promises are slightly different, but the saving of aggregate workflows with all children from packages should work the same for both back ends, so test it
@liamhuber liamhuber added the format_black trigger the Black formatting bot label Jan 29, 2024
pyiron-runner and others added 12 commits January 29, 2024 19:02
Ultimately, I want to lean directly on tinybase for this, but in the intermediate where we have both tinybase _and_ pure h5io back ends, let's pull things out to keep the base node class tidier.
Call it the backend instead of the mode throughout, prepending with "storage_" if the context is not 100% clear (e.g. in node inits)
Saving and loading is done all at once by the parent; new child instances do not look for their own saved data.
Required moving the macro nodes over to a node package
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@liamhuber liamhuber added format_black trigger the Black formatting bot and removed format_black trigger the Black formatting bot labels Jan 29, 2024
Base automatically changed from parent_decorated_nodes_to_their_module to simple_storage January 30, 2024 18:14
@liamhuber liamhuber merged commit 600b00c into simple_storage Jan 30, 2024
1 of 12 checks passed
@liamhuber liamhuber deleted the store_with_h5io branch January 30, 2024 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
format_black trigger the Black formatting bot
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants