Thoughts on storage #182

liamhuber opened this issue Jan 26, 2024 · 0 comments

These are just my stream-of-consciousness notes, written immediately post-meeting with @pmrv.
I'll come back and update them, and split them into two or more sensible, actionable issues next week when I have more time.
Meanwhile, I wanted them available in light of my request (#181) to meet with both @pmrv and @jan-janssen, so we can move towards a storage solution that satisfies everyone.

(De)serialization for remote processes

  • Change the bit of run that gets shipped off to other processes to include info about serializing the output
    • We'll probably also need to send off the fully scoped working directory
    • For performance, we'd ship out the run function and inputs, and ship back the outputs (not all of self)
  • Serialize to some temp file, then at the last minute move the file to the "expected" location
  • On the main process side, update your output and delete the temp file
  • When loading a graph, if you find a node that deserializes with a running state, go look for the "expected" save file; if you find it, deserialize it and use this output data in your regular callback (including a full (not just output) save if requested, a checkpointing save if requested, and firing output signals)
  • In this way, submitting to an Executor process and submitting to a queue process should look very similar
    • With the exception that what gets sent to the queue process may also be instructions for using an Executor nested inside that process

This should provide compatibility with both queues and restarting partially executed runs!
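
As a very rough sketch (none of these names exist; remote_run, recover_if_possible, and the node attributes are made up purely for illustration), the temp-file hand-off and the recovery-on-load check might look something like this:

```python
import os
import pickle
import tempfile
from pathlib import Path


def remote_run(run_function, inputs: dict, working_directory: Path, expected_save_path: Path):
    """What gets shipped off: just the run function and its inputs, not all of self."""
    output = run_function(**inputs)

    # Serialize to a temp file in the working directory, then move it to the
    # "expected" location at the last minute; os.replace is atomic, so readers
    # never see a half-written file.
    working_directory.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=working_directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(output, f)
    os.replace(tmp_name, expected_save_path)
    return output


def recover_if_possible(node, expected_save_path: Path) -> bool:
    """On the main-process side, after (re)loading a graph containing a node
    that deserialized with a running state."""
    if node.running and expected_save_path.exists():
        with open(expected_save_path, "rb") as f:
            output = pickle.load(f)
        # Hand the recovered output to the regular callback: update outputs,
        # do the requested (checkpointing) saves, and fire output signals.
        node.finish_run(output)
        return True
    return False
```

Whether the payload goes straight to an Executor or off to a queue process (which may itself spin up a nested Executor), the main-process side would only ever see the same pattern: output appearing at the expected location.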

Storage implementation

The second topic was storage details.
One thing I forgot to mention that I liked about tinybase's storage solution was that it has a generic abstraction that would allow us to plug in non-hdf5 backends.
Marvin was very amenable to the idea of using __get/setstate__ natively (where they exist) to do the re/store process, so that we don't necessarily need to define those custom functions.
The rough idea now is to use __getstate__ recursively to get key-value pairs and assign them to the generic storage instance with storage["key"] = value.
This can be wrapped in a try-except block, so that when the assignment fails we (cloud?)pickle the object and try again, which would circumvent the current h5io failure for, e.g., ASE Calculator objects.
Replacing _restore with something based on __setstate__ is a little more opaque to me, because I'm still not totally clear how h5io is getting me my initial classes, but maybe it will make more sense when I go back and look there.
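
A minimal sketch of that direction, assuming a deliberately generic (backend-agnostic) key-value interface; GenericStorage, Storable, and the exception handling here are all illustrative rather than anything that currently exists:

```python
import pickle  # or cloudpickle, for broader coverage
from abc import ABC, abstractmethod


class GenericStorage(ABC):
    """Backend-agnostic key-value interface, so non-hdf5 backends can be plugged in."""

    @abstractmethod
    def __setitem__(self, key: str, value) -> None: ...

    @abstractmethod
    def create_group(self, key: str) -> "GenericStorage": ...


class Storable:
    """Objects we want to decompose into key-value pairs via __getstate__."""

    def __getstate__(self):
        return dict(self.__dict__)


def store(storage: GenericStorage, obj: Storable) -> None:
    for key, value in obj.__getstate__().items():
        if isinstance(value, Storable):
            # Recurse: nested storable objects get their own group of key-value pairs.
            store(storage.create_group(key), value)
            continue
        try:
            storage[key] = value
        except TypeError:
            # The backend choked on the raw value (as h5io currently does for,
            # e.g., ASE Calculator objects); fall back to pickled bytes and retry.
            storage[key] = pickle.dumps(value)
```

The restore direction would mirror this, reading the key-value pairs back and handing them to __setstate__, but as noted above the details depend on how the backend reconstructs the initial class in the first place.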

Maybe we could also work in something extra like before- and after-storage hooks, e.g. for storing extra, non-state data like version information, so we can fail hard if we're deserializing the wrong version or whatever.
But that's a bell-and-whistle.
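Purely as an illustration of what such hooks could look like (no such hooks exist yet, and the package name and method names here are stand-ins):

```python
from importlib.metadata import version

PACKAGE = "pyiron_workflow"  # stand-in for whichever package owns the node classes


class VersionedStorageMixin:
    """Before- and after-storage hooks for extra, non-state metadata."""

    def before_save(self, storage) -> None:
        # Stash the package version alongside the regular state.
        storage["_version"] = version(PACKAGE)

    def after_load(self, storage) -> None:
        # Fail hard if the state was written by a different version.
        stored, current = storage["_version"], version(PACKAGE)
        if stored != current:
            raise RuntimeError(
                f"State was saved with {PACKAGE} {stored}, but {current} is installed."
            )
```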
