Thoughts on storage #182

liamhuber opened this issue Jan 26, 2024 · 0 comments

These are just my stream-of-consciousness notes, written immediately post-meeting with @pmrv.
I'll come back and update them, and split them into two or more sensible, actionable issues next week when I have more time.
Meanwhile, I wanted them available in light of my request (#181) to meet with both @pmrv and @jan-janssen, so we can move towards a storage solution that satisfies everyone.

(De)serialization for remote processes

  • Change the bit of run that gets shipped off to other processes to include info about serializing the output
    • We'll probably also need to send off the fully scoped working directory
    • For performance, we'd ship out the run function and inputs, and ship back the outputs (not all of self)
  • Serialize to some temp file, then at the last minute move the file to the "expected" location
  • On the main process side, update your output and delete the temp file
  • When loading a graph, if you find a node that deserializes with a running state, go look for the "expected" save file; if you find it, deserialize it and use this output data in your regular callback (including a full (not just output) save if requested, a checkpointing save if requested, and firing output signals)
  • In this way, submitting to an Executor process and submitting to a queue process should look very similar
    • With the exception that what gets sent to the queue process may also be instructions for using an Executor nested inside that process

This should provide compatibility with both queues and restarting partially executed runs!
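
As a very rough sketch (none of these names exist; remote_run, recover_if_possible, and the node attributes are made up purely for illustration), the temp-file hand-off and the recovery-on-load check might look something like this:

```python
import os
import pickle
import tempfile
from pathlib import Path


def remote_run(run_function, inputs: dict, working_directory: Path, expected_save_path: Path):
    """What gets shipped off: just the run function and its inputs, not all of self."""
    output = run_function(**inputs)

    # Serialize to a temp file in the working directory, then move it to the
    # "expected" location at the last minute; os.replace is atomic, so readers
    # never see a half-written file.
    working_directory.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=working_directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(output, f)
    os.replace(tmp_name, expected_save_path)
    return output


def recover_if_possible(node, expected_save_path: Path) -> bool:
    """On the main-process side, after (re)loading a graph containing a node
    that deserialized with a running state."""
    if node.running and expected_save_path.exists():
        with open(expected_save_path, "rb") as f:
            output = pickle.load(f)
        # Hand the recovered output to the regular callback: update outputs,
        # do the requested (checkpointing) saves, and fire output signals.
        node.finish_run(output)
        return True
    return False
```

Whether the payload goes straight to an Executor or off to a queue process (which may itself spin up a nested Executor), the main-process side would only ever see the same pattern: output appearing at the expected location.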

Storage implementation

The second topic was storage details.
One thing I forgot to mention that I liked about tinybase's storage solution was that it has a generic abstraction that would allow us to plug in non-hdf5 backends.
Marvin was very amenable to the idea of using __get/setstate__ natively (where they exist) to do the re/store process, so that we don't necessarily need to define those custom functions.
The rough idea now is to use __getstate__ recursively to get key-value pairs and assign them to the generic storage instance with storage["key"] = value.
This can be wrapped in a try-except block, so that when the assignment fails we (cloud?)pickle the object and try again, which would circumvent the current h5io failure for, e.g., ASE Calculator objects.
Replacing _restore with something based on __setstate__ is a little more opaque to me, because I'm still not totally clear how h5io is getting me my initial classes, but maybe it will make more sense when I go back and look there.
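
A minimal sketch of that direction, assuming a deliberately generic (backend-agnostic) key-value interface; GenericStorage, Storable, and the exception handling here are all illustrative rather than anything that currently exists:

```python
import pickle  # or cloudpickle, for broader coverage
from abc import ABC, abstractmethod


class GenericStorage(ABC):
    """Backend-agnostic key-value interface, so non-hdf5 backends can be plugged in."""

    @abstractmethod
    def __setitem__(self, key: str, value) -> None: ...

    @abstractmethod
    def create_group(self, key: str) -> "GenericStorage": ...


class Storable:
    """Objects we want to decompose into key-value pairs via __getstate__."""

    def __getstate__(self):
        return dict(self.__dict__)


def store(storage: GenericStorage, obj: Storable) -> None:
    for key, value in obj.__getstate__().items():
        if isinstance(value, Storable):
            # Recurse: nested storable objects get their own group of key-value pairs.
            store(storage.create_group(key), value)
            continue
        try:
            storage[key] = value
        except TypeError:
            # The backend choked on the raw value (as h5io currently does for,
            # e.g., ASE Calculator objects); fall back to pickled bytes and retry.
            storage[key] = pickle.dumps(value)
```

The restore direction would mirror this, reading the key-value pairs back and handing them to __setstate__, but as noted above the details depend on how the backend reconstructs the initial class in the first place.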

Maybe we could also work in something extra like before- and after-storage hooks, e.g. for storing extra, non-state data like version information, so we can fail hard if we're deserializing the wrong version or whatever.
But that's a bell-and-whistle.
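Purely as an illustration of what such hooks could look like (no such hooks exist yet, and the package name and method names here are stand-ins):

```python
from importlib.metadata import version

PACKAGE = "pyiron_workflow"  # stand-in for whichever package owns the node classes


class VersionedStorageMixin:
    """Before- and after-storage hooks for extra, non-state metadata."""

    def before_save(self, storage) -> None:
        # Stash the package version alongside the regular state.
        storage["_version"] = version(PACKAGE)

    def after_load(self, storage) -> None:
        # Fail hard if the state was written by a different version.
        stored, current = storage["_version"], version(PACKAGE)
        if stored != current:
            raise RuntimeError(
                f"State was saved with {PACKAGE} {stored}, but {current} is installed."
            )
```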
