Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshots: allow snapshotting some user code #4720

Merged
merged 17 commits into from May 10, 2024

Conversation

hoodmane
Copy link
Member

@hoodmane hoodmane commented Apr 26, 2024

This adds some basic ability to snapshot after executing user code. It is pretty brittle right now:

  1. It will crash if the user loads any binary extensions before taking the snapshot
  2. It doesn't track changes to the file system
  3. It doesn't mind there being non-reproducible objects in the hiwire map

Snapshots will probably have to be experimental for quite a while. This should ideally get a bunch of tests.

  1. I think I have a pretty good solution for this, which I can work on in a followup.

  2. One possibility here is we could serialize the entire filesystem state into the memory snapshot. This would be hard and make the snapshot big, but we wouldn't have to load python_stdlib.zip when restoring from a snapshot so it probably wouldn't increase the total download size by much...

  3. Should be pretty easy to fix.

cc @pfebrer

Checklists

  • Add a CHANGELOG entry
  • Add / update tests
  • Add new / update outdated documentation

This adds some basic ability to snapshot after executing user code. It is pretty
brittle right now:
1. It will crash if the user loads any binary extensions before taking the
snapshot
2. It doesn't track changes to the file system
3. It doesn't mind there being non-reproducible objects in the hiwire map

Snapshots will probably have to be experimental for quite a while.

1. I think I have a pretty good solution for this, which I can work on in a
followup.

2. One possibility here is we could serialize the entire filesystem state into
the memory snapshot. This would be hard and make the snapshot big, but we
wouldn't have to load python_stdlib.zip when restoring from a snapshot so it
probably wouldn't increase the total download size by much...

3. Should be pretty easy to fix.
@pfebrer
Copy link
Contributor

pfebrer commented Apr 26, 2024

Thanks for the heads up! Regarding point 1, does that mean that only pure python packages can be imported before generating the snapshot?

@hoodmane
Copy link
Member Author

Yeah after this PR that is still the case. I will remove this restriction in the next PR. To fix it we have to patch this line in Emscripten:
https://github.com/emscripten-core/emscripten/blob/main/src/library_dylink.js#L633
When taking the snapshot we need to store the pointer to the dylink metadata for each loaded .so file into our snapshot metadata. When restoring the snapshot, we need to make sure that we put the dylink metadata at the same address as before. We also have to store all the handles returned from dlopen that haven't been dlclosed.

@pfebrer
Copy link
Contributor

pfebrer commented Apr 26, 2024

Cool! To test it in my application I need to import packages with C extensions. Point 2 is not a problem and I don't understand what point 3 means 😅

Happy to see that you are getting closer to it!

@hoodmane
Copy link
Member Author

For number 3, the issue is if you did something like:

pyodide.runPython(`
from js import fetch
some_var = fetch(some_url)
`);

Pyodide says okay some_var is a JsProxy with table index 10. But the entry in table index 10 is a Promise<FetchResponse> which is not something we can really serialize. So if you try to make a snapshot after executing this code, it should throw an error explaining this. Unfortunately in real code it is be pretty hard for us to trace through and explain why the JsProxy exists but we can at least say what the value is.

@hoodmane
Copy link
Member Author

I guess we'll also have a problem if ctypes is used...

Copy link
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. To be honest, I don't understand the code inside snapshot.ts, but it somehow works so I'll approve so you can do further experiments :) I hope we can clean up the code when this feature gets more stable.

packages/gensim/meta.yaml Outdated Show resolved Hide resolved
src/templates/makesnap.mjs Outdated Show resolved Hide resolved
src/js/snapshot.ts Outdated Show resolved Hide resolved
src/tests/test_snapshots.py Show resolved Hide resolved
@hoodmane hoodmane merged commit a802179 into pyodide:main May 10, 2024
37 of 39 checks passed
@hoodmane hoodmane deleted the memory-snapshots-ffi branch May 10, 2024 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants