-
-
Notifications
You must be signed in to change notification settings - Fork 788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vendor standard libraries in a zip file #3166
Conversation
Thanks for doing this!
I think we should compare after gzip compression (done by the CDN) in which case I suspect the difference will be less. So from what I understand you are shipping this zip inside the |
Right, that's a good point. I should check that.
Yes, that sounds very interesting. If we ship a zip file instead of a tar file, we might don't need to extract it to the file system, as Python can import it directly. So maybe we won't need to care about implementing unpackers. |
I did some benchmarks of this on my laptop with files served with gzip compression,
(we should add instructions to the docs https://www.npmjs.com/package/http-server is much better for benchmarking tasks than the built-in python http.server, though I haven't managed to make brotli compression work), mainThe download size for loading console.html: 6.14 MB (gzip compressed), 14.43 MB (uncompressed) Load time: Measured with caching in the console disabled each time in a new tab, but I'm not sure to what extent that really disables all WASM related caching. Creating a whole new browser profile for each run is also not very practical. An example of an exception in the stdlib,
very reasonable. This PRThe download size for loading console.html: 5.71 MB (gzip compressed), 12.21 MB (uncompressed) Load time: An example of an exception in the stdlib:
still OK (I wasn't sure how it would display with the zip file). Maybe is there a way to make this path a bit shorter to avoid python-3.10 twice in the path? So overall the size improvement after compression is 8%, so not that much, however, the load time improvement for loading just core Pyodide + stdlib is really large,
So overall it looks great @ryanking13 , and it also addresses the issue of a stdlib in a separate zip file without dealing with custom decompressors. Would you mind adding a changelog particularly mentioning the load time improvement and crediting python/cpython#29984 ? I think we should include this in the next release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the Python developers already solved the bootstrap problem for us by ensuring that they can unzip before they need to import any Python files. That makes sense. It's cool that you didn't need to add hardly any runtime code for this.
cpython/wasm_assets.py
Outdated
# Note that all those tests are moved to the subdirectory of test in Python 3.11, | ||
# so we don't need this after Python 3.11 update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just leave it then? It won't be that long before we update to 3.11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we zip these directories, loadPackage('test')
fails because it will try to unzip files into ctypes/test
, unittest/test
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't be that long before we update to 3.11.
BTW, are you working on Python 3.11 update recently? If not, I have some time to try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I just noticed that the upstream commits that move test directories are not applied to Python 3.11, this part is going to be needed until we have Python 3.12...
@@ -346,6 +345,8 @@ If you updated the Pyodide version, make sure you also updated the 'indexURL' pa | |||
throw new Error("Didn't expect to load any more file_packager files!"); | |||
}; | |||
|
|||
setHomeDirectory(Module, config.homedir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you move setHomeDirectory
after initializing the module? It's better if possible to keep all of this stuff before initializing Python because there are code paths related to virtual environments that care about these paths on initialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bug in main branch: /home/pyodide/lib/python310.zip
is in the sys.path not /lib/python310.zip
.
>>> import sys
>>> sys.path
['/home/pyodide', '/home/pyodide/lib/python310.zip', '/lib/python3.10', '/lib/python3.10/lib-dynload', '/lib/python3.10/site-packages']
moving setHomeDirectory
after the initialization fixed this, but maybe there can be a better fix.
Wow, thanks for the benchmark and for reminding me of this @rth. I'll re-work a bit on this PR next week. Some of the pain points in this PR were resolved in Python 3.11, so I think we can get more benefits from this PR when we update to Python 3.11.
The traceback showing |
As you prefer, but I think it would be good to do a stable release soon. If we go for an Python 3.11 update, we would need to do another alpha release before that. But certainly updating to 3.11 would be great also. |
Packages that currently aren't building in Python 3.11:
And also scipy. Many of these just need to be recythonized though. |
I guess I should open a python 3.11 PR soon so people know what the status is. |
Yeah, we can certainly merge this PR before updating to Python 3.11. I'll try to fix the traceback before merging though. |
Okay, I changed it to show only a relative path in traceback: >>> import pathlib
>>> pathlib.Path("a/b/c").write_text("1")
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "pathlib.py", line 1152, in write_text
File "pathlib.py", line 1117, in open
FileNotFoundError: [Errno 44] No such file or directory: 'a/b/c
>>> json.decoder.JSONDecoder().decode(1)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "json/decoder.py", line 337, in decode
TypeError: expected string or bytes-like object I found that Python embeddable package for Windows which also ships stdlibs in a zip file shows tracebacks like this. So I think it makes sense to follow it? |
Maybe we should make an announcement (on the public mailing list & twitter) that we are considering shipping .pyc compiled sources in the next release (both for stdlib and packages in #3253) and see if there are any concerns, particularly for teaching. I can't think of anything right now, but it doesn't hurt asking since it's a pretty large change (but IMO worth it). Because the other solution would be to deploy two versions one source compiled, the other not. It's significantly more work for us. Though maybe we should still do this at least for packages, as otherwise it would be pretty difficult to debug if anything goes wrong and people want to check the source. |
Yes, that sounds very nice to me!
Yeah, that would be a bit complicated.. especially since it will add extra maintenance burden. |
I find the traceback |
Yes, we can make the path start with |
Is the file in |
The traceback normally shows an absolute path, so if it shows a relative path, it is distinguishable. Though I agree that it is only understandable for those who know what is happening behind... Or maybe we can try something different which explicitly tells that the file is inside the zipfile, something like:
It somewhat matches with the
Not sure this is a good idea 🤔 |
+1 for this. I also agree that displaying an absolute path that doesn't exist can be very confusing. Particularly given that Pyodide is used a lot for teaching Python and so we should try to be as close to the normal experience as possible. |
Opened a discussion: https://discuss.python.org/t/proper-file-paths-to-show-in-exception-for-py-compiled-files/21108. |
In view of #3269 @ryanking13 could we have a flag here to source compile or not? I think shipping a zipfile always is maybe less problematic. I would say we add flag to enable/disable this at build time, merge it as disabled for now, and then see what we deploy and whether we could deploy both. |
return (file_pyc, archivename) | ||
|
||
|
||
def create_stdlib_zip( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could put most of these functions in pyodide_build.packer (or something similar) so we could unit test them better? E.g. running this on some .py files in a temp dir, and checking the output zip (also to check that it doesn't contain .py).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sound good to me. That would require some modification in our build system since pyodide-build is not used for CPython build steps now, but I think it wouldn't be that hard.
Yes, that sounds good to me. |
I moved the work in this PR into a build flag
So we can merge this PR as is without affecting default behavior. At the same time, people in the Jupyter workshop can experiment with this. I will open a separate PR to disable .pyc compilation. |
Description
Vendors standard libraries in a zip file (
/lib/python310.zip
) in a compiles(pyc) format. Thewasm_assets.py
code which creates a zip file is adapted from the CPython repository.Zipping standard libraries reduces the size of
pyodide.asm.data
around ~2MB:Note that I haven't checked how this PR affects the import time. I wasn't able to find a good benchmark or experiment of how using zipimport affects the import time.
cc: @tiran
Partially related: #3050
Checklists