EFF Reduce the size of shared objects of the C-extensions generated by Cython #27767
Comments
Thanks for investigating @jjerphan !
In Pyodide I think we are already stripping most of them, which is why they are 2x smaller there. Outside of the browser, I'm not sure to what extent stripping debug information is a good idea. If there is a segfault, it's easier to investigate with debug information. |
By default, debug symbols aren't used and Line 477 in 77aeb82
@thomasjpfan, @ogrisel, @jeremiedbb, @lorentzenchr, @Micky774: Do you think we should strip all symbols and optimize for size? Or do you think it is worth keeping symbols unchanged? |
I think it would be reasonable to strip symbols, since I imagine the vast majority of our user base doesn't actually use them. Folks can always build from source if needed. It's mainly helpful on CI to avoid non-descriptive |
I am +1 on stripping the symbols by default. If we need the symbols for development, then we set the environment variable to enable them. |
I am trying to see whether there are other options to reduce the size of the native extensions' shared objects. Do you see anything else? |
Thanks for the suggestions! For the WASM use case, I think we are already stripping symbols in Pyodide, which is why the .so files are 2x smaller than on, say, x86_64:
$ pyodide auditwheel exports sklearn/utils/_random.cpython-311-wasm32-emscripten.so
sklearn/utils/_random.cpython-311-wasm32-emscripten.so:
FUNC __wasm_call_ctors
FUNC __wasm_apply_data_relocs
FUNC PyInit__random
GLOBAL __pyx_module_is_main_sklearn__utils___random
$ ls -lh sklearn/utils/_random.cpython-311-wasm32-emscripten.so
-rw-------@ 1 rth staff 121K Sep 25 22:40 sklearn/utils/_random.cpython-311-wasm32-emscripten.so
However, it's still rather large, likely with some duplicate objects between .so files (aside from the exported symbols). |
Inter-procedural optimizations (such as link-time optimization) might help reduce the size of shared objects, since they generally remove objects duplicated across the translation units of each shared object. Ideally, objects should not be duplicated across shared objects either. Resolving cython/cython#2356 seems relevant in this regard, but I do not know of other mitigations. I am afraid I do not have time to look at this issue right now. I'll try to explore solutions soon. |
Context
scikit-learn uses C extensions in critical parts of its implementation via Cython.
Each C extension is built from one or several Cython translation units (a .pyx file with a potential .pxd companion file). In scikit-learn, each C extension build consists of a single Cython translation unit, which is transpiled to a C or C++ translation unit, which is then compiled to a shared object file.
The resulting C or C++ translation unit contains the code translated from Cython to C, plus a large preamble and epilogue of macros, functions, structs, global variables such as virtual tables, the Python module definition, etc.
For instance, while sklearn/utils/_heap.pyx consists of fewer than 100 lines for a single function, the resulting sklearn/utils/heap.c file consists of more than 3500 lines, most of them being the preamble and the epilogue injected by Cython:
Content of the generated sklearn/utils/heap.c
Problem
Currently, the uncompressed size of scikit-learn is around 48.8MB, 20MB of which are shared object files. As reported by @rth in pyodide/pyodide#4289, while shared object files are optimized quite heavily for Emscripten, they still account for most of the size of scikit-learn on this stack.
Extensions' shared object sizes on Linux
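One way to produce such a listing is sketched below. This is only an illustration, not the script used for the numbers in this issue: the helper name shared_object_sizes is hypothetical, and it simply walks an installed package directory and sums file sizes, treating .so and .pyd files as extension modules.

```python
from pathlib import Path


def shared_object_sizes(root):
    """Return (total_bytes, so_bytes, per_file) for a package directory.

    Hypothetical helper: `per_file` maps each extension module's path
    (relative to `root`) to its size in bytes.
    """
    root = Path(root)
    total = 0
    so_total = 0
    per_file = {}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        total += size
        # Extension modules end in .so on Linux/macOS and .pyd on Windows.
        if path.suffix in {".so", ".pyd"}:
            so_total += size
            per_file[str(path.relative_to(root))] = size
    return total, so_total, per_file
```

Calling it on the directory of an installed package (for example the parent directory of the package's __file__) and sorting per_file by value gives the largest extensions first.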
Possible solutions
Strip all symbols and optimize for size
This can be done by adding -Wl,--strip-all to extra_link_args and -Os -g0 to extra_compile_args.
In practice, it can significantly shrink shared objects (up to nearly 50% size reduction):
Extensions' shared object sizes on Linux after stripping all symbols and optimizing for size
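A minimal sketch of how these flags could be wired into a setuptools-based build, with an opt-out for development as suggested in the comments. The function name size_optimized_build_args and the environment variable KEEP_DEBUG_SYMBOLS are assumptions made for illustration, not scikit-learn's actual build configuration. Note that -Wl,--strip-all targets GNU-style linkers, so other platforms would likely need different flags.

```python
import os


def size_optimized_build_args(environ=None):
    """Return keyword arguments for a setuptools Extension.

    KEEP_DEBUG_SYMBOLS is a hypothetical variable name: when it is set
    to "1", no extra flags are added and the toolchain defaults apply.
    """
    environ = os.environ if environ is None else environ
    if environ.get("KEEP_DEBUG_SYMBOLS", "0") == "1":
        return {"extra_compile_args": [], "extra_link_args": []}
    return {
        # -Os optimizes for size, -g0 disables debug information.
        "extra_compile_args": ["-Os", "-g0"],
        # Ask the (GNU-style) linker to strip all symbols.
        "extra_link_args": ["-Wl,--strip-all"],
    }


# Intended use in a setup.py (not executed here):
#   from setuptools import Extension
#   ext = Extension("pkg._heap", ["pkg/_heap.pyx"], **size_optimized_build_args())
```

Returning plain keyword arguments keeps the choice of flags in one place, so every extension in the package is built the same way.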
Group several translation units within C extensions (and use interprocedural optimization)
This would allow duplicated symbols to be reused within each shared object and enable optimizations across several translation units (such as function inlining, etc.).