Feature request: out-of-tree Pyodide builds for statsmodels
#9166
Comments
Someone who had significant time would need to contribute this. The core team is pretty small and mostly focused on quality and new features. |
Indeed, I would be happy to start working on this (I edited the issue description a bit in the time you responded). It might be a long ordeal given that this is to be done from the ground up – but as long as someone can help out with code review and suggestions, that would help streamline the process. |
I'm happy to try and help. I have no idea what the process looks like since we
|
Thanks for sharing, @bashtage. The compilation procedure should take care of Cythonizing and building wasm32-compatible shared object files, but I am not sure about how to get a BLAS distribution on WASM working (xref: OpenMathLib/OpenBLAS#4023). The requirements should be partly satisfied, since Edit: Footnotes |
The use of @agriyakhetarpal maybe you can look into how this works in the in-tree Pyodide build for statsmodels? CI logs, code comments and/or patches may show that. The first questions I would ask are:
|
For the record, I can help with this. I even still have my statsmodels commit rights, although I haven't exercised them in a long time. |
I did spend some time surfing through links and files, and I believe that I definitely have some more insights into both questions, but probably not complete answers:
It looks like yes, they do ship it: import scipy
scipy.linalg.cython_blas returns the WASM shared object compiled with Emscripten:
I am not entirely sure that I followed that – I checked: import os
os.listdir("/lib/python3.11/site-packages/scipy/linalg/") which returns expand to view files
where cdef int zgerc(int *m, int *n, z *alpha, z *x, int *incx, z *y, int *incy, z *a, int *lda) noexcept nogil are all returning integers, while locally this doesn't return anything (the cdef void zgerc(int *m, int *n, z *alpha, z *x, int *incx, z *y, int *incy, z *a, int *lda) noexcept nogil

I tracked this change down to these lines in the Pyodide recipe for SciPy, and since these files are generated by the build system (or Cython?) at compilation time, the difference in return type seems to come from when OpenBLAS was used for SciPy in pyodide/pyodide#3331 – this should be the related patch that induced this change. As for

How should we proceed with this? Based on the input I receive, I can start putting together a Pyodide build on my fork to test things out (and open a draft PR for visibility if it helps with review and expedites things). I will have to use GitHub Actions, however – I do not have access or credentials for testing builds on Azure Pipelines :) |
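The return-type difference described above can be checked mechanically by comparing two copies of the generated declarations. The following is a minimal sketch, assuming both sets of declarations are available as text; the helper names `return_types` and `diff_return_types` are hypothetical, and the two sample declarations are the ones quoted in this comment.

```python
import re

# Matches one-line declarations such as
# "cdef int zgerc(int *m, ...) noexcept nogil" and captures the
# return type and the function name.
DECL_RE = re.compile(r"^\s*cdef\s+(?P<ret>[\w ]+?)\s+(?P<name>\w+)\s*\(")


def return_types(pxd_text):
    """Map each declared function name to its declared return type."""
    found = {}
    for line in pxd_text.splitlines():
        match = DECL_RE.match(line)
        if match:
            found[match.group("name")] = match.group("ret")
    return found


def diff_return_types(local_pxd, pyodide_pxd):
    """Return {name: (local_type, pyodide_type)} for functions that differ."""
    local = return_types(local_pxd)
    remote = return_types(pyodide_pxd)
    return {
        name: (local[name], remote[name])
        for name in sorted(local.keys() & remote.keys())
        if local[name] != remote[name]
    }


# Sample inputs: the two zgerc declarations quoted in the comment above.
ARGS = "(int *m, int *n, z *alpha, z *x, int *incx, z *y, int *incy, z *a, int *lda)"
LOCAL = f"cdef void zgerc{ARGS} noexcept nogil"
PYODIDE = f"cdef int zgerc{ARGS} noexcept nogil"

print(diff_return_types(LOCAL, PYODIDE))
```

Only one-line `cdef` declarations are handled; declarations that wrap across lines would need a more careful parser.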
Minor update: I did compile a WASM wheel through a workflow on my fork, but there are linkage issues and unresolved symbols appear.
Error: Dynamic linking error: cannot resolve symbol
A Google search reveals some hits:
It looks like this is coming from the
It is to be noted that there are warnings from the compilation, however – I am not sure how to tell whether any of them is relevant to our case:
out of which, the first one is:
blas.sgemm("N", "N", &model.k_states, &model.k_states, &model.k_states,
           &gamma, &kfilter.predicted_state_cov[0,0,smoother.t+1], &kfilter.k_states,
           smoother._input_scaled_smoothed_estimator_cov, &kfilter.k_states,
           &beta, smoother._tmp0, &kfilter.k_states)
so this could also be an issue with the BLAS vendor being linked against. Is there some advice on debugging this? I am unsure how to go forward. |
Unfortunately I don't have much to contribute on the fortran compiler side, although I can confirm that the following warnings are not a problem (just an early
|
Can you run it with more verbosity so I can see which file? I suspect this is somehow intermediate because statsmodels does not use Fortran. |
Neither the Pyodide build command nor the |
This is not something that is in statsmodels from what I can tell. It likely has to do with either BLAS or SciPy. I just checked the generated C source for statsmodels and |
What happens if you just |
That doesn't work, it fails to find the I can confirm that it passes locally |
This just tells you that things are not getting compiled. The log looks like it is building
Can you see it in your tree? When I said "python" I meant the Pyodide Python. I don't really know how Pyodide works, but I am assuming you can use it like standard Python. Looks like it would be something like
|
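One way to act on the suggestion above is an import smoke test run inside the Pyodide Python. This is a minimal sketch, assuming the built wheel is already installed in that environment; `try_imports` is a hypothetical helper, and `statsmodels.robust._qn` is used only because that extension module is named later in this thread.

```python
import importlib


def try_imports(module_names):
    """Import each module; map its name to None on success or to an error string."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            # Covers a missing module as well as loader failures raised at import time.
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    # Assumption: extend this list with the compiled extensions you want to check.
    candidates = ["statsmodels.robust._qn"]
    for name, error in try_imports(candidates).items():
        print(name, "OK" if error is None else error)
```

Importing the compiled submodules one at a time narrows a loading failure down to a specific extension rather than the package as a whole.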
Yes, I see in the logs that it is being copied into the wheel as well, after it gets compiled:
I do have
One can actually – in the workflow file, I activated a special virtual environment that was created with However, the Node.js code snippet you mentioned is usable directly too and should also return the same output. It looks like the compilation is successful, but it cannot be loaded on WASM –
$ otool -vL statsmodels/robust/_qn.cpython-311-darwin.so
statsmodels/robust/_qn.cpython-311-darwin.so:
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.61.1)
time stamp 2 Thu Jan 1 05:30:02 1970
which means that the shared object is self-contained (which should be good). |
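Note that the otool output above describes the native macOS build of the extension. To confirm which kind of binary a given `.so` file actually is, its first bytes can be inspected. This is a minimal sketch, assuming the Emscripten build emits WebAssembly modules under the `.so` suffix; `binary_format` and `file_format` are hypothetical helper names.

```python
# Leading magic bytes of common binary formats.
MAGIC_PREFIXES = {
    b"\x00asm": "WebAssembly module",
    b"\x7fELF": "ELF",
    b"\xcf\xfa\xed\xfe": "Mach-O (64-bit)",
    b"\xce\xfa\xed\xfe": "Mach-O (32-bit)",
    # The same four bytes also begin Java class files.
    b"\xca\xfe\xba\xbe": "Mach-O universal binary",
}


def binary_format(header):
    """Classify a file from its leading bytes."""
    for magic, label in MAGIC_PREFIXES.items():
        if header.startswith(magic):
            return label
    return "unknown"


def file_format(path):
    """Read the first four bytes of the file at `path` and classify them."""
    with open(path, "rb") as handle:
        return binary_format(handle.read(4))
```

For example, `file_format("statsmodels/robust/_qn.cpython-311-darwin.so")` would report a Mach-O variant for the local build inspected above, whereas a file extracted from the wasm32 wheel should report a WebAssembly module if the assumption holds.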
Is your feature request related to a problem? Please describe
Hi there! I am opening this feature request to gauge ideas and comments about out-of-tree Pyodide builds, i.e., wasm32 wheels via the Emscripten toolchain for statsmodels. In my most recent work assignment, I am working on improving the interoperability of the Scientific Python ecosystem of packages with Pyodide and with each other. This should culminate in interactive documentation for these packages that can be run in JupyterLite notebooks, through nightly builds and wheels for these packages pushed to PyPI-like indices on Anaconda, at a later phase of the project.
It looks like in-tree builds in Pyodide have been done in the past, as noted by the conversations in #7956 – and they seem to be maintained with every Pyodide release. However, this issue proposes out-of-tree builds for statsmodels on its own CI and build infrastructure. I would be glad to work on this for statsmodels.
Describe the solution you'd like
statsmodels are pursued
Describe alternatives you have considered
N/A
Additional context
This project is being tracked at Quansight-Labs/czi-scientific-python-mgmt#18 and has started out with packages like PyWavelets (PyWavelets/pywt#701) and NumPy (numpy/numpy#25894) being complete, thanks to @rgommers. Other packages besides statsmodels, such as matplotlib, zarr, pandas, and many more, are planned to follow suit in the coming months.