[RFC] WASM / pyodide as a (somewhat) officially supported platform for scikit-learn #23727

ogrisel · 2022-06-22T13:06:41Z

We have started receiving bug reports (at least one indirect, in-person report at a conference: #23707) from users of scikit-learn in WASM environments (e.g. pyodide / jupyterlite, pyscript...).

Shall we invest effort in setting up CI tooling to properly test and maybe even handle packaging of scikit-learn to target that platform?

It's very likely that not all of scikit-learn will work out of the box, but with proper tooling in place we could maintain a public list of modules whose tests all pass, and maybe a list of modules that required some patches to degrade gracefully on this platform (e.g. the number of parallel worker threads with n_jobs).
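As a rough illustration of what graceful degradation for n_jobs could look like, here is a minimal sketch. It assumes the platform can be detected through sys.platform (Pyodide reports "emscripten"); the helper names are hypothetical and not part of scikit-learn or joblib.

```python
import sys


def running_on_wasm(platform=None):
    """Return True when the interpreter targets WebAssembly.

    Pyodide reports sys.platform == "emscripten"; "wasi" covers WASI builds.
    The `platform` argument only exists to make the check testable.
    """
    platform = sys.platform if platform is None else platform
    return platform in ("emscripten", "wasi")


def n_jobs_for_platform(n_jobs, platform=None):
    """Hypothetical helper: fall back to one worker where none can be spawned."""
    if running_on_wasm(platform):
        return 1
    return n_jobs
```

On a regular CPython build the requested value is returned unchanged, so such a patch would only affect the WASM target.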

@rth put some interesting info in the following comment on how to run the tests:

TSNE is broken in pyodide #23707 (comment)

Pros:

WASM is likely to be a very popular target platform, at least for education (can directly teach Python programming and ML concepts without having to teach how to install packages from the command line first).

Cons:

test execution is probably much slower than on our regular CI targets;
need to maintain a list of known issues / limitations;
more packaging work: the release process will be even more complicated;
SciPy is quite heavily patched because there is no working Fortran compiler on that platform (that might change soon with lfortran), so it relies on a semi-hackish Fortran to C transpilation step that introduces additional complexity.

ogrisel · 2022-06-22T13:09:51Z

Maybe we should start by listing the fraction of modules with failing tests and scanning through the failures to estimate how many stem from a common, known upstream limitation. For each limitation we could then assess how likely it is to be lifted in the short to medium term, or whether there is a somewhat maintainable work-around we might want to include in the scikit-learn code base or as an external packaging patch.
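To make the triage idea concrete, here is a small sketch that tallies failures by suspected cause. The substrings in KNOWN_CAUSES are illustrative placeholders only, not a vetted list of Pyodide limitations, and the (test id, error message) input format is an assumption.

```python
from collections import Counter

# Illustrative substrings only; a real list would come from reading the logs.
KNOWN_CAUSES = {
    "can't start new thread": "no threads",
    "No module named '_multiprocessing'": "no multiprocessing",
}


def tally_causes(failures, known_causes=KNOWN_CAUSES):
    """Count failures per suspected cause.

    `failures` is an iterable of (test_id, error_message) pairs. A message
    that matches none of the known substrings is counted as "unclassified".
    """
    counts = Counter()
    for _test_id, message in failures:
        label = next(
            (lbl for pattern, lbl in known_causes.items() if pattern in message),
            "unclassified",
        )
        counts[label] += 1
    return counts
```

A large "unclassified" count would suggest that more inspection is needed before deciding on work-arounds.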

rth · 2022-06-22T14:01:19Z

Yes, I think running it in CI is probably indeed a bit early (and it would also be really slow). Some more investigative first steps could be a good start:

manually run the test suite module by module, see what fails (and report upstream). This would already be very helpful. The last time I did this was in 2018, in Package scikit-learn pyodide/pyodide#139 (comment), and the situation should be much better now.
figure out how to best run the test suite programmatically for a large package such as scikit-learn. Pyodide has a pytest plugin which will be better packaged once TST Make pyodide-test-runner installable pyodide/pyodide#2742 is merged. Once installed it exposes pytest fixtures that would allow running some Python code in Pyodide inside a browser (Chrome or Firefox) with selenium or Node.js. Though for now, this package has no users outside of Pyodide, so more work is likely necessary to make it standalone and re-usable by external packages.

Then, once we have some way to run Python code inside the browser from a Python script (or pytest) on the host, the question remains how to best run the full scikit-learn test suite. The problem is that running pytest.main over the full package directly takes a while, and no feedback is reported to the user until the run completes. Furthermore, if there is a fatal error somewhere in scipy (similar to a segfault in terms of outcome), the whole session would crash. So it's probably better to run pytest inside WASM on smaller chunks, serialize the results back, and concatenate them on the host. This is similar to what I was wondering about in pytest-dev/pytest-xdist#336, since in the end the problem is very similar to running pytest on a remote node (except that communication does not happen over the network).
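The chunking idea could look roughly like the sketch below. It only shows the host-side loop: run_chunk is a hypothetical callable standing in for whatever ends up executing pytest inside the WASM runtime and sending back a dict of test ids to outcomes; it is assumed to raise if the runtime dies.

```python
def run_in_chunks(modules, run_chunk, chunk_size=1):
    """Run `run_chunk` on small groups of modules and merge the results.

    Returns (merged, crashed): `merged` maps test ids to outcomes, and
    `crashed` lists the chunks whose run raised, with the error repr.
    """
    merged = {}
    crashed = []
    for start in range(0, len(modules), chunk_size):
        chunk = modules[start:start + chunk_size]
        try:
            merged.update(run_chunk(chunk))
        except Exception as exc:  # a fatal runtime error only loses this chunk
            crashed.append((chunk, repr(exc)))
        print(f"finished {chunk}")  # feedback after every chunk, not at the end
    return merged, crashed
```

With chunk_size=1, a crash in one module costs only that module's results, at the price of restarting the runtime more often.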

In any case, if anyone is interested in investigating this, I'd be happy to talk more about it.

ogrisel · 2022-06-22T15:46:40Z

Thanks for the summary, I agree with your plan.

ogrisel · 2022-06-22T15:53:35Z

Once the test runner tooling is improved, we could imagine a nightly run that would run the test suite of each top-level scikit-learn module and consolidate a report of the scikit-learn modules that work without any failure, that run with some test failures, or that cause an unrecoverable crash or a fatal error of the WASM runtime environment (it would be great to automatically collect the post-mortem output of the browser's JS console in such a case).
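A sketch of the consolidation step, assuming each module's run has already been reduced to a list of pytest outcome strings plus a flag telling whether the runtime died. The input format and function names are hypothetical.

```python
def module_status(outcomes, crashed):
    """Classify one module's run into one of the three report categories."""
    if crashed:
        return "crash"
    if any(outcome in ("failed", "error") for outcome in outcomes):
        return "some failures"
    return "ok"


def consolidate(runs):
    """Group module names by status.

    `runs` maps a module name to an (outcomes, crashed) pair.
    """
    report = {"ok": [], "some failures": [], "crash": []}
    for name, (outcomes, crashed) in sorted(runs.items()):
        report[module_status(outcomes, crashed)].append(name)
    return report
```

The resulting dict could be rendered into the public list of working modules; the JS console output for crashed modules would have to be collected separately.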

amueller · 2022-07-16T16:57:39Z

Btw, I think one of the benefits we'd get from WASM support is the ability to have interactive examples in the browser on the docs. I think that'll be a gamechanger for documentation.

lesteve · 2022-10-10T13:38:01Z

I put together a repo to run the scikit-learn test suite inside Pyodide. I listed the issues I have spotted so far there. This will need more investigation. If you have any feedback, let me know!

https://github.com/lesteve/scikit-learn-tests-pyodide

github-actions bot added the Needs Triage Issue requires triage label Jun 22, 2022

ogrisel changed the title ~~RFC WASM / pyodide as a (somewhat) officially supported platform for scikit-learn~~ [RFC] WASM / pyodide as a (somewhat) officially supported platform for scikit-learn Jun 22, 2022

ogrisel added RFC Needs Decision - Include Feature Requires decision regarding including feature Needs Investigation Issue requires investigation and removed Needs Triage Issue requires triage labels Jun 22, 2022

rth mentioned this issue Jul 6, 2022

Finish out of tree build system (except xbuildenv deploy) pyodide/pyodide#2823

Merged

3 tasks

thomasjpfan removed the Needs Decision - Include Feature Requires decision regarding including feature label Nov 4, 2022

agriyakhetarpal mentioned this issue Mar 18, 2024

ENH: out-of-tree Pyodide builds in CI for pandas pandas-dev/pandas#57891

Closed

3 tasks
