Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scipy 1.12.0 #4499

Merged
merged 7 commits into from
Mar 8, 2024
Merged

Scipy 1.12.0 #4499

merged 7 commits into from
Mar 8, 2024

Conversation

hoodmane
Copy link
Member

@hoodmane hoodmane commented Feb 9, 2024

@lesteve
Copy link
Contributor

lesteve commented Feb 9, 2024

Seems like there are genuine CI issues with scipy.stats._page_trend_test (I checked it does exist in scipy 1.12).

Ping me when CI is green and I can give it a go in https://github.com/lesteve/scipy-tests-pyodide.

In an ideal world by the way, https://github.com/lesteve/scipy-tests-pyodide would be inside Pyodide org or repo so scipy tests can be run more easily without me having to intervene. Let me know if you have suggestions on how to do this!

@hoodmane
Copy link
Member Author

hoodmane commented Feb 9, 2024

scipy.stats._page_trend_test

Yeah we've had ongoing trouble with this one since it ends in test our test unvendoring system pulls it out. I'll probably add a special workaround in pyodide-build since it's probably easier than the current fix.

@hoodmane
Copy link
Member Author

hoodmane commented Feb 9, 2024

In an ideal world by the way, https://github.com/lesteve/scipy-tests-pyodide would be inside Pyodide org or repo so scipy tests can be run more easily without me having to intervene. Let me know if you have suggestions on how to do this!

I wonder whether we can set it up as an action to run only when scipy, numpy, openblas, or boost-cpp is touched.

@lesteve
Copy link
Contributor

lesteve commented Feb 9, 2024

FYI I had a quick look running the scipy tests it seems like most of the new issues are due to scipy.stats._page_trend_test and also some tests were added in scipy 1.12 that use threads or multiprocessing.

I wonder whether we can set it up as an action to run only when scipy, numpy, openblas, or boost-cpp is touched.

Seems reasonable, do you think that would fit more naturally in CircleCI or github actions (not super familiar with your CI setup)?

FYI this is the way I have it set-up for now:

  1. Build a Pyodide distribution in debug mode (useful for better stack-traces) up to scikit-learn. This takes roughly 40 minutes. Worfklow file. This uploads an artifact with the Pyodide dist.
  2. Run the scipy tests takes roughly 40 minutes. This reuses the Pyodide dist from previous workflow. Workflow file

Maybe you already have 1. set-up and I can only think how to integrate 2. in Pyodide CI?

Full disclosure I also test scikit-learn dev with Pyodide dev:
3. In 1. I actually build scikit-learn development version so I can test scikit-learn dev in https://github.com/lesteve/scikit-learn-tests-pyodide. I think this part is not super useful anymore since we test scikit-learn dev with Pyodide stable in scikit-learn CI. Worfklow file

@steppi
Copy link

steppi commented Feb 9, 2024

Thanks for the ping @hoodmane. Let me know if there's anything we can do on the SciPy side to help make things easier for you.

@hoodmane
Copy link
Member Author

hoodmane commented Feb 9, 2024

@steppi The main thing we're waiting on is for LFortran to compile all of scipy:
https://lfortran.org/blog/2024/01/lfortran-compiles-60-of-scipy/
Once that happens hopefully we can switch over to that from the horrible f2c hacks that we currently use and have a much better time.

@hoodmane
Copy link
Member Author

hoodmane commented Feb 9, 2024

@lesteve We don't currently build any packages in debug mode. I think it might be fine to just add your gha workflow as-is to a separate scipy-tests.yaml file and filter it with:

on:
  push:
    branches: [main]
    paths:
      - 'packages/libf2c/**'
      - 'packages/numpy/**'
      - 'packages/openblas/**'
      - 'packages/boost-cpp/**'
      - 'packages/scipy/**'
      - 'packages/scikit-learn/**'
  pull_request:
    branches: [main]
    paths:
      - 'packages/libf2c/**'
      - 'packages/numpy/**'
      - 'packages/openblas/**'
      - 'packages/boost-cpp/**'
      - 'packages/scipy/**'
      - 'packages/scikit-learn/**'

@hoodmane
Copy link
Member Author

hoodmane commented Feb 9, 2024

@steppi I think one simple thing that would help is if you open issues on Pyodide or otherwise ping us on github when scipy makes a release.

@steppi
Copy link

steppi commented Feb 9, 2024

@steppi The main thing we're waiting on is for LFortran to compile all of scipy:
https://lfortran.org/blog/2024/01/lfortran-compiles-60-of-scipy/
Once that happens hopefully we can switch over to that from the horrible f2c hacks that we currently use and have a much better time.

Cool. From the other side, we're also working on reducing SciPy's dependency on Fortran. There's a tracking issue of the progress so far, scipy/scipy#18566.

@steppi I think one simple thing that would help is if you open issues on Pyodide or otherwise ping us on github when scipy makes a release.

Sure thing, will do. As a heads up, the plan is for 1.13 to come out later this month, scipy/scipy#19880. It's an out-of-band release to support NumPy 2.0.

@hoodmane
Copy link
Member Author

hoodmane commented Feb 9, 2024

By the way @steppi another problem for us is that scipy is pretty big and has many dynamic libs and we have to preload .so modules and most of the time people only want a small subset of the features. I was wondering whether there has been any discussion about uploading the different components of scipy as separate wheels so you could have scipy-integrate, scipy-interpolate, ... and then just make scipy itself a list of dependencies? It would be nice if people had the option to be more specific about what parts they need.

Copy link
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have some minor comments, otherwise LGTM.

packages/scipy/meta.yaml Show resolved Hide resolved
pyodide-build/pyodide_build/io.py Outdated Show resolved Hide resolved
@ryanking13
Copy link
Member

The main thing we're waiting on is for LFortran to compile all of scipy:
https://lfortran.org/blog/2024/01/lfortran-compiles-60-of-scipy/
Once that happens hopefully we can switch over to that from the horrible f2c hacks that we currently use and have a much better time.

I recently read a blog post about Scipy using llvm-flang (https://labs.quansight.org/blog/building-scipy-with-flang) that tells the result was successful. So it would be an interesting experiment to try llvm-flang.

@lesteve
Copy link
Contributor

lesteve commented Feb 12, 2024

There seems to be a new error in this PR, somehow the C++ exception is not raised as a Python exeption but creates a Pyodide fatal error. The same scipy test is passing in Pyodide latest console.

import micropip
await micropip.install('scipy-tests')
import pytest
pytest.main(['--pyargs', 'scipy.stats.tests.test_distributions', '-k', 'issue_14606', '-v'])

Stack-trace:

Pyodide has suffered a fatal error. Please report this to the Pyodide maintainers.
The cause of the fatal error was:
CppException boost::math::evaluation_error: Error in function boost::math::tgamma<double>(double,double): Series evaluation exceeded 1000000 iterations, giving up now.
    at convertCppException (/home/lesteve/dev/scipy-tests-pyodide/node_modules/pyodide/pyodide.asm.js:1006:627)
    at API.fatal_error (/home/lesteve/dev/scipy-tests-pyodide/node_modules/pyodide/pyodide.asm.js:1006:921)
    at API.maybe_fatal_error (/home/lesteve/dev/scipy-tests-pyodide/node_modules/pyodide/pyodide.asm.js:1006:1755)
    at Module.callPyObjectKwargs (/home/lesteve/dev/scipy-tests-pyodide/node_modules/pyodide/pyodide.asm.js:1002:907)
    at Module.callPyObject (/home/lesteve/dev/scipy-tests-pyodide/node_modules/pyodide/pyodide.asm.js:1002:1905)
    at _.apply (/home/lesteve/dev/scipy-tests-pyodide/node_modules/pyodide/pyodide.asm.js:1002:15982)
    at Object.apply (/home/lesteve/dev/scipy-tests-pyodide/node_modules/pyodide/pyodide.asm.js:1002:14221)
    at main (/home/lesteve/dev/scipy-tests-pyodide/scipy-pytest.js:55:24) {
  ty: 'boost::math::evaluation_error',
  pyodide_fatal_error: true
}

@hoodmane
Copy link
Member Author

So it would be an interesting experiment to try llvm-flang.

I read that blog too, but when I tried it out I found out that llvm-flang has no wasm backend currently...

@rgommers
Copy link
Contributor

rgommers commented Feb 12, 2024

another problem for us is that scipy is pretty big and has many dynamic libs and we have to preload .so modules and most of the time people only want a small subset of the features. I was wondering whether there has been any discussion about uploading the different components of scipy as separate wheels so you could have scipy-integrate, scipy-interpolate, ... and then just make scipy itself a list of dependencies?

The last time that this was briefly discussed is maybe 4 years ago. There would be significant overhead in dealing with separate packages, and there are risks with new failure modes when only one scipy-xxx component gets updated inside an env (pip & co. aren't known for reliability in such cases). There is also significant internal dependencies between packages; the largest ones like special, sparse, and linalg are basically almost always needed - so what you'd gain is only removing the most high-level and niche ones if they're not used.

The .so preloading is more painful though than only download bandwidth and disk space usage. We didn't consider this preloading issue before. One thing that conda-forge did recently is split off the test suite, making it separately installable as scipy-tests. It reduced package size by ~30%, so it's not an unreasonable thing for a distro to do (we may do it for SciPy as a whole in the future). I could imagine Pyodide breaking SciPy up into a few parts, with the rarely used modules (odr, io, etc.) split off. There are also other use cases where size really matters (e.g., AWS Lambda has a hard size limit), so we could even consider adding a build mode to make submodules configurable perhaps.

@hoodmane
Copy link
Member Author

hoodmane commented Feb 12, 2024

Thanks @rgommers. Also the benefits of splitting up scipy would be pretty low for anyone but us even if the different parts were pretty orthogonal, and Pyodide by itself probably doesn't justify such a large amount of work.

One thing that conda-forge did recently is split off the test suite, making it separately installable as scipy-tests.

We also do this. Downloading + preloading the test suite would be pretty painful and unnecessary.

I could imagine Pyodide breaking SciPy up into a few parts, with the rarely used modules (odr, io, etc.) split off.

It's already helpful that you know which modules are rarely used, since we sure don't. I think we should likely do this when someone finds the energy since it will help everyone else using scipy to have a less heavy application.

.so preloading

The other comment is that the holy grail for us is wasm stack switching which would allow us to not do this really painful preloading. Though it would require the user to either:

  1. use an async JavaScript API, or
  2. explicitly request that specific modules be preloaded somehow

so I think we'll still need some way to opt into the old auto preloading setup. And of course for browsers we have to wait until they stop requiring an experimental setting to enable it. In node it's pretty easy for people to pass the flag, so we might start considering stuff like this for node. Combined with the ability to mount site-packages from disk, it would enable using scipy without the enormous startup time.

@ryanking13
Copy link
Member

ryanking13 commented Feb 13, 2024

I read that blog too, but when I tried it out I found out that llvm-flang has no wasm backend currently...

Oh, I didn't know that thanks. Looking at this discussion, it seems that the wasm backend is currently blocked due to lack of manpower, but regarding the design of LLVM, it shouldn't be too difficult to add a backend.

I found that someone made a patched version of llvm that supports wasm backend that is used to build WASM R: https://github.com/r-wasm/flang-wasm. Maybe it is worth investigating.

@rgommers
Copy link
Contributor

It's already helpful that you know which modules are rarely used, since we sure don't. I think we should likely do this when someone finds the energy since it will help everyone else using scipy to have a less heavy application.

Sounds good. I'll try to find some time to figure out what a sensible split would be. Need to find my script to figure out what depends on what exactly - we don't keep track of that very well, and sometimes someone adds a new dependency between submodules by accident.

@steppi
Copy link

steppi commented Mar 8, 2024

Any news on when this might be merged?

@hoodmane
Copy link
Member Author

hoodmane commented Mar 8, 2024

I think I should just go ahead with it. My concern is @lesteve keeps finding more regressions, but we don't really have the energy to deal with them...

@hoodmane hoodmane merged commit 6c8888c into pyodide:main Mar 8, 2024
40 of 41 checks passed
@hoodmane hoodmane deleted the scipy-1.12.0 branch March 8, 2024 19:10
@hoodmane
Copy link
Member Author

hoodmane commented Mar 8, 2024

Thanks everyone!

@hoodmane
Copy link
Member Author

hoodmane commented Mar 8, 2024

Should have added a changelog...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants