META: FORTRAN Code inventory #18566

ilayn · 2023-05-27T09:25:20Z

steppi · 2023-05-27T20:10:33Z

specfunactually isn't so bad because each function is mostly self contained, and it can be converted bit by bit. I've already made good headway with hyp2f1 and that's arguably the most difficult function to replace. I'm determined to banish the F77 from special and have time to get back to it now.

I recently took a close look at ODEPACK to review gh-14552. Converting that will be a bear since it consists of large pieces of machinery with sophisticated control logic. I'd be up for helping to replace it with a C, C++, or Cython implementation if we can find someone else with sufficient, time, knowledge, and interest to work on it together with me. I think it would have to wait until the all of the deliverables are met for the SDG @tirthasheshpatel and I have to work on special though.

j-bowhay · 2023-05-28T06:16:25Z

I am interested in looking at dop over the summer as I am fairly familiar with Hairer's work however my Fortran knowledge is fairly minimal so won't block anyone else's efforts.

ilayn · 2023-05-28T08:12:53Z

You don't need to know Fortran. If you know about the functionality then you can approximate as closely as possible without following the code faithfully. You don't need to jump through the GOTOs but kind of get why and where it jumps. I am basically following the logic and not the code. Because you go quickly insane with Arithmetic IFs and Computed GOTOs.

Kai-Striega · 2023-05-28T08:57:19Z

I'd also be interested in picking up some of these. I'll take a close look during the week and pick somewhere to get started.

steppi · 2023-05-28T17:03:30Z

I'd be happy to review PRs for replacing old Fortran code. I think I can take a look at gh-18570 this week.

I am interested in looking at dop over the summer as I am fairly familiar with Hairer's work however my Fortran knowledge is fairly minimal so won't block anyone else's efforts.

Picking up the language itself is actually not too hard. You could look through https://web.stanford.edu/class/me200c/tutorial_77/ to get started, and look things up here https://docs.oracle.com/cd/E19957-01/805-4939/index.html as needed as you work through code. Like @ilayn said, it's the unstructured control flow that can make understanding the code so difficult.

You don't need to jump through the GOTOs but kind of get why and where it jumps. I am basically following the logic and not the code. Because you go quickly insane with Arithmetic IFs and Computed GOTOs.

What I tend to do is follow along with a notebook and try to work out flowcharts in an almost mechanical fashion, and then use these to reason about parts of the program. I've found trying to follow along and keep track of the state in ones head like one can do in structured programs is basically impossible.

zachjweiner · 2023-05-28T18:05:08Z

@j-bowhay The dop methods already have pure-Python implementations accessible via the "new" interface, solve_ivp. LSODA is the only wrapped Fortran method via the new interface. The *vode methods (also from odepack) are the only ones available in the old interface but not the new one, but I'm not sure how much use they get (or "should" get) in practice.

LSODA does provide unique functionality (automatic stiffness detection and switching between stiff/nonstiff methods), but I don't know that the algorithm itself is especially complex. The stiff and nonstiff methods individually should be straightforward (pure-Python BDF---LSODA's stiff solver---is already available via solve_ivp). Though the stiffness detection and logic to handle switching would be new, from skimming the algorithm description it only seems modestly involved.

But rather than simply directly reimplementing LSODA, I think it would be a better use of time to implement abstracted utilities for method switching/composition (which could then be used to replicate LSODA). DifferentialEquations.jl does this and finds that combinations of other stiff and nonstiff methods can outperform LSODA. Obviously this goes beyond the mission of this issue, but if effort is to be spent on the ODE solvers I don't think it should be to simply directly translate LSODA/odepack. With a proper, modular abstraction, I would expect the individual components (stiff and nonstiff solvers with their own dense output, stiffness detection, and logic for dynamically switching) to be simpler to tackle than the LSODA source would suggest.¹ (And I don't immediately see a reason these should be implemented in compiled code, at least at first.)

On a semi-related note: a year ago I rewrote solve_ivp (and just the Runge-Kutta methods) in Cython to see how much overhead could be mitigated. There were a number open questions/issues to address - just mentioning here if anyone is interested in discussing elsewhere.

There is an (apparently inactive) C rewrite of LSODA, which may be easier to reference for anyone digging into it. ↩

ilayn · 2023-05-28T18:47:19Z

I don't know enough solver knowledge to contribute to your analysis. But it sounds like we can indeed have a look at the Cython implementation to see whether we can improve or I don't know what you would like to get out of that exercise. Typically a verbose Cython code can get to wrapped Fortran speed with not so much effort. All depends on the sophistication.

The main issue we are trying to tackle is to get rid of this dormant unmaintained code which is blocking us to achieve more user friendly and feature rich API. In a way, I come to believe that the code you wrap forces you into its own API, good or bad. And it is impossible to troubleshoot these things. #14807 is one of my favorite bug reports I've encountered here. In other words, scientific python still smells like fortran-like API which is a shame if you compare it with other coding domains and frameworks.

Hence if domain experts think there are better alternatives, then we can switch to them. If they think it's a waste of time, then we deprecate it. Or it is a daunting task to do anything about but we still need the functionality, then a few of us who are crazy enough can sit down and give it a go. After spending last month with this fortran code, I'm simultaneously amazed that the authors provided such robust code which still works and being used by so many people, and lost almost all my respect to scientific computing community at the same time for letting this code untouched for the last 40 years making the same excuses.

Please let me know if I can provide any help in these issues. I'm more inclined to touch things on sparse.linalg and optimize because I know something about those things but I can try taking a stab at translations or cython code.

zachjweiner · 2023-05-28T19:46:01Z

But it sounds like we can indeed have a look at the Cython implementation to see whether we can improve or I don't know what you would like to get out of that exercise. Typically a verbose Cython code can get to wrapped Fortran speed with not so much effort. All depends on the sophistication.

I'm happy to share it but I should've been clearer that I think it's independent from the Fortran issue (I was just rewriting the pure-Python routines in Cython). Just wanted to mention for visibility to people thinking about compiled code for IVP solvers. (And indeed I was able to get the overhead to << 1 us but only with an API change, and I hadn't tackled dense output.) But the odepack routines come from compiled code only because odepack is an old, widely-used implementation, not because they intrinsically need to be compiled to be useful/reasonably performant.

Hence if domain experts think there are better alternatives, then we can switch to them. If they think it's a waste of time, then we deprecate it.

I should've also mentioned that, since a large fraction of the odepack-dependence comes only from the "old" ode API, it would probably be best to decide whether to actually, officially deprecate it first. After that, my only point is that I think it would be more worthwhile in the long run to enhance/add to the functionality under the solve_ivp umbrella (as I described above) than to painstakingly translate LSODA from Fortran.

And it is impossible to troubleshoot these things. #14807 is one of my favorite bug reports I've encountered here. In other words, scientific python still smells like fortran-like API which is a shame if you compare it with other coding domains and frameworks.

Thanks for sharing - would be glad to see the ecosystem move toward being not only more maintainable and robust to such correctness issues, but also more extensible and improvable.

j-bowhay · 2023-05-28T21:31:31Z

Thanks @zachjweiner for the summary. I did wonder what the overlap was with the existing Python code but was yet to put in any effort to investigate. If I understand you correctly, the only barrier to removing the dop directory is the legacy ode solvers? If so I will probably focus my efforts elsewhere on something hopefully more impactful.

ev-br · 2023-05-29T11:24:53Z

There is also some Fortran code in stats, all probably rather low-hanging fruit.

$ ll scipy/stats/*f
-rw-rw-r-- 1 br br 46387 мая  8  2022 scipy/stats/mvndst.f
-rw-rw-r-- 1 br br  2563 ноя 21  2022 scipy/stats/mvn.pyf
-rw-rw-r-- 1 br br  1528 ноя 21  2022 scipy/stats/statlib.pyf

EDIT: edited these in into the OP, with a subjective difficulty assessment.

ev-br · 2023-05-29T11:38:06Z

This is great @ilayn !
Now that we have a laundry list, I think it would be helpful to add two more classification indicators:

whether a module causes problems or not. For instance, ODRPACK seems stable and without open issues. So probably low prio.
alternatives, both external and interternal. For instance, VODE: I don't think it makes much sense to just transcribe the fortran implementation without taking a look at SUNDIALS which is in C++. Also solve_ivp, as discussed. (For ode solving specifically, I think it's best to discuss it in a separate issue @zachjweiner --- I think a cythonized solve_ivp might be very welcome).
One other example is QUADPACK. Maybe the best course of action is to assess what's missing in quad_vec, port the missing bits; once it's feature complete, we can just deprecate quad.

I can help with this sort of work, depending on what are you planning @ilayn .

ilayn · 2023-05-29T11:50:16Z

Indeed, I am very much all in for redesigning certain items if the codebase left us quite behind. The DifferentialEquations.jl is an example of how a specialized task force can take things way forward than what this fortran code can offer. So in that sense, I'm all in. One thing I am starting to notice is that, once you understand the codeflow you start to get rather grandiose ideas about how to generalize so better careful with that :)

Unfortunately, I'm seriously illiterate when it comes to anything that is not linalg or optimize. Hence, I am not sure if I can be of any help in designing the strategy. What I want is to get rid of the usual suspects, ARPACK/PROPACK and maybe (and by that I mean really maybe) L-BFGS-B stuff if I have steam left.

So TL;DR, let's

replace the ones for which a modern implementation exists,
translate the required/industry standard ones,
and deprecate the rest.

If integrate work can be consolidated as discussed, then let's do it. It shouldn't have been us doing this, we have to start somewhere anyways.

ilayn · 2023-05-29T11:54:37Z

One thing we have to be careful about stability statement is that though they are stable, they are fantastically old and use outdated ways of doing things in the absence of modern tooling (say fixed lapack bugs, threaded/reentrant C libs, and so on). Core of id_dist for example is doing hand coded householder trafos, and qr orthogonalizations which are now almost one-liners.

Long story short, though they are stable, these code have huge inertia for newcomers to implement things. Moreover they also carry a mystique that these code are battle-tested/should not be touched etc. That is demonstrably not true if you actually look at it 😃. I'm sure the algorithms are solid but implementations are far from perfect.

dschmitz89 · 2023-06-08T11:01:55Z

I had a look into the optimizers. The biggest problem is in my opinion l-bfgs-b. There are many l-bfgs implementations but no good ones that work with bounds (the l-bfgs-b part). For slsqp, I see a few possibilitites:

modern and actively maintained Fortran version here: no gotos but Fortran tooling problems still apply
auto translated version of the F77 code from nlopt: still full of gotos but at least in C and actively maintained. We also rely on nlopt for the DIRECT algorithm

I agree with @ilayn 's observation that the cores of these algorithms often look like homegrown linear algebra algorithms or copies of the classic F77 implementations which are nowadays available in LAPACK. Even the modern Fortran code includes again copies of BLAS routines and constrained least squares codes. To avoid linking complexity, this probably makes sense for Fortran or C but for scipy ..

matthew-brett · 2023-06-08T11:05:24Z

Just to clarify - does the modern and actively maintained Fortran project present any new problems in tooling? I think that's the key question.

ilayn · 2023-06-08T12:04:22Z

I don't know if we have any actively maintained new fortran code (maybe PRIMA will be if we manage to link it) so probably we would need to experience it to see if there are new issues.

Folks mention iso_c_binding and ctypes routes but I didn't see any demonstration yet. If we don't need to wrestle with ABI issues and/or linking strangeness, I'm fine with it. Like I mentioned before, this is not a crusade against Fortran, but to get rid of dormant code which happens to be fortran77.

ev-br · 2023-08-17T11:29:56Z

cc #19079 (comment) for a C++ port of AMOS, which seems to be license-compatible.

ivan-pi · 2023-08-30T07:59:14Z

A full Python interface for MINPACK is available since one year now: https://github.com/fortran-lang/minpack/tree/main/python

A few tests serving as examples can be found here: https://github.com/fortran-lang/minpack/blob/main/python/minpack/test_library.py

But I guess this doesn't really solve any of the problems you have with F77 which are:

easily installing a Fortran compiler on all OS's
the constraints the legacy codes impose when it comes to more feature rich and Pythonic interfaces
lack of developers interested in learning Fortran
the fact NumPy defaults to row-major

It appears easier for people to create something new in their own (closed) community than to work across programming languages and tools.

Some time ago I rewrote NNLS with modern Fortran flow-control constructs in an attempt to make it more maintainable and easy to interface. Unfortunately, I cannot live off of refactoring Fortran codes, hence I had no motivation to contribute it to SciPy, despite being a user.

inkydragon · 2024-01-26T23:09:02Z

blas: Convert: Very Hard (Depends on BLAS/LAPACK lib) A lot of historical wrapper inconsistencies

Maybe try BLIS which was written in C.
Performance (Compare with OpenBLAS, MKL, and Eigen)

BLIS is written in ISO C99 and available under a new/modified/3-clause BSD license. While BLIS exports a new BLAS-like API, it also includes a BLAS compatibility layer which gives application developers access to BLIS implementations via traditional BLAS routine calls.

AMD also builds their BLAS library (AOCL-BLAS) based on BLIS.

AOCL-BLAS is developed as a forked version of BLIS

ilayn · 2024-01-27T13:36:24Z

Yes BLIS is in my radar. I need to spare some time and see how we can find a fitting LAPACK for it. In the meantime, Rust folks are also covering quite some distance. I am looking into those. The absolutely last resort is going to be have a performant BLAS-like, and write a small subset of LAPACK with it. But I am really not looking forward to it.

Julia folks did some really nice progress with RecursiveFactorization.jl and I have something similar in mind but it is even more work then rewriting all Fortran code or so it seems to me.

cournape · 2024-02-25T03:13:34Z

For ARPACK, I wonder how feasible it would be use https://github.com/JuliaLinearAlgebra/ArnoldiMethod.jl as a reference and implement it in cython/etc.

According to https://discourse.julialang.org/t/ann-arnoldimethod-jl-v0-4/110604, it is more stable than ARPACK.

rgommers · 2024-02-25T08:14:48Z

Thanks for sharing @cournape. That looks great - way easier to understand than ARPACK and MIT-licensed.

haampie · 2024-03-05T09:38:47Z

Note that ArnoldiMethod.jl only implements the non-symmetric standard eigenproblem, it's lacking Lanczos and generalized problem. I don't know what the state of the art is, but as far as I remember Lanczos needs extra orthogonalization because it's unstable, and with that in place it looks rather similar to the non-symmetric case, so you might as well only do the non-symmetric case.

ilayn added the Fortran Items related to the internal Fortran code base label May 27, 2023

ilayn mentioned this issue May 28, 2023

ENH:optimize: Rewrite nnls in Python #18570

Merged

ev-br mentioned this issue May 29, 2023

ENH/DEP: Clean up quad return types #18574

Open

ilayn mentioned this issue Jun 14, 2023

MAINT:stats:Cythonize and remove Fortran statlib code #18679

Merged

This was referenced Aug 15, 2023

ENH: linalg: array library interoperability lucascolley/scipy#6

Closed

ENH: linalg: array library interoperability #19068

Open

ilayn mentioned this issue Aug 23, 2023

WIP:ENH:optimize:Rewrite SLSQP solver #19121

Draft

7 tasks

rgommers mentioned this issue Sep 26, 2023

ENH: Implemenation of other solvers from ODEPACK in scipy.integrate #19301

Open

mdhaber mentioned this issue Oct 22, 2023

BUG: fsolve + pchip + max, can't find root #19370

Open

ilayn mentioned this issue Nov 6, 2023

CI: test scipy/special/cdflib lfortran/lfortran#2743

Merged

ev-br mentioned this issue Nov 21, 2023

ENH: Warning: Fortran 2018 deleted feature: Shared DO termination #19557

Open

lucascolley mentioned this issue Nov 22, 2023

Continuing optimization without restarting #19522

Closed

ilayn mentioned this issue Nov 24, 2023

ENH:MAINT:special:Rewrite amos F77 code #19587

Merged

3 tasks

ilayn mentioned this issue Dec 28, 2023

ENH: Better support for complex numbers in scipy.signal #19771

Open

h-vetinari mentioned this issue Jan 6, 2024

DOC: Update toolchain roadmap #19815

Open

17 tasks

ilayn mentioned this issue Jan 7, 2024

ENH:Rewrite specfun F77 code in C #19824

Merged

ev-br mentioned this issue Jan 12, 2024

ENH: interpolate: Fitpack gen knots #19873

Closed

ev-br mentioned this issue Jan 20, 2024

ENH: Adding COBYQA to scipy.optimize? #19918

Closed

ilayn mentioned this issue Jan 28, 2024

SLICOT routines alter input parameters python-control/Slycot#208

Open

steppi mentioned this issue Feb 9, 2024

Scipy 1.12.0 pyodide/pyodide#4499

Merged

ilayn mentioned this issue Feb 23, 2024

Should we allow C++? Reference-LAPACK/lapack#988

Open

ilayn mentioned this issue Apr 23, 2024

ENH: linalg: Pythonize id_dist FORTRAN code #20558

Draft

dschmitz89 mentioned this issue May 3, 2024

ENH: odr: upgrade to ODRPACK95 (port to Python or beyond...) #7107

Open

ilayn mentioned this issue May 9, 2024

odeint (lsoda) emits too many warnings that are not filterable. #15940

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

META: FORTRAN Code inventory #18566

META: FORTRAN Code inventory #18566

ilayn commented May 27, 2023 •

edited

steppi commented May 27, 2023

j-bowhay commented May 28, 2023

ilayn commented May 28, 2023

Kai-Striega commented May 28, 2023

steppi commented May 28, 2023

zachjweiner commented May 28, 2023

ilayn commented May 28, 2023 •

edited

zachjweiner commented May 28, 2023

j-bowhay commented May 28, 2023

ev-br commented May 29, 2023 •

edited

ev-br commented May 29, 2023 •

edited

ilayn commented May 29, 2023

ilayn commented May 29, 2023 •

edited

dschmitz89 commented Jun 8, 2023

matthew-brett commented Jun 8, 2023

ilayn commented Jun 8, 2023

ev-br commented Aug 17, 2023

ivan-pi commented Aug 30, 2023

inkydragon commented Jan 26, 2024

ilayn commented Jan 27, 2024

cournape commented Feb 25, 2024

rgommers commented Feb 25, 2024

haampie commented Mar 5, 2024 •

edited

META: FORTRAN Code inventory #18566

META: FORTRAN Code inventory #18566

Comments

ilayn commented May 27, 2023 • edited

steppi commented May 27, 2023

j-bowhay commented May 28, 2023

ilayn commented May 28, 2023

Kai-Striega commented May 28, 2023

steppi commented May 28, 2023

zachjweiner commented May 28, 2023

Footnotes

ilayn commented May 28, 2023 • edited

zachjweiner commented May 28, 2023

j-bowhay commented May 28, 2023

ev-br commented May 29, 2023 • edited

ev-br commented May 29, 2023 • edited

ilayn commented May 29, 2023

ilayn commented May 29, 2023 • edited

dschmitz89 commented Jun 8, 2023

matthew-brett commented Jun 8, 2023

ilayn commented Jun 8, 2023

ev-br commented Aug 17, 2023

ivan-pi commented Aug 30, 2023

inkydragon commented Jan 26, 2024

ilayn commented Jan 27, 2024

cournape commented Feb 25, 2024

rgommers commented Feb 25, 2024

haampie commented Mar 5, 2024 • edited

ilayn commented May 27, 2023 •

edited

ilayn commented May 28, 2023 •

edited

ev-br commented May 29, 2023 •

edited

ev-br commented May 29, 2023 •

edited

ilayn commented May 29, 2023 •

edited

haampie commented Mar 5, 2024 •

edited