Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

META: FORTRAN Code inventory #18566

Open
13 of 36 tasks
ilayn opened this issue May 27, 2023 · 23 comments
Open
13 of 36 tasks

META: FORTRAN Code inventory #18566

ilayn opened this issue May 27, 2023 · 23 comments
Labels
Fortran Items related to the internal Fortran code base

Comments

@ilayn
Copy link
Member

ilayn commented May 27, 2023

This issue is meant for a central place for our Fortran (almost exclusively F77) codebase to track per module and if possible remove depending on contributor affinity to the subject matter and fortran-fu.

List has projected effort required just based on code length hence involves prejudice.

@ilayn ilayn added the Fortran Items related to the internal Fortran code base label May 27, 2023
@steppi
Copy link
Contributor

steppi commented May 27, 2023

specfunactually isn't so bad because each function is mostly self contained, and it can be converted bit by bit. I've already made good headway with hyp2f1 and that's arguably the most difficult function to replace. I'm determined to banish the F77 from special and have time to get back to it now.

I recently took a close look at ODEPACK to review gh-14552. Converting that will be a bear since it consists of large pieces of machinery with sophisticated control logic. I'd be up for helping to replace it with a C, C++, or Cython implementation if we can find someone else with sufficient, time, knowledge, and interest to work on it together with me. I think it would have to wait until the all of the deliverables are met for the SDG @tirthasheshpatel and I have to work on special though.

@j-bowhay
Copy link
Member

I am interested in looking at dop over the summer as I am fairly familiar with Hairer's work however my Fortran knowledge is fairly minimal so won't block anyone else's efforts.

@ilayn
Copy link
Member Author

ilayn commented May 28, 2023

You don't need to know Fortran. If you know about the functionality then you can approximate as closely as possible without following the code faithfully. You don't need to jump through the GOTOs but kind of get why and where it jumps. I am basically following the logic and not the code. Because you go quickly insane with Arithmetic IFs and Computed GOTOs.

@Kai-Striega
Copy link
Member

I'd also be interested in picking up some of these. I'll take a close look during the week and pick somewhere to get started.

@steppi
Copy link
Contributor

steppi commented May 28, 2023

I'd be happy to review PRs for replacing old Fortran code. I think I can take a look at gh-18570 this week.

I am interested in looking at dop over the summer as I am fairly familiar with Hairer's work however my Fortran knowledge is fairly minimal so won't block anyone else's efforts.

Picking up the language itself is actually not too hard. You could look through https://web.stanford.edu/class/me200c/tutorial_77/ to get started, and look things up here https://docs.oracle.com/cd/E19957-01/805-4939/index.html as needed as you work through code. Like @ilayn said, it's the unstructured control flow that can make understanding the code so difficult.

You don't need to jump through the GOTOs but kind of get why and where it jumps. I am basically following the logic and not the code. Because you go quickly insane with Arithmetic IFs and Computed GOTOs.

What I tend to do is follow along with a notebook and try to work out flowcharts in an almost mechanical fashion, and then use these to reason about parts of the program. I've found trying to follow along and keep track of the state in ones head like one can do in structured programs is basically impossible.

@zachjweiner
Copy link

@j-bowhay The dop methods already have pure-Python implementations accessible via the "new" interface, solve_ivp. LSODA is the only wrapped Fortran method via the new interface. The *vode methods (also from odepack) are the only ones available in the old interface but not the new one, but I'm not sure how much use they get (or "should" get) in practice.

LSODA does provide unique functionality (automatic stiffness detection and switching between stiff/nonstiff methods), but I don't know that the algorithm itself is especially complex. The stiff and nonstiff methods individually should be straightforward (pure-Python BDF---LSODA's stiff solver---is already available via solve_ivp). Though the stiffness detection and logic to handle switching would be new, from skimming the algorithm description it only seems modestly involved.

But rather than simply directly reimplementing LSODA, I think it would be a better use of time to implement abstracted utilities for method switching/composition (which could then be used to replicate LSODA). DifferentialEquations.jl does this and finds that combinations of other stiff and nonstiff methods can outperform LSODA. Obviously this goes beyond the mission of this issue, but if effort is to be spent on the ODE solvers I don't think it should be to simply directly translate LSODA/odepack. With a proper, modular abstraction, I would expect the individual components (stiff and nonstiff solvers with their own dense output, stiffness detection, and logic for dynamically switching) to be simpler to tackle than the LSODA source would suggest.1 (And I don't immediately see a reason these should be implemented in compiled code, at least at first.)

On a semi-related note: a year ago I rewrote solve_ivp (and just the Runge-Kutta methods) in Cython to see how much overhead could be mitigated. There were a number open questions/issues to address - just mentioning here if anyone is interested in discussing elsewhere.

Footnotes

  1. There is an (apparently inactive) C rewrite of LSODA, which may be easier to reference for anyone digging into it.

@ilayn
Copy link
Member Author

ilayn commented May 28, 2023

I don't know enough solver knowledge to contribute to your analysis. But it sounds like we can indeed have a look at the Cython implementation to see whether we can improve or I don't know what you would like to get out of that exercise. Typically a verbose Cython code can get to wrapped Fortran speed with not so much effort. All depends on the sophistication.

The main issue we are trying to tackle is to get rid of this dormant unmaintained code which is blocking us to achieve more user friendly and feature rich API. In a way, I come to believe that the code you wrap forces you into its own API, good or bad. And it is impossible to troubleshoot these things. #14807 is one of my favorite bug reports I've encountered here. In other words, scientific python still smells like fortran-like API which is a shame if you compare it with other coding domains and frameworks.

Hence if domain experts think there are better alternatives, then we can switch to them. If they think it's a waste of time, then we deprecate it. Or it is a daunting task to do anything about but we still need the functionality, then a few of us who are crazy enough can sit down and give it a go. After spending last month with this fortran code, I'm simultaneously amazed that the authors provided such robust code which still works and being used by so many people, and lost almost all my respect to scientific computing community at the same time for letting this code untouched for the last 40 years making the same excuses.

Please let me know if I can provide any help in these issues. I'm more inclined to touch things on sparse.linalg and optimize because I know something about those things but I can try taking a stab at translations or cython code.

@zachjweiner
Copy link

But it sounds like we can indeed have a look at the Cython implementation to see whether we can improve or I don't know what you would like to get out of that exercise. Typically a verbose Cython code can get to wrapped Fortran speed with not so much effort. All depends on the sophistication.

I'm happy to share it but I should've been clearer that I think it's independent from the Fortran issue (I was just rewriting the pure-Python routines in Cython). Just wanted to mention for visibility to people thinking about compiled code for IVP solvers. (And indeed I was able to get the overhead to << 1 us but only with an API change, and I hadn't tackled dense output.) But the odepack routines come from compiled code only because odepack is an old, widely-used implementation, not because they intrinsically need to be compiled to be useful/reasonably performant.

Hence if domain experts think there are better alternatives, then we can switch to them. If they think it's a waste of time, then we deprecate it.

I should've also mentioned that, since a large fraction of the odepack-dependence comes only from the "old" ode API, it would probably be best to decide whether to actually, officially deprecate it first. After that, my only point is that I think it would be more worthwhile in the long run to enhance/add to the functionality under the solve_ivp umbrella (as I described above) than to painstakingly translate LSODA from Fortran.

And it is impossible to troubleshoot these things. #14807 is one of my favorite bug reports I've encountered here. In other words, scientific python still smells like fortran-like API which is a shame if you compare it with other coding domains and frameworks.

Thanks for sharing - would be glad to see the ecosystem move toward being not only more maintainable and robust to such correctness issues, but also more extensible and improvable.

@j-bowhay
Copy link
Member

Thanks @zachjweiner for the summary. I did wonder what the overlap was with the existing Python code but was yet to put in any effort to investigate. If I understand you correctly, the only barrier to removing the dop directory is the legacy ode solvers? If so I will probably focus my efforts elsewhere on something hopefully more impactful.

@ev-br
Copy link
Member

ev-br commented May 29, 2023

There is also some Fortran code in stats, all probably rather low-hanging fruit.

$ ll scipy/stats/*f
-rw-rw-r-- 1 br br 46387 мая  8  2022 scipy/stats/mvndst.f
-rw-rw-r-- 1 br br  2563 ноя 21  2022 scipy/stats/mvn.pyf
-rw-rw-r-- 1 br br  1528 ноя 21  2022 scipy/stats/statlib.pyf

EDIT: edited these in into the OP, with a subjective difficulty assessment.

@ev-br
Copy link
Member

ev-br commented May 29, 2023

This is great @ilayn !
Now that we have a laundry list, I think it would be helpful to add two more classification indicators:

  • whether a module causes problems or not. For instance, ODRPACK seems stable and without open issues. So probably low prio.
  • alternatives, both external and interternal. For instance, VODE: I don't think it makes much sense to just transcribe the fortran implementation without taking a look at SUNDIALS which is in C++. Also solve_ivp, as discussed. (For ode solving specifically, I think it's best to discuss it in a separate issue @zachjweiner --- I think a cythonized solve_ivp might be very welcome).
    One other example is QUADPACK. Maybe the best course of action is to assess what's missing in quad_vec, port the missing bits; once it's feature complete, we can just deprecate quad.

I can help with this sort of work, depending on what are you planning @ilayn .

@ilayn
Copy link
Member Author

ilayn commented May 29, 2023

Indeed, I am very much all in for redesigning certain items if the codebase left us quite behind. The DifferentialEquations.jl is an example of how a specialized task force can take things way forward than what this fortran code can offer. So in that sense, I'm all in. One thing I am starting to notice is that, once you understand the codeflow you start to get rather grandiose ideas about how to generalize so better careful with that :)

Unfortunately, I'm seriously illiterate when it comes to anything that is not linalg or optimize. Hence, I am not sure if I can be of any help in designing the strategy. What I want is to get rid of the usual suspects, ARPACK/PROPACK and maybe (and by that I mean really maybe) L-BFGS-B stuff if I have steam left.

So TL;DR, let's

  • replace the ones for which a modern implementation exists,
  • translate the required/industry standard ones,
  • and deprecate the rest.

If integrate work can be consolidated as discussed, then let's do it. It shouldn't have been us doing this, we have to start somewhere anyways.

@ilayn
Copy link
Member Author

ilayn commented May 29, 2023

One thing we have to be careful about stability statement is that though they are stable, they are fantastically old and use outdated ways of doing things in the absence of modern tooling (say fixed lapack bugs, threaded/reentrant C libs, and so on). Core of id_dist for example is doing hand coded householder trafos, and qr orthogonalizations which are now almost one-liners.

Long story short, though they are stable, these code have huge inertia for newcomers to implement things. Moreover they also carry a mystique that these code are battle-tested/should not be touched etc. That is demonstrably not true if you actually look at it 😃. I'm sure the algorithms are solid but implementations are far from perfect.

@dschmitz89
Copy link
Contributor

I had a look into the optimizers. The biggest problem is in my opinion l-bfgs-b. There are many l-bfgs implementations but no good ones that work with bounds (the l-bfgs-b part). For slsqp, I see a few possibilitites:

  • modern and actively maintained Fortran version here: no gotos but Fortran tooling problems still apply
  • auto translated version of the F77 code from nlopt: still full of gotos but at least in C and actively maintained. We also rely on nlopt for the DIRECT algorithm

I agree with @ilayn 's observation that the cores of these algorithms often look like homegrown linear algebra algorithms or copies of the classic F77 implementations which are nowadays available in LAPACK. Even the modern Fortran code includes again copies of BLAS routines and constrained least squares codes. To avoid linking complexity, this probably makes sense for Fortran or C but for scipy ..

@matthew-brett
Copy link
Contributor

Just to clarify - does the modern and actively maintained Fortran project present any new problems in tooling? I think that's the key question.

@ilayn
Copy link
Member Author

ilayn commented Jun 8, 2023

I don't know if we have any actively maintained new fortran code (maybe PRIMA will be if we manage to link it) so probably we would need to experience it to see if there are new issues.

Folks mention iso_c_binding and ctypes routes but I didn't see any demonstration yet. If we don't need to wrestle with ABI issues and/or linking strangeness, I'm fine with it. Like I mentioned before, this is not a crusade against Fortran, but to get rid of dormant code which happens to be fortran77.

@ev-br
Copy link
Member

ev-br commented Aug 17, 2023

cc #19079 (comment) for a C++ port of AMOS, which seems to be license-compatible.

@ivan-pi
Copy link

ivan-pi commented Aug 30, 2023

A full Python interface for MINPACK is available since one year now: https://github.com/fortran-lang/minpack/tree/main/python

A few tests serving as examples can be found here: https://github.com/fortran-lang/minpack/blob/main/python/minpack/test_library.py

But I guess this doesn't really solve any of the problems you have with F77 which are:

  • easily installing a Fortran compiler on all OS's
  • the constraints the legacy codes impose when it comes to more feature rich and Pythonic interfaces
  • lack of developers interested in learning Fortran
  • the fact NumPy defaults to row-major

It appears easier for people to create something new in their own (closed) community than to work across programming languages and tools.

Some time ago I rewrote NNLS with modern Fortran flow-control constructs in an attempt to make it more maintainable and easy to interface. Unfortunately, I cannot live off of refactoring Fortran codes, hence I had no motivation to contribute it to SciPy, despite being a user.

@inkydragon
Copy link
Contributor

blas: Convert: Very Hard (Depends on BLAS/LAPACK lib) A lot of historical wrapper inconsistencies

Maybe try BLIS which was written in C.
Performance (Compare with OpenBLAS, MKL, and Eigen)

BLIS is written in ISO C99 and available under a new/modified/3-clause BSD license. While BLIS exports a new BLAS-like API, it also includes a BLAS compatibility layer which gives application developers access to BLIS implementations via traditional BLAS routine calls.

AMD also builds their BLAS library (AOCL-BLAS) based on BLIS.

AOCL-BLAS is developed as a forked version of BLIS

@ilayn
Copy link
Member Author

ilayn commented Jan 27, 2024

Yes BLIS is in my radar. I need to spare some time and see how we can find a fitting LAPACK for it. In the meantime, Rust folks are also covering quite some distance. I am looking into those. The absolutely last resort is going to be have a performant BLAS-like, and write a small subset of LAPACK with it. But I am really not looking forward to it.

Julia folks did some really nice progress with RecursiveFactorization.jl and I have something similar in mind but it is even more work then rewriting all Fortran code or so it seems to me.

@cournape
Copy link
Member

For ARPACK, I wonder how feasible it would be use https://github.com/JuliaLinearAlgebra/ArnoldiMethod.jl as a reference and implement it in cython/etc.

According to https://discourse.julialang.org/t/ann-arnoldimethod-jl-v0-4/110604, it is more stable than ARPACK.

@rgommers
Copy link
Member

Thanks for sharing @cournape. That looks great - way easier to understand than ARPACK and MIT-licensed.

@haampie
Copy link

haampie commented Mar 5, 2024

Note that ArnoldiMethod.jl only implements the non-symmetric standard eigenproblem, it's lacking Lanczos and generalized problem. I don't know what the state of the art is, but as far as I remember Lanczos needs extra orthogonalization because it's unstable, and with that in place it looks rather similar to the non-symmetric case, so you might as well only do the non-symmetric case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Fortran Items related to the internal Fortran code base
Projects
None yet
Development

No branches or pull requests

14 participants