Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: odr: upgrade to ODRPACK95 (port to Python or beyond...) #7107

Open
markcampanelli opened this issue Feb 28, 2017 · 38 comments
Open

ENH: odr: upgrade to ODRPACK95 (port to Python or beyond...) #7107

markcampanelli opened this issue Feb 28, 2017 · 38 comments
Labels
enhancement A new feature or improvement scipy.odr

Comments

@markcampanelli
Copy link

I am interested in volunteering to help port ODRPACK95 (http://dl.acm.org/citation.cfm?id=1268782) to scipy in order to get the bound constraints feature. I need some direction on how to proceed, however.

@rgommers rgommers added enhancement A new feature or improvement scipy.odr labels Feb 28, 2017
@rgommers
Copy link
Member

If anyone is looking for the source code: it's a zip file linked at the top of http://www.netlib.org/odrpack/. It doesn't include a license; not written by government employees so not public domain. ACM is usually a pain, did you find license info anywhere?

Looks like the most straightforward thing to do if the license is compatible is to include that Fortran code in the same way as done for ODRPACK now.

The trust-region algorithm that's the basis for ODRPACK95 is available in scipy.optimize, so another (possibly worthwhile) option is to implement the functionality you want in Python based on the paper.

@rgommers
Copy link
Member

Maybe wait for @rkern to comment before starting, he's the expert for odr.

@markcampanelli
Copy link
Author

Thanks @rgommers for your insight on the options. I presume that because the original ODR work was largely by NIST employees, the ACM TOMS copyright is not in force. I'll await input from @rkern while I consider how we might integrate the existing code with scipy.optimize's trust-region algorithm.

@rkern
Copy link
Member

rkern commented Feb 28, 2017

ODRPACK was indeed written by NIST folk, but ODRPACK95 was not. The main work was done by people at Virginia Tech. It used to be distributed as part of JigCell, but the download and SVN links no longer work, so I couldn't say if there was a license attached to that project independent of the TOMS publication. You can try contacting Layne T. Watson or Jason Zwolak and ask.

I have always thought that it would be nice to reimplement the algorithm in pure Python.

@ev-br
Copy link
Member

ev-br commented Feb 28, 2017 via email

@rkern
Copy link
Member

rkern commented Feb 28, 2017

Fits pretty well, I think, for a student that's comfortable implementing numerical algorithms from their papers.

@ev-br
Copy link
Member

ev-br commented Feb 28, 2017 via email

@rkern
Copy link
Member

rkern commented Feb 28, 2017

Not (co-)mentoring, no, but I might be able to write up a description.

@charris
Copy link
Member

charris commented Feb 28, 2017

Probably should establish the license first to avoid tears.

@rkern
Copy link
Member

rkern commented Feb 28, 2017

There isn't much interesting in the ODRPACK95 code itself that isn't in the ODRPACK code. The algorithmic innovation was adding bound constraints to its trust-region solver, but we'd probably have to do it differently to add that functionality to our own trust-region solvers in any case.

@markcampanelli
Copy link
Author

Sent an email inquiry about the license to Layne Watson at ltw@cs.vt.edu.

@mythsmith
Copy link

mythsmith commented Jun 9, 2017

Sent also a message to the same email on 3Feb2017, but got no replay. I am also interested in this upgrade and could dedicate some time.
Today I contacted also Paul T. Boggs and Jason W. Zwolak.

@markcampanelli
Copy link
Author

I would like to help here, but I think I am a bit oversubscribed to lead. I would certainly be able to help test. It would be really nice to clear up the copyright issues, including if we decide to drop the Fortran code altogether and write from scratch in Python/Cython.

@pv
Copy link
Member

pv commented Jun 9, 2017 via email

@markcampanelli
Copy link
Author

It is not clear to me that even writing from scratch would alleviate all the potential copyright/license issues. I tried reviewing the ACM TOMS policy, but I'm no lawyer: Would we need to be able to prove that we did NOT somehow "copy" the F95 bound constraint implementation over to Scipy?

@pv
Copy link
Member

pv commented Jun 9, 2017 via email

@mythsmith
Copy link

mythsmith commented Jun 9, 2017

Just got a reply from Jason:

Hi Daniele,
ODRPACK95 is public domain.
If you need any help with integration or other software development, I am available through my business: http://insilicalabs.com
Jason

Jason Zwolak

This sounds like a green semaphore.

@GuiiFerrari
Copy link

Hi everyone. Any updates on this problem? It would be really useful to have the bound constraints feature

@dschmitz89
Copy link
Contributor

scipy is trying to remove most Fortran codes to reduce our build complexity and maintenance burden (see #18566 ), so we are unlikely to accept ORDPACK95. Are there any ports to other languages?

@lucascolley lucascolley changed the title Upgrading scipy.odr to ODRPACK95 ENH: odr: upgrade to ODRPACK95 May 3, 2024
@markcampanelli
Copy link
Author

I am willing to revisit converting ODR to Python with the addition of parameter bounds. This could presumably build off of other scipy algorithms (e.g., aforementioned trust region optimization). However, I have found the existing Fortran-based code to be blazingly fast, and I make heavy use of the implicit model formulation option. I think speed and feature parity should be well screened as criteria for the translation.

@ilayn
Copy link
Member

ilayn commented May 3, 2024

Based on the existing translations, It's typically on par, sometimes faster (due to avoiding the glue), or at most 2x-ish slower but removal of the maintenance burden, unnecessary compiling olympics and removed code rot (since we can't fix bugs) is much more important for us.

We can always make the code run faster by going lower level or if there is any other better maintained code base.

@rgommers
Copy link
Member

rgommers commented May 3, 2024

Rewriting in Python, and if it's slow then accelerating the bottleneck with C/C++/Cython, would be great. And much easier to add any needed features then.

@ilayn
Copy link
Member

ilayn commented May 3, 2024

Also it goes much faster if there is someone who knows what ODR code should do. So please let me know if you would like to take stab @markcampanelli. I'd be happy to do the boring parts of the translation. It's somehow getting easier for me the more I do it (and making me more furious about what I am seeing 😅 )

@rkern
Copy link
Member

rkern commented May 3, 2024

FWIW, "trust region" is more of an adjective than a verb noun. It's a general strategy to modify nonlinear optimizers that use local approximations. The ODRPACKs implement a trust-region Levenberg-Marquardt. It's not clear to me that the trust region optimizers that we have are directly usable. The key insight of ODRPACK is that the innermost part of the LM algorithm can be applied to the ODR problem (which expands the number of parameters it needs to solve for from np to np+nx) efficiently by making use of the specific structure of the Jacobian matrix that's used to determine the LM step. It's not clear to me if the trust region methods we have implemented make use of the same Jacobian matrix or that we could intervene that deeply inside of our implementations of them to make use of the special structure.

Zwolak2006 has a clear explanation of the ODRPACK95 algorithm.

@markcampanelli
Copy link
Author

@rkern Thanks for shining a bit of light forward. I will read the paper and decide if I think I can do this. Making time is perhaps the biggest challenge, of course.

Out of curiosity, is the scipy team settled on preferentially using, say, Cython for compiled implementations? Being able to write some high performance Rust would be a deal sweetener for me, but C/Cython wouldn’t necessarily be a deal breaker.

@rgommers
Copy link
Member

rgommers commented May 3, 2024

Out of curiosity, is the scipy team settled on preferentially using, say, Cython for compiled implementations? Being able to write some high performance Rust would be a deal sweetener for me, but C/Cython wouldn’t necessarily be a deal breaker.

Cython/C/C++ are all acceptable. We're more and more coming to the conclusion that Cython generates binaries that are way too large, so C/C++ is nicer if you're equally comfortable there. If the code supports only float64, C is straightforward. If templating over dtypes is needed, C++ tends to be nicer.

Rust: I suspect that eventually we'll get there, but it's a big lift and we don't have the build infrastructure for it nor done the experiments to figure out what the total impact would be. Maybe in a couple of years.

@GuiiFerrari
Copy link

I can try to rewrite the code in python. @markcampanelli if you don't mind I can help you...

@jhdesantana
Copy link

@markcampanelli @GuiiFerrari, I am also interested in volunteering to help port ODRPACK95 to scipy.

@dschmitz89
Copy link
Contributor

Had not expected to see so much enthusiasm for porting ODRPACK to another language. This is highly appreciated folks! Feel free to coordinate a little here :)

@lucascolley lucascolley changed the title ENH: odr: upgrade to ODRPACK95 ENH: odr: upgrade to ODRPACK95 (port to Python or beyond...) May 5, 2024
@HugoMVale
Copy link

The ODRPACK95 code is actually very well structured in a module; nothing like older F77 code. Modern Fortran allows easy interfacing between C and Fortran (either direction). This being so, what is the advantage of converting well-written and validated code to C when the same result can be achieved just by writing the C bindings?
You can see a nice and recent example for PRIMA.

@ilayn
Copy link
Member

ilayn commented May 9, 2024

We are quite aware of PRIMA but every Fortran user who thinks it is a trivial task to do the bindings when attempt it quickly change their mind. it is not that straightforward and @zaikunzhang can comment on it better as the author of PRIMA. Also see #18118

@HugoMVale
Copy link

Doing the binding cannot be more complex than rewriting and validating the whole code! ;)
There is only one function to interface with.
If there is willingness from scipy to accept this route and the matter is not for yesterday, I could attempt it.

      SUBROUTINE ODR
     &   (FCN,
     &   N,M,NP,NQ,
     &   BETA,
     &   Y,X,
     &   DELTA,
     &   WE,WD,
     &   IFIXB,IFIXX,
     &   JOB,NDIGIT,TAUFAC,
     &   SSTOL,PARTOL,MAXIT,
     &   IPRINT,LUNERR,LUNRPT,
     &   STPB,STPD,
     &   SCLB,SCLD,
     &   WORK,IWORK,
     &   INFO,
     &   LOWER,UPPER)

C...Routine names used as subprogram arguments
C   FCN:     The user-supplied subroutine for evaluating the model.

C...Variable definitions (alphabetically)
C   BETA:    The function parameters.
C   DELTA:   The initial error in the X data
C   IFIXB:   The values designating whether the elements of BETA are 
C            fixed at their input values or not.
C   IFIXX:   The values designating whether the elements of X are 
C            fixed at their input values or not.
C   INFO:    The variable designating why the computations were stopped.
C   IPRINT:  The print control variable.
C   IWORK:   The integer work space.
C   JOB:     The variable controlling problem initialization and 
C            computational method.
C   LOWER:   The lower bound on BETA.
C   LUNERR:  The logical unit number for error messages.
C   LUNRPT:  The logical unit number for computation reports.
C   M:       The number of columns of data in the explanatory variable.
C   MAXIT:   The maximum number of iterations allowed.
C   N:       The number of observations.
C   NDIGIT:  The number of accurate digits in the function results, as
C            supplied by the user.
C   NP:      The number of function parameters.
C   NQ:      The number of responses per observation.
C   PARTOL:  The parameter convergence stopping tolerance.
C   SCLB:    The scaling values for BETA.
C   SCLD:    The scaling values for DELTA.
C   STPB:    The relative step for computing finite difference
C            derivatives with respect to BETA.
C   STPD:    The relative step for computing finite difference
C            derivatives with respect to DELTA.
C   SSTOL:   The sum-of-squares convergence stopping tolerance.
C   TAUFAC:  The factor used to compute the initial trust region 
C            diameter.
C   UPPER:   The upper bound on BETA.
C   WD:      The DELTA weights.
C   WD1:     A dummy array used when WD(1,1,1)=0.0E0_R8.
C   WE:      The EPSILON weights.
C   WORK:    The REAL (KIND=R8) work space.
C   X:       The explanatory variable.
C   Y:       The dependent variable.  Unused when the model is implicit.

@ilayn
Copy link
Member

ilayn commented May 9, 2024

Doing the binding cannot be more complex than rewriting and validating the whole code! ;)

You would be surprised. Especially when half of the functions are home-grown (but outdated) LAPACK functions. But I am too invested in removing F77 code from SciPy so I should not comment.

@ev-br
Copy link
Member

ev-br commented May 9, 2024

As a matter of fact, there exists at least one scipy dev who would be interested in seeing the result (me).

First of all, note that this interest does not guarantee the result is going to be accepted. The value of the exercise is to help us judge the pros and cons of this way of binding F90 code, to be considered together with other pros and cons of maintaining it. Even if it's not going to make it into scipy proper, a non-trivial real world worked example would be very useful in many circumstances.

So, if you're still interested :-), it'd be great to understand:

  • how the binding actually look like
  • how robust the result is (Windows, MacOS etc)
  • binary size
  • performance penalty
  • copying behavior (F vs C arrays, can you control copies)

@zaikunzhang
Copy link
Contributor

zaikunzhang commented May 9, 2024

The Python binding of PRIMA has been finished, thanks to the precious contribution of @nbelakovski.

The binding is in fact not that complicated. It works on all platforms, and its build system does not depend on anything that is "dirty" or outdated as far as I can see.

Note that PRIMA is a comprehensive package for (derivative-free) optimization. It needs to bind not only one but five solvers. This increases the complexity considerably.

I hope PRIMA provides an example of binding modern Fortran code to Python.

@HugoMVale
Copy link

Ok, I will give it a try. ;)
I've set up a project here. I'll keep you posted on the progress.

@rgommers
Copy link
Member

Sounds good! I think it's okay to replace old F77 code with new F95 code, as long as it's better quality and not a ton more code. That's a net win, and when we do get to rewriting it in another language (because we would really like a Fortran-free future!), that job will be easier to do when starting from better F95 code.

@DEUSD
Copy link

DEUSD commented Jun 5, 2024

Hi @HugoMVale,
Will it be possible to define upper and lower limits as in the Fortran version? We need them to customize the configuration of our adjustment solution.
Thanks in advance for working on this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.odr
Projects
None yet
Development

No branches or pull requests