ENH: odr: upgrade to ODRPACK95 (port to Python or beyond...) #7107

markcampanelli · 2017-02-28T02:10:34Z

I am interested in volunteering to help port ODRPACK95 (http://dl.acm.org/citation.cfm?id=1268782) to scipy in order to get the bound constraints feature. I need some direction on how to proceed, however.

rgommers · 2017-02-28T08:46:42Z

If anyone is looking for the source code: it's a zip file linked at the top of http://www.netlib.org/odrpack/. It doesn't include a license; not written by government employees so not public domain. ACM is usually a pain, did you find license info anywhere?

Looks like the most straightforward thing to do if the license is compatible is to include that Fortran code in the same way as done for ODRPACK now.

The trust-region algorithm that's the basis for ODRPACK95 is available in scipy.optimize, so another (possibly worthwhile) option is to implement the functionality you want in Python based on the paper.

rgommers · 2017-02-28T08:47:21Z

Maybe wait for @rkern to comment before starting, he's the expert for odr.

markcampanelli · 2017-02-28T13:36:20Z

Thanks @rgommers for your insight on the options. I presume that because the original ODR work was largely by NIST employees, the ACM TOMS copyright is not in force. I'll await input from @rkern while I consider how we might integrate the existing code with scipy.optimize's trust-region algorithm.

rkern · 2017-02-28T19:29:59Z

ODRPACK was indeed written by NIST folk, but ODRPACK95 was not. The main work was done by people at Virginia Tech. It used to be distributed as part of JigCell, but the download and SVN links no longer work, so I couldn't say if there was a license attached to that project independent of the TOMS publication. You can try contacting Layne T. Watson or Jason Zwolak and ask.

I have always thought that it would be nice to reimplement the algorithm in pure Python.

ev-br · 2017-02-28T19:36:07Z

One more possible GSoC topic? How does the difficulty compare to a GSoC project scope?

rkern · 2017-02-28T20:08:32Z

Fits pretty well, I think, for a student that's comfortable implementing numerical algorithms from their papers.

ev-br · 2017-02-28T20:58:54Z

Any chance you'd be interested in writing a project description / mentoring / co-mentoring it if there's a student?

rkern · 2017-02-28T21:32:49Z

Not (co-)mentoring, no, but I might be able to write up a description.

charris · 2017-02-28T21:50:38Z

Probably should establish the license first to avoid tears.

rkern · 2017-02-28T22:22:34Z

There isn't much interesting in the ODRPACK95 code itself that isn't in the ODRPACK code. The algorithmic innovation was adding bound constraints to its trust-region solver, but we'd probably have to do it differently to add that functionality to our own trust-region solvers in any case.

markcampanelli · 2017-03-11T23:36:59Z

Sent an email inquiry about the license to Layne Watson at ltw@cs.vt.edu.

mythsmith · 2017-06-09T08:24:41Z

I also sent a message to the same email address on 3 Feb 2017, but got no reply. I am also interested in this upgrade and could dedicate some time.
Today I contacted also Paul T. Boggs and Jason W. Zwolak.

markcampanelli · 2017-06-09T09:37:17Z

I would like to help here, but I think I am a bit oversubscribed to lead. I would certainly be able to help test. It would be really nice to clear up the copyright issues, including if we decide to drop the Fortran code altogether and write from scratch in Python/Cython.

pv · 2017-06-09T09:56:12Z

If the copyright issues are not addressed, writing from scratch cannot be avoided. At this point it appears the authors of the package are not going to reply, so we can assume the copyright issues will not be resolved.

markcampanelli · 2017-06-09T10:32:31Z

It is not clear to me that even writing from scratch would alleviate all the potential copyright/license issues. I tried reviewing the ACM TOMS policy, but I'm no lawyer: Would we need to be able to prove that we did NOT somehow "copy" the F95 bound constraint implementation over to Scipy?

pv · 2017-06-09T11:43:18Z

IANAL, but if you have not ever looked at the F95 source code in question, it is unlikely there is any copyright issue https://en.wikipedia.org/wiki/Clean_room_design. Also simply from a moral point of view, if you are planning to work on it, do not read their source code.

mythsmith · 2017-06-09T12:45:21Z

Just got a reply from Jason:

Hi Daniele,
ODRPACK95 is public domain.
If you need any help with integration or other software development, I am available through my business: http://insilicalabs.com
Jason

Jason Zwolak

This sounds like a green light.

GuiiFerrari · 2024-05-02T03:23:57Z

Hi everyone. Any updates on this problem? It would be really useful to have the bound constraints feature

dschmitz89 · 2024-05-03T05:05:36Z

scipy is trying to remove most Fortran code to reduce our build complexity and maintenance burden (see #18566), so we are unlikely to accept ODRPACK95. Are there any ports to other languages?

markcampanelli · 2024-05-03T10:18:15Z

I am willing to revisit converting ODR to Python with the addition of parameter bounds. This could presumably build off of other scipy algorithms (e.g., aforementioned trust region optimization). However, I have found the existing Fortran-based code to be blazingly fast, and I make heavy use of the implicit model formulation option. I think speed and feature parity should be well screened as criteria for the translation.

ilayn · 2024-05-03T10:24:20Z

Based on the existing translations, it's typically on par, sometimes faster (due to avoiding the glue), or at most 2x-ish slower. But removing the maintenance burden, the unnecessary compiling olympics, and the code rot (since we can't fix bugs) is much more important for us.

We can always make the code run faster by going lower level, or by switching to another, better-maintained code base if one exists.

rgommers · 2024-05-03T10:34:21Z

Rewriting in Python, and if it's slow then accelerating the bottleneck with C/C++/Cython, would be great. And much easier to add any needed features then.

ilayn · 2024-05-03T10:45:08Z

Also, it goes much faster if there is someone who knows what the ODR code should do. So please let me know if you would like to take a stab at it, @markcampanelli. I'd be happy to do the boring parts of the translation. It's somehow getting easier for me the more I do it (and making me more furious about what I am seeing 😅 )

rkern · 2024-05-03T14:14:43Z

FWIW, "trust region" is more of an adjective than a ~~verb~~ noun. It's a general strategy to modify nonlinear optimizers that use local approximations. The ODRPACKs implement a trust-region Levenberg-Marquardt. It's not clear to me that the trust region optimizers that we have are directly usable. The key insight of ODRPACK is that the innermost part of the LM algorithm can be applied to the ODR problem (which expands the number of parameters it needs to solve for from np to np+nx) efficiently by making use of the specific structure of the Jacobian matrix that's used to determine the LM step. It's not clear to me if the trust region methods we have implemented make use of the same Jacobian matrix or that we could intervene that deeply inside of our implementations of them to make use of the special structure.

Zwolak2006 has a clear explanation of the ODRPACK95 algorithm.
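
A minimal sketch of the expanded problem described above, assuming an explicit model with unit weights: the unknowns are the np model parameters plus one x-error (delta) per observation, and bounds are placed on the model parameters only. It uses `scipy.optimize.least_squares` with `method="trf"` as a stand-in solver. This is not the ODRPACK95 algorithm, and it treats the Jacobian as dense rather than exploiting its special structure. The straight-line `model`, the synthetic data, and the bound values are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.optimize import least_squares


def model(beta, x):
    """Hypothetical model for this sketch: a straight line."""
    return beta[0] * x + beta[1]


def odr_residuals(theta, x, y, n_beta):
    """Stack the response errors (epsilon) and the x errors (delta)."""
    beta, delta = theta[:n_beta], theta[n_beta:]
    eps = model(beta, x + delta) - y
    return np.concatenate([eps, delta])


# Synthetic data with noise in both x and y (assumed values).
rng = np.random.default_rng(0)
x_true = np.linspace(0.0, 10.0, 20)
x = x_true + rng.normal(scale=0.1, size=x_true.size)
y = 2.0 * x_true + 1.0 + rng.normal(scale=0.1, size=x_true.size)

# Unknown vector: np model parameters followed by nx deltas.
beta0 = np.array([1.0, 0.5])
theta0 = np.concatenate([beta0, np.zeros_like(x)])

# Bounds apply to beta only; the deltas are left unconstrained.
lower = np.concatenate([[0.0, 0.0], np.full(x.size, -np.inf)])
upper = np.concatenate([[5.0, 5.0], np.full(x.size, np.inf)])

result = least_squares(
    odr_residuals,
    theta0,
    bounds=(lower, upper),
    method="trf",
    args=(x, y, beta0.size),
)
beta_hat = result.x[: beta0.size]
delta_hat = result.x[beta0.size:]
print(beta_hat)
```

With n observations the solver sees np + n unknowns, so this dense formulation scales poorly. That cost is what the structured LM step in the ODRPACKs is designed to avoid.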

markcampanelli · 2024-05-03T14:24:29Z

@rkern Thanks for shining a bit of light forward. I will read the paper and decide if I think I can do this. Making time is perhaps the biggest challenge, of course.

Out of curiosity, is the scipy team settled on preferentially using, say, Cython for compiled implementations? Being able to write some high performance Rust would be a deal sweetener for me, but C/Cython wouldn’t necessarily be a deal breaker.

rgommers · 2024-05-03T14:38:51Z

> Out of curiosity, is the scipy team settled on preferentially using, say, Cython for compiled implementations? Being able to write some high performance Rust would be a deal sweetener for me, but C/Cython wouldn’t necessarily be a deal breaker.

Cython/C/C++ are all acceptable. We're more and more coming to the conclusion that Cython generates binaries that are way too large, so C/C++ is nicer if you're equally comfortable there. If the code supports only float64, C is straightforward. If templating over dtypes is needed, C++ tends to be nicer.

Rust: I suspect that eventually we'll get there, but it's a big lift, and we neither have the build infrastructure for it nor have done the experiments to figure out what the total impact would be. Maybe in a couple of years.

GuiiFerrari · 2024-05-03T18:01:00Z

I can try to rewrite the code in Python. @markcampanelli if you don't mind I can help you...

jhdesantana · 2024-05-03T18:07:55Z

@markcampanelli @GuiiFerrari, I am also interested in volunteering to help port ODRPACK95 to scipy.

dschmitz89 · 2024-05-04T08:13:21Z

Had not expected to see so much enthusiasm for porting ODRPACK to another language. This is highly appreciated folks! Feel free to coordinate a little here :)

HugoMVale · 2024-05-09T14:30:06Z

The ODRPACK95 code is actually very well structured in a module; nothing like older F77 code. Modern Fortran allows easy interfacing between C and Fortran (either direction). This being so, what is the advantage of converting well-written and validated code to C when the same result can be achieved just by writing the C bindings?
You can see a nice and recent example for PRIMA.

ilayn · 2024-05-09T14:37:47Z

We are quite aware of PRIMA, but every Fortran user who thinks the bindings are a trivial task quickly changes their mind when they attempt it. It is not that straightforward, and @zaikunzhang can comment on it better as the author of PRIMA. Also see #18118.

HugoMVale · 2024-05-09T14:47:33Z

Doing the binding cannot be more complex than rewriting and validating the whole code! ;)
There is only one function to interface with.
If there is willingness from scipy to accept this route and the matter is not for yesterday, I could attempt it.

      SUBROUTINE ODR
     &   (FCN,
     &   N,M,NP,NQ,
     &   BETA,
     &   Y,X,
     &   DELTA,
     &   WE,WD,
     &   IFIXB,IFIXX,
     &   JOB,NDIGIT,TAUFAC,
     &   SSTOL,PARTOL,MAXIT,
     &   IPRINT,LUNERR,LUNRPT,
     &   STPB,STPD,
     &   SCLB,SCLD,
     &   WORK,IWORK,
     &   INFO,
     &   LOWER,UPPER)

C...Routine names used as subprogram arguments
C   FCN:     The user-supplied subroutine for evaluating the model.

C...Variable definitions (alphabetically)
C   BETA:    The function parameters.
C   DELTA:   The initial error in the X data
C   IFIXB:   The values designating whether the elements of BETA are 
C            fixed at their input values or not.
C   IFIXX:   The values designating whether the elements of X are 
C            fixed at their input values or not.
C   INFO:    The variable designating why the computations were stopped.
C   IPRINT:  The print control variable.
C   IWORK:   The integer work space.
C   JOB:     The variable controlling problem initialization and 
C            computational method.
C   LOWER:   The lower bound on BETA.
C   LUNERR:  The logical unit number for error messages.
C   LUNRPT:  The logical unit number for computation reports.
C   M:       The number of columns of data in the explanatory variable.
C   MAXIT:   The maximum number of iterations allowed.
C   N:       The number of observations.
C   NDIGIT:  The number of accurate digits in the function results, as
C            supplied by the user.
C   NP:      The number of function parameters.
C   NQ:      The number of responses per observation.
C   PARTOL:  The parameter convergence stopping tolerance.
C   SCLB:    The scaling values for BETA.
C   SCLD:    The scaling values for DELTA.
C   STPB:    The relative step for computing finite difference
C            derivatives with respect to BETA.
C   STPD:    The relative step for computing finite difference
C            derivatives with respect to DELTA.
C   SSTOL:   The sum-of-squares convergence stopping tolerance.
C   TAUFAC:  The factor used to compute the initial trust region 
C            diameter.
C   UPPER:   The upper bound on BETA.
C   WD:      The DELTA weights.
C   WD1:     A dummy array used when WD(1,1,1)=0.0E0_R8.
C   WE:      The EPSILON weights.
C   WORK:    The REAL (KIND=R8) work space.
C   X:       The explanatory variable.
C   Y:       The dependent variable.  Unused when the model is implicit.

ilayn · 2024-05-09T14:58:17Z

> Doing the binding cannot be more complex than rewriting and validating the whole code! ;)

You would be surprised. Especially when half of the functions are home-grown (but outdated) LAPACK functions. But I am too invested in removing F77 code from SciPy so I should not comment.

ev-br · 2024-05-09T15:12:20Z

As a matter of fact, there exists at least one scipy dev who would be interested in seeing the result (me).

First of all, note that this interest does not guarantee the result is going to be accepted. The value of the exercise is to help us judge the pros and cons of this way of binding F90 code, to be considered together with other pros and cons of maintaining it. Even if it's not going to make it into scipy proper, a non-trivial real world worked example would be very useful in many circumstances.

So, if you're still interested :-), it'd be great to understand:

- what the bindings actually look like
- how robust the result is (Windows, macOS, etc.)
- binary size
- performance penalty
- copying behavior (F vs C arrays, can you control copies)
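
The copying question in the list above can be probed from the Python side with NumPy alone. A minimal sketch, assuming the binding layer ultimately hands NumPy arrays to Fortran in column-major order; the array names are hypothetical and nothing here calls ODRPACK95:

```python
import numpy as np

# A C-ordered (row-major) array, the NumPy default.
a_c = np.arange(6.0).reshape(2, 3)

# Converting to Fortran (column-major) order forces a copy here.
a_f = np.asfortranarray(a_c)
print(a_f.flags.f_contiguous, np.shares_memory(a_c, a_f))

# Transposing a C-ordered array yields an F-contiguous view, no copy.
a_t = a_c.T
print(a_t.flags.f_contiguous, np.shares_memory(a_c, a_t))

# An array that is already F-ordered passes through without a new copy.
b = np.asfortranarray(a_f)
print(np.shares_memory(a_f, b))
```

Checks like `np.shares_memory` on the inputs and outputs of a wrapper are one way to tell whether a binding copies silently or lets the caller control the layout.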

zaikunzhang · 2024-05-09T15:45:16Z

The Python binding of PRIMA has been finished, thanks to the precious contribution of @nbelakovski.

The binding is in fact not that complicated. It works on all platforms, and its build system does not depend on anything that is "dirty" or outdated as far as I can see.

Note that PRIMA is a comprehensive package for (derivative-free) optimization. It needs to bind not only one but five solvers. This increases the complexity considerably.

I hope PRIMA provides an example of binding modern Fortran code to Python.

HugoMVale · 2024-05-09T16:58:36Z

Ok, I will give it a try. ;)
I've set up a project here. I'll keep you posted on the progress.

rgommers · 2024-05-10T20:03:50Z

Sounds good! I think it's okay to replace old F77 code with new F95 code, as long as it's better quality and not a ton more code. That's a net win, and when we do get to rewriting it in another language (because we would really like a Fortran-free future!), that job will be easier to do when starting from better F95 code.

DEUSD · 2024-06-05T08:26:29Z

Hi @HugoMVale,
Will it be possible to define upper and lower limits as in the Fortran version? We need them to customize the configuration of our adjustment solution.
Thanks in advance for working on this issue.

rgommers added enhancement A new feature or improvement scipy.odr labels Feb 28, 2017

Rigel7 mentioned this issue Jul 31, 2017

scipy.odr lack support for multi-variable regression #7666

Closed

lucascolley changed the title ~~Upgrading scipy.odr to ODRPACK95~~ ENH: odr: upgrade to ODRPACK95 May 3, 2024

lucascolley changed the title ~~ENH: odr: upgrade to ODRPACK95~~ ENH: odr: upgrade to ODRPACK95 (port to Python or beyond...) May 5, 2024
