Creating a Python reference implementation #34

Open
nbelakovski opened this issue Jul 8, 2023 · 5 comments
Labels
python Issues related to the Python interface or implementation

Comments

@nbelakovski
Contributor

nbelakovski commented Jul 8, 2023

Hi all,

I've been discussing the idea of a Python reference implementation with @zaikunzhang, and now I'm making it official by creating this issue!

I think a Python reference implementation would be a great asset. @zaikunzhang mentioned readability several times in his initial post on the Fortran Discourse, and made many references to the idea that the code should be as close as possible to the mathematical presentation. I'd argue that Python is an even better language for this than Fortran, but the goal here is not to replace one with the other; both are necessary, for different reasons.

There are some open questions about how to go about this. I think COBYLA is the best starting point, since the current version of SciPy contains the old F77 COBYLA implementation, which we could use to bootstrap a few tests (not long term, just to get started; in the long term the Python code should probably be tested against the modern Fortran implementation).

@zaikunzhang suggested using tools to automatically translate the Fortran code into Python. The idea is that an automatic translation might be easier to verify, but I'm not so sure about this, for two reasons. First, I'm not aware of any tools that translate Fortran to Python. I've done some Googling and not found anything compelling (i.e., anything that didn't look like someone's toy project from a few years ago). I figured the most likely path for such a translation would be via LFortran and LLVM, but no luck there either. Second, regardless of how the code is created, the verification checks should be the same.

One other point I'd like to bring up for discussion (I will summarize these at the end) is the use of language features. I think @zaikunzhang wants to limit these, but this is vague territory and more detail would be welcome. For example, Python has a number of features that improve readability, like lambda expressions, list comprehensions, the @ operator for matrix multiplication with NumPy arrays, etc., most of which I think are either readily understandable or quickly understandable with a little documentation. There are some more complicated ones, like the yield expression, which I would avoid unless it was both very intuitive AND the alternative would be cumbersome. If one chooses to use fancy language features when there are non-cumbersome alternatives, well, that's just showing off 🤓.
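As a hedged illustration of the "readable" features listed above, here is a small hypothetical helper (not taken from PRIMA; the name and the constraint form `A @ x <= b` are assumptions) that uses the `@` operator, a list comprehension, and a lambda as a key function:

```python
import numpy as np


def most_violated_constraint(A, b, x):
    """Return (index, violation) of the most violated constraint A @ x <= b.

    Hypothetical helper, written only to show readable Python features.
    Assumes at least one constraint (max() of an empty range would fail).
    """
    residual = A @ x - b                                         # matrix-vector product
    violation = [max(r, 0.0) for r in residual]                  # list comprehension
    i = max(range(len(violation)), key=lambda k: violation[k])   # lambda as a key
    return i, violation[i]
```

Each line maps to one mathematical step, which is arguably the kind of closeness to the written algorithm the thread is after. The alternative with explicit index loops would be longer without being clearer.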

So to summarize, here are 3 points for discussion:

  1. Are there tools for translating Fortran to Python that I'm not aware of that could help?
  2. What does verification look like? Should we just find a way to add the Python implementation to (some of) the existing Fortran tests?
  3. Re: language features, how should we frame our thinking on them? Perhaps @zaikunzhang has an example of a Fortran language feature he has explicitly chosen not to use for sake of readability?

Looking forward to working on this with you all!

@zaikunzhang
Member

Hi all,

Many thanks to @nbelakovski for the initiative!

I am quite busy until next weekend. So please excuse my laziness in giving the following short response.

In brief, both C and Python implementations are needed, but it will be challenging for one person to work on both at the same time.

I agree that the Python implementation will be more readable and easier to understand.

However, as mentioned by @nbelakovski:

  1. We have to think about how to verify the correctness of the Python code.
  2. I prefer to have a tool to translate (most of) the Fortran code to Python. I know it is possible but nontrivial. It has been done for MATLAB using Ocml scripts. The result is at

https://github.com/libprima/prima/tree/main/matlab/interfaces/%2Bnewuoa_mat/private

In this way, it would be much easier to guarantee the correctness of the code.

The translation is possible because the modern Fortran code has a syntax quite close to MATLAB or Python. I tried to avoid any Fortran-specific constructs (language-specific constructs should also be avoided as much as possible in the Python or C implementation). An example is the implied-do loop. It can certainly be translated into other languages by hand, but automating that translation would not be so straightforward.
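For readers less familiar with Fortran, an implied-do loop is the `(expr, i = start, end)` form inside an array constructor, e.g. `v = [(real(i, RP)**2, i = 1, n)]`. A hedged sketch of how that one example maps onto Python: the closest construct is a list comprehension, and the subtle part is the index shift from Fortran's inclusive, 1-based `i = 1, n` to `range(1, n + 1)`. The function names are hypothetical.

```python
import numpy as np


def squares_implied_do(n):
    """Literal analogue of the Fortran constructor [(real(i, RP)**2, i = 1, n)].

    The Fortran range 1..n is inclusive, hence range(1, n + 1) here.
    """
    return np.array([float(i) ** 2 for i in range(1, n + 1)])


def squares_vectorized(n):
    """Equivalent vectorized NumPy form; idiomatic, but a less literal translation."""
    return np.arange(1, n + 1, dtype=float) ** 2
```

The literal version is easier to audit line-by-line against the Fortran source, while the vectorized one is what a Python programmer would more likely write. This is one reason a pattern-based translator has to pick a convention for such constructs.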

I also avoided using derived types (structures) in the Fortran implementation. Again, it is surely possible to translate derived types to other languages, but it will become an obstacle if you want to automate the translation, e.g., using Ocml scripts. Derived types are also nontrivial when interfacing with other languages.

Many thanks,
Zaikun

P.S.: Here are a few links that I shared with @nbelakovski and that motivated this discussion.

  1. Something I wrote when I started to work on PRIMA:
    https://fortran-lang.discourse.group/t/fortran-an-ideal-language-for-writing-templates-of-numerical-algorithms/2054

  2. Coding as a way of documenting human knowledge:
    https://everythingfunctional.wordpress.com/2021/01/19/communicating-with-whom/

  3. Writing slow Go code:
    https://fortran-lang.discourse.group/t/writing-slower-go-programs-bitfield-consulting/5733

@nbelakovski
Contributor Author

Thanks for the response and the example. These Ocml scripts you used to do the MATLAB translation, did you write them yourself? Are they available somewhere for inspection?

@zaikunzhang
Member

It was written by a student. I can share it with you privately.

@nbelakovski
Contributor Author

I took a look at them. I don't know Ocml so I can't dive into it too deeply, but from looking at the shell scripts used to run it and the comments, it seems like a fragile setup. Overall I'm skeptical of the auto-translation approach unless it's using existing tools. If you're making your own tools to do it, you basically have to go halfway towards building a compiler which is a very big task.

It definitely can't be a long term solution to maintaining multiple languages. The languages will evolve and the auto-translator would have to evolve with them. Without keeping up to date you'll be limited to a subset of the language and that will ultimately hurt readability.

As a short term solution to get started on the translation I think it would take longer to get these Ocml scripts working than just manually translating things to Python.

In the case of Fortran -> C translation, things might be different, since I see that LFortran has a --show-c option. It doesn't work on the codebase at the moment, because LFortran fails to compile the following short program due to the _RP kind suffix, but hopefully this can be fixed in the future (or, for translation purposes, the _RP could be removed in a local copy).

```fortran
program a
  integer, parameter :: RP = kind(0.0)
  real(RP), parameter :: b = 1.0_RP
  print *, b
end program a
```
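One hedged way to act on the "remove _RP in a local copy" idea is a small preprocessing pass over the Fortran source before handing it to a translator. The sketch below assumes that `_RP` only ever appears as a kind suffix on numeric literals (e.g. `1.0_RP`, `.5_RP`, `2.5E-3_RP`); identifiers such as `x1_RP` and declarations such as `real(RP)` are left alone. Dropping the suffix turns the literals into default-kind reals. That matches `RP = kind(0.0)` in the program above, but it would silently lose precision if `RP` were a double-precision kind, so the output is only meant as translator input, not as replacement source. The function name is hypothetical.

```python
import re

# A numeric literal (digits, optional fraction, optional exponent) immediately
# followed by the kind suffix _RP. The lookbehind rejects digits that belong to
# an identifier (x1_RP) or follow a dot; \b stops matches like 1.0_RPX.
_RP_LITERAL = re.compile(
    r"(?<![\w.])((?:\d+\.\d*|\.\d+|\d+)(?:[eEdD][+-]?\d+)?)_RP\b"
)


def strip_rp_suffix(source: str) -> str:
    """Remove the _RP kind suffix from numeric literals in Fortran source text."""
    return _RP_LITERAL.sub(r"\1", source)
```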

In this case, translating Fortran -> C could be a good starting point for making initial progress, but I think the resulting code would still need to be modified by hand. LFortran's translations of some other simple Fortran programs show that it adds a header for "Fortran intrinsics" and carries over some Fortran concepts which I think we would ultimately not want in a C translation.

This also makes me wonder about testing. Obviously it would be a pain to maintain a test suite in 3 languages. Perhaps the main test suite could be in Python and decide at runtime whether to test the Python implementation or the Fortran/C implementations via bindings, but I'm getting ahead of myself here. We can cross that bridge when we get to it.
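A runtime-selectable test suite of this kind could look roughly like the sketch below. Everything in it is hypothetical: the registry, the backend name, and the stand-in solver (a plain compass search, not a PRIMA algorithm) are placeholders, since no Python implementation or Python bindings to the Fortran/C code are assumed to exist yet. The point is only that one shared set of test problems can be run against whichever implementations are registered.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
Objective = Callable[[Vector], float]
Solver = Callable[[Objective, Vector], Vector]

# Hypothetical registry mapping a backend name to solve(objective, x0) -> x.
# A real suite might register the Python implementation here, plus thin
# wrappers around bindings to the Fortran/C implementations.
BACKENDS: Dict[str, Solver] = {}


def register(name: str) -> Callable[[Solver], Solver]:
    """Decorator that adds a solver to BACKENDS under the given name."""
    def decorator(solver: Solver) -> Solver:
        BACKENDS[name] = solver
        return solver
    return decorator


@register("toy_compass_search")
def compass_search(objective: Objective, x0: Vector,
                   step: float = 1.0, tol: float = 1e-8) -> Vector:
    """Stand-in derivative-free solver (compass search); NOT a PRIMA algorithm."""
    x, fx = list(x0), objective(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                f_trial = objective(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step /= 2.0  # shrink the pattern only after an unsuccessful sweep
    return x


# Shared test problems: (objective, starting point, known minimizer).
PROBLEMS: List[Tuple[Objective, Vector, Vector]] = [
    (lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, [0.0, 0.0], [1.0, -2.0]),
    (lambda x: (x[0] - 3.0) ** 2, [0.0], [3.0]),
]


def run_suite(names: Optional[List[str]] = None, atol: float = 1e-6) -> Dict[str, bool]:
    """Run every problem on each selected backend; report pass/fail per backend."""
    results = {}
    for name in names or list(BACKENDS):
        solve = BACKENDS[name]
        results[name] = all(
            all(abs(xi - si) <= atol for xi, si in zip(solve(f, x0), x_star))
            for f, x0, x_star in PROBLEMS
        )
    return results
```

With pytest, the same idea could be expressed by parametrizing a test over the registered backends with `pytest.mark.parametrize`, and a command-line option or environment variable could choose which backends are enabled. Both are design assumptions, not decisions made in this thread.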

@zaikunzhang
Member

zaikunzhang commented Jul 31, 2023

> I took a look at them. I don't know Ocml so I can't dive into it too deeply, but from looking at the shell scripts used to run it and the comments, it seems like a fragile setup. Overall I'm skeptical of the auto-translation approach unless it's using existing tools. If you're making your own tools to do it, you basically have to go halfway towards building a compiler which is a very big task.
>
> It definitely can't be a long term solution to maintaining multiple languages. The languages will evolve and the auto-translator would have to evolve with them. Without keeping up to date you'll be limited to a subset of the language and that will ultimately hurt readability.

To @nbelakovski and anyone interested in translating PRIMA to other languages,

I would like to reiterate this point.

We should not expect the automatic translation to produce a version of PRIMA that works immediately. Of course, we would be happy if it did, but that is very difficult to achieve. The purpose of the automatic translation is to obtain a first version that is close to correct and will work after minimal manual modifications (yes, manual modifications are expected). This is how the MATLAB version of NEWUOA was produced.
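To make the idea of "a first version that needs minimal manual modification" concrete, here is a deliberately tiny, hypothetical line-based translator covering only a few of the patterns named in this thread (`if ... then`, `else if`, `else`, `end if`, `matprod`, and some operators). It is not the Ocml tooling used for the MATLAB version and is far from a real translator. It assumes `matprod(A, B)` denotes a matrix product and only recognizes simple arguments. Anything it does not understand is emitted as a `# TODO` line, so the required manual edits are visible rather than silently wrong.

```python
import re

# Expression-level rewrites, applied in order: (Fortran pattern, Python text).
_EXPR_RULES = [
    (re.compile(r"matprod\(\s*([^(),]+?)\s*,\s*([^(),]+?)\s*\)", re.I), r"(\1 @ \2)"),
    (re.compile(r"\.and\.", re.I), " and "),
    (re.compile(r"\.or\.", re.I), " or "),
    (re.compile(r"/="), "!="),  # Fortran /= means "not equal"
]

# Statement-level patterns, matched against a stripped line.
_IF = re.compile(r"^if\s*\((.*)\)\s*then$", re.I)
_ELSEIF = re.compile(r"^else\s*if\s*\((.*)\)\s*then$", re.I)
_ELSE = re.compile(r"^else$", re.I)
_ENDIF = re.compile(r"^end\s*if$", re.I)
_ASSIGN = re.compile(r"^([A-Za-z_]\w*)\s*=(?!=)\s*(.+)$")


def _expr(text: str) -> str:
    """Rewrite a Fortran expression with the rules above and tidy whitespace."""
    for pattern, repl in _EXPR_RULES:
        text = pattern.sub(repl, text)
    return " ".join(text.split())


def translate(fortran_lines):
    """Translate a few Fortran statement patterns to Python; flag the rest."""
    out, depth = [], 0
    for raw in fortran_lines:
        line = raw.strip()
        if not line:
            continue
        pad, outer = "    " * depth, "    " * max(depth - 1, 0)
        if m := _IF.match(line):
            out.append(f"{pad}if {_expr(m.group(1))}:")
            depth += 1
        elif m := _ELSEIF.match(line):
            out.append(f"{outer}elif {_expr(m.group(1))}:")
        elif _ELSE.match(line):
            out.append(f"{outer}else:")
        elif _ENDIF.match(line):
            depth = max(depth - 1, 0)
        elif m := _ASSIGN.match(line):
            out.append(f"{pad}{m.group(1)} = {_expr(m.group(2))}")
        else:
            out.append(f"{pad}# TODO: translate manually: {line}")
    return out
```

Once a given pattern is handled correctly, every occurrence of it in the source is translated the same way. That is the property the argument for automatic translation relies on.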

The advantage of automatic translation is that it has a much lower probability of introducing bugs than manual translation. Let us do a simple calculation. Suppose that the package has $N$ lines of code, and a human translator makes a mistake on each line independently with probability $\epsilon$. Then the probability of having at least one bug is $1-(1-\epsilon)^N$. With $N = 10^4$ and $\epsilon = 10^{-4}$, this probability is over $0.6$ (about $1 - e^{-1} \approx 0.63$). Indeed, I would be extremely happy if my probability of making a mistake on each line were as low as $10^{-4}$. Even worse, it will be extremely difficult to locate a mistake in $10^4$ lines of code once it is made (I was very lucky to recognize that the definition of weight in cobyla/geometry.py is wrong, but we cannot expect to always be so lucky). On the other hand, as long as the automatic translator properly handles a limited number ($\ll 10^4$) of patterns (e.g., if ... then ..., matprod, subroutines, intents...), the translation will be correct. Note that I am not claiming that it is easy to build a translator that handles these patterns properly; it is hard, but once we have it, it will work correctly forever. I do not see anything more beneficial in the long term.
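The estimate above is easy to check numerically under the same independence assumption. The sketch computes $1-(1-\epsilon)^N$ as `-expm1(N * log1p(-eps))`, which avoids the cancellation the naive formula suffers when $\epsilon$ is tiny. For $N\epsilon = 1$, the result is close to $1 - e^{-1}$. The function name is hypothetical.

```python
import math


def prob_at_least_one_bug(n_lines: int, eps: float) -> float:
    """P(at least one mistake) = 1 - (1 - eps)**n_lines, assuming independent lines."""
    # 1 - (1 - eps)**n == -expm1(n * log1p(-eps)), computed without cancellation.
    return -math.expm1(n_lines * math.log1p(-eps))


# How the risk grows with code size at a fixed per-line error rate of 1e-4.
for n in (10**3, 10**4, 10**5):
    print(n, round(prob_at_least_one_bug(n, 1e-4), 4))
```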

What should we do if some updates are made in the Fortran code and we want to do the same in the translated code? Should we run the translator again? Of course not. We should make the updates manually. Once we have the first correctly translated version, any updates will become much easier.

2 participants