Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RfC: Make pythran officially a dependency #18683

Open
ilayn opened this issue Jun 15, 2023 · 10 comments
Open

RfC: Make pythran officially a dependency #18683

ilayn opened this issue Jun 15, 2023 · 10 comments
Labels
Cython Issues with the internal Cython code base DX Everything related to making the experience of working on SciPy more pleasant Pythran Items related to internal Pythran code base RFC Request for Comments; typically used to gather feedback for a substantial change proposal

Comments

@ilayn
Copy link
Member

ilayn commented Jun 15, 2023

As reignited by the comments in #18679 , we think that it is time to lock the switch at on position regarding SciPy using Pythran unconditionally. Currently one can switch off the Pythran usage through a flag and use the non-Pythran code branch. This creates a double implementation burden when we have a valid Pythran code to be included.

As some of us prefer working with it to Cython or other tooling, this can actually increase Pythran usage in the codebase without the doubling necessity.

We would like to gather more feedback around what the consequences are, pros/cons, feasibility, tooling etc.

cc: @mdhaber @h-vetinari @rgommers

@ilayn ilayn added Cython Issues with the internal Cython code base Pythran Items related to internal Pythran code base DX Everything related to making the experience of working on SciPy more pleasant labels Jun 15, 2023
@rgommers
Copy link
Member

Cc @serge-sans-paille as Pythran author.

This creates a double implementation burden when we have a valid Pythran code to be included.

Given that Pythran code is valid Python code, it "just works" usually as pure Python and there's no double implementation - it'll just be 10-100x slower.

One bit of extra boiler plate is in the build system, where we need to handle the on/off options. E.g.:

if use_pythran
  _rbfinterp_pythran = custom_target('_rbfinterp_pythran',
    output: ['_rbfinterp_pythran.cpp'],
    input: '_rbfinterp_pythran.py',
    command: [pythran, '-E', '@INPUT@', '-o', '@OUTDIR@/_rbfinterp_pythran.cpp']
  )

  _rbfinterp_pythran = py3.extension_module('_rbfinterp_pythran',
    _rbfinterp_pythran,
    cpp_args: cpp_args_pythran,
    dependencies: [pythran_dep, np_dep],
    link_args: version_link_args,
    install: true,
    subdir: 'scipy/interpolate'
  )
else
  py3.install_sources(
    ['_rbfinterp_pythran.py'],
    subdir: 'scipy/interpolate'
  )
endif

It isn't that much extra, but if we're going to use Pythran a lot more it will be good to streamline it.

The main concerns that I think we have left are:

  1. Possibility of regressions and fragile builds:
    • Unlike our other dependencies (Cython/NumPy/BLAS/LAPACK), Pythran has multiple transitive dependencies and that has caused problems multiple times already.
    • The use of modern C++ constructs has broken builds in more niche and untested platforms/configurations. Cython emits very boring C code and is more stable as a result.
  2. Implementation-wise, dealing with integers is still annoying. I think we've found that it's better to have a pure Python frontend to the API, and then only do the heavy lifting in Pythran. For Cython that's easier, because it automatically upcasts integer inputs (and float32 as well I believe).

Bus-factor wise I'm not that concerned - at least compared to Cython. Effectively they both have a bus factor 1 I'd say, which is a concern in general - but the price we seem to have to pay in order to get to use something more friendly than raw C/C++.

One thing I'm not sure about right now is impact on build time compared to using equivalent Cython - we should check whether that's significant.

Upsides of Pythran:

  • a little easier than Cython for folks only familiar with Python
  • smaller binaries produced typically (sometimes 5-10x smaller)
  • can be faster than Cython (often equivalent though, it depends on what functionality is used)
  • it's nice in a way that it's more restricted than Cython: for simple functions/kernels it is good; once one gets to classes or multi-file constructs that are unsupported in Pythran, it often turns out that there Cython is also pretty painful and everything is a special case. Using C/C++ directly at that point is often preferred.

@ilayn
Copy link
Member Author

ilayn commented Jun 15, 2023

Implementation-wise, dealing with integers is still annoying. I think we've found that it's better to have a pure Python frontend to the API, and then only do the heavy lifting in Pythran. For Cython that's easier, because it automatically upcasts integer inputs (and float32 as well I believe).

Upcasting is not done automatically but through fused types it can handle the code generation automatically. That is to say,

ctypedef fused myfused_t:
    int
    np.int32_t
    np.float32

def func(myfused_t a, myfused_t b):
    ...

will generate 3 versions separately, (and not the cartesian product of types) and select the right implementation based on the argument type. It can also be enforced at the caller.

For Pythran I think this is done via adding more type version lines at the entry but I am not sure.

@tupui
Copy link
Member

tupui commented Jun 15, 2023

Bus-factor wise I'm not that concerned - at least compared to Cython. Effectively they both have a bus factor 1 I'd say, which is a concern in general - but the price we seem to have to pay in order to get to use something more friendly than raw C/C++.

This is personally my main concern. Cython's bus factor is effectively a bit more than 1 if you look at the insights page as opposed to Pythran which is clearly 1. Also due to the popularity of Cython, there is a bit more guarantee in the sense that a lot of projects would have to help or do something if there were anything to happen.

But otherwise, I am +1 as it's very approachable compare to Cython and also provide good performances.

My only ask is that we add some clear doc on when to chose what. Right now it's not clear, and once you made a decision it's also not easy to know all the gotchas. We do have a doc for Cython, but not sure how up to date it is (update for the noexcept thing.)

@ilayn
Copy link
Member Author

ilayn commented Jun 15, 2023

My only ask is that we add some clear doc on when to chose what.

That's an excellent point, it actually took me quite a while to tame Cython to be used in conjunction with NumPy types. Cython documentation while comprehensive, never addresses the very tricky C-isms and assumes full affinity with C/C++ (not that anything wrong with that). But makes the entry bar too high. Pythran on the other hand does things very nicely but you don't get any feedback about what you did and requires you to dive into generated C code to see whether there is something not supported eating away the performance. The yellow annotation feature of Cython is top-notch.

Another issue with Cython is that version 3.0 is about to be released for the last 3 years. It's not fair to say anything bad about that since it's a very specialized package and capacity matters a lot. But for us many features/bug fixes are now locked in that alpha version. Hence there is some maintenance burden regarding that. Pythran is moving faster and breaking a lot but it is progressing bit by bit.

@rgommers
Copy link
Member

Let's just contribute to the Pythran docs as needed, and write guidance in our contributor guide that's as minimal as possible, pointing to the upstream docs. I don't think tutorial material like in http://scipy.github.io/devdocs/dev/contributor/cython.html#adding-cython belongs in the SciPy docs.

Another issue with Cython is that version 3.0 is about to be released for the last 3 years.

There was an announcement that the goal is to release before the SciPy'23 conference - so we should see a 3.0 release within a month or so🤞🏼.

@lucascolley lucascolley added RFC Request for Comments; typically used to gather feedback for a substantial change proposal needs-decision Items that need further discussion before they are merged or closed labels Dec 22, 2023
@h-vetinari
Copy link
Member

Now that Cython 3 has been out for a while, and our use of pythran has only grown (and could grow further, c.f. serge-sans-paille/pythran#2141), I think it's time to reconsider making our dependence on pythran official.

@ilayn
Copy link
Member Author

ilayn commented Jan 6, 2024

Not used it in a very long while but if the consensus is there (which seems so) then +1 from me. @mdhaber Any thoughts?

@rgommers
Copy link
Member

rgommers commented Jan 6, 2024

I'm +1 on this as well - we've had very few problems recently, and the ones that showed up were all resolved pretty smoothly.

@lucascolley lucascolley removed the needs-decision Items that need further discussion before they are merged or closed label Jan 6, 2024
@lucascolley
Copy link
Member

Sounds like this is ready for the mailing list?

@h-vetinari
Copy link
Member

While we get used to the discourse transition, I've made two posts:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cython Issues with the internal Cython code base DX Everything related to making the experience of working on SciPy more pleasant Pythran Items related to internal Pythran code base RFC Request for Comments; typically used to gather feedback for a substantial change proposal
Projects
None yet
Development

No branches or pull requests

5 participants