Priorities within CPython development for CZI proposal? #26

Closed
brainwane opened this issue Jun 11, 2020 · 31 comments

Comments

@brainwane
Contributor

TL;DR: what are a few priority areas for CPython development where one full-time person or a few part-time people, working for one year, could make a big difference? Let's decide by July 7th and apply for CZI money.

The Chan Zuckerberg Initiative's Essential Open Source Software for Science grant program will open its next funding cycle to applications soon -- June 16th to August 4th, per the request for applications.

The PSF can apply for $50k-$250k USD for a 1-year project. The PSF has successfully gotten one of these grants already: $200,000 to improve the user experience and debuggability of pip (more details), and we've been welcomed to apply again.

I think the PSF should consider proposing at least one project that improves CPython. In my experience, projects need at least a few weeks to put together a good application, sometimes a month. So I would love to use this issue to generate some ideas, and then, by July 7th, have a consensus, aided by the Steering Council, on what to pursue. Then the new Project Funding Working Group (currently being formed) can help with advising proposal-writers and helping them get Board and/or Ewa's approval, and help get the proposal submitted before August 4th.

I suggest a few criteria:

  1. CPython maintainers want it: there's already consensus among CPython devs, and if a PEP is involved it's been accepted.
  2. fairly well-scoped
  3. fundable: would happen much faster if the PSF got funding to implement the work. (So, it has to be legal and physically possible.)
  4. applicable to biomedical research: so, for example, performance and reproducibility work is probably more relevant here than work on security or improving Python's teachability in the classroom.

My three suggestions to kick off discussion are:

  1. make a chunk of progress on the GILectomy
  2. make a proper start on property-based testing for Python builtins and the standard library (per this year's Language Summit)
  3. hire a full-time core workflow manager and coordinator for one year
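As a rough illustration of what suggestion 2 means in practice: a property-based test states a rule that should hold for every input, then checks it against many generated inputs. The sketch below is a hand-rolled stand-in that uses only the standard library's `random` module. It is not the API of a dedicated property-based testing library such as Hypothesis, and the helper names `check_sorted_properties` and `run_property` are hypothetical.

```python
import random
from collections import Counter


def check_sorted_properties(xs):
    """Properties that sorted() should satisfy for any list of integers."""
    result = sorted(xs)
    # 1. The output is in non-decreasing order.
    assert all(a <= b for a, b in zip(result, result[1:]))
    # 2. The output is a permutation of the input (same elements, same counts).
    assert Counter(result) == Counter(xs)
    # 3. Sorting is idempotent: sorting a sorted list changes nothing.
    assert sorted(result) == result


def run_property(check, trials=200, seed=0):
    """Run `check` against `trials` randomly generated integer lists."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        size = rng.randint(0, 20)
        xs = [rng.randint(-1000, 1000) for _ in range(size)]
        check(xs)
    return trials


trials_run = run_property(check_sorted_properties)
```

A real library would add what this sketch lacks, most notably shrinking a failing input down to a minimal counterexample.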

(Budget estimate: If we ask for and get $250K, and the PSF takes 15% overhead, we get $212,500. If we assume that the PSF hires contractors for this work at an hourly pay rate of USD $115-$200, that's enough for about 1,060 to 1,850 hours of work. CZI funds one-year projects, so, at that pay rate, this ends up being the range between one half-time person and one full-time person.)
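The budget arithmetic can be checked with a few lines of Python. The grant size, overhead, and hourly rates come from the estimate above; the figure of 2,000 working hours per full-time year is an assumption added here for the comparison.

```python
GRANT_USD = 250_000
PSF_OVERHEAD_PERCENT = 15
HOURLY_RATES_USD = (115, 200)      # low and high contractor rates
FULL_TIME_HOURS_PER_YEAR = 2_000   # assumption: ~50 weeks x 40 hours

# Integer arithmetic avoids floating-point rounding on the dollar amount.
net_usd = GRANT_USD * (100 - PSF_OVERHEAD_PERCENT) // 100

# The higher rate buys fewer hours, the lower rate buys more.
hours_low = net_usd / max(HOURLY_RATES_USD)
hours_high = net_usd / min(HOURLY_RATES_USD)

fte_low = hours_low / FULL_TIME_HOURS_PER_YEAR
fte_high = hours_high / FULL_TIME_HOURS_PER_YEAR

print(f"net budget: ${net_usd:,}")
print(f"hours: {hours_low:,.0f} to {hours_high:,.0f}")
print(f"full-time equivalents: {fte_low:.2f} to {fte_high:.2f}")
```

Under that assumption the range works out to a little over half of a full-time year at the high rate and a little under a full year at the low rate.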

@gvanrossum
Member

I'm just a lurker now, but why biomedical research?

And some ideas for the brainstorm:

In Python 3.10 we'll be able to use the full power of the PEG parser. But 3rd party tooling like Black doesn't have a PEG parser yet. We could work on a 3rd party PEG parser suitable to replace lib2to3.

Improve mypy -- it may be losing much of its corporate funding, and we need to upgrade the type system to support numpy.

More directly CPython: make it work properly on mobile and ensure that it keeps working there. (And in web browsers, for that matter.)

Burn down the list of open PRs and issues, and design a new workflow to prevent them from getting so clogged again. (Is the bpo -> GitHub transition funded yet?)

Modernize the buildbot farm (yes, we need OS diversity, but many of the hosts are ancient and too slow or memory starved to be useful). Maybe some mobile 'bots, too.

Fund Serhiy for a year to make improvements across the board.

Accelerate work on HPy.

Incorporate several features from trio into asyncio (nurseries, multi-exceptions), and other asyncio work (better ways of connecting asyncio and threads?).

Improve the docs (maybe migrate to markdown???).

@v-python

v-python commented Jun 11, 2020 via email

@brainwane
Contributor Author

Relevance to biomedical research matters for this particular question right now because the August 4th deadline is for requesting money from the Chan Zuckerberg Initiative's Essential Open Source Software for Science program, which "invites applications for open source software projects that are essential to biomedical research". The previous requests for applications explain:

CZI currently supports several areas of basic science and technology with the goal of making it possible to cure, prevent, or manage all diseases by the end of this century. This program aims to support software tools that are essential to this mission. Applications for two broad categories of open source software projects will be considered in scope:

  • Domain-specific software for analyzing, visualizing, and otherwise working with the specific data types that arise in biomedical science (e.g., genomic sequences, microscopy images, molecular structures). Software will be considered out of scope if it primarily serves domains outside biomedical science (e.g., physics, astronomy, earth sciences). While we appreciate that other communities may want to explore new extensions of their software to the life sciences, such applications are unlikely to be selected.
  • Foundational tools and infrastructure that enable a wide variety of downstream software across several domains of science and computational research (e.g., numerical computation, data structures, workflows, reproducibility). While foundational tools will be considered in scope for this program, they must have demonstrated impact on some area(s) of biomedical research.

The more persuasively that our grant application can say "here is how this will help the biomedical researchers who use Python," the more likely we are to get money from this program.

I remember @freakboy3742's Language Summit presentation on Python on mobile this year -- Russell, if you can speak up with a paragraph about why this is particularly interesting from a biomedical research perspective, I'd love that!

@ericholscher

I'd vote strongly for this:

Burn down the list of open PRs and issues, and design a new workflow to prevent them from getting so clogged again. (Is the bpo -> GitHub transition funded yet?)

Something similar to the Django fellow, funded for a year as a test, would be huge. We then only need to sell the larger story of Python being impactful to biomedical research, which is obviously true. I know there are other groups that have applied for this grant for this type of funding in the past, so I can definitely help share that knowledge from the Django & historical CZI grant side. 👍

@brettcannon
Member

Is the bpo -> GitHub transition funded yet?

Yes, it's funded. We are going through job applicants for the PM position now and we have an initial list of candidates to join the WG to manage the transition.

@warsaw
Member

warsaw commented Jun 12, 2020

Accelerate work on HPy.
Improve mypy -- it may be losing much of its corporate funding, and we need to upgrade the type system to support numpy.

I'll put in votes for these.

@gvanrossum
Member

And what are your own brainstorm ideas?

@freakboy3742

@brainwane

Here's an attempt at a pitch for why "mobile Python" is of interest to science/biomed:


Python is already a well established tool in science and biomedical research, where it is used for data analysis, visualization, machine learning and pattern recognition. Scientists and researchers also use Python to develop and maintain complex database-backed websites in support of their research goals. However, this usage is largely restricted to laptops and servers. The most widely used types of computing device - phones and tablets - are not currently well served by the Python ecosystem.

At present, developing apps for phones and tablets generally requires specialist skills, and often requires mastering multiple programming languages. As a result, developing a mobile application to support their research isn't an option available to most researchers. A "mobile enabled" Python would enable scientists and researchers to leverage their existing programming skills to develop mobile applications. These applications could simply make existing data analysis and visualization techniques available on new platforms; or they could combine these techniques with the unique capabilities of mobile devices, such as photo capture, geolocation, and augmented reality. This would open up new opportunities for in-the-field data gathering and analysis that haven't previously been possible.


If I were to pitch potential projects for the grant with a mobile focus:

  • Officially integrating iOS and Android into the CPython build
  • Developing iOS and Android wheel formats that are compatible with mobile app distribution.
  • Developing a "Minimal Viable Python" (a minimal Python install, with standard library pieces being opt-in)

@gpshead
Member

gpshead commented Jun 14, 2020 via email

@gpshead
Member

gpshead commented Jun 14, 2020 via email

@bskinn
Contributor

bskinn commented Jun 14, 2020

I know absolutely nothing about the complexities of developing for mobile, but FWIW I can freely pip install in the Python that comes with Termux, for Android.

Not everything works, but everything I've tried will install.

Further, as long as I pkg install clang (and maybe a couple of other Termux system packages) first, I'm able to pip install coverage and get the compiled C extensions.

So, maybe this is irrelevant to the current conversation, but superficially it seems to argue against @gpshead's position.

@freakboy3742

@gpshead I completely agree that the workflow for using Python on a mobile device would be different. However, that doesn't mean that wheels aren't possible, or wouldn't be useful. If anything, they're more useful on mobile platforms because of the way mobile apps need to operate.

Going into specifics will likely rathole this entire discussion. I'm happy to elaborate; let me know where the better place would be.

Suffice to say that the three points I listed are the pain points that BeeWare has with CPython at present. At the core of all of them is elevating "interpreter embedded in an app sandbox" as a first-class distribution story for Python - i.e., a Python installation story that doesn't include or involve python.exe. And while that story is the only way Python works on mobile, it's also a useful story for standalone app distribution on desktop platforms.

@gpshead
Member

gpshead commented Jun 14, 2020 via email

@gvanrossum
Member

gvanrossum commented Jun 15, 2020 via email

@freakboy3742

@gvanrossum Yes, that's the intended use case. End-user wheel installs may not even be possible. I believe Apple's App Store guidelines would reject an app that allowed the installation of wheels (and especially binary wheels) after the app has gone through review.

@warsaw
Member

warsaw commented Jun 15, 2020

Ideas:

  • Make the single-file app distribution story official. I'm not talking about zipapps like shiv, which, although great, still require unpacking in order to support shared libraries. I'm talking about something much more akin to PyOxidizer.
  • Support development and inclusion into Python of Cython or a Cython-like tool which lets us strongly direct extension writers away from using the C API directly. This could be done in conjunction with the HPy work.

@brettcannon
Member

Quick update: we discussed potential ideas to submit proposals for. We want to wait until after our next meeting next week when everyone is able to attend before we publicly state what we think the priorities should be.

Feel free to continue to discuss things until then.

@brainwane
Contributor Author

Thanks @brettcannon!

Project Funding WG has a work-in-progress list of funders. So if core developers and the Steering Council come up with some great ideas that aren't well suited to a CZI application, it's worth collecting those because some might be well-suited to a Mozilla Open Source Support Award application, or a Comcast Innovation Fund application, etc.

@brainwane
Contributor Author

Also, @brettcannon, I presume that once the Steering Council publishes the Vision Deck that was mentioned in a past update, that would be helpful for this discussion and related funding-seeking discussions. Will we be seeing that soon?

@brettcannon
Member

The vision deck was scrapped for plans to present at PyCon US on the topic, which then got dashed due to funding concerns for the PSF. I suspect, though, the list we present back to you all will encompass what we would have put in that document and presentation.

@vstinner
Member

I believe the performance of the Python runtime matters if Python is to remain relevant in 5 or 10 years. I identified three projects which are realistic:

  • Subinterpreters: rework Python internals to get one "GIL" per interpreter, so that multiple interpreters can run in parallel and use multiple CPUs. For example, run one interpreter per thread and have as many interpreters as CPUs. PEP 554 is related to this.
  • Hide implementation details from the C API: see my PEP draft: https://github.com/vstinner/misc/blob/master/cpython/pep-opaque-c-api.rst
  • HPy: https://github.com/pyhandle/hpy A new C API written correctly from the start. It's related to the previous project but takes a different approach and makes a different tradeoff in terms of backward compatibility (the new C API is fully incompatible on purpose).

@brainwane
Contributor Author

@brettcannon wrote on June 15th:

Quick update: we discussed potential ideas to submit proposals for. We want to wait until after our next meeting next week when everyone is able to attend before we publicly state what we think the priorities should be.

Feel free to continue to discuss things until then.

Thanks! Should we expect the Steering Council's public statement of priorities soon? In order to submit a good proposal by August 4th, I figure we need one of these two:

  1. by ~July 7th: the Steering Council designates 1-2 things and says "let's apply for funding for these"
  2. by ~July 1st: the Steering Council says, more broadly, "here are some priority areas", and collaborates with core Python developers on python-dev and/or Discourse to narrow this down to 1-2 specific things by ~July 7th

(I'm saying "thing" instead of "project" because of ideas like "hire a person for a year to do general code review/issue wrangling" which is a fundable thing but not a project.)

@brettcannon
Member

Unfortunately we weren't able to get to this topic today as something more pressing came up and took up the whole meeting. I'll start an email thread to see what we can pull together.

And I am going to take from that last comment, @brainwane, that you are after a short list by July 7.

@brainwane
Contributor Author

@brettcannon thanks for the update.

And I am going to take from that last comment, @brainwane, that you are after a short list by July 7.

Yes, I think that would be great, thanks.

In case this helps: I have faith that, whatever the topic area is, as long as CPython maintainers want it and it's applicable to biomedical research, we can find a way to scope work and make it feasible and plausible as a proposal.

@brainwane
Contributor Author

@brettcannon should we expect a short list early this week? Thanks!

@brettcannon
Member

Just finished our meeting and the two things we would suggest proposing are:

  1. Core developer in residence; help with the PR backlog, issue triaging, improving the development workflow, etc.
  2. Single binary distributions, i.e. developing the tools necessary so you can compile CPython and all of your dependencies into a single binary and that's what you send people

Let us know if you need any more clarifications on those.

@brainwane
Contributor Author

Thanks @brettcannon -- very helpful!

We now have three weeks to try to find proposal-writers, write the proposal, edit, get Board approval, and submit. I hope that the Project Funding Working Group's members can do a lot of the lifting on writing this proposal, but that's not certain. I'm going to close this issue now because the Steering Council has set its priorities for this, but anyone who's interested in writing even a few paragraphs about why this is important and how much work it would take, please reply to this issue to volunteer.

@xmunoz

xmunoz commented Jul 15, 2020

@warsaw @brettcannon

Make the single-file app distribution story official. I'm not talking about zipapps like shiv, which, although great, still require unpacking in order to support shared libraries. I'm talking about something much more akin to PyOxidizer.

Is something like pants pex in the same vein as shiv? It looks like pex has the ability to generate Python executables, but I'm not sure if these executables fit the "first-class distribution story" described above.

@warsaw
Member

warsaw commented Jul 15, 2020

shiv is a modernization of pex. At my job, we were using pex for tarball distributions (not really single file executables, see below), but pex had lots of performance problems, mostly due (IMHO) to its backward compatibility requirements. shiv dropped all that, supporting only Python 3 and using modern techniques and libraries (e.g. importlib.resources instead of pkg_resources) to get good performance.

As nice as shiv is, I wouldn't classify it as a "single file executable". Both pex and shiv are fundamentally zip archives with a special shebang line that Python knows how to execute, but 1) you still have to have the Python binaries installed out of band; 2) you still have to unpack the archive to be able to import extension module shared libraries (since dlopen() can only load files from the physical file system).
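The "archive with a shebang line" shape is easy to see with the standard library's `zipapp` module, which builds the same kind of artifact in miniature. This is only a sketch of the file format, not of how pex or shiv work internally; the names `demo_app` and `build_demo_archive` are hypothetical.

```python
import tempfile
import zipapp
import zipfile
from pathlib import Path


def build_demo_archive(workdir: Path) -> Path:
    """Create a tiny application directory and pack it into a .pyz file."""
    src = workdir / "demo_app"
    src.mkdir()
    (src / "__main__.py").write_text('print("hello from inside the archive")\n')
    target = workdir / "demo_app.pyz"
    # The interpreter argument becomes the shebang line at the top of the file.
    zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")
    return target


with tempfile.TemporaryDirectory() as tmp:
    archive = build_demo_archive(Path(tmp))
    # The file starts with a shebang line...
    first_line = archive.read_bytes().splitlines()[0]
    # ...and everything after it is an ordinary zip archive.
    is_zip = zipfile.is_zipfile(archive)
    interpreter = zipapp.get_interpreter(archive)
    with zipfile.ZipFile(archive) as zf:
        members = zf.namelist()
```

The shebang names an interpreter that must already exist on the target machine, which is point 1 above. Pure-Python modules can be imported straight from the zip, but compiled extension modules cannot, which is why such tools unpack to disk before running, as point 2 describes.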

I want something like what PyOxidizer does, where you don't have to install anything else out-of-band or otherwise, and you don't have to unpack anything the first time you run it. It's just an executable that you can ship around and users wouldn't even have to know it was written in Python!

@brainwane
Contributor Author

Thanks to @xmunoz for leading the writing of a grant proposal requesting that CZI support a core developer in residence for one year. The earliest we'll hear back on whether the PSF's proposal was accepted would be November and the earliest the project would start would be January 2021.

@xmunoz

xmunoz commented Oct 22, 2020

Sad news fam, our proposal was not accepted :(
[Image: Screenshot from 2020-10-22 11-58-48]
