Hosting jit_stencils.h #115869

Open
brandtbucher opened this issue Feb 23, 2024 · 6 comments
Labels
3.13 (bugs and security fixes), build (The build process and cross-build), dependencies (Pull requests that update a dependency file)

Comments

@brandtbucher
Member

brandtbucher commented Feb 23, 2024

While this is probably desirable, I'm not quite sure if it's feasible. With that said, two people (@vstinner at the sprint and @zooba during PR review) have expressed a desire to remove the LLVM build-time dependency for JIT builds. Let's have that conversation here.

Background

When building CPython with the JIT enabled, LLVM 18 (previously LLVM 16) is used to compile Tools/jit/template.c many times and to process the resulting object files into a file called jit_stencils.h in the build directory.
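To make the "compile template.c many times" step concrete, here is a rough Python sketch that only builds one clang command line per opcode rather than running anything. The opcode names, the `_JIT_OPCODE` macro, and the output file naming are assumptions for illustration, not a copy of what Tools/jit does, and the post-processing of the object files into jit_stencils.h is omitted entirely.

```python
from pathlib import Path

# Illustrative subset of micro-op names (assumption: the real list is
# derived from CPython's instruction definitions).
OPCODES = ["_LOAD_FAST", "_STORE_FAST", "_BINARY_OP_ADD_INT"]


def stencil_commands(template: Path, target: str, opcodes: list[str]) -> list[list[str]]:
    """Return one clang command per opcode; nothing is executed here."""
    commands = []
    for op in opcodes:
        commands.append([
            "clang",
            f"--target={target}",   # e.g. x86_64-unknown-linux-gnu
            "-c",                   # compile only: we want an object file
            f"-D_JIT_OPCODE={op}",  # assumed macro selecting which opcode body to emit
            str(template),
            "-o",
            f"{op}.o",              # hypothetical per-opcode object file name
        ])
    return commands


for cmd in stencil_commands(Path("Tools/jit/template.c"), "x86_64-unknown-linux-gnu", OPCODES):
    print(" ".join(cmd))
```

The point of the sketch is that every one of these invocations needs a specific LLVM version on the build machine, which is exactly the dependency this issue is about.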

A useful analogy

Because this file depends on Python.h (and thus pyconfig.h and many build-specific configuration options, including things like _DEBUG/NDEBUG/Py_DEBUG/etc.) and contains binary code, it is probably most useful to think of jit_stencils.h as a binary extension module.

If we could build, host, and manage compiled versions of, say, itertoolsmodule.c somewhere and have it work correctly for those who need it, then such a scheme would probably work for jit_stencils.h.

Open questions

  • Can this be done in a way that actually works correctly and is worth the trouble? (The status quo is "download LLVM 18 if you want to build the JIT".)
  • Should we just try to host these for the most common build configurations? Or "everything"?
  • Should all platforms be in one file (with each platform guarded by #ifdefs), or many files?
  • Should these files be checked in? Or hosted somewhere? Who/what builds them? How often?
  • Does this introduce any new attack vectors?
  • What should the workflow look like for:
    • ...those developing the JIT?
    • ...those changing header files that the JIT depends on?
    • ...those building CPython with a JIT from a random commit?
    • ...those building CPython with a JIT from a release tag?
@brandtbucher added the build, 3.13, and dependencies labels on Feb 23, 2024
@terryjreedy
Member

terryjreedy commented Feb 23, 2024

A couple of naive questions:

  1. Is the part of LLVM needed to produce jit_stencils.h small enough to consider extracting it into our repository somehow? And is the algorithm stable enough that doing so would not become a maintenance burden?

  2. What is the risk of the status quo? If a change to LLVM broke our usage of it, would that necessarily be considered a breakage of LLVM itself? Do we only depend on documented (and hopefully tested) behavior?

@zooba
Member

zooba commented Feb 26, 2024

> Should all platforms be in one file (with each platform guarded by #ifdefs), or many files?

This one is easy enough for me to answer right now: many files. If we have to download all the platforms every time, it may as well just be checked into the main repo.

The advantage here is the space saved by not including every platform in the main repo (and then not having to decide which ones we do include). If that's not an interesting saving, then this isn't a question about hosting; it's a question about caching the code.
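A minimal sketch of what "many files" could look like on the consuming side, assuming a hypothetical naming scheme with one header per target triple and debug setting: the build looks only for the file matching its own configuration and falls back to running LLVM if none is hosted. The cache directory and file names are made up for illustration, and a real scheme would have to key on every option that changes the generated code, not just debug.

```python
from pathlib import Path
from typing import Optional


def stencil_filename(target: str, debug: bool) -> str:
    """Hypothetical naming scheme: one header per (target triple, debug) pair."""
    suffix = "-debug" if debug else ""
    return f"jit_stencils-{target}{suffix}.h"


def find_stencils(cache_dir: Path, target: str, debug: bool) -> Optional[Path]:
    """Return the hosted header for this configuration, or None if there isn't one."""
    candidate = cache_dir / stencil_filename(target, debug)
    return candidate if candidate.is_file() else None


# "Tools/jit/cache" is a hypothetical location for downloaded/cached headers.
header = find_stencils(Path("Tools/jit/cache"), "aarch64-apple-darwin", debug=False)
if header is None:
    print("no hosted stencils for this configuration; fall back to LLVM")
else:
    print(f"using {header}")
```

With one file per configuration, a build only ever fetches the single header it needs, which is the space saving described above.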

(And just for context, my sensitivity to time taken by a new process that runs in every clean CI build is about 15 seconds. If it takes more than that, I want a workaround that takes less than that. And my sensitivity for local development is basically 0 more installers to run - time matters less because there'll be caching.)

@savannahostrowski
Contributor

I'd be interested to understand folks' goals for this and any additional rationale. Are there specific pain points we hope to resolve by eliminating the dependency?

@zooba
Member

zooba commented May 1, 2024

We don't like having build-time dependencies that can fail due to networking issues or installation issues. Virtually all our network access is only to github.com (the exception being apt install on Linux, which is the most flaky part of that build), and is only accessing our own repositories (where we implicitly trust the potential contributors).

Docs builds are also a minor exception for now, but those are being separated out in official builds so that our supply chain is as clean and tight as we can make it.

So basically, the goal is build reliability, and our way of achieving that is to have all of our build-time dependencies somewhere under github.com/python.

A secondary goal is build time, which is why we have checked in generated files and regen them when their sources change. This is primarily for local dev builds (CI has already gotten way out of hand, I don't think we'll ever get PR builds down to a reasonably fast check anymore, but that used to be a goal). We also don't assume that our contributors have the ability/desire to install additional apps beyond their system compilers, or that they have sufficient internet capacity for anything non-essential or large (besides system compilers).
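The "check in generated files and regen them when their sources change" pattern can be sketched as a digest over the inputs, compared against a stamp recorded when the checked-in file was last generated. The function names and the stamp file are hypothetical; for jit_stencils.h the inputs would presumably include Tools/jit/template.c and the headers it pulls in.

```python
import hashlib
from pathlib import Path


def inputs_digest(paths: list[Path]) -> str:
    """SHA-256 over each input's path, length, and contents, in sorted order."""
    h = hashlib.sha256()
    for p in sorted(paths):
        data = p.read_bytes()
        # Use repo-relative paths so the digest matches across machines.
        h.update(p.as_posix().encode() + b"\0")  # paths cannot contain NUL
        h.update(len(data).to_bytes(8, "little"))  # length prefix keeps boundaries unambiguous
        h.update(data)
    return h.hexdigest()


def needs_regen(stamp: Path, inputs: list[Path]) -> bool:
    """True if no digest was recorded or the inputs have changed since."""
    current = inputs_digest(inputs)
    return not stamp.is_file() or stamp.read_text().strip() != current


def write_stamp(stamp: Path, inputs: list[Path]) -> None:
    """Record the current digest after regenerating the output."""
    stamp.write_text(inputs_digest(inputs) + "\n")
```

Hashing contents rather than comparing timestamps keeps the check stable across fresh clones, where every file's mtime is "now".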

So for the sake of contribution, we don't want contributors to have to locate/download/install anything outside of their main compiler unless it's scripted as a normal part of the build and highly reliable.[1]

Footnotes

  1. Or at least "as reliable as the rest of our dependencies", which more or less means if GitHub is down, builds can fail, but we don't want that list getting longer - not even PyPI.

@vstinner
Member

vstinner commented May 1, 2024

> I'd be interested to understand folks' goals for this and any additional rationale. Are there specific pain points we hope to resolve by eliminating the dependency?

Linux distributions usually include only one LLVM version, like LLVM 17 and clang 17 (the version shipped by Fedora 39). Before, the Python JIT compiler required clang 16, so it didn't work there. Now it requires clang 18, so it still doesn't work.

If the Python source code (e.g. in the Git repository) contains code generated by LLVM, Python doesn't have to try to match the LLVM version shipped by Debian (stable / old-stable), Ubuntu (latest / LTS), Fedora (Rawhide / stable), etc. Spoiler: there is no single LLVM version available on all Linux distributions if you consider all flavors (especially development versions vs stable versions).

@eli-schwartz
Contributor

eli-schwartz commented May 1, 2024

Some distros actually do provide multiple LLVM versions, but those tend to be more "advanced" distros. Even there, llvm is generally a fairly hefty burden to install, especially if it's the only software you have that uses llvm because your system uses a GCC toolchain. In contrast, cpython is an extremely fundamental package that is used extensively by the system stack.

(llvm is currently needed by... okay, well, I do need llvm 17 for a) mesa and b) gnome gjs / cinnamon cjs, which use mozilla spidermonkey. I don't need any other version of llvm, and I wouldn't need either one of those either if I was running a server system.)

The real kicker is that llvm depends on cpython. If cpython also depends on llvm, which one do you build first? Answer: you have to build cpython twice, once without the JIT and once with it. Dependency cycles are dreary and depressing to deal with, and may not be fully automatable at all. They are best avoided whenever possible, and every package that has to be added to the bootstrap set for extra-special handling is an extra burden.

Hosting the stencils, just like any other generated code cpython uses, would sidestep this worry. There would be no need to pull llvm into the bootstrap set or add special cases to build cpython twice.
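The bootstrap problem is a cycle in the build-time dependency graph. A small sketch using the standard library's graphlib shows that the cycle has no valid build order, that a hypothetical "cpython-nojit" bootstrap build breaks it at the cost of building twice, and that dropping the cpython-to-llvm edge (the hosted-stencils case) removes it entirely. The package names are illustrative.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def build_order(deps: dict[str, set[str]]) -> Optional[list[str]]:
    """Map each package to its build-time deps; return an order, or None on a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# JIT build needs LLVM, and LLVM needs Python: no valid order exists.
print(build_order({"cpython": {"llvm"}, "llvm": {"cpython"}}))  # None

# Workaround: bootstrap with a JIT-less build, then rebuild with the JIT.
print(build_order({"cpython-nojit": set(), "llvm": {"cpython-nojit"}, "cpython": {"llvm"}}))
# ['cpython-nojit', 'llvm', 'cpython']

# Hosted stencils: cpython no longer needs llvm at build time.
print(build_order({"cpython": set(), "llvm": {"cpython"}}))  # ['cpython', 'llvm']
```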


7 participants
@vstinner @zooba @eli-schwartz @savannahostrowski @terryjreedy @brandtbucher and others