METABUG: Copy-and-patch #588

Closed
28 of 51 tasks
mdboom opened this issue May 24, 2023 · 1 comment

Comments

@mdboom
Contributor

mdboom commented May 24, 2023

Note
This issue is not for discussion, but to keep the various steps organized. To create new subtasks, create new issues and link to them from here by editing this comment.

Platform Support

  • We should probably add a CI step to be sure we're indeed testing the tier we think we are. Maybe just ./python -c 'import sysconfig; assert sysconfig.get_config_var("PY_SUPPORT_TIER") == "${{ matrix.support_tier }}"'
    • This has no equivalent on Windows, so I've just added sys._support_tier for now and am using that instead.
  • The current CI for this should be cleaned up.
  • Discover installed LLVM tools in a good, cross-platform way.
  • Some runners have newer LLVMs installed, and are using those instead.
  • Release builds should build with PGO/LTO.
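One possible shape for the "discover installed LLVM tools" item above is sketched here. It is only an illustration, not the approach used on the branch: the helper names (candidate_names, find_llvm_tool) are hypothetical, and the assumption that tools are installed with a version suffix such as clang-15 holds for some package managers but not all. Trying suffixed names first is meant to avoid silently picking up a newer, unsuffixed LLVM on the runner.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# The LLVM versions named in this issue. Trying the newest first is an
# arbitrary choice made for this sketch.
SUPPORTED_LLVM_VERSIONS = (16, 15, 14)


def candidate_names(
    tool: str, versions: Iterable[int] = SUPPORTED_LLVM_VERSIONS
) -> List[str]:
    """Return executable names to try, version-suffixed names first."""
    names = [f"{tool}-{version}" for version in versions]
    names.append(tool)  # Fall back to the unsuffixed name last.
    return names


def find_llvm_tool(
    tool: str,
    versions: Iterable[int] = SUPPORTED_LLVM_VERSIONS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None.

    `which` is injectable so the lookup can be exercised without LLVM
    actually being installed.
    """
    for name in candidate_names(tool, versions):
        path = which(name)
        if path is not None:
            return path
    return None
```

For example, find_llvm_tool("llvm-readobj") returns a path or None. An unsuffixed tool found by the fallback may still be an unsupported version, so a real implementation would probably also inspect the output of the tool's --version flag before using it.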

Tier One

Done. These are currently being tested in CI on copy-and-patch branches. The matrix includes both debug and release builds with stencils being generated by LLVM 14, 15, and 16.

  • i686-pc-windows-msvc/msvc
  • x86_64-pc-windows-msvc/msvc
  • x86_64-apple-darwin/clang
  • x86_64-unknown-linux-gnu/gcc

Tier Two

I consider this done. Since wide platform support is one of the selling points of copy-and-patch, seeing how the build system extends to these platforms is a good idea.

  • aarch64-apple-darwin/clang
    • No CI resources available, but all tests pass locally for the full matrix.
  • aarch64-unknown-linux-gnu/gcc
    • Using hardware emulation in CI. test_cmd_line, test_concurrent_futures, test_eintr, test_faulthandler, test_os, test_perf_profiler, test_posix, test_signal, test_socket, test_subprocess, and test_tools are being skipped since they fail under emulation (even on CPython main). I've verified that all tests pass locally for the full matrix on native hardware.
  • aarch64-unknown-linux-gnu/clang
    • See above.
  • powerpc64le-unknown-linux-gnu/gcc
    • musttail produces an internal clang error (see "Upstream LLVM work" below). This probably won't happen until that issue is fixed.
  • x86_64-unknown-linux-gnu/clang

Tier Three

Interesting, but not planned at this time. Could be good projects for external contributors once the build steps have stabilized for tier two. The wasm32 builds sound... "fun".

  • aarch64-pc-windows-msvc/msvc
  • armv7l-unknown-linux-gnueabihf/gcc
  • powerpc64le-unknown-linux-gnu/clang
  • s390x-unknown-linux-gnu/gcc
  • wasm32-unknown-emscripten/clang
  • wasm32-unknown-wasi/clang
  • x86_64-unknown-freebsd/clang

Benchmarking

  • Install LLVM (any of 14, 15, or 16) on at least one of our benchmarking machines.
  • Get comparisons/stats vs current main. It doesn't have to be faster yet, but it would help to know where we stand (nice, only 1.75% slower, even with a naive tracing implementation and no speed tricks).

Upstream LLVM Work

  • musttail + ghccc + aarch64 produces an internal clang error.
  • musttail + powerpc64le produces an internal clang error.
  • We have to compile with -fomit-frame-pointer, since the GHC calling convention uses %rbp as an argument-passing register. This feels like a bug.
  • It would be nice if clang supported __attribute__((ghccc)).
  • --elf-output-style=JSON isn't supported for COFF and Mach-O, but basically works (it prints slightly broken JSON that can be recovered using string replacement). It would be really nice if it were properly supported:
    • Mach-O
    • COFF

3.13 Integration

  • Start by rebasing current work on main to use the new optimizer/executor model, rather than the current specialization of JUMP_BACKWARD.
  • We currently pass lots of extra flags when compiling. It would be nice if we didn't have to:
    • -fno-asynchronous-unwind-tables
    • -fno-pic
    • -fno-stack-protector
    • -fomit-frame-pointer (see "Upstream LLVM work" above)
    • -g0
    • -mcmodel=large
  • Begin handling side exits and explore trace-tree management.
  • We still need a notion of relocation "types" when patching.
  • Maybe dump comments with a human-readable disassembly in Python/jit_stencils.h?
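To make the point about relocation "types" concrete, here is a minimal sketch of the copy-and-patch step with two kinds of holes. Everything in it is hypothetical and for illustration only: the HoleKind, Hole, and Stencil names, the two relocation kinds, and the example bytes are assumptions, not the layout of Python/jit_stencils.h. The idea is that a stencil's bytes are copied verbatim, and each hole is then filled according to its kind: an absolute 64-bit value, or a 32-bit value relative to the patched location (analogous to the usual S + A - P computation for PC-relative relocations).

```python
import struct
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class HoleKind(Enum):
    """Hypothetical relocation kinds; a real JIT would have more."""

    ABS_64 = "abs64"  # Store the full 64-bit address.
    REL_32 = "rel32"  # Store a signed 32-bit offset from the hole itself.


@dataclass(frozen=True)
class Hole:
    offset: int  # Byte offset of the hole within the stencil body.
    kind: HoleKind
    symbol: str  # Name of the value to patch in.
    addend: int = 0


@dataclass(frozen=True)
class Stencil:
    body: bytes
    holes: Tuple[Hole, ...]


def copy_and_patch(stencil: Stencil, base: int, symbols: Dict[str, int]) -> bytearray:
    """Copy the stencil and fill each hole, as if loaded at address `base`."""
    code = bytearray(stencil.body)  # The "copy" step.
    for hole in stencil.holes:  # The "patch" step.
        value = symbols[hole.symbol] + hole.addend
        if hole.kind is HoleKind.ABS_64:
            struct.pack_into("<Q", code, hole.offset, value & 0xFFFF_FFFF_FFFF_FFFF)
        elif hole.kind is HoleKind.REL_32:
            location = base + hole.offset
            # struct.error is raised if the offset does not fit in 32 bits.
            struct.pack_into("<i", code, hole.offset, value - location)
        else:
            raise ValueError(f"unknown hole kind: {hole.kind}")
    return code


# Illustrative x86-64 bytes: movabs rax, imm64 followed by call rel32.
EXAMPLE = Stencil(
    body=b"\x48\xb8" + bytes(8) + b"\xe8" + bytes(4),
    holes=(
        Hole(offset=2, kind=HoleKind.ABS_64, symbol="value"),
        Hole(offset=11, kind=HoleKind.REL_32, symbol="target", addend=-4),
    ),
)
```

Calling copy_and_patch(EXAMPLE, base, {"value": ..., "target": ...}) returns a patched copy and leaves the stencil itself untouched, so the same stencil can be reused for every instruction in a trace. Without a kind on each hole, the patcher would have no way to know how wide the write is or whether the value is relative to the load address.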

Other Interesting Ideas

  • Basic TOS caching for several items. This is hard and inefficient to get right, since every stack shrink/grow invalidates most of the cached values. It (sort of) works, but it doesn't appear to be a big win in its current form (so it's been disabled for now).
    • Benchmark what we have anyways.
  • Try caching the bottom values on the stack. This requires compiling several stencil variants for different stack sizes and choosing the right one, but requires much less invalidation logic (since the mapping of registers to stack slots never changes).
  • Our stencils don't benefit from PGO/LTO, so we should either explore how difficult it is to get this to work, or manually add likely/unlikely attributes to the template scaffolding.
  • Maybe get cross-builds working? The emulated tier 2 platforms are super slow...
@brandtbucher
Member

Closing and moving this tracking to the various new issues over on the CPython repo.
