Numba Vision Discussion

esc edited this page Mar 16, 2022 · 1 revision

NUMBA VISION

NOTE: this is a living document and will need updating as things change.

Goals (Where do we want to go?)

  • lower the barrier to performance for Python devs

  • Minimal core package

  • a compiler toolkit for Python

    • create new backends (HW and synthetic)
    • extensible for many use cases (users need to accept that they will need to do some work)
    • flexible enough to be extended
    • Supporting both NumPy and Python
    • covering a large spectrum of use cases
  • Comprehensive coverage of Python basics

  • How can we make it easier for people to contribute?

  • Increase transparency: Roadmap/List of actively developed areas, queue of high-priority tasks

  • Releases continue to be produced consistently and the codebase remains as stable as possible

  • Well defined governance document with well established process

Specific/Concrete Goals

  • Simple and intuitive errors
  • Production-quality code base
  • So new contributors can be effective earlier
  • Compiler development features, e.g. pipelines, pass managers, etc.
  • Other needs related to the compiler toolkit
    • profiler
    • coverage tools
    • debugger: cross-hardware, cross-language (interpreted Python code/JIT-compiled Python/C)
    • numba-scipy
    • numba-extras
  • Numba needs to be faster at compiling (mindful of not being slow)
  • Other languages and libraries, e.g. C++
  • Better AOT story
  • Caching
  • Structured Exceptions
  • Make sure Numba is stable and receives updates for its core dependencies: LLVM, NumPy (NEP 29), Python
  • A functioning, sane and maintained performance monitoring / integration service. (Performance Stability)

Future Considerations

  • split NumPy into an extension (maybe)
  • transition to MLIR

Use-cases

  • Scientific
    • Numerical simulation
    • Compiler research
  • Hardware
    • Bootstrap a compiler for new hardware

Values (How do we get there?)

  • Meritocracy
  • Transparency
  • Reliability
  • Consistency

What is Numba?

I had this trifecta: you'd like (1) to get close to hardware speed, so ~100× faster than Python, (2) to support a whole programming language, and (3) have that language be Python. Numba does (1) and (3), PyPy does (2) and (3), Julia does (1) and (2).

That pigeonholes Numba's role: it's (1) and (3), which is a specific range of uses. (E.g. yes number-crunching, no web-serving.)

Who is it for?

  • It needs to be sufficiently extensible to cover the spectrum of use cases, including but not limited to:
    • Typical users, just wanting to @jit some numerical functions (many many users, the most common use case)
    • Those providing libraries for domain specific use (e.g. researchers - TARDIS-SN)
    • Those providing libraries for use in scientific computing as part of the numerical python scaffold (e.g. pydata-sparse)
    • Those writing more advanced libraries containing their own data types etc (e.g. AwkwardArray)
    • Compiler extenders wanting to write and explore new compiler use cases/needing a custom compiler (e.g. Bodo, omnisci-db)
    • Hardware vendors wanting to extend Numba support to their custom silicon (e.g. NVIDIA, Intel)
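For the first and most common group in the list above, usage typically looks something like the minimal sketch below. The function name `sum_of_squares` is hypothetical and chosen only for illustration; `njit` is Numba's nopython-mode decorator. The `try`/`except` fallback is an assumption added so the sketch still runs (uncompiled) where Numba is not installed; it is not part of Numba.

```python
try:
    # Numba's nopython-mode JIT decorator.
    from numba import njit
except ImportError:
    # Assumption for this sketch only: fall back to a no-op decorator
    # so the example still runs as plain Python without Numba.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # A plain Python loop over numbers: the kind of numerical code
    # a typical user would want compiled.
    total = 0.0
    for i in range(n):
        total += i * i
    return total


# With Numba installed, the first call triggers compilation for the
# argument types; later calls with the same types reuse the compiled code.
print(sum_of_squares(10))  # 285.0
```

The other groups in the list build on the same entry point but rely increasingly on Numba's extension and compiler-toolkit layers rather than on the decorator alone.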

Community? Who uses Numba (Powered by page)

  • Bodo
    • Uses Numba as part of a high-performance data science platform.
    • Numba is a very critical component: 1) JIT is the main user interface; 2) Numba is used as a compiler toolkit and for implementing kernels (in addition to C++)

How do we relate to other projects?

  • Luk: Julia is closest to Numba. Questions why Numba cannot do what Julia can.

Context

  • Need to consider that Numba the project is different to Numba the code base.
    • The project itself needs long term commitment and funding from industry:

      • The time it takes to get someone up to speed with the code base is large, particularly to the point where they can exercise judgement about the impact of a proposed change. Casual contributors are unlikely to be in a position to assess most involved changes; this is similar to LLVM.
      • The maintenance burden is high and gets higher:
        • Python/NumPy/LLVM impact this
        • New features impact this
        • Extending/increasing flexibility impacts this
      • A number of engineers are heavily personally invested in the project and Numba would struggle to survive without them; the same can be said for Anaconda and the infrastructure resources it supplies.
    • Non-physical aspects of the project also need considering:

      • The Numba community is built around some often observed OSS values (cooperation, collaboration, fairness, elements of meritocracy etc).

      • It is important to preserve these values as it helps maintain and bind the community that built and continues to build Numba.

      • In summary, there are many places Numba can go in the future, but the key is for it to be supported well and to focus on flexibility/extensibility to let others build what they want. Numba is already part of the HPC/scientific computing “furniture” and with some effort it can become the same for implementing compilers.

Numba Vision (Anarcho Utilitarianism)

  • A kick-ass Python compiler. (Money, Fame,

  • A tiny Numba core, barebones library, with useful abstractions/contracts/interfaces (Essentialism)

  • A compiler library to build other compilers on top of. (Framework/Structure)

  • A platform for compiler profiling and debugging research. (Creativity)

  • A platform for exploring novel scientific algorithms. (Innovation)

  • An ecosystem of open Numba extensions/add-ons/utilities/libraries for specific use cases under a common umbrella organization, but each with their own specifics (branching structure, code guidelines, documentation, etc.) (Freedom)

  • Significant and healthy (maintained) automation that allows continued development of the project even with a rotating skeleton crew. (Resilience)

  • A maximally automated and documented testing and release process using only publicly available infrastructure. (Portability/Exit-Strategy)

  • A main branch so stable a release could be cut anytime. (Correctness Stability)

  • A functioning, sane and maintained performance monitoring / integration service. (Performance Stability)

  • A healthy and productive integration into and collaboration with the greater Python/HPC/Science/Finance/Database ecosystem. (Bonding, Networking)

  • A community owned and governed structure with a clear and enforced manifesto. (Transparency)

  • A well defined development process with clear roles and a pre-defined, unambiguous process for being assigned or assuming such a role. (structure)

  • A minimal but distributed set of people with commit access to limit the blast radius (defensiveness to human error)

  • Clear and transparent communication of current development activities and their impact via a public roadmap (accountability)

  • A clear definition of a "bad actor" and a small crew of individuals to take proactive steps to protect the community from such bad actors (assertiveness)

  • A project that places the needs and wants of its users and stakeholders (in that order) at the center of the roadmap (maximise utility)

  • An endorsed free marketplace where people who can provide Numba expertise and people who seek Numba expertise can connect, meet and discuss, and then decide if they want/need/can collaborate, each on their own terms. (Anarchy)