Numba Vision Discussion
NOTE: this will be a living document, so it needs updating as things change.
-
lower the barrier to performance for Python devs
-
Minimal core package
-
A compiler toolkit for Python
- create new backends (HW and synthetic)
- extensible for many use cases (users need to accept that they will need to do some work)
- flexible enough to be extended
- Supporting both NumPy and Python
- covering a large spectrum of use cases
-
Comprehensive coverage of Python basics
-
How can we make it easier for people to contribute?
-
Increase transparency: Roadmap/List of actively developed areas, queue of high-priority tasks
-
Releases continue to be produced consistently and the codebase remains as stable as possible
-
Well defined governance document with well established process
- Simple and intuitive errors
- Production-quality code base
- So new contributors can be effective earlier
- Compiler development features, e.g. pipelines, pass managers, etc.
- Other needs related to the compiler toolkit
- profiler
- coverage tools
- debugger: cross-hardware, cross-language (Python interpreted code / JIT-compiled Python / C)
- numba-scipy
- numba-extras
- Numba needs to be faster at compiling (mindful of not being slow)
- Other languages and libraries, e.g. C++
- Better AOT story
- Caching
- Structured Exceptions
- Make sure Numba is stable and receives updates for its core dependencies: LLVM, NumPy (NEP 29), Python
- A Functioning, sane and maintained performance monitoring / integration service. (Performance Stability)
- split NumPy into an extension (maybe)
- transition to MLIR
- Scientific
- Numerical simulation
- Compiler research
- Hardware
- Bootstrap a compiler for new hardware
- Meritocracy
- Transparency
- Reliability
- Consistency
I had this trifecta: you'd like (1) to get close to hardware speed, so ~100× faster than Python, (2) to support a whole programming language, and (3) have that language be Python. Numba does (1) and (3), PyPy does (2) and (3), Julia does (1) and (2).
That pigeonholes Numba's role: it's (1) and (3), which is a specific range of uses. (E.g. yes number-crunching, no web-serving.)
- It needs to be sufficiently extensible to cover the spectrum of use cases, including but not limited to:
- Typical users, just wanting to @jit some numerical functions (many many users, the most common use case)
- Those providing libraries for domain specific use (e.g. researchers - TARDIS-SN)
- Those providing libraries for use in scientific computing as part of the numerical python scaffold (e.g. pydata-sparse)
- Those writing more advanced libraries containing their own data types etc (e.g. AwkwardArray)
- Compiler extenders wanting to write and explore new compiler use cases/needing a custom compiler (e.g. Bodo, omnisci-db)
- Hardware vendors wanting to extend Numba support to their custom silicon (e.g. NVIDIA, Intel)
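For the first and most common case in the list above (typical users who just want to @jit a numerical function), a minimal sketch might look like the following. The function name `sum_of_squares` is hypothetical and only for illustration; `njit` is Numba's existing shorthand for nopython-mode `jit`. The fallback decorator is an assumption added so the sketch still runs (without any speed-up) where Numba is not installed.

```python
try:
    # Numba's nopython-mode decorator.
    from numba import njit
except ImportError:
    # Assumption for illustration: a no-op stand-in so the example
    # runs as plain Python when Numba is unavailable.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in range(n) with an explicit loop.

    Loops like this are slow in the interpreter and are the kind of
    code a typical user would hand to the JIT compiler.
    """
    total = 0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(10)
```

The first call triggers compilation for the given argument type; later calls with the same type reuse the compiled code.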
- Bodo
- Uses Numba as part of a high-performance data science platform.
- Numba is a very critical component: 1) JIT is the main user interface; 2) Numba is used as a compiler toolkit and for implementing kernels (in addition to C++)
- Luk: Julia is closest to Numba. Questions why Numba cannot do what Julia can.
- Need to consider that Numba the project is different to Numba the code base.
-
The project itself needs long term commitment and funding from industry:
- The time it takes to get someone up to speed with the code base is large, particularly to the point where they can exercise judgement about the impact of a proposed change. Casual contributors are unlikely to be in a position to assess most involved changes; this is similar to LLVM.
- The maintenance burden is high and gets higher:
- Python/NumPy/LLVM impact this
- New features impact this
- Extending/increasing flexibility impacts this
- A number of engineers are heavily personally invested in the project and Numba would struggle to survive without them; the same can be said for Anaconda and the infrastructure resources it supplies.
-
Non-physical aspects of the project also need considering:
-
The Numba community is built around some often-observed OSS values (cooperation, collaboration, fairness, elements of meritocracy, etc.).
-
It is important to preserve these values, as they help maintain and bind the community that built and continues to build Numba.
-
In summary, there are many places Numba can go in the future, but the key is for it to be supported well and to focus on flexibility/extensibility to let others build what they want. Numba is already part of the HPC/scientific computing “furniture” and with some effort it can become the same for implementing compilers.
-
A kick-ass Python compiler. (Money, Fame,
-
A tiny Numba core, barebones library, with useful abstractions/contracts/interfaces (Essentialism)
-
A compiler library to build other compilers on top of. (Framework/Structure)
-
A platform for compiler profiling and debugging research. (Creativity)
-
A platform for exploring novel scientific algorithms. (Innovation)
-
An ecosystem of open Numba extensions/add-ons/utilities/libraries for specific use cases under a common umbrella organization, each with its own specifics (branching structure, code guidelines, documentation, etc.). (Freedom)
-
Significant and healthy (maintained) automation that allows continued development of the project even with a rotating skeleton crew. (Resilience)
-
A maximally automated and documented testing and release process using only publicly available infrastructure. (Portability/Exit-Strategy)
-
A main branch so stable a release could be cut anytime. (Correctness Stability)
-
A Functioning, sane and maintained performance monitoring / integration service. (Performance Stability)
-
A healthy and productive integration into and collaboration with the greater Python/HPC/Science/Finance/Database ecosystem. (Bonding, Networking)
-
A community owned and governed structure with a clear and enforced manifesto. (Transparency)
-
A well defined development process with clear roles and a pre-defined, unambiguous process for being assigned or assuming such a role. (structure)
-
A minimal but distributed set of people with commit access to limit the blast radius (defensiveness to human error)
-
Clear and transparent communication of current development activities and their impact via a public roadmap (accountability)
-
A clear definition of a "bad actor" and a small crew of individuals to take proactive steps to protect the community from such bad actors (assertiveness)
-
A project that places the needs and wants of its users and stakeholders (in that order) at the center of the roadmap (maximise utility)
-
An endorsed free marketplace where people who can provide Numba expertise and people who seek it can connect, meet, and discuss, and then decide whether they want, need, or are able to collaborate, each on their own terms. (Anarchy)