Minutes_2023_08_01

Numba Meeting: 2023-08-01

Attendees: Jim Pivarski, Guilherme, Matthew Murray, Siu Kwan Lam, Da Li, Kaustubh, Graham Markall, Todd A. Anderson, Val, Ianna Osborne
FPOC (last week): Guilherme
FPOC (incoming): Graham

NOTE: All communication is subject to the Numba Code of Conduct.

Please refer to this calendar for the next meeting date.

0. Discussion

0.58rc1 Release
- PRs to go:
  - NumPy 1.25 support merged: https://github.com/numba/numba/pull/9105
  - Some dependent follow ups
- Still aiming to tag next Monday and push out RC artifacts
- No very high risk items
- PRs that may move to the next milestone:
  - Cache invalidation: https://github.com/numba/numba/pull/8396
  - Support GUFunc inside jit: https://github.com/numba/numba/pull/8984
- Sync with Stuart next Monday, aiming not to delay release.
maybe talk about LPython?
- https://lpython.org/blog/2023/07/lpython-novel-fast-retargetable-python-compiler/
- Numba + CUDA interface prototype: https://github.com/lcompilers/lpython/compare/main...gmarkall:lpython:nvvm?expand=1#diff-471d874c2f8ebdebd76e25083cc809cb229060d20d2820c6210bd808d16451eb with small Numba patch: https://github.com/numba/numba/compare/main...gmarkall:numba:lpython?expand=1
- Observations on benchmarks:
  - Numba and LPython same speed on x86 for summation benchmark, but Numba slower on M1
  - Some benchmarks use dicts and lists, which have handwritten LLVM code implementations that may be more optimized for specific cases
  - Conclusion: Numba has things to learn from both of these observations.
- Question: does the LPython compiled code require any external dependencies / shared libs?
  - Potentially rolling in some other LLVM IR to compiled code
- Jim: LPython starts from an AST, planning to investigate how this works with a decorator.
PR 9095: Support dtype keyword in arange_parallel_impl (Todd)
- (Discussion clarifying PR)
PR 9108: Add noalias option to jit decorator (Todd)
- All tests passing except for test_use_of_ir_uknown_loc.
- Question: Why is the test now searching for "Unknown location" in the output?
- Next steps: check with Stuart next week

Calling a cfunc from C++ performs differently depending on whether it is called through the cfunc object or through cfunc.address (Da)

avg= 7.13159 ns/call (dummy c++)
avg= 232.824 ns/call (cfunc)
avg= 7.27619 ns/call (casted cfunc)
cfunc:

auto cfunc = py::reinterpret_borrow<py::function>(py_module.attr("add_cfunc"));

address:

typedef int32_t (*c_func)(int32_t, int32_t);
c_func cfunc_address = reinterpret_cast<c_func>(py_module.attr("add_cfunc").attr("address").cast<intptr_t>());

; Function Attrs: nofree norecurse nounwind writeonly
define i32 @_ZN8__main__3addB2v2B44c8tJTIcFHzwl2ILiXkcBV0KBSiNiHkkANTZhEkUQPZoAEdd(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture readnone %excinfo, double %arg.x, double %arg.y) local_unnamed_addr #0 {
entry:
  %.6 = fadd double %arg.x, %arg.y
  store double %.6, double* %retptr, align 8
  ret i32 0
}

; Function Attrs: nofree norecurse nounwind writeonly
define double @cfunc._ZN8__main__3addB2v2B44c8tJTIcFHzwl2ILiXkcBV0KBSiNiHkkANTZhEkUQPZoAEdd(double %.1, double %.2) local_unnamed_addr #0 {
entry:
  %.4 = alloca double, align 8
  store double 0.000000e+00, double* %.4, align 8
  %.8 = call i32 @_ZN8__main__3addB2v2B44c8tJTIcFHzwl2ILiXkcBV0KBSiNiHkkANTZhEkUQPZoAEdd(double* nonnull %.4, { i8*, i32, i8* }** undef, double %.1, double %.2) #2
  %.18 = load double, double* %.4, align 8
  ret double %.18
}

Calling the cfunc object goes through Python, whereas calling the function pointer obtained from cfunc.address invokes the compiled code directly.
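The casted path can be sketched without pybind11 or Numba. This is a minimal illustration, not the benchmark that produced the numbers above: `add_native` is an assumed plain C++ stand-in for the compiled cfunc, and `from_address` and `ns_per_call` are hypothetical helper names. It shows only the cast from an integer address to a function pointer and a simple per-call timing loop.

```cpp
#include <chrono>
#include <cstdint>

// Same signature as the typedef used in the discussion.
typedef int32_t (*c_func)(int32_t, int32_t);

// Stand-in for the native code Numba would compile (assumption for illustration).
extern "C" int32_t add_native(int32_t a, int32_t b) { return a + b; }

// Hypothetical helper: turn an integer address, such as the value read from
// a cfunc's `address` attribute, back into a callable function pointer.
inline c_func from_address(std::intptr_t address) {
    return reinterpret_cast<c_func>(address);
}

// Hypothetical helper: average wall-clock nanoseconds per call.
inline double ns_per_call(c_func f, int iterations) {
    volatile int32_t sink = 0;  // keeps the calls from being optimized away
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink = f(sink, 1);
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

With pybind11, the integer passed to `from_address` would come from `py_module.attr("add_cfunc").attr("address").cast<intptr_t>()`, as in the snippet under "address:". Once the cast is done, each call is an ordinary indirect C call with no Python objects involved.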

New "Ready for Review" PRs

1. New Issues

numba#9091 - Reminder to add deprecation warning for new_style error capture
numba#9092 - Still problematic: non-deterministic NaN values in scipy.integrate.solve_ivp when compiling function with numba #8931
numba#9097 - Dispatch error
numba#9098 - tuple of tuple arguments not allowed in parfor loop
numba#9102 - Towncrier rendering
numba#9103 - Potential memory leak? (Potentially related to Generator type)
numba#9104 - Ommited keyword argument can't be a literal
numba#9107 - No implementation of function Function(<built-in function mul>) found for signature: >>> mul(array(float32, 1d, C), array(float32, 1d, C))
numba#9109 - various seg faults on Debian arm64 with numba 0.57.1

Closed Issues

numba#9110 - ValueError: cannot compute fingerprint of empty list

2. New PRs

numba#9089 - Fix segfault on passing None for args in PythonAPI.call
numba#9090 - Add deprecation notice for new_style error capturing.
numba#9094 - Add support for a 'max' level to NUMBA_OPT environment variable.
numba#9095 - Support dtype keyword in arange_parallel_impl
numba#9099 - Make all parameter names in @overloads match the API being overloaded.
numba#9100 - Add towncrier news snippets for PRs that are missing them.
numba#9101 - Add misc script to find missing towncrier news files
numba#9106 - CUDA: Add overloads generated by specialization to the current dispatcher.
numba#9108 - Add noalias option to jit decorator.
numba#9111 - Fixes ReST syntax error in PR#9099
numba#9112 - Fixups for PR#9100
numba#9113 - Add support for np.diagflat
numba#9114 - update np min to 122

Closed PRs

numba#9093 - Updates the minimum supported NumPy to 1.22.
numba#9096 - Debug azure-ci conda json problem
merged - numba#9105 - NumPy 1.25 support (PR #9011) continued

3. Short-term Roadmap

gantt: https://github.com/numba/numba/issues/8971
