Minutes_2020_01_14
Attendees: Stuart, Graham, Aaron, Pearu, Todd, Val
- NBEP for GPU memory manager https://github.com/numba/numba/issues/3247#issuecomment-572948356
  - Draft is a self-contained doc.
  - Looking for feedback to see if it matches usage expectations.
  - Secondary concern is to update it with the Numba changes that would be required.
  - The interface is currently designed as register-and-opt-in; once registered, the plugin takes over memory management (see the sketch after this list).
  - Suggested other libraries that provide an allocator to test as a plugin: RAPIDS RMM; CuPy is on the TODO list.
  - Feedback from now on is welcome; PRs or issues are fine.
  - Post on Gitter and the mailing list; Numba account to tweet.
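As a point of reference for the discussion, a minimal sketch of the register-and-opt-in flow, assuming an interface along the lines of the draft NBEP; the names `BaseCUDAMemoryManager` and `set_memory_manager` are taken as assumptions from the proposal, not a finalized API:

```python
# Hedged sketch of an external memory manager plugin per the draft NBEP.
# Class/function names here are assumptions from the proposal, not final.
from numba import cuda


class ExternalMemoryManager(cuda.BaseCUDAMemoryManager):  # assumed base class
    """Delegates device allocations to an external allocator (e.g. RMM)."""

    def initialize(self):
        # One-time setup of the external allocator for the current context.
        pass

    def memalloc(self, size):
        # A real plugin would allocate `size` bytes via the external library
        # and wrap the returned device pointer for Numba; stubbed out here.
        raise NotImplementedError("sketch only")

    # ... a real plugin would implement the rest of the plugin interface ...


# Opt in: once registered, the plugin takes over device memory management.
cuda.set_memory_manager(ExternalMemoryManager)
```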
- `__cuda_array_interface__` https://github.com/numba/numba/issues/4933#issuecomment-572934894
  - Choices (more than one is possible):
    - a) safe default with implicit synchronization
    - b) unsafe default; let the user synchronize (sketched below)
    - c) a new interface to provide a) or b)
  - Graham noted that if there is e.g. asynchronous memory management on streams at some point, this will have to be addressed.
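For option b), a minimal sketch of what "user synchronizes" means in practice with today's interface (requires a CUDA device; the array and stream are illustrative):

```python
import numpy as np
from numba import cuda

# Producer side: an asynchronous copy on a non-default stream.
stream = cuda.stream()
d_arr = cuda.to_device(np.arange(10, dtype=np.float32), stream=stream)

# Option b): the export makes no implicit guarantee, so the user must
# synchronize the producing stream before handing the buffer to a consumer.
stream.synchronize()

# A consumer reads the dict and wraps the raw device pointer directly.
desc = d_arr.__cuda_array_interface__
print(desc["shape"], desc["typestr"], desc["data"])  # e.g. (10,) <f4 (ptr, False)
```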
- Lint NumPy funcs vs. Numba overloads (#2558)
  - No takers.
- #4724 parfor DCE-in-header patch: conclusion reached, write docs, merge?!
  - OK to merge.
- #4615 thread masking state
  - Ready for review.
- #4967 First-class function support (see the sketch after this item)
  - Siu left feedback.
  - Some issues around the generality of type objects resolved.
  - Should the first part land in 0.48? (Q. for Siu)
  - Docs needed for the release.
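For context on what #4967 enables, a minimal sketch of the intended usage (not taken from the PR itself): a jitted function passed as a runtime value to another jitted function.

```python
from numba import njit


@njit
def add_one(x):
    return x + 1


@njit
def apply(f, x):
    # f arrives as a first-class function value rather than a
    # compile-time constant (the capability #4967 adds).
    return f(x)


print(apply(add_one, 41))  # -> 42
```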
- Pearu: talks about whether `struct` is of interest.
- #5067 - Avoidable loop vectorization failure with float datatype (x3.5 performance penalty)
  - Maybe related to: https://github.com/numba/numba/issues/2196
  - Need to check LLVM 9.
- **** #5065 - Calling jitted function with *args in a prange fails
  - Regression; likely related to bytecode changes around tuples (a guessed repro is sketched below).
  - Todd will take a look.
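A guess at the failing pattern, reconstructed from the issue title rather than the actual reproducer in #5065: a star-args call to a jitted function inside a `prange` loop.

```python
import numpy as np
from numba import njit, prange


@njit
def inner(a, b):
    return a + b


@njit(parallel=True)
def outer(arr):
    out = np.empty_like(arr)
    for i in prange(arr.shape[0]):
        args = (arr[i], 1.0)
        out[i] = inner(*args)  # the *args call inside prange is the suspect
    return out


outer(np.ones(4))
```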
- #5064 - overload inlining fails for builtin operators
  - New failure.
- ** #5063 - Is there any interest in CFG for native assembly?
  - Can accept if it highlights/warns about indirect jumps.
  - Make sure it's OK cross-platform.
  - Take a look at radare2 (see the sketch after this item).
  - Will it work for CUDA?
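One possible way to evaluate radare2 for this: drive it from Python via the `r2pipe` bindings and emit a per-function CFG. The binary path and symbol name below are placeholders.

```python
import r2pipe

r2 = r2pipe.open("libexample.so")   # placeholder compiled artifact
r2.cmd("aaa")                       # radare2 full analysis: functions, blocks, xrefs
dot = r2.cmd("agfd @ sym.example")  # one function's CFG in Graphviz dot format
print(dot)                          # indirect jumps would need highlighting/warnings
r2.quit()
```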
- #5054 - how can I do this?
- #5052 - Factor common infrastructure between CUDA simulator and hardware target implementations
- #5051 - error
- #5050 - Can @jitclass support numpy.array([string])?
- #5045 - Looplift fail on if-branch after for-loop in py3.8
- **** #5043 - @overload cannot replace previously defined or built-in implementations
  - Need a policy (illustrated below).
  - Probably ban replacement for safety; lift this later.
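To illustrate the policy question, a hypothetical example (not from the issue): an `@overload` that targets a function Numba already implements internally. Under the proposed policy, registering such a replacement would be banned for safety.

```python
import numpy as np
from numba.extending import overload


@overload(np.sum)
def my_sum(a):
    # np.sum already has a built-in Numba implementation; the proposed
    # policy would reject this redefinition instead of silently ignoring it.
    def impl(a):
        total = 0
        for v in a.flat:
            total += v
        return total
    return impl
```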
- **** #5042 - Make DummyType type factory for testing
  - Discussion is conditional on the API gist.
- #5041 - Catch the use of global TypedList in JITed functions.
- #5066 - Improve max() translation on x86-64
- #5057 - Cuda device api: copy_to_host might not cast boolean correctly
- #5056 - ValueError: Argument types for wrap_index must match in @njit(parallel=True) when using int32 as index
- #5055 - Shared memory not support boolean type
- #5038 - Switch off report requesting in error messages for non-lowering problems
- #5068 - Remove Python 3.4 backports from utils
- #5062 - Update docs for updated version requirements
- #5061 - Prevent kernel launch with no configuration, remove autotuner
- #5060 - [WIP] enables np.sum for timedelta64
- #5059 - Docs: Explain how to use Memcheck with Numba, fixups in CUDA documentation
- #5053 - Convert from arrays to names in define() and don't invalidate for multiple consistent defines.
- #5047 - Add del's to reductions.
- #5044 - R&D thread masking
Actions:
- Share the plan for the chainsawing.
- TODOs for others, orthogonal to the chainsaw work.
- Ask Siu about the Function type PR going into 0.48.
- Stu to look at radare2 for the asm CFG.
- #5058 - Permit mixed int types in wrap_index
- #5049 - Clarify what dictionary means
- #5048 - Fix CI py38
- #5046 - Update setup.py and buildscripts for dependency requirements
- #5040 - Drop Py2.7 and Py3.5 from public CI
- #5039 - Disable help messages.
- Requests for 0.48
  - TBD