Valentin Haenel edited this page Aug 11, 2020 · 1 revision

Numba Meeting: 2020-08-11

Attendees: Siu, Sergey M, Andreas, Dipto D, Ehsan, Eric W, Frank S, Graham M, Hameer A, Hannes P, Ivan B, Juan G, Luk F-A, Matti P, Nick R, Pearu P, Sahil, Sergey P, Todd A, Val, Stu, Alex, Mike W, Mathieu, Reazul H, Alexander, Keith K.

Presentation by Dipto from Intel.

  • Slides: https://drive.google.com/file/d/1YUFRy8FqLLBajqbTUuEXzYIYwXU67BFL/view?usp=sharing
  • PyDPPL - wrapper library around SYCL and OpenCL (primarily SYCL)
    • Predominant idea is to act as an interop-layer across the Python stack, e.g. NumPy, DAAL, Numba
    • All libraries can end up sharing the same buffers, USM, queues
    • Q: Hameer: with statements are notoriously hard for thread-safety
      • A: Dipto: Won't inherit queue/resources across threads.
    • Q: Eric: with statements need to be TLS and context local too (there's a PEP 567) (Filed https://github.com/IntelPython/pydppl/issues/11)
      • A: Dipto: Will check it out.
    • Q: Ehsan: Will the Numba function run on the accelerator?
      • A: Dipto: Will be answered in the next few slides
    • Q: Ehsan: Will passes be modular?
      • A: Dipto: At present, separate pass.
    • Q: Ehsan: possibility of adding option like dppl=True/gpu to the @jit context?
      • A: Dipto: DIY wrapper should be easy to create, it will contain the context.
    • Q: Ehsan: concerns over (lack of) fine-grained control and integrating it into an existing compiler pipeline.
      • A: Dipto: Bit like parallel=True, reliance on the optimiser to do the right thing.
    • Q: Mike W: how to arrange computations on the GPU, especially with respect to e.g. branch divergence.
      • A: Dipto: At present not so much fine-grained control. Once the current work is finalised, the bulk of effort will go into this. Reliance on the driver/compiler (igc) to do this.
    • Q: Hameer: User can put with statement at top of program, but library author could put parallel=False in, which should win? Hameer thinks library should win over user control.
      • A: Dipto: Useful example for consideration, please add to discourse discussion.
    • Q: Hameer: with statement makes it quite hard to mix CPU and GPU code, have you thought about that?
      • A: Dipto: kernels are synchronous at present, have to undo that to permit mixing contexts.
    • Q: Hameer: Will this be allowed inside a @njit'ed function?
      • A: Dipto: Initially not permitted. Harder example... with context inside a prange loop!? To start with just ban it. May change once more understanding is developed about scenarios.
      • A: Todd A: Multi-GPU case is another reason why the with-context is a good idea. e.g. CPU prange dispatch to a number of GPUs
      • A: Dipto: Needs more design time on Intel's side. Please post usecases on discourse!
      • A: Hameer: Main case was once inside a function hard to switch contexts.
      • A: Todd: there's cases where both make sense/don't make sense, more work needed.
    • Q: Siu: If we want to try Numba with DPPL, what sort of hardware/OS combination is the best option?
      • A: Dipto: Works only on Linux right now, Windows on its way (week or so!). Gen 9 Intel GPUs (integrated graphics) should be fine on latest CPUs. Also a dependency on OneAPI beta 8 being installed; Intel Python has this as part of its stack. OpenCL CPU driver also required.
    • Q: Hameer: Will the code work on other GPUs or is it just Intel hardware?
      • A: Dipto: Should be platform agnostic but starting with Intel hardware. CUDA support in DPC++ as one option or add CUDA support to PyDPPL.
    • Q: Hameer: Could someone else write a SYCL compiler and use that in the same infrastructure?
      • A: Dipto: TBD. USM is a DPC++ extension for example. Extension support level will somewhat determine this (largely USM at present). LLVM SYCL compiler is a long term ideal.
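The thread-safety concern raised above (PEP 567, and Dipto's note that queues/resources won't be inherited across threads) can be illustrated with a minimal sketch. The `device_context` manager and `current_device` helper below are hypothetical names for illustration only, not PyDPPL API; the point is that a PEP 567 `ContextVar` is not inherited by a newly started thread, unlike a mutated module-level global:

```python
import contextvars
import threading

# PEP 567 context variable: each thread (and async task) sees its own
# value, so a `with` block in one thread cannot leak into another.
_current_device = contextvars.ContextVar("current_device", default="cpu")

class device_context:
    """Hypothetical device-selection context manager (sketch only)."""
    def __init__(self, name):
        self.name = name
        self._token = None

    def __enter__(self):
        self._token = _current_device.set(self.name)
        return self

    def __exit__(self, *exc):
        _current_device.reset(self._token)

def current_device():
    return _current_device.get()

results = {}

def worker():
    # A new thread starts with a fresh context: it does NOT inherit the
    # "gpu" value set by the main thread's enclosing `with` block.
    results["thread"] = current_device()

with device_context("gpu"):
    results["main"] = current_device()  # "gpu" inside the with block
    t = threading.Thread(target=worker)
    t.start()
    t.join()

results["after"] = current_device()  # restored to the "cpu" default
print(results)
```

This matches the behaviour described in the answer: the worker thread sees the `"cpu"` default rather than the `"gpu"` context active in the main thread, and the main thread's value is restored on exit.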

0. Feature Discussion

  • 0.51.0rc1

  • New bug labels:

    • ice, miscompile, incorrect behavior, segfault, failure to compile

1. New Issues

  • #6104 - Numba can't properly match ListType of Arrays in function signature
  • #6102 - parallel function compiles with v0.50.1 and fails with v0.51.0rc1
  • #6100 - Incorrect results on skylake with AVX512 and icc_rt=2018.0.2
  • #6095 - numpy max for arrays of several dimensions not implemented for parallelized code
  • #6094 - Numba 0.51: avoid subclassing NamedTuple in LiteralStrKeyDict
  • #6093 - Invalid cache replay from function defined in closure capturing another function
    • patch: 6097
  • #6091 - Applying numba.typed.List to a nested Python list doesn't result in a nested typed-list
    • need to stop segfault?
    • the full fix again leads to the reflected x typed list problem
  • #6088 - NameError from _unlit_non_poison
    • should un-ban flake8 on it to prevent future mistakes
  • #6085 - LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
  • #6077 - StructRef initialization SIGABRT while using types.deferred_type
    • probably a problem of ordering of LLVM API
  • #6103 - Unable to pass a local memory array to a device function?

Closed Issues

  • #6105 - What's the input of Numba
  • #6098 - A problem has occurred in Numba's internals
  • #6082 - list of list (nested build_list) fails on master
  • #6079 - numba cuda how to lock code block
  • #6076 - Shared memory persists between kernel launches
  • #6069 - Test failure with Numba master and Numpy 1.19

2. New PRs

  • #6101 - Restrict lower limit of icc_rt version due to assumed SVML bug.
  • #6099 - Restrict upper limit of TBB version due to ABI changes.
  • #6097 - Add function code and closure bytes into cache key
  • #6096 - [WIP] remove deprecated tbb::task_scheduler_init, use new api
  • #6092 - CUDA: Add mapped_array_like and pinned_array_like
  • #6090 - doc: Add doc on direct creation of Numba typed-list
  • #6089 - Fix invalid reference to TypingError
  • #6087 - remove invalid sanity check from randrange tests
  • #6086 - Add more accessible version information
  • #6075 - add np.float_power and np.cbrt
  • #6074 - Add support for math.isclose() and numpy.isclose()

Closed PRs

  • #6084 - Update CHANGE_LOG for 0.51.0
  • #6083 - Fix bug in initial value unify.
  • #6081 - Fix issue with cross drive use and relpath.
  • #6080 - CUDA: Prevent auto-upgrade of atomic intrinsics
  • #6078 - Duplicate NumPy's PyArray_DescrCheck macro
  • #6073 - Fixes invalid C prototype in helper function.
  • #6072 - Fix for #6005
  • #6071 - Remove f-strings in setup.py
  • #6070 - Fix overspecialized containers
  • #6068 - Add unliteral to despecialize containers with initial_value

3. Next Release: Version 0.51.0, RC=5 Aug, Final 12 Aug?

  • Requests for 0.51

  • high risk stuff for 0.51.

  • 0.51 potential tasks (To be updated)

4. Upcoming tasks
