Valentin Haenel edited this page Aug 11, 2020 · 1 revision

Numba Meeting: 2020-08-11

Attendees: Siu, Sergey M, Andreas, Dipto D, Ehsan, Eric W, Frank S, Graham M, Hameer A, Hannes P, Ivan B, Juan G, Luk F-A, Matti P, Nick R, Pearu P, Sahil, Sergey P, Todd A, Val, Stu, Alex, Mike W, Mathieu, Reazul H, Alexander, Keith K.

Presentation by Dipto from Intel.

  • Slides: https://drive.google.com/file/d/1YUFRy8FqLLBajqbTUuEXzYIYwXU67BFL/view?usp=sharing
  • PyDPPL - wrapper library around SYCL and OpenCL (primarily SYCL)
    • Predominant idea is to act as an interop-layer across the Python stack, e.g. NumPy, DAAL, Numba
    • All libraries can end up sharing the same buffers, USM, queues
    • Q: Hameer: with statements are notoriously hard for thread-safety
      • A: Dipto: Won't inherit queue/resources across threads.
    • Q: Eric: with statements need to be TLS and context local too (there's a PEP 567) (Filed https://github.com/IntelPython/pydppl/issues/11)
      • A: Dipto: Will check it out.
    • Q: Ehsan: Will the Numba function run on the accelerator?
      • A: Dipto: Will be answered in the next few slides
    • Q: Ehsan: Will passes be modular?
      • A: Dipto: At present, separate pass.
    • Q: Ehsan: possibility of adding option like dppl=True/gpu to the @jit context?
      • A: Dipto: DIY wrapper should be easy to create, it will contain the context.
    • Q: Ehsan: concerns over (lack of) fine-grained control and integrating it into an existing compiler pipeline.
      • A: Dipto: Bit like parallel=True, reliance on the optimiser to do the right thing.
    • Q: Mike W: how to arrange computations on the GPU, especially with respect to e.g. branch divergence.
      • A: Dipto: At present not so much fine-grained control. Once the current work is finalised, the bulk of effort will go into this. Reliance on the driver/compiler (igc) to do this.
    • Q: Hameer: User can put with statement at top of program, but library author could put parallel=False in, which should win? Hameer thinks library should win over user control.
      • A: Dipto: Useful example for consideration, please add to discourse discussion.
    • Q: Hameer: with statement makes it quite hard to mix CPU and GPU code, have you thought about that?
      • A: Dipto: kernels are synchronous at present, have to undo that to permit mixing contexts.
    • Q: Hameer: Will this be allowed inside a @njit'ed function?
      • A: Dipto: Initially not permitted. Harder example... with context inside a prange loop!? To start with just ban it. May change once more understanding is developed about scenarios.
      • A: Todd A: Multi-GPU case is another reason why the with-context is a good idea. e.g. CPU prange dispatch to a number of GPUs
      • A: Dipto: Needs more design time on Intel's side. Please post usecases on discourse!
      • A: Hameer: Main case was once inside a function hard to switch contexts.
      • A: Todd: there's cases where both make sense/don't make sense, more work needed.
    • Q: Siu: If we want to try Numba with DPPL, what sort of hardware/OS combination is the best option?
      • A: Dipto: Works only on Linux right now, Windows on its way (week or so!). Gen 9 Intel GPUs (integrated graphics) should be fine on latest CPUs. Also a dependency on OneAPI beta 8 being installed; Intel Python has this as part of its stack. OpenCL CPU driver also required.
    • Q: Hameer: Will the code work on other GPUs or is it just Intel hardware?
      • A: Dipto: Should be platform agnostic but starting with Intel hardware. CUDA support in DPC++ as one option or add CUDA support to PyDPPL.
    • Q: Hameer: Could someone else write a SYCL compiler and use that in the same infrastructure?
      • A: Dipto: TBD. USM is a DPC++ extension for example. Extension support level will somewhat determine this (largely USM at present). LLVM SYCL compiler is a long term ideal.
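The thread-safety concern raised above (PEP 567, and Dipto's note that queues/resources won't be inherited across threads) can be illustrated with a minimal sketch. The `device_context` manager and `current_device` helper below are hypothetical names for illustration only, not PyDPPL API; the point is that a PEP 567 `ContextVar` is not inherited by a newly started thread, unlike a mutated module-level global:

```python
import contextvars
import threading

# PEP 567 context variable: each thread (and async task) sees its own
# value, so a `with` block in one thread cannot leak into another.
_current_device = contextvars.ContextVar("current_device", default="cpu")

class device_context:
    """Hypothetical device-selection context manager (sketch only)."""
    def __init__(self, name):
        self.name = name
        self._token = None

    def __enter__(self):
        self._token = _current_device.set(self.name)
        return self

    def __exit__(self, *exc):
        _current_device.reset(self._token)

def current_device():
    return _current_device.get()

results = {}

def worker():
    # A new thread starts with a fresh context: it does NOT inherit the
    # "gpu" value set by the main thread's enclosing `with` block.
    results["thread"] = current_device()

with device_context("gpu"):
    results["main"] = current_device()  # "gpu" inside the with block
    t = threading.Thread(target=worker)
    t.start()
    t.join()

results["after"] = current_device()  # restored to the "cpu" default
print(results)
```

This matches the behaviour described in the answer: the worker thread sees the `"cpu"` default rather than the `"gpu"` context active in the main thread, and the main thread's value is restored on exit.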

0. Feature Discussion

  • 0.51.0rc1

  • New bug labels:

    • ice, miscompile, incorrect behavior, segfault, failure to compile

1. New Issues

  • #6104 - Numba can't properly match ListType of Arrays in function signature
  • #6102 - parallel function compiles with v0.50.1 and fails with v0.51.0rc1
  • #6100 - Incorrect results on skylake with AVX512 and icc_rt=2018.0.2
  • #6095 - numpy max for arrays of several dimensions not implemented for parallelized code
  • #6094 - Numba 0.51: avoid subclassing NamedTuple in LiteralStrKeyDict
  • #6093 - Invalid cache replay from function defined in closure capturing another function
    • patch: 6097
  • #6091 - Applying numba.typed.List to a nested Python list doesn't result in a nested typed-list
    • need to stop segfault?
    • the full fix again leads to the reflected x typed list problem
  • #6088 - NameError from _unlit_non_poison
    • should un-ban flake8 on it to prevent future mistakes
  • #6085 - LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
  • #6077 - StructRef initialization SIGABRT while using types.deferred_type
    • probably a problem of ordering of LLVM API
  • #6103 - Unable to pass a local memory array to a device function?

Closed Issues

  • #6105 - What's the input of Numba
  • #6098 - A problem has occurred in Numba's internals
  • #6082 - list of list (nested build_list) fails on master
  • #6079 - numba cuda how to lock code block
  • #6076 - Shared memory persists between kernel launches
  • #6069 - Test failure with Numba master and Numpy 1.19

2. New PRs

  • #6101 - Restrict lower limit of icc_rt version due to assumed SVML bug.
  • #6099 - Restrict upper limit of TBB version due to ABI changes.
  • #6097 - Add function code and closure bytes into cache key
  • #6096 - [WIP] remove deprecated tbb::task_scheduler_init, use new api
  • #6092 - CUDA: Add mapped_array_like and pinned_array_like
  • #6090 - doc: Add doc on direct creation of Numba typed-list
  • #6089 - Fix invalid reference to TypingError
  • #6087 - remove invalid sanity check from randrange tests
  • #6086 - Add more accessible version information
  • #6075 - add np.float_power and np.cbrt
  • #6074 - Add support for math.isclose() and numpy.isclose()

Closed PRs

  • #6084 - Update CHANGE_LOG for 0.51.0
  • #6083 - Fix bug in initial value unify.
  • #6081 - Fix issue with cross drive use and relpath.
  • #6080 - CUDA: Prevent auto-upgrade of atomic intrinsics
  • #6078 - Duplicate NumPy's PyArray_DescrCheck macro
  • #6073 - Fixes invalid C prototype in helper function.
  • #6072 - Fix for #6005
  • #6071 - Remove f-strings in setup.py
  • #6070 - Fix overspecialized containers
  • #6068 - Add unliteral to despecialize containers with initial_value

3. Next Release: Version 0.51.0, RC=5 Aug, Final 12 Aug?

  • Requests for 0.51

  • high risk stuff for 0.51.

  • 0.51 potential tasks (To be updated)

4. Upcoming tasks
