Advent of Numba

Solutions to Advent of Code 2020 using Numba. Some notes on the solutions:

  • Numba 0.52 is required to run the solutions.
    • For later days I may use features that only appear in the master branch.
  • I will try to use CUDA for each solution.
  • Most solutions will be whatever was easiest for me to write.
    • This might mean a lot of brute force, since the capabilities of a GPU make brute force cheap and brute-force solutions take little effort to invent.
    • The solutions will probably not be close to performance-optimal.
  • I will try to demonstrate something "interesting" about the CUDA target for each solution (a minimal sketch of the kind of thing I mean follows this list). E.g.:
    • Atomic operations
    • Cooperative grids
    • etc.
  • I will try to annotate each solution to explain to a beginner the rationale behind the implementation.
  • I will probably fall a few days behind.
  • I am not optimistic about finishing all 24 days.
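
For example, here is a minimal, hypothetical sketch of the first item above - atomic operations. It is not taken from any particular day's puzzle and all names in it are illustrative: a histogram kernel where many threads may update the same bin, so the updates must be atomic to stay correct under concurrent access.

```python
# Hypothetical example, not from any day's solution: a histogram kernel in
# which many threads may hit the same bin, so the update uses
# cuda.atomic.add to avoid races.
import numpy as np
from numba import cuda


@cuda.jit
def histogram(values, counts):
    i = cuda.grid(1)            # absolute index of this thread in the grid
    if i < values.shape[0]:
        cuda.atomic.add(counts, values[i], 1)


values = np.random.randint(0, 8, size=1024).astype(np.int32)
counts = np.zeros(8, dtype=np.int32)

# 4 blocks of 256 threads - one thread per input element. Numba copies the
# NumPy arrays to the device and copies counts back after the kernel runs.
histogram[4, 256](values, counts)

assert counts.sum() == 1024
```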

Please direct comments / questions / criticisms / veneration to: @gmarkall.

Solutions

Links to the solutions, and some interesting features of each:

  • Day 1: 2D / 3D grids, atomic exchange for stores.
  • Day 2: Atomic increment, structured arrays.
  • Day 3: Building reduction kernels with @cuda.reduce, and explicit host-to-device transfers to avoid unnecessary copying.
  • Day 4: I didn't finish doing this on the GPU.
  • Day 5: Cooperative Groups (grid group / grid sync) and device functions.
  • Day 6: Python solution only so far. I will need to revisit it to complete a CUDA implementation - it should be doable, but I'm under time constraints.
  • Day 7: Python solution only so far. Will probably not do a CUDA implementation of this one as it doesn't easily map to a GPU.
  • Day 8: Sharing the core computation between the CPU and GPU targets by calling an @njit function from a @cuda.jit kernel (see the sketch after this list).
  • Day 9: Demonstrates some changes and workarounds needed when porting pure Python code to the CUDA target, e.g. involving lists, array slicing, and array functions.
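
As a hedged illustration of the day 8 pattern above (a sketch only - the names and the toy computation are mine, not the actual day 8 code): an @njit function holds the shared logic, a plain loop runs it on the CPU, and a @cuda.jit kernel calls the same function on the GPU.

```python
# Sketch of sharing an @njit function between the CPU and CUDA targets.
# The computation is a toy placeholder, not the day 8 logic.
import numpy as np
from numba import cuda, njit


@njit
def step(x):
    # The shared core computation.
    return 2 * x + 1


@njit
def run_cpu(values, out):
    for i in range(values.shape[0]):
        out[i] = step(values[i])


@cuda.jit
def run_gpu(values, out):
    i = cuda.grid(1)
    if i < values.shape[0]:
        # Calling the same @njit function from device code.
        out[i] = step(values[i])


values = np.arange(64, dtype=np.int64)
out_cpu = np.zeros_like(values)
out_gpu = np.zeros_like(values)

run_cpu(values, out_cpu)
run_gpu[1, 64](values, out_gpu)   # one block of 64 threads covers the input

np.testing.assert_array_equal(out_cpu, out_gpu)
```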

Other approaches

Notes

I'm using this section to collect thoughts on improving the usability and accessibility of Numba and the CUDA target that occur to me whilst working on the solutions.

Nice-to-haves:

  • The ability to call atomic inc without specifying a maximum, e.g. cuda.atomic.inc(arr, idx) (day 2).
  • The ability to return things from kernels (every day).
    • Kernel launches are asynchronous, so this could return a future.
    • Alternatively, allow an optional blocking launch to directly return the result. (The current workaround is sketched after this list.)
  • A library of small sort functions (day 4).
    • E.g. a function for a block to cooperatively sort a small array,
    • A whole-grid sort for larger arrays,
    • etc.
  • Better string op support (day 2).
    • E.g. allow passing strings or arrays of bytes to kernels.
    • Lots of lowering for string operations is missing in CUDA (but is probably present in nopython mode).
  • Support for a better print, for "printf debugging" (all days).
    • There is a printf-like function somewhere (in libdevice?) that can format strings, which could be used for this.
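
For context on the "return things from kernels" wish: the usual workaround today is to write the result into a small device array and copy it back, which also waits for the asynchronous launch to finish. A sketch (with made-up names, not tied to any specific day's code):

```python
# Sketch of the current workaround for kernels having no return value:
# the result is accumulated into a one-element array passed to the kernel.
import numpy as np
from numba import cuda


@cuda.jit
def count_evens(values, result):
    i = cuda.grid(1)
    if i < values.shape[0] and values[i] % 2 == 0:
        cuda.atomic.add(result, 0, 1)


values = cuda.to_device(np.arange(1000, dtype=np.int32))
result = cuda.to_device(np.zeros(1, dtype=np.int32))

count_evens[4, 256](values, result)   # the launch is asynchronous

# copy_to_host() synchronizes with the kernel before returning the data -
# this is the step that a future or a blocking launch could fold away.
print(result.copy_to_host()[0])       # 500
```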
