Fetching contributors…
Cannot retrieve contributors at this time
25 lines (20 sloc) 853 Bytes
- fill in ParFor.read_only/write_only to avoid extra copying to/from the GPU
- Indexing by boolean masks
- LShift/RShift prims
- Short-Circuit Or/And expressions
- Multiple comparisons chained with short-circuit And
- Coarse parallelism for groups of IndexReduce/IndexScan results
- Fine grained tree-structured parallelism for IndexReduce/IndexScan inside CUDA kernels
- Support 'output' parameter of ufuncs
- PreallocArrays to move locally used allocation of arrays out functions into their calling scope
- Garbage collection (or, at least, statically inferred deallocations)
On pause:
- Adverb semantics for conv
- Code generation for conv
Maybe never?
- Adverb-level vectorization
- Revive the LLVM backend but using the Python C API at the extension boundary, use as default for Windows
- Run tiling on perfectly nested code