- Generalized StrideSpecialization to ValueSpecialization (a simpler version of https://code.google.com/p/jit-value-specialization/)
- Significantly decreased overhead of calling into Parakeet, though still ~500x slower than a normal Python call (#15)
The last release added experimental CUDA support, but the performance was terrible. This release includes many tweaks and optimizations needed to get worthwhile speedups on the GPU. However, the default backend remains OpenMP, since some program constructs don't work on the GPU and nvcc compile times are unacceptably long.
- Expanded and generalized fusion optimization
- Filled in missing methods from shape inference
- Using ShapeElimination on every function (repurposes the shape inference results as a symbolic execution optimization)
- Fixed lots of small bugs in other optimizations exposed by ShapeElimination
- Shaved off small amount of compile time by moving away from Node pseudo-ASTs to regular Python constructors
- Added int24 (a bit hackish) purely as a sentinel for default values in reductions that need to cast up to int32 from bool, int8, or int16
- Eliminate redundant & constant array operator arguments with SpecializeFnArgs
- Added OpenMP backend (runs most map-like computations across multiple threads)
- Stack-allocate representations for all structured types in C
- Disabled Flattening: the transform is tricky and needs a careful audit
- Debugged and enabled CopyElimination
- Fixed negative step in slices
- Added RLock around AST translation to play nice with Python threads (thanks Russell Power)
- Fixed link argument order for building under Cygwin on Windows (thanks Yves-Rémi Van Eycke)
- Added support for binding multiple variables in a for loop (e.g. "for (x,(y,z)) in enumerate(zip(ys,zs)):")
- More array constructors support 'dtype' argument
- Lots of little bug fixes and misc. improvements
- Slightly better support for negative indexing, but negative step sizes are still mostly broken
- Moved version info into submodule so setup.py can run without full dependencies (thanks rjpower).
- Fixed support for references to global arrays.
- Make C backend respect runtime changes to config flags.
- Got rid of unnecessary linking against libpython.
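
The nested loop-target binding mentioned above ("for (x,(y,z)) in enumerate(zip(ys,zs)):") can be sketched as follows. This is plain Python with no Parakeet import, so it only shows the loop form itself; the function name weighted_dot is hypothetical, and whether a given Parakeet version compiles this exact function unchanged is an assumption.

```python
def weighted_dot(ys, zs):
    """Sum of index * y * z over paired elements of ys and zs."""
    total = 0
    # A single loop target binds three names at once:
    # x comes from enumerate, and (y, z) is unpacked from each zip pair.
    for (x, (y, z)) in enumerate(zip(ys, zs)):
        total += x * y * z
    return total


if __name__ == "__main__":
    print(weighted_dot([1, 2, 3], [4, 5, 6]))
```

The nested tuple in the loop header avoids a separate unpacking statement inside the loop body.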