This repository has been archived by the owner on Nov 3, 2020. It is now read-only.

v0.12.1

@aisummary aisummary tagged this 05 Jan 20:42
- The summation of gradients based on the parameter index in CudaLookup is now deterministic.
- Removed the hash table kernel
- Replaced the use of the hash table with a pointer to the parameter indices
- Rewrote the group sum kernel to use the indices of each parameter's first occurrence and of its remaining occurrences
- Added a kernel to add up two arrays
- Fixed backward propagation in CudaStack by replacing the cuBLAS axpy operation with the use of the addition kernel
- The input memory can now store information about duplicate occurrences.
- Improved the names of the setters in InputMemory
- The optimizer kernels now check if the count is strictly positive.
- Moved reusable batch size and output entries members to BaseCudaEntryPoint
- Increased the batch size to 16 and changed hyperparameters in the TREC demos with two filter widths.
- Mentioned the CUDA TREC demo with two filters in the README
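
The deterministic gradient summation described above can be illustrated with a small CPU reference sketch in C++. This is not the library's kernel code: the names `Occurrences`, `findOccurrences` and `groupSum` are hypothetical, and the data layout (one gradient row per batch entry, stored contiguously) is an assumption. The idea is that each distinct parameter index gets one accumulator, the row of its first occurrence, and the rows of its remaining occurrences are added to it in a fixed ascending order. With a fixed order, the floating-point result no longer depends on thread scheduling, which is what can vary when colliding updates are combined in arbitrary order.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical structure: for every distinct parameter index, the batch entry
// where it first occurs and the batch entries of its remaining occurrences.
struct Occurrences {
    std::vector<int> first;                // entry of the first occurrence, per distinct parameter
    std::vector<std::vector<int>> others;  // remaining entries, in ascending order
};

// Scans the parameter indices of a batch once and records duplicates.
Occurrences findOccurrences(const std::vector<int>& parameterIndices) {
    Occurrences result;
    std::unordered_map<int, int> slotOfParameter;
    for (int entry = 0; entry < static_cast<int>(parameterIndices.size()); ++entry) {
        auto found = slotOfParameter.find(parameterIndices[entry]);
        if (found == slotOfParameter.end()) {
            slotOfParameter.emplace(parameterIndices[entry],
                                    static_cast<int>(result.first.size()));
            result.first.push_back(entry);
            result.others.emplace_back();
        } else {
            result.others[found->second].push_back(entry);
        }
    }
    return result;
}

// Adds the gradient rows of the remaining occurrences to the row of the first
// occurrence. On a GPU, each (slot, column) pair could be handled by one thread;
// the order of additions within a slot stays fixed either way.
void groupSum(std::vector<float>& gradients, int dimension, const Occurrences& occurrences) {
    for (std::size_t slot = 0; slot < occurrences.first.size(); ++slot) {
        float* target = &gradients[static_cast<std::size_t>(occurrences.first[slot]) * dimension];
        for (int other : occurrences.others[slot]) {
            const float* source = &gradients[static_cast<std::size_t>(other) * dimension];
            for (int column = 0; column < dimension; ++column) {
                target[column] += source[column];
            }
        }
    }
}
```

After `groupSum`, only the rows listed in `first` hold meaningful sums, so an optimizer step in this sketch would read one row per distinct parameter.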