This repository has been archived by the owner on Nov 3, 2020. It is now read-only.
v0.12.1
- The summation of gradients based on the parameter index in CudaLookup is now deterministic.
- Removed the hash table kernel
- Replaced the use of the hash table with a pointer to the parameter indices
- Rewrote the group sum kernel based on information about the indices of the first occurrence of a parameter and its remaining occurrences
- Added a kernel to add up two arrays
- Fixed backward propagation in CudaStack by replacing the cuBLAS axpy operation with the use of the addition kernel
- The input memory can now store information about duplicate occurrences.
- Improved the names of the setters in InputMemory
- The optimizer kernels now check if the count is strictly positive.
- Moved reusable batch size and output entries members to BaseCudaEntryPoint
- Increased the batch size to 16 and changed hyperparameters in the TREC demos with two filter widths.
- Mentioned the CUDA TREC demo with two filters in the README
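
The deterministic summation described above can be illustrated with a minimal CPU-side sketch in Python. This is not the repository's CUDA kernel: the function name `group_sum` and the data layout (a list of parameter indices alongside a list of gradient rows) are assumptions made for illustration. The idea it shows is that each parameter's gradients are accumulated starting from its first occurrence and then over its remaining occurrences in input order, so the floating-point additions always happen in the same sequence.

```python
def group_sum(indices, gradients):
    """Sum gradient rows that share a parameter index, in a fixed order.

    indices   -- parameter index of each row (hypothetical layout)
    gradients -- one gradient row (list of floats) per entry in indices
    """
    first_occurrence = {}   # parameter index -> position of its first occurrence
    other_occurrences = {}  # parameter index -> later positions, in input order
    for position, index in enumerate(indices):
        if index not in first_occurrence:
            first_occurrence[index] = position
            other_occurrences[index] = []
        else:
            other_occurrences[index].append(position)

    sums = {}
    for index, first in first_occurrence.items():
        # Start from the first occurrence, then add the remaining ones in order.
        total = list(gradients[first])
        for position in other_occurrences[index]:
            for k, value in enumerate(gradients[position]):
                total[k] += value
        sums[index] = total
    return sums
```

Because the order of additions depends only on the input positions, repeated runs on the same batch give bit-identical sums, which is not guaranteed when parallel threads add into a shared slot in whatever order they happen to be scheduled.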
2018-01-05T20:42:11Z