This project is largely a work in progress.
Explanation and motivation: https://arxiv.org/abs/1611.01144
There is a recent trick that allows training networks with quasi-discrete categorical activations via a gumbel-softmax or gumbel-sigmoid nonlinearity. A great explanation of how it works can be found here.
The trick is to add special noise to the softmax distribution that favors almost-one-hot outcomes. Such noise can be sampled from the Gumbel distribution. Since a sigmoid can be viewed as a special case of a softmax over two classes (the logit and 0), we can use the same technique to implement an LSTM network whose gates will ultimately be forced to converge to 0 or 1. Here's a demo of gumbel-sigmoid on a toy task.
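A minimal sketch of the gumbel-sigmoid trick in PyTorch (the `gumbel_sigmoid` name, the `temperature` default, and `eps` are illustrative assumptions, not this repo's actual API):

```python
import torch

def gumbel_sigmoid(logits, temperature=0.5, eps=1e-20):
    """Gumbel-sigmoid (binary Concrete) sample -- a sketch, not the repo's API.

    Adds logistic noise (the difference of two Gumbel samples) to the logits
    and applies a temperature-scaled sigmoid. As the temperature approaches 0,
    the outputs approach hard 0/1 values while staying differentiable.
    """
    u = torch.rand_like(logits)
    # log(u) - log(1 - u) is a Logistic(0, 1) sample, which has the same
    # distribution as the difference of two independent Gumbel(0, 1) samples.
    noise = torch.log(u + eps) - torch.log(1.0 - u + eps)
    return torch.sigmoid((logits + noise) / temperature)
```

During training the temperature can be annealed toward 0, gradually pushing the gates to hard 0/1 decisions.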
Such a network can then be binarized: multiplications can be replaced with if/else operations and fp16 arithmetic to drastically improve execution speed, especially when implemented on a special-purpose device; see here and here.
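As a hypothetical illustration of that replacement (the `binarized_gate_mul` helper is an invented name; thresholding the logit at 0 assumes sigmoid gates, since sigmoid(logit) > 0.5 exactly when logit > 0):

```python
import torch

def binarized_gate_mul(gate_logits, x):
    """Sketch of binarized inference: replace a converged soft gate's
    multiplication with a hard select -- pass x through where the gate
    fires, output zero otherwise."""
    hard_gate = gate_logits > 0.0  # equivalent to sigmoid(logit) > 0.5
    return torch.where(hard_gate, x, torch.zeros_like(x))
```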
Credits:
- Lambda Lab
- Arseniy Ashukha (advice & useful comments)
- hellolzc
Requirements:
- Python 3
- PyTorch
- numpy
- scikit-learn