An implementation of Neural Arithmetic Logic Units (NALU): https://arxiv.org/pdf/1808.00508.pdf
Train an identity mapping on [-5, 5] and test it on [-20, 20]
```bash
python3 failure.py
```
- Most nonlinear activation functions fail to extrapolate; PReLU is the exception, since it can learn to be nearly linear (see the sketch below).
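A minimal sketch of this experiment (assumed setup, not the actual contents of failure.py): train a small MLP with the activation under test to copy its input on [-5, 5], then measure its error on the wider range [-20, 20]. The architecture and hyperparameters are illustrative.

```python
# Hypothetical sketch of the extrapolation-failure experiment.
import torch
import torch.nn as nn

def make_mlp(activation):
    # Small MLP: scalar in, scalar out, with the activation under test.
    return nn.Sequential(
        nn.Linear(1, 8), activation,
        nn.Linear(8, 8), activation,
        nn.Linear(8, 1),
    )

def train_identity(model, steps=5000, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        x = torch.empty(64, 1).uniform_(-5, 5)   # training range
        loss = nn.functional.mse_loss(model(x), x)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

for name, act in [("ReLU", nn.ReLU()), ("Tanh", nn.Tanh()),
                  ("Sigmoid", nn.Sigmoid()), ("PReLU", nn.PReLU())]:
    model = train_identity(make_mlp(act))
    x = torch.linspace(-20, 20, 401).unsqueeze(1)  # test range
    err = (model(x) - x).abs().mean().item()
    print(f"{name}: mean absolute error on [-20, 20] = {err:.3f}")
```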
Input a 100-dimensional vector x and learn y = func(a, b), where a and b are sums over two random (possibly overlapping) slices of x, and func is one of +, -, x, /, etc. Test the ability to interpolate and extrapolate.
```bash
python3 learn_function.py
```
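The data generation might look like the following sketch (assumed, following the paper's description; learn_function.py may differ). The slice indices and sampling ranges here are illustrative.

```python
# Hypothetical sketch of the static function-learning data.
import numpy as np

rng = np.random.default_rng(0)
DIM = 100
# Fix two random slices once; the network must learn which entries to sum.
i, j = sorted(rng.integers(0, DIM, size=2)); j += 1
k, l = sorted(rng.integers(0, DIM, size=2)); l += 1

def make_batch(func, batch=64, low=0.0, high=1.0):
    # Extrapolation is tested by sampling from a wider range than training.
    x = rng.uniform(low, high, size=(batch, DIM))
    a = x[:, i:j].sum(axis=1)
    b = x[:, k:l].sum(axis=1)
    return x, func(a, b)

x_train, y_train = make_batch(lambda a, b: a + b)                    # training range
x_test, y_test = make_batch(lambda a, b: a + b, low=0.0, high=5.0)   # wider range
```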
- Interpolation: RMSE, normalized to a random baseline
| | ReLU | Sigmoid | NAC | NALU |
|---|---|---|---|---|
| a + b | 0.00 | 0.11 | 0.00 | 0.01 |
| a - b | 0.16 | 0.85 | 0.00 | 0.12 |
| a x b | 1.86 | 1.21 | 13.42 | 0.00 |
| a / b | 0.88 | 0.12 | 1.89 | 0.01 |
| a ^ 2 | 3.56 | 0.32 | 20.56 | 0.00 |
| sqrt(a) | 0.60 | 0.14 | 2.56 | 0.02 |
- Extrapolation: RMSE, normalized to a random baseline
| | ReLU | Sigmoid | NAC | NALU |
|---|---|---|---|---|
| a + b | 0.00 | 62.55 | 0.00 | 0.42 |
| a - b | 59.23 | 60.64 | 0.00 | 0.43 |
| a x b | 57.13 | 88.27 | 75.73 | 0.00 |
| a / b | 3.07 | 1.32 | 23.82 | 0.36 |
| a ^ 2 | 57.99 | 81.51 | 76.48 | 0.00 |
| sqrt(a) | 16.58 | 18.08 | 63.17 | 0.17 |
- Use two separate weight matrices for the adder and the multiplier
- Make the gate independent of the input (a learned parameter rather than a function of x)
See nalu.py for more details. I found that these modifications improve performance on the static simple-function-learning task; a sketch of the modified cell follows.
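Below is a minimal PyTorch sketch of a NALU cell incorporating both modifications. It illustrates the idea under the assumptions above; it is not the actual contents of nalu.py.

```python
# Hypothetical sketch of a NALU cell with the two modifications:
# separate NAC weights for the additive and multiplicative paths, and a
# gate that is a learned parameter rather than a function of the input.
import torch
import torch.nn as nn

class NAC(nn.Module):
    """Neural accumulator: W = tanh(W_hat) * sigmoid(M_hat), y = x W^T."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W_hat = nn.Parameter(torch.empty(out_dim, in_dim))
        self.M_hat = nn.Parameter(torch.empty(out_dim, in_dim))
        nn.init.xavier_uniform_(self.W_hat)
        nn.init.xavier_uniform_(self.M_hat)

    def forward(self, x):
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        return nn.functional.linear(x, W)

class NALU(nn.Module):
    def __init__(self, in_dim, out_dim, eps=1e-8):
        super().__init__()
        self.add_nac = NAC(in_dim, out_dim)   # modification 1: separate weight
        self.mul_nac = NAC(in_dim, out_dim)   # matrices for the + and * paths
        self.gate = nn.Parameter(torch.zeros(out_dim))  # modification 2:
        self.eps = eps                        # input-independent gate

    def forward(self, x):
        g = torch.sigmoid(self.gate)          # same gate for every input
        a = self.add_nac(x)                   # additive path
        m = torch.exp(self.mul_nac(torch.log(torch.abs(x) + self.eps)))
        return g * a + (1 - g) * m            # multiplicative path in log space
```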