🐛 Bug
The default initialisation gain for the SELU activation function breaks self-normalising neural networks (SNNs), which explicitly require initial weights to have variance `1 / fan_in` (rather than `0.75 / fan_in`).
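A quick way to see the difference (a small illustrative check, not part of the reproduction below) is to initialise a single weight matrix with each gain and compare `var * fan_in` against the expected value of 1:

```python
import torch
from torch import nn

fan_in = 100
w = torch.empty(200, fan_in)

# Current default for SELU networks: Kaiming uniform with the SELU gain.
nn.init.kaiming_uniform_(w, nonlinearity='selu')
print(f"nonlinearity='selu':   var * fan_in = {w.var().item() * fan_in:.3f}")

# LeCun-style initialisation that SNNs assume: weight variance 1 / fan_in.
nn.init.kaiming_uniform_(w, nonlinearity='linear')
print(f"nonlinearity='linear': var * fan_in = {w.var().item() * fan_in:.3f}")
```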
To Reproduce
Steps to reproduce the behaviour:
- Create multi-layer SNN
- Use default initialisation
- Print mean and variance in every layer
```python
import torch
from torch import nn


def init(m):
    # Default initialisation recommended for SELU: Kaiming uniform
    # with nonlinearity='selu'.
    if isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='selu')


layers = [30] + [100] * 20 + [10]
snn = nn.Sequential(*(
    nn.Sequential(
        nn.Linear(n_in, n_out, bias=False),
        nn.SELU(),
    ) for n_in, n_out in zip(layers[:-1], layers[1:])
))
snn.apply(init)

x = torch.randn(1000, 30)
for layer in snn:
    x = layer(x)
    print(f"{x.mean().item():.5f}, {x.var().item():.5f}")
```
Expected behavior
Normalised activations in every layer, i.e. zero mean and unit variance. This can be obtained by using `nonlinearity='linear'` instead.
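For reference, this is the only change to the `init` function above that I mean (a workaround sketch for the reproduction script, not a proposed library patch):

```python
def init(m):
    # Use the 'linear' gain (1.0) so weights get variance 1 / fan_in,
    # i.e. LeCun-style initialisation, which is what SNNs assume.
    if isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='linear')
```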
Environment
- PyTorch Version (e.g., 1.0): 1.8.0
- OS (e.g., Linux): Linux
- How you installed PyTorch (`conda`, `pip`, source): conda
- Build command you used (if compiling from source):
- Python version: 3.8
Additional context
This issue was introduced by PR #50664, which aimed to tackle #24991. In that PR, an empirical value for the gain was chosen so that gradients "behave better" (i.e., do not grow or shrink as much). I have already proposed a fix in PR #53694, but it would be BC-breaking and therefore needs a stronger incentive before it can land. A discussion of why it does not make sense to use the empirical value is included in that PR. The goal of this issue is therefore to provide that incentive by collecting upvotes.
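For anyone who wants to check which gain their installed version uses, it can be inspected directly (a small check; `'selu'` is accepted by `calculate_gain` in the version reported here, 1.8.0):

```python
from torch import nn

# Empirical SELU gain introduced in PR #50664 vs. the gain SNN theory calls for.
print(nn.init.calculate_gain('selu'))
print(nn.init.calculate_gain('linear'))  # returns 1 (LeCun-style scaling)
```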
TL;DR: Trying to collect upvotes to get this bug fixed