Summary
We propose adding a fixed-scale, sigmoid-gated activation function called LELU (the Logistic Error Linear Unit). LELU is intended as a lower-cost, analytically exact alternative to GELU: where GELU gates its input by the Gaussian cumulative distribution function (CDF), LELU gates it by the logistic CDF, which is the logistic sigmoid itself, with its probability density function (PDF) available in closed form as the sigmoid's derivative.
Motivation
The Gaussian Error Linear Unit (GELU) is designed around the Gaussian CDF, offering smoother activation transitions than ReLU or ELU. However, its exact form requires an erf evaluation, and the commonly used tanh variant adds a cubic term and only approximates the true Gaussian gate.
In contrast, LELU is grounded in the logistic CDF, which is analytically simpler, fully differentiable in closed form, and shows curvature similar to the Gaussian CDF while needing only a single sigmoid evaluation. The logistic sigmoid family is also a natural fit for the approximately normalized pre-activation distributions observed in deep networks; the sketch below compares the two gates numerically.
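As a quick illustration of the curvature claim (a throwaway sketch, not part of the proposed API), this compares the variance-matched logistic gate against the exact Gaussian gate:

import math
import torch

x = torch.linspace(-6.0, 6.0, steps=1001)
# Logistic CDF scaled so the underlying distribution has unit variance (see the derivation below).
logistic_gate = torch.sigmoid((math.pi / math.sqrt(3.0)) * x)
# Exact Gaussian CDF, as used by the exact (erf-based) GELU.
gaussian_gate = 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
print((logistic_gate - gaussian_gate).abs().max())  # small maximum deviation over this range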
Derivation Overview
LELU is defined analogously to GELU:
LELU(x) = x * 0.5 * (1 + tanh(a * x)) = x * sigmoid(2a * x)
where a = π / (2√3). The constant comes from matching the variance of the logistic distribution to that of the standard normal: a logistic distribution with scale s has variance π²s²/3, so s = √3/π gives unit variance, its CDF is sigmoid(πx / √3) = sigmoid(2a * x), and the identity sigmoid(y) = 0.5 * (1 + tanh(y/2)) recovers the tanh form above. The sigmoid form is the one used in the implementation below.
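A quick numerical sanity check of the equivalence above (a throwaway sketch, not proposed library code):

import math
import torch

a = math.pi / (2.0 * math.sqrt(3.0))
x = torch.linspace(-8.0, 8.0, steps=2001)
tanh_form = x * 0.5 * (1.0 + torch.tanh(a * x))
sigmoid_form = x * torch.sigmoid(2.0 * a * x)  # 2a = pi / sqrt(3), the constant used in the code below
print((tanh_form - sigmoid_form).abs().max())  # zero up to floating-point rounding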
Proposal
Add torch.nn.LELU and torch.nn.functional.lelu
Documentation and tests can mirror the structure of torch.nn.GELU, with additional notes on logistic variance equivalence and performance observations.
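For the functional side, one possible shape (a sketch only; the name lelu and its placement under torch.nn.functional are part of the proposal, not an existing API, and the signature is not settled):

import math
import torch

def lelu(input: torch.Tensor) -> torch.Tensor:
    """Apply LELU elementwise: input * sigmoid(pi / sqrt(3) * input)."""
    return input * torch.sigmoid((math.pi / math.sqrt(3.0)) * input)

# Example usage:
# y = lelu(torch.randn(4, 16))

The module below could then delegate to this function, the same way torch.nn.GELU delegates to torch.nn.functional.gelu.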
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class LELU(nn.Module):
    """Logistic Error Linear Unit: x * sigmoid(pi / sqrt(3) * x)."""

    def __init__(self):
        super().__init__()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate the input by the variance-matched logistic CDF.
        return x * torch.sigmoid((math.pi / math.sqrt(3.0)) * x)

cc @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki