Unify normalization layers #3

Closed
seanmor5 opened this issue Feb 23, 2021 · 1 comment

seanmor5 commented Feb 23, 2021

The current API has 4 normalization layers:

  • Batch Normalization
  • Instance Normalization
  • Group Normalization
  • Layer Normalization

All of these implementations are built on the same fundamental formula:

defn normalize(input, mean, variance, gamma, bias, opts \\ []) do
  opts = keyword!(opts, epsilon: 1.0e-6)

  # scale = gamma / sqrt(variance + epsilon)
  scale =
    variance
    |> Nx.add(opts[:epsilon])
    |> Nx.rsqrt()
    |> Nx.multiply(gamma)

  # output = (input - mean) * scale + bias
  input
  |> Nx.subtract(mean)
  |> Nx.multiply(scale)
  |> Nx.add(bias)
end

They differ in how they compute the mean and variance across the input (see the sketch after this list):

  • Batch Normalization - calculated for each channel across all samples and all spatial dimensions.
    • reduction_axes: [:batch, :height, :width, ...]
  • Instance Normalization - calculated for each channel of each individual sample across the spatial dimensions.
    • reduction_axes: [:height, :width, ...]
  • Layer Normalization - calculated for each individual sample across all channels and spatial dimensions.
    • reduction_axes: [:channels, :height, :width, ...]
  • Group Normalization - calculated for each group of channels (of the given group size) across the spatial dimensions.
    • reduction_axes: [:groups, :height, :width, ...] (after some reshaping to get :groups)
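
As a concrete illustration, here is a minimal sketch of a shared mean_and_var helper driven by a list of reduction axes, and how its output feeds normalize/6. The :axes option and the keep_axes behavior are assumptions for the sketch, not a settled API:

defn mean_and_var(input, opts \\ []) do
  opts = keyword!(opts, [:axes])
  axes = opts[:axes]

  # Mean over the reduction axes, keeping them so the result
  # broadcasts back against the input.
  mean = Nx.mean(input, axes: axes, keep_axes: true)

  # Biased variance: mean of squared deviations over the same axes.
  diff = Nx.subtract(input, mean)
  variance = Nx.mean(Nx.multiply(diff, diff), axes: axes, keep_axes: true)

  {mean, variance}
end

For example, layer norm statistics for a {batch, channels, height, width} input would use axes [1, 2, 3], and the result plugs straight into normalize(input, mean, variance, gamma, bias).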

Additionally, some of these layers are stateful (batch/instance norm) and some are stateless (layer/group norm). Stateful normalization layers return the transformed input and a running average mean and variance adjusted with momentum, relying on the state to compute the next iteration of normalization. Stateless normalization layers return just the transformed input.
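
As a sketch of the stateful half of that contract (the function name and the momentum convention here are assumptions, not an existing API), each training step would fold the freshly computed statistics into the running averages:

defn update_running_stats(ra_mean, ra_var, mean, var, opts \\ []) do
  opts = keyword!(opts, momentum: 0.99)
  momentum = opts[:momentum]

  # running <- momentum * running + (1 - momentum) * current
  new_mean = Nx.add(Nx.multiply(ra_mean, momentum), Nx.multiply(mean, 1 - momentum))
  new_var = Nx.add(Nx.multiply(ra_var, momentum), Nx.multiply(var, 1 - momentum))

  {new_mean, new_var}
end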

To unify these normalization layers under the lower-level functional API, rather than having an individualized function for each layer we will instead have:

In the layers API:

  • normalize - see above

In a separate module:

  • batch_norm_stats(input, ra_mean, ra_var, opts \\ []) - returns {mean, var}
  • instance_norm_stats(input, ra_mean, ra_var, opts \\ []) - returns {mean, var}
  • group_norm_stats(input, opts \\ []) - returns {mean, var}
  • layer_norm_stats(input, opts \\ []) - returns {mean, var}
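As a sketch of the simplest of these (layer_norm_stats takes no running state), building on the mean_and_var helper sketched above; the default axes assume a rank-4 {batch, channels, height, width} input and are only illustrative:

defn layer_norm_stats(input, opts \\ []) do
  # Reduce over everything except the leading batch axis; a real
  # implementation would derive the axes from the input's rank.
  opts = keyword!(opts, axes: [1, 2, 3])
  mean_and_var(input, axes: opts[:axes])
end

The batch/instance variants would additionally thread ra_mean and ra_var through, per the signatures above.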

In a separate module (probably an updates.ex or something that has gradient/parameter transforms):

  • ema(x, momentum) - returns a scaled x, exponential moving average
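
Read literally ("returns a scaled x"), ema could be as small as the one-liner below, with the running-average update then expressed by the caller as ema(ra_mean, momentum) + ema(mean, 1 - momentum). This is an assumed reading of the proposal, not a settled signature:

defn ema(x, momentum), do: Nx.multiply(x, momentum)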

I think this limits code duplication and still enables us to easily build these normalization layers into a high-level API.

seanmor5 (Contributor, Author) commented

Mostly resolved here with shared normalize and mean_and_var functions: 5428ec2

ema will be addressed with other state management.
