# Noisy networks

In the Noisy Networks paper, the authors proposed a simple solution that
nevertheless works well: they add noise to the weights of the network's fully
connected layers and adjust the parameters of this noise during training using
backpropagation.

The authors proposed two ways of adding the noise, both of which work according
to their experiments, but they have different computational overheads:
1. Independent Gaussian noise: for every weight in a fully connected layer, we
have a random value that we draw from the normal distribution. Parameters
of the noise, 𝜇 and 𝜎, are stored inside the layer and get trained using
backpropagation in the same way that we train weights of the standard
linear layer. The output of such a "noisy layer" is calculated in the same way
as in a linear layer.
2. Factorized Gaussian noise: to minimize the number of random values to
be sampled, the authors proposed keeping only two random vectors: one
with the size of the input and another with the size of the output of the
layer. Then, a random matrix for the layer is created by calculating the outer
product of the vectors.
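
To make the difference in overhead concrete, the following sketch counts the random values each scheme draws for a single layer and builds the factorized noise matrix from two vectors. The layer sizes and the helper name `scale_noise` are illustrative assumptions; the sign-preserving square root mirrors the transform used in the factorized layer shown later in this section.

```python
import torch

in_features, out_features = 512, 6  # hypothetical layer sizes

# Independent noise: one sample per weight plus one per bias element.
independent_samples = in_features * out_features + out_features

# Factorized noise: one sample per input unit plus one per output unit.
factorized_samples = in_features + out_features


def scale_noise(x: torch.Tensor) -> torch.Tensor:
    """Sign-preserving square root, f(x) = sign(x) * sqrt(|x|)."""
    return torch.sign(x) * torch.sqrt(torch.abs(x))


eps_in = scale_noise(torch.randn(in_features))
eps_out = scale_noise(torch.randn(out_features))

# The outer product yields one noise value per weight, shape (out, in).
eps_weight = torch.outer(eps_out, eps_in)

print(independent_samples, factorized_samples, tuple(eps_weight.shape))
```

With these sizes, the factorized variant samples a few hundred values per forward pass instead of a few thousand, at the cost of the noise values no longer being independent of each other.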

In PyTorch, both methods are straightforward to implement. What we need to do
is create our own equivalent of the nn.Linear layer, with additional random
values sampled every time forward() gets called.

I've implemented both noisy layers in Chapter08/lib/dqn_extra.py, in the
classes NoisyLinear (for independent Gaussian noise) and
NoisyFactorizedLinear (for the factorized noise variant).

``` python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyLinear(nn.Linear):
    def __init__(self, in_features, out_features,
                 sigma_init=0.017, bias=True):
        super(NoisyLinear, self).__init__(
            in_features, out_features, bias=bias)
        # Trainable sigma; mu is stored in the inherited self.weight
        w = torch.full((out_features, in_features), sigma_init)
        self.sigma_weight = nn.Parameter(w)
        # Noise buffer: not trained, but handled by nn.Module
        z = torch.zeros(out_features, in_features)
        self.register_buffer("epsilon_weight", z)
        if bias:
            w = torch.full((out_features,), sigma_init)
            self.sigma_bias = nn.Parameter(w)
            z = torch.zeros(out_features)
            self.register_buffer("epsilon_bias", z)
        self.reset_parameters()

    def reset_parameters(self):
        # Uniform initialization of mu for weight and bias
        std = math.sqrt(3 / self.in_features)
        self.weight.data.uniform_(-std, std)
        if self.bias is not None:
            self.bias.data.uniform_(-std, std)

    def forward(self, input):
        # Sample fresh noise on every forward pass
        self.epsilon_weight.normal_()
        bias = self.bias
        if bias is not None:
            self.epsilon_bias.normal_()
            bias = bias + self.sigma_bias * \
                   self.epsilon_bias.data
        v = self.sigma_weight * self.epsilon_weight.data + \
            self.weight
        return F.linear(input, v, bias)
```

In the constructor, we create a matrix for 𝜎. (Values of 𝜇 will be stored in a matrix
inherited from nn.Linear.) To make sigmas trainable, we need to wrap the tensor
in an nn.Parameter.

The register_buffer method creates a tensor in the network that won't be
updated during backpropagation, but will be handled by the nn.Module machinery
(for example, it will be copied to GPU with the cuda() call). An extra parameter
and buffer are created for the bias of the layer. The initial value for sigmas (0.017)
was taken from the Noisy Networks article cited in the beginning of this section. At
the end, we call the reset_parameters() method, which was overridden from
nn.Linear and is supposed to perform the initialization of the layer.
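
The distinction between a parameter and a buffer can be checked directly. The sketch below uses a hypothetical `TinyNoisy` module (not part of the book's code) to show that only the `nn.Parameter` is visible to an optimizer, while both tensors are part of the module's saved state.

```python
import torch
import torch.nn as nn


class TinyNoisy(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # Trainable: returned by parameters() and updated by the optimizer.
        self.sigma = nn.Parameter(torch.full((2, 3), 0.017))
        # Not trainable: part of the module state, but carries no gradient.
        self.register_buffer("epsilon", torch.zeros(2, 3))


layer = TinyNoisy()
param_names = [name for name, _ in layer.named_parameters()]
buffer_names = [name for name, _ in layer.named_buffers()]
state_keys = sorted(layer.state_dict().keys())

print("parameters:", param_names)
print("buffers:", buffer_names)
print("state_dict keys:", state_keys)
print("epsilon requires grad:", layer.epsilon.requires_grad)
```

Because the buffer is registered on the module, calls such as `layer.to(device)` move it together with the parameters.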

In the reset_parameters method, we perform initialization of the nn.Linear
weight and bias according to the recommendations in the article.
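
As a small worked example of what this initialization implies, the sketch below computes the uniform bound used in reset_parameters() and the resulting variance of each weight. The layer width is a hypothetical value chosen for illustration.

```python
import math

in_features = 512  # hypothetical layer width

# Bound used by reset_parameters(): values are drawn from U(-bound, bound).
bound = math.sqrt(3 / in_features)

# A uniform distribution on (-a, a) has variance a**2 / 3,
# so this bound gives each weight a variance of 1 / in_features.
variance = bound ** 2 / 3

print(f"bound={bound:.4f}, variance={variance:.6f}")
```

In other words, the factor of 3 under the square root is what makes the weight variance equal to the reciprocal of the number of inputs.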

In the forward() method, we sample random noise in both weight and bias
buffers, and perform linear transformation of the input data in the same way
that nn.Linear does.
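
In the notation of the Noisy Networks paper, the computation performed by forward() can be summarized as follows (a sketch of the formula, with $\odot$ denoting element-wise multiplication):

$$
y = \left(\mu^{w} + \sigma^{w} \odot \varepsilon^{w}\right) x + \mu^{b} + \sigma^{b} \odot \varepsilon^{b}
$$

Here $\mu^{w}$ and $\mu^{b}$ correspond to self.weight and self.bias, $\sigma^{w}$ and $\sigma^{b}$ to sigma_weight and sigma_bias, and $\varepsilon^{w}$ and $\varepsilon^{b}$ to the noise buffers that are resampled on every call.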

Factorized Gaussian noise works in a similar way, and I haven't found much
difference in the results, so I'll just include its code here for completeness.

``` python
class NoisyFactorizedLinear(nn.Linear):
    """
    NoisyNet layer with factorized gaussian noise

    N.B. nn.Linear already initializes weight and bias to
    """
    def __init__(self, in_features, out_features,
                 sigma_zero=0.4, bias=True):
        super(NoisyFactorizedLinear, self).__init__(
            in_features, out_features, bias=bias)
        sigma_init = sigma_zero / math.sqrt(in_features)
        w = torch.full((out_features, in_features), sigma_init)
        self.sigma_weight = nn.Parameter(w)
        z1 = torch.zeros(1, in_features)
        self.register_buffer("epsilon_input", z1)
        z2 = torch.zeros(out_features, 1)
        self.register_buffer("epsilon_output", z2)
        if bias:
            w = torch.full((out_features,), sigma_init)
            self.sigma_bias = nn.Parameter(w)

    def forward(self, input):
        self.epsilon_input.normal_()
        self.epsilon_output.normal_()

        func = lambda x: torch.sign(x) * \
                         torch.sqrt(torch.abs(x))
        eps_in = func(self.epsilon_input.data)
        eps_out = func(self.epsilon_output.data)

        bias = self.bias
        if bias is not None:
            bias = bias + self.sigma_bias * eps_out.t()
        noise_v = torch.mul(eps_in, eps_out)
        v = self.weight + self.sigma_weight * noise_v
        return F.linear(input, v, bias)

```

From the implementation point of view, that's it. What we now need to do to
turn the classic DQN into a noisy network variant is just replace the nn.Linear
layers (the last two layers in our DQN network) with NoisyLinear layers
(or NoisyFactorizedLinear, if you wish). Of course, you have to remove all the
code related to the epsilon-greedy strategy, because exploration now comes
from the noise in the weights.
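
The swap can be sketched on a small fully connected head. Everything here is an assumption made for illustration: `TinyNoisyLinear` is a compact stand-in for NoisyLinear that samples noise without buffers, `make_head` is a hypothetical helper, and the layer sizes are arbitrary. It is not the NoisyDQN class from dqn_extra.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNoisyLinear(nn.Linear):
    """Compact stand-in for NoisyLinear (independent Gaussian noise)."""

    def __init__(self, in_features, out_features, sigma_init=0.017):
        super().__init__(in_features, out_features, bias=True)
        self.sigma_weight = nn.Parameter(
            torch.full((out_features, in_features), sigma_init))
        self.sigma_bias = nn.Parameter(
            torch.full((out_features,), sigma_init))

    def forward(self, x):
        # Fresh noise on every call, so repeated calls give different outputs.
        weight = self.weight + self.sigma_weight * torch.randn_like(self.weight)
        bias = self.bias + self.sigma_bias * torch.randn_like(self.bias)
        return F.linear(x, weight, bias)


def make_head(linear_cls, in_features=64, hidden=32, n_actions=4):
    """Build the fully connected part of a Q-network from a layer class."""
    return nn.Sequential(
        linear_cls(in_features, hidden),
        nn.ReLU(),
        linear_cls(hidden, n_actions),
    )


torch.manual_seed(0)
plain_head = make_head(nn.Linear)
noisy_head = make_head(TinyNoisyLinear)

obs = torch.randn(1, 64)
with torch.no_grad():
    q1, q2 = noisy_head(obs), noisy_head(obs)  # differ because of the noise
    p1, p2 = plain_head(obs), plain_head(obs)  # identical

# No epsilon-greedy: the agent always takes the greedy action.
action = int(q1.argmax(dim=1).item())
print(action, torch.equal(p1, p2), torch.equal(q1, q2))
```

Since the Q-values themselves are perturbed, always taking the argmax still produces varied behavior, which is why a plain argmax action selector is enough.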

To check the internal noise level during training, we can monitor the
signal-to-noise ratio (SNR) of our noisy layers, which is the ratio
RMS(𝜇) / RMS(𝜎), where RMS is the root mean square of the corresponding
weights. In our case, SNR shows how many times the stationary component of the
noisy layer is larger than the injected noise.
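
A minimal sketch of this metric is shown below. The helper names `rms` and `layer_snr` are hypothetical, and the code in dqn_extra.py may compute the value differently; the tensors are stand-ins for a layer's 𝜇 and 𝜎 at initialization.

```python
import torch
import torch.nn as nn


def rms(t: torch.Tensor) -> float:
    """Root mean square of all elements of a tensor."""
    return t.pow(2).mean().sqrt().item()


def layer_snr(mu: torch.Tensor, sigma: torch.Tensor) -> float:
    """Signal-to-noise ratio, RMS(mu) / RMS(sigma)."""
    return rms(mu) / rms(sigma)


torch.manual_seed(0)
# Stand-ins for a layer's tensors: mu as nn.Linear initializes its weight,
# sigma filled with the initial value 0.017.
mu = nn.Linear(512, 6).weight.detach()
sigma = torch.full_like(mu, 0.017)

print(f"SNR = {layer_snr(mu, sigma):.2f}")
```

A growing SNR during training would indicate that the learned sigmas are shrinking relative to the weights, that is, the network is reducing the amount of injected noise.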

``` python
import sys
sys.path.append("../Chapter08/")
```

``` python
import gym
import ptan
import argparse
import random

import torch
import torch.optim as optim

from ignite.engine import Engine

from lib import common, dqn_extra

NAME = "04_noisy"
NOISY_SNR_EVERY_ITERS = 100


random.seed(common.SEED)
torch.manual_seed(common.SEED)
params = common.HYPERPARAMS['pong']

device = torch.device("cuda")

env = gym.make(params.env_name)
env = ptan.common.wrappers.wrap_dqn(env)
env.seed(common.SEED)

net = dqn_extra.NoisyDQN(env.observation_space.shape, env.action_space.n).to(device)

tgt_net = ptan.agent.TargetNet(net)
selector = ptan.actions.ArgmaxActionSelector()
agent = ptan.agent.DQNAgent(net, selector, device=device)

exp_source = ptan.experience.ExperienceSourceFirstLast(
    env, agent, gamma=params.gamma)
buffer = ptan.experience.ExperienceReplayBuffer(
    exp_source, buffer_size=params.replay_size)
optimizer = optim.Adam(net.parameters(), lr=params.learning_rate)

def process_batch(engine, batch):
    optimizer.zero_grad()
    loss_v = common.calc_loss_dqn(batch, net, tgt_net.target_model,
                                  gamma=params.gamma, device=device)
    loss_v.backward()
    optimizer.step()
    if engine.state.iteration % params.target_net_sync == 0:
        tgt_net.sync()
    if engine.state.iteration % NOISY_SNR_EVERY_ITERS == 0:
        for layer_idx, sigma_l2 in enumerate(net.noisy_layers_sigma_snr()):
            engine.state.metrics[f'snr_{layer_idx+1}'] = sigma_l2
    return {
        "loss": loss_v.item(),
    }

engine = Engine(process_batch)
common.setup_ignite(engine, params, exp_source, NAME, extra_metrics=('snr_1', 'snr_2'))
engine.run(common.batch_generator(buffer, params.replay_initial, params.batch_size))
```

```
Episode 1: reward=-21, steps=758, speed=0.0 f/s, elapsed=0:00:34
Episode 2: reward=-21, steps=762, speed=0.0 f/s, elapsed=0:00:34
Episode 3: reward=-21, steps=757, speed=0.0 f/s, elapsed=0:00:34
Episode 4: reward=-21, steps=820, speed=0.0 f/s, elapsed=0:00:34
Episode 5: reward=-21, steps=759, speed=0.0 f/s, elapsed=0:00:34
Episode 6: reward=-20, steps=840, speed=0.0 f/s, elapsed=0:00:34
Episode 7: reward=-21, steps=818, speed=0.0 f/s, elapsed=0:00:34
Episode 8: reward=-21, steps=761, speed=0.0 f/s, elapsed=0:00:34
Episode 9: reward=-21, steps=816, speed=0.0 f/s, elapsed=0:00:34
Episode 10: reward=-21, steps=816, speed=0.0 f/s, elapsed=0:00:34
Episode 11: reward=-21, steps=760, speed=0.0 f/s, elapsed=0:00:34
Episode 12: reward=-21, steps=760, speed=0.0 f/s, elapsed=0:00:34
Episode 13: reward=-21, steps=818, speed=48.7 f/s, elapsed=0:00:39
Episode 14: reward=-21, steps=1032, speed=48.7 f/s, elapsed=0:01:00
Episode 15: reward=-21, steps=878, speed=48.8 f/s, elapsed=0:01:17
Episode 16: rew

KeyboardInterrupt:
```