We study the effect of various BNNs on the adversarial loss.
Implement other architectures for our datasets:
- MNIST
- CIFAR10
- ImageNet
- FGVC
Implement adversarial attacks for BNNs.
Implement training of BNNs:
- Gaussian NN
- Dropout NN
- Ensemble
- SWAG
This file contains the Bayesian layers, in particular Bayesian variants of `nn.Linear` and `nn.Conv2d`.
All layers take an additional parameter `init_dict`, which allows initializing the weights with a custom mean/variance. By default, He initialization is used.
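For example, a layer might be constructed as follows (hypothetical usage; the exact `init_dict` keys are an assumption):

```python
# Hypothetical usage; the exact init_dict keys are an assumption.
layer = Gaussian_Linear(
    in_features=512,
    out_features=10,
    var_type="sq",
    init_dict={"mean": 0.0, "var": 1e-4},  # overrides the default He initialization
)
```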
[Weight Uncertainty in Neural Networks](https://arxiv.org/abs/1505.05424)
- `Gaussian_Linear`: Bayesian variant of `nn.Linear`
- `Gaussian_Conv2D`, `Gaussian_Conv2D_LRT`: Bayesian variants of `nn.Conv2d`
These classes require one new parameter, `var_type`, which controls how the variance is parameterized. `"sq"` uses the sigma parameter as the standard deviation, so `weight_var = torch.square(weight_sigma)`, while `"exp"` uses the sigma parameter as the log variance, so `weight_var = torch.exp(weight_sigma)`.
The difference between `Gaussian_Conv2D` and `Gaussian_Conv2D_LRT` is that the latter uses the Local Reparameterization Trick ([Variational Dropout and the Local Reparameterization Trick](https://arxiv.org/abs/1506.02557)). LRT is much faster; however, it does not exactly follow the weight sharing of convolution.
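Below is a minimal sketch of how such a Gaussian layer can work. This is illustrative only, not the repository's exact implementation; the class name and the sigma initialization constants are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianLinearSketch(nn.Module):
    """Illustrative Bayesian linear layer (not the repo's exact code)."""

    def __init__(self, in_features, out_features, var_type="sq"):
        super().__init__()
        self.var_type = var_type
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_normal_(self.weight_mu)  # He initialization by default
        # Assumed initialization constants for the sigma parameter.
        nn.init.constant_(self.weight_sigma, -5.0 if var_type == "exp" else 1e-2)

    def forward(self, x):
        if self.var_type == "sq":
            weight_var = torch.square(self.weight_sigma)  # sigma stored as std
        else:  # "exp"
            weight_var = torch.exp(self.weight_sigma)     # sigma stored as log variance
        # Reparameterization: sample one weight matrix per forward pass.
        eps = torch.randn_like(self.weight_mu)
        weight = self.weight_mu + weight_var.sqrt() * eps
        return F.linear(x, weight)
```

The LRT variant instead computes the mean and variance of the layer output analytically and samples the noise on the pre-activations, which requires one noise tensor per activation rather than per weight.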
[Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning](https://arxiv.org/abs/1506.02142)
- `Dropout_Linear`: Bayesian variant of `nn.Linear`
- `Dropout_Conv2D`: Bayesian variant of `nn.Conv2d`
These classes require two new parameters, `dropout_rate` and `dropout_type`. `dropout_rate` is the probability of dropping an entry. `dropout_type` is the mode of dropout: `"w"` drops individual weights, `"f"` drops activations, and `"c"` drops whole channels. `"w"` is slow but matches the original paper, while `"f"` and `"c"` are suggested in the variational dropout paper above.
Currently, all implemented models are ResNet18 with input dimension (1, 224, 224). The output has shape (20): 10 entries for the class predictions and 10 for the uncertainty.
Based on Gaussian Layers.
Based on Dropout Layers.
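In either case, the (20)-dimensional output can be split into its two halves, e.g. (variable names are illustrative):

```python
import torch

out = model(torch.randn(8, 1, 224, 224))        # shape (8, 20)
logits, uncertainty = out[:, :10], out[:, 10:]  # 10 class scores, 10 uncertainties
```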
[Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles](https://arxiv.org/abs/1612.01474)
Generates `num_ensemble` vanilla ResNet18 models. The combined output has shape (20) * `num_ensemble`; a sketch of aggregating the members' predictions follows below.
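One common way to aggregate the members (a sketch; `models` is assumed to be a list of the trained ensemble members):

```python
import torch

def ensemble_predict(models, x):
    """Average softmax probabilities over the ensemble members."""
    outs = torch.stack([m(x)[:, :10] for m in models])  # (num_ensemble, B, 10)
    probs = outs.softmax(dim=-1)
    mean = probs.mean(dim=0)  # predictive mean
    var = probs.var(dim=0)    # member disagreement as epistemic uncertainty
    return mean, var
```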
[A Simple Baseline for Bayesian Uncertainty in Deep Learning](https://arxiv.org/abs/1902.02476)
First, train a vanilla ResNet18. Then use `get_swag_parameter` to obtain the SWAG parameters; it needs training details such as `train_dataloader`, `lr`, and `loss_fn`.
This function samples `num_samples` weight snapshots to approximate the mean and variance, taking one snapshot every `frequency` SGD steps. The `rank` parameter adds a low-rank approximation of the covariance; a higher `rank` is slower but more expressive.
Passing these parameters to `swag_resnet18_inference` then produces the predictions; a usage sketch follows.
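The call below illustrates the intended flow; the exact signatures are assumptions inferred from the parameter names in this section:

```python
import torch.nn.functional as F

# Hypothetical signatures, inferred from the parameter names above.
swag_params = get_swag_parameter(
    model,                              # a trained vanilla ResNet18
    train_dataloader=train_dataloader,
    lr=1e-2,                            # SGD step size while collecting snapshots
    loss_fn=F.cross_entropy,
    num_samples=30,                     # number of weight snapshots
    frequency=100,                      # SGD steps between snapshots
    rank=20,                            # rank of the low-rank covariance term
)
preds = swag_resnet18_inference(swag_params, x)
```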
Implement the adversarial attacks.
We may assume that we have `N` samples of the output and their uncertainties.
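One natural baseline (a sketch, not the repository's implementation) is FGSM on the loss averaged over the `N` stochastic forward passes:

```python
import torch

def fgsm_expected(model, x, y, eps, n_samples, loss_fn):
    """FGSM on the loss averaged over n_samples stochastic forward passes."""
    x_adv = x.clone().detach().requires_grad_(True)
    losses = [loss_fn(model(x_adv)[:, :10], y) for _ in range(n_samples)]
    torch.stack(losses).mean().backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()
```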