
Regularization


  • Deep neural networks typically have tens of thousands of parameters, sometimes even millions.
  • With so many parameters, the network has an incredible amount of freedom and can fit a huge variety of complex datasets.
  • But this great flexibility also makes it prone to overfitting the training set.
  • Regularization is any technique that reduces overfitting.

Different regularization techniques


  • One of the best regularization techniques is early stopping (a minimal sketch follows this list).
  • Even though Batch Normalization was designed to solve the vanishing/exploding gradients problem, it also acts like a pretty good regularizer.
  • Other popular regularization techniques for neural networks are covered in the sections below: ℓ1 and ℓ2 regularization, dropout, Monte-Carlo (MC) dropout, and max-norm regularization.
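As a minimal early-stopping sketch (assuming a compiled Keras model and placeholder arrays X_train, y_train, X_valid, y_valid, which are not part of this tutorial), the EarlyStopping callback interrupts training when the validation loss stops improving:

from tensorflow import keras

# Stop training when the validation loss has not improved for 10 epochs
# and roll back to the best weights seen during training.
early_stopping_cb = keras.callbacks.EarlyStopping(patience=10,
                                                  restore_best_weights=True)

history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_valid, y_valid),
                    callbacks=[early_stopping_cb])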

ℓ1 and ℓ2 Regularization


  • We can use ℓ1 and ℓ2 regularization to constrain a neural network’s connection weights (but typically not its biases).

Apply ℓ2 regularization to a Keras layer’s connection weights, using a regularization factor of 0.01:

layer = keras.layers.Dense(100, activation="elu", 
                           kernel_initializer="he_normal", 
                           kernel_regularizer=keras.regularizers.l2(0.01))
                         
  • The l2() function returns a regularizer that will be called to compute the regularization loss, at each step during training.

  • This regularization loss is then added to the final loss.

  • You can just use keras.regularizers.l1() if you want ℓ1 regularization, and if you want both ℓ1 and ℓ2 regularization, use keras.regularizers.l1_l2(), specifying both regularization factors, as shown below.
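For instance (a minimal sketch with illustrative factor values, not taken from the tutorial itself):

layer = keras.layers.Dense(100, activation="elu",
                           kernel_initializer="he_normal",
                           kernel_regularizer=keras.regularizers.l1_l2(l1=0.001, l2=0.01))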

from functools import partial

# RegularizedDense is a thin wrapper around keras.layers.Dense with
# default activation, kernel initializer and ℓ2 kernel regularizer.
RegularizedDense = partial(keras.layers.Dense,
                           activation="elu",
                           kernel_initializer="he_normal",
                           kernel_regularizer=keras.regularizers.l2(0.01))

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    RegularizedDense(300),
    RegularizedDense(100),
    RegularizedDense(10, activation="softmax",
                     kernel_initializer="glorot_uniform")
])

Since you will typically want to apply the same regularizer to all layers in your network, as well as the same activation function and the same initialization strategy in all hidden layers, you may find yourself repeating the same arguments over and over. This makes the code ugly and error-prone. To avoid this, you can try refactoring your code to use loops. Another option is to use Python’s functools.partial() function: it lets you create a thin wrapper for any callable, with some default argument values, as shown in the code above.

Dropout


Monte-Carlo (MC) Dropout


Max-Norm Regularization


Summary and Practical Guidelines


L1 Regularization (L1 = lasso):

  • The main objective of training a model on the training data is to make sure it fits that data properly and reduces the loss.

  • Sometimes a model that fits the training data well may still fail and give poor performance when analyzing new data (the test data). This is overfitting, and regularization was introduced to overcome it.

  • Lasso Regression (Least Absolute Shrinkage and Selection Operator) adds the “absolute value of magnitude” of the coefficients as a penalty term to the loss function (see the cost function sketched after this list).

  • Lasso shrinks the less important features’ coefficients to zero, thus removing some features altogether.

  • So this works well for feature selection when we have a huge number of features.

  • Methods like cross-validation and stepwise regression, which also handle overfitting and perform feature selection, work well with a small set of features.

  • Regularization techniques like the lasso are a better choice when we are dealing with a large set of features.

  • Along with shrinking coefficients, the lasso performs feature selection as well (remember the ‘Selection’ in the lasso full form?), because some of the coefficients become exactly zero, which is equivalent to excluding those features from the model.
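As a quick sketch of the standard lasso cost function (the notation below is the usual one, not quoted from this repo), the penalty is the sum of the absolute values of the coefficients w_i, scaled by a hyperparameter λ and added to the ordinary loss:

$$J(w) = \mathrm{Loss}(w) + \lambda \sum_{i=1}^{n} |w_i|$$

The larger λ is, the more coefficients are driven exactly to zero; λ = 0 recovers the unregularized model.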

L2 Regularization (L2 = Ridge Regression):

  • Ridge regression adds the “squared magnitude” of the coefficients as a penalty term to the loss function (sketched below).
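Sketched in the same notation as the lasso formula above (standard formulation, assumed rather than quoted):

$$J(w) = \mathrm{Loss}(w) + \lambda \sum_{i=1}^{n} w_i^2$$

Unlike the lasso, ridge shrinks coefficients toward zero but rarely makes them exactly zero, so it does not perform feature selection.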

R-squared (where to use it and where not)

  • R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
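As a standard definition (a sketch; SS_res is the residual sum of squares and SS_tot the total sum of squares around the mean):

$$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}$$

Values close to 1 mean the fitted line explains most of the variance; since R-squared never decreases when features are added, it should not be used on its own to compare models with different numbers of features.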

Data Augmentation or Training Set Expansion


  • Data augmentation artificially increases the size of the training set by generating many realistic variants of each training instance.
  • This reduces overfitting, making this a regularization technique.
  • The generated instances should be as realistic as possible:
    • ideally, given an image from the augmented training set, a human should not be able to tell whether it was augmented or not.
  • Simply adding white noise will not help; the modifications should be learnable (white noise is not).
  • For example, you can slightly shift, rotate, and resize every picture in the training set by various amounts and add the resulting pictures to the training set.
    • This forces the model to be more tolerant of variations in the position, orientation, and size of the objects in the pictures.
    • For a model that is more tolerant of different lighting conditions, you can similarly generate many images with various contrasts.
  • In general, you can also flip the pictures horizontally (except for text and other asymmetrical objects).
  • By combining these transformations, you can greatly increase the size of your training set (a minimal Keras sketch follows this list).
  • Text Augmentation
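As a minimal Keras sketch of image augmentation (the transformation ranges and the X_train/y_train arrays are placeholders, not part of this tutorial), the ImageDataGenerator class can generate shifted, rotated, zoomed, and flipped variants on the fly:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Every batch drawn from this generator is a randomly shifted, rotated,
# zoomed and horizontally flipped variant of the original images.
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1,
                             horizontal_flip=True)

# model.fit can consume the augmented stream directly.
history = model.fit(datagen.flow(X_train, y_train, batch_size=32),
                    epochs=10)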

