### Important Points:
### Adding weight regularization (adding to the loss function a cost associated with having large weights):

L1 Regularization: The cost added is proportional to the absolute value of the weight coefficients.

L2 Regularization: The cost added is proportional to the square of the value of the weight coefficients.
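
As a rough illustration of the two penalties, the sketch below computes them by hand for a small list of weights. The `weights` values, the `factor` of 0.001, and the helper names `l1_penalty` and `l2_penalty` are assumptions made up for this example; they are not part of Keras.

```python
# Hypothetical weight values, chosen only for illustration.
weights = [0.5, -1.5, 2.0]
factor = 0.001  # regularization strength (assumed value)


def l1_penalty(weights, factor):
    """Cost proportional to the absolute value of each weight."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    """Cost proportional to the square of each weight."""
    return factor * sum(w ** 2 for w in weights)


print(l1_penalty(weights, factor))
print(l2_penalty(weights, factor))
```

Because the L2 penalty grows with the square of each weight, the largest weight (2.0) accounts for most of it, while the L1 penalty grows only linearly.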
### Example: Adding weight regularization to the model.

```python
from keras import regularizers
from keras import models
from keras import layers

model = models.Sequential()
# Each hidden layer adds an L2 penalty on its weight matrix (factor 0.001).
model.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001),
                       activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001),
                       activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
```

#### Note: `l2(0.001)` means every coefficient in the weight matrix of the layer will add `0.001 * weight_coefficient_value ** 2` to the total loss of the network.

```python
from keras import regularizers

regularizers.l1(0.001)                  # L1 regularization
regularizers.l1_l2(l1=0.001, l2=0.001)  # L1 and L2 regularization together
```

### Dropout:
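Dropout is one of the most effective and most commonly used regularization techniques for neural networks. Dropout, applied to a layer, consists of randomly dropping out (setting to zero) a number of output features of the layer during training. The dropout rate is the fraction of the features that are zeroed out; it’s usually set between 0.2 and 0.5. At test time, no units are dropped out; instead, the layer’s output values are scaled down by a factor equal to the fraction of units kept during training (1 minus the dropout rate, which coincides with the rate itself at 0.5), to balance for the fact that more units are active than at training time. In practice, the Keras `Dropout` layer does the equivalent rescaling during training instead: the kept values are scaled up by `1 / (1 - rate)`, and the output is left unchanged at test time.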
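
The mechanism can be sketched in plain Python. This is only an illustration, not how Keras implements it internally: the `dropout` helper, its parameters, and the sample `activations` are hypothetical. It uses the "inverted" variant, where surviving values are scaled up during training so that nothing needs to be rescaled at test time.

```python
import random


def dropout(values, rate, training, rng=random):
    """Inverted dropout on a list of activations (illustrative sketch)."""
    if not training:
        # At test time nothing is dropped and nothing is rescaled.
        return list(values)
    keep_prob = 1.0 - rate
    # Zero each value with probability `rate`; scale the survivors by
    # 1 / keep_prob so the expected output matches the input.
    return [v / keep_prob if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [0.2, 0.5, 1.3, 0.8, 1.1]  # made-up layer outputs
print(dropout(activations, rate=0.5, training=True, rng=rng))
print(dropout(activations, rate=0.5, training=False))
```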

```python
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))
```

#### The most common ways to prevent overfitting in neural networks:
1. Get more training data
2. Reduce the network capacity
3. Add weight regularization
4. Add dropout
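
To make "network capacity" concrete, the sketch below counts the trainable parameters of a stack of fully connected layers. The helper `dense_param_count` is hypothetical, and the smaller 4-unit network is an assumed example of a reduced-capacity variant, not a recommendation.

```python
def dense_param_count(input_dim, layer_units):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for units in layer_units:
        total += input_dim * units + units  # weight matrix + bias vector
        input_dim = units
    return total


# Network from the examples above: 10000 inputs -> 16 -> 16 -> 1.
original = dense_param_count(10000, [16, 16, 1])
# Hypothetical lower-capacity variant: 10000 inputs -> 4 -> 4 -> 1.
smaller = dense_param_count(10000, [4, 4, 1])
print(original, smaller)
```

Fewer units per layer means fewer parameters available for memorizing the training data, which is why shrinking the network is one way to limit overfitting.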

### The universal workflow of machine learning:
1. Defining the problem and assembling a dataset:-
 - What will your input data be? What are you trying to predict? You can only learn to predict something if you have available training data. 
 - What type of problem are you facing? Is it binary classification? Multiclass classification? Scalar regression? Vector regression? Multiclass, multilabel classification? Or something else, like clustering, generation, or reinforcement learning? Identifying the problem type will guide the choice of model architecture, loss function, and so on.
2. Choosing a measure of success:-
 - For balanced-classification problems, where every class is equally likely, accuracy and area under the receiver operating characteristic curve (ROC AUC) are common metrics. For class-imbalanced problems, you can use precision and recall. For ranking problems or multilabel classification, you can use mean average precision. And it isn’t uncommon to have to define your own custom metric by which to measure success.
3. Deciding on an evaluation protocol (Pick one of the following):-
 - Maintaining a hold-out validation set.
 - Doing K-fold cross-validation.
 - Doing iterated K-fold validation.
4. Preparing your data:-
 - Data should be formatted as tensors.
 - Values should be scaled to a small range, such as [0, 1].
 - Features that take values in different ranges should be normalized.
 - Do some feature engineering, especially for small-data problems.
5. Developing a Model:-
![image.png](attachment:image.png)
6. Scaling up: Developing a model that overfits:-
7. Regularizing your model and tuning hyperparameters:-
 - Add dropout
 - Try different architectures: add or remove layers
 - Add L1 and/or L2 regularization
 - Try different hyperparameters (such as the number of units per layer or the learning rate of the optimizer) to find the optimal configuration.
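
Steps 2 to 4 of the workflow above can be sketched in plain Python. The helper names (`k_fold_indices`, `normalize`, `precision_recall`) and the toy data are assumptions made up for illustration; in practice the data would usually be shuffled first, and a library implementation would normally be preferred.

```python
import statistics


def k_fold_indices(num_samples, k):
    """Yield (train_indices, val_indices) pairs for K-fold validation."""
    fold_size = num_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder samples.
        stop = num_samples if fold == k - 1 else start + fold_size
        val = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, num_samples))
        yield train, val


def normalize(column, mean, std):
    """Center a feature on `mean` and scale it by `std`."""
    return [(x - mean) / std for x in column]


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


feature = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # made-up single feature
for train_idx, val_idx in k_fold_indices(len(feature), k=3):
    train = [feature[i] for i in train_idx]
    mean, std = statistics.mean(train), statistics.pstdev(train)
    # Normalize validation data with statistics from the training fold only.
    val = normalize([feature[i] for i in val_idx], mean, std)
    print(val_idx, val)

# Made-up labels and predictions for a binary classifier.
print(precision_recall([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))
```

Computing the mean and standard deviation on the training fold alone keeps information from the validation samples out of the preprocessing step.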
 