# Optimizers

Optimizers are algorithms used to adjust the parameters of a model in order to minimize the loss function during the training process. They determine how the model learns and updates its weights. Some commonly used optimizers include Stochastic Gradient Descent (SGD), Adam, RMSprop, and Adagrad. Each optimizer has its own update rule and hyperparameters.
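For example, training a neural network alternates between forward propagation, which computes the predictions and the loss, and backward propagation, which computes the partial derivatives of the loss with respect to each weight and bias. The optimizer then uses these derivatives to update the weights and biases.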

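In the gradient descent algorithm, the weights and biases are updated at each iteration $i$ as follows:

$$w_{i+1} = w_i - \alpha \frac{\partial L}{\partial w}$$

$$b_{i+1} = b_i - \alpha \frac{\partial L}{\partial b}$$

Here $\alpha$ is the learning rate, $L$ is the loss, and the partial derivatives are evaluated at the current values $w_i$ and $b_i$.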

The forward propagation equations, together with the backpropagation algorithm, are used to compute these partial derivatives.
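The update rule can be illustrated with a small sketch in plain Python. It fits a one-weight linear model `y = w*x + b`, so the partial derivatives can be written by hand instead of using backpropagation. The function name `gradient_descent`, the toy data, the learning rate, and the number of steps are all assumptions chosen for illustration.

```python
# Minimal sketch: fit y = w*x + b with plain (full-batch) gradient descent.
# Data, learning rate and step count are illustrative assumptions.

def gradient_descent(xs, ys, alpha=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Forward pass: predictions with the current parameters
        preds = [w * x + b for x in xs]
        # Partial derivatives of L = (1 / 2N) * sum((pred - y)^2)
        dL_dw = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        dL_db = sum(p - y for p, y in zip(preds, ys)) / n
        # Update rule: parameter <- parameter - alpha * gradient
        w -= alpha * dL_dw
        b -= alpha * dL_db
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(w, b)  # should end up close to 2 and 1
```

With a learning rate that is too large the updates overshoot and diverge, and with one that is too small convergence becomes very slow, which is why $\alpha$ is usually tuned.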

# Built-in Optimizer

```python
import tensorflow as tf

# input_dim, output_dim, x_train, y_train, x_val and y_val are assumed to be
# defined earlier. The labels must be one-hot encoded for this loss function.

# Define the model architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(input_dim,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(output_dim, activation='softmax')
])

# Define the optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

# Compile the model
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_val, y_val))
```


The code above uses the "Adam" optimizer, the ReLU activation function in the hidden layers, and the softmax function in the output layer. Softmax is used because this is a multi-class classification problem, which is also why the "categorical_crossentropy" loss function is used.
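To make the idea of an optimizer-specific update rule more concrete, here is a rough sketch of the Adam update for a single scalar parameter, written in plain Python. The function name `adam_minimize`, the toy loss $(w - 3)^2$, and the learning rate and step count used in the example call are assumptions for illustration. The default hyperparameter values are meant to mirror the commonly documented Keras defaults, but this is not a description of how Keras implements Adam internally.

```python
import math


def adam_minimize(grad_fn, w0, learning_rate=0.001, beta1=0.9, beta2=0.999,
                  epsilon=1e-7, steps=1000):
    """Sketch of the Adam update rule for one scalar parameter."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        # Exponential moving averages of the gradient and the squared gradient
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction for the zero-initialised averages
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Step size is scaled per parameter by the gradient magnitude history
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w


# Toy problem: minimise L(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = adam_minimize(lambda w: 2 * (w - 3), w0=0.0, learning_rate=0.1, steps=1000)
print(w_final)  # should end up close to 3
```

Compared with plain gradient descent, the extra state (`m` and `v`) and the extra hyperparameters (`beta1`, `beta2`, `epsilon`) are what distinguish Adam's update rule.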

Let's look at activation and loss functions one by one.


# Activation Functions 

The last layer activation refers to the activation function applied to the output layer of a neural network. It maps the raw output of the network to a range or representation suited to the task at hand, so the choice of activation function depends on the type of problem being solved. For regression tasks, the linear activation (identity function) is commonly used. For binary classification, sigmoid activation is often used. For multi-class classification, softmax activation is typically employed to produce a probability distribution over the classes.


# Sigmoid Activation Function

• Typically used in the output layer for binary classification.

• Sigmoid function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

• Gives a value between zero and one.

• The output can be interpreted as the probability of the positive label.

• The positive label is predicted if the value is greater than 0.5.

• Has the vanishing gradient problem: the derivative of the standard (logistic) sigmoid is at most 0.25 and approaches zero for large positive or negative inputs. When the error is backpropagated through several sigmoid layers, these small factors are multiplied together, so the gradients of the loss with respect to the weights and biases become extremely small in the earlier layers of the network, and those layers learn very slowly.
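The behaviour described above can be checked numerically. The following sketch implements the sigmoid and its derivative $\sigma(z)(1 - \sigma(z))$ in plain Python. The function names `sigmoid` and `sigmoid_grad` and the sample inputs are assumptions for illustration.

```python
import math


def sigmoid(z):
    # Two branches keep exp() from overflowing for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(z)
    return s * (1.0 - s)


# The derivative peaks at z = 0 and shrinks quickly as |z| grows
for z in (0.0, 2.0, 5.0, 10.0):
    print(z, sigmoid(z), sigmoid_grad(z))

# Upper bound on the factor contributed by 10 stacked sigmoid layers
# (ignoring the weight terms): 0.25 multiplied by itself 10 times
print(0.25 ** 10)
```

Because each sigmoid layer contributes a factor of at most 0.25, the product over many layers shrinks geometrically, which is the vanishing gradient effect.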


# Softmax Activation Function

• Used in the output layer for multi-class classification problems.

• The number of output neurons is equal to the number of classes.

• The values of all output neurons sum to 1.

• The neuron with the maximum value gives the predicted class.
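As a minimal sketch, softmax can be written as $\text{softmax}(z)_k = e^{z_k} / \sum_j e^{z_j}$. The plain-Python version below subtracts the maximum input before exponentiating, which does not change the result but avoids overflow. The function name `softmax` and the example scores are assumptions for illustration.

```python
import math


def softmax(logits):
    # Subtracting the maximum is a standard trick for numerical stability
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])           # one raw score per class
predicted_class = probs.index(max(probs))  # index of the largest probability
print(probs, sum(probs), predicted_class)
```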


# ReLU Activation Function

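• The preferred choice in hidden layers.

• Can also be used in the output layer for regression tasks where the target is non-negative. In that case the output layer has one neuron and its output represents the prediction. For unrestricted targets, the linear activation is the more common choice.

• ReLU function:
$$f(z) = \max(0, z)$$

• The output is zero for negative inputs and a linear function of the input otherwise.

• The output is unbounded for positive inputs, which might contribute to the exploding gradient problem.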
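A minimal sketch of ReLU and its derivative in plain Python follows. The function names `relu` and `relu_grad` are assumptions for illustration, and taking the derivative at exactly zero to be 0 is a convention, not something the text above specifies.

```python
def relu(z):
    # Zero for negative inputs, identity for positive inputs
    return max(0.0, z)


def relu_grad(z):
    # 1 for z > 0, 0 for z < 0; the value at z == 0 is set to 0 by convention
    return 1.0 if z > 0 else 0.0


for z in (-3.0, 0.0, 2.5):
    print(z, relu(z), relu_grad(z))
```

Unlike the sigmoid, the derivative does not shrink for large positive inputs, which is one reason ReLU is favoured in hidden layers.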

# Loss Functions

Loss functions, also known as objective functions or cost functions, quantify the error between the predicted outputs of a model and the actual ground truth labels. They serve as a measure of how well the model is performing. The choice of loss function depends on the type of problem being solved. Mean Squared Error (MSE) is commonly used for regression tasks, while Binary Cross-Entropy and Categorical Cross-Entropy are frequently used for binary and multi-class classification tasks, respectively. There are also specialized loss functions for specific tasks such as ranking, sequence generation, and object detection.

# Binary Cross Entropy Loss

Cross entropy is a measure of difference between two probability distributions.

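• Binary cross entropy loss:
$$L(d, z) = -\left[d \log(z) + (1 - d) \log(1 - z)\right]$$

where $d$ is the actual output (the label, 0 or 1) and $z$ is the prediction (the predicted probability of the positive label).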

Used for binary classification problems.
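The formula can be sketched for a single example in plain Python. The function name `binary_cross_entropy` and the clipping constant `eps` are assumptions added for illustration. Clipping is a common safeguard against taking the logarithm of zero and is not part of the formula itself.

```python
import math


def binary_cross_entropy(d, z, eps=1e-12):
    # Clip the prediction so that log(0) never occurs
    z = min(max(z, eps), 1.0 - eps)
    return -(d * math.log(z) + (1 - d) * math.log(1.0 - z))


print(binary_cross_entropy(1, 0.9))  # correct and confident: small loss
print(binary_cross_entropy(1, 0.1))  # wrong and confident: large loss
```

The loss grows without bound as the predicted probability of the true label approaches zero, so confident mistakes are penalized heavily.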


# Cross Entropy Loss

• Also known as categorical cross entropy.

• Used for multi-class classification problems.

The cross entropy loss measures the dissimilarity between the true label and the predicted probability distribution. It encourages the predicted probabilities to align with the true label by penalizing large differences between the two. The loss is minimized when the predicted probabilities closely match the one-hot encoded true label.
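For a one-hot true label $d$ and predicted probabilities $z$, the loss is commonly written as $L(d, z) = -\sum_k d_k \log(z_k)$, which reduces to the negative log of the probability assigned to the true class. The sketch below assumes that form. The function name `categorical_cross_entropy`, the clipping constant `eps`, and the example vectors are assumptions for illustration.

```python
import math


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    # Only the term for the true class (where one_hot is 1) contributes
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


true_label = [0, 1, 0]         # one-hot encoding of class 1
good_pred = [0.1, 0.8, 0.1]    # most probability on the true class
bad_pred = [0.7, 0.2, 0.1]     # most probability on a wrong class

print(categorical_cross_entropy(true_label, good_pred))
print(categorical_cross_entropy(true_label, bad_pred))
```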



# Mean Squared Error Loss

Used for regression problems.

• Mean squared error loss:
$$L(d, z) = \frac{1}{2N} \sum_{i=1}^{N} (z_i - d_i)^2$$

where $d_i$ is the actual output and $z_i$ is the prediction for the $i$-th example, and $N$ is the number of examples.
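A direct translation of this formula into plain Python might look as follows. The function name `mse_loss` and the example values are assumptions for illustration. Note that the formula here includes a factor of 1/2, which simplifies the derivative. Many libraries, including the built-in Keras mean squared error, average the squared errors without that factor.

```python
def mse_loss(d, z):
    # (1 / 2N) * sum of squared differences, matching the formula above
    n = len(d)
    return sum((zi - di) ** 2 for di, zi in zip(d, z)) / (2 * n)


actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(mse_loss(actual, predicted))  # (0 + 0 + 4) / (2 * 3)
```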


# Evaluation Metrics

Evaluation metrics are used to measure the performance of a machine learning model. They provide quantitative measures that assess the accuracy, precision, recall, and other aspects of the model's predictions. The choice of evaluation metric depends on the specific problem and the desired outcome. Some commonly used evaluation metrics include accuracy, precision, recall, F1 score, area under the ROC curve (AUC-ROC), mean average precision (mAP), and mean squared error (MSE).

The following evaluation metrics are commonly used for binary classification tasks. Their formulas are given in terms of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN):

1. Accuracy:
Accuracy measures the overall correctness of the model's predictions.
Formula: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

2. Precision:
Precision measures the proportion of correctly predicted positive instances among all instances predicted as positive. It quantifies how precise the positive predictions are.
Formula: $\text{Precision} = \frac{TP}{TP + FP}$

3. Recall (Sensitivity or True Positive Rate):
Recall measures the proportion of correctly predicted positive instances out of all actual positive instances. It quantifies how well the model identifies positive instances.
Formula: $\text{Recall} = \frac{TP}{TP + FN}$

4. F1 Score:
The F1 score is the harmonic mean of precision and recall. It provides a balanced measure that combines both metrics.
Formula: $\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
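The four formulas above can be computed together from the confusion-matrix counts. The sketch below is plain Python. The function name `classification_metrics`, the example counts, and the choice to return 0.0 when a denominator is zero are assumptions for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 100 evaluated examples
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(name, round(value, 4))
```

On imbalanced data, accuracy can look high while precision or recall is poor, which is why the metrics are usually reported together.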

Commonly used evaluation metrics for regression problems include MSE, RMSE (root mean squared error), MAE (mean absolute error), and the R² score.
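These regression metrics can be sketched in plain Python as follows. The function name `regression_metrics` and the example values are assumptions for illustration. As an evaluation metric, MSE is computed here as a plain average of squared errors (without a factor of 1/2), and R² is taken as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import math


def regression_metrics(d, z):
    n = len(d)
    errors = [zi - di for di, zi in zip(d, z)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_d = sum(d) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    ss_tot = sum((di - mean_d) ** 2 for di in d)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot  # undefined if all targets are identical
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


actual_values = [3.0, 5.0, 7.0, 9.0]
predicted_values = [2.5, 5.0, 7.5, 9.0]
reg = regression_metrics(actual_values, predicted_values)
print(reg)
```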

