# Choice of Activation Function for Output Layer

We use the softmax function for the output layer, which has 4 output neurons in our case, because this is a multi-class classification problem with non-overlapping (mutually exclusive) categories. The sigmoid may also seem like a reasonable choice, but it is not appropriate here: it is best suited to binary classification or to multi-label classification, where the categories are not mutually exclusive (for example, does this picture contain a dog, a cat, or both?).
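
To make the distinction concrete, here is a minimal sketch that applies both functions to the same four logits. The logit values are made up for illustration and the helper names `sigmoid` and `softmax` are my own; only the standard library is used.

```python
import math

def sigmoid(z):
    """Logistic sigmoid of a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Softmax of a list of logits (shifted by the max for numerical stability)."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for 4 output neurons (illustrative values only)
logits = [2.0, 1.0, 0.1, -1.0]

softmax_out = softmax(logits)               # one distribution over the 4 classes
sigmoid_out = [sigmoid(z) for z in logits]  # 4 independent per-class scores

print("softmax:", [round(p, 4) for p in softmax_out], "sum =", round(sum(softmax_out), 4))
print("sigmoid:", [round(p, 4) for p in sigmoid_out], "sum =", round(sum(sigmoid_out), 4))
```

The softmax outputs sum to 1 and can be read as a distribution over mutually exclusive classes, whereas the sigmoid outputs are independent scores whose sum is unconstrained.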

Source: https://stats.stackexchange.com/questions/218542/which-activation-function-for-output-layer

> Regression: linear (because values are unbounded)
> Classification: softmax (simple sigmoid works too but softmax works better)
> Use simple sigmoid only if your output admits multiple "true" answers, for instance, a network that checks for the presence of various objects in an image. In other words, the output is not a probability distribution (does not need to sum to 1).

Source: [Introduction to Statistical Learning](https://www.statlearning.com/)

Checking page 140, equation 4.11, we see that the sigmoid is a special case of the softmax function. In particular, the sigmoid function [1] is defined by $$\sigma(z) = \frac{1}{1+e^{-z}}$$ while the softmax function [1, 2] is the vector defined as $$\textrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K.$$

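Hence, if $K=2$, the softmax function gives the vector $(\sigma(z_1 - z_2), 1-\sigma(z_1 - z_2))$, since $\frac{e^{z_1}}{e^{z_1}+e^{z_2}} = \frac{1}{1+e^{-(z_1-z_2)}}$. Fixing $z_2 = 0$ gives $(\sigma(z_1), 1-\sigma(z_1))$, so a two-class softmax is equivalent to using the sigmoid function.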
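
The two-class case can be checked numerically. The sketch below assumes the standard softmax definition and uses arbitrary test logits; the helper names are my own. It verifies that the first softmax component equals $\sigma(z_1 - z_2)$, which reduces to $\sigma(z_1)$ when $z_2$ is fixed at 0.

```python
import math

def sigmoid(z):
    """Logistic sigmoid of a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Softmax of a list of logits (shifted by the max for numerical stability)."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Arbitrary logit pairs (z1, z2), chosen only for illustration
pairs = [(0.3, -1.2), (2.0, 0.0), (-4.0, 1.5)]

# Largest absolute gap between softmax([z1, z2]) and (sigma(z1 - z2), 1 - sigma(z1 - z2))
max_gap = 0.0
for z1, z2 in pairs:
    p = softmax([z1, z2])
    s = sigmoid(z1 - z2)
    max_gap = max(max_gap, abs(p[0] - s), abs(p[1] - (1.0 - s)))

print("largest gap:", max_gap)  # expected to be at floating-point rounding level
```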


# References 

[1] https://www.pinecone.io/learn/softmax-activation/

[2] https://www.singlestore.com/blog/a-guide-to-softmax-activation-function/

# Appendix

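Here is a short script to check this empirically. It trains the same small network on a binary classification task twice: once with a single sigmoid output neuron and once with a two-neuron softmax output.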

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

# Generate a custom dataset
np.random.seed(42)
n_samples = 1000
X = np.random.rand(n_samples, 3)  # 1000 samples, 3 features (x1, x2, x3)
y = (X[:, 0] + X[:, 1]**2 + X[:, 2]**3 > 1).astype(int)  # y = 1 if x1 + x2^2 + x3^3 > 1, else 0

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build a simple neural network model
def create_model(output_activation):
    model = Sequential([
        tf.keras.Input(shape=(X_train.shape[1],)),  # Define input shape here to avoid the warning
        Dense(10, activation='relu'),
        Dense(5, activation='relu'),
        Dense(1, activation=output_activation)
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Train model with sigmoid activation
model_sigmoid = create_model(output_activation='sigmoid')
model_sigmoid.fit(X_train, y_train, epochs=50, verbose=0)
sigmoid_probs = model_sigmoid.predict(X_test)  # Probabilities from sigmoid
sigmoid_preds = (sigmoid_probs > 0.5).astype(int)  # Convert probabilities to binary predictions
sigmoid_accuracy = np.mean(sigmoid_preds.flatten() == y_test)
print(f"Sigmoid Model Accuracy: {sigmoid_accuracy:.2f}")

# Print sigmoid predictions, true labels, and probabilities for review
print("\nSigmoid Model Predictions (First 5 Examples):")
for i in range(5):
    print(f"Prediction: {sigmoid_preds[i][0]}, True Label: {y_test[i]}, Probability: {sigmoid_probs[i][0]:.4f}")

# Train model with softmax activation
# Note: For softmax, we need 2 output neurons to represent the two classes
def create_model_softmax():
    model = Sequential([
        tf.keras.Input(shape=(X_train.shape[1],)),  # Define input shape here to avoid the warning
        Dense(10, activation='relu'),
        Dense(5, activation='relu'),
        Dense(2, activation='softmax')  # Two neurons for binary classes 0 and 1
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

model_softmax = create_model_softmax()
model_softmax.fit(X_train, y_train, epochs=50, verbose=0)
softmax_probs = model_softmax.predict(X_test)  # Probabilities from softmax
softmax_preds = np.argmax(softmax_probs, axis=1)  # Convert probabilities to class predictions
softmax_accuracy = np.mean(softmax_preds == y_test)
print(f"\nSoftmax Model Accuracy: {softmax_accuracy:.2f}")

# Print softmax predictions, true labels, and probabilities for review
print("\nSoftmax Model Predictions (First 5 Examples):")
for i in range(5):
    print(f"Prediction: {softmax_preds[i]}, True Label: {y_test[i]}, Probabilities: {softmax_probs[i]}")


Sigmoid Model Accuracy: 0.98

Sigmoid Model Predictions (First 5 Examples):
Prediction: 1, True Label: 1, Probability: 0.7786
Prediction: 1, True Label: 1, Probability: 0.9997
Prediction: 1, True Label: 0, Probability: 0.6135
Prediction: 0, True Label: 0, Probability: 0.4418
Prediction: 0, True Label: 0, Probability: 0.0033

Softmax Model Accuracy: 0.99

Softmax Model Predictions (First 5 Examples):
Prediction: 1, True Label: 1, Probabilities: [0.15318818 0.84681183]
Prediction: 1, True Label: 1, Probabilities: [2.7668298e-06 9.9999726e-01]
Prediction: 0, True Label: 0, Probabilities: [0.54315895 0.45684102]
Prediction: 0, True Label: 0, Probabilities: [0.72805595 0.27194405]
Prediction: 0, True Label: 0, Probabilities: [0.99405706 0.00594293]


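The two models do not produce identical numbers, but the differences come from how each model is set up and trained rather than from any difference in what the output layers can represent. The softmax model has 2 output neurons and must be trained with sparse categorical cross-entropy, which for two classes is the same negative log-likelihood as binary cross-entropy. The extra output neuron changes the parametrization, and the script seeds only NumPy, so the weight initialization is not fixed between runs; backpropagation (the fitting stage) therefore follows a slightly different path. Softmax outputs a probability distribution over the two classes, while sigmoid outputs a single probability for class 1.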
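
To see how the two loss functions relate, here is a minimal sketch that evaluates both by hand for a single example. The function names and the probability value are hypothetical, and the sketch ignores any numerical safeguards (such as clipping) that a library may apply. When the softmax probability of class 1 equals the sigmoid probability, the two losses give the same value.

```python
import math

def binary_cross_entropy(y, p):
    """Binary cross-entropy for label y in {0, 1} and predicted P(class 1) = p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sparse_categorical_cross_entropy(y, probs):
    """Negative log-probability assigned to the true class index y."""
    return -math.log(probs[y])

p1 = 0.85  # hypothetical predicted probability of class 1

# Compare the two losses for both possible labels
losses = {}
for y in (0, 1):
    bce = binary_cross_entropy(y, p1)
    scce = sparse_categorical_cross_entropy(y, [1 - p1, p1])
    losses[y] = (bce, scce)
    print(f"y={y}: binary={bce:.6f}, sparse categorical={scce:.6f}")
```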