<hr style="border:2px solid gray"> </hr>
<img src="https://mma.prnewswire.com/media/1095203/East_Tennessee_State_University_Logo.jpg?p=facebook" width=200 height=200 />

<div class="alert alert-block alert-info">
    <h1 style="text-align: center">CSCI 5270 - Machine Learning</h1>
</div>

### <center>Supervised Learning - Feedforward Networks and Multilayer Perceptron (MLP)</center>

<center>Dr. Ahmad Al-Doulat </center>
<center>Department of Computing </center>
<center>East Tennessee State University</center>

<hr style="border:2px solid lightblue"> </hr>

<div class="alert alert-block alert-success">
    <h2 style="text-align: left">Regression with Multilayer Perceptron Using sklearn</h2>
</div>

#### Dataset:
We'll use the Housing dataset (`housing.csv`). The task is to predict the target column `medv`, the median value of owner-occupied homes, from the remaining columns.

#### Step 1: Data Loading and Preprocessing


In [27]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset
data = pd.read_csv('housing.csv')

# Split the data into features and target
X = data.drop('medv', axis=1)
y = data['medv']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

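# Standardize features (fit the scaler on the training set only, then reuse its
# statistics on the test set so no test information leaks into training)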
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
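
To make the fit/transform split above concrete, here is a minimal hand-rolled sketch of standardization on a single feature. The numbers are hypothetical, and the population standard deviation (`ddof=0`) is used because that is what `StandardScaler` computes.

```python
import statistics

# Toy single-feature data (hypothetical values, for illustration only)
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

# Mean and standard deviation are estimated from the training split only
mu = statistics.fmean(train)
sigma = statistics.pstdev(train)  # population std (ddof=0)

# Both splits are transformed with the training statistics
train_scaled = [(v - mu) / sigma for v in train]
test_scaled = [(v - mu) / sigma for v in test]

print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0 and standard deviation 1 by construction; the scaled test values generally do not, which is expected.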

#### Step 2: Model Building and Training

In [28]:
from sklearn.neural_network import MLPRegressor

# Create MLP Regressor
mlp_regressor = MLPRegressor(hidden_layer_sizes=(100, 50),
                             activation='tanh',
                             solver='adam',
                             max_iter=2000,
                             alpha=0.0001,
                             random_state=42)

# Train the model
mlp_regressor.fit(X_train_scaled, y_train)
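
To get a feel for the size of the network defined above, the following sketch counts the weights and biases of a fully connected network. The helper `count_mlp_parameters` is hypothetical (not part of scikit-learn), and `n_features=13` is an assumption; substitute `X_train.shape[1]` for the real value.

```python
def count_mlp_parameters(n_features, hidden_layer_sizes, n_outputs=1):
    """Count weights and biases in a fully connected feedforward network."""
    layer_sizes = [n_features, *hidden_layer_sizes, n_outputs]
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total

# Assumed input width of 13 features; hidden layers of 100 and 50 neurons
n_params = count_mlp_parameters(13, (100, 50))
print(n_params)
```

With only a few hundred training rows, a network of this size has many more parameters than samples, which is one reason the `alpha` regularization term matters.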



#### Step 3: Model Evaluation

In [29]:
from sklearn.metrics import mean_squared_error

# Predictions
y_pred = mlp_regressor.predict(X_test_scaled)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

Mean Squared Error: 23.694546443841087
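
MSE is expressed in squared target units, so its square root (RMSE) is often easier to interpret. The sketch below computes MSE by hand on a few hypothetical values and then converts the MSE reported above to RMSE.

```python
import math

def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical targets and predictions (errors of -2, 1 and 3)
y_true_demo = [24.0, 21.6, 34.7]
y_pred_demo = [26.0, 20.6, 31.7]
mse_demo = mean_squared_error_manual(y_true_demo, y_pred_demo)
rmse_demo = math.sqrt(mse_demo)

# RMSE corresponding to the MSE printed by the cell above
rmse_reported = math.sqrt(23.694546443841087)
print(mse_demo, rmse_demo, rmse_reported)
```

The reported MSE corresponds to an RMSE of roughly 4.87, in the same units as `medv`.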


#### Step 4: Hyperparameter Tuning

#### Hyperparameters:

1. `'hidden_layer_sizes'`: The number of neurons in each hidden layer of the Multilayer Perceptron (MLP) model. A single configuration is a tuple with one entry per hidden layer; the parameter grid supplies a list of such tuples to try. For example,

    - `(50,)` indicates one hidden layer with 50 neurons, 
    - `(100,)` indicates one hidden layer with 100 neurons, 
    - `(50, 50)` indicates two hidden layers with 50 neurons each, and 
    - `(100, 50)` indicates two hidden layers with 100 and 50 neurons, respectively.

2. `'activation'`: The activation function applied by the neurons in the hidden layers. The parameter grid tries `'relu'` (Rectified Linear Unit) and `'tanh'` (Hyperbolic Tangent). Some common activation functions used in neural networks:

    2.1. **ReLU (Rectified Linear Unit)**:
    - Formula: $f(x) = \max(0, x)$
    - Description: ReLU is one of the most widely used activation functions. It introduces non-linearity by passing the input through unchanged if it is positive and outputting zero otherwise. ReLU is cheap to compute, and because its gradient is 1 for positive inputs it helps mitigate the vanishing gradient problem.

    2.2. **Tanh (Hyperbolic Tangent)**:
    - Formula: $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
    - Description: Tanh squashes the input into the range $(-1, 1)$, so its output is zero-centered. It is often used in hidden layers, especially when the inputs are standardized, because zero-centered activations tend to make training more stable.

    2.3. **Sigmoid (Logistic)**:
    - Formula: $f(x) = \frac{1}{1 + e^{-x}}$
    - Description: The sigmoid function is commonly used in the output layer for binary classification. It squashes the input into the range $(0, 1)$, so the output can be interpreted as a probability. It suffers from the vanishing gradient problem because its gradient approaches zero for inputs of large magnitude.

    2.4. **Softmax**:
    - Formula: $f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$
    - Description: Softmax is typically used in the output layer for multi-class classification. It normalizes the outputs into a probability distribution in which each output is the probability of one class and all outputs sum to 1.

    2.5. **Leaky ReLU**:
    - Formula: $f(x) = \begin{cases} x & \text{if } x > 0 \\ a x & \text{otherwise} \end{cases}$, where $a$ is a small positive constant such as 0.01
    - Description: Leaky ReLU is a variant of ReLU that addresses the "dying ReLU" problem. It allows a small, non-zero gradient when the input is negative, which prevents neurons from becoming permanently inactive during training.

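These are some of the most commonly used activation functions in neural networks, each suited to different problems and architectures. Note that the `activation` parameter in scikit-learn's MLP models accepts only `'identity'`, `'logistic'` (sigmoid), `'tanh'`, and `'relu'`. The output activation is chosen automatically: identity for `MLPRegressor`, and logistic or softmax for `MLPClassifier`.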
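
The formulas above can be sketched in plain Python as follows. This is an illustrative scalar implementation, not scikit-learn's vectorized code, and the Leaky ReLU `slope=0.01` is an assumed default.

```python
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # slope is the small constant applied to negative inputs (assumed value)
    return x if x > 0 else slope * x

def tanh(x):
    return math.tanh(x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(values):
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), leaky_relu(x), tanh(x), sigmoid(x))
print(softmax([1.0, 2.0, 3.0]))
```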

3. `'solver'`: The optimization algorithm used to update the weights of the neural network during training. The parameter grid tries `'adam'` (adaptive moment estimation) and `'sgd'` (stochastic gradient descent); scikit-learn also offers `'lbfgs'`.

    3.1. **Adam (Adaptive Moment Estimation)**:
    - Description: Adam is an adaptive learning rate optimization algorithm that combines the advantages of two other extensions of stochastic gradient descent, AdaGrad and RMSProp. It keeps running estimates of the first moment (mean) and second moment (uncentered variance) of the gradients and uses them to set a separate step size for each parameter.

    3.2. **SGD (Stochastic Gradient Descent)**:
    - Description: Stochastic Gradient Descent is the fundamental optimization algorithm for training neural networks. In its strict form it updates the parameters after every training example; in practice, and in scikit-learn, it works on mini-batches. For each batch it computes the gradient of the loss function with respect to the parameters and moves the parameters in the opposite direction of the gradient. The learning rate determines the step size in parameter space.

4. `'alpha'`: The strength of the L2 penalty (regularization term). The parameter grid tries 0.0001, 0.001, and 0.01. Regularization is a technique for preventing overfitting: the L2 penalty adds a cost proportional to the squared magnitude of the weights, which discourages overly complex solutions. Larger values of `alpha` shrink the weights more strongly.
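
To illustrate how a mini-batch solver and the L2 penalty interact, here is a minimal sketch that fits a one-feature linear model with mini-batch SGD. The function `sgd_linear_fit`, the synthetic data, and all default values are assumptions chosen for illustration; scikit-learn's internal update and penalty scaling differ in detail.

```python
import random

def sgd_linear_fit(xs, ys, alpha=0.0001, lr=0.1, batch_size=20, epochs=200, seed=0):
    """Fit y ~ w*x + b by mini-batch SGD on squared error plus an L2 penalty on w."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [w * xs[i] + b - ys[i] for i in batch]
            grad_w = sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = sum(errors) / len(batch)
            grad_w += alpha * w  # gradient of the L2 penalty (bias is not penalized)
            w -= lr * grad_w     # step against the gradient
            b -= lr * grad_b
    return w, b

# Synthetic, noise-free data generated from y = 3x + 1 (illustrative only)
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [3.0 * x + 1.0 for x in xs]

w_weak, b_weak = sgd_linear_fit(xs, ys, alpha=0.0001)
w_strong, b_strong = sgd_linear_fit(xs, ys, alpha=1.0)
print(w_weak, b_weak)
print(w_strong, b_strong)
```

With a small `alpha` the fitted slope stays close to the true value of 3, while a large `alpha` pulls it noticeably toward zero.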

These parameters and their candidate values define a grid that `GridSearchCV` searches exhaustively to find the best combination of hyperparameters for the `MLPRegressor` model. Each combination is evaluated through 5-fold cross-validation (`cv=5`). Because no `scoring` argument is passed, candidates are ranked by the estimator's default score, which for `MLPRegressor` is $R^2$; to rank by mean squared error instead, pass `scoring='neg_mean_squared_error'`.

In [30]:
from sklearn.model_selection import GridSearchCV

# Define parameter grid
param_grid = {
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)],
    'activation': ['relu', 'tanh'],
    'solver': ['adam', 'sgd'],
    'alpha': [0.0001, 0.001, 0.01],
}

# Perform Grid Search
grid_search = GridSearchCV(MLPRegressor(max_iter=2000, random_state=42), param_grid, cv=5)
grid_search.fit(X_train_scaled, y_train)

# Best parameters
print("Best Parameters:", grid_search.best_params_)

# Evaluate model with best parameters
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
print("Best Model Mean Squared Error:", mse)

Best Parameters: {'activation': 'relu', 'alpha': 0.01, 'hidden_layer_sizes': (100,), 'solver': 'sgd'}
Best Model Mean Squared Error: 12.956837798198732
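
An exhaustive search becomes expensive quickly, so it helps to count the model fits before running it. The sketch below enumerates the grid used above with `itertools.product`; the extra fit accounts for `GridSearchCV` refitting the best candidate on the full training set, which is its default behavior.

```python
from itertools import product

param_grid = {
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)],
    'activation': ['relu', 'tanh'],
    'solver': ['adam', 'sgd'],
    'alpha': [0.0001, 0.001, 0.01],
}

# Every combination of one value per parameter is a candidate
candidates = [dict(zip(param_grid, combo)) for combo in product(*param_grid.values())]
n_candidates = len(candidates)

cv_folds = 5
n_fits = n_candidates * cv_folds + 1  # +1 for the final refit of the best candidate

print(n_candidates, n_fits)
```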


<div class="alert alert-block alert-success">
    <h2 style="text-align: left">Classification with Multilayer Perceptron Using sklearn</h2>
</div>

#### Dataset:
We'll use the Iris dataset bundled with scikit-learn. It contains 150 iris flowers, each described by four measurements (sepal length, sepal width, petal length, and petal width). The task is to classify each flower into one of three species.

#### Step 1: Data Loading and Preprocessing

In [18]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

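# Encode labels (iris.target is already integer-coded as 0-2, so this leaves the
# values unchanged; the step matters only when the labels are strings)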
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)

#### Step 2: Model Building and Training

In [19]:
from sklearn.neural_network import MLPClassifier

# Create MLP Classifier
mlp_classifier = MLPClassifier(hidden_layer_sizes=(100, 50),
                               activation='relu',
                               solver='adam',
                               max_iter=500,
                               random_state=42)

# Train the model
mlp_classifier.fit(X_train_scaled, y_train_encoded)

#### Step 3: Model Evaluation

In [20]:
from sklearn.metrics import accuracy_score

# Predictions
y_pred = mlp_classifier.predict(X_test_scaled)

# Evaluate the model
accuracy = accuracy_score(y_test_encoded, y_pred)
print("Accuracy:", accuracy)

Accuracy: 1.0
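
A perfect score on a small test set should be read with some caution. With `test_size=0.2`, the 150 Iris samples leave 30 test samples, so accuracy can only move in steps of 1/30. The sketch below uses hypothetical labels to show the effect of a single error.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical labels for a 30-sample test set
y_true_demo = [0, 1, 2] * 10
y_pred_demo = list(y_true_demo)
y_pred_demo[0] = 1  # introduce a single misclassification

acc_perfect = accuracy(y_true_demo, y_true_demo)
acc_one_error = accuracy(y_true_demo, y_pred_demo)
print(acc_perfect, acc_one_error)
```

One misclassified flower lowers the accuracy by about 3.3 percentage points, so cross-validated scores give a steadier estimate than a single split.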


#### Step 4: Hyperparameter Tuning

The search below draws `alpha` from `np.logspace(-4, 0, 5)`. NumPy's `logspace` function generates values spaced evenly on a logarithmic scale. Here's what each argument means:

   - `-4`: The start exponent of the sequence, so the smallest value generated is $10^{-4}$.

   - `0`: The stop exponent of the sequence, so the largest value generated is $10^0 = 1$.

   - `5`: The number of values to generate, including both endpoints. Here that gives 5 values spaced evenly on a logarithmic scale between $10^{-4}$ and $10^0$.
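
The computation behind `logspace` can be sketched without NumPy: the exponents are spaced linearly between start and stop, and each value is the base raised to that exponent. The helper `logspace_manual` is a hypothetical stand-in written for illustration.

```python
def logspace_manual(start, stop, num, base=10.0):
    """Return num values base**e with e spaced linearly from start to stop."""
    if num == 1:
        return [base ** start]
    step = (stop - start) / (num - 1)
    return [base ** (start + i * step) for i in range(num)]

alphas = logspace_manual(-4, 0, 5)
print(alphas)
```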

In [23]:
import numpy as np

# Print the five alpha values generated by logspace
for item in np.logspace(-4, 0, 5):
    print(item)

0.0001
0.001
0.01
0.1
1.0


In [21]:
from sklearn.model_selection import RandomizedSearchCV
import numpy as np

# Define parameter distribution
param_dist = {
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)],
    'activation': ['relu', 'tanh'],
    'solver': ['adam', 'sgd'],
    'alpha': np.logspace(-4, 0, 5),
}

# Perform Randomized Search
random_search = RandomizedSearchCV(MLPClassifier(max_iter=2000, random_state=42), param_dist, cv=5, n_iter=10)
random_search.fit(X_train_scaled, y_train_encoded)

# Best parameters
print("Best Parameters:", random_search.best_params_)

# Evaluate model with best parameters
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test_scaled)
accuracy = accuracy_score(y_test_encoded, y_pred)
print("Best Model Accuracy:", accuracy)

Best Parameters: {'solver': 'sgd', 'hidden_layer_sizes': (100, 50), 'alpha': 0.01, 'activation': 'relu'}
Best Model Accuracy: 1.0
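
Unlike the grid search, the randomized search evaluates only `n_iter=10` of the possible combinations. The sketch below mimics that idea with the standard library by drawing 10 candidates without replacement. It is an approximation of the sampling step, not scikit-learn's own sampler, and the seed of 42 is an arbitrary choice.

```python
import random
from itertools import product

param_dist = {
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)],
    'activation': ['relu', 'tanh'],
    'solver': ['adam', 'sgd'],
    'alpha': [0.0001, 0.001, 0.01, 0.1, 1.0],  # same values as np.logspace(-4, 0, 5)
}

# Enumerate the full space, then sample a subset of it
all_candidates = [dict(zip(param_dist, combo)) for combo in product(*param_dist.values())]
rng = random.Random(42)  # arbitrary seed so the draw is repeatable
sampled = rng.sample(all_candidates, k=10)

print(len(all_candidates), len(sampled))
for candidate in sampled[:3]:
    print(candidate)
```

Only 10 of the 80 combinations are tried, so the result depends on which ones are drawn. Passing `random_state` to `RandomizedSearchCV` makes that draw reproducible between runs.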
