
# A brief example of using Bayesian Optimization to tune the hyperparameters of a simple neural network with the `skopt` (Scikit-Optimize) library in Python

# Import the libraries
From `skopt`:
- `skopt.space`: `Real`, `Integer`
- `skopt.utils`: `use_named_args`
- `skopt`: `gp_minimize`

From `sklearn`:
- `sklearn.datasets`: `load_breast_cancer`
- `sklearn.model_selection`: `train_test_split`
- `sklearn.preprocessing`: `StandardScaler`
- `sklearn.neural_network`: `MLPClassifier`
- `sklearn.metrics`: `accuracy_score`

In [5]:
!pip install scikit-optimize

Collecting scikit-optimize
  Downloading scikit_optimize-0.10.2-py2.py3-none-any.whl.metadata (9.7 kB)
Collecting pyaml>=16.9 (from scikit-optimize)
  Downloading pyaml-24.7.0-py3-none-any.whl.metadata (11 kB)
Downloading scikit_optimize-0.10.2-py2.py3-none-any.whl (107 kB)
Downloading pyaml-24.7.0-py3-none-any.whl (24 kB)
Installing collected packages: pyaml, scikit-optimize
Successfully installed pyaml-24.7.0 scikit-optimize-0.10.2


In [6]:
import numpy as np
from skopt import gp_minimize
from skopt.space import Real, Integer
from skopt.utils import use_named_args
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Step 1: Data Loading and Preprocessing
- We load the breast cancer dataset and split it into training and testing sets.
- The features are then standardized with `StandardScaler`, which is fitted on the training set only.

In [7]:
# Load and preprocess data
data = load_breast_cancer()

In [8]:
print(type(data))

<class 'sklearn.utils._bunch.Bunch'>


In [12]:
print(data.keys())

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])


In [9]:
print(type(data.data))

<class 'numpy.ndarray'>


In [11]:
print(data.data.shape)

(569, 30)


In [13]:
print(data.target_names)
print(data.feature_names)

['malignant' 'benign']
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']


In [10]:
print(type(data.target))

<class 'numpy.ndarray'>


In [16]:
print(data.data[0].shape)

(30,)


In [15]:
# First row: values of the 30 features
print(data.data[0])

[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
 1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
 6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
 1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
 4.601e-01 1.189e-01]


In [18]:
# Split the data (a NumPy array of shape (569, 30)): 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

In [19]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(455, 30)
(114, 30)
(455,)
(114,)


In [20]:
print(X_train[0])
print(y_train[0])

[9.029e+00 1.733e+01 5.879e+01 2.505e+02 1.066e-01 1.413e-01 3.130e-01
 4.375e-02 2.111e-01 8.046e-02 3.274e-01 1.194e+00 1.885e+00 1.767e+01
 9.549e-03 8.606e-02 3.038e-01 3.322e-02 4.197e-02 9.559e-03 1.031e+01
 2.265e+01 6.550e+01 3.247e+02 1.482e-01 4.365e-01 1.252e+00 1.750e-01
 4.228e-01 1.175e-01]
1


In [22]:
print(np.unique(y_train))
print(np.unique(y_test))

[0 1]
[0 1]


In [23]:
# Standardise the train and test data (the scaler is fitted on the training data only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [24]:
# print the first row for the train and test data after scaling
print(X_train[0])
print(X_test[0])

[-1.44075296 -0.43531947 -1.36208497 -1.1391179   0.78057331  0.71892128
  2.82313451 -0.11914956  1.09266219  2.45817261 -0.26380039 -0.01605246
 -0.47041357 -0.47476088  0.83836493  3.25102691  8.43893667  3.39198733
  2.62116574  2.06120787 -1.23286131 -0.47630949 -1.24792009 -0.97396758
  0.72289445  1.18673232  4.67282796  0.9320124   2.09724217  1.88645014]
[-0.46649743 -0.13728933 -0.44421138 -0.48646498  0.28085007  0.04160589
 -0.11146496 -0.26486866  0.41524141  0.13513744 -0.02091509 -0.29323907
 -0.17460869 -0.2072995  -0.01181432 -0.35108921 -0.1810535  -0.24238831
 -0.33731758 -0.0842133  -0.2632354  -0.14784208 -0.33154752 -0.35109337
  0.48001942 -0.09649594 -0.03583041 -0.19435087  0.17275669  0.20372995]
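As a quick sanity check on the scaling step, the self-contained sketch below re-creates the same split and confirms that `transform` applies the *training* statistics to the test data, i.e. z = (x - mean_train) / std_train, using the fitted attributes `scaler.mean_` and `scaler.scale_`. The names `X_train_scaled`, `X_test_scaled` and `manual_test` are introduced here only so the raw arrays stay available for comparison.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Re-create the same 80/20 split used in the notebook
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns per-feature mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuses the training mean/std

# The same transformation written out by hand with the fitted attributes
manual_test = (X_test - scaler.mean_) / scaler.scale_
print(np.allclose(manual_test, X_test_scaled))  # True

# Largest absolute column mean after scaling: ~0 for train, only roughly 0 for test
print(np.abs(X_train_scaled.mean(axis=0)).max())
print(np.abs(X_test_scaled.mean(axis=0)).max())
```

The training columns come out with mean about 0 and standard deviation about 1 by construction; the test columns are only approximately standardized, because their own statistics were never used when fitting the scaler.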


# Step 2: Search Space
The hyperparameter space includes:
- `learning_rate_init`: the initial learning rate used in training, sampled on a log-uniform scale between 1e-6 and 1e-2
- `hidden_layer_sizes`: the number of neurons in the single hidden layer (an integer between 1 and 200)

In [25]:
# Define the hyperparameter search space
space = [
    Real(1e-6, 1e-2, "log-uniform", name='learning_rate_init'),
    Integer(1, 200, name='hidden_layer_sizes')
]

In [27]:
print(type(space[0]))
print(type(space[1]))

<class 'skopt.space.space.Real'>
<class 'skopt.space.space.Integer'>


In [28]:
print(space[0])
print(space[1])

Real(low=1e-06, high=0.01, prior='log-uniform', transform='identity')
Integer(low=1, high=200, prior='uniform', transform='identity')
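The `"log-uniform"` prior on `learning_rate_init` matters because the range spans four orders of magnitude. The NumPy sketch below does not call skopt; it only imitates the idea of sampling uniformly in log10 space, and compares how often a plain uniform prior and a log-uniform prior propose a learning rate below 1e-4.

```python
import numpy as np

rng = np.random.default_rng(0)
low, high = 1e-6, 1e-2
n = 100_000

# Uniform prior: every value in [low, high] is equally likely
uniform_samples = rng.uniform(low, high, size=n)

# Log-uniform prior: the exponent is uniform, so each decade is equally likely
log_uniform_samples = 10 ** rng.uniform(np.log10(low), np.log10(high), size=n)

# Fraction of proposals below 1e-4 (the two lowest of the four decades)
frac_uniform = np.mean(uniform_samples < 1e-4)
frac_log_uniform = np.mean(log_uniform_samples < 1e-4)
print(f"uniform:     {frac_uniform:.3f}")      # about 0.01
print(f"log-uniform: {frac_log_uniform:.3f}")  # about 0.5
```

Under a plain uniform prior roughly 99% of proposals would land between 1e-4 and 1e-2, so small learning rates would almost never be tried; the log-uniform prior gives each decade equal weight.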


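# Step 3: Objective Function
- The objective function trains an MLP with the candidate hyperparameters and returns its negative accuracy on the test set. Because `gp_minimize` minimizes its objective, minimizing the negative accuracy is the same as maximizing accuracy.
- This function is called once for every evaluation during the optimization process.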

# What is MLPClassifier? (multi-layer perceptron classifier)

- A multi-layer perceptron (MLP) classifier is a neural network that uses supervised learning to predict the category of an input data point.

- MLPs are a type of artificial neural network that consists of multiple layers of neurons, with each layer fully connected to the next.

- The neurons in an MLP typically use nonlinear activation functions (for example ReLU, tanh, or the logistic sigmoid), which allow the network to learn complex patterns in data.

- MLPs are a popular choice for machine learning applications because they can model non-linear relationships and handle complex classification tasks. Beyond classification, MLPs are also used for regression and general pattern recognition.

- An MLP is trained on input data paired with the corresponding output labels, and learns to predict the most probable label for a new input. scikit-learn's `MLPClassifier` supports multi-class classification by applying softmax as the output function (for a binary target such as this dataset it uses a single logistic output unit instead), and it also supports multi-label classification, where a sample can belong to more than one class.
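To make the "fully connected layers plus nonlinear activation" description concrete, this sketch fits a small `MLPClassifier` (10 hidden neurons, an arbitrary illustrative choice) and reproduces its `predict_proba` output by hand from the fitted weights `coefs_` and biases `intercepts_`. It assumes scikit-learn's defaults: ReLU in the hidden layer and, for a binary target like this one, a single logistic output unit; the `out_activation_` attribute reports which output function was used. The `forward` helper is hypothetical, written only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# One fully connected hidden layer of 10 neurons (default hidden activation: ReLU)
model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=42)
model.fit(X_train, y_train)

def forward(X, coefs, intercepts):
    """Manual forward pass: weighted sum plus bias, then an activation, layer by layer."""
    hidden = np.maximum(0.0, X @ coefs[0] + intercepts[0])  # ReLU hidden layer
    logits = hidden @ coefs[1] + intercepts[1]              # single output unit
    return 1.0 / (1.0 + np.exp(-logits))                    # logistic output (binary task)

p_manual = forward(X_test, model.coefs_, model.intercepts_).ravel()
p_sklearn = model.predict_proba(X_test)[:, 1]  # probability of class 1 ('benign')
print(model.out_activation_)                   # logistic
print(np.allclose(p_manual, p_sklearn))        # True
```

With three or more classes the same code path would end in a softmax over several output units instead of one logistic unit.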

# Difference between neuron and perceptron

A neuron and a perceptron are related concepts but not exactly the same:

Neuron: In the context of neural networks, a neuron is a basic unit that performs a computation. **It receives inputs, computes a weighted sum of them plus a bias, passes the result through an activation function, and produces an output.** Neurons are the building blocks of more complex structures such as layers and networks.

Perceptron: A perceptron is a type of artificial neuron and one of the simplest forms of a neural network. **It is essentially a single-layer neural network with a step activation function. The perceptron was one of the earliest models of a neuron and is typically used for binary classification tasks.**

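So, while a perceptron is a specific type of neuron, the term "neuron" in the broader neural network context is more general: it covers units with many kinds of activation functions (sigmoid, tanh, ReLU, and others), not just the step function used in perceptrons. This is why, despite its name, a multi-layer perceptron such as `MLPClassifier` uses smooth or piecewise-linear activations (ReLU by default) rather than step functions.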
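The distinction can be shown in a few lines of NumPy. In this sketch, `neuron` is a hypothetical helper that applies an activation to a weighted sum plus bias. Plugging in a step function gives a perceptron, trained here with the classic perceptron learning rule on the AND function (a linearly separable toy problem chosen only for illustration); plugging in a sigmoid with the same weights gives graded outputs instead of hard 0/1 decisions.

```python
import numpy as np

def neuron(x, w, b, activation):
    """A generic artificial neuron: weighted sum of inputs plus a bias, then an activation."""
    return activation(np.dot(w, x) + b)

def step(z):
    # Perceptron-style hard threshold: outputs 0 or 1
    return np.where(z >= 0, 1, 0)

def sigmoid(z):
    # Smooth activation used by many non-perceptron neurons
    return 1.0 / (1.0 + np.exp(-z))

# Train a single perceptron on the AND function with the perceptron learning rule
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b, lr = np.zeros(2), 0.0, 1.0

for _ in range(10):
    for xi, target in zip(X, y):
        error = target - neuron(xi, w, b, step)  # 0 if correct, +1 or -1 if wrong
        w += lr * error * xi
        b += lr * error

perceptron_out = np.array([neuron(xi, w, b, step) for xi in X])
sigmoid_out = np.array([neuron(xi, w, b, sigmoid) for xi in X])
print(perceptron_out)  # [0 0 0 1]
print(sigmoid_out)     # same weights, but graded values in (0, 1)
```

Because the step function has zero gradient almost everywhere, it cannot be trained by backpropagation, which is one reason multi-layer networks use smooth or piecewise-linear activations instead.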

In [29]:
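# Define the objective function to minimize (negative accuracy)
@use_named_args(space)
def objective(learning_rate_init, hidden_layer_sizes):
    # Build a one-hidden-layer MLP from the candidate hyperparameters
    model = MLPClassifier(learning_rate_init=learning_rate_init, hidden_layer_sizes=(hidden_layer_sizes,),
                          random_state=42, max_iter=1000)
    model.fit(X_train, y_train)
    # Note: scoring on the test set means the test set takes part in model selection,
    # so the best accuracy found this way is likely to be optimistic
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    return -accuracy  # gp_minimize minimizes, so minimizing -accuracy maximizes accuracy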
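`gp_minimize` calls the objective with a single list of values ordered like `space`, whereas `objective` above takes named arguments; `@use_named_args(space)` bridges the two by pairing each value with the `name` of its dimension. The pure-Python sketch below mimics that mapping with a hypothetical `named_args_sketch` decorator that takes plain names instead of skopt dimension objects. It illustrates the idea only and is not skopt's source code.

```python
from functools import wraps

def named_args_sketch(names):
    """Hypothetical stand-in for skopt.utils.use_named_args, for illustration only."""
    def decorator(func):
        @wraps(func)
        def wrapper(x):
            # The optimizer passes one list of values, ordered like the search space;
            # pair each value with its name and call func with keyword arguments
            return func(**dict(zip(names, x)))
        return wrapper
    return decorator

@named_args_sketch(["learning_rate_init", "hidden_layer_sizes"])
def describe(learning_rate_init, hidden_layer_sizes):
    return f"lr={learning_rate_init}, hidden={hidden_layer_sizes}"

print(describe([0.001, 50]))  # lr=0.001, hidden=50
```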

# Step 4: Bayesian Optimization
The `gp_minimize` function from `skopt` performs Bayesian optimization, searching for the best hyperparameters over 20 calls to the objective function.
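Roughly, Bayesian optimization fits a probabilistic surrogate model (in `gp_minimize`, a Gaussian process) to the objective values seen so far and picks the next point by maximizing an acquisition function that trades off low predicted values against high uncertainty. The sketch below shows that loop on a hypothetical 1-D toy function, using scikit-learn's `GaussianProcessRegressor` and the expected-improvement (EI) acquisition over a fixed grid of candidates. It is a simplification: `gp_minimize`'s default `acq_func="gp_hedge"` chooses among EI, probability of improvement and lower confidence bound, and it does not restrict candidates to a fixed grid.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def toy_objective(x):
    # Hypothetical 1-D objective standing in for "negative accuracy"; minimum at x = 0.3
    return (x - 0.3) ** 2

def expected_improvement(mu, sigma, y_best, xi=0.01):
    # EI for minimisation: expected amount by which a point beats the current best
    sigma = np.maximum(sigma, 1e-12)
    improvement = y_best - mu - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 201).reshape(-1, 1)

# A few random initial evaluations
X_obs = rng.uniform(0.0, 1.0, size=(3, 1))
y_obs = toy_objective(X_obs).ravel()

for _ in range(12):
    # 1) Fit a Gaussian-process surrogate to all evaluations so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=0)
    gp.fit(X_obs, y_obs)
    # 2) Score every candidate with the acquisition function
    mu, sigma = gp.predict(candidates, return_std=True)
    ei = expected_improvement(mu, sigma, y_obs.min())
    # 3) Evaluate the true objective where EI is highest, then repeat
    x_next = candidates[np.argmax(ei)].reshape(1, -1)
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, toy_objective(x_next).ravel())

best_x = X_obs[np.argmin(y_obs), 0]
print(f"best x ~ {best_x:.3f} after {len(y_obs)} evaluations")
```

The expensive part in the notebook is step 3 (training an MLP), which is why spending effort on a surrogate to choose each evaluation carefully can pay off compared with random or grid search.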

In [30]:
# Run Bayesian Optimization
result = gp_minimize(objective, space, n_calls=20, random_state=42)



# Bayesian optimization took 46 seconds to complete


# Step 5: Result
The best hyperparameters found and the corresponding accuracy are printed.

This example is a basic illustration. In a real-world scenario you might want to tune additional hyperparameters and consider a more complex model.

In [31]:
# Output the best hyperparameters and corresponding accuracy
print("Best hyperparameters:")
print("Learning rate:", result.x[0])
print("Hidden layer size:", result.x[1])
print("Best accuracy:", -result.fun)


Best hyperparameters:
Learning rate: 0.005678201970293135
Hidden layer size: 1
Best accuracy: 0.9912280701754386
