Deep Learning Binary Classification Model with Dropout Regularization

We start by importing the necessary libraries, which includes numpy for numerical operations, keras for building deep learning models, and sklearn for generating the classification data and splitting it into training and testing sets. We also set a random seed to ensure that the results are reproducible.

In [2]:
# import necessary libraries
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# set random seed for reproducibility
np.random.seed(123)


We define some hyperparameters for the model, including the number of input features, the number of output classes, the number of hidden units in each layer, the number of layers in the model, the batch size for training, the number of epochs for training, the learning rate for the optimizer, and the dropout rate for regularization.



In [3]:
# hyperparameters
input_dim = 20  # number of input features
output_dim = 2  # number of output classes
hidden_dim = 64  # number of hidden units in each layer
num_layers = 3  # number of layers in the model
batch_size = 128  # batch size for training
num_epochs = 50  # number of epochs for training
learning_rate = 0.001  # learning rate for optimizer
dropout_rate = 0.2  # dropout rate for regularization


We generate some random data for binary classification using the make_classification function from sklearn. This function creates a synthetic dataset with a specified number of samples, input features, output classes, and informative features, as well as a random seed for reproducibility.

In [4]:
# generate some random data for binary classification
X, y = make_classification(n_samples=10000, n_features=input_dim,
                            n_classes=output_dim, n_informative=10,
                            random_state=123)


We split the data into training and testing sets using train_test_split from sklearn. This function randomly splits the data into training and testing sets, with a specified test size and random seed for reproducibility.

In [5]:
# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=123)


We create a new model using the Sequential class from keras. This class allows us to stack multiple layers of neurons sequentially to create a deep learning model. We then add the input layer, which is a Dense layer with a specified number of hidden units, input dimensions, and activation function. We also add a Dropout layer after the input layer to randomly drop out a fraction of the neurons during training to reduce overfitting. We then add multiple hidden layers in a for loop, with each layer being a Dense layer with a specified number of hidden units and activation function, followed by a Dropout layer to regularize the model. Finally, we add the output layer, which is a `D

In [6]:
# create a new model
model = Sequential()

# add input layer
model.add(Dense(hidden_dim, input_dim=input_dim, activation='relu'))
model.add(Dropout(dropout_rate))

# add hidden layers
for i in range(num_layers-1):
    model.add(Dense(hidden_dim, activation='relu'))
    model.add(Dropout(dropout_rate))

# add output layer
model.add(Dense(output_dim, activation='softmax'))


We compile the model by specifying the optimizer, loss function, and evaluation metric. Here we use the Adam optimizer with a specified learning rate, the categorical cross-entropy loss function for multi-class classification problems, and accuracy as the evaluation metric.

In [7]:
# compile the model
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(lr=learning_rate),
              metrics=['accuracy'])




We train the model using the fit method on the model object. We pass in the training data and labels, the batch size for training, the number of epochs to train for, and the validation data and labels to use for evaluating the model's performance on a separate dataset during training.

In [8]:
# train the model
model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=num_epochs,
          validation_data=(X_test, y_test),
          verbose=1)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x7f9f5e7ce790>

We evaluate the trained model on the test set using the evaluate method on the model object. We pass in the test data and labels, and set verbose=0 to suppress the output of the evaluation process. We then print out the test loss and accuracy of the model.

In [9]:
# evaluate the model on the test set
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', loss)
print('Test accuracy:', accuracy)


Test loss: 0.10007339715957642
Test accuracy: 0.972000002861023


Finally, we generate some new random data and make predictions on it using the predict method on the model object. We pass in the new data to make predictions on, and then print out the new data and the corresponding predictions of the model.


In [10]:
# make predictions on new data
new_data = np.random.rand(5, input_dim)
predictions = model.predict(new_data)
print('New data:', new_data)
print('Predictions:', predictions)


New data: [[0.69646919 0.28613933 0.22685145 0.55131477 0.71946897 0.42310646
  0.9807642  0.68482974 0.4809319  0.39211752 0.34317802 0.72904971
  0.43857224 0.0596779  0.39804426 0.73799541 0.18249173 0.17545176
  0.53155137 0.53182759]
 [0.63440096 0.84943179 0.72445532 0.61102351 0.72244338 0.32295891
  0.36178866 0.22826323 0.29371405 0.63097612 0.09210494 0.43370117
  0.43086276 0.4936851  0.42583029 0.31226122 0.42635131 0.89338916
  0.94416002 0.50183668]
 [0.62395295 0.1156184  0.31728548 0.41482621 0.86630916 0.25045537
  0.48303426 0.98555979 0.51948512 0.61289453 0.12062867 0.8263408
  0.60306013 0.54506801 0.34276383 0.30412079 0.41702221 0.68130077
  0.87545684 0.51042234]
 [0.66931378 0.58593655 0.6249035  0.67468905 0.84234244 0.08319499
  0.76368284 0.24366637 0.19422296 0.57245696 0.09571252 0.88532683
  0.62724897 0.72341636 0.01612921 0.59443188 0.55678519 0.15895964
  0.15307052 0.69552953]
 [0.31876643 0.6919703  0.55438325 0.38895057 0.92513249 0.84167
  0.357397

This entire code creates a deep learning binary classification model with dropout regularization, trains it on a synthetic dataset, evaluates its performance on a test set, and makes predictions on new data. This upgraded version includes the following improvements:

Added a Dropout layer after each Dense layer to reduce overfitting.
Used the Adam optimizer with a specified learning rate instead of the default optimizer.
Used the sparse_categorical_crossentropy loss function for multi-class classification problems.
Split the data into training and testing sets using train_test_split from sklearn.
Added a validation_data parameter to the fit method to track the validation loss and accuracy during training.
Evaluated the model on the test data and printed the test loss and accuracy.