### 🔍 Determining the Optimal Number of Hidden Layers and Neurons in an Artificial Neural Network (ANN)

Selecting the ideal architecture for an ANN can be complex and typically involves iterative experimentation. However, the following best practices and strategies can guide you toward making more informed decisions:

---

#### ✅ Recommended Strategies:

- **Start Simple**: Begin with a minimal architecture (e.g., a single hidden layer) and increase complexity only if performance is inadequate.

- **Hyperparameter Tuning**: Leverage techniques such as Grid Search or Random Search to explore different combinations of layers and neurons systematically.

- **Cross-Validation**: Apply cross-validation to evaluate how well each model generalizes to unseen data, helping to prevent overfitting.

- **Heuristic Guidelines**: Use empirical rules of thumb as a starting point:
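  - One rule of thumb places the number of neurons in a hidden layer between the size of the output layer and the size of the input layer. Treat this as a starting point only: wider layers sometimes perform better, so validate any choice empirically.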
  - A commonly accepted practice is to start with **1 or 2 hidden layers** and adjust based on performance metrics.
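
As a rough illustration of the first rule of thumb above, the sketch below lists a few evenly spaced widths between the output and input sizes. The function name `starting_widths` and the example sizes (12 inputs, 1 output) are assumptions made for illustration; the returned widths are only starting points for a search, not recommendations.

```python
def starting_widths(n_inputs: int, n_outputs: int, steps: int = 3) -> list[int]:
    """Return up to `steps` evenly spaced widths between the output and input sizes."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    low, high = sorted((n_inputs, n_outputs))
    span = high - low
    # Integer arithmetic keeps every candidate inside [low, high]
    widths = {low + (i * span) // (steps - 1) for i in range(steps)}
    return sorted(widths)


# Hypothetical sizes: 12 input features, 1 output neuron
print(starting_widths(12, 1))  # [1, 6, 12]
```

The lower end of the range is rarely useful on its own; in practice the candidates near the input size are the ones worth trying first.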

---


> ⚠️ Note: There's no one-size-fits-all architecture. Always tailor your model based on the problem complexity, dataset size, and available computational resources.
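
To make the cross-validation strategy listed above concrete, here is a minimal pure-Python sketch of how k-fold splitting partitions sample indices. The helper `kfold_indices` is hypothetical and produces plain, unshuffled folds; scikit-learn's search utilities typically use stratified folds for classifiers, which this sketch does not attempt to reproduce.

```python
def kfold_indices(n_samples: int, n_splits: int):
    """Yield (train_indices, val_indices) for contiguous, unshuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds receive one additional sample
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


folds = list(kfold_indices(n_samples=10, n_splits=3))
for train, val in folds:
    print("train:", train, "val:", val)

fold_sizes = [len(val) for _, val in folds]
print(fold_sizes)  # [4, 3, 3]
```

Each sample lands in exactly one validation fold, so every candidate architecture is scored on data it was not trained on.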

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.pipeline import Pipeline
from scikeras.wrappers import KerasClassifier
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.callbacks import EarlyStopping
import pickle




In [2]:
df = pd.read_csv(r'C:\Users\shail\OneDrive\Shailesh\Personal\Personal Learning\GenAI_HuggingFace_LangChain\Projects\ANN_Classification\Churn_Modelling.csv')
df

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.00,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.80,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.00,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,15606229,Obijiaku,771,France,Male,39,5,0.00,2,1,0,96270.64,0
9996,9997,15569892,Johnstone,516,France,Male,35,10,57369.61,1,1,1,101699.77,0
9997,9998,15584532,Liu,709,France,Female,36,7,0.00,1,0,1,42085.58,1
9998,9999,15682355,Sabbatini,772,Germany,Male,42,3,75075.31,2,1,0,92888.52,1


In [3]:
# Drop columns that are not useful for prediction
# 'RowNumber', 'CustomerId', and 'Surname' are identifiers or irrelevant for modeling
df = df.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)

# -------------------------------
# Label Encoding for 'Gender'
# -------------------------------
# Convert categorical gender values ('Male', 'Female') into numeric form (0 or 1)
label_encoder_gender = LabelEncoder()
df['Gender'] = label_encoder_gender.fit_transform(df['Gender'])

# -------------------------------
# One-Hot Encoding for 'Geography'
# -------------------------------
# Convert the 'Geography' column into dummy/indicator variables
# handle_unknown='ignore' encodes categories unseen during fit as all zeros instead of raising an error
onehot_encoder_geo = OneHotEncoder(handle_unknown='ignore')

# Fit the encoder and transform 'Geography' into a binary matrix
geo_encoded = onehot_encoder_geo.fit_transform(df[['Geography']]).toarray()

# Convert the result into a DataFrame with appropriate column names
geo_encoded_df = pd.DataFrame(geo_encoded, columns=onehot_encoder_geo.get_feature_names_out(['Geography']))

# Merge the original dataframe (without 'Geography') with the new encoded geography dataframe
df = pd.concat([df.drop('Geography', axis=1), geo_encoded_df], axis=1)

# -------------------------------
# Prepare Features and Target Variable
# -------------------------------
# X contains all features except 'Exited'
# y contains the target variable to predict
X = df.drop('Exited', axis=1)
y = df['Exited']

# -------------------------------
# Split Data into Training and Testing Sets
# -------------------------------
# 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# -------------------------------
# Feature Scaling
# -------------------------------
# Scale the input features using StandardScaler to bring them to a common scale
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# -------------------------------
# Save Encoders and Scaler to Disk
# -------------------------------
# This is essential for ensuring consistent transformation during inference

# Save the label encoder for 'Gender'
with open('label_encoder_gender.pkl', 'wb') as file:
    pickle.dump(label_encoder_gender, file)

# Save the one-hot encoder for 'Geography'
with open('onehot_encoder_geo.pkl', 'wb') as file:
    pickle.dump(onehot_encoder_geo, file)

# Save the scaler used for standardization
with open('scaler.pkl', 'wb') as file:
    pickle.dump(scaler, file)
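
The reason for fitting the scaler on the training split only, and then persisting it, can be shown with a small standard-library sketch. The helpers `fit_standardizer` and `apply_standardizer` and the toy values are assumptions for illustration; they stand in for `StandardScaler` rather than reproduce it, and the pickle round trip happens in memory instead of on disk.

```python
import math
import pickle
import statistics


def fit_standardizer(values):
    """Learn the mean and population standard deviation (ddof=0) from training values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # Guard against division by zero for constant features
    return {"mean": mean, "std": std if std > 0 else 1.0}


def apply_standardizer(params, values):
    """Scale values with previously learned statistics (no refitting)."""
    return [(v - params["mean"]) / params["std"] for v in values]


train_values = [1.0, 2.0, 3.0]   # toy training feature
new_values = [4.0]               # toy value seen only at inference time

params = fit_standardizer(train_values)

# Round-trip through pickle, as the notebook does with files on disk
restored = pickle.loads(pickle.dumps(params))

scaled_train = apply_standardizer(restored, train_values)
scaled_new = apply_standardizer(restored, new_values)
print(scaled_train)
print(scaled_new)
```

Because the restored statistics are identical to the fitted ones, new data is transformed exactly as the training data was, which is the consistency the saved `scaler.pkl` is meant to provide.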

In [4]:
# ----------------------------------------------------------
# Function to create a customizable Keras ANN model for binary classification (for use with KerasClassifier)
# ----------------------------------------------------------

def create_model(neurons=32, layers=1):
    """
    Builds and compiles a Sequential ANN model with the specified number of layers and neurons.

    Parameters:
    - neurons (int): Number of neurons in each hidden layer
    - layers (int): Number of hidden layers

    Returns:
    - model (Sequential): Compiled Keras Sequential model
    """
    # Initialize a sequential model
    model = Sequential()

    # Input layer: the shape must match the number of features in the training data
    # (an explicit Input layer avoids the deprecated input_shape argument on Dense)
    model.add(Input(shape=(X_train.shape[1],)))

    # First hidden layer
    model.add(Dense(neurons, activation='relu'))

    # Add additional hidden layers, if any
    for _ in range(layers - 1):
        model.add(Dense(neurons, activation='relu'))

    # Output layer with 1 neuron and sigmoid activation (for binary classification)
    model.add(Dense(1, activation='sigmoid'))

    # Compile the model with Adam optimizer and binary crossentropy loss
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    return model
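
To get a feel for how quickly model size grows with `layers` and `neurons`, the following sketch counts the trainable parameters of the fully connected architecture that `create_model` builds. The helper `count_dense_params` is hypothetical, and the figure of 12 input features is an assumption based on the preprocessing above (nine remaining columns plus three one-hot `Geography` columns, assuming only the three countries shown).

```python
def count_dense_params(n_inputs: int, neurons: int, layers: int, n_outputs: int = 1) -> int:
    """Count weights and biases in a stack of Dense layers plus the output layer."""
    if layers < 1:
        raise ValueError("layers must be at least 1")
    total = 0
    fan_in = n_inputs
    for _ in range(layers):
        total += fan_in * neurons + neurons  # weights + biases of one hidden layer
        fan_in = neurons
    total += fan_in * n_outputs + n_outputs  # output layer
    return total


n_features = 12  # assumption, see the lead-in
for layers in (1, 2):
    for neurons in (16, 32, 64, 128):
        print(f"layers={layers}, neurons={neurons}: "
              f"{count_dense_params(n_features, neurons, layers)} parameters")
```

Under these assumptions a single hidden layer of 32 neurons has 449 parameters, while two such layers have 1505, which is one reason to add depth only when the simpler model underperforms.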

In [5]:
# ------------------------------------------------------------
# Wrapping the Keras model with SciKeras's KerasClassifier
# ------------------------------------------------------------
# This allows us to use Keras models just like scikit-learn estimators,
# enabling compatibility with tools like GridSearchCV, cross_val_score, etc.

# Parameters:
# - model: The model-building function (create_model)
# - layers: Number of hidden layers, passed through to create_model
# - neurons: Number of neurons per hidden layer, passed through to create_model
# - verbose: Level of logging during training (1 = progress bar)

model = KerasClassifier(
    model=create_model,     # Function to build the model
    layers=1,               # Number of hidden layers
    neurons=32,             # Neurons per hidden layer
    verbose=1               # Verbose output during model training
)

In [6]:
# ------------------------------------------------------------
# Defining the Grid Search Parameter Space for ANN Tuning
# ------------------------------------------------------------
# We specify a dictionary where:
# - Keys are parameter names (as used in the KerasClassifier)
# - Values are lists of options to be explored for each parameter
# GridSearchCV will try all combinations of these values

param_grid = {
    'neurons': [16, 32, 64, 128],   # Number of neurons per hidden layer to try
    'layers': [1, 2],               # Number of hidden layers to test
    'epochs': [50, 100]            # Number of epochs for training in each run
}
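
Before launching the search, it can help to estimate its cost. The sketch below enumerates the same grid with `itertools.product` and counts the candidates, cross-validation fits, and training epochs involved. The names `grid_spec` and `cv_folds` are illustrative copies of the values used in this notebook, and the count excludes the extra refit on the best candidate that `GridSearchCV` performs by default.

```python
from itertools import product

# Illustrative copy of the search space defined above
grid_spec = {
    'neurons': [16, 32, 64, 128],
    'layers': [1, 2],
    'epochs': [50, 100],
}
cv_folds = 3  # matches cv=3 in the grid search

names = sorted(grid_spec)
candidates = [
    dict(zip(names, values))
    for values in product(*(grid_spec[name] for name in names))
]

total_fits = len(candidates) * cv_folds
total_epochs = cv_folds * sum(c['epochs'] for c in candidates)

print(len(candidates))  # 16
print(total_fits)       # 48
print(total_epochs)     # 3600
```

Adding one more value to any list multiplies the number of fits, so it is usually worth keeping the first grid coarse and refining around the best region afterwards.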

In [7]:
# ------------------------------------------------------------
# Perform Grid Search for Hyperparameter Tuning of ANN
# ------------------------------------------------------------

# Create a GridSearchCV object to perform exhaustive search over specified parameter values
grid = GridSearchCV(
    estimator=model,        # The KerasClassifier wrapper for the ANN model
    param_grid=param_grid,  # Dictionary containing hyperparameters and their possible values
    n_jobs=-1,              # Use all available CPU cores for parallel processing
    cv=3,                   # 3-fold cross-validation to evaluate each combination
    verbose=1               # Verbosity of the search itself (1 = report the number of candidates and fits)
)

# Fit the grid search to the training data
# This will train multiple models using all combinations from param_grid
grid_result = grid.fit(X_train, y_train)

# ------------------------------------------------------------
# Display the Best Results
# ------------------------------------------------------------

# Print the best score and the associated hyperparameter values
print(f"Best: {grid_result.best_score_:f} using {grid_result.best_params_}")

Fitting 3 folds for each of 16 candidates, totalling 48 fits


Epoch 1/50
...
Epoch 50/50
Best: 0.857625 using {'epochs': 50, 'layers': 1, 'neurons': 32}
