### Importing Required Libraries  
- `pandas` and `numpy`: For handling and processing data.  
- `train_test_split`: Splits the dataset into training and testing sets.  
- `LabelEncoder`: Converts categorical labels into numerical values.  
- `MinMaxScaler`: Rescales each feature to the range [0, 1] (imported here, but not applied in the cells below).  
- `Sequential`: A Keras model that stacks layers in a linear sequence.  
- `Dense` and `Dropout`: Layers for building the neural network.  
- `to_categorical`: Converts labels into a one-hot encoded format for classification.  


In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import to_categorical

### Loading and Preprocessing the Data  
- Loads the dataset **`selected_features_training.csv`**.  
- **Separates features (`X`)** and the **target variable (`y`)**.  
- `X` contains the selected features, and `y` contains the class labels for classification.  


In [3]:
df = pd.read_csv('selected_features_training.csv')

# Preprocess the data

# Separate features and target variable
X = df.drop('label', axis=1)
y = df['label']
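The feature/target split can be tried on a small frame first. The sketch below uses a hypothetical four-row stand-in for `selected_features_training.csv` (the columns `f1`, `f2` and the labels `a`, `b` are made up, and the `toy_` names avoid overwriting the real `X` and `y`). `value_counts()` is a quick way to spot class imbalance, which matters later when reading the per-class report.

```python
import pandas as pd

# Hypothetical stand-in for selected_features_training.csv (made-up columns)
toy_df = pd.DataFrame({
    "f1": [0.1, 0.5, 0.9, 0.3],
    "f2": [10, 20, 30, 40],
    "label": ["a", "b", "a", "a"],
})

toy_X = toy_df.drop("label", axis=1)  # all columns except the target
toy_y = toy_df["label"]               # the target column

print(toy_X.columns.tolist())  # ['f1', 'f2']
print(toy_y.value_counts())    # samples per class: a -> 3, b -> 1
```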

### Encoding the Target Variable  
- Uses `LabelEncoder` to **convert categorical labels into numerical values**.  
- Applies `to_categorical` to **one-hot encode** the labels for multi-class classification.  


In [4]:
# Encode the target variable
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)
y = to_categorical(y)
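For intuition, here is a NumPy-only sketch of what these two calls produce. `np.unique(..., return_inverse=True)` yields the same sorted-class integer codes that `LabelEncoder` assigns, and indexing an identity matrix with those codes gives one-hot rows in the same layout as `to_categorical`. The toy labels are hypothetical; on the real data, `label_encoder.inverse_transform` maps codes back to the original names.

```python
import numpy as np

labels = np.array(["cat", "dog", "bird", "dog"])  # hypothetical raw labels

# Sorted unique classes + integer code per sample (LabelEncoder's mapping)
classes, codes = np.unique(labels, return_inverse=True)
print(classes)  # ['bird' 'cat' 'dog']
print(codes)    # [1 2 0 2]

# One-hot rows: row i has a 1 in column codes[i] (to_categorical's layout)
one_hot = np.eye(len(classes))[codes]
print(one_hot.shape)  # (4, 3): shape[1] is the number of classes

# Going back: argmax recovers the code, classes[...] recovers the label
recovered = classes[one_hot.argmax(axis=1)]
```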

### Splitting the Data into Training and Testing Sets  
- **Splits the dataset** into **80% training** and **20% testing** using `train_test_split`.  
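- Fixes the shuffle with `random_state=42` so the split is reproducible.  
- `y_train.shape[1]` is the width of the one-hot label matrix, i.e. the **number of classes** (23 here). Because the labels were encoded before splitting, every class has a column even if it is rare in a given split.  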


In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
y_train.shape[1]

23
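`MinMaxScaler` is imported at the top but never applied, so the network trains on unscaled features. If scaling is intended, the scaler should be fitted on the training split only and then reused on the test split, so no test-set statistics leak into training; with scikit-learn that would be `scaler = MinMaxScaler()`, `scaler.fit_transform(X_train)`, then `scaler.transform(X_test)`. The NumPy sketch below mirrors the default formula, (x - min) / (max - min) per column; the helper names and arrays are hypothetical.

```python
import numpy as np

def fit_min_max(train):
    """Per-column min and range learned from the training data only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant column: avoid division by zero
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Scale any split with statistics fitted on the training split."""
    return (data - col_min) / col_range

# Hypothetical feature matrices (second column is constant)
X_train_demo = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
X_test_demo = np.array([[2.0, 10.0], [7.0, 10.0]])

col_min, col_range = fit_min_max(X_train_demo)
train_scaled = apply_min_max(X_train_demo, col_min, col_range)
test_scaled = apply_min_max(X_test_demo, col_min, col_range)
print(train_scaled[:, 0])  # [0.  0.5 1. ]
print(test_scaled[:, 0])   # 7.0 maps above 1: test values may leave [0, 1]
```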

### Defining the Neural Network Model  
- Uses a **Sequential model** to build a feedforward neural network.  
- **First hidden layer**:  
  - `Dense(1024)`: 1024 neurons with **ReLU activation**.  
  - `input_dim=X_train.shape[1]`: Matches the number of input features.  
  - `Dropout(0.5)`: Reduces overfitting by setting a random 50% of this layer's outputs to zero at each training step (dropout is inactive at inference).  
- **Second hidden layer**:  
  - `Dense(64)`: 64 neurons with **ReLU activation**.  
  - `Dropout(0.5)`: The same 50% dropout rate after the second hidden layer.  
- **Output layer**:  
  - `Dense(y_train.shape[1], activation='softmax')`: Outputs probabilities for multi-class classification using **Softmax activation**.  
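To make the `Dropout(0.5)` bullets concrete: Keras applies inverted dropout, zeroing each activation with probability `rate` during training and scaling the survivors by 1 / (1 - rate) so the expected activation is unchanged; at inference the layer passes inputs through untouched. A NumPy sketch of that rule (the function name `dropout` and the seed are illustrative):

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: drop with probability `rate`, rescale survivors."""
    if not training or rate == 0.0:
        return x                          # inference: identity
    keep = rng.random(x.shape) >= rate    # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
acts = np.ones(100_000)

train_out = dropout(acts, 0.5, training=True, rng=rng)
print(np.unique(train_out))         # [0. 2.]: dropped, or scaled by 1/(1-0.5)
print(round(train_out.mean(), 2))   # close to 1.0: expectation preserved
print(dropout(acts, 0.5, training=False, rng=rng).mean())  # 1.0
```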


In [6]:
# Define the model
model = Sequential()
model.add(Dense(1024, input_dim=X_train.shape[1], activation='relu'))
model.add(Dropout(0.5))  # Regularization: zero 50% of activations during training
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(y_train.shape[1], activation='softmax'))

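(Keras emits a `UserWarning` at this cell because `input_dim` is passed directly to a `Dense` layer; recent Keras versions recommend starting a `Sequential` model with an `Input(shape=...)` object instead. The model is still built.)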
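The size of this network follows from the layer widths: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and `Dropout` adds no parameters. The helpers below are illustrative, not part of Keras. The notebook does not show how many features `selected_features_training.csv` has, so 20 is an assumed value; 23 output classes matches the `y_train.shape[1]` output above. For the real model, `model.summary()` reports the per-layer counts.

```python
def dense_params(n_in, n_out):
    """Weights (n_in x n_out) plus one bias per unit."""
    return n_in * n_out + n_out

def count_params(layer_widths):
    """Total trainable parameters of a stack of Dense layers."""
    return sum(dense_params(a, b) for a, b in zip(layer_widths, layer_widths[1:]))

n_features = 20  # assumed; use X_train.shape[1] for the real data
widths = [n_features, 1024, 64, 23]  # input, hidden 1, hidden 2, softmax output

for a, b in zip(widths, widths[1:]):
    print(f"Dense {a} -> {b}: {dense_params(a, b)} params")
print("total:", count_params(widths))  # total: 88599
```

Most of the parameters sit in the 1024-unit first hidden layer, which is where the first `Dropout(0.5)` is applied.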


### Compiling the Model  
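- **Optimizer**: `adam` (Adaptive Moment Estimation), which adapts the step size for each weight; the Keras default learning rate is 0.001.  
- **Loss Function**: `categorical_crossentropy`, the standard loss for multi-class classification with one-hot targets such as the `to_categorical` output.  
- **Metrics**: `accuracy`, reported after each epoch on the training and validation data.  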


In [7]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
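For a one-hot target y and predicted probabilities p, categorical cross-entropy is -sum_k y_k log(p_k), which reduces to -log of the probability assigned to the true class. Keras averages it over the batch and clips probabilities away from 0 (default epsilon 1e-7) to keep the logarithm finite. A NumPy sketch with made-up predictions:

```python
import numpy as np

def categorical_crossentropy(targets, probs, eps=1e-7):
    """Mean over samples of -sum_k targets[k] * log(probs[k])."""
    probs = np.clip(probs, eps, 1.0 - eps)  # keep log() finite
    per_sample = -np.sum(targets * np.log(probs), axis=1)
    return per_sample.mean()

# Two samples, three classes (hypothetical values)
targets = np.array([[0, 1, 0],
                    [1, 0, 0]], dtype=float)
probs = np.array([[0.1, 0.8, 0.1],    # confident and correct
                  [0.3, 0.4, 0.3]])   # unsure: true class gets only 0.3

loss_demo = categorical_crossentropy(targets, probs)
print(round(loss_demo, 4))  # (-log 0.8 - log 0.3) / 2 = 0.7136
```

The second sample contributes most of the loss, which is why the loss keeps falling in the training log even after accuracy has nearly plateaued: the model is still becoming more confident on samples it already classifies correctly.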

### Training the Model  
- **Trains the neural network** using the training data (`X_train`, `y_train`).  
- **Epochs**: `50` (number of full passes over the training data).  
- **Batch Size**: `64` (number of samples per gradient update).  
- **Validation Split**: `0.1` (the last 10% of the training rows, taken before shuffling, are held out for validation).  
- The per-epoch `val_loss` and `val_accuracy` help detect overfitting; they do not prevent it unless a callback such as early stopping acts on them.  
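The `1418/1418` counter in the training log is the number of batches per epoch: roughly the training rows left after the validation hold-out, divided by the batch size and rounded up. The helper below is illustrative (not a Keras function), and 100,000 rows is a hypothetical size because the notebook never prints the training-set size.

```python
import math

def steps_per_epoch(n_rows, batch_size, validation_split=0.0):
    """Batches per epoch after holding out a validation fraction."""
    n_fit = int(n_rows * (1.0 - validation_split))  # rows used for gradient updates
    return math.ceil(n_fit / batch_size)            # last batch may be partial

print(steps_per_epoch(100_000, batch_size=64, validation_split=0.1))  # 1407
print(steps_per_epoch(100_000, batch_size=64))                        # 1563
```

Working backwards from the log, 1418 steps at batch size 64 means between 90,689 and 90,752 rows were used for fitting in each epoch.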


In [8]:
model.fit(X_train, y_train, epochs=50, batch_size=64, validation_split=0.1)

Epoch 1/50
1418/1418 ━━━━━━━━━━━━━━━━━━━━ 11s 5ms/step - accuracy: 0.9048 - loss: 0.4531 - val_accuracy: 0.9638 - val_loss: 0.1142
Epoch 2/50
1418/1418 ━━━━━━━━━━━━━━━━━━━━ 7s 5ms/step - accuracy: 0.9625 - loss: 0.1285 - val_accuracy: 0.9735 - val_loss: 0.0813
Epoch 3/50
1418/1418 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - accuracy: 0.9679 - loss: 0.1050 - val_accuracy: 0.9721 - val_loss: 0.0736
Epoch 4/50
1418/1418 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.9712 - loss: 0.0899 - val_accuracy: 0.9767 - val_loss: 0.0649
Epoch 5/50
1418/1418 ━━━━━━━━━━━━━━━━━━━━ 7s 5ms/step - accuracy: 0.9735 - loss: 0.0835 - val_accuracy: 0.9798 - val_loss: 0.0758
Epoch 6/50
1418/1418 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - accuracy: 0.9748 - loss: 0.0778 - val_accuracy: 0.9803 - val_loss: 0.0588
Epoch 7/50

<keras.src.callbacks.history.History at 0x161e5848ad0>

### Evaluating the Model  
- Tests the trained model on **unseen test data (`X_test`, `y_test`)**.  
- `model.evaluate()` returns:  
  - **Loss**: Measures the model's error.  
  - **Accuracy**: The fraction of test samples whose predicted class matches the true class.  
- Prints the **final test accuracy** to assess model performance.  


In [9]:
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Test Accuracy: {accuracy:.4f}')

788/788 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9856 - loss: 0.0536
Test Accuracy: 0.9858


### Saving the Trained Model  
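- Saves the trained neural network model to a file **`pnn_model.h5`** (HDF5 format; recent Keras versions treat `.h5` as a legacy format and recommend the native `.keras` extension).  
- Allows reloading the model later with `load_model` instead of retraining.  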
- Useful for deployment or further evaluation.

In [10]:
model.save(filepath="pnn_model.h5")



### Making Predictions and Evaluating Performance  

1. **Make Predictions**  
   - Runs `model.predict()` on the test set and converts each row of softmax probabilities into a class index with `np.argmax()`.  

2. **Convert True Labels**  
   - Transforms one-hot encoded labels into class labels.  

3. **Import Metrics**  
   - Uses accuracy, precision, recall, F1-score, and classification report.  
   - Adds **False Positive Rate (FPR)** and **Specificity (True Negative Rate)**.  

4. **Calculate Performance Metrics**  
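   - **Accuracy**: Fraction of all predictions that are correct.  
   - **Precision**: Of the samples predicted as a class, the fraction that truly belong to it.  
   - **Recall**: Of the samples that truly belong to a class, the fraction predicted as that class.  
   - **F1 Score**: Harmonic mean of precision and recall.  
   - Precision, recall, and F1 use `average='weighted'`: per-class scores averaged with weights proportional to each class's support.  
   - **FPR (False Positive Rate)**: FP / (FP + TN), the fraction of actual negatives incorrectly predicted as positive.  
   - **Specificity**: TN / (TN + FP), the fraction of actual negatives correctly identified; equal to 1 - FPR.  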

5. **Print Classification Report**  
   - Displays metrics for each class, including precision, recall, and F1-score.  

⚠️ **Note**: FPR and Specificity are computed only when the confusion matrix is 2×2 (binary classification). This target has 23 classes, so the code skips them.  
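The code passes `average='weighted'`, which weights each class's score by its support (number of true samples). With a heavily imbalanced target, as in the report below where classes 9 and 11 hold most of the test rows, the weighted score is dominated by the large classes; `average='macro'` weights every class equally and exposes weak minority classes. A sketch with hypothetical per-class F1 scores and supports:

```python
import numpy as np

# Hypothetical per-class F1 scores and supports (true samples per class)
f1_per_class = np.array([0.99, 0.96, 0.00, 0.50])
support = np.array([9000, 800, 5, 2])

macro_f1 = f1_per_class.mean()                           # every class counts equally
weighted_f1 = np.average(f1_per_class, weights=support)  # large classes dominate

print(f"macro:    {macro_f1:.4f}")
print(f"weighted: {weighted_f1:.4f}")
```

Here a class with F1 = 0 barely moves the weighted average, while the macro average drops sharply; reporting both gives a fuller picture on imbalanced data.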


In [11]:
# Make predictions on the test data
y_pred_prob = model.predict(X_test)

# Convert predictions to class labels
y_pred = np.argmax(y_pred_prob, axis=1)

# Convert true labels from one-hot encoding to class labels
y_true = np.argmax(y_test, axis=1)

# Import necessary metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print(f'Accuracy: {accuracy:.4f}')

# Calculate precision
precision = precision_score(y_true, y_pred, average='weighted')
print(f'Precision: {precision:.4f}')

# Calculate recall
recall = recall_score(y_true, y_pred, average='weighted')
print(f'Recall: {recall:.4f}')

# Calculate F1 score
f1 = f1_score(y_true, y_pred, average='weighted')
print(f'F1 Score: {f1:.4f}')

# Calculate confusion matrix
conf_matrix = confusion_matrix(y_true, y_pred)

# Extract TN, FP, FN, TP for binary classification (assuming a 2-class problem)
if conf_matrix.shape == (2, 2):
    TN, FP, FN, TP = conf_matrix.ravel()

    # Calculate False Positive Rate (FPR)
    FPR = FP / (FP + TN)
    print(f'False Positive Rate (FPR): {FPR:.4f}')

    # Calculate Specificity (True Negative Rate)
    specificity = TN / (TN + FP)
    print(f'Specificity: {specificity:.4f}')
else:
    print("FPR and Specificity are not computed for multi-class classification.")

# Optional: Generate a classification report
print('Classification Report:')
print(classification_report(y_true, y_pred))


788/788 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step
Accuracy: 0.9858
Precision: 0.9850
Recall: 0.9858
F1 Score: 0.9853
FPR and Specificity are not computed for multi-class classification.
Classification Report:
              precision    recall  f1-score   support

           0       0.76      0.72      0.74       185
           1       0.00      0.00      0.00         9
           3       1.00      0.82      0.90        11
           4       0.50      1.00      0.67         1
           5       0.95      0.97      0.96       733
           6       0.17      0.33      0.22         3
           9       1.00      1.00      1.00      8228
          10       0.96      0.96      0.96       313
          11       0.98      0.99      0.99     13422
          12       0.00      0.00      0.00         1
          13       0.00      0.00      0.00         1
          14       1.00      0.93      0.96        43
          15       1.00      0.98      0.99       573
         

(scikit-learn emitted `UndefinedMetricWarning`s here: at least one class has no predicted (or no true) samples in the test split, so the affected precision, recall, or F1 values are ill-defined and reported as 0.0. Passing `zero_division=0` to the metric functions makes that choice explicit and silences the warnings.)
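The binary-only branch never runs for this multi-class target, but FPR and specificity can still be reported per class by treating each class in turn as "positive" (one-vs-rest) and reading TP, FP, FN, and TN off the multi-class confusion matrix, whose rows are true classes and columns are predicted classes in scikit-learn's convention. A NumPy sketch on a hypothetical 3-class matrix; on the notebook's data it would take `conf_matrix` from the cell above. The helper name `one_vs_rest_rates` is illustrative.

```python
import numpy as np

def one_vs_rest_rates(cm):
    """Per-class FPR and specificity from a confusion matrix
    (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as class k, truly another class
    fn = cm.sum(axis=1) - tp      # truly class k, predicted as another class
    tn = cm.sum() - tp - fp - fn  # neither true nor predicted class k
    with np.errstate(divide="ignore", invalid="ignore"):
        fpr = fp / (fp + tn)      # NaN only if every sample is class k
    specificity = 1.0 - fpr       # equals TN / (TN + FP)
    return fpr, specificity

# Hypothetical 3-class confusion matrix
cm = np.array([[50, 2, 0],
               [3, 40, 1],
               [0, 4, 10]])

fpr, spec = one_vs_rest_rates(cm)
for k, (f, s) in enumerate(zip(fpr, spec)):
    print(f"class {k}: FPR={f:.4f}  specificity={s:.4f}")
```

A macro average of these per-class values (`np.nanmean(fpr)`) is one way to summarize them in a single number, with the same caveat about minority classes as the precision and recall averages.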
