# NN-Session-3 Solution: Advanced Neural Network Tuning

This notebook provides complete solutions for the NN-Session-3 exercises.
It compares network depth, width, learning rate, and training callbacks on the 18-application Sherlock dataset, with 99% test accuracy as the target.

## 1. Setup and Library Imports

In [1]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn

from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

%matplotlib inline

np.random.seed(42)
tf.random.set_seed(42)

print(f"TensorFlow version: {tf.__version__}")

TensorFlow version: 2.20.0




## 2. Data Preparation

In [4]:
# SOLUTION: Load and preprocess 18-apps dataset
df = pd.read_csv("../sherlock/sherlock_18apps.csv", index_col=0)

# Data cleaning
df2 = df.copy()
df2 = df2.drop(['Unnamed: 0'], axis=1, errors='ignore')

# Drop columns with too many missing values (>50% missing)
missing_threshold = 0.5
missing_percent = df2.isna().sum() / len(df2)
cols_to_drop = missing_percent[missing_percent > missing_threshold].index.tolist()
print(f"Columns with >{missing_threshold*100}% missing values: {cols_to_drop}")
df2 = df2.drop(columns=cols_to_drop)

# Remove rows with remaining missing values
df2.dropna(inplace=True)

# Separate labels and features
labels = df2['ApplicationName']
df_features = df2.drop('ApplicationName', axis=1)

# Feature scaling - handle numeric and categorical features separately
numeric_features = df_features.select_dtypes(include=[np.number])
categorical_features = df_features.select_dtypes(exclude=[np.number])

print(f"Numeric features: {numeric_features.shape[1]} columns")
print(f"Categorical features: {categorical_features.shape[1]} columns")

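# Scale only numeric features.
# Note: the scaler is fitted on all rows before the train/test split, so
# test-set statistics influence the scaling. Fitting on the training rows
# only would avoid this leakage.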
scaler = preprocessing.StandardScaler()
scaler.fit(numeric_features)
numeric_features_n = pd.DataFrame(scaler.transform(numeric_features),
                                   columns=numeric_features.columns,
                                   index=numeric_features.index)

# Combine scaled numeric features with categorical features
if categorical_features.shape[1] > 0:
    df_features_n = pd.concat([numeric_features_n, categorical_features], axis=1)
else:
    df_features_n = numeric_features_n

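# One-hot encoding (dtype=float keeps every column numeric, so the combined
# feature matrix converts cleanly to a float array for Keras)
df_labels_onehot = pd.get_dummies(labels, dtype=float)
df_features_encoded = pd.get_dummies(df_features_n, dtype=float)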

# Train-test split
train_F, test_F, train_L, test_L = train_test_split(
    df_features_encoded, df_labels_onehot, test_size=0.2, random_state=42
)

print(f"\nTraining set: {train_F.shape}")
print(f"Test set: {test_F.shape}")
print(f"Number of classes: {train_L.shape[1]}")


Columns with >50.0% missing values: ['cminflt']
Numeric features: 16 columns
Categorical features: 1 columns

Training set: (218461, 20)
Test set: (54616, 20)
Number of classes: 18
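
The scaler above is fitted on every row before the split, so test-set statistics leak into the scaling. A minimal sketch of the alternative, assuming a purely numeric feature matrix: split first, then compute the mean and standard deviation from the training rows only. The helper names (`split_indices`, `standardize_with_train_stats`), the 60/20/20 proportions, and the toy data are hypothetical, and plain NumPy stands in for `StandardScaler` to keep the example self-contained.

```python
import numpy as np

def split_indices(n_rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle row indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return train_idx, val_idx, test_idx

def standardize_with_train_stats(X, train_idx):
    """Scale all rows using the mean and std of the training rows only."""
    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return (X - mean) / std

# Toy data standing in for the numeric feature columns (assumption)
X = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(1000, 4))
train_idx, val_idx, test_idx = split_indices(len(X))
X_scaled = standardize_with_train_stats(X, train_idx)

print(len(train_idx), len(val_idx), len(test_idx))
print(X_scaled[train_idx].mean(axis=0).round(6))  # approximately zero per column
```

The validation part could then serve as `validation_data` during training, which leaves the test set for the final evaluation only.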


## 3. Hyperparameter Tuning Experiments

In [5]:
# SOLUTION: Define function to create models with different architectures
def create_model(hidden_layers=(128, 64, 32), dropout_rate=0.2, learning_rate=0.001):
    """
    Create a neural network with specified architecture.
    
    Parameters:
    -----------
    hidden_layers : sequence of int
        Number of units in each hidden layer
    dropout_rate : float
        Dropout rate applied after each hidden layer
    learning_rate : float
        Learning rate for the Adam optimizer
    """
    model = Sequential()
    
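    # Explicit input layer (avoids the Keras warning about passing
    # input_shape to a layer), followed by the first hidden layer
    model.add(keras.Input(shape=(train_F.shape[1],)))
    model.add(Dense(hidden_layers[0], activation='relu'))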
    model.add(BatchNormalization())
    model.add(Dropout(dropout_rate))
    
    # Additional hidden layers
    for units in hidden_layers[1:]:
        model.add(Dense(units, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout_rate))
    
    # Output layer
    model.add(Dense(train_L.shape[1], activation='softmax'))
    
    # Compile
    optimizer = Adam(learning_rate=learning_rate)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    
    return model

print("Model creation function defined!")

Model creation function defined!
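
To compare how much capacity each architecture in the experiments carries, the parameters can be counted by hand. A rough sketch, assuming 20 input features and 18 classes as printed in the data preparation output, and assuming each `BatchNormalization` layer holds four values per unit (scale, offset, moving mean, moving variance). `count_params` is a hypothetical helper, not a Keras function.

```python
def count_params(n_inputs, hidden_layers, n_classes):
    """Count parameters of a Dense + BatchNormalization stack with a softmax head."""
    total = 0
    fan_in = n_inputs
    for units in hidden_layers:
        total += fan_in * units + units  # Dense weights + biases
        total += 4 * units               # BatchNormalization: gamma, beta, moving mean, moving variance
        fan_in = units
    total += fan_in * n_classes + n_classes  # output Dense layer
    return total

architectures = {
    "Baseline": [128, 64, 32],
    "Deeper": [256, 128, 64, 32],
    "Wider": [512, 256, 128],
}
for name, layers in architectures.items():
    print(f"{name:8s} {count_params(20, layers, 18):>8,d} parameters")
```

Dropout layers add no parameters, so these totals should match the total that `model.summary()` reports for the corresponding model.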


### Experiment 1: Baseline Model

In [None]:
# SOLUTION: Baseline model
print("\n" + "="*60)
print("EXPERIMENT 1: BASELINE MODEL")
print("="*60)

model_baseline = create_model(hidden_layers=[128, 64, 32], dropout_rate=0.2, learning_rate=0.001)

print("\nModel Architecture:")
model_baseline.summary()

history_baseline = model_baseline.fit(
    train_F, train_L,
    epochs=50, batch_size=32,
    validation_data=(test_F, test_L),
    verbose=0
)

test_loss, test_acc = model_baseline.evaluate(test_F, test_L, verbose=0)
print(f"\nBaseline Model Accuracy: {test_acc:.4f}")


EXPERIMENT 1: BASELINE MODEL

Model Architecture:




### Experiment 2: Deeper Network

In [None]:
# SOLUTION: Deeper network
print("\n" + "="*60)
print("EXPERIMENT 2: DEEPER NETWORK")
print("="*60)

model_deep = create_model(hidden_layers=[256, 128, 64, 32], dropout_rate=0.3, learning_rate=0.001)

history_deep = model_deep.fit(
    train_F, train_L,
    epochs=50, batch_size=32,
    validation_data=(test_F, test_L),
    verbose=0
)

test_loss, test_acc_deep = model_deep.evaluate(test_F, test_L, verbose=0)
print(f"Deeper Network Accuracy: {test_acc_deep:.4f}")

### Experiment 3: Wider Network

In [None]:
# SOLUTION: Wider network
print("\n" + "="*60)
print("EXPERIMENT 3: WIDER NETWORK")
print("="*60)

model_wide = create_model(hidden_layers=[512, 256, 128], dropout_rate=0.2, learning_rate=0.001)

history_wide = model_wide.fit(
    train_F, train_L,
    epochs=50, batch_size=32,
    validation_data=(test_F, test_L),
    verbose=0
)

test_loss, test_acc_wide = model_wide.evaluate(test_F, test_L, verbose=0)
print(f"Wider Network Accuracy: {test_acc_wide:.4f}")

### Experiment 4: Different Learning Rate

In [None]:
# SOLUTION: Different learning rate
print("\n" + "="*60)
print("EXPERIMENT 4: DIFFERENT LEARNING RATE")
print("="*60)

model_lr = create_model(hidden_layers=[256, 128, 64], dropout_rate=0.2, learning_rate=0.0005)

history_lr = model_lr.fit(
    train_F, train_L,
    epochs=50, batch_size=32,
    validation_data=(test_F, test_L),
    verbose=0
)

test_loss, test_acc_lr = model_lr.evaluate(test_F, test_L, verbose=0)
print(f"Lower Learning Rate Accuracy: {test_acc_lr:.4f}")

### Experiment 5: With Early Stopping

In [None]:
# SOLUTION: Model with early stopping
print("\n" + "="*60)
print("EXPERIMENT 5: WITH EARLY STOPPING")
print("="*60)

model_es = create_model(hidden_layers=[256, 128, 64, 32], dropout_rate=0.2, learning_rate=0.001)

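# Stop once val_loss has not improved for 10 epochs and restore the best weights.
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
# Halve the learning rate after 5 epochs without improvement, down to a floor of 1e-5.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=0.00001)

# Note: validation_data below is the test set, so both callbacks are steered by
# test data and the reported test accuracy is optimistically biased. A separate
# validation split would give an unbiased estimate.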

history_es = model_es.fit(
    train_F, train_L,
    epochs=100, batch_size=32,
    validation_data=(test_F, test_L),
    callbacks=[early_stop, reduce_lr],
    verbose=0
)

test_loss, test_acc_es = model_es.evaluate(test_F, test_L, verbose=0)
print(f"With Early Stopping Accuracy: {test_acc_es:.4f}")
print(f"Epochs trained: {len(history_es.history['loss'])}")
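
The patience rule used by `EarlyStopping` can be illustrated without training a network. The sketch below is a simplified model of that rule, not Keras's implementation: it ignores `min_delta` and the other options, and both `early_stopping_epoch` and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based, under a patience rule.

    Training stops once `patience` epochs in a row fail to improve on the
    best validation loss seen so far. If that never happens, stop_epoch is
    the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improvement stalls after epoch 2
val_losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.90]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stopped after epoch {stop_epoch}, best weights from epoch {best_epoch}")
```

With `restore_best_weights=True`, the model evaluated afterwards corresponds to the best epoch rather than the last one trained.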

## 4. Results Comparison

In [None]:
# SOLUTION: Create comprehensive comparison
results = pd.DataFrame({
    'Experiment': [
        'Baseline (128-64-32)',
        'Deeper (256-128-64-32)',
        'Wider (512-256-128)',
        'Lower LR (0.0005)',
        'Early Stopping'
    ],
    'Architecture': [
        '[128, 64, 32]',
        '[256, 128, 64, 32]',
        '[512, 256, 128]',
        '[256, 128, 64]',
        '[256, 128, 64, 32]'
    ],
    'Learning Rate': [0.001, 0.001, 0.001, 0.0005, 0.001],
    'Test Accuracy': [test_acc, test_acc_deep, test_acc_wide, test_acc_lr, test_acc_es]
})

print("\n" + "="*80)
print("COMPREHENSIVE RESULTS COMPARISON")
print("="*80)
print(results.to_string(index=False))
print("="*80)

best_idx = results['Test Accuracy'].idxmax()
print(f"\nBest Model: {results.loc[best_idx, 'Experiment']}")
print(f"Best Accuracy: {results.loc[best_idx, 'Test Accuracy']:.4f}")

In [None]:
# SOLUTION: Visualize training history comparison
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
axes = axes.flatten()

histories = [
    ('Baseline', history_baseline),
    ('Deeper', history_deep),
    ('Wider', history_wide),
    ('Lower LR', history_lr),
    ('Early Stopping', history_es)
]

for idx, (name, history) in enumerate(histories):
    ax = axes[idx]
    ax.plot(history.history['accuracy'], label='Training', linewidth=2)
    ax.plot(history.history['val_accuracy'], label='Validation', linewidth=2)
    ax.set_title(f'{name}', fontweight='bold')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Accuracy')
    ax.legend()
    ax.grid(True, alpha=0.3)
    ax.set_ylim([0, 1])

# Remove extra subplot
fig.delaxes(axes[5])

plt.tight_layout()
plt.show()

In [None]:
# SOLUTION: Bar chart comparison
fig, ax = plt.subplots(figsize=(12, 6))

colors = ['steelblue', 'coral', 'lightgreen', 'gold', 'plum']
bars = ax.bar(range(len(results)), results['Test Accuracy'], color=colors, edgecolor='black', linewidth=2, alpha=0.8)

ax.set_ylabel('Test Accuracy', fontsize=12)
ax.set_title('Neural Network Tuning: Accuracy Comparison', fontsize=14, fontweight='bold')
ax.set_xticks(range(len(results)))
ax.set_xticklabels(results['Experiment'], rotation=15, ha='right')
ax.set_ylim([0, 1])
ax.axhline(y=0.99, color='red', linestyle='--', linewidth=2, label='Target (99%)')
ax.grid(axis='y', alpha=0.3)
ax.legend()

for bar, acc in zip(bars, results['Test Accuracy']):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{acc:.4f}', ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.show()

## 5. Hyperparameter Analysis

In [None]:
# SOLUTION: Analyze impact of different hyperparameters
print("\n" + "="*60)
print("HYPERPARAMETER IMPACT ANALYSIS")
print("="*60)

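# Note: only the "Wider" run changes a single setting relative to the baseline.
# The other runs also change width, dropout, or the epoch budget, so their
# differences are indicative rather than isolated effects.
# Differences are reported in percentage points (pp).
print("\n1. Network Depth Impact:")
print(f"   Baseline (3 hidden layers, dropout 0.2): {test_acc:.4f}")
print(f"   Deeper (4 hidden layers, dropout 0.3): {test_acc_deep:.4f}")
print(f"   Impact: {(test_acc_deep - test_acc)*100:+.2f} pp")

print("\n2. Network Width Impact:")
print(f"   Baseline (128-64-32): {test_acc:.4f}")
print(f"   Wider (512-256-128): {test_acc_wide:.4f}")
print(f"   Impact: {(test_acc_wide - test_acc)*100:+.2f} pp")

print("\n3. Learning Rate Impact:")
print(f"   Standard (0.001, 128-64-32): {test_acc:.4f}")
print(f"   Lower (0.0005, 256-128-64): {test_acc_lr:.4f}")
print(f"   Impact: {(test_acc_lr - test_acc)*100:+.2f} pp")

print("\n4. Early Stopping Impact:")
print(f"   Without (128-64-32, 50 epochs): {test_acc:.4f}")
print(f"   With (256-128-64-32, up to 100 epochs): {test_acc_es:.4f}")
print(f"   Impact: {(test_acc_es - test_acc)*100:+.2f} pp")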

## 6. Key Findings and Recommendations

### Observations:

1. **Network Architecture**:
   - Deeper networks can capture more complex patterns
   - Wider networks provide more capacity per layer
   - Balance is key: a network that is too deep or too wide can overfit and takes longer to train

2. **Learning Rate**:
   - Lower learning rates take smaller steps, which allows finer convergence near a minimum
   - They may require more epochs to converge
   - Adaptive optimizers such as Adam adjust the step size per parameter, which helps convergence

3. **Regularization**:
   - Dropout reduces overfitting by randomly disabling units during training
   - Batch normalization stabilizes training by normalizing layer inputs per batch
   - Early stopping halts training once validation loss stops improving

4. **Callbacks**:
   - Early stopping saves training time
   - Learning rate reduction on a plateau helps fine-tuning late in training
   - Combining the two callbacks can improve results
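
The learning-rate reduction listed under Callbacks can be traced by hand. A minimal sketch, assuming the settings from Experiment 5 as defaults (`factor=0.5`, `patience=5`, `min_lr=1e-5`) and ignoring options such as cooldown. It is a simplified model of the plateau rule, not Keras's implementation, and `simulate_reduce_lr` and the loss sequence are hypothetical.

```python
def simulate_reduce_lr(val_losses, lr=0.001, factor=0.5, patience=5, min_lr=1e-5):
    """Return the learning rate in effect after each epoch under a plateau rule."""
    best_loss = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best_loss:
            best_loss, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never go below the floor
                wait = 0
        schedule.append(lr)
    return schedule

# Hypothetical losses: two improving epochs, then a long plateau
val_losses = [1.0, 0.9] + [0.9] * 12
schedule = simulate_reduce_lr(val_losses)
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

Each run of five stagnant epochs halves the rate, so a long plateau steps the optimizer down gradually instead of stopping training outright.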

### Recommendations for Achieving >99% Accuracy:

1. **Increase Model Capacity**: Use deeper/wider networks
2. **Fine-tune Learning Rate**: Use adaptive learning rate schedules
3. **Add Regularization**: Use dropout and batch normalization
4. **Use Callbacks**: Implement early stopping and LR reduction
5. **Data Augmentation**: Generate synthetic training data
6. **Ensemble Methods**: Combine multiple models
7. **Feature Engineering**: Create more discriminative features
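
Recommendation 6 can be sketched as soft voting: average the class probabilities predicted by several models and take the most probable class. The arrays below are made-up softmax outputs for three hypothetical models on four samples. In the notebook they would come from calls such as `model_deep.predict(test_F)`. `soft_vote` is an illustrative helper, not a library function.

```python
import numpy as np

def soft_vote(prob_list):
    """Average per-model class probabilities and return the predicted class indices."""
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs.argmax(axis=1)

# Made-up softmax outputs: 3 models, 4 samples, 3 classes
probs_a = np.array([[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.1, 0.8], [0.3, 0.4, 0.3]])
probs_b = np.array([[0.6, 0.3, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.2, 0.3, 0.5]])
probs_c = np.array([[0.8, 0.1, 0.1], [0.5, 0.4, 0.1], [0.1, 0.3, 0.6], [0.1, 0.3, 0.6]])

ensemble_pred = soft_vote([probs_a, probs_b, probs_c])
print("model a alone:", probs_a.argmax(axis=1))
print("ensemble:     ", ensemble_pred)
```

Averaging lets the other two models outvote a single model's disagreeing prediction, which is the effect an ensemble relies on.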

### Cybersecurity Applications:

- **High-Accuracy Classification**: Critical for security systems
- **Malware Detection**: Targets >99% accuracy; because a false negative is a missed threat, per-class recall should be checked alongside overall accuracy
- **Behavioral Analysis**: Detect anomalous application behavior
- **Real-time Monitoring**: Efficient models for mobile deployment
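
Accuracy alone can hide false negatives when classes are imbalanced. A small illustration with made-up labels (not the Sherlock data): a classifier that never flags malware still reaches 99% accuracy on a set with 1% malware, while its recall on the malware class is zero. `accuracy_and_recall` is a hypothetical helper written for this example.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return overall accuracy and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_neg = sum(t == positive and p != positive for t, p in pairs)
    accuracy = correct / len(pairs)
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall

# Made-up labels: 990 benign samples (0) and 10 malware samples (1)
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000  # a degenerate classifier that always predicts "benign"

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, malware recall = {recall:.2f}")
```

For the real predictions, `classification_report` (already imported in the setup cell) reports per-class recall alongside accuracy.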