# NN-Session-2 Solution: Multi-Class Classification with Keras

This notebook provides complete solutions for the NN-session-2 exercises.
It demonstrates how to build neural networks for multi-class classification using Keras on the 18-apps dataset.

## 1. Setup and Library Imports

In [18]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn

from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

%matplotlib inline

np.random.seed(42)
tf.random.set_seed(42)

print(f"TensorFlow version: {tf.__version__}")

TensorFlow version: 2.20.0


## 2. Loading and Exploring the 18-Apps Dataset

In [19]:
# SOLUTION: Load the 18-apps dataset
df = pd.read_csv("../sherlock/sherlock_18apps.csv", index_col=0)

print("Dataset Shape:", df.shape)
print("\nFirst few rows:")
print(df.head())
print("\nDataset Info:")
df.info()
print("\nBasic Statistics:")
print(df.describe().T)

Dataset Shape: (273129, 19)

First few rows:
   ApplicationName  CPU_USAGE  UidRxBytes  UidRxPackets  UidTxBytes  \
0            Gmail       0.13           0             0           0   
6         Hangouts       1.65           0             0           0   
11       Messenger       0.21           0             0           0   
18        Geo News       0.03           0             0           0   
19        Facebook       0.20           0             0           0   

    UidTxPackets  cutime  guest_time  importance  lru  num_threads  \
0              0     0.0         0.0         400   15         32.0   
6              0     0.0         0.0         400   15         17.0   
11             0     0.0         0.0         300    0         72.0   
18             0     0.0         0.0         300    0         14.0   
19             0     0.0         0.0         300    0         77.0   

    otherPrivateDirty  priority      rss state  stime  utime         vsize  \
0                8300      20

In [20]:
# SOLUTION: Explore application distribution
app_frequencies = df['ApplicationName'].value_counts()
print(f"Number of unique applications: {len(app_frequencies)}")
print(f"\nApplication Frequencies:")
print(app_frequencies)
print(f"\nTotal records: {len(df)}")

Number of unique applications: 18

Application Frequencies:
ApplicationName
Google App          60001
Chrome              28046
Facebook            20103
Geo News            19991
Messenger           19989
WhatsApp            19985
Photos              17382
ES File Explorer    16667
Gmail               16417
Calendar             8996
Moovit               8365
Waze                 8237
Hangouts             7608
YouTube              5173
Maps                 5159
Skype                4877
Moriarty             3616
Messages             2517
Name: count, dtype: int64

Total records: 273129
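
Because the classes are imbalanced, two trivial baselines help put later accuracy figures in context: uniform random guessing and always predicting the most frequent class. The sketch below recomputes both from the counts printed above. The dictionary `app_counts` simply transcribes that output and is not a variable in the notebook.

```python
# Class counts transcribed from the value_counts() output above.
app_counts = {
    "Google App": 60001, "Chrome": 28046, "Facebook": 20103,
    "Geo News": 19991, "Messenger": 19989, "WhatsApp": 19985,
    "Photos": 17382, "ES File Explorer": 16667, "Gmail": 16417,
    "Calendar": 8996, "Moovit": 8365, "Waze": 8237,
    "Hangouts": 7608, "YouTube": 5173, "Maps": 5159,
    "Skype": 4877, "Moriarty": 3616, "Messages": 2517,
}

total = sum(app_counts.values())

# Baseline 1: uniform random guessing over all classes.
chance_accuracy = 1 / len(app_counts)

# Baseline 2: always predict the most frequent class.
majority_class = max(app_counts, key=app_counts.get)
majority_accuracy = app_counts[majority_class] / total

print(f"Total records: {total}")
print(f"Chance accuracy: {chance_accuracy:.4f}")
print(f"Majority class: {majority_class} ({majority_accuracy:.4f})")
```

Any model worth keeping should clearly beat the majority-class figure of roughly 22%, not just the chance figure of roughly 5.6%.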


## 3. Data Preprocessing

In [21]:
# SOLUTION: Data cleaning and preprocessing
df2 = df.copy()

# Remove irrelevant columns
df2 = df2.drop(['Unnamed: 0'], axis=1, errors='ignore')

# Drop columns with too many missing values (>50% missing)
missing_threshold = 0.5
missing_percent = df2.isna().sum() / len(df2)
cols_to_drop = missing_percent[missing_percent > missing_threshold].index.tolist()

print(f"Columns with >{missing_threshold*100}% missing values: {cols_to_drop}")
df2 = df2.drop(columns=cols_to_drop)

# Remove rows with remaining missing values
df2.dropna(inplace=True)

print(f"\nAfter cleaning - Shape: {df2.shape}")
print(f"Missing values: {df2.isna().sum().sum()}")


Columns with >50.0% missing values: ['cminflt']

After cleaning - Shape: (273077, 18)
Missing values: 0


In [22]:
# SOLUTION: Separate labels from features
labels = df2['ApplicationName']
df_features = df2.drop('ApplicationName', axis=1)

print(f"Features shape: {df_features.shape}")
print(f"Labels shape: {labels.shape}")
print(f"\nFeature columns: {df_features.columns.tolist()}")

# Check data types
print(f"\nData types:")
print(df_features.dtypes)
print(f"\nNon-numeric columns: {df_features.select_dtypes(exclude=[np.number]).columns.tolist()}")


Features shape: (273077, 17)
Labels shape: (273077,)

Feature columns: ['CPU_USAGE', 'UidRxBytes', 'UidRxPackets', 'UidTxBytes', 'UidTxPackets', 'cutime', 'guest_time', 'importance', 'lru', 'num_threads', 'otherPrivateDirty', 'priority', 'rss', 'state', 'stime', 'utime', 'vsize']

Data types:
CPU_USAGE            float64
UidRxBytes             int64
UidRxPackets           int64
UidTxBytes             int64
UidTxPackets           int64
cutime               float64
guest_time           float64
importance             int64
lru                    int64
num_threads          float64
otherPrivateDirty      int64
priority             float64
rss                  float64
state                 object
stime                float64
utime                float64
vsize                float64
dtype: object

Non-numeric columns: ['state']


In [23]:
# SOLUTION: Feature scaling
# Separate numeric and categorical features
numeric_features = df_features.select_dtypes(include=[np.number])
categorical_features = df_features.select_dtypes(exclude=[np.number])

print(f"Numeric features: {numeric_features.shape[1]} columns")
print(f"Categorical features: {categorical_features.shape[1]} columns")
if categorical_features.shape[1] > 0:
    print(f"Categorical columns: {categorical_features.columns.tolist()}")

# Scale only numeric features
scaler = preprocessing.StandardScaler()
scaler.fit(numeric_features)
numeric_features_n = pd.DataFrame(scaler.transform(numeric_features),
                                   columns=numeric_features.columns,
                                   index=numeric_features.index)

# Combine scaled numeric features with categorical features
if categorical_features.shape[1] > 0:
    df_features_n = pd.concat([numeric_features_n, categorical_features], axis=1)
else:
    df_features_n = numeric_features_n

print("\nFeatures normalized successfully!")
print(f"Combined features shape: {df_features_n.shape}")
print(f"\nNormalized features statistics:")
print(df_features_n.describe())


Numeric features: 16 columns
Categorical features: 1 columns
Categorical columns: ['state']

Features normalized successfully!
Combined features shape: (273077, 17)

Normalized features statistics:
          CPU_USAGE    UidRxBytes  UidRxPackets    UidTxBytes  UidTxPackets  \
count  2.730770e+05  2.730770e+05  2.730770e+05  2.730770e+05  2.730770e+05   
mean  -6.661086e-18 -8.326358e-19 -2.081589e-19 -2.497907e-18 -1.040795e-18   
std    1.000002e+00  1.000002e+00  1.000002e+00  1.000002e+00  1.000002e+00   
min   -2.063179e-01 -1.820392e-02 -4.092106e-01 -1.026034e-02 -5.732477e-02   
25%   -1.907310e-01 -1.062313e-02 -1.506781e-02 -8.245287e-03 -1.602201e-02   
50%   -1.657920e-01 -1.062313e-02 -1.506781e-02 -8.245287e-03 -1.602201e-02   
75%   -9.097502e-02 -1.062313e-02 -1.506781e-02 -8.245287e-03 -1.602201e-02   
max    3.436225e+01  2.402136e+02  2.208840e+02  3.301378e+02  2.786950e+02   

             cutime  guest_time    importance           lru   num_threads  \
count  2.7307
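
The cell above fits the scaler on the full dataset before the train/test split, so test-set statistics influence the scaling. A common alternative is to fit the scaler on the training rows only and reuse those statistics for the test rows. The following is a minimal sketch on synthetic data. `X_demo` and the other names are illustrative and are not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric feature matrix (assumption for illustration).
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(1000, 4))

# Split first, so the test rows never contribute to the scaling statistics.
X_train, X_test = train_test_split(X_demo, test_size=0.2, random_state=42)

scaler_demo = StandardScaler()
X_train_scaled = scaler_demo.fit_transform(X_train)  # learn mean/std on train only
X_test_scaled = scaler_demo.transform(X_test)        # reuse the training statistics

print("Train mean per column:", X_train_scaled.mean(axis=0).round(3))
print("Test mean per column: ", X_test_scaled.mean(axis=0).round(3))
```

The training columns come out with zero mean and unit variance. The test columns are close to that but not exact, which is expected when the statistics come from the training rows.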

In [24]:
# SOLUTION: One-hot encoding for labels
df_labels_onehot = pd.get_dummies(labels)

print(f"One-hot encoded labels shape: {df_labels_onehot.shape}")
print(f"\nFirst 5 rows of one-hot encoded labels:")
print(df_labels_onehot.head())
print(f"\nNumber of classes: {df_labels_onehot.shape[1]}")

One-hot encoded labels shape: (273077, 18)

First 5 rows of one-hot encoded labels:
    Calendar  Chrome  ES File Explorer  Facebook  Geo News  Gmail  Google App  \
0      False   False             False     False     False   True       False   
6      False   False             False     False     False  False       False   
11     False   False             False     False     False  False       False   
18     False   False             False     False      True  False       False   
19     False   False             False      True     False  False       False   

    Hangouts   Maps  Messages  Messenger  Moovit  Moriarty  Photos  Skype  \
0      False  False     False      False   False     False   False  False   
6       True  False     False      False   False     False   False  False   
11     False  False     False       True   False     False   False  False   
18     False  False     False      False   False     False   False  False   
19     False  False     False      False   F
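
One-hot encoding is reversible: each row has exactly one active column, and `idxmax(axis=1)` returns that column's name. The later scikit-learn cells use the same call to recover string labels. A small sketch with a made-up label series follows. `labels_demo` is illustrative, and the `dtype` argument is optional (the notebook's own call uses the default, which yields booleans as shown above).

```python
import pandas as pd

# Illustrative labels, not taken from the dataset.
labels_demo = pd.Series(["Gmail", "Hangouts", "Messenger", "Gmail"])

# One column per class; dtype="float32" gives 0.0/1.0 instead of False/True.
onehot_demo = pd.get_dummies(labels_demo, dtype="float32")

# Invert the encoding: pick the column name holding the 1 in each row.
recovered = onehot_demo.idxmax(axis=1)

print(onehot_demo)
print(recovered.tolist())
```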

In [25]:
# SOLUTION: One-hot encoding for categorical features
df_features_encoded = pd.get_dummies(df_features_n)

print(f"Features after one-hot encoding: {df_features_encoded.shape}")
print(f"\nFeature columns: {df_features_encoded.columns.tolist()[:10]}...")  # Show first 10

Features after one-hot encoding: (273077, 20)

Feature columns: ['CPU_USAGE', 'UidRxBytes', 'UidRxPackets', 'UidTxBytes', 'UidTxPackets', 'cutime', 'guest_time', 'importance', 'lru', 'num_threads']...


In [26]:
# SOLUTION: Train-test split
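# get_dummies returns boolean columns; cast to float32 so that Keras and
# scikit-learn receive a homogeneous numeric array.
train_F, test_F, train_L, test_L = train_test_split(
    df_features_encoded.astype("float32"),
    df_labels_onehot.astype("float32"),
    test_size=0.2, random_state=42
)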

print(f"Training set: {train_F.shape}")
print(f"Test set: {test_F.shape}")
print(f"\nNumber of features: {train_F.shape[1]}")
print(f"Number of classes: {train_L.shape[1]}")

Training set: (218461, 20)
Test set: (54616, 20)

Number of features: 20
Number of classes: 18
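
With imbalanced classes, a plain random split can leave small classes slightly over- or under-represented in the test set. `train_test_split` accepts a `stratify` argument that preserves class proportions in both parts. The sketch below uses synthetic labels. In this notebook the natural candidate for `stratify` would be the `labels` Series, which is an assumption about how one might adapt the cell above, not something the original solution does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, deliberately imbalanced labels: 90% class 0, 10% class 1.
y_demo = np.array([0] * 900 + [1] * 100)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps the 90/10 ratio in both the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("Train share of class 1:", y_tr.mean())
print("Test share of class 1: ", y_te.mean())
```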


## 4. Building Neural Network Models

In [27]:
# SOLUTION: Define a multi-layer neural network for multi-class classification
def NN_multiclass_clf(learning_rate=0.001, hidden_units=128):
    """
    Create a multi-layer neural network for multi-class classification.
    
    Parameters:
    -----------
    learning_rate : float
        Learning rate for Adam optimizer
    hidden_units : int
        Number of units in the first hidden layer (later layers use 64 and 32)
    
    Returns:
    --------
    model : Sequential
        Compiled Keras model
    """
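    model = Sequential([
        # Declare the input shape with an explicit Input object; recent Keras
        # versions warn when input_shape is passed to the first Dense layer.
        keras.Input(shape=(train_F.shape[1],)),
        Dense(hidden_units, activation='relu'),
        Dropout(0.2),
        Dense(64, activation='relu'),
        Dropout(0.2),
        Dense(32, activation='relu'),
        # One output unit per class; softmax turns the outputs into probabilities.
        Dense(train_L.shape[1], activation='softmax')
    ])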
    
    adam = Adam(learning_rate=learning_rate)
    model.compile(optimizer=adam,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    
    return model

print("Function NN_multiclass_clf created successfully!")

Function NN_multiclass_clf created successfully!
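
The size of this architecture can be sanity-checked by hand. A Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and Dropout layers add no parameters. The sketch below assumes the default `hidden_units=128` and the 20 input features and 18 classes reported by the train-test split. The helper `dense_params` is hypothetical and exists only for this illustration.

```python
# Layer widths: 20 inputs -> 128 -> 64 -> 32 -> 18 outputs.
layer_sizes = [20, 128, 64, 32, 18]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Pair each layer width with the next one to get (inputs, outputs) per Dense layer.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print("Parameters per Dense layer:", per_layer)
print("Total trainable parameters:", total_params)
```

The total should match the trainable-parameter count that `model_nn.summary()` reports for the same settings.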


In [28]:
# SOLUTION: Create and display model architecture
model_nn = NN_multiclass_clf(learning_rate=0.001, hidden_units=128)

print("\nNeural Network Architecture:")
print("="*60)
model_nn.summary()
print("="*60)


Neural Network Architecture:






## 5. Training the Neural Network

In [None]:
# SOLUTION: Train the neural network
print("Training the neural network...")
print("="*60)

history_nn = model_nn.fit(
    train_F, train_L,
    epochs=50, batch_size=32,
    validation_data=(test_F, test_L),
    verbose=1
)

print("="*60)
print("Training completed!")

Training the neural network...
Epoch 1/50
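
The training cell above passes the test set as `validation_data` and always runs the full 50 epochs. One possible variation is to hold out part of the training data with `validation_split` and stop when the validation loss stops improving. The sketch below uses a tiny synthetic dataset and a deliberately small model. All names ending in `_demo`, the layer sizes, and the `patience` value are illustrative assumptions, not the notebook's settings.

```python
import numpy as np
from tensorflow import keras

# Tiny synthetic problem: 600 samples, 20 features, 3 classes (illustration only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(600, 20)).astype("float32")
y_int = rng.integers(0, 3, size=600)
y_demo = keras.utils.to_categorical(y_int, num_classes=3)

model_demo = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])
model_demo.compile(optimizer="adam",
                   loss="categorical_crossentropy",
                   metrics=["accuracy"])

# Stop once val_loss has not improved for 2 epochs and keep the best weights.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True
)

history_demo = model_demo.fit(
    X_demo, y_demo,
    epochs=20, batch_size=32,
    validation_split=0.2,      # last 20% of the training data is held out
    callbacks=[early_stop],
    verbose=0,
)

print("Epochs actually run:", len(history_demo.history["loss"]))
```

With this arrangement the test set is touched only once, in the final evaluation.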


In [None]:
# SOLUTION: Plot training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(history_nn.history['loss'], label='Training Loss', linewidth=2)
axes[0].plot(history_nn.history['val_loss'], label='Validation Loss', linewidth=2)
axes[0].set_xlabel('Epoch', fontsize=12)
axes[0].set_ylabel('Loss', fontsize=12)
axes[0].set_title('Model Loss', fontsize=13, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(history_nn.history['accuracy'], label='Training Accuracy', linewidth=2)
axes[1].plot(history_nn.history['val_accuracy'], label='Validation Accuracy', linewidth=2)
axes[1].set_xlabel('Epoch', fontsize=12)
axes[1].set_ylabel('Accuracy', fontsize=12)
axes[1].set_title('Model Accuracy', fontsize=13, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Final Training Accuracy: {history_nn.history['accuracy'][-1]:.4f}")
print(f"Final Validation Accuracy: {history_nn.history['val_accuracy'][-1]:.4f}")

## 6. Model Evaluation

In [None]:
# SOLUTION: Evaluate neural network
test_loss, test_accuracy = model_nn.evaluate(test_F, test_L, verbose=0)

print("\n" + "="*60)
print("NEURAL NETWORK EVALUATION")
print("="*60)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")
print(f"Target Accuracy: >99%")
print(f"Status: {'✓ ACHIEVED' if test_accuracy > 0.99 else '✗ NOT YET'}")
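
Overall accuracy can hide weak classes when the data is imbalanced. A per-class view is available by converting softmax outputs and one-hot labels to class indices with `argmax` and passing them to `confusion_matrix` and `classification_report`, both already imported in the setup cell. The sketch below uses made-up probabilities as a stand-in for `model_nn.predict(test_F)`. The three class names are an arbitrary subset chosen for illustration.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

class_names = ["Gmail", "Chrome", "Waze"]  # illustrative subset of the 18 apps

# Stand-ins for the model's softmax output and the one-hot test labels.
probs_demo = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
onehot_true = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

# argmax turns each row into the index of its most probable / true class.
y_pred = probs_demo.argmax(axis=1)
y_true = onehot_true.argmax(axis=1)

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
print(cm)
print(classification_report(y_true, y_pred,
                            target_names=class_names, zero_division=0))
```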

## 7. Comparison with Traditional ML Models

In [None]:
# SOLUTION: Train Decision Tree
print("\n" + "="*60)
print("DECISION TREE MODEL")
print("="*60)

model_dtc = DecisionTreeClassifier(max_depth=15, random_state=42)
model_dtc.fit(train_F, train_L.idxmax(axis=1))  # Convert one-hot back to labels

test_L_labels = test_L.idxmax(axis=1)
test_pred_dtc = model_dtc.predict(test_F)
acc_dtc = accuracy_score(test_L_labels, test_pred_dtc)

print(f"Decision Tree Accuracy: {acc_dtc:.4f}")

In [None]:
# SOLUTION: Train Logistic Regression
print("\n" + "="*60)
print("LOGISTIC REGRESSION MODEL")
print("="*60)

model_lr = LogisticRegression(max_iter=1000, random_state=42)
model_lr.fit(train_F, train_L.idxmax(axis=1))

test_pred_lr = model_lr.predict(test_F)
acc_lr = accuracy_score(test_L_labels, test_pred_lr)

print(f"Logistic Regression Accuracy: {acc_lr:.4f}")

## 8. Results Comparison

In [None]:
# SOLUTION: Create comparison table
comparison_df = pd.DataFrame({
    'Model': ['Neural Network', 'Decision Tree', 'Logistic Regression'],
    'Test Accuracy': [test_accuracy, acc_dtc, acc_lr]
})

print("\n" + "="*60)
print("MODEL COMPARISON")
print("="*60)
print(comparison_df.to_string(index=False))
print("="*60)

best_idx = comparison_df['Test Accuracy'].idxmax()
print(f"\nBest Model: {comparison_df.loc[best_idx, 'Model']}")
print(f"Best Accuracy: {comparison_df.loc[best_idx, 'Test Accuracy']:.4f}")

In [None]:
# SOLUTION: Visualize comparison
fig, ax = plt.subplots(figsize=(10, 6))

models = comparison_df['Model']
accuracies = comparison_df['Test Accuracy']
colors = ['steelblue', 'coral', 'lightgreen']

bars = ax.bar(models, accuracies, color=colors, edgecolor='black', linewidth=2, alpha=0.8)

ax.set_ylabel('Test Accuracy', fontsize=12)
ax.set_title('Multi-Class Classification: Model Comparison', fontsize=14, fontweight='bold')
ax.set_ylim([0, 1])
ax.axhline(y=0.99, color='red', linestyle='--', linewidth=2, label='Target (99%)')
ax.grid(axis='y', alpha=0.3)
ax.legend()

for bar, acc in zip(bars, accuracies):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{acc:.4f}', ha='center', va='bottom', fontsize=12, fontweight='bold')

plt.xticks(rotation=15, ha='right')
plt.tight_layout()
plt.show()

## 9. Key Findings and Discussion

### Observations:

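1. **Multi-Class Challenge**: With 18 classes, uniform random guessing is correct only about 5.6% of the time (1/18), and always predicting the largest class (Google App, 60,001 of 273,129 records) reaches about 22%, so the starting point is far lower than the 50% of a balanced binary problem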

2. **Neural Network Advantages**:
   - Can learn complex non-linear relationships
   - Often performs well on high-dimensional data, although this dataset has only 20 input features after encoding
   - Flexible architecture for different problem complexities

3. **One-Hot Encoding**: Required for the labels by the `categorical_crossentropy` loss used here; integer class labels would work instead with `sparse_categorical_crossentropy`

4. **Dropout Regularization**: Helps prevent overfitting in deep networks

### Why Neural Networks Excel:

- **Hidden Layers**: Enable learning of hierarchical features
- **Non-linear Activations**: ReLU captures complex patterns
- **Softmax Output**: Produces a probability distribution over the 18 classes (non-negative values that sum to 1)
- **Categorical Crossentropy**: Appropriate loss for multi-class problems
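
The last two points can be made concrete with a small NumPy calculation. Softmax maps raw output scores to probabilities, and categorical cross-entropy is the mean negative log-probability assigned to the true class. The functions below are a hand-written sketch for illustration and are not how Keras implements them internally. The logits and labels are made up.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    """Mean over samples of -sum(y * log(p))."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return float(-(y_onehot * np.log(probs)).sum(axis=1).mean())

# Two samples, three classes (illustrative values).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y_onehot = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_onehot, probs)

print("Probabilities:\n", probs.round(3))
print("Row sums:", probs.sum(axis=1))
print("Loss:", round(loss, 4))
```

Because each one-hot row selects a single class, the loss reduces to the average of `-log(p_true)`. It approaches zero only when the model puts nearly all probability on the correct class.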

### Achieving >99% Accuracy:

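Options to try if the model falls short of the target accuracy:
1. Increase model capacity (more hidden units)
2. Train for more epochs
3. Adjust learning rate
4. Use data augmentation, which for tabular data like this usually means resampling under-represented classes
5. Ensemble multiple models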

### Cybersecurity Applications:

- **App Classification**: Identify running applications on mobile devices
- **Malware Detection**: Classify apps as benign or malicious
- **Behavioral Analysis**: Detect anomalous application behavior
- **Threat Intelligence**: Build profiles of known malicious applications