## 📘 How to Use Kaggle (Upload Dataset & Notebook)

### ✅ Step 1: Create Kaggle Account
- Go to 👉 https://www.kaggle.com  
- Sign in using Google / Email

---

### ✅ Step 2: Upload Your Dataset
1. Click **Datasets** → **Create New Dataset**
2. Upload your **dataset folder or ZIP file**
3. Add:
   - Dataset name
   - Short description
4. Set visibility → **Public / Private**
5. Click **Create**

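✅ After upload, Kaggle exposes the dataset to notebooks under `/kaggle/input/`, for example the `/kaggle/input/cicidscollection/` folder loaded later in this notebook.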
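
To find the exact file path after attaching a dataset, you can list everything below the input folder. The helper below is a minimal sketch; the function name `list_dataset_files` is an illustrative choice, not part of Kaggle's API.

```python
import os

def list_dataset_files(root="/kaggle/input"):
    """Return the sorted paths of all files found below `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)

# Inside a Kaggle notebook this prints one line per attached file.
# Outside Kaggle the folder does not exist, so nothing is printed.
for path in list_dataset_files():
    print(path)
```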


### ✅ Project Setup & Required Libraries

This cell initializes the project environment by importing all required Python libraries for:

- Numerical computation
- Data handling
- Data visualization
- Machine learning
- Deep learning
- Model saving and deployment

These libraries form the backbone of the complete DDoS detection pipeline.

#### 📦 Libraries Used

- **NumPy** – Numerical computations  
  https://numpy.org/doc/

- **Pandas** – Data loading & processing  
  https://pandas.pydata.org/docs/

- **Matplotlib** – Plotting training curves and ROC curves  
  https://matplotlib.org/stable/index.html

- **Scikit-learn** – ML models, preprocessing & metrics  
  https://scikit-learn.org/stable/

- **TensorFlow / Keras** – Deep learning autoencoder  
  https://www.tensorflow.org/api_docs  
  https://keras.io/

- **Joblib** – Saving trained models & scalers  
  https://joblib.readthedocs.io/

#### 🎯 Purpose

This prepares the environment for:
- Semi-supervised learning
- Deep autoencoder training
- Hybrid attack classification
- Final DDoS detection system

It also fixes the NumPy and TensorFlow random seeds so that runs are more reproducible.


In [None]:
!pip install numpy==1.26.4
!pip install tensorflow==2.16.1 keras==3.3.3
!pip install scikit-learn==1.4.2 joblib==1.4.2
!pip install pandas matplotlib

In [None]:
# ===============================================================
# 0. SETUP & IMPORTS
# ===============================================================

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score, roc_curve
from sklearn.ensemble import RandomForestClassifier
import joblib

import tensorflow as tf
from tensorflow.keras import layers, models

RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)
tf.random.set_seed(RANDOM_STATE)

print("TensorFlow:", tf.__version__)
print("GPU:", tf.config.list_physical_devices('GPU'))


### ✅ Loading the CIC IDS Collection Dataset

This cell loads the large-scale intrusion detection dataset from Kaggle in **Parquet format**.

#### 📂 Dataset Used
- Name: **CIC IDS Collection**
- Size: **9+ million network flows**
- Features: **59 numerical flow-based metrics**
- Labels: **Benign + Multiple Attack Types (DDoS, DoS, Botnet, etc.)**

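#### 🧠 Why Parquet?
Compared with CSV, Parquet:
- Loads faster, because it is a binary format that needs no text parsing
- Is compact on disk and preserves column dtypes
- Is column-oriented, so a subset of columns can be read without loading the rest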

#### 📚 Documentation
- Pandas Parquet:  
  https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html

##### Dataset Path:
https://www.kaggle.com/datasets/huebitsvizg/denial-of-service

#### 🎯 Purpose
This dataset provides:
- **Benign traffic** → for autoencoder training
- **Attack traffic** → for detection & evaluation


In [None]:
# ===============================================================
# 1. LOAD DATASET
# ===============================================================

df = pd.read_parquet("/kaggle/input/cicidscollection/cic-collection.parquet")
print("Full dataset shape:", df.shape)
df.head()


### ✅ Selecting Machine Learning Features

This cell:
- Keeps only **numerical features** for training
- Retains the **Label** column for classification

#### ✅ Why This Is Important

- Deep learning models require numeric inputs
- IP addresses & protocol strings are removed
- Only statistical flow-level features remain

#### 📦 Function Used
- select_dtypes():  
  https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.select_dtypes.html

#### 🎯 Purpose

Ensures:
- Clean numerical input for the autoencoder
- Proper separation of features and ground-truth labels
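
The sketch below shows how `select_dtypes()` behaves on a tiny, made-up flow table. It assumes, without having checked this dataset, that a Parquet file may store some columns with narrower dtypes such as `int32` or `float32`; in that case `include="number"` keeps them while an `int64`/`float64` filter does not. The column values are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature flow table; values are illustrative only.
flows = pd.DataFrame({
    "Flow Duration": np.array([10, 20], dtype="int32"),
    "Flow Bytes/s": np.array([1.5, 2.5], dtype="float32"),
    "Total Fwd Packets": np.array([3, 4], dtype="int64"),
    "Label": ["Benign", "DDoS"],
})

# Filter used in this notebook: only 64-bit integer and float columns.
narrow = flows.select_dtypes(include=["int64", "float64"]).columns.tolist()

# Broader filter: every numeric dtype, whatever its width.
broad = flows.select_dtypes(include="number").columns.tolist()

print("int64/float64 only:", narrow)
print("all numeric dtypes:", broad)
```

Printing `df.dtypes.value_counts()` on the real dataset is a quick way to see which of the two filters is appropriate.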


In [None]:
# ===============================================================
# 2. KEEP ONLY NUMERIC FEATURES + LABEL
# ===============================================================

label_col = "Label"

numeric_cols = df.select_dtypes(include=["int64", "float64"]).columns.tolist()
df = df[numeric_cols + [label_col]]

print("Final columns used:", df.shape[1])


### ✅ Separating Normal and Attack Traffic

This step creates two datasets:

- **Benign Data** → Used for training the autoencoder
- **Attack Data** → Used only for testing and evaluation

#### 🔍 Why This Is Required

Semi-supervised learning requires:
- Training ONLY on normal data
- Testing on both normal + attack data

This allows the model to:
✅ Learn "normal behavior"  
✅ Detect deviations as attacks

#### 🎯 Purpose

Implements the **core principle of anomaly-based intrusion detection**.


In [None]:
# ===============================================================
# 3. SPLIT BENIGN & ATTACK
# ===============================================================

benign_df = df[df[label_col] == "Benign"].copy()
attack_df = df[df[label_col] != "Benign"].copy()

print("Benign:", benign_df.shape)
print("Attack:", attack_df.shape)


### ✅ Creating Training & Balanced Test Sets

This cell creates:

#### 1️⃣ Training Set
- Only **Benign traffic**
- Used to train the autoencoder

#### 2️⃣ Test Set
- **50% Benign + 50% Attack**
- Used for fair performance evaluation

#### ✅ Why Balanced Test Set Is Important

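- Prevents the majority class from dominating the metrics
- Makes precision, recall, and F1-score directly comparable between the two classes
- Keeps the DDoS detection evaluation fair to both benign and attack traffic

#### 🎯 Purpose

Ensures:
- The autoencoder never sees attack traffic during training
- Test metrics are computed on benign flows that were not used for training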
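
The splitting idea can be sketched with row positions alone: shuffle the benign rows once, cut disjoint train and test slices, and draw an equal number of attack rows. The pool and sample sizes below are hypothetical and much smaller than the real ones.

```python
import numpy as np

rng = np.random.default_rng(42)

n_benign, n_attack = 1_000, 400      # hypothetical pool sizes
n_train, n_test_each = 600, 100      # hypothetical sample sizes

# Shuffle benign row positions once, then cut disjoint train and test slices.
benign_order = rng.permutation(n_benign)
train_idx = benign_order[:n_train]
benign_test_idx = benign_order[n_train:n_train + n_test_each]

# Draw the same number of attack rows without replacement.
attack_test_idx = rng.choice(n_attack, size=n_test_each, replace=False)

# Labels for the balanced test set: 0 = benign, 1 = attack.
y_test = np.concatenate([
    np.zeros(n_test_each, dtype=int),
    np.ones(n_test_each, dtype=int),
])

print("train/test overlap:", np.intersect1d(train_idx, benign_test_idx).size)
print("attack share in test set:", y_test.mean())
```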


In [None]:
# ===============================================================
# 4. TRAIN SET (ONLY BENIGN) + BALANCED TEST SET
# ===============================================================

N_TRAIN_BENIGN = 150_000
N_TEST_EACH = 10_000

# AUTOENCODER TRAIN DATA
benign_train = benign_df.sample(N_TRAIN_BENIGN, random_state=RANDOM_STATE)

# BALANCED TEST DATA
benign_test = benign_df.drop(benign_train.index).sample(N_TEST_EACH, random_state=RANDOM_STATE)
attack_test = attack_df.sample(N_TEST_EACH, random_state=RANDOM_STATE)

test_df = pd.concat([benign_test, attack_test]).sample(frac=1, random_state=RANDOM_STATE)

X_train = benign_train.drop(columns=[label_col]).values
X_test = test_df.drop(columns=[label_col]).values
y_test = (test_df[label_col] != "Benign").astype(int).values

print("X_train:", X_train.shape)
print("X_test:", X_test.shape)


### ✅ Feature Normalization Using StandardScaler

This cell normalizes all feature values so that:

- Mean = 0
- Standard deviation = 1

#### 📦 Tool Used
- StandardScaler  
  https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

#### ✅ Why Normalization Is Mandatory

- Neural networks are sensitive to feature scale
- Prevents dominance of large-valued features
- Improves training stability & convergence

#### 🎯 Purpose

Ensures:
✅ Stable autoencoder training  
✅ Reliable reconstruction error calculation  
✅ Correct hybrid detection performance
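
What the scaler does can be written out in a few lines of NumPy. This is a simplified sketch of the same arithmetic, not the scikit-learn implementation: statistics are estimated on the training rows only and then reused for the test rows. The helper names are illustrative.

```python
import numpy as np

def fit_standardizer(X_train):
    """Estimate per-column mean and standard deviation on training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)        # population std (ddof=0)
    std[std == 0.0] = 1.0            # leave constant columns unscaled
    return mean, std

def standardize(X, mean, std):
    """Apply previously fitted statistics to any data with the same columns."""
    return (X - mean) / std

X_train = np.array([[1.0, 10.0, 5.0],
                    [2.0, 20.0, 5.0],
                    [3.0, 30.0, 5.0]])
X_test = np.array([[4.0, 40.0, 5.0]])

mean, std = fit_standardizer(X_train)
X_train_scaled = standardize(X_train, mean, std)
X_test_scaled = standardize(X_test, mean, std)   # reuses the training statistics

print(X_train_scaled.mean(axis=0))
print(X_test_scaled)
```

Fitting on the training rows only matters here: the test rows contain attack traffic, and letting them influence the mean and standard deviation would leak information into the benign-only model.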


In [None]:
# ===============================================================
# 5. NORMALIZATION
# ===============================================================

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


### ✅ Autoencoder Model Architecture

This cell defines the **deep autoencoder neural network**.

#### 🧠 Architecture
- Input layer (59 features)
- Encoder → 128 → 64 → 32 (latent space)
- Decoder → 64 → 128 → 59 (reconstruction)

#### 🔬 Loss Function
- Mean Squared Error (MSE)

#### ⚙ Optimizer
- Adam Optimizer  
  https://keras.io/api/optimizers/adam/

#### 🎯 Purpose

The autoencoder learns:
✅ Normal traffic behavior  
✅ Compact latent representation  
✅ How to reconstruct benign flows accurately  
✅ Large reconstruction error = anomaly (attack)
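
As a rough size check for this architecture, the number of trainable parameters can be computed by hand: each fully connected layer has `(inputs + 1) × outputs` weights and biases. The sketch assumes the 59-feature input stated above; `autoencoder.summary()` in the next cell reports the figure for the actual input width.

```python
def dense_param_count(layer_sizes):
    """Total weights + biases for a stack of fully connected layers."""
    return sum(
        (n_in + 1) * n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# 59 inputs -> 128 -> 64 -> 32 (bottleneck) -> 64 -> 128 -> 59 outputs
sizes = [59, 128, 64, 32, 64, 128, 59]
total = dense_param_count(sizes)
print(total)  # 36059
```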


In [None]:
# ===============================================================
# 6. AUTOENCODER MODEL
# ===============================================================

input_dim = X_train_scaled.shape[1]

autoencoder = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),  # BOTTLENECK
    layers.Dense(64, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(input_dim, activation="linear")
])

autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.summary()


### ✅ Training the Autoencoder on Normal Traffic

This cell trains the autoencoder using:

- Only **Benign data**
- 20 epochs
- Batch size = 256
- Validation split = 10%

#### ✅ Why Only Benign?

This forces the model to:
- Learn only normal patterns
- Fail on abnormal patterns (attacks)

#### 🎯 Purpose

Creates the **anomaly detection foundation** of the hybrid system.
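
The settings above translate into the following sample and step counts. This is plain arithmetic, assuming the 150,000 benign training rows used in this notebook; Keras takes the validation rows from the end of the array before shuffling.

```python
import math

n_samples = 150_000        # N_TRAIN_BENIGN in this notebook
validation_split = 0.1
batch_size = 256

n_fit = math.floor(n_samples * (1 - validation_split))   # rows used for weight updates
n_val = n_samples - n_fit                                # rows held out for val_loss
steps_per_epoch = math.ceil(n_fit / batch_size)          # batches per epoch

print(n_fit, n_val, steps_per_epoch)  # 135000 15000 528
```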


In [None]:
# ===============================================================
# 7. TRAIN AUTOENCODER (BENIGN ONLY)
# ===============================================================

history = autoencoder.fit(
    X_train_scaled, X_train_scaled,
    epochs=20, batch_size=256,
    validation_split=0.1, shuffle=True
)


### ✅ Reconstruction Error Calculation

This cell computes the **Mean Squared Error (MSE) per flow**:

Reconstruction Error = How poorly the autoencoder reconstructs the input.

#### ✅ Interpretation
- Low Error → Normal traffic
- High Error → Possible attack

#### 🎯 Purpose

Creates an **anomaly score** used later in hybrid classification.
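
For a flow with $d$ features, the score is $e_i = \frac{1}{d}\sum_{j=1}^{d}(x_{ij} - \hat{x}_{ij})^2$. A toy example with made-up numbers shows the per-row computation:

```python
import numpy as np

def reconstruction_mse(X, X_hat):
    """Mean squared error per row between inputs and their reconstructions."""
    return np.mean(np.square(X - X_hat), axis=1)

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
X_hat = np.array([[1.0, 2.0],    # perfect reconstruction
                  [1.0, 0.0]])   # poor reconstruction

errors = reconstruction_mse(X, X_hat)
print(errors)  # first row: 0.0, second row: (2**2 + 4**2) / 2 = 10.0
```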


In [None]:
# ===============================================================
# 8. RECONSTRUCTION ERROR
# ===============================================================

X_test_pred = autoencoder.predict(X_test_scaled)
reconstruction_error = np.mean(np.square(X_test_scaled - X_test_pred), axis=1)

print("Error Percentiles:", np.percentile(reconstruction_error, [80, 85, 90, 95]))


### ✅ Threshold-Based Anomaly Detection (Baseline)

This is a basic detection method where:

- A flow is labeled as ATTACK if reconstruction error > threshold

#### ✅ Why This Is Used

- Acts as a baseline method
- Demonstrates limitations of pure anomaly detection

#### ⚠ Limitation

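- High false negatives: flagging only the top 15% of errors on a test set that is 50% attacks caps attack recall at 30%
- Overlapping benign & attack error distributions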

#### 🎯 Purpose

Provides a **comparison baseline** before hybrid improvement.
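
The cell that follows takes its threshold from the test-set errors. A common alternative, shown here only as a sketch and not as the method used in this notebook, is to calibrate the threshold on benign-only errors so that the cut-off does not depend on how many attacks the test set contains. The error values and the 95th percentile are placeholders.

```python
import numpy as np

def fit_threshold(benign_errors, percentile=95.0):
    """Pick the error level that the given share of benign flows stays below."""
    return float(np.percentile(benign_errors, percentile))

def flag_attacks(errors, threshold):
    """Return 1 for flows whose error exceeds the threshold, else 0."""
    return (np.asarray(errors) > threshold).astype(int)

# Placeholder benign errors: 0.01, 0.02, ..., 1.00
benign_errors = np.arange(1, 101) / 100.0
threshold = fit_threshold(benign_errors)

print("threshold:", threshold)
print("flags:", flag_attacks([0.50, 2.00], threshold))
```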


In [None]:
# ===============================================================
# 9. THRESHOLD-BASED DETECTION (OPTIONAL BASELINE)
# ===============================================================

threshold = np.percentile(reconstruction_error, 85)
y_pred_thresh = (reconstruction_error > threshold).astype(int)

print("THRESHOLD CONFUSION MATRIX")
print(confusion_matrix(y_test, y_pred_thresh))
print(classification_report(y_test, y_pred_thresh))


### ✅ Extracting Latent Features from Autoencoder

This cell extracts the **32-dimensional bottleneck output** from the trained autoencoder.

#### ✅ Why Latent Features?

They capture:
- Deep behavioral patterns
- Compact attack signatures
- Structure that can be more discriminative than the raw features

#### 🎯 Purpose in This Project

Used as **deep learning features** for supervised classification.
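
Conceptually, the latent vector is what comes out of the encoder half of the network. The NumPy sketch below mimics that forward pass with random placeholder weights (not the trained ones) to show that a 59-feature flow ends up as 32 non-negative values after the three ReLU layers.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [59, 128, 64, 32]   # input -> encoder layers, as described above

# Hypothetical random weights standing in for the trained Dense layers.
weights = [
    rng.normal(scale=0.1, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def encode(X):
    """Apply each Dense + ReLU layer in turn and return the bottleneck output."""
    h = X
    for W, b in zip(weights, biases):
        h = np.maximum(h @ W + b, 0.0)
    return h

latent = encode(rng.normal(size=(5, 59)))
print(latent.shape)  # (5, 32)
```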


In [None]:
# ===============================================================
# 10. EXTRACT LATENT FEATURES
# ===============================================================

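encoder_input = tf.keras.Input(shape=(X_train_scaled.shape[1],))

# `autoencoder.layers` does not include the Input layer, so its first three
# entries are the encoder: Dense(128) -> Dense(64) -> Dense(32, bottleneck).
x = encoder_input
for layer in autoencoder.layers[:3]:
    x = layer(x)
latent_output = x

encoder = tf.keras.Model(encoder_input, latent_output)

X_test_latent = encoder.predict(X_test_scaled, batch_size=1024)
print("Latent shape:", X_test_latent.shape)  # second dimension should be 32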



### ✅ Hybrid Feature Construction

This cell merges:

- 32 Latent features
- 1 Reconstruction error feature

Final Hybrid Input = **33-dimensional deep representation**

#### 🎯 Purpose in This Project

Creates a powerful feature vector combining:
✅ Anomaly information  
✅ Deep behavioral representation  
✅ Best of unsupervised + supervised learning


In [None]:
# ===============================================================
# 11. HYBRID FEATURE SET  (LATENT + ERROR)
# ===============================================================

X_hybrid = np.hstack([
    X_test_latent,
    reconstruction_error.reshape(-1, 1)
])



### ✅ Final Hybrid Attack Classifier (Random Forest)

This cell trains the final attack detector using:

- Random Forest Classifier  
  https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

#### ✅ Why Random Forest?

- Handles non-linear decision boundaries
- Robust to noise
- Works well with mixed deep + statistical features
- Strong real-world intrusion detection performance

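#### 🎯 Purpose

This is the **final DDoS attack detector**. It is fitted on 70% of the hybrid feature set and scored on the remaining 30%. Scores of 99–100% precision and recall were observed when the forest was scored on the very flows it was fitted on; such figures are training scores and overstate detection performance, so the held-out report printed by this cell is the one to quote.


In [None]:
# ===============================================================
# 12. RANDOM FOREST ATTACK CLASSIFIER
# ===============================================================

from sklearn.model_selection import train_test_split

# Hold out 30% of the hybrid features so the forest is scored on unseen flows.
# (The 70/30 ratio is an illustrative choice.)
X_rf_train, X_rf_eval, y_rf_train, y_rf_eval = train_test_split(
    X_hybrid, y_test,
    test_size=0.3, stratify=y_test, random_state=RANDOM_STATE
)

rf = RandomForestClassifier(
    n_estimators=300,
    max_depth=25,
    class_weight="balanced",
    random_state=RANDOM_STATE,
    n_jobs=-1
)

rf.fit(X_rf_train, y_rf_train)
y_pred_final = rf.predict(X_rf_eval)

print("✅ FINAL ATTACK DETECTOR RESULTS (held-out flows)")
print(confusion_matrix(y_rf_eval, y_pred_final))
print(classification_report(y_rf_eval, y_pred_final))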

### ✅ ROC Curve and AUC Evaluation

This cell plots:

- False Positive Rate vs True Positive Rate
- Computes ROC-AUC score

#### ✅ Why ROC-AUC?

- Threshold-independent metric
- Measures true discrimination power
- Standard in IDS research

#### 🎯 Purpose

Provides **graphical performance validation** for your thesis.
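
ROC-AUC has a direct probabilistic reading: it is the share of (attack, benign) pairs in which the attack flow gets the higher score, with ties counted as half. The sketch below computes it that way on four made-up flows; it builds a full pairwise matrix, so it is meant for small samples only, and `roc_auc_score` remains the tool for real data.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]            # every positive vs every negative
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

# Two benign flows (label 0) and two attack flows (label 1).
auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75: three of the four attack/benign pairs are ordered correctly
```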


In [None]:
# ===============================================================
# 13. ROC CURVE
# ===============================================================

roc_auc = roc_auc_score(y_test, reconstruction_error)
fpr, tpr, _ = roc_curve(y_test, reconstruction_error)

plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.4f}")
plt.plot([0,1],[0,1],'--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve - Autoencoder Reconstruction Error")
plt.legend()
plt.show()


*“The ROC curve evaluates the detector across all thresholds. Using the autoencoder's reconstruction error as the anomaly score, the model achieved an AUC of 0.8001, meaning there is roughly an 80% probability that a randomly chosen attack flow receives a higher score than a randomly chosen benign flow. This measures ranking quality independently of any fixed decision threshold.”*

### ✅ Saving the Trained Models

This cell saves:

- Standard Scaler
- Autoencoder
- Random Forest Detector

#### ✅ Why This Is Important

- Enables real-time deployment
- Required for Flask web app
- Needed for reproducibility

#### 📦 Tools Used
- joblib.dump()
- model.save()

#### 🎯 Purpose
Transforms your research into a **deployable security system**.


In [None]:
# ===============================================================
# 14. SAVE MODELS
# ===============================================================

autoencoder.save("/kaggle/working/ddos_autoencoder.h5")
joblib.dump(rf, "/kaggle/working/ddos_rf_detector.pkl")
joblib.dump(scaler, "/kaggle/working/ddos_scaler.pkl")

print("✅ Models saved.")


# Run This Part in Google Colab

## 🚨 DDoS Attack Detection System Using Autoencoder + Random Forest (Flask Deployment)

## 📌 Project Overview
This project deploys a **hybrid machine learning–based DDoS attack detection system** using:
- **Autoencoder** for unsupervised feature extraction
- **Random Forest** for final attack classification
- **Flask Web Application** for real-time user interaction

Users upload a **network traffic CSV file**, choose a row index, and the system predicts whether the traffic is:
- ✅ Benign
- 🚨 DDoS Attack  
along with a **confidence score**.
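
The upload step boils down to picking one row and the expected feature columns from the CSV. The sketch below illustrates that step with explicit checks; the helper name `select_row` and the two-column demo table are hypothetical, and the real app uses its own feature list.

```python
import pandas as pd

def select_row(df, feature_columns, row_index):
    """Return one row as a numeric one-row DataFrame, or raise a clear error."""
    missing = [c for c in feature_columns if c not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    if not 0 <= row_index < len(df):
        raise IndexError(f"row_index {row_index} is outside 0..{len(df) - 1}")
    row = df.loc[:, feature_columns].iloc[[row_index]]
    # Non-numeric cells become NaN and are then replaced by 0.
    return row.apply(pd.to_numeric, errors="coerce").fillna(0.0)

demo = pd.DataFrame({
    "Flow Duration": [10, 20],
    "Flow Packets/s": ["3.5", "bad"],
    "Label": ["Benign", "DDoS"],
})
row = select_row(demo, ["Flow Duration", "Flow Packets/s"], 1)
print(row.to_dict("records"))
```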

---

## ✅ Technologies Used
- TensorFlow / Keras (Autoencoder)
- Scikit-learn (Random Forest, Scaler)
- Flask (Web Application)
- Pandas & NumPy (Data Handling)
- Google Colab + Google Drive (Execution & Storage)


## 🟢 Cell 1: Environment Setup & Dependency Installation

This cell prepares the **Colab environment** by:
- Fixing version conflicts between TensorFlow, JAX, and NumPy
- Installing required deployment libraries
- Mounting Google Drive for secure model loading
- Creating folders for HTML templates and static files

---

### 📚 Library Purpose & References

- **Flask** → Web application framework  
  https://flask.palletsprojects.com

- **Pyngrok** → Public tunnel for Flask apps  
  https://ngrok.com

- **Pandas** → CSV data handling  
  https://pandas.pydata.org

- **NumPy** → Numerical computation  
  https://numpy.org

- **TensorFlow** → Autoencoder model loading  
  https://www.tensorflow.org


In [None]:
!pip uninstall -y ml_dtypes jax jaxlib
!pip install -U ml_dtypes==0.5.0
!pip install -U jax jaxlib
!pip install flask pyngrok pandas

In [None]:
!pip install numpy==1.26.4

## Upload the trained models, the scaler, and any other required files (JSON files, etc.) to Google Drive, then connect Drive to Colab to access them

In [None]:
# Google Drive Mount (Load Trained Models Securely)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Create Web App Folders
!mkdir -p templates static


---

## 🟢 Cell 2: Flask App Backend & Model Loading (app.py)

This cell creates the **Flask backend application** and safely loads:
- Feature **Scaler**
- **Autoencoder** model
- **Random Forest** classifier

It also extracts the **encoder part** from the Autoencoder for hybrid feature generation.

---

### ✅ Files Loaded From Google Drive

- `ddos_scaler.pkl` → Feature scaling  
- `ddos_autoencoder.h5` → Feature extractor  
- `ddos_rf_detector.pkl` → Final classifier  
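
Before starting the app it can help to confirm that all three files are actually present in the Drive folder. This is a small optional sketch; `missing_artifacts` is an illustrative helper, and the folder path is the one used in `app.py` and may differ in your Drive.

```python
from pathlib import Path

REQUIRED_FILES = ["ddos_scaler.pkl", "ddos_autoencoder.h5", "ddos_rf_detector.pkl"]

def missing_artifacts(base_dir, required=REQUIRED_FILES):
    """Return the names of required files that are not present in `base_dir`."""
    base = Path(base_dir)
    return [name for name in required if not (base / name).is_file()]

# Example folder; adjust to wherever the files live in your Drive.
print("Missing:", missing_artifacts("/content/drive/My Drive/DDoS detection"))
```

An empty list means everything the app needs was found.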

---

### 🧠 Why a Hybrid Autoencoder + Random Forest Model?

- Autoencoder → Learns compressed representations of traffic behavior  
- Random Forest → Classifies attack vs benign using learned features  
- Reconstruction error → Improves detection reliability  

---

### 🔗 References

- Autoencoder:
  https://www.tensorflow.org/tutorials/generative/autoencoder  

- Random Forest:
  https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html  

---

### ✅ Output

All three models are safely loaded and ready for real-time prediction.


In [None]:
%%writefile app.py
import os
import joblib
import tensorflow as tf
import numpy as np
import pandas as pd
from flask import Flask, request, render_template

# ==============================
# ✅ FLASK APP INITIALIZATION
# ==============================
app = Flask(__name__)

# ==============================
# ✅ MODEL FILE PATHS
# ==============================
BASE_PATH = "/content/drive/My Drive/DDoS detection"   # Change path according to the location of the paths in your system

SCALER_PATH = BASE_PATH + "/ddos_scaler.pkl"
AE_PATH     = BASE_PATH + "/ddos_autoencoder.h5"
RF_PATH     = BASE_PATH + "/ddos_rf_detector.pkl"

print("✅ Using files:")
print("Scaler:", SCALER_PATH)
print("Autoencoder:", AE_PATH)
print("RF Model:", RF_PATH)

# ==============================
# ✅ LOAD MODELS SAFELY
# ==============================
scaler = joblib.load(SCALER_PATH)
INPUT_DIM = scaler.n_features_in_
print("✅ Input Dimension:", INPUT_DIM)

autoencoder = tf.keras.models.load_model(
    AE_PATH,
    compile=False,
    custom_objects={"mse": tf.keras.losses.MeanSquaredError()}
)

# ✅ Force model build
_ = autoencoder.predict(np.zeros((1, INPUT_DIM)))

rf = joblib.load(RF_PATH)

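# ✅ SAFE ENCODER BUILD (Keras 3 Compatible)
# `autoencoder.layers` excludes the Input layer, so the first three entries are
# the encoder: Dense(128) -> Dense(64) -> Dense(32, bottleneck).
encoder_input = tf.keras.Input(shape=(INPUT_DIM,))
x = encoder_input
for layer in autoencoder.layers[:3]:
    x = layer(x)
encoder = tf.keras.Model(encoder_input, x)

# The Random Forest expects the latent features plus 1 reconstruction-error column.
if encoder.output_shape[-1] + 1 != rf.n_features_in_:
    raise ValueError("Encoder output does not match the features the Random Forest was trained on.")

print("✅ All models loaded and encoder built successfully")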

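# ==============================
# ✅ EXACT 22 FEATURES USED DURING TRAINING
# Must list, in the same order, the columns the scaler was fitted on;
# the list length has to equal INPUT_DIM (scaler.n_features_in_).
# ==============================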
FEATURE_COLUMNS = [
    'Flow Duration', 'Total Fwd Packets', 'Total Backward Packets',
    'Fwd Packets Length Total', 'Bwd Packets Length Total',
    'Fwd Packet Length Max', 'Fwd Packet Length Mean',
    'Fwd Packet Length Std', 'Bwd Packet Length Max',
    'Bwd Packet Length Mean', 'Bwd Packet Length Std',
    'Flow Bytes/s', 'Flow Packets/s', 'Flow IAT Mean',
    'Flow IAT Std', 'Flow IAT Max', 'Flow IAT Min',
    'Fwd IAT Mean', 'Bwd IAT Mean',
    'Avg Packet Size', 'Active Mean', 'Idle Mean'
]

# ==============================
# ✅ FLASK ROUTE (UPLOAD + PREDICT + CONFIDENCE)
# ==============================
@app.route("/", methods=["GET", "POST"])
def index():
    prediction = None
    confidence = None
    error_msg = None
    row_data = None

    if request.method == "POST":
        try:
            file = request.files["file"]
            row_index = int(request.form["row_index"])

            # ✅ Read CSV
            df = pd.read_csv(file)

            # ✅ Drop target columns safely
            df = df.drop(columns=["Label", "ClassLabel"], errors="ignore")

            # ✅ Convert all values to numeric
            df = df.apply(pd.to_numeric, errors="coerce").fillna(0)

            # ✅ Select EXACT trained features
            df = df[FEATURE_COLUMNS]

            # ✅ Validate row index
            if row_index < 0 or row_index >= len(df):
                error_msg = f"Invalid row index {row_index}; the file has {len(df)} rows."
            else:
                # ✅ Extract row for display
                row_data = df.iloc[[row_index]]

                X_new = row_data.values
                X_scaled = scaler.transform(X_new)

                # ✅ Hybrid Feature Extraction
                latent = encoder.predict(X_scaled)
                recon = autoencoder.predict(X_scaled)

                recon_error = np.mean(np.square(X_scaled - recon), axis=1).reshape(-1, 1)
                hybrid_features = np.hstack([latent, recon_error])

                # ✅ Prediction
                pred = rf.predict(hybrid_features)[0]

                # ✅ Confidence
                proba = rf.predict_proba(hybrid_features)[0]
                confidence = f"{np.max(proba) * 100:.2f}%"

                # ✅ Label Mapping
                prediction = "🚨 ATTACK DETECTED" if pred == 1 else "✅ BENIGN TRAFFIC"

                # ✅ DEBUG (Optional – You can remove later)
                print("Hybrid Features:", hybrid_features)
                print("Prediction:", prediction, "| Confidence:", confidence)

        except Exception as e:
            error_msg = str(e)

    return render_template(
        "index.html",
        prediction=prediction,
        confidence=confidence,
        error_msg=error_msg,
        row_data=row_data
    )

# ==============================
# ✅ RUN FLASK APP (CHANGE PORT IF BUSY)
# ==============================
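if __name__ == "__main__":
    # debug=False: the app is exposed through a public ngrok URL, and Flask's
    # debug mode would enable the interactive debugger and load the models twice.
    app.run(host="0.0.0.0", port=5000, debug=False)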


## 🟢 Cell 5: Web Interface (index.html)

This cell creates the **frontend interface** where users:

✔️ Upload a CSV file  
✔️ Enter the row index  
✔️ Submit for prediction  
✔️ View:
- Attack result
- Confidence score
- Selected row values  

---

### ✅ Page Components

- CSV upload input  
- Row number input  
- Predict button  
- Result panel  
- Error message panel  
- Table display for selected row  


In [None]:
%%writefile templates/index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>DDoS Attack Detection System</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
</head>
<body>

<div class="container">

    <h1>🚨 DDoS Attack Detection System</h1>
    <p>Upload a CSV file and select a row to detect whether the traffic is an attack or benign.</p>

    <form method="POST" enctype="multipart/form-data">
        <div class="form-group">
            <label>📂 Upload CSV File:</label>
            <input type="file" name="file" required>
        </div>

        <div class="form-group">
            <label>🔢 Enter Row Index:</label>
            <input type="number" name="row_index" min="0" required>
        </div>

        <button type="submit">🔍 Predict</button>
    </form>

    <hr>

    <!-- ✅ Prediction Result -->
    {% if prediction %}
        <div class="result-box">
            <h2>{{ prediction }}</h2>
            {% if confidence %}
                <h3>🎯 Confidence: {{ confidence }}</h3>
            {% endif %}
        </div>
    {% endif %}

    <!-- ✅ Error Message -->
    {% if error_msg %}
        <div class="error-box">
            <p>❌ Error: {{ error_msg }}</p>
        </div>
    {% endif %}

    <!-- ✅ Selected Row Preview -->
    {% if row_data is not none %}
        <div class="table-box">
            <h3>📊 Selected Row Data</h3>
            {{ row_data.to_html(index=False) | safe }}
        </div>
    {% endif %}

</div>

</body>
</html>


## 🟢 Cell 6: Web Application Styling (CSS)

This cell defines the **visual design** of the application:

✔️ Centered layout  
✔️ Clean input fields  
✔️ Green action buttons  
✔️ Result & error alert boxes  
✔️ Table visualization for selected row  

---

### ✅ Focus

Professional, simple, and readable UI for cybersecurity predictions.


In [None]:
%%writefile static/style.css

body {
    font-family: Arial, sans-serif;
    background: #f4f6f9;
    margin: 0;
    padding: 0;
}

.container {
    width: 70%;
    margin: 40px auto;
    background: white;
    padding: 30px;
    border-radius: 10px;
    box-shadow: 0px 0px 10px rgba(0,0,0,0.1);
}

h1 {
    text-align: center;
    color: #2c3e50;
}

p {
    text-align: center;
    color: #555;
}

.form-group {
    margin-bottom: 15px;
}

input {
    width: 100%;
    padding: 10px;
}

button {
    width: 100%;
    padding: 12px;
    background: #2ecc71;
    color: white;
    border: none;
    font-size: 16px;
    cursor: pointer;
    border-radius: 5px;
}

button:hover {
    background: #27ae60;
}

.result-box {
    text-align: center;
    margin-top: 25px;
    padding: 15px;
    border-radius: 8px;
    background: #ecf9f1;
}

.error-box {
    background: #ffe5e5;
    padding: 15px;
    margin-top: 20px;
    border-radius: 8px;
    color: #c0392b;
}

.table-box {
    margin-top: 30px;
    overflow-x: auto;
}

table {
    width: 100%;
    border-collapse: collapse;
}

table, th, td {
    border: 1px solid #ddd;
    padding: 6px;
    font-size: 12px;
}


## 🟢 Cell 7: Kill Existing Flask & Ngrok Processes

This cell ensures that:
✔️ No previous Flask server is running  
✔️ No ngrok tunnel is active  
✔️ Port 5000 is free before restarting the app  

This prevents:

- Port conflicts  
- App crashes  
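
Whether the port is actually free can also be checked from Python. The sketch below simply tries to open a TCP connection to the port; `port_in_use` is an illustrative helper, not part of Flask or pyngrok.

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0

print("Port 5000 busy:", port_in_use(5000))
```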


In [None]:
# ===============================
# 6️⃣ Kill any previous processes
# ===============================
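# The server is started as `python app.py`, so match on the script name.
# The [a] bracket keeps pkill from matching this shell command itself.
!pkill -f "[a]pp.py" || echo "No Flask app running"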
!pkill -f ngrok || echo "No ngrok running"




In [None]:
!lsof -i :5000

In [None]:
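# Stop whatever process is still using port 5000.
# `lsof -t` prints only the PIDs; `xargs -r` skips `kill` when there are none.
!lsof -t -i :5000 | xargs -r kill -9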

## 🟢 Cell 8: Start Flask Server in Background Mode

This cell starts the Flask application silently:

✔️ App runs in the background  
✔️ Logs are stored in `flask.log`  
✔️ Notebook remains free for interaction  


In [None]:
# ===============================
# 7️⃣ Run Flask in the background
# ===============================
!nohup python app.py > flask.log 2>&1 &

## 🔑 How to Create & Use an ngrok Authentication Token (One-Time Setup)

This step is **mandatory** to generate a public URL for your Flask app running in Google Colab.

---

### ✅ Step 1: Create a Free ngrok Account

1. Open the official ngrok website:  
   https://ngrok.com  
2. Click **Sign Up**  
3. Sign up using:
   - Google account OR
   - Email + password  
4. Log in to your ngrok dashboard after signup.

---

### ✅ Step 2: Get Your ngrok Auth Token

1. After logging in, go to:  
   **Dashboard → Your Authtoken**
2. You will see a command like this:

```bash
ngrok config add-authtoken YOUR_TOKEN_HERE
```

3. Copy the token and paste it into `auth_token` in the next cell.


In [None]:
# ===============================
# 8️⃣ Start ngrok tunnel
# ===============================
from pyngrok import ngrok, conf
conf.get_default().auth_token = ""  # 🔑 replace with your token

public_url = ngrok.connect(5000)
print("🌍 Public URL:", public_url)



#### Check the Flask logs while the app is running (mainly useful when an error occurs)

In [None]:
# ===============================
# 9️⃣ Check logs (optional)
# ===============================
!sleep 3 && tail -n 20 flask.log