LINDEF: Lightweight Real-Time Network Intrusion Detection and Containment Framework

LINDEF is a machine learning-based network intrusion detection project designed to identify suspicious network traffic while staying lightweight enough for local environments, school networks, small businesses, and other organizations with limited cybersecurity resources.

The system uses a two-stage classification pipeline:

A binary detection model classifies traffic as benign or malicious.
An attack classification model identifies the likely attack type when suspicious traffic is detected.

This repository contains the training code, Colab notebook, Streamlit dashboard, benchmark results, documentation, and simulation dashboard demo for the LINDEF project.

Project Overview

Traditional intrusion detection systems can be expensive, difficult to maintain, resource-heavy, or dependent on constantly updated rules and signatures. LINDEF explores whether a lightweight machine learning system can provide strong intrusion detection performance while tracking practical deployment factors such as:

Accuracy
Precision
Recall
F1-score
False positive rate
Average latency
RAM usage
Model size

The goal is not only to detect attacks accurately, but also to evaluate whether the system could realistically operate in lower-resource environments.
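The classification metrics listed above all derive from four confusion-matrix counts. The sketch below is a minimal plain-Python illustration for the binary case, not the project's evaluation code; the function name `binary_metrics` and the sample counts are made up for illustration.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common detection metrics from confusion-matrix counts.

    tp / fn: attack flows that were caught / missed.
    tn / fp: benign flows that were passed / wrongly flagged.
    """
    def ratio(num: float, den: float) -> float:
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Illustrative counts only (not LINDEF results).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

The false positive rate is computed over benign flows only, which is why it can stay low even when benign traffic dominates the dataset.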

Core Goals

LINDEF was designed around three main goals:

Detect malicious traffic accurately
Classify the type of attack when possible
Remain lightweight enough for practical local use

The project also includes response mapping, where predicted attack types are assigned severity levels and recommended containment actions.

How LINDEF Works

LINDEF uses a two-stage detection process.

1. Binary Detection Model

The binary model classifies each network flow as:

BENIGN
ATTACK

This model acts as the first detection layer. Traffic classified as BENIGN is allowed. Traffic classified as ATTACK is passed to the attack classification model.

2. Attack Classification Model

The attack classification model predicts the likely attack type for malicious traffic. Example attack labels include:

neptune
smurf
nmap
portsweep
ipsweep
guess_passwd
httptunnel
warezmaster
apache2

The predicted attack type is then mapped to a severity level and a recommended response.
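The two-stage flow described so far can be sketched as a small cascade. This is a minimal illustration under assumptions: it relies only on each model exposing a scikit-learn-style `predict` method, it assumes the binary model returns the strings `BENIGN` and `ATTACK`, and the function name `two_stage_predict` and the stub models are hypothetical stand-ins for the trained Random Forests.

```python
from typing import List, Optional, Sequence, Tuple


def two_stage_predict(
    binary_model, class_model, rows: Sequence[Sequence[float]]
) -> List[Tuple[str, Optional[str]]]:
    """Run stage 1 on every row, then stage 2 only on rows flagged as attacks."""
    verdicts = list(binary_model.predict(rows))
    flagged = [i for i, verdict in enumerate(verdicts) if verdict == "ATTACK"]
    attack_types = {}
    if flagged:
        predicted = class_model.predict([rows[i] for i in flagged])
        attack_types = dict(zip(flagged, predicted))
    # Benign rows carry None as their attack type.
    return [(verdict, attack_types.get(i)) for i, verdict in enumerate(verdicts)]


# Stub models for demonstration only; real use would pass the trained classifiers.
class StubBinary:
    def predict(self, rows):
        # Pretend any row whose first feature exceeds 100 is an attack.
        return ["ATTACK" if row[0] > 100 else "BENIGN" for row in rows]


class StubAttackType:
    def predict(self, rows):
        return ["neptune" if row[1] > 0.5 else "portsweep" for row in rows]


results = two_stage_predict(
    StubBinary(), StubAttackType(), [[10, 0.1], [500, 0.9], [300, 0.2]]
)
```

Running the second model only on flagged rows keeps the common benign path to a single prediction.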

Example response mapping:

Attack Category	Example Attacks	Example Response
DoS-style attacks	`neptune`, `smurf`, `apache2`	`BLOCK_IP`
Scanning/probe attacks	`nmap`, `portsweep`, `ipsweep`	`BLOCK_IP`
Credential/access attempts	`guess_passwd`, `warezmaster`	`THROTTLE_IP`
Tunneling or host compromise	`httptunnel`, `rootkit`	`ISOLATE_HOST`
Normal traffic	`normal`, `benign`	`ALLOW`
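
The table above can be expressed as a plain lookup. The sketch below transcribes only the categories and responses shown in the table; the name `recommend_response` is hypothetical, severity levels are left out because their values are not listed here, and returning `None` for unmapped labels is an assumption rather than the project's documented behavior.

```python
from typing import Optional, Tuple

# attack label -> (attack category, example response), transcribed from the table.
RESPONSES = {
    "neptune": ("DoS-style attacks", "BLOCK_IP"),
    "smurf": ("DoS-style attacks", "BLOCK_IP"),
    "apache2": ("DoS-style attacks", "BLOCK_IP"),
    "nmap": ("Scanning/probe attacks", "BLOCK_IP"),
    "portsweep": ("Scanning/probe attacks", "BLOCK_IP"),
    "ipsweep": ("Scanning/probe attacks", "BLOCK_IP"),
    "guess_passwd": ("Credential/access attempts", "THROTTLE_IP"),
    "warezmaster": ("Credential/access attempts", "THROTTLE_IP"),
    "httptunnel": ("Tunneling or host compromise", "ISOLATE_HOST"),
    "rootkit": ("Tunneling or host compromise", "ISOLATE_HOST"),
    "normal": ("Normal traffic", "ALLOW"),
    "benign": ("Normal traffic", "ALLOW"),
}


def recommend_response(label: str) -> Optional[Tuple[str, str]]:
    """Look up the category and response for a predicted label (case-insensitive)."""
    # Unmapped labels return None so the caller decides how to handle them.
    return RESPONSES.get(label.strip().lower())


category, action = recommend_response("guess_passwd")
```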

Datasets Used

LINDEF combines multiple public intrusion detection datasets to improve attack coverage and reduce overfitting to a single dataset.

Dataset	Description
NSL-KDD	Benchmark intrusion detection dataset containing normal traffic and multiple attack categories
UNSW-NB15	Modern intrusion detection dataset containing normal traffic and nine attack categories
CIC-IDS	Flow-based intrusion detection dataset containing benign traffic and multiple attack types

Combining these datasets provides broader attack coverage, more training records, and a more diverse feature space.

Training Pipeline

The training pipeline performs the following steps:

Loads NSL-KDD, UNSW-NB15, and CIC-IDS data
Combines all datasets into one training set
Removes invalid, missing, infinite, and duplicate values
Drops leakage-prone columns such as IP addresses, timestamps, and flow identifiers
Creates binary and multi-class labels
Encodes categorical features
Adds a packet-rate feature when flow columns are available
Scales features using StandardScaler
Applies SMOTE to reduce class imbalance
Trains Random Forest models
Evaluates detection and classification performance
Saves model artifacts for dashboard inference
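
The cleaning and labeling steps in this list can be illustrated without any dependencies. The sketch below is written in plain Python for readability and is not the project's pandas-based code; every column name (`src_ip`, `timestamp`, `flow_id`, `packets`, `duration`, `label`) is a hypothetical placeholder, and encoding, scaling, SMOTE, and training are omitted.

```python
import math

# Hypothetical names for leakage-prone columns.
LEAKAGE_COLUMNS = {"src_ip", "dst_ip", "timestamp", "flow_id"}
BENIGN_LABELS = {"normal", "benign"}


def clean_flows(rows):
    """Drop invalid and duplicate rows, remove leakage columns, add derived fields."""
    cleaned, seen = [], set()
    for row in rows:
        # Skip rows containing missing values or non-finite numbers (NaN, inf).
        if any(
            v is None or (isinstance(v, float) and not math.isfinite(v))
            for v in row.values()
        ):
            continue
        key = tuple(sorted(row.items()))
        if key in seen:  # exact duplicate of an earlier row
            continue
        seen.add(key)
        out = {k: v for k, v in row.items() if k not in LEAKAGE_COLUMNS}
        # Binary label: anything that is not normal/benign counts as an attack.
        out["is_attack"] = 0 if str(out["label"]).lower() in BENIGN_LABELS else 1
        # Packet rate is added only when both flow columns are available.
        if "packets" in out and "duration" in out:
            duration = out["duration"]
            out["packet_rate"] = out["packets"] / duration if duration > 0 else 0.0
        cleaned.append(out)
    return cleaned


sample = [
    {"src_ip": "10.0.0.1", "packets": 20, "duration": 2.0, "label": "normal"},
    {"src_ip": "10.0.0.1", "packets": 20, "duration": 2.0, "label": "normal"},
    {"src_ip": "10.0.0.3", "packets": 900, "duration": float("inf"), "label": "neptune"},
    {"src_ip": "10.0.0.4", "packets": 900, "duration": 0.5, "label": "neptune"},
]
flows = clean_flows(sample)
```

In this sample the second row is dropped as a duplicate and the third as invalid, leaving one benign and one attack flow with the original multi-class `label` kept alongside the new binary one.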

Models

LINDEF trains two Random Forest classifiers:

Model	Purpose
Binary Random Forest Model	Detects whether traffic is normal or malicious
Attack Classification Random Forest Model	Classifies the likely attack type after traffic is flagged as malicious

Generated model artifacts include:

binary_model.pkl
class_model.pkl
scaler.pkl
feature_columns.pkl
feature_medians.npy
labelEncoder.pkl

These artifacts are not included in the repository by default because some model files may be too large for normal GitHub upload. They can be regenerated by running the training notebook.

Results

LINDEF produced strong performance in both binary detection and attack classification.

Model	Accuracy	Precision	Recall	F1-score
Binary Classification Model	0.9983	0.9983	0.9983	0.9983
Attack Classification Model	0.9415	0.9416	0.9415	0.9415

Additional performance metrics:

Metric	Binary Model	Attack Classification Model
False Positive Rate	0.21%	0.37%
Average Latency	54.18 ms	53.79 ms
RAM Usage	18.52 MB	8.32 MB
ROC-AUC	0.97	0.89

The binary model reached 99.83% accuracy with a 0.21% false positive rate, while the attack classification model reached 94.15% accuracy across a broader set of attack labels.
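
How the latency and RAM figures were collected is not described here, so the following is only one possible way to take such measurements. It uses the standard-library `time.perf_counter` and `tracemalloc`; note that `tracemalloc` tracks Python-level allocations only, so it is a stand-in for, not an equivalent of, a process-level reading from a tool such as psutil. The names `measure_inference` and `dummy_predict` are hypothetical.

```python
import time
import tracemalloc


def measure_inference(predict_fn, batch, repeats: int = 20):
    """Return (average latency in ms, peak traced memory in MB) for predict_fn(batch)."""
    # Time first, without tracing, because tracemalloc slows execution.
    start = time.perf_counter()
    for _ in range(repeats):
        predict_fn(batch)
    latency_ms = (time.perf_counter() - start) / repeats * 1000.0

    # Then trace a single call to capture its peak Python allocations.
    tracemalloc.start()
    predict_fn(batch)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return latency_ms, peak_bytes / (1024 * 1024)


# Stand-in for a model's predict method; a real run would pass the trained model.
def dummy_predict(batch):
    return [sum(row) > 1.0 for row in batch]


latency_ms, peak_mb = measure_inference(dummy_predict, [[0.2, 0.9]] * 1000)
```

Averaging over several repeats smooths out one-off delays such as cache warm-up.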

Benchmark Comparison

LINDEF was compared against common intrusion detection approaches. The LINDEF metrics come from project testing, while the non-LINDEF values are literature-based comparison ranges summarized for context.

Method	Detection Task	Accuracy	False Positive Rate	Average Latency	RAM Usage
LINDEF Binary Random Forest	Normal vs. attack detection	99.83%	0.21%	54.18 ms	18.52 MB
LINDEF Attack Classification Random Forest	Attack type classification	94.15%	0.37%	53.79 ms	8.32 MB
Signature-Based IDS	Known attack pattern matching	94%–98%	<1%	<5 ms	50–200 MB
Anomaly-Based IDS	Detects deviations from normal behavior	85%–95%	3%–5%	10–50 ms	1–4 GB
Cloud-Based Endpoint Detection	Endpoint and device monitoring	96%–99%	2%	50–200 ms	Varies
Rule-Based IDS	Manually written detection rules	90%–95%	1%–2%	5–15 ms	200–500 MB

LINDEF shows a strong balance of high accuracy, low false positive rate, and low RAM usage. Its latency is higher than some traditional IDS approaches, but the tradeoff is reasonable for the intended use case of lightweight monitoring in smaller or moderate-traffic environments.

Simulation Dashboard Demo

This repository includes a screen-recorded dashboard demo named:

lindef_simulation_dashboard

The demo shows the Streamlit dashboard processing sample flow data generated from the training/testing pipeline. It demonstrates the detection pipeline without requiring a full live packet capture setup.

The demo shows LINDEF:

Loading trained model artifacts
Processing sample flow-style records
Aligning data to the expected training features
Scaling input features
Classifying traffic as benign or malicious
Predicting attack types
Assigning severity levels
Recommending response actions
Logging recent detections
Displaying dashboard charts

This demo should be interpreted as a simulation dashboard demo using sample flow data, not as a fully deployed live-network environment.
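
The "aligning data to the expected training features" step can be sketched as follows. This is an illustration under assumptions: the artifact names `feature_columns.pkl` and `feature_medians.npy` suggest that missing features are filled with training medians, but that behavior is not confirmed here. The function name `align_features`, the example column names, and the use of a dict for medians are all hypothetical.

```python
def align_features(record: dict, feature_columns: list, medians: dict) -> list:
    """Return values in training-column order, filling gaps with per-feature medians."""
    aligned = []
    for name in feature_columns:
        value = record.get(name)
        if value is None:
            # Feature absent from the incoming record: fall back to its median.
            value = medians[name]
        aligned.append(float(value))
    # Columns in the record that the model never saw are ignored.
    return aligned


# Hypothetical training-time column order and medians.
feature_columns = ["duration", "src_bytes", "packet_rate"]
medians = {"duration": 0.8, "src_bytes": 230.0, "packet_rate": 10.0}

record = {"src_bytes": 512, "duration": 1.5, "extra_field": 7}
vector = align_features(record, feature_columns, medians)
```

Fixing the column order this way matters because the scaler and both models expect features in exactly the positions they were trained on.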

The dashboard is designed to support a future live workflow using:

TShark packet capture
CICFlowMeter feature extraction
LINDEF model inference
Streamlit visualization

If the demo video is included in this repository, it should be placed in:

demo/lindef_simulation_dashboard.mp4

Repository Structure

LINDEF/
├── README.md
├── requirements.txt
├── .gitignore
│
├── src/
│   └── train_lindef_models.py
│
├── notebooks/
│   └── LINDEF_training_colab.ipynb
│
├── app/
│   └── dashboard.py
│
├── data/
│   └── README.md
│
├── models/
│   └── README.md
│
├── results/
│   ├── README.md
│   ├── benchmark_results.csv
│   ├── benchmark_results.md
│   ├── confusion_matrices.png
│   ├── binary_roc.png
│   └── multiclass_roc.png
│
├── demo/
│   ├── README.md
│   └── lindef_simulation_dashboard.mp4
│
└── docs/
    ├── methodology.md
    ├── limitations.md
    ├── future_work.md
    └── LINDEF_poster.pdf

Installation

Clone the repository:

git clone https://github.com/YOUR-USERNAME/LINDEF.git
cd LINDEF

Install dependencies:

pip install -r requirements.txt

Training the Models

The easiest way to train the models is through the Colab notebook:

notebooks/LINDEF_training_colab.ipynb

The notebook generates:

binary_model.pkl
class_model.pkl
scaler.pkl
feature_columns.pkl
feature_medians.npy
labelEncoder.pkl
simulation_test.csv
confusion_matrices.png
binary_roc.png
multiclass_roc.png

The generated model files should be placed locally in:

models/

Model files are not included by default because some artifacts may be too large for normal GitHub upload.

Running the Dashboard

After generating the model artifacts, place them locally in the models/ folder.

Expected local files:

models/binary_model.pkl
models/class_model.pkl
models/scaler.pkl
models/feature_columns.pkl
models/feature_medians.npy

Then run:

streamlit run app/dashboard.py

The dashboard can run in simulation mode using sample flow data. A future version will expand live capture support using TShark and CICFlowMeter.
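
Before launching the dashboard, it can help to confirm that the expected files are in place. The helper below is a hypothetical convenience sketch and is not part of the repository; it checks only the five files listed above.

```python
from pathlib import Path

# File names taken from the "Expected local files" list.
EXPECTED_ARTIFACTS = [
    "binary_model.pkl",
    "class_model.pkl",
    "scaler.pkl",
    "feature_columns.pkl",
    "feature_medians.npy",
]


def missing_artifacts(models_dir: str = "models") -> list:
    """Return the expected artifact names that are not present in models_dir."""
    base = Path(models_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing model artifacts:", ", ".join(missing))
    else:
        print("All expected model artifacts found.")
```

An empty result means the files are present; it does not verify that they load correctly.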

Data Availability

The full datasets are not included in this repository because of file size and licensing constraints.

The training pipeline expects the following files:

NSL-KDD (training) - KDDTrain+.csv
NSL-KDD (testing) - KDDTest+.csv
UNSW-NB15 (training) - UNSW_NB15_training-set (1).csv
UNSW-NB15 (testing) - UNSW_NB15_testing-set (1).csv
CIC-IDS CSV files or CIC-IDS zip file

More details are provided in:

data/README.md

Project Poster

A general project poster summarizing the background, methodology, results, benchmark comparison, limitations, and future work is included in:

docs/LINDEF_poster.pdf

This poster is included as a general LINDEF project summary, not only as a competition-specific poster.

Limitations

LINDEF has several limitations that should be considered before real-world deployment:

Public IDS datasets may not fully represent live enterprise traffic.
CICFlowMeter does not directly recreate every NSL-KDD or UNSW-NB15 feature from raw packet captures.
Live detection accuracy depends on how closely extracted features match the training feature space.
Some attack classes have fewer samples than others.
SMOTE helps class imbalance but does not fully replace real examples of rare attacks.
The current dashboard demo uses sample data rather than a fully deployed live network environment.
Benchmark values for non-LINDEF methods are literature-based comparison ranges, not direct same-hardware tests.
The model may struggle with zero-day attacks or adversarial traffic designed to evade detection.
Additional testing is needed in higher-bandwidth and larger network environments.

Future Work

Future improvements include:

Improving live feature extraction with CICFlowMeter or NFStream
Testing LINDEF on larger and higher-bandwidth networks
Expanding training data with newer intrusion detection datasets
Adding ensemble models such as Random Forest plus XGBoost or LightGBM
Tuning probability thresholds to reduce false positives
Adding explainability tools such as feature importance or SHAP
Improving containment actions based on attack type, confidence, and repeated behavior
Adding Docker support for easier setup
Directly benchmarking LINDEF against Snort, Suricata, Zeek, and endpoint detection tools on the same hardware
Improving the Streamlit dashboard for more reliable live monitoring

Tools and Libraries

LINDEF uses:

Python
pandas
NumPy
scikit-learn
imbalanced-learn
Matplotlib
Seaborn
Joblib
psutil
Streamlit
TShark/Wireshark
CICFlowMeter

Project Status

LINDEF is a research prototype and science fair project. It demonstrates that a lightweight machine learning pipeline can achieve strong intrusion detection results while maintaining relatively low memory usage. Additional live-network testing is needed before real-world deployment.

Author

Sanjay Balaji
