sanjayBee/LINDEF

LINDEF: Lightweight Real-Time Network Intrusion Detection and Containment Framework

LINDEF is a machine learning-based network intrusion detection project designed to identify suspicious network traffic while staying lightweight enough for local environments, school networks, small businesses, and other organizations with limited cybersecurity resources.

The system uses a two-stage classification pipeline:

  1. A binary detection model classifies traffic as benign or malicious.
  2. An attack classification model identifies the likely attack type when suspicious traffic is detected.

This repository contains the training code, Colab notebook, Streamlit dashboard, benchmark results, documentation, and simulation dashboard demo for the LINDEF project.


Project Overview

Traditional intrusion detection systems can be expensive, difficult to maintain, resource-heavy, or dependent on constantly updated rules and signatures. LINDEF explores whether a lightweight machine learning system can provide strong intrusion detection performance while tracking practical deployment factors such as:

  • Accuracy
  • Precision
  • Recall
  • F1-score
  • False positive rate
  • Average latency
  • RAM usage
  • Model size

The goal is not only to detect attacks accurately, but also to evaluate whether the system could realistically operate in lower-resource environments.


Core Goals

LINDEF was designed around three main goals:

  1. Detect malicious traffic accurately
  2. Classify the type of attack when possible
  3. Remain lightweight enough for practical local use

The project also includes response mapping, where predicted attack types are assigned severity levels and recommended containment actions.


How LINDEF Works

LINDEF uses a two-stage detection process.

1. Binary Detection Model

The binary model classifies each network flow as:

BENIGN
ATTACK

This model acts as the first detection layer. If a flow is classified as BENIGN, the system allows it. If a flow is classified as ATTACK, it is passed to the attack classification model.

2. Attack Classification Model

The attack classification model predicts the likely attack type for malicious traffic. Example attack labels include:

neptune
smurf
nmap
portsweep
ipsweep
guess_passwd
httptunnel
warezmaster
apache2

The predicted attack type is then mapped to a severity level and a recommended response.
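The hand-off between the two stages can be sketched as follows. This is a minimal illustration, not the project's actual inference code: `Verdict`, `classify_flow`, and `_StubModel` are hypothetical names, and the sketch assumes both models follow the scikit-learn `predict()` convention and that the binary model emits the string labels `BENIGN` and `ATTACK` (the trained model may use encoded integers instead).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Verdict:
    is_attack: bool
    attack_type: Optional[str]  # None when the flow is benign


def classify_flow(features: Sequence[float], binary_model, attack_model) -> Verdict:
    """Run stage 1, and only run stage 2 when stage 1 flags the flow."""
    # Assumption: predict() takes a 2-D batch and returns one label per row.
    row = [list(features)]
    if binary_model.predict(row)[0] == "BENIGN":
        return Verdict(is_attack=False, attack_type=None)
    attack_type = attack_model.predict(row)[0]
    return Verdict(is_attack=True, attack_type=str(attack_type))


class _StubModel:
    """Hypothetical stand-in for a trained model, used only for this demo."""

    def __init__(self, rule):
        self._rule = rule

    def predict(self, rows):
        return [self._rule(row) for row in rows]


# Toy rules: flag flows whose first feature exceeds 100 and call them "neptune".
binary_stub = _StubModel(lambda row: "ATTACK" if row[0] > 100 else "BENIGN")
attack_stub = _StubModel(lambda row: "neptune")

print(classify_flow([5.0, 1.0], binary_stub, attack_stub))
print(classify_flow([500.0, 1.0], binary_stub, attack_stub))
```

Because the second model only runs on flagged flows, benign traffic pays the cost of a single prediction.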

Example response mapping:

| Attack Category | Example Attacks | Example Response |
| --- | --- | --- |
| DoS-style attacks | neptune, smurf, apache2 | BLOCK_IP |
| Scanning/probe attacks | nmap, portsweep, ipsweep | BLOCK_IP |
| Credential/access attempts | guess_passwd, warezmaster | THROTTLE_IP |
| Tunneling or host compromise | httptunnel, rootkit | ISOLATE_HOST |
| Normal traffic | normal, benign | ALLOW |
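
One way to express this mapping in code is a plain lookup table. The sketch below transcribes the actions from the table above; the function name `recommend_action` and the `LOG_ONLY` fallback for unlisted labels are assumptions made for illustration. Severity levels are not listed in this README, so they are left out rather than guessed.

```python
# Attack label -> recommended action, transcribed from the response table.
RESPONSE_MAP = {
    "neptune": "BLOCK_IP",
    "smurf": "BLOCK_IP",
    "apache2": "BLOCK_IP",
    "nmap": "BLOCK_IP",
    "portsweep": "BLOCK_IP",
    "ipsweep": "BLOCK_IP",
    "guess_passwd": "THROTTLE_IP",
    "warezmaster": "THROTTLE_IP",
    "httptunnel": "ISOLATE_HOST",
    "rootkit": "ISOLATE_HOST",
    "normal": "ALLOW",
    "benign": "ALLOW",
}

# Hypothetical fallback for labels the table does not cover.
DEFAULT_ACTION = "LOG_ONLY"


def recommend_action(label: str) -> str:
    """Look up the action for a predicted label, ignoring case and padding."""
    return RESPONSE_MAP.get(label.strip().lower(), DEFAULT_ACTION)


for predicted in ["smurf", "BENIGN", "guess_passwd", "some_new_attack"]:
    print(predicted, "->", recommend_action(predicted))
```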

Datasets Used

LINDEF combines multiple public intrusion detection datasets to improve attack coverage and reduce overfitting to a single dataset.

| Dataset | Description |
| --- | --- |
| NSL-KDD | Benchmark intrusion detection dataset containing normal traffic and multiple attack categories |
| UNSW-NB15 | Modern intrusion detection dataset containing normal traffic and nine attack categories |
| CIC-IDS | Flow-based intrusion detection dataset containing benign traffic and multiple attack types |

Combining these datasets provides broader attack coverage, more training records, and a more diverse feature space.


Training Pipeline

The training pipeline performs the following steps:

  1. Loads NSL-KDD, UNSW-NB15, and CIC-IDS data
  2. Combines all datasets into one training set
  3. Removes rows with invalid, missing, or infinite values, and drops duplicate rows
  4. Drops leakage-prone columns such as IP addresses, timestamps, and flow identifiers
  5. Creates binary and multi-class labels
  6. Encodes categorical features
  7. Adds a packet-rate feature when flow columns are available
  8. Scales features using StandardScaler
  9. Applies SMOTE to reduce class imbalance
  10. Trains Random Forest models
  11. Evaluates detection and classification performance
  12. Saves model artifacts for dashboard inference
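
Steps 3, 4, and 7 can be sketched with pandas as shown below. The column names (`src_ip`, `dst_ip`, `timestamp`, `flow_id`, `total_packets`, `duration`) and the function `clean_flows` are hypothetical; each source dataset uses its own schema, and the packet-rate definition here (packets divided by duration) is an assumption about what the feature means.

```python
import numpy as np
import pandas as pd

# Hypothetical leakage-prone columns; real dataset schemas differ.
LEAKAGE_COLUMNS = ["src_ip", "dst_ip", "timestamp", "flow_id"]


def clean_flows(df: pd.DataFrame) -> pd.DataFrame:
    """Drop leakage columns and bad rows, then add a packet-rate feature."""
    df = df.drop(columns=LEAKAGE_COLUMNS, errors="ignore")
    # Treat +/-inf as missing so a single dropna() removes both kinds of row.
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    df = df.drop_duplicates().reset_index(drop=True)
    # The packet rate is only added when both source columns exist.
    if {"total_packets", "duration"}.issubset(df.columns):
        # Zero-duration flows get a rate of 0 instead of a division error.
        safe_duration = df["duration"].where(df["duration"] > 0, np.nan)
        df["packet_rate"] = (df["total_packets"] / safe_duration).fillna(0.0)
    return df


raw = pd.DataFrame(
    {
        "src_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
        "duration": [2.0, 0.0, 0.0, np.inf, 4.0],
        "total_packets": [10, 3, 3, 8, np.nan],
        "label": ["BENIGN", "ATTACK", "ATTACK", "ATTACK", "BENIGN"],
    }
)
cleaned = clean_flows(raw)
print(cleaned)
```

In this toy input, the row with an infinite duration and the row with a missing packet count are removed, and the two identical attack rows collapse into one.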

Models

LINDEF trains two Random Forest classifiers:

| Model | Purpose |
| --- | --- |
| Binary Random Forest Model | Detects whether traffic is normal or malicious |
| Attack Classification Random Forest Model | Classifies the likely attack type after traffic is flagged as malicious |

Generated model artifacts include:

binary_model.pkl
class_model.pkl
scaler.pkl
feature_columns.pkl
feature_medians.npy
labelEncoder.pkl

These artifacts are not included in the repository by default because some model files may be too large for normal GitHub upload. They can be regenerated by running the training notebook.


Results

LINDEF reached 99.83% accuracy on binary detection and 94.15% accuracy on attack classification.

| Model | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| Binary Classification Model | 0.9983 | 0.9983 | 0.9983 | 0.9983 |
| Attack Classification Model | 0.9415 | 0.9416 | 0.9415 | 0.9415 |

Additional performance metrics:

| Metric | Binary Model | Attack Classification Model |
| --- | --- | --- |
| False Positive Rate | 0.21% | 0.37% |
| Average Latency | 54.18 ms | 53.79 ms |
| RAM Usage | 18.52 MB | 8.32 MB |
| ROC-AUC | 0.97 | 0.89 |

The binary model was the stronger of the two, with an F1-score of 0.9983. The attack classification model scored lower (F1-score 0.9415) while distinguishing among a broader set of attack labels.
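
For readers who want to reproduce metrics of this kind, the sketch below shows how a binary false positive rate and an average prediction latency are commonly computed. The helper names are hypothetical, and this README does not state whether latency was measured per flow or per batch, so the timing loop is only one possible setup.

```python
import time
from statistics import mean


def false_positive_rate(y_true, y_pred, positive="ATTACK"):
    """FPR = FP / (FP + TN): the share of benign flows flagged as attacks."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0


def average_latency_ms(predict, batches, repeats=5):
    """Mean wall-clock time of predict(batch), in milliseconds."""
    timings = []
    for batch in batches:
        for _ in range(repeats):
            start = time.perf_counter()
            predict(batch)
            timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


# Toy labels: four benign flows (one wrongly flagged) and two attacks.
y_true = ["BENIGN", "BENIGN", "BENIGN", "BENIGN", "ATTACK", "ATTACK"]
y_pred = ["BENIGN", "BENIGN", "BENIGN", "ATTACK", "ATTACK", "BENIGN"]
fpr = false_positive_rate(y_true, y_pred)
print(f"false positive rate: {fpr:.2%}")

# Any callable works as the predictor; sum() stands in for a model here.
latency = average_latency_ms(sum, [[1, 2, 3], [4, 5, 6]])
print(f"average latency: {latency:.4f} ms")
```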


Benchmark Comparison

LINDEF was compared against common intrusion detection approaches. The LINDEF metrics come from project testing, while the non-LINDEF values are literature-based comparison ranges summarized for context.

| Method | Detection Task | Accuracy | False Positive Rate | Average Latency | RAM Usage |
| --- | --- | --- | --- | --- | --- |
| LINDEF Binary Random Forest | Normal vs. attack detection | 99.83% | 0.21% | 54.18 ms | 18.52 MB |
| LINDEF Attack Classification Random Forest | Attack type classification | 94.15% | 0.37% | 53.79 ms | 8.32 MB |
| Signature-Based IDS | Known attack pattern matching | 94%–98% | <1% | <5 ms | 50–200 MB |
| Anomaly-Based IDS | Detects deviations from normal behavior | 85%–95% | 3%–5% | 10–50 ms | 1–4 GB |
| Cloud-Based Endpoint Detection | Endpoint and device monitoring | 96%–99% | 2% | 50–200 ms | Varies |
| Rule-Based IDS | Manually written detection rules | 90%–95% | 1%–2% | 5–15 ms | 200–500 MB |

LINDEF combines high accuracy, a low false positive rate, and low RAM usage (under 20 MB per model). Its average latency of about 54 ms is higher than the ranges listed for signature-based and rule-based IDS, but the tradeoff is reasonable for the intended use case of lightweight monitoring in smaller or moderate-traffic environments.


Simulation Dashboard Demo

This repository includes a screen-recorded dashboard demo named:

lindef_simulation_dashboard

The demo shows the Streamlit dashboard processing sample flow data generated from the training/testing pipeline. It demonstrates the detection pipeline without requiring a full live packet capture setup.

The demo shows LINDEF:

  • Loading trained model artifacts
  • Processing sample flow-style records
  • Aligning data to the expected training features
  • Scaling input features
  • Classifying traffic as benign or malicious
  • Predicting attack types
  • Assigning severity levels
  • Recommending response actions
  • Logging recent detections
  • Displaying dashboard charts

This demo should be interpreted as a simulation dashboard demo using sample flow data, not as a fully deployed live-network environment.
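
The "aligning data to the expected training features" step in the list above can be sketched with pandas. This sketch assumes that `feature_medians.npy` stores one median per entry of `feature_columns.pkl`, in the same order; the function name `align_features` and the sample column names are hypothetical.

```python
import numpy as np
import pandas as pd


def align_features(flows: pd.DataFrame, feature_columns, feature_medians) -> pd.DataFrame:
    """Reorder columns to match training and fill gaps with training medians."""
    columns = list(feature_columns)
    # reindex drops columns the model never saw and adds missing ones as NaN.
    aligned = flows.reindex(columns=columns)
    aligned = aligned.apply(pd.to_numeric, errors="coerce")
    aligned = aligned.replace([np.inf, -np.inf], np.nan)
    medians = pd.Series(np.asarray(feature_medians, dtype=float), index=columns)
    return aligned.fillna(medians)


# Stand-ins for the contents of feature_columns.pkl and feature_medians.npy.
feature_columns = ["duration", "total_packets", "packet_rate"]
feature_medians = np.array([1.5, 12.0, 8.0])

sample = pd.DataFrame(
    {
        "duration": [2.0, np.nan],
        "total_packets": [10, 4],
        "extra_column": ["x", "y"],  # not a training feature, so it is dropped
    }
)
aligned = align_features(sample, feature_columns, feature_medians)
print(aligned)
```

The aligned frame always has the training columns in the training order, which is what a fitted scaler and model expect at inference time.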

The dashboard is designed to support a future live workflow using:

TShark packet capture
CICFlowMeter feature extraction
LINDEF model inference
Streamlit visualization

If the demo video is included in this repository, it should be placed in:

demo/lindef_simulation_dashboard.mp4

Repository Structure

LINDEF/
├── README.md
├── requirements.txt
├── .gitignore
│
├── src/
│   └── train_lindef_models.py
│
├── notebooks/
│   └── LINDEF_training_colab.ipynb
│
├── app/
│   └── dashboard.py
│
├── data/
│   └── README.md
│
├── models/
│   └── README.md
│
├── results/
│   ├── README.md
│   ├── benchmark_results.csv
│   ├── benchmark_results.md
│   ├── confusion_matrices.png
│   ├── binary_roc.png
│   └── multiclass_roc.png
│
├── demo/
│   ├── README.md
│   └── lindef_simulation_dashboard.mp4
│
└── docs/
    ├── methodology.md
    ├── limitations.md
    ├── future_work.md
    └── LINDEF_poster.pdf

Installation

Clone the repository:

git clone https://github.com/YOUR-USERNAME/LINDEF.git
cd LINDEF

Install dependencies:

pip install -r requirements.txt

Training the Models

The easiest way to train the models is through the Colab notebook:

notebooks/LINDEF_training_colab.ipynb

The notebook generates:

binary_model.pkl
class_model.pkl
scaler.pkl
feature_columns.pkl
feature_medians.npy
labelEncoder.pkl
simulation_test.csv
confusion_matrices.png
binary_roc.png
multiclass_roc.png

The generated model files should be placed locally in:

models/

Model files are not included by default because some artifacts may be too large for normal GitHub upload.


Running the Dashboard

After generating the model artifacts, place them locally in the models/ folder.

Expected local files:

models/binary_model.pkl
models/class_model.pkl
models/scaler.pkl
models/feature_columns.pkl
models/feature_medians.npy
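
A short pre-flight check can confirm that these files are in place before the dashboard starts. The helper `missing_artifacts` is a hypothetical convenience and is not part of the repository.

```python
from pathlib import Path

# File names taken from the expected local files listed above.
EXPECTED_ARTIFACTS = [
    "binary_model.pkl",
    "class_model.pkl",
    "scaler.pkl",
    "feature_columns.pkl",
    "feature_medians.npy",
]


def missing_artifacts(models_dir="models"):
    """Return the expected artifact names that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


missing = missing_artifacts()
if missing:
    print("Missing model artifacts:", ", ".join(missing))
else:
    print("All expected model artifacts are present.")
```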

Then run:

streamlit run app/dashboard.py

The dashboard can run in simulation mode using sample flow data. A future version will expand live capture support using TShark and CICFlowMeter.


Data Availability

The full datasets are not included in this repository because of file size and licensing constraints.

The training pipeline expects the following files:

NSL-KDD (training) - KDDTrain+.csv
NSL-KDD (testing) - KDDTest+.csv
UNSW-NB15 (training) - UNSW_NB15_training-set (1).csv
UNSW-NB15 (testing) - UNSW_NB15_testing-set (1).csv
CIC-IDS - CSV files or CIC-IDS zip file

More details are provided in:

data/README.md

Project Poster

A general project poster summarizing the background, methodology, results, benchmark comparison, limitations, and future work is included in:

docs/LINDEF_Project_Poster.pdf

This poster is included as a general LINDEF project summary, not only as a competition-specific poster.


Limitations

LINDEF has several limitations that should be considered before real-world deployment:

  • Public IDS datasets may not fully represent live enterprise traffic.
  • CICFlowMeter does not directly recreate every NSL-KDD or UNSW-NB15 feature from raw packet captures.
  • Live detection accuracy depends on how closely extracted features match the training feature space.
  • Some attack classes have fewer samples than others.
  • SMOTE helps class imbalance but does not fully replace real examples of rare attacks.
  • The current dashboard demo uses sample data rather than a fully deployed live network environment.
  • Benchmark values for non-LINDEF methods are literature-based comparison ranges, not direct same-hardware tests.
  • The model may struggle with zero-day attacks or adversarial traffic designed to evade detection.
  • Additional testing is needed in higher-bandwidth and larger network environments.

Future Work

Future improvements include:

  • Improving live feature extraction with CICFlowMeter or NFStream
  • Testing LINDEF on larger and higher-bandwidth networks
  • Expanding training data with newer intrusion detection datasets
  • Adding ensemble models such as Random Forest plus XGBoost or LightGBM
  • Tuning probability thresholds to reduce false positives
  • Adding explainability tools such as feature importance or SHAP
  • Improving containment actions based on attack type, confidence, and repeated behavior
  • Adding Docker support for easier setup
  • Directly benchmarking LINDEF against Snort, Suricata, Zeek, and endpoint detection tools on the same hardware
  • Improving the Streamlit dashboard for more reliable live monitoring

Tools and Libraries

LINDEF uses:

  • Python
  • pandas
  • NumPy
  • scikit-learn
  • imbalanced-learn
  • Matplotlib
  • Seaborn
  • Joblib
  • psutil
  • Streamlit
  • TShark/Wireshark
  • CICFlowMeter

Project Status

LINDEF is a research prototype and science fair project. On public benchmark datasets, it demonstrates that a lightweight machine learning pipeline can achieve strong intrusion detection results while maintaining relatively low memory usage. Additional live-network testing is needed before real-world deployment.


Author

Sanjay Balaji
