Skip to content

katiie104/IDS-using-MachineLearning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

# Intrusion Detection System (IDS) Using Machine Learning

## 📌 Overview

This project implements an **Intrusion Detection System (IDS)** leveraging **Machine Learning (ML)** to detect
network intrusions. It utilizes the **NSL-KDD** dataset for training/testing models, and integrates the following
components:

- **Zeek** for network traffic analysis
- **Filebeat** for log collection
- **ELK Stack** (Elasticsearch, Logstash, Kibana) for log storage & visualization

The system classifies network traffic as benign or malicious (e.g., port scanning, DoS attacks), and sends alerts
to Elasticsearch for real-time visualization on **Kibana dashboards**.

### 🧪 Tested Environment

- **Kali Linux**: Simulates attacks (port scanning, DoS)
- **Ubuntu**: Hosts IDS, Zeek, Filebeat, and ELK Stack

**Detection performance**: 0.82–0.98 for port scanning and DoS in the test environment.

---

## ✨ Features

- 🔍 **Machine Learning Models**: Logistic Regression, Decision Tree, Random Forest, SVM, KNN, MLP
- 🌐 **Network Traffic Analysis**: Zeek generates detailed network logs
- 🔄 **Log Processing**: Converts Zeek logs into ML-compatible format
- 📊 **Real-time Monitoring**: ELK Stack + Kibana dashboards
- 📥 **Log Collection**: Filebeat forwards logs to Elasticsearch

---

## ⚙️ Prerequisites

Ensure the following are installed:

- Ubuntu (or any Linux distro)
- Python 3.8+
- Zeek (Bro)
- Filebeat
- ELK Stack (Elasticsearch, Logstash, Kibana)
- Docker (optional)
- Kali Linux (optional, for simulating attacks)

---

## 🚀 Installation

### 1. Clone the Repository

```bash
git clone https://github.com/katiie104/IDS-using-MachineLearning.git
cd IDS-using-MachineLearning

2. Install Python Dependencies

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Dependencies (requirements.txt):

  • pandas
  • numpy
  • scikit-learn
  • xgboost
  • shap
  • matplotlib
  • seaborn
  • joblib
  • pyshark

3. Install Zeek

sudo apt-get update
sudo apt-get install zeek

4. Install Filebeat

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.10.2-amd64.deb
sudo dpkg -i filebeat-8.10.2-amd64.deb

Edit /etc/filebeat/filebeat.yml to collect Zeek logs (e.g., conn.log) and forward to Elasticsearch.

5. Set Up ELK Stack (via Docker)

docker-compose -f elk/docker-compose-elk.yml up -d

6. Download NSL-KDD Dataset

Download the following files and place them in the data/ directory:

  • KDDTrain+.txt
  • KDDTest+.txt

Download Link


📁 Project Structure

IDS-using-MachineLearning/
├── dataset/                # Training data
│   ├── NSL-KDD/            # NSL-KDD dataset
│       ├── KDDTrain+.txt
│       ├── KDDTest+.txt
├── models/                 # Trained models
│   ├── preprocessor.pkl    # Preprocessing pipeline
│   ├── xgb_model.pkl       # XGBoost model
├── src/                    # Source code
│   ├── __init__.py         # Package initialization
│   ├── config.py           # Configuration settings
│   ├── preprocess.py       # Data preprocessing
│   ├── train_model.py      # Model training
│   ├── zeek_feature_extractor.py  # Zeek log feature extraction
│   ├── explain_model.py    # Model explanation (SHAP)
│   ├── stream_monitor.py   # Real-time traffic monitoring
├── logs/                   # Application logs (ignored in git)
│   ├── app.log             # Log file
├── venv/                   # Virtual environment
├── requirements.txt        # Python dependencies
├── main.py                 # Main execution script
└── README.md               # Project documentation
└── License.txt             # License

🛠️ Usage

1. Train and Evaluate ML Models

python main.py training

Notebook includes:

  • Data preprocessing
  • Training various ML models
  • Evaluation (accuracy, precision, recall, F1-score)

2. Configure Zeek

sudo zeekctl deploy

Logs will be stored in: /opt/zeek/logs/current/


3. Process Zeek Logs

python zeek_feature_extractor.py

Extracts and formats features like duration, orig_bytes, etc. to match NSL-KDD structure.


4. Run IDS Pipeline

⚠️ A main.py script (optional) should:

  • Load trained ML model
  • Process Zeek logs in real time
  • Classify traffic and send alerts to Elasticsearch
python main.py

5. Visualize with Kibana

Access Kibana at: http://localhost:5601

  • Create dashboards to view logs and ML alerts
  • Set alerts for detection probabilities > 0.8

6. Simulate Attacks

Port Scanning:

nmap -sS <target_ip>

DoS Attack:

hping3 -S --flood -V <target_ip>

📈 Performance

  • Accuracy: 95–98% on NSL-KDD test set
  • Detection Probabilities: 0.82–0.98 (for DoS and port scanning)
  • Precision/Recall/F1-score: High for common attacks
  • False Positive Rate: Not fully evaluated

🔮 Future Improvements

  • Real-time log streaming using Apache Kafka
  • Deep learning models: LSTM, Autoencoder
  • Train on newer datasets: CICIDS2017, CSE-CIC-IDS2018
  • Harden ELK security (auth & encryption)
  • Test advanced attacks (SQLi, DNS tunneling)

🤝 Contributing

Contributions welcome!

# Fork the repository
# Create your feature branch
git checkout -b feature/your-feature

# Commit and push
git commit -m "Add your feature"
git push origin feature/your-feature

Then open a Pull Request.


📄 License

This project is licensed under the MIT License. See the LICENSE file for more details.


📬 Contact

For issues or suggestions, open an issue on GitHub or email: [retwon2k4@gmail.com]


---

About

The system uses a machine learning model (XGBoost) to detect cyber attacks in real time, based on logging from PyShark (Wireshark API).

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages