This project builds a Network Intrusion Detection System (NIDS) using machine learning, specifically k-means clustering and Random Forest classifiers. Trained on the KDDCUP99 dataset, the system detects and classifies network anomalies, including Denial of Service (DoS), Remote-to-Local (R2L), User-to-Root (U2R), and Probing attacks.
- Binary and Multinomial Classification: Separate normal traffic from attacks (binary), and assign attacks to specific types (multinomial).
- Feature Engineering: Attribute Ratio (AR)-based feature selection for improved performance.
- Machine Learning Models: Combines unsupervised k-means clustering with supervised Random Forest classifiers.
- Comprehensive Evaluation: Precision, recall, F1 score, and ROC curves for model assessment.
- Deployment-Oriented Design: Intended as a basis for real-time network monitoring.
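As a rough illustration of AR-based feature selection, the sketch below scores each numeric feature by the largest ratio between a per-class mean and the overall mean, then keeps the highest-scoring columns. This is one common formulation of Attribute Ratio and is an assumption here; the notebook's exact definition, scaling, and cut-off may differ. The function names `attribute_ratio` and `top_k_features` are hypothetical.

```python
import numpy as np

def attribute_ratio(X, y):
    """Return one AR score per column of X.

    Assumes X holds non-negative numeric features (for example after
    min-max scaling) and y holds one class label per row.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    # One row of per-feature means for each class.
    class_means = np.vstack([X[y == c].mean(axis=0) for c in np.unique(y)])
    # Avoid division by zero for columns that are zero everywhere.
    safe_mean = np.where(overall_mean == 0, 1.0, overall_mean)
    class_ratio = class_means / safe_mean
    # AR is the largest class ratio observed for each feature.
    return class_ratio.max(axis=0)

def top_k_features(X, y, k):
    """Return the column indices of the k features with the highest AR."""
    ar = attribute_ratio(X, y)
    return np.argsort(ar)[::-1][:k]
```

A feature whose mean is the same in every class scores 1.0, while a feature concentrated in a single class scores higher, so ranking by AR favors class-discriminating attributes.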
The project uses the KDDCUP99 dataset, a widely used benchmark dataset for network intrusion detection. It includes a variety of network traffic examples, both benign and malicious, enabling robust model training and evaluation.
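The raw KDDCUP99 labels name individual attacks (for example `smurf.` or `nmap.`) rather than the four categories listed above. A minimal sketch of how such labels could be grouped for binary and multinomial classification follows. The mapping covers the attack names in the KDDCUP99 training set; the helper names `to_category` and `to_binary` are hypothetical, and the notebook may organize this step differently.

```python
# Raw KDDCUP99 attack names grouped into the four attack categories.
ATTACK_CATEGORY = {
    # Denial of Service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probing
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # Remote to Local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # User to Root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}

def to_category(raw_label: str) -> str:
    """Map a raw label such as 'smurf.' to 'normal' or an attack category."""
    name = raw_label.strip().rstrip(".")  # raw labels end with a trailing dot
    if name == "normal":
        return "normal"
    # Attack names outside the training set are flagged rather than guessed.
    return ATTACK_CATEGORY.get(name, "unknown")

def to_binary(raw_label: str) -> int:
    """Return 0 for normal traffic and 1 for any attack."""
    return 0 if to_category(raw_label) == "normal" else 1
```

For example, `to_category("smurf.")` gives `"DoS"` and `to_binary("normal.")` gives `0`.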
- Data Preprocessing: Cleaning and preparing the dataset for analysis.
- Feature Selection: Using AR-based methods to identify key features.
- Clustering and Classification:
  - Apply k-means to group similar data points.
  - Train Random Forest classifiers on clustered data for improved accuracy.
- Model Evaluation: Assess performance using detailed metrics and confusion matrices.
- Deployment Plan: Strategies for integrating the system into operational networks.
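The clustering and classification step above can be read in more than one way. The sketch below assumes one interpretation: fit k-means on the features, then train a separate Random Forest on the rows of each cluster, and at prediction time route each sample to the forest of its nearest cluster. The class name `ClusteredForest` and its parameters are hypothetical, and the notebook's actual arrangement may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

class ClusteredForest:
    """k-means followed by one Random Forest per cluster (illustrative only)."""

    def __init__(self, n_clusters=4, random_state=0):
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                             random_state=random_state)
        self.random_state = random_state
        self.models = {}  # cluster id -> fitted forest or a constant label

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        # Fallback label for clusters that end up without a model.
        values, counts = np.unique(y, return_counts=True)
        self.default_ = values[np.argmax(counts)]
        clusters = self.kmeans.fit_predict(X)
        for c in np.unique(clusters):
            mask = clusters == c
            labels = np.unique(y[mask])
            if len(labels) == 1:
                # Pure cluster: no classifier needed, store the label itself.
                self.models[int(c)] = labels[0]
            else:
                forest = RandomForestClassifier(
                    n_estimators=50, random_state=self.random_state)
                self.models[int(c)] = forest.fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        clusters = self.kmeans.predict(X)
        out = np.empty(len(X), dtype=object)
        for c in np.unique(clusters):
            mask = clusters == c
            model = self.models.get(int(c), self.default_)
            if isinstance(model, RandomForestClassifier):
                out[mask] = model.predict(X[mask])
            else:
                out[mask] = model  # constant label for a pure cluster
        return out
```

Training per cluster lets each forest specialize on a more homogeneous slice of traffic, at the cost of fewer training rows per model. Features should be scaled before clustering, since k-means relies on Euclidean distance.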
- Clone the repository:

      git clone https://github.com/shayan02/network-anomaly-detection.git
      cd network-anomaly-detection
The project is implemented in a Jupyter Notebook, consisting of 89 code cells covering the following aspects:
- Data preprocessing and cleaning.
- Feature engineering and selection.
- Implementation of k-means clustering and Random Forest classifiers.
- Detailed evaluation and visualization of results.
- Initial Setup:

      import os
      from collections import defaultdict
      import pandas as pd
      import numpy as np
      import matplotlib.pyplot as plt
      from google.colab import drive  # only available when running in Google Colab
      import warnings
      %matplotlib inline

- Warnings Filter:

      # Suppress all warnings to keep notebook output readable
      warnings.filterwarnings('ignore')
- Modify the configuration file to specify dataset paths and parameters.
- Run the scripts to preprocess data, train models, and evaluate performance.
- Use the trained model to monitor network traffic for anomalies.
The system demonstrates high accuracy in detecting and classifying various types of network attacks. Performance metrics and visualizations are included in the project for detailed analysis.
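To make the reported metrics concrete, here is a small dependency-free sketch of how per-class precision, recall, and F1 score are derived from true and predicted labels. The function name `per_class_metrics` and the sample labels are illustrative only and are not results from this project. scikit-learn's `classification_report` computes the same quantities and is likely the more convenient choice inside the notebook.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for paired label sequences."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for label in labels:
        tp = pairs[(label, label)]
        # Predicted as this label but actually something else.
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        # Actually this label but predicted as something else.
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        report[label] = (precision, recall, f1)
    return report

# Made-up labels, only to show the call.
y_true = ["normal", "normal", "DoS", "DoS", "Probe", "normal"]
y_pred = ["normal", "DoS", "DoS", "DoS", "Probe", "normal"]
report = per_class_metrics(y_true, y_pred)
```

Per-class figures matter on KDDCUP99 because the categories are heavily imbalanced: overall accuracy can look strong while rare classes such as U2R are mostly missed.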
- Incorporate deep learning models for better handling of complex patterns.
- Adapt the system for real-time data streams.
- Expand testing with more recent and diverse datasets.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please open an issue or submit a pull request to propose changes or report bugs.
- Dataset: KDDCUP99 Dataset
- Techniques: Inspired by advancements in machine learning for cybersecurity.