🧾 Catching the Rare: Ensemble and Linear Models for Imbalanced Network Intrusion Detection

This repository contains the code and analysis accompanying the research paper
“Catching the Rare: Ensemble and Linear Models for Imbalanced Network Intrusion Detection.”

The paper is published on Zenodo (Version 2.0, DOI: https://doi.org/10.5281/zenodo.18408605).

Overview

Network intrusion detection is challenged by extreme class imbalance, where rare attack instances are easily overshadowed by benign traffic. This project investigates supervised machine learning approaches for binary intrusion detection, with an explicit focus on detecting minority (attack) classes rather than optimizing overall accuracy.

While the study is inspired by attack behaviors commonly analyzed in honeypot environments, it does not involve live honeypot deployment. Experiments are conducted using the publicly available CICIDS 2017 dataset, generated in a controlled environment that simulates realistic benign and malicious network traffic.

Core research focus:

Linear vs ensemble models under class imbalance
Precision–recall analysis over raw accuracy
Feature importance and interpretability
Transparent discussion of methodological limitations

Goal:

To improve early intrusion detection by identifying deceptive attack vectors through data-driven insights.

⚙️ Setup Instructions

Clone the repo:

git clone https://github.com/harddikk/Catching-the-Rare.git
cd Catching-the-Rare

Install dependencies:

pip install -r requirements.txt

Dataset

This project uses the CICIDS 2017 intrusion detection dataset released by the Canadian Institute for Cybersecurity.

Due to size and licensing constraints, the dataset is not included in this repository.

To reproduce the experiments:

Download CICIDS 2017 from the official source
Place the extracted CSV files in a directory named:

CICIDS2017/

at the project root.
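
Once the files are in place, they can be read into a binary (benign vs. attack) form. The snippet below is a minimal, dependency-free sketch, not the notebook's actual loading code. It assumes the CSVs carry a header row with a `Label` column (possibly padded with whitespace) and that benign flows are labeled `BENIGN`; the function name `load_cicids_rows` is hypothetical.

```python
import csv
from pathlib import Path


def load_cicids_rows(data_dir="CICIDS2017"):
    """Yield (feature_values, is_attack) pairs from every CSV in data_dir.

    Assumptions (not confirmed by this repository): each file has a header
    row containing a 'Label' column, and benign traffic is labeled 'BENIGN'.
    """
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        with open(csv_path, newline="", encoding="utf-8", errors="replace") as handle:
            reader = csv.reader(handle)
            raw_header = next(reader, None)
            if raw_header is None:
                continue  # empty file
            # Strip padding so ' Label' and 'Label' are treated alike.
            header = [name.strip() for name in raw_header]
            label_idx = header.index("Label")
            for row in reader:
                if len(row) != len(header):
                    continue  # skip blank or malformed lines
                label = row[label_idx].strip().upper()
                features = [v for i, v in enumerate(row) if i != label_idx]
                # Binary target: 0 = benign, 1 = any attack type.
                yield features, int(label != "BENIGN")


if __name__ == "__main__":
    rows = list(load_cicids_rows())
    attacks = sum(is_attack for _, is_attack in rows)
    print(f"{len(rows)} flows loaded, {attacks} labeled as attacks")
```

Feature values are left as strings here; converting them to numbers and handling missing or infinite entries belongs to the preprocessing step.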

🗂 Project Structure

Catching-the-Rare/
├── honeypot_intrusion_detection.ipynb      # Main Jupyter Notebook
├── requirements.txt                        # Python dependencies
├── LICENSE                                 # CC BY 4.0 License
├── README.md                               # This file
├── CICIDS2017/                             # Dataset folder (not included)
└── results/                                # Folder for selected plots/images
    ├── confusion_matrix_rf.png
    └── feature_importance_rf.png

🚀 Running the Notebook

Open

honeypot_intrusion_detection.ipynb

in Jupyter Notebook or VS Code, and execute all cells sequentially.
Results (plots, accuracy metrics, confusion matrices, etc.) will be displayed inline.

🧩 Techniques Used

Data preprocessing and cleaning
Stratified sampling under class imbalance
Feature scaling with leakage-aware pipelines
Supervised ML models:
- Logistic Regression (linear baseline)
- Random Forest (ensemble)
- XGBoost (ensemble)
Evaluation using imbalance-aware metrics:
- Precision–Recall curves
- ROC curves
- Balanced accuracy
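
To make the stratified sampling and leakage-aware scaling steps concrete, here is a small standard-library sketch. It is an illustration under stated assumptions, not the notebook's code: the helper names (`stratified_split`, `fit_scaler`, `apply_scaler`) and the toy data are hypothetical. scikit-learn's `train_test_split(stratify=...)` and `Pipeline` offer the same ideas in library form.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so each class keeps roughly its share."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one sample per class goes to the test split.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(column):
    """Learn mean and standard deviation from training values only."""
    mu, sigma = mean(column), pstdev(column)
    return mu, sigma or 1.0  # guard against constant features


def apply_scaler(column, mu, sigma):
    """Standardize values with previously learned statistics."""
    return [(x - mu) / sigma for x in column]


# Hypothetical toy data: 95 benign (0) and 5 attack (1) flows, one feature.
labels = [0] * 95 + [1] * 5
feature = [float(i) for i in range(100)]

train_idx, test_idx = stratified_split(labels)
# Statistics come from the training split only...
mu, sigma = fit_scaler([feature[i] for i in train_idx])
train_scaled = apply_scaler([feature[i] for i in train_idx], mu, sigma)
# ...and are reused, not refit, on the test split to avoid leakage.
test_scaled = apply_scaler([feature[i] for i in test_idx], mu, sigma)
```

Splitting before fitting the scaler is the key design choice: if the mean and standard deviation were computed on the full dataset, information about the test flows would leak into training.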

📈 Results

Model performance metrics and plots:

Model	Accuracy	Precision (weighted)	Recall (weighted)	F1-Score (weighted)
Logistic Regression	0.9416	0.9489	0.9416	0.9435
Random Forest	0.9986	0.9985	0.9986	0.9985
XGBoost	0.9991	0.9991	0.9991	0.9991

Note: Metrics are calculated on the test split of the dataset.

Selected plots (stored in results/): confusion_matrix_rf.png, feature_importance_rf.png

Note: Accuracy is reported for completeness but is not emphasized due to severe class imbalance. Confusion matrices for other models and feature importance for XGBoost are available in the notebook.
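
The sketch below shows why accuracy alone can mislead under severe imbalance. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not results from this project, and `imbalance_report` is an illustrative helper name.

```python
def imbalance_report(tp, fp, tn, fn):
    """Compute accuracy alongside attack-class metrics from confusion counts."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)        # share of attacks that were caught
    specificity = tn / (tn + fp)   # share of benign flows left alone
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "attack_precision": precision,
        "attack_recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical scenario: 9,900 benign flows and 100 attacks,
# with a classifier that misses 60 of the 100 attacks.
report = imbalance_report(tp=40, fp=10, tn=9890, fn=60)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

In this made-up scenario accuracy is 99.3% even though only 40% of attacks are detected, while balanced accuracy falls to roughly 0.70. Weighted recall is also, by definition, equal to accuracy, which is why the two columns in the table above match; per-class metrics for the attack class are the more informative view.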

📚 Citation

If you use or reference this work, please cite:

Tiwari, Hardik. (2026). Catching the Rare: Ensemble and Linear Models for Imbalanced Network Intrusion Detection. Zenodo. https://doi.org/10.5281/zenodo.18408605

📄 License

This project is licensed under the Creative Commons Attribution 4.0 International License.
You are free to use, modify, and share it, provided you give proper credit.

👨‍💻 Author

Hardik Tiwari
High school researcher passionate about AI, cybersecurity, and system-level innovation.
Connect on LinkedIn 🚀
