
Enhancing NIDS through innovative noise techniques and data strategies.


Learning From Noisy NIDS Data


Overview

This repository contains the code and datasets used in a research project focused on improving Network Intrusion Detection Systems (NIDS) by learning from noisy data. The project explores techniques to address label noise, data imbalance, and concept drift in NIDS datasets, with the objective of developing robust models that perform accurately in the adversarial environments typical of modern cybersecurity threats.

Applying Co-teaching to an NIDS Dataset

This repository contains a PyTorch implementation of all the techniques described and cited below.
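
The sketch below illustrates the core co-teaching idea the implementation follows: two networks compute per-sample losses, keep only their small-loss (likely clean) samples, and each network is updated on the samples selected by its peer. It is a minimal, illustrative PyTorch sketch rather than the exact code in this repository; names such as coteaching_step, model_a, and forget_rate are placeholders.

import torch
import torch.nn.functional as F

def coteaching_step(model_a, model_b, opt_a, opt_b, x, y, forget_rate):
    # Per-sample losses for both networks on the same mini-batch.
    loss_a = F.cross_entropy(model_a(x), y, reduction='none')
    loss_b = F.cross_entropy(model_b(x), y, reduction='none')

    # Keep only the (1 - forget_rate) fraction of samples with the smallest loss.
    num_keep = max(1, int((1.0 - forget_rate) * len(y)))
    idx_a = torch.argsort(loss_a)[:num_keep]  # samples network A trusts
    idx_b = torch.argsort(loss_b)[:num_keep]  # samples network B trusts

    # Cross-update: each network learns from its peer's selection.
    opt_a.zero_grad()
    loss_a[idx_b].mean().backward()
    opt_a.step()

    opt_b.zero_grad()
    loss_b[idx_a].mean().backward()
    opt_b.step()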

Introduction

Our project focuses on enhancing Network Intrusion Detection Systems (NIDS) by addressing common challenges such as label noise, data imbalance, and concept drift. We explore various synthetic noise techniques, including uniform, feature-dependent, class-dependent, and a newly devised method called MIMICRY, to simulate real-world adversities.
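
For instance, uniform (symmetric) noise can be simulated by flipping a fixed fraction of labels to a different class chosen uniformly at random. The NumPy sketch below is only an illustration of that idea; the function name and arguments are placeholders rather than the interface used in this code base.

import numpy as np

def add_symmetric_noise(labels, noise_rate, num_classes, seed=1):
    # Flip a `noise_rate` fraction of labels to a uniformly chosen different class.
    rng = np.random.default_rng(seed)
    noisy = np.array(labels, copy=True)
    flip = rng.random(len(noisy)) < noise_rate
    for i in np.where(flip)[0]:
        wrong_classes = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong_classes)
    return noisy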

We conduct experiments on three datasets: CIC-IDS2017, a real-world Windows PE dataset, and a synthetic version of BODMAS. This ensures the relevance and applicability of our findings across diverse environments.

To mitigate data imbalance, we investigate augmentation strategies such as downsampling, upsampling, SMOTE, and ADASYN. To address noisy labels, we additionally explore sample reweighting techniques such as naive reweighting, focal loss, and class-balanced weighting.
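
As an illustration of the oversampling side, the snippet below applies SMOTE from imbalanced-learn to an imbalanced training split; the toy dataset generated with scikit-learn stands in for an NIDS split and is not part of this project.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy imbalanced binary dataset standing in for an NIDS training split.
X_train, y_train = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)

smote = SMOTE(random_state=1)
X_res, y_res = smote.fit_resample(X_train, y_train)
print(Counter(y_train), '->', Counter(y_res))  # class counts before and after oversampling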

Furthermore, we explore novel noise learning techniques to enhance the adaptability and resilience of NIDS in detecting evolving cyber threats. Our project aims to contribute to the improvement of NIDS by rigorously testing and evaluating various methodologies.

Requirements

  • Python 3.6+
  • PyTorch 1.7.0+
  • scikit-learn
  • imbalanced-learn
  • pandas
  • numpy
  • tqdm

Datasets

The primary dataset used in this project is derived from CIC-IDS2017, a comprehensive dataset for network intrusion detection. It contains various types of attacks simulated in a testbed to mirror real-world conditions, alongside benign traffic for a balanced representation.
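
As a rough guide to preparing the data, the sketch below loads a single CIC-IDS2017 CSV with pandas, cleans the infinite and missing values its flow features often contain, and encodes the labels. The file path and the assumption that all columns besides 'Label' are numeric are placeholders; adjust them to your local copy of the dataset.

import numpy as np
import pandas as pd

# Hypothetical path to one CIC-IDS2017 CSV; adjust to your local copy.
df = pd.read_csv('data/cicids2017.csv')
df.columns = df.columns.str.strip()                   # header names often contain stray spaces
df = df.replace([np.inf, -np.inf], np.nan).dropna()   # flow features can contain inf/NaN values
X = df.drop(columns=['Label']).to_numpy(dtype=np.float32)
y = pd.factorize(df['Label'])[0]                      # encode class names as integer labels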

The Windows PE and synthetic BODMAS datasets can be found at https://github.com/nuwuxian/morse/tree/main or directly through: https://tinyurl.com/skvw9n7j

Usage

To run the techniques on the NIDS datasets, adjust the parameters as needed and execute the command. The following example is specific to Co-Teaching+:

python main.py --dataset cicids --model_type coteaching_plus --noise_type symmetric --noise_rate 0.2 --data_augmentation none --seed 1 --num_workers 4 --result_dir results/trial_1/

Customization

  • --lr: Learning rate for the optimizer.
  • --noise_rate: The simulated rate of label noise in the dataset.
  • --num_gradual: Number of epochs over which the sample drop (forget) rate is increased linearly to its final value.
  • --num_workers: The number of subprocesses to use for data loading.
  • Additional arguments are available in main.py for further customization; an example run combining several of these flags is shown below.
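
For example, a run that also overrides the learning rate, the forget-rate schedule, and the augmentation strategy might look like the following (the flag values, including the smote value for --data_augmentation, are illustrative):

python main.py --dataset cicids --model_type coteaching_plus --noise_type symmetric --noise_rate 0.4 --lr 0.001 --num_gradual 10 --data_augmentation smote --num_workers 4 --seed 1 --result_dir results/trial_2/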

Citation

If you find this implementation helpful for your research, please consider citing the original papers:

@INPROCEEDINGS{10179453,
  author={Wu, Xian and Guo, Wenbo and Yan, Jia and Coskun, Baris and Xing, Xinyu},
  booktitle={2023 IEEE Symposium on Security and Privacy (SP)}, 
  title={From Grim Reality to Practical Solution: Malware Classification in Real-World Noise}, 
  year={2023},
  volume={},
  number={},
  pages={2602-2619},
  keywords={Training;Text mining;Privacy;Supervised learning;Training data;Semisupervised learning;Malware},
  doi={10.1109/SP46215.2023.10179453}}

Co-Teaching

@misc{han2018coteaching,
      title={Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels}, 
      author={Bo Han and Quanming Yao and Xingrui Yu and Gang Niu and Miao Xu and Weihua Hu and Ivor Tsang and Masashi Sugiyama},
      year={2018},
      eprint={1804.06872},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Co-Teaching+

@misc{yu2019does,
      title={How does Disagreement Help Generalization against Label Corruption?}, 
      author={Xingrui Yu and Bo Han and Jiangchao Yao and Gang Niu and Ivor W. Tsang and Masashi Sugiyama},
      year={2019},
      eprint={1901.04215},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

MentorMix

@inproceedings{jiang2020beyond,
  title={Beyond synthetic noise: Deep learning on controlled noisy labels},
  author={Jiang, L. and Huang, D. and Liu, M. and Yang, W.},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2020}
}

Bootstrap


@misc{reed2015training,
      title={Training Deep Neural Networks on Noisy Labels with Bootstrapping}, 
      author={Scott Reed and Honglak Lee and Dragomir Anguelov and Christian Szegedy and Dumitru Erhan and Andrew Rabinovich},
      year={2015},
      eprint={1412.6596},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

LRT

@InProceedings{zheng2020error,
  title = 	 {Error-Bounded Correction of Noisy Labels},
  author =       {Zheng, Songzhu and Wu, Pengxiang and Goswami, Aman and Goswami, Mayank and Metaxas, Dimitris and Chen, Chao},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {11447--11457},
  year = 	 {2020},
  editor = 	 {III, Hal Daumé and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/zheng20c/zheng20c.pdf},
  url = 	 {https://proceedings.mlr.press/v119/zheng20c.html}
}

GCE

@misc{zhang2018generalized,
      title={Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels}, 
      author={Zhilu Zhang and Mert R. Sabuncu},
      year={2018},
      eprint={1805.07836},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

ELR

@misc{liu2020earlylearning,
      title={Early-Learning Regularization Prevents Memorization of Noisy Labels}, 
      author={Sheng Liu and Jonathan Niles-Weed and Narges Razavian and Carlos Fernandez-Granda},
      year={2020},
      eprint={2007.00151},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Noise Adaptation

@inproceedings{goldberger2017training,
title={Training deep neural-networks using a noise adaptation layer},
author={Jacob Goldberger and Ehud Ben-Reuven},
booktitle={International Conference on Learning Representations},
year={2017},
url={https://openreview.net/forum?id=H12GRgcxg}
}

LIO

@InProceedings{pmlr-v139-zhang21n,
  title = 	 {Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization},
  author =       {Zhang, Yivan and Niu, Gang and Sugiyama, Masashi},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {12501--12512},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/zhang21n/zhang21n.pdf},
  url = 	 {https://proceedings.mlr.press/v139/zhang21n.html}
}

Additionally, if you utilize this adaptation for your research, please reference this repository and the dataset accordingly.

Acknowledgments

This project is inspired by the work of Xian Wu et al. on "From Grim Reality to Practical Solution: Malware Classification in Real-World Noise". Our adaptation focuses on the specific challenges posed by the NIDS domain.
