
Enhancing NIDS through innovative noise techniques and data strategies.


Learning From Noisy NIDS Data


Overview

This repository contains the code and datasets used in a research project focused on improving Network Intrusion Detection Systems (NIDS) by learning from noisy data. The project explores techniques to address label noise, data imbalance, and concept drift in NIDS datasets, with the objective of developing robust models that perform accurately in the adversarial environments typical of modern cybersecurity threats.

Applying Co-teaching to an NIDS Dataset

This repository contains a PyTorch implementation of all the techniques described and cited below.
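
The sketch below illustrates the core co-teaching idea the implementation follows: two networks compute per-sample losses, keep only their small-loss (likely clean) samples, and each network is updated on the samples selected by its peer. It is a minimal, illustrative PyTorch sketch rather than the exact code in this repository; names such as coteaching_step, model_a, and forget_rate are placeholders.

import torch
import torch.nn.functional as F

def coteaching_step(model_a, model_b, opt_a, opt_b, x, y, forget_rate):
    # Per-sample losses for both networks on the same mini-batch.
    loss_a = F.cross_entropy(model_a(x), y, reduction='none')
    loss_b = F.cross_entropy(model_b(x), y, reduction='none')

    # Keep only the (1 - forget_rate) fraction of samples with the smallest loss.
    num_keep = max(1, int((1.0 - forget_rate) * len(y)))
    idx_a = torch.argsort(loss_a)[:num_keep]  # samples network A trusts
    idx_b = torch.argsort(loss_b)[:num_keep]  # samples network B trusts

    # Cross-update: each network learns from its peer's selection.
    opt_a.zero_grad()
    loss_a[idx_b].mean().backward()
    opt_a.step()

    opt_b.zero_grad()
    loss_b[idx_a].mean().backward()
    opt_b.step()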

Introduction

Our project focuses on enhancing Network Intrusion Detection Systems (NIDS) by addressing common challenges such as label noise, data imbalance, and concept drift. We explore various synthetic noise techniques, including uniform, feature-dependent, class-dependent, and a newly devised method called MIMICRY, to simulate real-world adversities.
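
For instance, uniform (symmetric) noise can be simulated by flipping a fixed fraction of labels to a different class chosen uniformly at random. The NumPy sketch below is only an illustration of that idea; the function name and arguments are placeholders rather than the interface used in this code base.

import numpy as np

def add_symmetric_noise(labels, noise_rate, num_classes, seed=1):
    # Flip a `noise_rate` fraction of labels to a uniformly chosen different class.
    rng = np.random.default_rng(seed)
    noisy = np.array(labels, copy=True)
    flip = rng.random(len(noisy)) < noise_rate
    for i in np.where(flip)[0]:
        wrong_classes = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong_classes)
    return noisy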

We conduct experiments on three datasets: CIC-IDS2017, a real-world Windows PE dataset, and a synthetic version of BODMAS. This ensures the relevance and applicability of our findings across diverse environments.

To mitigate data imbalance, we investigate augmentation strategies such as downsampling, upsampling, SMOTE, and ADASYN. To address noisy labels, we additionally explore sample reweighting techniques such as naive reweighting, focal loss, and class-balanced weighting.
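
As an illustration of the oversampling side, the snippet below applies SMOTE from imbalanced-learn to an imbalanced training split; the toy dataset generated with scikit-learn stands in for an NIDS split and is not part of this project.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy imbalanced binary dataset standing in for an NIDS training split.
X_train, y_train = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)

smote = SMOTE(random_state=1)
X_res, y_res = smote.fit_resample(X_train, y_train)
print(Counter(y_train), '->', Counter(y_res))  # class counts before and after oversampling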

Furthermore, we explore novel noise learning techniques to enhance the adaptability and resilience of NIDS in detecting evolving cyber threats. Our project aims to contribute to the improvement of NIDS by rigorously testing and evaluating various methodologies.

Requirements

  • Python 3.6+
  • PyTorch 1.7.0+
  • scikit-learn
  • imbalanced-learn
  • pandas
  • numpy
  • tqdm

Datasets

The primary dataset used in this project is derived from CIC-IDS2017, a comprehensive dataset for network intrusion detection. It contains various types of attacks simulated in a testbed to mirror real-world conditions, alongside benign traffic for a balanced representation.
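
As a rough guide to preparing the data, the sketch below loads a single CIC-IDS2017 CSV with pandas, cleans the infinite and missing values its flow features often contain, and encodes the labels. The file path and the assumption that all columns besides 'Label' are numeric are placeholders; adjust them to your local copy of the dataset.

import numpy as np
import pandas as pd

# Hypothetical path to one CIC-IDS2017 CSV; adjust to your local copy.
df = pd.read_csv('data/cicids2017.csv')
df.columns = df.columns.str.strip()                   # header names often contain stray spaces
df = df.replace([np.inf, -np.inf], np.nan).dropna()   # flow features can contain inf/NaN values
X = df.drop(columns=['Label']).to_numpy(dtype=np.float32)
y = pd.factorize(df['Label'])[0]                      # encode class names as integer labels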

The Windows PE and synthetic BODMAS datasets can be found at https://github.com/nuwuxian/morse/tree/main or directly through: https://tinyurl.com/skvw9n7j

Usage

To run the techniques on the NIDS datasets, adjust the parameters as needed and execute the command. The following example is specific to Co-Teaching+:

python main.py --dataset cicids --model_type coteaching_plus --noise_type symmetric --noise_rate 0.2 --data_augmentation none --seed 1 --num_workers 4 --result_dir results/trial_1/

Customization

  • --lr: Learning rate for the optimizer.
  • --noise_rate: The simulated rate of label noise in the dataset.
  • --num_gradual: Number of epochs over which the sample drop (forget) rate is increased linearly to its final value.
  • --num_workers: The number of subprocesses to use for data loading.
  • Additional arguments are available in main.py for further customization; an example run combining several of these flags is shown below.
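
For example, a run that also overrides the learning rate, the forget-rate schedule, and the augmentation strategy might look like the following (the flag values, including the smote value for --data_augmentation, are illustrative):

python main.py --dataset cicids --model_type coteaching_plus --noise_type symmetric --noise_rate 0.4 --lr 0.001 --num_gradual 10 --data_augmentation smote --num_workers 4 --seed 1 --result_dir results/trial_2/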

Citation

If you find this implementation helpful for your research, please consider citing the original papers:

@INPROCEEDINGS{10179453,
  author={Wu, Xian and Guo, Wenbo and Yan, Jia and Coskun, Baris and Xing, Xinyu},
  booktitle={2023 IEEE Symposium on Security and Privacy (SP)}, 
  title={From Grim Reality to Practical Solution: Malware Classification in Real-World Noise}, 
  year={2023},
  volume={},
  number={},
  pages={2602-2619},
  keywords={Training;Text mining;Privacy;Supervised learning;Training data;Semisupervised learning;Malware},
  doi={10.1109/SP46215.2023.10179453}}

Co-Teaching

@misc{han2018coteaching,
      title={Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels}, 
      author={Bo Han and Quanming Yao and Xingrui Yu and Gang Niu and Miao Xu and Weihua Hu and Ivor Tsang and Masashi Sugiyama},
      year={2018},
      eprint={1804.06872},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Co-Teaching+

@misc{yu2019does,
      title={How does Disagreement Help Generalization against Label Corruption?}, 
      author={Xingrui Yu and Bo Han and Jiangchao Yao and Gang Niu and Ivor W. Tsang and Masashi Sugiyama},
      year={2019},
      eprint={1901.04215},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

MentorMix

@inproceedings{jiang2020beyond,
  title={Beyond synthetic noise: Deep learning on controlled noisy labels},
  author={Jiang, L. and Huang, D. and Liu, M. and Yang, W.},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2020}
}

Bootstrap


@misc{reed2015training,
      title={Training Deep Neural Networks on Noisy Labels with Bootstrapping}, 
      author={Scott Reed and Honglak Lee and Dragomir Anguelov and Christian Szegedy and Dumitru Erhan and Andrew Rabinovich},
      year={2015},
      eprint={1412.6596},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

LRT

@InProceedings{zheng2020error,
  title = 	 {Error-Bounded Correction of Noisy Labels},
  author =       {Zheng, Songzhu and Wu, Pengxiang and Goswami, Aman and Goswami, Mayank and Metaxas, Dimitris and Chen, Chao},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {11447--11457},
  year = 	 {2020},
  editor = 	 {III, Hal Daumé and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/zheng20c/zheng20c.pdf},
  url = 	 {https://proceedings.mlr.press/v119/zheng20c.html}
}

GCE

@misc{zhang2018generalized,
      title={Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels}, 
      author={Zhilu Zhang and Mert R. Sabuncu},
      year={2018},
      eprint={1805.07836},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

ELR

@misc{liu2020earlylearning,
      title={Early-Learning Regularization Prevents Memorization of Noisy Labels}, 
      author={Sheng Liu and Jonathan Niles-Weed and Narges Razavian and Carlos Fernandez-Granda},
      year={2020},
      eprint={2007.00151},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Noise Adaptation

@inproceedings{goldberger2017training,
title={Training deep neural-networks using a noise adaptation layer},
author={Jacob Goldberger and Ehud Ben-Reuven},
booktitle={International Conference on Learning Representations},
year={2017},
url={https://openreview.net/forum?id=H12GRgcxg}
}

LIO

@InProceedings{pmlr-v139-zhang21n,
  title = 	 {Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization},
  author =       {Zhang, Yivan and Niu, Gang and Sugiyama, Masashi},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {12501--12512},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/zhang21n/zhang21n.pdf},
  url = 	 {https://proceedings.mlr.press/v139/zhang21n.html}
}

Additionally, if you utilize this adaptation for your research, please reference this repository and the dataset accordingly.

Acknowledgments

This project is inspired by the work of Xian Wu et al. on "From Grim Reality to Practical Solution: Malware Classification in Real-World Noise". Our adaptation focuses on the specific challenges posed by the NIDS domain.
