GitHub - ishaak15/UNSW-IDS-Feature-Selection

UNSW-IDS-Feature-Selection

In this work, we apply a two-stage anomaly-based network intrusion detection process to the UNSW-NB15 dataset. In stage 1 we use three feature selection algorithms to rank the features in the dataset according to their relation to the resulting labels: Recursive Feature Elimination, Pearson Correlation, and ExtraTreesClassifier, which exposes the built-in feature_importances_ attribute of tree-based classifiers. We also plot graphs of the feature importances for better visualization, among other techniques, to select the best dataset features for machine learning.
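
As a rough illustration of stage 1, the sketch below ranks features with the three methods named above using scikit-learn and NumPy. It is not the notebook's code: the synthetic data from `make_classification`, the `LogisticRegression` estimator inside RFE, and the helper names `rank_descending` and `rank_features` are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def rank_descending(scores):
    """Convert scores to ranks, where rank 1 is the highest score."""
    scores = np.asarray(scores)
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks


def rank_features(X, y, random_state=0):
    """Return one rank vector per method (1 = most relevant feature)."""
    # RFE removes one feature per round; with a single survivor,
    # ranking_ is a full ordering. The base estimator is an assumption.
    X_scaled = StandardScaler().fit_transform(X)
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=1, step=1)
    rfe.fit(X_scaled, y)

    # Absolute Pearson correlation between each feature and the label.
    # Assumes no constant columns (their correlation would be undefined).
    pearson = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )

    # Impurity-based importances from an ExtraTreesClassifier.
    trees = ExtraTreesClassifier(n_estimators=100, random_state=random_state)
    trees.fit(X, y)

    return {
        "rfe": rfe.ranking_,
        "pearson": rank_descending(pearson),
        "extra_trees": rank_descending(trees.feature_importances_),
    }


# Synthetic stand-in for the real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
ranks = rank_features(X, y)
for method, method_ranks in ranks.items():
    print(method, list(method_ranks))
```

Each method yields a complete ordering of the features, which is what makes the ranks comparable across methods.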

After ranking the features, we build 2 new datasets consisting of the top 15 and top 30 features, based on the average of the ranks each feature obtains from the 3 ranking algorithms, and use them alongside the original dataset with all the features.
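
The averaging step can be sketched in plain Python as follows. The rank values are made up for illustration, the function name `top_k_by_average_rank` is hypothetical, and breaking ties alphabetically is an assumption; the notebook may handle ties differently.

```python
def top_k_by_average_rank(rank_tables, k):
    """Average each feature's rank across methods and keep the k best.

    rank_tables maps a method name to a {feature: rank} dict, where a
    lower rank means a more relevant feature.
    """
    features = list(next(iter(rank_tables.values())))
    averages = {
        f: sum(table[f] for table in rank_tables.values()) / len(rank_tables)
        for f in features
    }
    # Lower average rank first; ties broken by feature name (assumption).
    ordered = sorted(features, key=lambda f: (averages[f], f))
    return ordered[:k], averages


# Illustrative ranks only, not results from the notebook.
rank_tables = {
    "rfe": {"sttl": 1, "sbytes": 3, "dur": 2, "spkts": 4},
    "pearson": {"sttl": 1, "sbytes": 2, "dur": 4, "spkts": 3},
    "extra_trees": {"sttl": 2, "sbytes": 1, "dur": 3, "spkts": 4},
}
top2, averages = top_k_by_average_rank(rank_tables, k=2)
print(top2)
```

Calling the same function with k=15 and k=30 on the full rank tables would give the column lists for the two reduced datasets.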

Then we perform a classification activity to distinguish intrusive traffic from normal traffic, using a number of Machine Learning techniques, including k Nearest Neighbour, Logistic Regression Classifier, Decision Tree, Random Forest, Neural Network and a few ensemble classifiers.

The results of the classification activity are evaluated with performance metrics including accuracy, precision, recall, and F1 measure. From the results of the Machine Learning models, we can understand the necessity of feature selection and also identify the data packet attributes that can be monitored by firewalls.
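
For reference, the four metrics can be computed from the confusion counts as in the minimal sketch below. It assumes binary labels with 1 meaning intrusive traffic and 0 meaning normal traffic; the function name `binary_metrics` and the sample labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration: 3 TP, 1 FP, 1 FN, 3 TN.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In an intrusion detection setting, recall reflects how many attacks are caught, while precision reflects how many alerts are genuine, so the two are worth reading together rather than relying on accuracy alone.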

About the files:

UNSW_IDS_analysis.ipynb:

This file houses the initial analysis of the dataset, followed by thorough cleaning, label encoding, and removal of duplicates. Three feature selection algorithms are then applied to pick the top 15, top 30, and finally all features from the dataset. The completed files were exported to our Google Drive and later moved here as the following files:
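
A minimal sketch of the duplicate removal and label encoding steps with pandas is shown below. The toy rows, the choice of `proto` and `service` as the categorical columns, and the helper name `clean_and_encode` are assumptions for illustration; the notebook's actual cleaning steps may differ.

```python
import pandas as pd


def clean_and_encode(df, categorical_columns):
    """Drop exact duplicate rows, then map each category to an integer code."""
    cleaned = df.drop_duplicates().reset_index(drop=True).copy()
    for col in categorical_columns:
        # Codes follow the sorted order of the distinct values in the column.
        cleaned[col] = cleaned[col].astype("category").cat.codes
    return cleaned


# Toy rows for illustration; rows 0 and 2 are exact duplicates.
raw = pd.DataFrame(
    {
        "proto": ["tcp", "udp", "tcp", "tcp"],
        "service": ["http", "dns", "http", "ftp"],
        "sbytes": [100, 60, 100, 250],
        "label": [0, 0, 0, 1],
    }
)
cleaned = clean_and_encode(raw, ["proto", "service"])
print(cleaned)
```

When encoding separate training and testing files, the same category-to-code mapping should be applied to both so that a given code means the same value in each file.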

Training Dataset Files:

  Dataset1.csv
  Dataset2.csv
  Dataset3.csv

Testing Dataset Files:

  Testset1.csv
  Testset2.csv
  Testset3.csv

Training_and_Testing.ipynb:

This file has all the models and their training and testing results. A total of 8 ML algorithms were implemented to compare the effect of feature selection on the different metrics for strengthening Intrusion Detection Systems. The following models were used:

  k Nearest Neighbour
  Decision Tree
  Multinomial Naive Bayes
  Random Forest
  Logistic Regression Classifier 
  Neural Network
  Ensemble Classifier: AdaBoost Classifier using Decision Tree
  Ensemble Classifier: Stack Classifier of KNN, Random Forest and XGBoost
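
To illustrate the last entry in the list, here is a hedged sketch of a stacked ensemble using scikit-learn's `StackingClassifier`. It is not the notebook's code: `GradientBoostingClassifier` is swapped in for XGBoost so the example needs only scikit-learn, and the synthetic data, the `LogisticRegression` final estimator, and all hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for one of the Dataset/Testset pairs (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Base learners produce cross-validated predictions that the final
# estimator learns to combine.
stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        # Stand-in for XGBoost in this sketch.
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_train, y_train)
test_accuracy = stack.score(X_test, y_test)
print(f"stacked test accuracy: {test_accuracy:.3f}")
```

Fitting the same pipeline on each of the three feature sets and comparing the scores is one way to see the effect of feature selection on an ensemble.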
