dnsclass: an open-source reference implementation of the DNS-Class algorithm in Python.
The classifier takes as input ARFF files generated with the Flowcalc program (using the lpi plugins). dnsclass classifies the given network traffic flows based on their DNS context and outputs the resulting classification.
The classification process is divided into several steps, implemented as script files named stepN_*, e.g. step6_predict.py. There are also scripts named cvN_* that support cross-validation.
For scientific works, please cite the following paper:
Foremski P., Callegari C., Pagano M., "DNS-Class: Immediate classification of IP flows using DNS"
This software package uses libshorttext, which is included in the dnsclass repository, but may be licensed differently.
The purpose of the steps:
step1_reformat.sh: reformat input ARFF files into the target text input format, keeping only flows of the selected protocols; the script may need adjusting to match your ARFF files
step2_divide.sh: divide the dataset into training and testing (may be skipped)
step3_convert_train.py: convert the training dataset into the libsvm format (Vector Space Model, VSM)
step4_train.sh: train the model
step5_convert_test.py: as step 3, but for the testing dataset
step6_predict.py: classify the testing dataset
step7_analyze.py: show the confusion matrix and errors made in step 6
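To illustrate what steps 3, 5 and 7 do conceptually, here is a minimal Python sketch. It is not the code from the dnsclass scripts. It assumes that each flow is described by a single DNS name, that the name is tokenized by splitting on dots, and that features are plain token counts. The function names (tokenize, build_vocabulary, to_libsvm_line, confusion_matrix) and the toy data are hypothetical. The only part taken from an external convention is the libsvm text format itself: one line per sample, "<label> <index>:<value> ..." with ascending feature indices.

```python
from collections import Counter


def tokenize(domain):
    """Split a domain name into lowercase labels (an assumed tokenization)."""
    return [label for label in domain.lower().split(".") if label]


def build_vocabulary(domains):
    """Assign 1-based feature indices to tokens in order of first appearance."""
    vocab = {}
    for domain in domains:
        for token in tokenize(domain):
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def to_libsvm_line(label_id, domain, vocab):
    """Render one flow as '<label> <index>:<value> ...' with ascending indices.

    Tokens missing from the vocabulary (e.g. unseen in training) are ignored.
    """
    counts = Counter(vocab[t] for t in tokenize(domain) if t in vocab)
    features = " ".join(f"{i}:{counts[i]}" for i in sorted(counts))
    return f"{label_id} {features}".rstrip()


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) label pairs; off-diagonal pairs are errors."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical toy data: (protocol label, DNS name associated with the flow).
flows = [
    ("http", "www.example.com"),
    ("dns", "ns1.example.net"),
    ("http", "img.example.com"),
]
label_ids = {"http": 1, "dns": 2}

vocab = build_vocabulary(domain for _, domain in flows)
lines = [to_libsvm_line(label_ids[proto], domain, vocab) for proto, domain in flows]

for line in lines:
    print(line)
```

In a real run, the vocabulary would be built from the training dataset only and reused unchanged for the testing dataset, so that both files share the same feature indices.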
Project carried out at the Institute of Theoretical and Applied Informatics of the Polish Academy of Sciences, under grant no. 2011/01/N/ST6/07202 of the Polish National Science Centre.
Project website: http://mutrics.iitis.pl/