The Entropy Time Series (EnTS) algorithm extracts an entropy profile from a binary and uses it to build a search space in which a classifier can distinguish malware from benign-ware using only the profile information.
This repository is divided into two folders:
- EntCalculator: Creates the Entropy Profiles and also contains a Structural Entropy implementation.
- Classifier: Classifies the profiles in different classes.
To create an entropy profile you need a configuration file; an example is provided in config.ini.
The command to create an Entropy Profile from a specific file is:
./entCalculator -i file.bin
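To give an idea of what an entropy profile is, here is a minimal Python sketch. It assumes byte-level Shannon entropy over consecutive, non-overlapping windows; this is an illustration only, and entCalculator itself may compute the profile differently. The function names and the default window size are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    # Sum p * log2(1/p) over the observed byte values.
    return sum((n / total) * math.log2(total / n) for n in Counter(chunk).values())


def entropy_profile(data: bytes, window_size: int = 256) -> list:
    """Entropy of each consecutive window (assumption: windows do not overlap)."""
    return [
        shannon_entropy(data[i:i + window_size])
        for i in range(0, len(data), window_size)
    ]


if __name__ == "__main__":
    # One constant window followed by one window holding every byte value once.
    sample = bytes(256) + bytes(range(256))
    print(entropy_profile(sample, window_size=256))
```

A constant window has entropy 0 and a window containing each byte value exactly once has entropy 8, so the profile shows where a file changes from regular to random-looking content.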
Depending on your configuration file, the output includes the original profile, the sub-profile, the wavelet and the final reconstruction. EnTS uses the last one. To create the space from a large set of files, collect the reconstructions into a single matrix file, one row per file. I recommend using AWK for that:
awk 'FNR==1{print $1}' reconSeg* > ents.matrix
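If AWK is not available, the same step can be sketched in Python. This mirrors the AWK one-liner: for every file matching the pattern, it writes the first whitespace-separated field of the first line as one row. The function name build_matrix is hypothetical, and the files are processed in sorted order, which may differ slightly from your shell's glob order.

```python
import glob


def build_matrix(pattern: str, out_path: str) -> int:
    """Write field 1 of line 1 of each matching file to out_path, one row per file.

    Returns the number of rows written.
    """
    rows = 0
    with open(out_path, "w") as out:
        for path in sorted(glob.glob(pattern)):
            with open(path) as fh:
                first_line = fh.readline()
            if not first_line:
                # Empty file: awk has no record to match, so nothing is printed.
                continue
            fields = first_line.split()
            # A blank first line yields an empty row, as it does with awk.
            out.write((fields[0] if fields else "") + "\n")
            rows += 1
    return rows


if __name__ == "__main__":
    print(build_matrix("reconSeg*", "ents.matrix"), "rows written")
```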
In the current format, ents.matrix must be class-balanced; it will be split into 2/3 for training and 1/3 for testing. The train and test sets must be sorted too, to ensure that your test data does not contaminate your training set.
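As an illustration of what a class-balanced 2/3 and 1/3 split can look like, here is a small Python sketch. It is not the logic used by the classifier script; it assumes that the first two thirds of each class go to training and the rest to testing, and the function name is hypothetical.

```python
from collections import defaultdict


def split_two_thirds(rows, labels):
    """Split (row, label) pairs per class: first 2/3 to train, last 1/3 to test."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append((row, label))

    train, test = [], []
    for label in sorted(by_class):
        items = by_class[label]
        cut = (2 * len(items)) // 3  # integer cut point within this class
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


if __name__ == "__main__":
    rows = list(range(12))
    labels = ["malware"] * 6 + ["benign"] * 6
    train, test = split_two_thirds(rows, labels)
    print(len(train), len(test))
```

Because the cut is made inside each class, both sets keep the class balance of the input, and no row appears in both sets.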
The configuration file has the following options:
[Files] This part is for the output filenames
- original: the original entropy profile.
- subseq: a representative subsequence of the entropy profile.
- wavelet: the wavelet related to the entropy profile.
- reconstruction: the EnTS entropy profile.
- segments: the segments generated by Structural Entropy.
- segmentation: the summary of the segments generated by Structural Entropy, which are used by the pairwise distance.
[Values] Parameters for the wavelets
- scaleOr: maximum scale for the wavelet (-1 means the maximum possible).
- scaleMod: eliminates everything up to this scale if scaleUntil is activated.
- subsampleSize: size of the subsequence.
- windowsSize: size of the file chunks for the entropy calculation.
- threshold: threshold value for smoothing the wavelet.
[Operations] Active operations
- threshold: activates cleaning by threshold for the wavelet (1).
- scaleUntil: activates cleaning by scale (1).
- scaleShow: debug option.
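To illustrate what cleaning a wavelet by threshold means, here is a minimal Python sketch. It assumes a Haar wavelet and hard thresholding (detail coefficients below the threshold are set to zero); the wavelet family and thresholding rule used by entCalculator are not stated here, so treat both as assumptions. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Full Haar decomposition; returns (approximation, details coarsest..finest)."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two length")
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Rebuild the signal from the approximation and the detail levels."""
    current = list(approx)
    for level in details:  # coarsest to finest
        rebuilt = []
        for s, d in zip(current, level):
            rebuilt.append((s + d) / SQRT2)
            rebuilt.append((s - d) / SQRT2)
        current = rebuilt
    return current


def threshold_reconstruction(signal, threshold):
    """Zero every detail coefficient with magnitude below threshold, then invert."""
    approx, details = haar_forward(signal)
    cleaned = [[d if abs(d) >= threshold else 0.0 for d in level] for level in details]
    return haar_inverse(approx, cleaned)


if __name__ == "__main__":
    profile = [1.0, 1.2, 5.0, 5.1, 2.0, 2.1, 8.0, 7.9]
    print(threshold_reconstruction(profile, threshold=0.5))
```

A threshold of 0 reproduces the input, while a very large threshold removes all detail and leaves only the mean of the profile. Cleaning by scale would drop whole detail levels instead of individual coefficients.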
You can use your own configuration file by combining the -f option with the -i option.
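The sections and option names listed above can be checked from a script before running the tool. The Python sketch below parses a configuration with the same three sections. The "name = value" syntax, the filenames and every numeric value in it are assumptions made for illustration; check config.ini for the real format and defaults.

```python
import configparser

# Hypothetical example content: all filenames and values are placeholders.
EXAMPLE = """
[Files]
original = original.out
subseq = subseq.out
wavelet = wavelet.out
reconstruction = reconstruction.out
segments = segments.out
segmentation = segmentation.out

[Values]
scaleOr = -1
scaleMod = 3
subsampleSize = 256
windowsSize = 256
threshold = 0.5

[Operations]
threshold = 1
scaleUntil = 0
scaleShow = 0
"""


def load_config(text):
    """Parse the configuration text, keeping option names case-sensitive."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # otherwise scaleOr would be stored as scaleor
    parser.read_string(text)
    return parser


if __name__ == "__main__":
    cfg = load_config(EXAMPLE)
    print("windowsSize:", cfg.getint("Values", "windowsSize"))
    print("threshold active:", cfg.getboolean("Operations", "threshold"))
```

Note that threshold appears in both [Values] (the value itself) and [Operations] (the on/off switch), so it has to be read together with its section name.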
Once you have the space, I recommend storing the names and the classes in separate files. Suppose we have all.names and all.classes with the names and the classes for malware and benign-ware. Then, we call the classifier as follows:
./ents.r ents.matrix ENTS all.classes all.names
This generates an output and two files: ENTS.rf.roc and ENTS.rf.class. These files contain the ROC curve and the individual classification of the test set.
As you might notice, there are some extra files and options inherited from Structural Entropy. You can generate the structural entropy segments with the -i option. Once you have the segments, you can use entCalculator to calculate a pairwise comparison of all of them using the Structural Entropy metric. To do this, create a folder with the segment* files and run:
./entCalculator -d folder/
This command will print the similarity matrix.
We provide a small Shiny visualization framework for R. It is designed to show the original entropy profile, the chosen subsequence, the wavelet decomposition and the final reconstruction. EnTS uses the whole reconstruction, while Structural Entropy uses only the average information from the selected segments (highlighted in red). You can run it in R (from the visualization folder) as follows:
>library(shiny)
>runApp(".")
In server.R you need to set the variable inEnTS to your entCalculator program. You can see a visualization example below:
To visualize a similarity/distance matrix you can use the script imgSim.r. You can call the script as follows:
./imgSim.r matrix example.pdf Example
where matrix is the matrix you want to plot, example.pdf is the output and "Example" is the plot title. You can see an example below:
The script assumes that the Rscript interpreter is at /usr/local/bin/Rscript; if that is not the case on your system, change the first line to the proper path.
You can compile the program using make. If you need to modify the build, just change the Makefile. In the EntCalculator folder, run:
make
The classifier runs in R; you only need to ensure that all the required libraries are installed.