steganalysis-ml is a steganalysis tool for detecting LSB steganography in monochromatic JPEG images. Several machine learning classifiers are trained to sort images into two classes: 'stego' and 'clean'.
To analyze an image, pass it as input to the function 'stego_or_clean' in the 'detect_stego.ipynb' notebook.
In this project, five different classifiers were trained and tested using 70,000 images.
The feature vector consists of 8 statistical measurements extracted from the histograms of the images.
The dataset consists of 8 features extracted from 70,000 monochromatic still images adapted from the Stanford Genome Project database. Each image is labeled with one of two classes: with LSB steganography (1) or without LSB steganography (0).
These features are Kurtosis, Skewness, Standard Deviation, Range, Median, Geometric Mean, Hjorth Mobility, and Hjorth Complexity, all extracted from the histograms of the still images, including random spatial transformations.
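
As a rough illustration of the eight features listed above, here is a minimal sketch in plain Python. It rests on several assumptions that are not confirmed by the dataset description: the 256 histogram bin counts are treated as a 1-D signal, population moments are used, kurtosis is the non-excess (Pearson) form, and the helper names `histogram` and `histogram_features` are hypothetical. The actual extraction code behind the dataset may differ.

```python
import math
import statistics


def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def histogram(pixels, bins=256):
    """Count how often each intensity 0..bins-1 occurs."""
    counts = [0] * bins
    for p in pixels:
        counts[p] += 1
    return counts


def histogram_features(counts):
    """Return the eight features for one histogram (a list of bin counts).

    Assumes the histogram is not flat; otherwise the variances below
    are zero and the ratios are undefined.
    """
    m2 = _central_moment(counts, 2)
    m3 = _central_moment(counts, 3)
    m4 = _central_moment(counts, 4)

    # First and second differences, used by the Hjorth parameters.
    d1 = [b - a for a, b in zip(counts, counts[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    var_d1 = _central_moment(d1, 2)
    var_d2 = _central_moment(d2, 2)
    mobility = math.sqrt(var_d1 / m2)
    mobility_d1 = math.sqrt(var_d2 / var_d1)

    # The geometric mean is zero as soon as any bin is empty.
    if min(counts) == 0:
        geo_mean = 0.0
    else:
        geo_mean = math.exp(sum(math.log(c) for c in counts) / len(counts))

    return {
        "kurtosis": m4 / m2 ** 2,            # Pearson (non-excess) kurtosis
        "skewness": m3 / m2 ** 1.5,
        "std": math.sqrt(m2),
        "range": max(counts) - min(counts),
        "median": statistics.median(counts),
        "geometric_mean": geo_mean,
        "hjorth_mobility": mobility,
        "hjorth_complexity": mobility_d1 / mobility,
    }


# Toy example: 8 pixels with intensities in 0..3.
pixels = [0, 0, 1, 1, 1, 2, 3, 3]
features = histogram_features(histogram(pixels, bins=4))
```

The resulting dictionary has one entry per feature, so its values can be fed to a classifier in a fixed order.
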
The steganographic function embeds payloads at five levels, ranging from 0.1 to 0.5.
More details are available at the following link:
https://ieee-dataport.org/open-access/steganalysis-still-images-lsb-steganography-features-dataset
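
To make the idea of LSB embedding at a given payload more concrete, the sketch below overwrites the least significant bit of a random subset of pixel values. It is only an illustration: the function name `embed_lsb` is hypothetical, the payload is interpreted here as the fraction of pixels carrying one bit each (an assumption, since the unit is not stated above), and the message is random bits rather than real data. The embedding function used to build the dataset may work differently.

```python
import random


def embed_lsb(pixels, payload, seed=0):
    """Return a copy of `pixels` with random bits written into the LSBs.

    `pixels` is a flat list of 8-bit grayscale values. `payload` is read
    as the fraction of pixels that carry one message bit each (assumption).
    """
    rng = random.Random(seed)
    stego = list(pixels)
    n_embed = int(round(payload * len(stego)))
    # Pick distinct pixel positions and overwrite their lowest bit.
    for i in rng.sample(range(len(stego)), n_embed):
        bit = rng.getrandbits(1)
        stego[i] = (stego[i] & ~1) | bit
    return stego


# Toy cover "image": 1,000 random grayscale values.
cover = [random.Random(1).randrange(256) for _ in range(1000)]
stego = embed_lsb(cover, payload=0.5)
```

Each pixel changes by at most 1, which is why the difference is invisible to the eye but can still shift the histogram statistics used as features.
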
The five classifiers were trained with 56,000 images and tested with 14,000 images. Here are the accuracies of our classifiers:
| Classifier | Train Set | Test Set |
|---|---|---|
| SVM | 90.94 % | 90.3 % |
| KNN | 94.37 % | 92.94 % |
| Random Forest* | 98.4 % | 93.21 % |
| Multilayer Perceptron | 92.6 % | 92.11 % |
| Naive-Bayes | 89.84 % | 85.1 % |
*: The Random Forest classifier achieves the highest accuracy on this dataset, on both the train set and the test set.
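
The 56,000 / 14,000 division of the 70,000 samples corresponds to an 80/20 split. A minimal sketch of such a split is shown below. The helper `train_test_indices`, the shuffling, and the seed are assumptions for illustration; how the split was actually drawn in this project is not stated.

```python
import random


def train_test_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_indices(70_000)
print(len(train_idx), len(test_idx))  # prints "56000 14000"
```
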
- This project will soon be extended to a web application for LSB detection using all five classifiers above.
- Due to the limited dataset, the classifiers do not perform well on high-resolution images or on steganography techniques other than LSB steganography. A larger and more varied dataset is needed for the model to cover wider use cases.
P.S.: This is specific steganalysis (targeting LSB steganography), not blind steganalysis.