# DeepLearning Android Malware

Detecting and Classifying Android Malware using Deep Learning Techniques

## Jupyter Notebooks

- Creating a large CSV file with all the features and categories
- Creating multiple data files for Benign vs. a malware category
- Selecting features for a dataset (in progress)
- Visualizing some features in the data

Since our models haven't been performing well, I decided to complete a Sanity Check notebook that demonstrates all of the techniques we're employing here, trying to find any failures in our methods.

- One issue I found was in how the data was stratified using `train_test_split` from sklearn. As it turns out, this function does not stratify by default; I've fixed this in the Adware vs. Benign notebook and the rest. Despite this, performance is still low.
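A minimal sketch of the fix with synthetic data (the shapes and the 80/20 class ratio here are illustrative, not from our dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 9)         # 100 samples, 9 features (illustrative)
y = np.array([0] * 80 + [1] * 20)  # imbalanced labels, 80/20

# train_test_split does NOT stratify by default; passing stratify=y
# preserves the 80/20 class ratio in both the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
```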

## X vs Benign

## Experiments

Below are the experiments we want to run for the paper. Each experiment should be run on each of the frameworks listed below (fastai, Keras-TensorFlow, Keras-Theano).

### Metrics

The metrics we want to collect for all of these experiments are the Accuracy, Loss, True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR), and False Negative Rate (FNR). Depending on the platform the experiments are run on (fastai, Keras), there will be different ways of acquiring the data. Notes on how to do so are detailed below.
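Where a framework only exposes the raw confusion counts, the four rates can be derived directly. A small helper (not from the repo) illustrating the relationships:

```python
def rates(tp, tn, fp, fn):
    """Derive (TPR, TNR, FPR, FNR) from raw confusion-matrix counts."""
    tpr = tp / (tp + fn)  # true positive rate (sensitivity / recall)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    fpr = fp / (fp + tn)  # false positive rate = 1 - TNR
    fnr = fn / (fn + tp)  # false negative rate = 1 - TPR
    return tpr, tnr, fpr, fnr
```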

### Keras for Binary Classification

```python
from tensorflow.keras.metrics import (
    BinaryAccuracy, TruePositives, TrueNegatives, FalsePositives, FalseNegatives
)

# Initializing the metrics objects
accuracy = BinaryAccuracy()
tp = TruePositives()
tn = TrueNegatives()
fp = FalsePositives()
fn = FalseNegatives()
metrics = [accuracy, tp, tn, fp, fn]

# Adding to the model's compile method
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=metrics)
```

### FastAI for Binary Classification

```python
from fastai.tabular.all import *

# Set up the metrics we want to collect. I wanted TP/TN/FP/FN, but those
# weren't available; Recall and Precision are still extremely helpful for
# evaluating the model.
metrics = [accuracy, Recall(), Precision()]

# Create the learner (dls is the TabularDataLoaders built from our CSV)
learn = tabular_learner(dls, metrics=metrics)
```

### To-Do List

- [x] Create a line graph demonstrating how **loss** changes as we change the learning rate and a scatter plot demonstrating the differences in performance (**accuracy** or **loss**) for each different optimizer.
  - Screenshot is in /ScreeShots/sgd.png.
  - Loss was improved by using the mean_squared_error loss function. Using other optimizers such as Adam and Adadelta, the accuracy was the same, i.e. ~53%.
  - Various learning rates such as 0.1 and 0.01 were applied; however, after some research, 0.001 was found to be the optimal one.
  - Using the SGD optimizer and the mean_squared_error loss function on Keras-TensorFlow and Keras-Theano produced similar results (accuracy: ~53%, loss: ~24).
- Binary Classification Problem: Is a sample malicious or benign traffic? (Completed before the previous To-Do, so I may have to rerun these with a new optimizer or learning rate)
  - fastai
  - Keras-TensorFlow
  - Keras-Theano
- Multi-Classification Problem #1: Can we differentiate between benign, adware, scareware, etc. traffic?
  - fastai
  - Keras-TensorFlow
  - Keras-Theano
- Multi-Classification Problem #2: Can we differentiate between the different species of each type of malicious traffic? (Ex. Gooligan vs. ... vs. Shuanet for Adware)
  - Adware
    - fastai
    - Keras-TensorFlow
    - Keras-Theano
  - Ransomware
    - fastai
    - Keras-TensorFlow
    - Keras-Theano
  - Scareware
    - fastai
    - Keras-TensorFlow
    - Keras-Theano
  - SMSmalware
    - fastai
    - Keras-TensorFlow
    - Keras-Theano
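The learning-rate sweep noted above can be sketched framework-independently. The toy logistic-regression fit below (synthetic data, not our dataset) shows why, for a fixed training budget, too small a rate leaves the loss high:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))                   # 9 features, matching our cleaned data
y = (X @ rng.normal(size=9) > 0).astype(float)  # separable synthetic labels

def final_loss(lr, steps=500):
    """Run `steps` gradient-descent steps of logistic regression at rate `lr`."""
    w = np.zeros(9)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))      # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)    # gradient of the log loss
    p = 1 / (1 + np.exp(-(X @ w)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

for lr in (0.1, 0.01, 0.001):
    print(f"lr={lr}: final log loss {final_loss(lr):.4f}")
```

With 500 steps, the 0.001 run barely moves the weights, while 0.1 converges; the same trade-off motivates sweeping rates per optimizer in the real experiments.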

## Experimental Results

Place acquired graphs and data here (or point us to the file with the data)

## Introduction

While detecting Android malware with deep learning and other machine learning techniques seems to be a solved academic problem (Z. Yuan et al., X. Su et al., Abdelmonim Naway and Yuancheng Li, Karbab et al.), employing both static and dynamic analysis of the malware, there is little published work using machine learning techniques on network traffic to specifically detect Android malware. The CICMalAnal2017 dataset is one of the only datasets containing real, up-to-date network traffic from malicious and benign Android applications. The goal of this project is to employ deep learning techniques, in conjunction with the CICMalAnal2017 dataset, to accurately identify the intent of a given application through collected network-traffic data.

## Dataset Summary

### How to Download

The dataset used for this project is described by the Canadian Institute for Cybersecurity at the University of New Brunswick here. The link at the bottom of the description on their site can be used to download the dataset. Additionally, the provided dl-data.sh script may be used (the link it uses needs occasional updating; the script works as of May 2020).

Since this is a significant dataset (roughly 300 MB zipped), the download takes a while. Go enjoy a coffee while you wait.

### Data Cleanup

As described in Arash et al., only nine of the 80+ provided attributes are needed to achieve high accuracy with simpler machine learning algorithms. For computational and temporal simplicity, only these nine attributes are kept for the analysis conducted here. The nine attributes from the paper are listed below, matched to the attribute name in the dataset:

  1. Maximum flow packet length (Flow IAT Max)
  2. Minimum flow packet length (Flow IAT Min)
  3. Backward variance data bytes (Bwd Packet Length Std)*
  4. Flow FIN F 17 (FIN Flag Count)
  5. Flow forward bytes (Fwd IAT Total)
  6. Flow backward bytes (Bwd IAT Total)
  7. Maximum Idle (Idle Max)
  8. Initial window forward (Init_Win_bytes_forward)
  9. Minimum segment size forward (min_seg_size_forward)

\* (The variance attribute could not be found, so this related attribute is used instead.)

Since the analysis is focused on determining the type of traffic (malicious/benign) given a sample, attributes such as IPs and port numbers are dropped from the dataset. There is an obvious use of these in ideas such as black/whitelists; however, that is not the contribution of this project. NaN values are also dropped if present.
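A sketch of this cleanup as a pandas step (the label column name `Label` is an assumption; the CSV headers may differ slightly):

```python
import pandas as pd

# The nine attribute names from the dataset, as listed above
FEATURES = [
    "Flow IAT Max", "Flow IAT Min", "Bwd Packet Length Std",
    "FIN Flag Count", "Fwd IAT Total", "Bwd IAT Total",
    "Idle Max", "Init_Win_bytes_forward", "min_seg_size_forward",
]

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the nine selected features plus the label; drop rows with NaN.

    Columns such as IPs and ports are implicitly dropped by the selection.
    """
    return df[FEATURES + ["Label"]].dropna()
```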

### Binary and Multi-Classification Files

### Dataset Composition

The composition of the dataset is shown in the table below:

| Type    | Number of Instances |
|---------|---------------------|
| Benign  | 1,210,210           |
| Malware | 982,212             |
| Adware  | 424,147             |

Broken down further, we have a clearer idea of the makeup.

| Type       | Number of Instances |
|------------|---------------------|
| Benign     | 1,210,210           |
| Adware     | 424,147             |
| Scareware  | 401,165             |
| Ransomware | 348,943             |
| SMSmalware | 229,275             |

Additionally, both the types of malware and the species of each individual malware are listed below.

| Malware Type | Species         | Number of Instances |
|--------------|-----------------|---------------------|
| ADWARE       | DOWGIN          | 39,682              |
|              | EWIND           | 43,374              |
|              | FEIWO           | 56,632              |
|              | GOOLIGAN        | 93,772              |
|              | KEMOGE          | 38,771              |
|              | KOODOUS         | 32,547              |
|              | MOBIDASH        | 31,034              |
|              | SELFMITE        | 13,029              |
|              | SHUANET         | 39,271              |
|              | YOUMI           | 36,035              |
| RANSOMWARE   | CHARGER         | 39,551              |
|              | JISUT           | 25,672              |
|              | KOLER           | 44,555              |
|              | LOCKERPIN       | 25,307              |
|              | PLETOR          | 4,715               |
|              | PORNDROID       | 46,082              |
|              | RANSOMBO        | 39,859              |
|              | SIMPLOCKER      | 36,340              |
|              | SVPENG          | 54,161              |
|              | WANNALOCKER     | 32,701              |
| SCAREWARE    | ANDROIDDEFENDER | 56,440              |
|              | ANDROIDSPY      | 25,414              |
|              | AVFORANDROID    | 42,448              |
|              | AVPASS          | 40,776              |
|              | FAKEAPP         | 34,676              |
|              | FAKEAPPAL       | 44,563              |
|              | FAKEAV          | 40,089              |
|              | FAKEJOBOFFER    | 30,683              |
|              | FAKETAOBAO      | 33,299              |
|              | PENETHO         | 21,631              |
|              | VIRUSSHIELD     | 23,716              |
|              | (Unlabeled)     | 7,430               |
| SMSMALWARE   | BEANBOT         | 12,371              |
|              | BIIGE           | 33,678              |
|              | FAKEINST        | 15,026              |
|              | FAKENOTIFY      | 22,197              |
|              | FAKEMART        | 6,401               |
|              | JIFAKE          | 5,993               |
|              | MAZARBOT        | 6,065               |
|              | NANDROBOX       | 44,517              |
|              | PLANKTON        | 39,765              |
|              | SMSSNIFFER      | 33,618              |
|              | ZSONE           | 9,644               |
| MALWARE      | Unlabeled       | 2,828               |
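A sketch of how the binary view relates to the per-category labels when building the classification files (the label values here are illustrative):

```python
import pandas as pd

labels = pd.Series(["BENIGN", "ADWARE", "SCAREWARE", "BENIGN", "RANSOMWARE"])

# Per-category counts, as in the breakdown tables above
by_type = labels.value_counts()

# Binary view: every non-BENIGN label collapses to MALWARE
binary = labels.where(labels == "BENIGN", "MALWARE").value_counts()
```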

## Experiments

- Compare optimizer and learning rate performance
- Binary Classification Problem: Is a sample malicious or benign traffic?
  - fastai
  - Keras-TensorFlow
  - Keras-Theano
- Multi-Classification Problem #1: Can we differentiate between benign, adware, scareware, etc. traffic?
  - fastai
  - Keras-TensorFlow
  - Keras-Theano
- Multi-Classification Problem #2: Can we differentiate between the different species of each type of malicious traffic? (Ex. Gooligan vs. ... vs. Shuanet for Adware)
  - Adware
    - fastai
    - Keras-TensorFlow
    - Keras-Theano
  - Ransomware
    - fastai
    - Keras-TensorFlow
    - Keras-Theano
  - Scareware
    - fastai
    - Keras-TensorFlow
    - Keras-Theano
  - SMSmalware
    - fastai
    - Keras-TensorFlow
    - Keras-Theano

## Deep Learning Frameworks

- Performance results using various deep learning frameworks are compared.

### Fastai-Pytorch

### Keras

## Results

### Adware

- Classification of adware types

| Framework        | Accuracy (%) |
|------------------|--------------|
| Fastai-Pytorch   | 42.72        |
| Keras-Tensorflow | *            |
| Keras-Theano     | *            |

### Ransomware

| Framework        | Accuracy (%) |
|------------------|--------------|
| Fastai-Pytorch   | *            |
| Keras-Tensorflow | *            |
| Keras-Theano     | *            |

### Scareware

| Framework        | Accuracy (%) |
|------------------|--------------|
| Fastai-Pytorch   | *            |
| Keras-Tensorflow | *            |
| Keras-Theano     | *            |

### SMSmalware

| Framework        | Accuracy (%) |
|------------------|--------------|
| Fastai-Pytorch   | *            |
| Keras-Tensorflow | *            |
| Keras-Theano     | *            |

## References
