# Building an ML classifier for malicious ICS traffic

In this part of the workshop we will build a classifier to detect malicious traffic in an Industrial Control System (ICS) network.



In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from utils import get_training_data

### Dataset

The [data](https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets) we are using comes from Oak Ridge National Labs and was collected by Uttam Adhikari, Shengyi Pan, and Tommy Morris in collaboration with Raymond Borges and Justin Beaver. 

The data was collected on an ICS testbed representing a power system. More [details are available here](http://www.ece.uah.edu/~thm0009/icsdatasets/PowerSystem_Dataset_README.pdf) for those interested.

Briefly, there are 4 Intelligent Electronic Devices (IEDs) that can switch 4 respective breakers on and off.

The dataset comprises 29 synchrophasor measurements (measurements of the electrical grid) for each of the 4 phasor measurement units (PMUs) {R1, R2, R3, R4}. Additionally, there are 4 features from each of three log sources: control panel logs, relay logs (from the 4 PMUs) and snort logs.

In total there are 128 features: 4 × 29 = 116 synchrophasor measurements plus 3 × 4 = 12 log features.
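
As a quick sanity check on that count, the feature names can be rebuilt from their naming pattern. This is only a sketch: the helper `pmu_features` is hypothetical, and the pattern (phasors 1-3 and 7-9 carrying a `V` suffix, 4-6 and 10-12 an `I` suffix) is inferred from the feature list used in this notebook rather than taken from the dataset README.

```python
# Sketch: regenerate the feature names from the pattern seen in this notebook.
# `pmu_features` is a hypothetical helper, not part of the workshop's utils.
pmus = ["R1", "R2", "R3", "R4"]

def pmu_features(pmu):
    names = []
    for i in range(1, 13):
        # Phasors 1-3 and 7-9 use the V suffix, 4-6 and 10-12 the I suffix
        kind = "V" if ((i - 1) // 3) % 2 == 0 else "I"
        names.append(f"{pmu}-PA{i}:{kind}H")
        names.append(f"{pmu}-PM{i}:{kind}")
    # Five remaining per-PMU measurements
    names += [f"{pmu}:F", f"{pmu}:DF", f"{pmu}-PA:Z", f"{pmu}-PA:ZH", f"{pmu}:S"]
    return names

synchrophasor = [name for pmu in pmus for name in pmu_features(pmu)]
log_features = (
    [f"control_panel_log{i}" for i in range(1, 5)]
    + [f"relay{i}_log" for i in range(1, 5)]
    + [f"snort_log{i}" for i in range(1, 5)]
)
all_features = synchrophasor + log_features

print(len(pmu_features("R1")), len(synchrophasor), len(log_features), len(all_features))
```

Comparing `all_features` against the columns of the loaded data is a cheap way to catch a typo in a hand-written feature list.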


In [2]:
training_data = get_training_data()

# List possible features
features = [
    # R1
    'R1-PA1:VH', 'R1-PM1:V', 'R1-PA2:VH', 'R1-PM2:V', 'R1-PA3:VH', 'R1-PM3:V', 'R1-PA4:IH', 'R1-PM4:I', 
    'R1-PA5:IH', 'R1-PM5:I', 'R1-PA6:IH', 'R1-PM6:I', 'R1-PA7:VH', 'R1-PM7:V', 'R1-PA8:VH', 'R1-PM8:V', 
    'R1-PA9:VH', 'R1-PM9:V', 'R1-PA10:IH', 'R1-PM10:I', 'R1-PA11:IH', 'R1-PM11:I', 'R1-PA12:IH', 
    'R1-PM12:I', 'R1:F', 'R1:DF', 'R1-PA:Z', 'R1-PA:ZH', 'R1:S', 
    
    'R2-PA1:VH', 'R2-PM1:V', 'R2-PA2:VH', 'R2-PM2:V', 'R2-PA3:VH', 'R2-PM3:V', 'R2-PA4:IH', 'R2-PM4:I', 
    'R2-PA5:IH', 'R2-PM5:I', 'R2-PA6:IH', 'R2-PM6:I', 'R2-PA7:VH', 'R2-PM7:V', 'R2-PA8:VH', 'R2-PM8:V', 
    'R2-PA9:VH', 'R2-PM9:V', 'R2-PA10:IH', 'R2-PM10:I', 'R2-PA11:IH', 'R2-PM11:I', 'R2-PA12:IH', 'R2-PM12:I', 
    'R2:F', 'R2:DF', 'R2-PA:Z', 'R2-PA:ZH', 'R2:S', 
    
    'R3-PA1:VH', 'R3-PM1:V', 'R3-PA2:VH', 'R3-PM2:V', 'R3-PA3:VH', 'R3-PM3:V', 'R3-PA4:IH', 'R3-PM4:I', 
    'R3-PA5:IH', 'R3-PM5:I', 'R3-PA6:IH', 'R3-PM6:I', 'R3-PA7:VH', 'R3-PM7:V', 'R3-PA8:VH', 'R3-PM8:V', 
    'R3-PA9:VH', 'R3-PM9:V', 'R3-PA10:IH', 'R3-PM10:I', 'R3-PA11:IH', 'R3-PM11:I', 'R3-PA12:IH', 'R3-PM12:I', 
    'R3:F', 'R3:DF', 'R3-PA:Z', 'R3-PA:ZH', 'R3:S', 
    
    'R4-PA1:VH', 'R4-PM1:V', 'R4-PA2:VH', 'R4-PM2:V', 'R4-PA3:VH', 'R4-PM3:V', 'R4-PA4:IH', 'R4-PM4:I', 
    'R4-PA5:IH', 'R4-PM5:I', 'R4-PA6:IH', 'R4-PM6:I', 'R4-PA7:VH', 'R4-PM7:V', 'R4-PA8:VH', 'R4-PM8:V', 
    'R4-PA9:VH', 'R4-PM9:V', 'R4-PA10:IH', 'R4-PM10:I', 'R4-PA11:IH', 'R4-PM11:I', 'R4-PA12:IH', 
    'R4-PM12:I', 'R4:F', 'R4:DF', 'R4-PA:Z', 'R4-PA:ZH', 'R4:S', 
    
    'control_panel_log1', 'control_panel_log2', 'control_panel_log3', 'control_panel_log4', 
    
    'relay1_log', 'relay2_log', 'relay3_log', 'relay4_log', 
    
    'snort_log1', 'snort_log2', 'snort_log3', 'snort_log4'
]


# TODO: quick seaborn plot of features


True     30584
False    12397
Name: malicious, dtype: int64
['R1-PA1:VH' 'R1-PM1:V' 'R1-PA2:VH' 'R1-PM2:V' 'R1-PA3:VH' 'R1-PM3:V'
 'R1-PA4:IH' 'R1-PM4:I' 'R1-PA5:IH' 'R1-PM5:I' 'R1-PA6:IH' 'R1-PM6:I'
 'R1-PA7:VH' 'R1-PM7:V' 'R1-PA8:VH' 'R1-PM8:V' 'R1-PA9:VH' 'R1-PM9:V'
 'R1-PA10:IH' 'R1-PM10:I' 'R1-PA11:IH' 'R1-PM11:I' 'R1-PA12:IH'
 'R1-PM12:I' 'R1:F' 'R1:DF' 'R1-PA:Z' 'R1-PA:ZH' 'R1:S' 'R2-PA1:VH'
 'R2-PM1:V' 'R2-PA2:VH' 'R2-PM2:V' 'R2-PA3:VH' 'R2-PM3:V' 'R2-PA4:IH'
 'R2-PM4:I' 'R2-PA5:IH' 'R2-PM5:I' 'R2-PA6:IH' 'R2-PM6:I' 'R2-PA7:VH'
 'R2-PM7:V' 'R2-PA8:VH' 'R2-PM8:V' 'R2-PA9:VH' 'R2-PM9:V' 'R2-PA10:IH'
 'R2-PM10:I' 'R2-PA11:IH' 'R2-PM11:I' 'R2-PA12:IH' 'R2-PM12:I' 'R2:F'
 'R2:DF' 'R2-PA:Z' 'R2-PA:ZH' 'R2:S' 'R3-PA1:VH' 'R3-PM1:V' 'R3-PA2:VH'
 'R3-PM2:V' 'R3-PA3:VH' 'R3-PM3:V' 'R3-PA4:IH' 'R3-PM4:I' 'R3-PA5:IH'
 'R3-PM5:I' 'R3-PA6:IH' 'R3-PM6:I' 'R3-PA7:VH' 'R3-PM7:V' 'R3-PA8:VH'
 'R3-PM8:V' 'R3-PA9:VH' 'R3-PM9:V' 'R3-PA10:IH' 'R3-PM10:I' 'R3-PA11:IH'
 'R3-PM11:I' 'R3-PA12:IH' 'R3-PM
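
The class counts printed above show that malicious samples outnumber benign ones, which matters when judging any classifier trained on this data. A minimal sketch of the arithmetic, with the two counts copied by hand from that output (the variable names are illustrative only):

```python
# Class counts copied from the printed output above
class_counts = {True: 30584, False: 12397}

total = sum(class_counts.values())
malicious_fraction = class_counts[True] / total

# Accuracy of a trivial model that always predicts the majority class
majority_baseline = max(class_counts.values()) / total

print(f"{total} rows, {malicious_fraction:.1%} malicious")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any model whose accuracy does not clearly beat this baseline has learned little, so metrics such as precision and recall per class are probably more informative here than raw accuracy.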

In [3]:
# List models and features

from models import RobustCovariance


anomaly_detection_models = [
    RobustCovariance, 
]

classification_models = [
    
]

In [4]:
snort_features = ["snort_log1", "snort_log2", "snort_log3", "snort_log4"]

m1 = RobustCovariance(snort_features, save_model="RobustCov_snort")
m1.train(training_data)
m1.test()

-----


TypeError: Singleton array -inf cannot be considered a valid collection.
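
The `RobustCovariance` class comes from the workshop's own `models` module, whose code is not shown here, so the cause of this error cannot be pinned down from the notebook alone. The message itself appears to be the one scikit-learn raises when a scalar (here `-inf`) is passed where an array of samples is expected, so the values handed to the scoring step inside `test()` are worth inspecting. For comparison, here is a minimal sketch of robust-covariance anomaly detection using scikit-learn's `EllipticEnvelope`. It assumes, without confirmation, that `RobustCovariance` wraps something similar, and it runs on synthetic data rather than the workshop dataset.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)

# Synthetic stand-in for four numeric features (NOT the workshop data)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
anomalous = rng.normal(loc=6.0, scale=1.0, size=(20, 4))

# Fit a robust Gaussian envelope on the "normal" samples only;
# contamination is the assumed share of outliers in the training set
detector = EllipticEnvelope(contamination=0.05, random_state=0)
detector.fit(normal)

# predict() returns +1 for inliers and -1 for outliers
pred_normal = detector.predict(normal)
pred_anomalous = detector.predict(anomalous)

print("normal flagged as inliers:", (pred_normal == 1).mean())
print("anomalous flagged as outliers:", (pred_anomalous == -1).mean())
```

Note that both `fit` and `predict` receive 2-D arrays of shape `(n_samples, n_features)`; passing a single number to a scikit-learn metric or estimator is the kind of call that produces a "Singleton array" error.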

In [None]:
# TODO: could try anomaly detection with no event vs. natural event (leave one out...)