<a href="https://colab.research.google.com/github/jvitorc/TCC/blob/main/UNSW_NB15.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### *João Vitor Cardoso <2020>*

# **Exploring the use of neural networks for intrusion detection with the UNSW-NB15 dataset**

Using the [UNSW-NB15 - CSV Files](https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/) dataset for intrusion detection with neural networks.
  

## Downloading the Dataset

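#### Downloading the CSV files

The link downloads the whole `UNSW-NB15 - CSV Files` folder as a single zip archive; the later cells read all four data files and the feature descriptions from it. The URL is quoted so that the shell does not treat `&` as "run in background".

In [None]:
!wget -O unsw-nb15.zip "https://cloudstor.aarnet.edu.au/plus/s/2DhnLGDdEECo4ys/download?path=%2FUNSW-NB15%20-%20CSV%20Files"

In [None]:
!unzip -q unsw-nb15.zip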

## Importando Bibliotecas

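Installing TensorFlow 2.0.0 in place of the version preinstalled in Colab (`-y` skips the confirmation prompt, which would otherwise block the cell)

In [None]:
!pip uninstall -y tensorflow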

In [None]:
!pip install tensorflow==2.0.0

Importing libraries

In [None]:
import os
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import re


### Checking the GPU

In [None]:
tf.test.gpu_device_name()

In [None]:
!nvidia-smi

## Exploring the Data

Loading the data

In [None]:
PATH = '/content/UNSW-NB15 - CSV Files/'

In [None]:
info = pd.read_csv(PATH+'NUSW-NB15_features.csv', encoding='latin-1')

In [None]:
info

| | No. | Name | Type | Description |
|---|---|---|---|---|
| 0 | 1 | srcip | nominal | Source IP address |
| 1 | 2 | sport | integer | Source port number |
| 2 | 3 | dstip | nominal | Destination IP address |
| 3 | 4 | dsport | integer | Destination port number |
| 4 | 5 | proto | nominal | Transaction protocol |
| 5 | 6 | state | nominal | Indicates to the state and its dependent proto... |
| 6 | 7 | dur | Float | Record total duration |
| 7 | 8 | sbytes | Integer | Source to destination transaction bytes |
| 8 | 9 | dbytes | Integer | Destination to source transaction bytes |
| 9 | 10 | sttl | Integer | Source to destination time to live value |


In [None]:
column_names = info['Name']
column_names

0                srcip
1                sport
2                dstip
3               dsport
4                proto
5                state
6                  dur
7               sbytes
8               dbytes
9                 sttl
10                dttl
11               sloss
12               dloss
13             service
14               Sload
15               Dload
16               Spkts
17               Dpkts
18                swin
19                dwin
20               stcpb
21               dtcpb
22             smeansz
23             dmeansz
24         trans_depth
25         res_bdy_len
26                Sjit
27                Djit
28               Stime
29               Ltime
30             Sintpkt
31             Dintpkt
32              tcprtt
33              synack
34              ackdat
35     is_sm_ips_ports
36        ct_state_ttl
37    ct_flw_http_mthd
38        is_ftp_login
39          ct_ftp_cmd
40          ct_srv_src
41          ct_srv_dst
42          ct_dst_ltm
43         

In [None]:
data1 = pd.read_csv(PATH+'UNSW-NB15_1.csv', names=column_names)
data2 = pd.read_csv(PATH+'UNSW-NB15_2.csv', names=column_names)
data3 = pd.read_csv(PATH+'UNSW-NB15_3.csv', names=column_names)
data4 = pd.read_csv(PATH+'UNSW-NB15_4.csv', names=column_names)

### Merging the data

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the four parts
data = pd.concat([data1, data2, data3, data4], ignore_index=True)

In [None]:
data.head()

Unnamed: 0,srcip,sport,dstip,dsport,proto,state,dur,sbytes,dbytes,sttl,dttl,sloss,dloss,service,Sload,Dload,Spkts,Dpkts,swin,dwin,stcpb,dtcpb,smeansz,dmeansz,trans_depth,res_bdy_len,Sjit,Djit,Stime,Ltime,Sintpkt,Dintpkt,tcprtt,synack,ackdat,is_sm_ips_ports,ct_state_ttl,ct_flw_http_mthd,is_ftp_login,ct_ftp_cmd,ct_srv_src,ct_srv_dst,ct_dst_ltm,ct_src_ ltm,ct_src_dport_ltm,ct_dst_sport_ltm,ct_dst_src_ltm,attack_cat,Label
0,59.166.0.0,1390,149.171.126.6,53,udp,CON,0.001055,132,164,31,29,0,0,dns,500473.9375,621800.9375,2,2,0,0,0,0,66,82,0,0,0.0,0.0,1421927414,1421927414,0.017,0.013,0.0,0.0,0.0,0,0,0.0,0.0,0,3,7,1,3,1,1,1,,0
1,59.166.0.0,33661,149.171.126.9,1024,udp,CON,0.036133,528,304,31,29,0,0,-,87676.08594,50480.17188,4,4,0,0,0,0,132,76,0,0,9.89101,10.682733,1421927414,1421927414,7.005,7.564333,0.0,0.0,0.0,0,0,0.0,0.0,0,2,4,2,3,1,1,2,,0
2,59.166.0.6,1464,149.171.126.7,53,udp,CON,0.001119,146,178,31,29,0,0,dns,521894.5313,636282.375,2,2,0,0,0,0,73,89,0,0,0.0,0.0,1421927414,1421927414,0.017,0.013,0.0,0.0,0.0,0,0,0.0,0.0,0,12,8,1,2,2,1,1,,0
3,59.166.0.5,3593,149.171.126.5,53,udp,CON,0.001209,132,164,31,29,0,0,dns,436724.5625,542597.1875,2,2,0,0,0,0,66,82,0,0,0.0,0.0,1421927414,1421927414,0.043,0.014,0.0,0.0,0.0,0,0,0.0,0.0,0,6,9,1,1,1,1,1,,0
4,59.166.0.3,49664,149.171.126.0,53,udp,CON,0.001169,146,178,31,29,0,0,dns,499572.25,609067.5625,2,2,0,0,0,0,73,89,0,0,0.0,0.0,1421927414,1421927414,0.005,0.003,0.0,0.0,0.0,0,0,0.0,0.0,0,7,9,1,1,1,1,1,,0


### Dataset size

In [None]:
data.shape

(2540047, 49)

### Checking the class and attack-type distribution (percentage)

In [None]:
data.Label.value_counts(normalize=True) * 100

0    87.351297
1    12.648703
Name: Label, dtype: float64

In [None]:
data.attack_cat.value_counts(normalize=True) * 100

Generic             67.068908
Exploits            13.858499
 Fuzzers             5.974484
DoS                  5.089905
 Reconnaissance      3.805990
 Fuzzers             1.572134
Analysis             0.833222
Backdoor             0.558697
Reconnaissance       0.547492
 Shellcode           0.400893
Backdoors            0.166209
Shellcode            0.069409
Worms                0.054158
Name: attack_cat, dtype: float64
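
The percentages above show the same attack family split across several spellings: `Fuzzers` and ` Fuzzers`, `Reconnaissance` and ` Reconnaissance`, `Shellcode` and ` Shellcode` differ only by a leading space, and `Backdoor`/`Backdoors` by a plural. The UNSW-NB15 description lists nine attack families, so a multi-class model would see spurious extra classes. A minimal sketch of normalizing the labels, assuming stripping whitespace and merging `Backdoor` into `Backdoors` is the intended mapping; `normalize_attack_cat` is a hypothetical helper, not part of the notebook:

```python
import numpy as np
import pandas as pd


def normalize_attack_cat(s: pd.Series) -> pd.Series:
    """Hypothetical helper: unify spellings of the attack_cat labels."""
    cleaned = s.str.strip()  # ' Fuzzers' -> 'Fuzzers'; NaN stays NaN
    # Assumption: 'Backdoor' and 'Backdoors' are the same family
    return cleaned.replace({'Backdoor': 'Backdoors'})


# Made-up sample mimicking the spellings seen in the output above
raw = pd.Series([' Fuzzers', 'Fuzzers', 'Backdoor', 'Backdoors', ' Shellcode', np.nan])
print(normalize_attack_cat(raw).value_counts())
```

In the notebook this would run as `data['attack_cat'] = normalize_attack_cat(data['attack_cat'])` before the NaN values are filled and the column is converted with `pd.Categorical`.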

## Data Preprocessing

### Listing columns of type `object`

In [None]:
data.dtypes[data.dtypes == 'O']

srcip         object
sport         object
dstip         object
dsport        object
proto         object
state         object
service       object
ct_ftp_cmd    object
attack_cat    object
dtype: object

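### Filtering malformed rows

The `sport` and `dsport` columns are stored as `object` because some rows hold values that are not plain decimal port numbers. Those rows are dropped: the row count falls from 2,540,047 to 2,539,739 (308 rows), as the `describe()` output further below shows.

In [None]:
def isdecimal(x):
  # True only if the string form of x consists entirely of decimal digits
  return str(x).isdecimal()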

In [None]:
data  = data[data['sport'].apply(isdecimal)]

In [None]:
data  = data[data['dsport'].apply(isdecimal)]
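
The two filtering cells above can be expressed as one pass over both port columns. A minimal sketch, where `keep_decimal_ports` is a hypothetical helper and the sample values are made up to show which kinds of entries pass: integers and digit strings are kept, while strings such as `'0x000b'`, `'-'`, blanks and NaN are rejected. Note that `str.isdecimal` also rejects negative numbers and values with a decimal point, so the test is strict.

```python
import numpy as np
import pandas as pd


def isdecimal(x):
    # Same test as the notebook: every character of str(x) must be a decimal digit
    return str(x).isdecimal()


def keep_decimal_ports(df: pd.DataFrame, cols=('sport', 'dsport')) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose port columns all pass isdecimal."""
    mask = np.ones(len(df), dtype=bool)
    for col in cols:
        mask &= df[col].apply(isdecimal).to_numpy()
    return df[mask]


# Made-up values of the kind that appear in object-typed port columns
ports = pd.DataFrame({
    'sport': ['1390', 53, '0x000b', '-', '80', np.nan],
    'dsport': [53, '1024', '53', '53', ' ', '80'],
})
print(keep_decimal_ports(ports))  # only rows 0 and 1 survive
```

With this helper, `data = keep_decimal_ports(data)` would do the work of both filtering cells.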

### Converting types

In [None]:
data['sport'] = pd.to_numeric(data['sport'])

In [None]:
data['dsport'] = pd.to_numeric(data['dsport'])

Encoding the categorical columns as integer codes

In [None]:
proto = pd.Categorical(data['proto'])

In [None]:
data['proto'] = proto.codes

In [None]:
state = pd.Categorical(data['state'])

In [None]:
data['state'] = state.codes

In [None]:
service = pd.Categorical(data['service'])

In [None]:
data['service'] = service.codes
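
`pd.Categorical` sorts the distinct values and numbers them from 0, and `.codes` uses the smallest integer type that fits; this is consistent with `proto` (more than 127 categories) showing up as `int16` and `state`/`service` as `int8` in the dtype listing below. The sketch below uses made-up protocol names to show the mapping. Because the codes are arbitrary ordinals, normalizing them as continuous values implies an ordering that does not exist; one-hot encoding with `pd.get_dummies` is a common alternative, shown here for comparison only and not used by the notebook.

```python
import pandas as pd

# pd.Categorical sorts the distinct values and numbers them from 0
proto = pd.Categorical(['udp', 'tcp', 'udp', 'arp'])
print(list(proto.categories))  # ['arp', 'tcp', 'udp']
print(proto.codes)             # [2 1 2 0]

# The integer code can be mapped back to the original label
print(list(proto.categories[proto.codes]))  # ['udp', 'tcp', 'udp', 'arp']

# One-hot alternative: one 0/1 column per category, no implied ordering
one_hot = pd.get_dummies(pd.Series(['udp', 'tcp', 'udp', 'arp']), prefix='proto')
print(list(one_hot.columns))   # ['proto_arp', 'proto_tcp', 'proto_udp']
```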

Replacing blank and missing values

In [None]:
data['ct_ftp_cmd'] = data['ct_ftp_cmd'].replace(' ', 0)

In [None]:
data['ct_ftp_cmd'] = pd.to_numeric(data['ct_ftp_cmd'])

In [None]:
data['attack_cat'] = data['attack_cat'].replace(np.nan, 'Benign')

Separating the targets (`attack_cat` and `Label`)

In [None]:
attack_cat = data.pop('attack_cat')

In [None]:
attack_cat = pd.Categorical(attack_cat)

In [None]:
label = data.pop('Label')

Removing the IP addresses from the features

In [None]:
srcip = data.pop('srcip')

In [None]:
dstip = data.pop('dstip')

Checking the new types

In [None]:
data.dtypes

sport                 int64
dsport                int64
proto                 int16
state                  int8
dur                 float64
sbytes                int64
dbytes                int64
sttl                  int64
dttl                  int64
sloss                 int64
dloss                 int64
service                int8
Sload               float64
Dload               float64
Spkts                 int64
Dpkts                 int64
swin                  int64
dwin                  int64
stcpb                 int64
dtcpb                 int64
smeansz               int64
dmeansz               int64
trans_depth           int64
res_bdy_len           int64
Sjit                float64
Djit                float64
Stime                 int64
Ltime                 int64
Sintpkt             float64
Dintpkt             float64
tcprtt              float64
synack              float64
ackdat              float64
is_sm_ips_ports       int64
ct_state_ttl          int64
ct_flw_http_mthd    

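### Normalizing the data

Each feature is standardized to a z-score, (x - mean) / std, using the column statistics computed by `describe()`.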

In [None]:
features_stats = data.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| sport | 2539739.0 | 30536.931086 | 20441.216792 | 0.0 | 11231.0 | 31690.0 | 47439.0 | 65535.0 |
| dsport | 2539739.0 | 11235.096789 | 18438.200836 | 0.0 | 53.0 | 80.0 | 14970.0 | 65535.0 |
| proto | 2539739.0 | 114.494973 | 9.455864 | 0.0 | 113.0 | 113.0 | 119.0 | 133.0 |
| state | 2539739.0 | 4.546094 | 1.428131 | 0.0 | 5.0 | 5.0 | 5.0 | 15.0 |
| dur | 2539739.0 | 0.658863 | 13.925768 | 0.0 | 0.001037 | 0.015864 | 0.214754 | 8786.637695 |


In [None]:
def norm(x):
  # z-score: subtract each column's mean and divide by its standard deviation
  return (x - features_stats['mean']) / features_stats['std']

normed_features = norm(data)
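
If any column of `data` is constant, its standard deviation is 0 and `norm` returns NaN for every row of that column (0 / 0). A minimal sketch of a guarded variant, assuming it is acceptable to map constant columns to 0; `norm_safe` is a hypothetical name and the toy frame is made up:

```python
import pandas as pd


def norm_safe(df: pd.DataFrame, stats: pd.DataFrame) -> pd.DataFrame:
    """z-score like the notebook's norm(), but constant columns map to 0 instead of NaN."""
    std = stats['std'].replace(0, 1)  # avoid 0/0 for columns with zero spread
    return (df - stats['mean']) / std


df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [5.0, 5.0, 5.0]})
stats = df.describe().transpose()
print(norm_safe(df, stats))  # column a -> -1, 0, 1; column b -> 0, 0, 0
```

Replacing a zero `std` with 1 leaves the result unchanged for that column, because `x - mean` is already 0 everywhere.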

In [None]:
normed_features.isna().sum()
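
The `train_feat.head()` output further down shows empty (NaN) entries in `ct_flw_http_mthd` and `is_ftp_login`. NaN inputs propagate through the dense layers and turn the training loss into `nan`, so they need to be filled before `fit`. A minimal sketch, where `fill_missing` is a hypothetical helper; filling the normalized data with 0 amounts to imputing each column's mean, which is an assumption (filling the raw columns with 0 before normalizing, treating missing as "none", is another reasonable choice):

```python
import numpy as np
import pandas as pd


def fill_missing(df: pd.DataFrame, value: float = 0.0):
    """Return (filled copy, NaN counts for the columns that had any)."""
    counts = df.isna().sum()
    return df.fillna(value), counts[counts > 0]


# Made-up example with one missing entry
example = pd.DataFrame({'ct_flw_http_mthd': [np.nan, 0.5], 'sttl': [1.0, 2.0]})
filled, missing = fill_missing(example)
print(missing)
```

In the notebook this would be `normed_features, missing = fill_missing(normed_features)`.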

### Splitting into training and test sets (80/20)

In [None]:
normed_features['Label'] = label

In [None]:
normed_features['attack_cat'] = attack_cat

In [None]:
train = normed_features.sample(frac=0.8,random_state=1)
test = normed_features.drop(train.index)
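
Here `features_stats` was computed on all rows before the split, so the test rows contribute to the mean and standard deviation used to normalize the training data. With 2.5 million rows the effect is probably small, but the usual practice is to split first and normalize both parts with training statistics only. A minimal sketch under that assumption; `split_then_normalize` is a hypothetical helper and the toy columns are made up:

```python
import numpy as np
import pandas as pd


def split_then_normalize(features: pd.DataFrame, frac: float = 0.8, seed: int = 1):
    """Hypothetical helper: split first, then z-score both parts with training statistics."""
    train = features.sample(frac=frac, random_state=seed)
    test = features.drop(train.index)
    stats = train.describe().transpose()  # mean/std from the training rows only
    std = stats['std'].replace(0, 1)
    return (train - stats['mean']) / std, (test - stats['mean']) / std


rng = np.random.default_rng(0)
toy = pd.DataFrame({'dur': rng.exponential(1.0, 10), 'sbytes': rng.integers(0, 1000, 10)})
train_n, test_n = split_then_normalize(toy)
print(train_n.mean().round(6))  # close to 0 for every column
```

Applied to `data` (before the labels are attached), the targets would then be selected with `label.loc[train_n.index]` and `label.loc[test_n.index]`.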

In [None]:
train.Label.value_counts()

0    1774805
1     256986
Name: Label, dtype: int64

In [None]:
test.Label.value_counts()

0    443651
1     64297
Name: Label, dtype: int64

In [None]:
test.shape, train.shape

((507948, 47), (2031791, 47))

In [None]:
train_targ1 = train.pop('Label')
train_targ2 = train.pop('attack_cat')
train_feat = train

test_targ1 = test.pop('Label')
test_targ2 = test.pop('attack_cat')
test_feat = test

## Building the Model

In [None]:
attack_cat.value_counts()

 Fuzzers              5051
 Fuzzers             19195
 Reconnaissance      12228
 Shellcode            1288
Analysis              2677
Backdoor              1795
Backdoors              534
DoS                  16353
Exploits             44525
Generic             215481
Reconnaissance        1759
Shellcode              223
Worms                  174
dtype: int64

In [None]:
train_feat.shape[1]

45

In [None]:
def build_model_label():
  model = keras.Sequential([
    keras.layers.Dense(15, activation='relu', input_shape=[45]),
    keras.layers.Dense(10, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
  ])

  # One sigmoid output unit: binary cross-entropy matches the 0/1 Label target
  model.compile(loss='binary_crossentropy',
                optimizer='adam',
                metrics=['accuracy'])

  return model

In [None]:
model_label = build_model_label()


In [None]:
model_label.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 15)                690       
_________________________________________________________________
dense_1 (Dense)              (None, 10)                160       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 11        
Total params: 861
Trainable params: 861
Non-trainable params: 0
_________________________________________________________________
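
The parameter counts in the summary follow from the fact that a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A short check of the arithmetic for the 45 → 15 → 10 → 1 architecture above (`dense_params` is just an illustrative name):

```python
def dense_params(n_in: int, n_out: int) -> int:
    # weight matrix (n_in x n_out) plus one bias per unit
    return n_in * n_out + n_out


layers = [(45, 15), (15, 10), (10, 1)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
print(counts, sum(counts))  # [690, 160, 11] 861
```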


### Creating a checkpoint to save training progress

In [None]:
!mkdir training

In [None]:
checkpoint_path = "training/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

In [None]:
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)

## Training

In [None]:
train_feat.shape

(2031791, 45)

In [None]:
train_targ1.shape

(2031791,)

In [None]:
train_targ1.value_counts()

0    1774805
1     256986
Name: Label, dtype: int64

In [None]:
train_feat.head()

Unnamed: 0,sport,dsport,proto,state,dur,sbytes,dbytes,sttl,dttl,sloss,dloss,service,Sload,Dload,Spkts,Dpkts,swin,dwin,stcpb,dtcpb,smeansz,dmeansz,trans_depth,res_bdy_len,Sjit,Djit,Stime,Ltime,Sintpkt,Dintpkt,tcprtt,synack,ackdat,is_sm_ips_ports,ct_state_ttl,ct_flw_http_mthd,is_ftp_login,ct_ftp_cmd,ct_srv_src,ct_srv_dst,ct_dst_ltm,ct_src_ ltm,ct_src_dport_ltm,ct_dst_sport_ltm,ct_dst_src_ltm
2232574,-1.442866,-0.606464,0.476427,1.018048,-0.047312,-0.072259,-0.226138,-0.037272,-0.718064,-0.229348,-0.288552,0.091341,0.677751,-0.580151,-0.410191,-0.351678,-1.19627,-1.192953,-0.887352,-0.887328,0.050996,-0.824453,-0.237878,-0.089312,-0.09397,-0.212334,0.875105,0.875105,-0.069473,-0.055003,-0.133905,-0.126767,-0.120812,-0.040675,-0.382335,,,-0.111515,1.82633,1.848883,1.906352,1.840065,2.047349,2.981003,1.967656
1531343,0.826862,-0.606464,0.476427,1.018048,-0.047312,-0.074918,-0.226138,2.562334,-0.718064,-0.229348,-0.288552,0.091341,0.97004,-0.580151,-0.410191,-0.351678,-1.19627,-1.192953,-0.887352,-0.887328,-0.442682,-0.824453,-0.237878,-0.089312,-0.09397,-0.212334,0.857768,0.857767,-0.069476,-0.055003,-0.133905,-0.126767,-0.120812,-0.040675,2.545361,,,-0.111515,0.719021,0.740114,1.293782,1.352578,1.457587,2.171254,0.901823
2202124,0.826862,-0.606464,0.476427,1.018048,-0.047312,-0.074918,-0.226138,2.562334,-0.718064,-0.229348,-0.288552,0.091341,0.97004,-0.580151,-0.410191,-0.351678,-1.19627,-1.192953,-0.887352,-0.887328,-0.442682,-0.824453,-0.237878,-0.089312,-0.09397,-0.212334,0.874617,0.874617,-0.069476,-0.055003,-0.133905,-0.126767,-0.120812,-0.040675,2.545361,,,-0.111515,1.272676,1.294499,2.028866,2.083809,2.165301,1.199556,1.43474
152766,1.367192,-0.327857,-0.1581,0.317832,-0.046856,-0.042228,-0.211812,-0.425873,-0.041315,0.037101,-0.182541,-0.72756,-0.2917,0.078812,-0.148026,-0.154161,0.83594,0.838262,0.756195,0.756295,-0.232046,-0.538418,-0.237878,-0.089312,-0.092921,-0.20811,-1.170264,-1.170265,-0.069374,-0.054828,-0.117701,-0.10402,-0.114215,-0.040675,-0.382335,-0.295422,-0.198842,-0.111515,-0.111461,-0.368655,-0.666443,-0.4755,-0.42965,-0.419941,-0.519288
1813052,1.521782,2.665765,-0.1581,0.317832,-0.009157,0.00631,-0.20702,-0.425873,-0.041315,0.081509,-0.164872,-0.72756,-0.31096,-0.569539,-0.069376,-0.104781,0.83594,0.838262,0.400276,0.407319,0.287961,-0.517561,-0.237878,-0.089312,-0.030933,-0.20455,0.865101,0.8651,-0.062398,-0.042233,-0.117896,-0.104984,-0.113547,-0.040675,-0.382335,,,-0.111515,-0.111461,-0.18386,-0.298901,0.011987,-0.42965,-0.419941,-0.25283


In [None]:
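# Pass the checkpoint callback so the weights are saved to training/cp.ckpt after each epoch
history = model_label.fit(train_feat, train_targ1, epochs=10, callbacks=[cp_callback])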
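
The training labels are imbalanced (1,774,805 normal records against 256,986 attacks), so the loss is dominated by the normal class. One common remedy is to weight each class inversely to its frequency, using total / (n_classes * class_count), the same heuristic scikit-learn calls "balanced". A minimal sketch with the counts from `train_targ1.value_counts()`; `balanced_class_weight` is a hypothetical helper, and whether weighting improves results here is untested:

```python
def balanced_class_weight(counts: dict) -> dict:
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# Counts taken from train_targ1.value_counts() above
weights = balanced_class_weight({0: 1774805, 1: 256986})
print(weights)  # roughly {0: 0.572, 1: 3.953}
```

The dictionary can be passed to Keras through the `class_weight` argument: `model_label.fit(train_feat, train_targ1, epochs=10, class_weight=weights, callbacks=[cp_callback])`.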

### Evaluation on the test set

In [None]:
test_loss, test_acc = model_label.evaluate(test_feat, test_targ1, verbose=2)
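
Accuracy alone is hard to interpret with this class balance: a model that always predicts "normal" would already score 443,651 / 507,948, about 87.3%, on the test split. Precision and recall for the attack class give a clearer picture. A minimal sketch computing them with NumPy from predicted probabilities; `binary_metrics` is a hypothetical helper and the 0.5 threshold is an assumption:

```python
import numpy as np


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision and recall for the positive (attack) class."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(y_prob).ravel() >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    return {
        'accuracy': (tp + tn) / len(y_true),
        'precision': tp / (tp + fp) if tp + fp else 0.0,
        'recall': tp / (tp + fn) if tp + fn else 0.0,
    }


# Majority-class baseline, from test.Label.value_counts() above
baseline = 443651 / (443651 + 64297)
print(round(baseline, 4))  # 0.8734
print(binary_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

For the trained network the call would be `binary_metrics(test_targ1, model_label.predict(test_feat))`; `predict` returns an array of shape (n, 1), which `ravel()` flattens.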

### Plots

In [None]:
# With metrics=['accuracy'], TensorFlow 2.x records the metric under the 'accuracy' key
acc = history.history['accuracy']
loss = history.history['loss']

In [None]:
epochs = range(1, len(acc) + 1)

In [None]:
plt.clf()

plt.plot(epochs, acc, '-r^', label='Training acc')
plt.title('Training accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='best')

plt.show()

In [None]:
plt.clf()

plt.plot(epochs, loss, '-bo', label='Training loss')
plt.title('Training loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc='best')

plt.show()

## Restoring the neural network from the saved weights

Checking that the checkpoint files exist

In [None]:
!ls {checkpoint_dir}

Creating a new, untrained model

In [None]:
model = build_model_label()

loss, acc = model.evaluate(test_feat, test_targ1, verbose=2)
print("Untrained model, accuracy: {:5.2f}%".format(100*acc))

Loading the weights

In [None]:
model.load_weights(checkpoint_path)

loss, acc = model.evaluate(test_feat, test_targ1, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

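### Saving the checkpoint to Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# -r is needed to include the files inside training/, not just the directory entry
!zip -r training.zip training/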

In [None]:
!cp training.zip "/content/drive/My Drive/"