# ML in Cybersecurity: Task 3

## Team
  * **Team name**:  *fill this in*
  * **Members**:  *fill this in. format: name1 (email1), name2 (email2), name3 (email3),*


## Logistics
  * **Due date**: 9th December 2021, 23:59:59
  * Email the completed notebook to: `mlcysec_ws2022_staff@lists.cispa.saarland`
  * Complete this in **teams of 3**
  * Feel free to use the forum to discuss.
  
## Timeline
  * 26-Nov-2021: hand-out
  * **09-Dec-2021**: Email completed notebook
  
  
## About this Project
In this project, you will explore an application of ML to a popular task in cybersecurity: malware classification.
You will be presented with precomputed behaviour analysis reports of thousands of program binaries, many of which are malware.
Your goal is to train a malware detector using these behavioural reports.


## A Note on Grading
The grading for this project will depend on:
 1. Vectorizing Inputs
   * Obtaining a reasonable vectorized representation of the input data (a file containing a sequence of system calls)
   * Understanding the influence these representations have on your model
 1. Classification Model  
   * Following a clear ML pipeline
   * Obtaining reasonable performances (>60\%) on held-out test set
   * Choice of evaluation metric
   * Visualizing loss/accuracy curves
 1. Analysis
   * Which methods (input representations/ML models) work better than the rest and why?
   * Which hyper-parameters and design-choices were important in each of your methods?
   * Quantifying influence of these hyper-parameters on loss and/or validation accuracies
   * Trade-offs between methods, hyper-parameters, design-choices
   * Anything else you find interesting (this part is open-ended)


## Grading Details
 * 40 points: Vectorizing input data (each input = behaviour analysis file in our case)
 * 40 points: Training a classification model
 * 15 points: Analysis/Discussion
 * 5 points: Clean code
 
## Filling-in the Notebook
You'll be submitting this very notebook, filled in with your code and analysis. Make sure you submit one that has been executed in order, so that results and graphs are already visible upon opening it.

The notebook you submit **should compile** (or should be self-contained and sufficiently commented). Check tutorial 1 on how to set up the Python3 environment.


**The notebook is your project report. So, to make the report readable, omit code for techniques/models/things that did not work. You can use the final summary to provide a report about these.**

It is extremely important that you **do not** re-order the existing sections. Apart from that, the code blocks that you need to fill-in are given by:
```
#
#
# ------- Your Code -------
#
#
```
Feel free to break this into multiple cells. It's even better if you interleave explanations and code blocks so that the entire notebook forms a readable "story".


## Code of Honor
We encourage discussing ideas and concepts with other students to help you learn and better understand the course content. However, the work you submit and present **must be original** and demonstrate your effort in solving the presented problems. **We will not tolerate** blatantly using existing solutions (such as from the internet), improper collaboration (e.g., sharing code or experimental data between groups) and plagiarism. If the honor code is not met, no points will be awarded.

 
## Versions
  * v1.1: Updated deadline
  * v1.0: Initial notebook
  
  ---

In [2]:
import time

import numpy as np
import matplotlib.pyplot as plt

import json
import pickle
import sys
import csv
import os
import os.path as osp
import shutil
import pathlib
from pathlib import Path
from dataclasses import dataclass
from IPython.display import display, HTML
 
%matplotlib inline 
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots 
plt.rcParams['image.interpolation'] = 'nearest' 
plt.rcParams['image.cmap'] = 'gray' 
 
# for auto-reloading external modules 
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython 
%load_ext autoreload
%autoreload 2

In [12]:
# Some libraries we suggest that might be helpful for this project
from collections import Counter          # an even easier way to count
from multiprocessing import Pool         # for multiprocessing
from tqdm import tqdm                    # fancy progress bars

# Load other libraries here.
# Keep it minimal! We should be easily able to reproduce your code.

# We preload pytorch as an example
import torch
import torch.nn as nn
import torchmetrics
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset, TensorDataset
from torch.nn.utils.rnn import pack_padded_sequence

from sklearn.model_selection import train_test_split
import gc
from sklearn.preprocessing import LabelEncoder

# Setup

  * Download the datasets: [train](https://nextcloud.mpi-klsb.mpg.de/index.php/s/pJrRGzm2So2PMZm) (128M) and [test](https://nextcloud.mpi-klsb.mpg.de/index.php/s/zN3yeWzQB3i5WqE) (92M)
  * Unpack them under `./data/train` and `./data/test`
  * Hint: you can execute shell scripts from notebooks using the `!` prefix, e.g., `! wget <url>`

In [None]:
!wget -O train.tar.gz https://nextcloud.mpi-klsb.mpg.de/index.php/s/pJrRGzm2So2PMZm/download
!wget -O test.tar.gz https://nextcloud.mpi-klsb.mpg.de/index.php/s/zN3yeWzQB3i5WqE/download

In [None]:
!mkdir data

In [None]:
!tar -xzvf train.tar.gz -C data/

In [None]:
!tar -xzvf test.tar.gz -C data/

In [None]:
# Check that you are prepared with the data
! printf '# train examples (Should be 13682) : '; ls data/train | wc -l
! printf '# test  examples (Should be 10000) : '; ls data/test | wc -l

Now that you're set, let's briefly look at the data you have been handed.
Each file encodes the behavior report of a program (potentially a malware), using an encoding scheme called "The Malware Instruction Set" (MIST for short).
At this point, we highly recommend that you briefly read Sec. 2 of the [MIST](http://www.mlsec.org/malheur/docs/mist-tr.pdf) documentation.

You will find each file named as `filename.<malwarename>`:
```
» ls data/train | head
00005ecc06ae3e489042e979717bb1455f17ac9d.NothingFound
0008e3d188483aeae0de62d8d3a1479bd63ed8c9.Basun
000d2eea77ee037b7ef99586eb2f1433991baca9.Patched
000d996fa8f3c83c1c5568687bb3883a543ec874.Basun
0010f78d3ffee61101068a0722e09a98959a5f2c.Basun
0013cd0a8febd88bfc4333e20486bd1a9816fcbf.Basun
0014aca72eb88a7f20fce5a4e000c1f7fff4958a.Texel
001ffc75f24a0ae63a7033a01b8152ba371f6154.Texel
0022d6ba67d556b931e3ab26abcd7490393703c4.Basun
0028c307a125cf0fdc97d7a1ffce118c6e560a70.Swizzor
...
```
and within each file, you will see a sequence of individual system calls monitored during the run-time of the binary (a malware named 'Basun' in this case):
```
» head data/train/000d996fa8f3c83c1c5568687bb3883a543ec874.Basun
# process 000006c8 0000066a 022c82f4 00000000 thread 0001 #
02 01 | 000006c8 0000066a 00015000
02 02 | 00006b2c 047c8042 000b9000
02 02 | 00006b2c 047c8042 00108000
02 02 | 00006b2c 047c8042 00153000
02 02 | 00006b2c 047c8042 00091000
02 02 | 00006b2c 047c8042 00049000
02 02 | 00006b2c 047c8042 000aa000
02 02 | 00006b2c 047c8042 00092000
02 02 | 00006b2c 047c8042 00011000
...
```
(**Note**: Please ignore the first line that begins with `# process ...`.)
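
As a rough illustration of the layout described above, the sketch below splits a file name into its hash and label, and a MIST line into category, operation, and arguments. The helper names `split_label` and `parse_mist_line` are hypothetical, and the field interpretation (two leading fields, then arguments after `|`) is an assumption based only on the sample lines shown here; Sec. 2 of the MIST report remains the authoritative definition.

```python
def split_label(file_name):
    """Split 'hash.Label' into (hash, label)."""
    stem, _, label = file_name.rpartition(".")
    return stem, label


def parse_mist_line(line):
    """Parse one MIST line; return None for blank or comment lines.

    Assumption: the part before the first '|' holds category and operation,
    and everything after it holds whitespace-separated arguments.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    head, _, tail = line.partition("|")
    head_fields = head.split()
    if len(head_fields) < 2:
        return None
    category, operation = head_fields[0], head_fields[1]
    # Any further '|' separators are dropped from the argument list
    args = [tok for tok in tail.split() if tok != "|"]
    return category, operation, args
```

Keeping only `(category, operation)` pairs gives a small, dense vocabulary, while including arguments gives a much larger and sparser one; which works better is an empirical question for this dataset.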

Your task in this project is to train a malware detector which, given the sequence of system calls (in a MIST-formatted file like the one above), predicts one of 10 classes: `{ Agent, Allaple, AutoIt, Basun, NothingFound, Patched, Swizzor, Texel, VB, Virut }`, where `NothingFound` roughly means that no malware is present.
In terms of machine learning terminology, your malware detector $F: X \rightarrow Y$ should learn a mapping from the MIST-encoded behaviour report (the input $x \in X$) to the malware class $y \in Y$.

Consequently, you will primarily tackle two challenges in this project:
  1. "Vectorizing" the input data i.e., representing each input (file) as a tensor
  1. Training an ML model
  

### Some tips:
  * Begin with an extremely simple representation/ML model and get above chance-level classification performance
  * Choose your evaluation metric wisely
  * Save intermediate computations (e.g., a token to index mapping). This will avoid you parsing the entire dataset for every experiment
  * Try using `multiprocessing.Pool` to parallelize your `for` loops
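
One possible way to follow the tip about saving intermediate computations is a small cache helper, sketched below. The name `load_or_compute` and the cache file name in the usage note are hypothetical; the sketch assumes the cached object (for example a token to index mapping) can be pickled.

```python
import os
import pickle


def load_or_compute(cache_path, compute_fn):
    """Return the object cached at cache_path, computing and saving it if absent."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as rf:
            return pickle.load(rf)
    result = compute_fn()
    with open(cache_path, "wb") as wf:
        pickle.dump(result, wf)
    return result
```

A call such as `load_or_compute("vocab.pkl", build_vocab)` would then parse the dataset only on the first run, where `build_vocab` stands for whatever zero-argument function builds the mapping. Remember to delete the cache file whenever the tokenization changes.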

---

# 1. Vectorize Data

## 1.a. Load Raw Data

In [5]:
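def load_content(filepath):
    '''Given a filepath, returns (content, classname), where content = [list of lines in file]'''
    with open(filepath, "r") as file:
        # splitlines() avoids a trailing empty string and keeps the last line
        # even when the file does not end with a newline
        data = file.read().splitlines()
    # The class name is the file extension without the leading dot
    _, file_extension = os.path.splitext(filepath)

    return data, file_extension[1:]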


def load_data(root_path, nworkers=10):
    '''Returns (samples, labels): one space-joined token string per file, and the matching class names.

    Only the first four tokens of each MIST line are kept. `nworkers` is currently unused.
    '''
    raw_data_samples = []
    labels = []
    for file_name in tqdm(os.listdir(root_path)):
        filepath = os.path.join(root_path, file_name)
        data, label = load_content(filepath)
        lst_initials = []
        for line in data:
            # Skip blank lines and comment lines such as '# process ...'
            if not line or line[0] == "#":
                continue
            tokens = line.split(" ")
            # Drop the first '|' separator if present (remove() raises ValueError otherwise)
            if "|" in tokens:
                tokens.remove("|")
            # Slicing already handles lines with fewer than four tokens
            lst_initials.extend(tokens[:4])
        raw_data_samples.append(" ".join(lst_initials))
        labels.append(label)

    return raw_data_samples, labels

In [6]:
print('=> Loading training data ... ')
train_raw_samples, train_labels = load_data('./data/train')

=> Loading training data ... 


100%|██████████| 13682/13682 [05:40<00:00, 40.23it/s]


In [7]:
print('=> Loading testing data ... ')
test_raw_samples, test_labels = load_data('./data/test')

=> Loading testing data ... 


100%|██████████| 10000/10000 [03:55<00:00, 42.46it/s]


In [8]:
# project_mode = 'trainval'    # trainval, traintest, debug
# np.random.seed(123)          # To perform the same split across multiple runs

# if project_mode == 'trainval':
#     pass
# #     train_raw_samples, val_raw_samples = train_test_split()
# elif project_mode == 'traintest':
#     # Load train and test data
#     pass
# elif project_mode == 'debug':
#     # Optional, use a small subset of the training and validation data for fast debugging
#     pass
# else:
#     raise ValueError('Unrecognized mode')
    
# # print('=> # Train samples = ', len(train_raw_samples))
# # print('=> # Test  samples = ', len(test_raw_samples))

In [9]:
print(len(train_raw_samples), len(train_labels))
print(len(test_raw_samples), len(test_labels))

13682 13682
10000 10000


In [10]:
# Converting labels to numericals
le = LabelEncoder()
train_labels = le.fit_transform(train_labels)
test_labels = le.transform(test_labels)

In [11]:
train_raw, val_raw, labels_train, labels_val = train_test_split(train_raw_samples, train_labels, test_size=0.20, stratify=train_labels, random_state=42)
del train_raw_samples, train_labels
gc.collect()

0

In [12]:
print("Training data len: ", len(train_raw), len(labels_train))
print("Val data len: ", len(val_raw), len(labels_val))
print("Test data len: ", len(test_raw_samples), len(test_labels))

Training data len:  10945 10945
Val data len:  2737 2737
Test data len:  10000 10000


## 1.b. Vectorize: Setup

Make one pass over the inputs to identify relevant features/tokens.

Suggestion:
  - identify tokens (e.g., unigrams, bigrams)
  - create a token -> index (int) mapping. Note that you might have more than 10K unique tokens, so you will have to choose a suitable "vocabulary" size.
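
To make the two suggestions above concrete, here is a minimal sketch of n-gram extraction and of measuring how much of the data a given vocabulary size covers. The function names `ngrams` and `coverage` are hypothetical, and the token list is a made-up toy example, not taken from the dataset.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def coverage(counter, vocab_size):
    """Fraction of all token occurrences covered by the vocab_size most frequent tokens."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    kept = sum(count for _, count in counter.most_common(vocab_size))
    return kept / total


# Toy example with made-up tokens
tokens = "02 01 02 02 02 02 03 01".split()
unigram_counts = Counter(tokens)
bigram_counts = Counter(ngrams(tokens, 2))
```

Evaluating `coverage` for a few candidate sizes is one way to justify a vocabulary cut-off: if a small vocabulary already covers almost all occurrences, the remaining tokens can be mapped to an unknown token at little cost.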

In [13]:
# Creating the word2index and index2word dictionaries.
def get_key_idx_map(counter, vocab_size):
    word2idx = {'_PAD': 0, '_UNK': 1}
    word2idx.update({word: i+2 for i, (word, count) in enumerate(counter.most_common(vocab_size))})
    idx2word = {idx: word for word, idx in word2idx.items()}
    return word2idx, idx2word

In [14]:
# Creating the counts of the unigrams from the training data.
counts = Counter()
for line in tqdm(train_raw):
    counts.update(line.split(" "))

100%|██████████| 10945/10945 [01:02<00:00, 175.79it/s]


In [15]:
print("Total unique words: ", len(counts))

Total unique words:  131579


In [16]:
max_vocab_size = 20000

word2idx, idx2word = get_key_idx_map(counts, max_vocab_size)

# Save vocab to file
out_path = 'application_vocab_{}.pkl'.format(max_vocab_size)
with open(out_path, 'wb') as wf:
    dct = {'word2idx': word2idx,
          'idx2word': idx2word}
    pickle.dump(dct, wf)

In [17]:
len(word2idx), len(idx2word)

(20002, 20002)

In [18]:
list(word2idx.items())[:9]

[('_PAD', 0),
 ('_UNK', 1),
 ('03', 2),
 ('02', 3),
 ('000004', 4),
 ('004000', 5),
 ('01', 6),
 ('0a', 7),
 ('047c8042', 8)]

In [19]:
list(idx2word.items())[:9]

[(0, '_PAD'),
 (1, '_UNK'),
 (2, '03'),
 (3, '02'),
 (4, '000004'),
 (5, '004000'),
 (6, '01'),
 (7, '0a'),
 (8, '047c8042')]

## 1.c. Vectorize Data

Use the (token $\rightarrow$ index) mapping you created before to vectorize your data
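
Besides the padded index sequences built in the cells of this section, a simpler representation is a bag-of-tokens count vector. The sketch below is only an illustration of that alternative, not what this notebook uses; `bag_of_tokens` and `toy_mapping` are hypothetical names, and the unknown-token index of 1 is an assumption that mirrors a `_PAD`/`_UNK` convention.

```python
def bag_of_tokens(tokens, token_to_index, unk_index=1):
    """Return a count vector of length len(token_to_index) for one token list."""
    vector = [0] * len(token_to_index)
    for token in tokens:
        # Tokens missing from the mapping are counted under the unknown index
        vector[token_to_index.get(token, unk_index)] += 1
    return vector


# Hypothetical toy mapping; index 0 is padding, index 1 is the unknown token
toy_mapping = {"_PAD": 0, "_UNK": 1, "02": 2, "01": 3}
toy_vector = bag_of_tokens(["02", "01", "02", "ff"], toy_mapping)
```

Such a vector discards token order but has a fixed size regardless of file length, which makes it a convenient input for a linear model or a small MLP as a first baseline.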

In [20]:
def convert_to_index(sent, max_length):
  '''Converts a space-separated token string to a list of indices, truncated or zero-padded to max_length. Also returns the unpadded length.'''

  sentence_indexed = []
  for token in sent.split(" "):
    if len(sentence_indexed) >= max_length:
      break
    sentence_indexed.append(word2idx.get(str(token), 1))

  length_sent = len(sentence_indexed)
  if len(sentence_indexed) < max_length:
    sentence_indexed = sentence_indexed + [0] * (max_length - len(sentence_indexed))

  return sentence_indexed, length_sent

In [21]:
def vectorize_raw_samples(raw_samples, max_length=2000, nworkers=10):
    '''Indexes every sample and pads/truncates it to max_length. `nworkers` is currently unused.'''
    vectorized_samples = []
    original_lengths = []
    for line in tqdm(raw_samples):
      sentence_indexed, length = convert_to_index(line, max_length)
      vectorized_samples.append(sentence_indexed)
      original_lengths.append(length)
    return vectorized_samples, original_lengths

In [22]:
print('=> Processing: Train')
train_data, train_lengths = vectorize_raw_samples(train_raw)
del train_raw
gc.collect()

=> Processing: Train


100%|██████████| 10945/10945 [00:19<00:00, 554.09it/s]


27

In [23]:
print('=> Processing: Val')
val_data, val_lengths = vectorize_raw_samples(val_raw)
del val_raw
gc.collect()

=> Processing: Val


100%|██████████| 2737/2737 [00:06<00:00, 419.59it/s]


27

In [24]:
print('=> Processing: Test')
test_data, test_lengths = vectorize_raw_samples(test_raw_samples)
del test_raw_samples
gc.collect()

=> Processing: Test


100%|██████████| 10000/10000 [00:19<00:00, 524.55it/s]


27

In [25]:
# Sequence lengths are integer counts, so store them as integer tensors
train_data, labels_train, train_lengths = torch.LongTensor(train_data), torch.LongTensor(labels_train), torch.LongTensor(train_lengths)
val_data, labels_val, val_lengths = torch.LongTensor(val_data), torch.LongTensor(labels_val), torch.LongTensor(val_lengths)
test_data, labels_test, test_lengths = torch.LongTensor(test_data), torch.LongTensor(test_labels), torch.LongTensor(test_lengths)

In [26]:
print(train_data.shape, labels_train.shape, train_lengths.shape)
print(val_data.shape, labels_val.shape, val_lengths.shape)
print(test_data.shape, labels_test.shape, test_lengths.shape)

torch.Size([10945, 2000]) torch.Size([10945]) torch.Size([10945])
torch.Size([2737, 2000]) torch.Size([2737]) torch.Size([2737])
torch.Size([10000, 2000]) torch.Size([10000]) torch.Size([10000])


In [4]:
# Vocab size to create the embedding layer
vocab_size = len(word2idx) #20002

trainset = TensorDataset(train_data, labels_train, train_lengths)
validset = TensorDataset(val_data, labels_val, val_lengths)
testset = TensorDataset(test_data, labels_test, test_lengths)

# trainset = torch.load('trainset.pt') 
# validset =  torch.load('validset.pt')
# testset = torch.load('testset.pt')

train_loader = DataLoader(trainset, batch_size=64, shuffle=True, num_workers=4)
val_loader = DataLoader(validset, batch_size=64, shuffle=False, num_workers=4)
test_loader = DataLoader(testset, batch_size=64, shuffle=False, num_workers=4)

In [5]:
# Example of using dataloader
for batch_idx, (sents, labels, lens) in enumerate(train_loader):
  print(sents, labels.shape, lens.shape)
  break

tensor([[  3,   6,  11,  ...,   0,   0,   0],
        [  3,   6, 163,  ..., 115,  39,  12],
        [  3,   6, 163,  ...,   0,   0,   0],
        ...,
        [  3,   6, 163,  ...,   0,   0,   0],
        [  3,   6,  11,  ...,  13,   4,   5],
        [  3,   6, 163,  ...,  10,   6,  53]]) torch.Size([64]) torch.Size([64])


# 2. Train Model

You will now train an ML model on the vectorized datasets you created previously.

_Note_: Although we often refer to each input as a 'vector' for simplicity, each of your inputs can also be higher dimensional tensors.

## 2.a. Helpers

In [6]:
DEVICE = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

In [7]:
# Feel free to edit anything in this block

def evaluate_preds(y_gt, y_pred):
    pass


def another_helper(question):
    return 42


def save_model(model, out_path):
    pass


def save_data(eval_data, out_path):
    with open(out_path, 'wb') as wf:
        # pickle.dump expects the open file object, not the path
        pickle.dump(eval_data, wf)

## 2.b. Define Model

Describe your model here.

In [8]:
class biLSTM(nn.Module):
    def __init__(self, params):
        super(biLSTM, self).__init__()
        self.vocab_size = params.vocab_size
        self.embedding_size = params.embedding_size
        self.hidden_dim = params.hidden_dim
        self.n_layers = params.n_layers
        self.dropout = params.dropout
        self.bidirectional = params.bidirectional
        self.output_dim = params.output_dim
        
        # Embedding layer definition
        self.embedding = nn.Embedding(self.vocab_size, self.embedding_size)
        
        self.rnn = nn.LSTM(self.embedding_size,
                            self.hidden_dim, 
                            num_layers=self.n_layers, 
                            bidirectional=self.bidirectional, 
                            batch_first=True, 
                            dropout=self.dropout)

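        # hidden_dim*2 because the final forward and backward states are concatenated
        # (this assumes bidirectional=True)
        self.fc = nn.Linear(self.hidden_dim*2, self.output_dim)
        self.dropout = nn.Dropout(self.dropout)
        # Note: Softsign squashes the logits into (-1, 1) before CrossEntropyLoss,
        # which bounds how low the loss can go; returning raw logits is the usual choice
        self.act = nn.Softsign()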
    
    def forward(self, text, lens):
        text_emb = self.dropout(self.embedding(text))
        packed_input = pack_padded_sequence(text_emb, lens, batch_first=True, enforce_sorted=False)
        lstm_out, (hidden, ct) = self.rnn(packed_input)
        #concat the final forward and backward hidden state
        hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1)
        dense_outputs =  self.fc(hidden)
        outputs = self.act(dense_outputs)
        return outputs

## 2.c. Set Hyperparameters

In [21]:
@dataclass
class Parameters:
    # Model parameters
    vocab_size: int = 20002
    embedding_size: int = 100
    bidirectional: bool = True
    save_model: bool = True
    model_dir: str = "biLSTM"
    output_dim: int = 10    
        
    # Training parameters
    hidden_dim: int = 64
    n_layers: int = 2
    learning_rate: float = 1e-2
    dropout: float = 0.2
    epochs: int = 20
    batch_size: int = 64

## 2.d. Train your Model

In [22]:
class Run:
    '''Training, evaluation and metrics calculation'''

    @staticmethod
    def train(model, data, params):
        loader_train = data["train_loader"]
        loader_valid = data["val_loader"]
        loader_test = data["test_loader"]
        
        # Define optimizer and loss function
        optimizer = optim.RMSprop(model.parameters(), lr=params.learning_rate)
        criterion = nn.CrossEntropyLoss(reduction='sum')

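        # Tracking best validation accuracy and the path of the best checkpoint
        best_accuracy = 0
        PATH = None  # stays None if no checkpoint is saved (e.g. save_model=False)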
        
        print("\nStart training...")
        print(f"{'Epoch':^7} | {'Train Acc':^10} | {'F1 Score':^10} |{'Val Loss':^10} | {'Val Acc':^9} | {'Test Acc':^9} | {'Elapsed':^9}")
        print("-"*80)
    
        # Starts training phase
        train_loss_list = []
        val_loss_list = []
        test_acc_list = []
        torch_F1 = torchmetrics.F1(num_classes=params.output_dim, average="micro")
        
        for epoch in range(params.epochs):
            # =======================================
            #               Training
            # =======================================
            # Tracking time and loss
            t0_epoch = time.time()
            total_loss = 0
            train_accuracy = 0
            f1_score = 0

            # Put the model into training mode
            model.train()
        
            # Starts batch training
            for x_batch, y_batch, lens in loader_train:
                
                # pdb.set_trace()
                # Load batch to GPU
                x_batch = x_batch.long()
                y_batch = y_batch.long()
                # lens = lens.to(DEVICE)
                x_batch = x_batch.to(DEVICE)
                y_batch = y_batch.to(DEVICE)
                
                # Feed the model
                
                y_pred = model(x_batch,lens)

                # Compute loss and accumulate the loss values
                loss = criterion(y_pred, y_batch)
                # import pdb; pdb.set_trace()
                total_loss += loss.item()

                # Clean gradients
                optimizer.zero_grad()

                # Gradients calculation
                loss.backward()

                # Gradients update
                optimizer.step()
            
                # Accumulate training accuracy per batch; divide by the actual batch size,
                # since the last batch may be smaller than loader_train.batch_size
                corrects = (torch.max(y_pred, 1)[1].view(y_batch.size()).data == y_batch.data).sum()
                acc = 100.0 * corrects.item() / y_batch.size(0)
                train_accuracy += acc
                
                # Compute F1 Score over a batch
                f1_score += torch_F1(y_pred.detach().cpu(), y_batch.detach().cpu())
                

            # Calculate the average per-sample loss over the entire training data for an epoch
            # (the criterion sums over each batch, so divide by the number of samples)
            avg_train_loss = total_loss / len(loader_train.dataset)
            train_loss_list.append(avg_train_loss)
            
            # Compute accuracy and F1 averaged over all batches for an epoch
            train_accuracy = train_accuracy / len(loader_train)
            f1_score /= len(loader_train)
            # =======================================
            #               Evaluation
            # =======================================
            
            # After the completion of each training epoch, measure the model's
            # performance on our validation set.          
            
            # Validation metrics
            val_loss, val_accuracy = evaluation(model, loader_valid)
            _, test_accuracy = evaluation(model, loader_test)
            val_loss_list.append(val_loss)
            test_acc_list.append(test_accuracy)
            
            # Track the best accuracy
            if val_accuracy > best_accuracy:
                best_accuracy = val_accuracy
                # Save the best model
                os.makedirs(params.model_dir, exist_ok=True)
                if params.save_model:
                    # Save the best model
                    PATH = params.model_dir + "/bi-dir_lstm_" + str(epoch+1) +"_batch_size_" + str(params.batch_size) + ".pth"
                    torch.save(model.state_dict(), PATH)

            # Print performance over the entire training data
            time_elapsed = time.time() - t0_epoch
            print(f"{epoch + 1:^7} | {train_accuracy:^10.2f} | {f1_score:^10.4f} | {val_loss:^10.3f} | {val_accuracy:^9.2f} | {test_accuracy:^9.2f} | {time_elapsed:^9.2f}")
            
            
        print("\n")
        print(f"Training complete! \nBest accuracy: {best_accuracy:.2f} %.")
        return train_loss_list, val_loss_list, test_acc_list, PATH

def evaluation(model, loader_valid):
    # Set the model in evaluation mode
    model.eval()
    corrects, avg_loss = 0, 0
    criterion = nn.CrossEntropyLoss(reduction='sum')

    # Start evaluation phase   
    with torch.no_grad():
        for x_batch, y_batch, lens in loader_valid:
            x_batch = x_batch.long()
            y_batch = y_batch.long()
            # lens = lens.to(DEVICE)
            x_batch = x_batch.to(DEVICE)
            y_batch = y_batch.to(DEVICE)
            
            y_pred = model(x_batch, lens)
            
            loss = criterion(y_pred, y_batch)
            avg_loss += loss.item()
            corrects += (torch.max(y_pred, 1)[1].view(y_batch.size()).data == y_batch.data).sum().item()
            
    size = len(loader_valid.dataset)
    avg_loss /= size
    accuracy = 100.0 * corrects/size
    
    return avg_loss, accuracy

In [23]:
data = {
    "train_loader": train_loader,
    "val_loader": val_loader,
    "test_loader": test_loader,
}
start = time.time()
params = Parameters()
model = biLSTM(params)
model.to(DEVICE)
print(model)

# Train and Evaluate the pipeline
train_loss_list, val_loss_list, test_acc_list, PATH = Run().train(model, data, params)
end = time.time()
print("*** Training Complete ***")
print("Training runtime: {:.2f} s".format(end-start))

biLSTM(
  (embedding): Embedding(20002, 100)
  (rnn): LSTM(100, 64, num_layers=2, batch_first=True, dropout=0.2, bidirectional=True)
  (fc): Linear(in_features=128, out_features=10, bias=True)
  (dropout): Dropout(p=0.2, inplace=False)
  (act): Softsign()
)

Start training...
 Epoch  | Train Acc  |  F1 Score  | Val Loss  |  Val Acc  | Test Acc  |  Elapsed 
--------------------------------------------------------------------------------
   1    |   58.87    |   0.5944   |   1.409    |   63.76   |   64.45   |   55.74  
   2    |   66.91    |   0.6691   |   1.301    |   67.67   |   67.94   |   55.76  
   3    |   69.96    |   0.7053   |   1.248    |   71.61   |   72.01   |   55.28  
   4    |   72.49    |   0.7307   |   1.245    |   72.74   |   73.66   |   55.88  
   5    |   72.85    |   0.7342   |   1.202    |   76.43   |   76.51   |   55.61  
   6    |   74.55    |   0.7455   |   1.207    |   76.25   |   77.50   |   55.72  
   7    |   73.64    |   0.7421   |   1.243    |   71.79   |  

## 2.e. Evaluate model

In [None]:
#
#
# ------- Your Code -------
#
# 
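
One possible starting point for this section is a confusion matrix with per-class precision, recall, and F1, sketched below in plain Python. The function names are hypothetical, and the sketch assumes labels are integers in `0..num_classes-1` (as produced by a label encoder) collected into two lists of equal length.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_scores(matrix):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    scores = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        predicted = sum(row[c] for row in matrix)  # column sum
        actual = sum(matrix[c])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(matrix)
    return sum(f1 for _, _, f1 in scores) / len(scores)
```

Macro-averaged F1 weights every class equally, so it is more sensitive to rare classes than plain accuracy; the per-class recall values also help answer which classes are hard to detect.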

## 2.f. Save Model + Data

In [20]:
print("Best model saved as: ", PATH)

# save train, val, test
torch.save(trainset, 'trainset.pt')
torch.save(validset, 'validset.pt')
torch.save(testset, 'testset.pt')

Best model saved as:  biLSTM/bi-dir_lstm_1_batch_size_64.pth


---

# 3. Analysis

## 3.a. Summary: Main Results

Summarize your approach and results here

## 3.b. Discussion

Enter your final summary here.

For instance, you can address:
- What was the performance you obtained with the simplest approach?
- Which vectorized input representations helped more than the others?
- Which malware classes are difficult to detect and why?
- Which approach do you recommend to perform malware classification?