## Log sequence anomaly detection

### Content
- Introduction
- Dataset
- Training and Evaluation
- Conclusion
- Reference

### Introduction
Anomaly detection in sequential log data aims to identify sequences that deviate from expected behavior
or patterns. For example, software-intensive systems often record runtime information by
printing console logs. A large and complex system can produce a massive volume of logs, which can be used for troubleshooting.
The log messages can be modeled as an event sequence. Detecting anomalous states
in a timely manner is critical to ensuring the reliability of the software system and mitigating losses.

Log data usually consists of unstructured text messages, which help engineers understand the system’s internal
status and facilitate monitoring, administration, and troubleshooting. Log messages can be parsed into log events,
which are the templates (constant parts) of the messages.
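
To make the idea of a log event concrete, the sketch below masks variable fields in a message to recover a rough template. It is a deliberately simple regex-based illustration and not the Drain algorithm; the function name `to_template`, the masking rules, and the sample messages are all assumptions made for this example.

```python
import re

# Hypothetical masking rules, applied in order. Real parsers such as Drain
# learn templates from the data instead of relying on fixed patterns.
_MASKS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<*>"),   # hexadecimal values
    (re.compile(r"\b\d+(\.\d+)*\b"), "<*>"),  # integers and dotted numbers
]

def to_template(message: str) -> str:
    """Replace variable-looking tokens with the wildcard <*>."""
    for pattern, wildcard in _MASKS:
        message = pattern.sub(wildcard, message)
    return message

# Two made-up messages that differ only in their variable parts
template_a = to_template("generated 12 core files for program 0x1f3a")
template_b = to_template("generated 7 core files for program 0xbeef")
print(template_a)
print(template_a == template_b)
```

Both messages collapse to the same template, so they would be counted as the same log event in the event sequence.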
 
This use case shows a workflow for identifying sequential anomalies in raw log sequence data.

In [1]:
%load_ext autoreload
%autoreload 2
import pandas as pd
import numpy as np
import random
import utils, datatools, model
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm
import time
import math
import os
from sklearn import metrics


SEED = 91
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

#### Dataset processing

The dataset for this example comes from the BlueGene/L (BGL) supercomputer system. The BGL dataset contains 4,747,963 log messages collected
from a [BlueGene/L](https://zenodo.org/record/3227177/files/BGL.tar.gz?download=1) supercomputer system at Lawrence Livermore National Labs. The log messages are categorized as alert or non-alert messages. The messages are parsed into a structured log format using the [`Drain`](https://github.com/logpai/logparser) parser.

To run this workflow, we can use a portion of the parsed BGL dataset taken from https://github.com/LogIntelligence/LogPPT.

#### Preprocessing log dataset

In [8]:
df = pd.read_csv('https://raw.githubusercontent.com/LogIntelligence/LogPPT/master/logs/BGL/BGL_2k.log_structured.csv')

In [2]:
# Small dataset for testing
DATASET_NAME = 'https://raw.githubusercontent.com/LogIntelligence/LogPPT/master/logs/BGL/BGL_2k.log_structured.csv' #'BGL_2k'
TRAIN_SIZE = 100 
WINDOW_SIZE = 10
STEP_SIZE = 20
RATIO = 0.1

# Full dataset parsed using DRAIN parser
# DATASET_NAME = 'dataset/bgl_1m.log_structured.csv'
# TRAIN_SIZE = 10000 #00
# WINDOW_SIZE = 100
# STEP_SIZE = 20
# RATIO = 0.1

Create the train and test datasets by transforming the log dataset into embedding vectors.

In [3]:
train_normal, test_normal, test_abnormal, bigram, unique, weights, train_dict, w2v_dict = datatools.sliding_window(DATASET_NAME, WINDOW_SIZE, STEP_SIZE, TRAIN_SIZE)

Reading: dataset/bgl_1m.log_structured.csv
Total logs in the dataset:  1000000
training size 10000
test normal size 26631
test abnormal size 13365
Number of training keys: 93
Word2Vec model: Word2Vec(vocab=94, size=8, alpha=0.025)
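
The windowing step can be pictured with a small stand-alone sketch. This is an assumption about the general technique, not the code inside `datatools.sliding_window`; the function name `sliding_windows` and the choice to drop a short trailing window are hypothetical.

```python
from typing import List, Sequence

def sliding_windows(events: Sequence[int], window_size: int, step_size: int) -> List[List[int]]:
    """Cut an event-ID sequence into fixed-length windows.

    A new window starts every `step_size` events. A trailing window shorter
    than `window_size` is dropped (an assumption of this sketch).
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    windows = []
    for start in range(0, len(events) - window_size + 1, step_size):
        windows.append(list(events[start:start + window_size]))
    return windows

# Made-up event IDs standing in for parsed log events
event_ids = [1, 2, 1, 2, 3, 4, 4, 3, 5, 6]
windows = sliding_windows(event_ids, window_size=4, step_size=2)
for w in windows:
    print(w)
```

Under this scheme, a step larger than the window (as in the small configuration above, with a window of 10 and a step of 20) yields non-overlapping windows and skips the events that fall between them. The project's own helper may handle that case differently.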


In [4]:
# Hyperparameters
vocab_dim = len(train_dict)+1
output_dim = 2
emb_dim = 8
hidden_dim = 128
n_layers = 1
dropout = 0.0
batch_size = 32
times = 20

Generate negative samples and split the data into training and validation sets. Given a set of normal sequences, anomalous sequences are generated via negative sampling. A binary sequence classifier is then trained to distinguish the generated negative samples (label 1) from the normal sequences (label 0).

In [6]:
neg_samples = datatools.negative_sampling(train_normal, bigram, unique, times, vocab_dim)
df_neg = datatools.get_dataframe(neg_samples, 1, w2v_dict)
df_pos = datatools.get_dataframe(list(train_normal['EventId']), 0, w2v_dict)
df_pos.columns = df_pos.columns.astype(str)
df_train = pd.concat([df_pos, df_neg], ignore_index = True, axis=0)
df_train = df_train.reset_index(drop=True)  # assign the result; reset_index does not modify in place
y = list(df_train.loc[:,'class_label'])
X = list(df_train['W2V_EventId'])

# split train, validation set
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
X_train = torch.tensor(X_train,requires_grad=False).long()
X_val = torch.tensor(X_val,requires_grad=False).long()
y_train = torch.tensor(y_train).reshape(-1, 1).long()
y_val = torch.tensor(y_val).reshape(-1, 1).long()
train_iter = utils.get_iter(X_train, y_train, batch_size)
val_iter = utils.get_iter(X_val, y_val, batch_size)

In [7]:
df_train

| | EventId | class_label | W2V_EventId |
|---|---|---|---|
| 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 1 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 2 | [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, ... | 0 | [29, 22, 29, 22, 29, 22, 29, 22, 29, 22, 29, 2... |
| 3 | [3, 4, 4, 3, 4, 3, 4, 3, 4, 3, 3, 4, 3, 4, 3, ... | 0 | [4, 10, 10, 4, 10, 4, 10, 4, 10, 4, 4, 10, 4, ... |
| 4 | [5, 6, 5, 6, 5, 6, 6, 5, 6, 5, 6, 5, 6, 5, 6, ... | 0 | [23, 39, 23, 39, 23, 39, 39, 23, 39, 23, 39, 2... |
| ... | ... | ... | ... |
| 209995 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | 1 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 209996 | [22, 22, 22, 15, 89, 29, 22, 22, 22, 82, 22, 2... | 1 | [28, 28, 28, 56, 84, 30, 28, 28, 28, 87, 28, 2... |
| 209997 | [56, 30, 56, 30, 56, 56, 30, 56, 30, 56, 30, 5... | 1 | [32, 26, 32, 26, 32, 32, 26, 32, 26, 32, 26, 3... |
| 209998 | [0, 0, 0, 0, 0, 2, 0, 0, 0, 51, 0, 0, 0, 0, 91... | 1 | [0, 0, 0, 0, 0, 22, 0, 0, 0, 17, 0, 0, 0, 0, 8... |
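
One plausible way to generate such negative samples is to corrupt a normal sequence so that it contains an event transition (bigram) never observed in the normal data. The sketch below illustrates that idea only; it is not necessarily what `datatools.negative_sampling` does, and the name `make_negative`, its parameters, and the toy sequences are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple

def make_negative(
    sequence: Sequence[int],
    vocab: Sequence[int],
    seen_bigrams: Set[Tuple[int, int]],
    rng: random.Random,
    max_tries: int = 100,
) -> List[int]:
    """Corrupt one position so that the sequence contains an unseen bigram.

    Returns a copy of `sequence` (which must have at least two events). If no
    suitable replacement is found within `max_tries`, the copy is unchanged.
    """
    corrupted = list(sequence)
    for _ in range(max_tries):
        pos = rng.randrange(1, len(corrupted))
        candidate = rng.choice(vocab)
        if (corrupted[pos - 1], candidate) not in seen_bigrams:
            corrupted[pos] = candidate
            return corrupted
    return corrupted

# Toy normal sequences of event IDs
normal = [[1, 2, 1, 2, 1, 2], [3, 4, 4, 3, 4, 3]]
seen = {(a, b) for seq in normal for a, b in zip(seq, seq[1:])}
vocab = sorted({event for seq in normal for event in seq})

rng = random.Random(91)
samples_per_sequence = 2  # plays the role of `times` in this sketch
negatives = [make_negative(seq, vocab, seen, rng)
             for seq in normal for _ in range(samples_per_sequence)]

# Normal sequences get label 0 and generated ones label 1, as in the cell above
labelled = [(seq, 0) for seq in normal] + [(seq, 1) for seq in negatives]
print(len(labelled))
```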


### Training and Evaluation

An LSTM model is trained for binary classification on Word2Vec inputs generated from both positive and negative examples.
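
For orientation, a classifier of this kind could look roughly like the sketch below: an embedding layer initialised from pretrained vectors, an LSTM, and a linear head on the final hidden state. This is an assumption about the general architecture, not the actual `model.LogLSTM` implementation; the class name `TinyLogClassifier` and the random stand-in weights are hypothetical.

```python
import torch
import torch.nn as nn

class TinyLogClassifier(nn.Module):
    """Embedding -> LSTM -> linear head over the last hidden state."""

    def __init__(self, embedding_weights, hidden_dim=128, output_dim=2, n_layers=1):
        super().__init__()
        # Start from pretrained vectors; freeze=False allows fine-tuning
        self.embedding = nn.Embedding.from_pretrained(embedding_weights, freeze=False)
        self.lstm = nn.LSTM(embedding_weights.size(1), hidden_dim,
                            num_layers=n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        embedded = self.embedding(x)          # (batch, seq_len, emb_dim)
        _, (hidden, _) = self.lstm(embedded)  # hidden: (n_layers, batch, hidden_dim)
        return self.fc(hidden[-1])            # logits: (batch, output_dim)

# Random stand-ins for the Word2Vec matrix (94 keys, 8 dimensions) and one batch
demo_weights = torch.randn(94, 8)
demo_net = TinyLogClassifier(demo_weights)
demo_batch = torch.randint(0, 94, (32, 10))   # 32 windows of 10 event indices
demo_labels = torch.randint(0, 2, (32,))      # 0 = normal, 1 = anomalous

demo_logits = demo_net(demo_batch)
demo_loss = nn.CrossEntropyLoss()(demo_logits, demo_labels)
print(demo_logits.shape, demo_loss.item())
```

Note that `nn.CrossEntropyLoss` expects class-index targets of shape `(batch,)`, so targets stored with shape `(batch, 1)` need to be flattened before the loss is computed.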

In [8]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_epoch = 10
kwargs = {"matrix_embeddings":weights, 
"vocab_dim": vocab_dim, "output_dim": output_dim, "emb_dim": emb_dim,
"hid_dim": hidden_dim, 
"n_layers": n_layers, 
"dropout": dropout,
"batch_size": batch_size}
LAD_model = model.LogLSTM(weights, vocab_dim, output_dim, emb_dim, hidden_dim, n_layers, dropout,  batch_size).to(device)
optimizer = optim.Adam(LAD_model.parameters())
criterion = nn.CrossEntropyLoss()

# Create the output directory; exist_ok avoids an error if it already exists
os.makedirs('model', exist_ok=True)

# Training LSTM model
clip = 1

best_test_loss = float('inf')

for epoch in tqdm(range(n_epoch)):
    
    start_time = time.time()
    train_loss= model.train(LAD_model, train_iter, optimizer, criterion,  device)        

    val_loss = model.evaluate(LAD_model, val_iter, criterion, device)
    
    end_time = time.time()
    
    epoch_mins, epoch_secs = model.epoch_time(start_time, end_time)
    
    if val_loss < best_test_loss:
        best_test_loss = val_loss
        torch.save({
            'model_state_dict':LAD_model.state_dict(),
            "model_hyperparam": kwargs,
            "W2V_conf": {
            'train_dict': train_dict, 
            'w2v_dict': w2v_dict,
            "WINDOW_SIZE": WINDOW_SIZE,
            "STEP_SIZE": STEP_SIZE
            }
        }, 'model/model_BGL.pt')
    print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
    print(f'\t Val. Loss: {val_loss:.3f} |  Val. PPL: {math.exp(val_loss):7.3f}')



 10%|█         | 1/10 [06:04<54:43, 364.85s/it]

Epoch: 01 | Time: 6m 4s
	Train Loss: 0.060 | Train PPL:   1.061
	 Val. Loss: 0.027 |  Val. PPL:   1.027


 20%|██        | 2/10 [12:08<48:35, 364.43s/it]

Epoch: 02 | Time: 6m 4s
	Train Loss: 0.024 | Train PPL:   1.024
	 Val. Loss: 0.021 |  Val. PPL:   1.022


 30%|███       | 3/10 [18:15<42:37, 365.32s/it]

Epoch: 03 | Time: 6m 6s
	Train Loss: 0.019 | Train PPL:   1.020
	 Val. Loss: 0.017 |  Val. PPL:   1.018


 40%|████      | 4/10 [24:21<36:34, 365.80s/it]

Epoch: 04 | Time: 6m 6s
	Train Loss: 0.016 | Train PPL:   1.016
	 Val. Loss: 0.016 |  Val. PPL:   1.016


 50%|█████     | 5/10 [30:45<31:00, 372.08s/it]

Epoch: 05 | Time: 6m 23s
	Train Loss: 0.018 | Train PPL:   1.018
	 Val. Loss: 0.021 |  Val. PPL:   1.021


 60%|██████    | 6/10 [37:07<25:02, 375.62s/it]

Epoch: 06 | Time: 6m 22s
	Train Loss: 0.018 | Train PPL:   1.018
	 Val. Loss: 0.017 |  Val. PPL:   1.017


 70%|███████   | 7/10 [43:13<18:37, 372.42s/it]

Epoch: 07 | Time: 6m 5s
	Train Loss: 0.015 | Train PPL:   1.015
	 Val. Loss: 0.017 |  Val. PPL:   1.017


 80%|████████  | 8/10 [49:19<12:21, 370.52s/it]

Epoch: 08 | Time: 6m 6s
	Train Loss: 0.014 | Train PPL:   1.014
	 Val. Loss: 0.018 |  Val. PPL:   1.019


 90%|█████████ | 9/10 [55:25<06:08, 368.99s/it]

Epoch: 09 | Time: 6m 5s
	Train Loss: 0.014 | Train PPL:   1.014
	 Val. Loss: 0.013 |  Val. PPL:   1.013


100%|██████████| 10/10 [1:01:30<00:00, 369.01s/it]

Epoch: 10 | Time: 6m 4s
	Train Loss: 0.013 | Train PPL:   1.013
	 Val. Loss: 0.015 |  Val. PPL:   1.015





### Evaluation

The model is evaluated using F1 score.

In [9]:
# Prepare the test sequences for evaluation
test_abnormal_ratio = model.ratio_abnormal_sequence(test_abnormal, WINDOW_SIZE, RATIO)
test_ab_X, test_ab_X_key_label = test_abnormal_ratio['W2V_EventId'], test_abnormal_ratio['Key_label']
test_n_X, test_n_X_key_label = test_normal['W2V_EventId'], test_normal['Key_label']
test_ab_y = test_abnormal_ratio['Label']
test_n_y = test_normal['Label']
# Keep the same fraction of normal sequences as the fraction of abnormal sequences retained above
normal_sequences = test_n_X.values.tolist()
n_normal = int(len(normal_sequences) * (len(test_abnormal_ratio) / len(test_abnormal)))
y, y_pre = model.model_precision(LAD_model, device, normal_sequences[:n_normal],
                                 test_ab_X.values.tolist())
f1_acc = metrics.classification_report(y, y_pre, digits=5)
print(f1_acc)

              precision    recall  f1-score   support

           0    0.98641   0.96198   0.97404      2867
           1    0.92781   0.97359   0.95015      1439

    accuracy                        0.96586      4306
   macro avg    0.95711   0.96779   0.96210      4306
weighted avg    0.96683   0.96586   0.96606      4306
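
As a quick check on how the F1 column follows from the other two, the snippet below recomputes it from the reported precision and recall. The helper name `f1_from` is hypothetical; the input numbers are copied from the report above.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall per class, taken from the classification report
f1_normal = f1_from(0.98641, 0.96198)      # class 0
f1_anomalous = f1_from(0.92781, 0.97359)   # class 1

# The macro average is the unweighted mean of the per-class scores
f1_macro = (f1_normal + f1_anomalous) / 2

print(f"{f1_normal:.5f} {f1_anomalous:.5f} {f1_macro:.5f}")
```

The recomputed values agree with the report to within rounding of the five-digit inputs.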



In [15]:
test_normal.head(5)

| | Label | EventId | Key_label | W2V_EventId |
|---|---|---|---|---|
| 10000 | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 10001 | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 10002 | 0 | [17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 1... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ... |
| 10003 | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 10004 | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |


In [87]:

# Load trained model and parameters for inference

# check_point  = torch.load('model/model_BGL.pt')

# window_df = datatools.preprocess(df, check_point['W2V_conf']['WINDOW_SIZE'], check_point['W2V_conf']['STEP_SIZE'])


# # convert to input vector
# test_vector = datatools.test_vector(window_df, check_point['W2V_conf']['train_dict'], check_point['W2V_conf']['w2v_dict'])

# # load LogLSTM model
# trained_model_ = model.LogLSTM(**check_point['model_hyperparam']).to(device)
# trained_model_.load_state_dict(check_point['model_state_dict'])

# # predict label
# _, y_pred = model.model_inference(trained_model_, device, test_vector['W2V_EventId'].values.tolist())


### Conclusion

In this workflow, we show a pipeline for training a binary sequence classifier to identify anomalous log sequences. We used negative sampling to generate negative examples alongside the normal log sequences for training the model. The model is evaluated on the BGL dataset to distinguish alert from non-alert messages. With an F1 score of about 0.95 on the anomalous class (0.966 weighted average), the model identifies most true alerts among the test log samples.

### Reference

- https://arxiv.org/pdf/2202.04301.pdf
- https://ieeexplore.ieee.org/document/9671642