
# A generic hands-on deep learning method for anomaly detection in sequential data

## Introduction

### Motivation

This notebook demonstrates how unexpected subsequences (anomalies) can be found in sequential data. The example implementation uses [software logs](https://en.m.wikipedia.org/wiki/Logging_(computing)).

Logs can be very helpful for understanding the behavior of a system, application or environment we have only recently started to work with. Such understanding is an incremental learning process, from a human and possibly also from a machine standpoint. That is, machine learning (ML) can be a powerful tool in log analysis.

Log messages are, in general, very specific to the activities being logged and can contain numeric data, so anomaly detection methods based on [natural language processing (NLP)](https://en.wikipedia.org/wiki/Natural_language_processing) or [bag-of-words (BoW)](https://en.wikipedia.org/wiki/Bag-of-words_model) can be applied only to a limited extent. For the sake of generality, the most important *component* of a log analyzed by this notebook's method is the *sequence* of log messages.

Focusing on log message *sequences*, there are two main tasks involved in preparing log data for ML-aided analysis:
1. cleaning, filtering and sorting, to obtain only the log messages of interest;
2. classification or clustering of the selected log messages into message types to be processed as a sequence.

Both task 1 and task 2, but especially task 2, can be accomplished via ML.

Task 1 is important for keeping only the relevant log information, especially when the log is huge (e.g., gigabytes). Regular expressions or even ML can be used to solve this task.

Task 2 builds on task 1. Instead of a purely regular-expression-based approach, it can be solved with an ML text classification or clustering algorithm based on NLP or BoW.
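As a rough illustration of the regular-expression approach to tasks 1 and 2, the sketch below filters raw lines and maps each remaining line to a message type. The raw log format, the `[bt]` component tag and both patterns are hypothetical assumptions made up for this example; the output merely mimics the `<device_id>-<state_name>` message types used later in this notebook.

```python
import re

# Hypothetical raw log lines: timestamp, component tag, free text with numbers.
RAW_LOG = """\
2024-05-01 10:00:00.123 [bt] device 0 state unknown -> discovery (rssi=-61)
2024-05-01 10:00:00.456 [wifi] scan finished in 120 ms
2024-05-01 10:00:01.001 [bt] device 0 state discovery -> pairing (rssi=-58)
"""

# Task 1 (filtering): keep only lines of the hypothetical "[bt]" component.
KEEP = re.compile(r"\[bt\]")

# Task 2 (classification): map a kept line to a message type; numeric fields
# such as the timestamp and rssi are discarded.
MESSAGE_TYPE = re.compile(r"device (?P<device>\d+) state \w+ -> (?P<state>\w+)")


def to_message_types(raw_log: str) -> list[str]:
    """Filters raw log lines and converts them to message types."""
    message_types = []
    for line in raw_log.splitlines():
        if not KEEP.search(line):
            continue  # Task 1: drop irrelevant lines.
        match = MESSAGE_TYPE.search(line)
        if match:  # Task 2: lines without a known pattern are skipped.
            message_types.append(f"{match['device']}-{match['state']}")
    return message_types


print(to_message_types(RAW_LOG))  # ['0-discovery', '0-pairing']
```

Dropping the numeric fields keeps them from multiplying the number of distinct message types, which is the main reason plain NLP or BoW methods struggle with raw log lines.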

After task 1 and task 2 are completed, a log is converted to a sequence of *message types* rather than messages.

|  Log             |
|------------------|
|  Message type 1  |
|  ...             |
|  Message type n  |


The term *message type* is used here only to clarify how the method works; in the rest of this notebook a log entry is simply called a *log message*.

**For simplicity and conciseness, this notebook demonstrates neither task 1 nor task 2; both are considered already completed.** Instead, the [Bluetooth log generator](#bluetooth-log-generator) section implements a log generator for simplified Bluetooth communication state changes.

### Method

As per [A survey on the application of deep learning for anomaly detection in logs](#a-survey-on-the-application-of-deep-learning-for-anomaly-detection-in-logs), various sequence-based anomaly detection solutions exist. **In particular, this notebook demonstrates a hands-on method, generic in its simplicity, for finding anomalies in potentially large amounts of logs by using a [Recurrent Neural Network (RNN)](https://en.m.wikipedia.org/wiki/Recurrent_neural_network).**

The selected method is [semi-supervised](https://en.wikipedia.org/wiki/Weak_supervision) and follows this step-by-step algorithm:
1. training to learn the possible log message sequences from already available normal log data (the supervised part);
2. predicting the next log message in new log data;
3. [optional] if in step 2 a next log message cannot be predicted based on what was learned in step 1, human intervention is needed to determine whether it is an anomaly or just a new log message to be learned (the unsupervised part);
4. [optional] if step 3 finds new normal log messages, go to step 1; otherwise, go to step 2.
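The loop above can be sketched in plain Python. In this minimal sketch, which is an assumption for illustration only, a set of observed message pairs stands in for the RNN (so only the immediately preceding message is considered, unlike the RNN's full history), and a hypothetical `reviewer` function stands in for the human intervention of step 3.

```python
from collections.abc import Callable, Iterable

Pair = tuple[str, str]


def learn_transitions(normal_logs: Iterable[list[str]]) -> set[Pair]:
    """Step 1 (stand-in for RNN training): collects consecutive message pairs."""
    transitions: set[Pair] = set()
    for log in normal_logs:
        transitions.update(zip(log, log[1:]))
    return transitions


def review_new_log(log: list[str],
                   transitions: set[Pair],
                   is_normal: Callable[[str, str], bool]
                   ) -> tuple[list[Pair], list[Pair]]:
    """Steps 2 and 3: flags unpredicted pairs and asks a reviewer about them."""
    anomalies: list[Pair] = []
    new_normal: list[Pair] = []
    for pair in zip(log, log[1:]):
        if pair in transitions:
            continue  # Step 2: the next message was "predicted".
        if is_normal(*pair):  # Step 3: human intervention.
            new_normal.append(pair)
        else:
            anomalies.append(pair)
    return anomalies, new_normal


def reviewer(prev: str, nxt: str) -> bool:
    """Hypothetical reviewer who accepts only connected -> disconnected."""
    return (prev, nxt) == ("connected", "disconnected")


transitions = learn_transitions([["unknown", "discovery", "pairing", "connected"]])
anomalies, new_normal = review_new_log(
    ["unknown", "discovery", "pairing", "connected", "disconnected", "discovery"],
    transitions, reviewer)
# Step 4: new normal pairs are fed back into "training"; anomalies are reported.
transitions.update(new_normal)
print(anomalies)   # [('disconnected', 'discovery')]
print(new_normal)  # [('connected', 'disconnected')]
```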

#### Case study: anomaly detection in logged communication with remote devices via Bluetooth

This notebook applies the method to the following anomaly detection task. Bluetooth communication is simplified into logged state changes among the states in `BT_STATE_NAMES`, with each state change following one of the allowed transitions in `BT_STATE_TRANSITIONS`. The task is to detect an unexpected next state within a sequence of state changes.

Although state-to-state transitions are the essence of this case study and [state machines](https://en.wikipedia.org/wiki/Finite-state_machine) are a general topic, a few clarifications on the Bluetooth state terminology used here may still help.

A remote Bluetooth device is `unknown` if it is not known to the host system. `pairing` is needed to make a remote device known to the host: upon successful pairing the remote device becomes `connected`, whereas upon failed pairing it becomes `unknown` again. To make use of a Bluetooth connection, some Bluetooth application protocol must run over it; otherwise, the remote device soon becomes `disconnected`. One standard Bluetooth application protocol (called a profile) is `HF` (hands-free, supporting a call app). Hence, the `connected_hf` state allows for a subsequent `call_app` state.

In [1]:
BT_STATE_NAMES = [
  "unknown", "discovery", "pairing",
  "disconnected", "connected",
  "connected_hf", "call_app",
]


BT_STATE_TRANSITIONS = {
  "unknown" : ["discovery"],
  "discovery" : ["unknown", "pairing"],
  "pairing" : ["connected", "unknown"],
  "disconnected" : ["connected", "unknown"],
  "connected" : ["disconnected", "connected_hf"],
  "connected_hf" : ["disconnected", "connected", "call_app"],
  # "disconnected" is intentionally missing; see the Corrective action section.
  "call_app" : ["connected", "connected_hf"],
}
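For reference, the ground truth that the model is expected to learn can be checked directly. The hypothetical helper below (not part of the original notebook) lists the transitions of a single device's state path that are not allowed; it uses its own copy of the transition table so that it runs on its own.

```python
# Copy of BT_STATE_TRANSITIONS as defined above (before any corrective action).
ALLOWED_TRANSITIONS: dict[str, list[str]] = {
    "unknown": ["discovery"],
    "discovery": ["unknown", "pairing"],
    "pairing": ["connected", "unknown"],
    "disconnected": ["connected", "unknown"],
    "connected": ["disconnected", "connected_hf"],
    "connected_hf": ["disconnected", "connected", "call_app"],
    "call_app": ["connected", "connected_hf"],
}


def invalid_transitions(path: list[str]) -> list[tuple[str, str]]:
    """Returns the (old, new) state pairs in path that are not allowed."""
    return [(old, new) for old, new in zip(path, path[1:])
            if new not in ALLOWED_TRANSITIONS[old]]


prefix = ["unknown", "discovery", "pairing", "connected", "connected_hf", "call_app"]
print(invalid_transitions(prefix + ["disconnected"]))  # [('call_app', 'disconnected')]
print(invalid_transitions(prefix + ["discovery"]))     # [('call_app', 'discovery')]
```

Both paths mirror the test cases used in the anomaly detection section: a static table flags both, whereas the semi-supervised loop leaves it to a human to decide which one is new normal behavior.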

## System setup

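[TensorFlow](#tensorflow) and [Keras 2](#keras) are used. Keras 2 is selected by setting the `TF_USE_LEGACY_KERAS` environment variable to `"1"` before TensorFlow is imported; with TensorFlow 2.16 or later, this also requires the `tf_keras` package to be installed.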

In [None]:
import collections.abc as abc
import os
# Select Keras 2 (tf_keras); this must happen before TensorFlow is imported.
os.environ["TF_USE_LEGACY_KERAS"] = "1"
import random
import textwrap
import typing

import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.data as tf_data
import tensorflow.keras.backend as tf_backend
import tensorflow.keras.callbacks as tf_callbacks
import tensorflow.keras.layers as tf_layers
import tensorflow.keras.models as tf_models
import tensorflow.keras.saving as tf_saving
import tensorflow.keras.utils as tf_utils

## Configuration

In [3]:
# Common
RANDOM_SEED = 42

# Data
BT_DEVICE_COUNT = 3
BT_LOG_SIZE = 100_000
SEQUENCE_LENGTH = 100
VALIDATION_SPLIT = 10_000

# Output formatting
SEPARATOR = " " * 3

# Model
EMBEDDING_DIM = 256
RNN_UNITS = 1024

# Training
BATCH_SIZE = 128
BUFFER_SIZE = 10_000
EARLY_STOP_PATIENCE = 10
EPOCHS = 1_000

# Testing
NEXT_TOKEN_PROB_THRESHOLD = 0.02

## Bluetooth log generator

### Definitions

The Bluetooth communication log is generated for `BT_DEVICE_COUNT` devices by randomly following the allowed state transitions as per `BT_STATE_NAMES` and `BT_STATE_TRANSITIONS`. The last states of the `BT_DEVICE_COUNT` devices can be obtained via the `BtLogGenerator.device_states` method.

Additionally, two helper functions are defined:
- `new_bt_log_gen` constructs a new log generator instance;
- `generate_bt_log` generates a log of `log_size` state changes, prefixed with the current device states obtained via `BtLogGenerator.device_states`, and returns it as a single newline-separated string wrapped in a list.

The `bt_log_gen` log generator instance is global to the notebook and is used for incremental training.

In [4]:
class BtDevice:
  """Bluetooth device.

  Attributes:
    id: Identifier of this device.
    state_index: A valid index within BT_STATE_NAMES.
  """

  def __init__(self,
               id: int,
               state_index: int = BT_STATE_NAMES.index("unknown")):
    self.id = id
    self.state_index = state_index


class BtLogGenerator:
  """Bluetooth log generator implemented as a callable.

  The log is generated for BT_DEVICE_COUNT devices by randomly following
  the allowed state transitions as per BT_STATE_NAMES and BT_STATE_TRANSITIONS.
  The last states of the BT_DEVICE_COUNT devices can be obtained via the
  BtLogGenerator.device_states method.
  """

  def __init__(self,
               device_list: list[BtDevice]):
    assert device_list
    self._device_list = device_list

  def device_states(self) -> list[str]:
    return [self._state_repr(device.id, BT_STATE_NAMES[device.state_index])
            for device in self._device_list]

  def __call__(self) -> abc.Generator[str]:
    device_id = random.randint(0, len(self._device_list) - 1)
    device = self._device_list[device_id]
    new_state_index, new_state_name = self._pick_new_state(device.state_index)
    device.state_index = new_state_index
    yield self._state_repr(device_id, new_state_name)

  def _state_repr(self,
                  device_id: int,
                  state_name: str) -> str:
    return f"{device_id}-{state_name}"

  def _pick_new_state(self,
                      state_index: int) -> tuple[int, str]:
    state_name = BT_STATE_NAMES[state_index]
    new_states = BT_STATE_TRANSITIONS[state_name]
    new_state_name = random.choice(new_states)
    return BT_STATE_NAMES.index(new_state_name), new_state_name


def new_bt_log_gen() -> BtLogGenerator:
  return BtLogGenerator(device_list=[
    BtDevice(i) for i in range(BT_DEVICE_COUNT)
  ])


def generate_bt_log(bt_log_gen: BtLogGenerator,
                    log_size: int = BT_LOG_SIZE) -> list[str]:
  device_states = bt_log_gen.device_states()
  return [
    "\n".join(device_states + [next(bt_log_gen()) for _ in range(log_size)])
  ]


tf_utils.set_random_seed(RANDOM_SEED)
bt_log_gen = new_bt_log_gen()
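Because the generator picks a random device for every log line, the states of different devices are interleaved, and the message preceding a line often belongs to another device; the RNN has to learn the per-device transitions from this interleaved stream. To inspect a generated log per device, a hypothetical helper such as the one below (an assumption, not part of the original code) can undo the interleaving, relying on the `<device_id>-<state_name>` format produced by `_state_repr`.

```python
def split_by_device(log_text: str) -> dict[int, list[str]]:
    """Groups "<device_id>-<state_name>" lines into per-device state paths."""
    paths: dict[int, list[str]] = {}
    for line in log_text.splitlines():
        # State names contain underscores but no hyphens, so split only once.
        device_id, state_name = line.split("-", 1)
        paths.setdefault(int(device_id), []).append(state_name)
    return paths


log_text = "0-unknown\n1-unknown\n1-discovery\n0-discovery\n1-pairing"
print(split_by_device(log_text))
# {0: ['unknown', 'discovery'], 1: ['unknown', 'discovery', 'pairing']}
```

Applied to the string inside a log returned by `generate_bt_log`, every resulting path should contain only transitions allowed by `BT_STATE_TRANSITIONS`.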

### New log

`bt_log_data` contains a newly sampled Bluetooth communication log.

In [5]:
bt_log_data = generate_bt_log(bt_log_gen)

## Data preprocessing

All log messages are first converted to numeric tokens by the `tensorflow.keras.layers.TextVectorization` layer named `text_vectorizer`, whose vocabulary is built from the unique log messages. The resulting token sequence is then split into training and validation data, with the last `VALIDATION_SPLIT` tokens used for validation, and finally transformed into `tensorflow.data.Dataset` datasets. Each dataset element is a tuple of two `SEQUENCE_LENGTH`-long token sequences, the second being the first shifted by one position, so every position pairs an old state with the new state that follows it; the elements are grouped into batches of `BATCH_SIZE`. The training dataset `train_ds` is shuffled with a buffer of `BUFFER_SIZE` sequences; the validation dataset `valid_ds` needs no shuffling.
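The windowing done by `to_dataset` can be mirrored in plain Python to make the shift-by-one pairs visible. This sketch is an illustration rather than the notebook's code: it reproduces `batch(SEQUENCE_LENGTH + 1, drop_remainder=True)` followed by the `(seq[:-1], seq[1:])` mapping, and leaves out shuffling and the second batching step.

```python
def shift_by_one_windows(sequence: list[int],
                         sequence_length: int) -> list[tuple[list[int], list[int]]]:
    """Splits sequence into non-overlapping (input, target) pairs."""
    window = sequence_length + 1
    pairs = []
    # Like drop_remainder=True, an incomplete trailing window is dropped.
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        pairs.append((chunk[:-1], chunk[1:]))  # Target = input shifted by one.
    return pairs


print(shift_by_one_windows(list(range(7)), sequence_length=2))
# [([0, 1], [1, 2]), ([3, 4], [4, 5])]
```

Since the windows do not overlap, the transition across a window boundary (`2 -> 3` above) never appears as an input/target pair; with `SEQUENCE_LENGTH = 100`, this affects only about 1% of the transitions.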

In [None]:
def to_dataset(sequence: abc.Iterable[int],
               shuffle: bool = False,
               random_seed: int | None = None) -> tf_data.Dataset:
  """Converts integers to a dataset of shift-by-one series in tuples.

  Args:
    sequence: An iterable of integers.
    shuffle: Whether to shuffle the input sequence.
    random_seed: The random seed in case shuffling is enabled.

  Returns:
    A dataset that is batched, can be shuffled, and is optimized for access.
  """
  dataset = tf_data.Dataset.from_tensor_slices(sequence)
  dataset = dataset.batch(SEQUENCE_LENGTH+1, drop_remainder=True)
  dataset = dataset.map(lambda seq: (seq[:-1], seq[1:]))
  if shuffle:
    dataset = dataset.shuffle(BUFFER_SIZE, seed=random_seed)
  dataset = dataset.batch(BATCH_SIZE, drop_remainder=False)
  dataset = dataset.prefetch(tf_data.AUTOTUNE)
  return dataset


@tf_saving.register_keras_serializable(package="text_sequence_analysis")
def tokenize(x: str) -> tf.RaggedTensor:
  return tf.strings.split(x, sep="\n")


text_vectorizer = tf_layers.TextVectorization(
  standardize=None, split=tokenize
)
text_vectorizer.adapt(bt_log_data)
bt_log_sequence = text_vectorizer(bt_log_data)[0]

train_ds = to_dataset(bt_log_sequence[:-VALIDATION_SPLIT],
                      shuffle=True, random_seed=RANDOM_SEED)
valid_ds = to_dataset(bt_log_sequence[-VALIDATION_SPLIT:])

## Sequence model

### Definitions

The log sequences preprocessed by `text_vectorizer` are learned by a basic RNN model named `sequence_model`, which consists of:
- a `tensorflow.keras.layers.Embedding` layer with `EMBEDDING_DIM` output vector size;
- a `tensorflow.keras.layers.GRU` sequence-to-sequence layer with `RNN_UNITS` units;
- a `tensorflow.keras.layers.Dense` multinomial classification layer whose number of units matches the `text_vectorizer`'s vocabulary size.

`sequence_model` is trained for multi-class next-token prediction via a `softmax` activation, with `tensorflow.keras.losses.SparseCategoricalCrossentropy` as the loss and `tensorflow.keras.optimizers.Adam` as the optimizer.
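The output layer and the loss can be illustrated on a toy three-token vocabulary. In this plain-Python sketch (not the Keras implementation), softmax turns raw scores into a probability distribution over the next token, and sparse categorical cross-entropy with an integer target is the negative log of the probability assigned to that target.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Converts raw scores into probabilities that sum to 1."""
    shift = max(logits)  # Subtracting the max avoids overflow in exp().
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(target_id: int, probs: list[float]) -> float:
    """Loss for one position: -log(probability of the true next token)."""
    return -math.log(probs[target_id])


# Two next tokens are equally plausible, the third is unlikely.
probs = softmax([2.0, 2.0, 0.0])
print([round(p, 3) for p in probs])  # [0.468, 0.468, 0.063]
print(sparse_categorical_crossentropy(0, probs) < sparse_categorical_crossentropy(2, probs))
```

When two next states are equally valid, as after `discovery` (`unknown` or `pairing`), the best a model can do is split the probability between them. This is why `detect_anomalies` compares every candidate with `NEXT_TOKEN_PROB_THRESHOLD` instead of taking only the most likely token.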

In this notebook no hyperparameter tuning is performed because the main purpose of the notebook is to demonstrate a working example.

In [None]:
@tf_saving.register_keras_serializable(package="text_sequence_analysis")
class SequenceModel(tf_models.Model):
  """RNN-based model that learns sequential data.

  Attributes:
    embedding: A Keras Embedding layer which creates vector representations
      of the text tokenized by a TextVectorization layer.
    gru: A Keras Gated Recurrent Unit (GRU) layer that learns sequences based
      on the embedding layer's output vectors.
    dense: A Keras Dense layer which performs multi-class next-token
      classification as the model's output.
  """
  def __init__(self,
               embedding_dim: int,
               rnn_units: int,
               vocab_size: int):
    super().__init__()
    self.embedding = tf_layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf_layers.GRU(
      rnn_units, return_sequences=True, return_state=True
    )
    self.dense = tf_layers.Dense(vocab_size, activation="softmax")

  def call(self,
           inputs: tf.Tensor,
           states: tf.Tensor | None = None,
           return_state: bool = False,
           training: bool = False) -> tf.Tensor:
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
      states = self.gru.get_initial_state(x)
    x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    return x


sequence_model = SequenceModel(
  embedding_dim=EMBEDDING_DIM,
  rnn_units=RNN_UNITS,
  vocab_size=text_vectorizer.vocabulary_size(),
)
sequence_model.build(input_shape=(None, 1))
sequence_model.compile(loss="sparse_categorical_crossentropy",
                       optimizer="adam")
sequence_model.summary()

### Training

The training process for `sequence_model` is defined in the `train` function. Training is skipped if a saved model already exists, i.e., if a `text_sequence_analyzer.keras` file is present. Otherwise, overfitting is mitigated by a `tensorflow.keras.callbacks.EarlyStopping` callback that monitors `val_loss` with patience `EARLY_STOP_PATIENCE` and restores the best weights when training stops. The maximum number of training epochs is configured in `EPOCHS`.
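The patience logic can be made concrete with a simplified simulation. This is a sketch of the idea rather than Keras code: it assumes `min_delta=0` and no `baseline`, and it only reports when training would stop and which epoch's weights `restore_best_weights=True` would bring back.

```python
import math


def simulate_early_stopping(val_losses: list[float],
                            patience: int) -> tuple[int, int]:
    """Returns (last_epoch_run, best_epoch) for patience-based early stopping.

    Simplified sketch: an epoch improves only if its loss is strictly lower
    than the best loss so far (i.e., min_delta = 0).
    """
    best_loss = math.inf
    best_epoch = 0
    wait = 0  # Epochs since the last improvement.
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # Training stops here.
    return len(val_losses) - 1, best_epoch


print(simulate_early_stopping([5.0, 4.0, 3.0, 3.5, 3.2, 3.1], patience=2))
# (4, 2): training stops after epoch 4; the weights of epoch 2 are restored.
```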

In [8]:
def train(sequence_model: tf_models.Model,
          train_ds: tf_data.Dataset,
          valid_ds: tf_data.Dataset,
          callbacks: list[tf_callbacks.Callback]):
  history = sequence_model.fit(
    train_ds,
    validation_data=valid_ds,
    epochs=EPOCHS,
    callbacks=callbacks
  )
  # Plot only the epochs actually run, since early stopping may end training
  # long before EPOCHS.
  plt.plot(history.history["val_loss"], "r--", label="val_loss")
  plt.xlabel("Epoch")
  plt.legend()
  plt.show()


early_stop_cb = tf_callbacks.EarlyStopping(
  monitor="val_loss",
  patience=EARLY_STOP_PATIENCE,
  restore_best_weights=True
)

if not os.path.exists("text_sequence_analyzer.keras"):
  train(sequence_model, train_ds, valid_ds,
        callbacks=[early_stop_cb])

## Text sequence analyzer

### Definitions

The anomaly detection model is defined here. This model, named `text_sequence_analyzer`, combines the previously defined `text_vectorizer` and `sequence_model`. Thus, `text_sequence_analyzer` is an end-to-end sequence analyzer capable of tokenizing raw textual inputs into sequences of token ids and predicting the next token id. The `detect_anomalies` method contains the anomaly detection logic: if a token id in the input sequence is not among the token ids predicted from the preceding tokens, `anomaly_detected_cb` is called with all data related to the anomaly.

If a `text_sequence_analyzer.keras` model file in Keras format exists, the model is loaded from this file; otherwise, the model is saved to `text_sequence_analyzer.keras`.
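The core of `detect_anomalies` can be sketched without TensorFlow. In this sketch the hypothetical `predict_next` callable and the toy probability table `TOY_PROBS` stand in for `sequence_model`; unlike the RNN, which carries its hidden state from token to token, `predict_next` simply receives the whole sequence seen so far. `THRESHOLD` mirrors `NEXT_TOKEN_PROB_THRESHOLD`.

```python
from collections.abc import Callable, Sequence

THRESHOLD = 0.02  # Same value as NEXT_TOKEN_PROB_THRESHOLD.

# Hypothetical next-token probabilities, keyed by the last token id.
TOY_PROBS: dict[int, list[float]] = {
    0: [0.0, 0.5, 0.5, 0.0],
    1: [0.0, 0.0, 0.01, 0.99],  # Token 2 after token 1 is below THRESHOLD.
    2: [1.0, 0.0, 0.0, 0.0],
    3: [0.0, 1.0, 0.0, 0.0],
}


def detect(token_ids: Sequence[int],
           predict_next: Callable[[list[int]], Sequence[float]],
           on_anomaly: Callable[[list[int], int, list[int]], None]) -> None:
    """Calls on_anomaly for every token not among the predicted next tokens."""
    sequence: list[int] = []
    predicted: list[int] | None = None  # Nothing to check for the first token.
    for token_id in token_ids:
        sequence.append(token_id)
        if predicted is not None and token_id not in predicted:
            # Pass a copy, since sequence keeps growing after the callback.
            on_anomaly(list(sequence), token_id, predicted)
        probs = predict_next(sequence)
        predicted = [i for i, p in enumerate(probs) if p >= THRESHOLD]


found: list[tuple[list[int], int, list[int]]] = []
detect([0, 1, 2], lambda seq: TOY_PROBS[seq[-1]],
       lambda seq, tok, pred: found.append((seq, tok, pred)))
print(found)  # [([0, 1, 2], 2, [3])]
```

Raising the threshold shrinks the predicted sets and therefore flags more tokens, which is one way a valid but rarely seen transition turns into a false positive.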

In [9]:
@tf_saving.register_keras_serializable(package="text_sequence_analysis")
class TextSequenceAnalyzer(tf_models.Model):
  """Detects anomalies in text data represented as a sequence.

  This is an end-to-end model capable of tokenizing raw textual inputs
  into sequences of token ids to predict the next token id.
  The detect_anomalies method contains the anomaly detection logic:
  if a token id in the input sequence is not among the predicted next
  token ids, anomaly_detected_cb is called with all data related to
  the anomaly.

  Attributes:
    text_vectorizer: A Keras TextVectorization layer which tokenizes
      raw textual data into a sequence of integer tokens.
    sequence_model: A SequenceModel which predicts next token based on
      the text_vectorizer's output sequence.
  """
  def __init__(self,
               text_vectorizer: tf_layers.TextVectorization,
               sequence_model: tf_models.Model):
    super().__init__()
    self.text_vectorizer = text_vectorizer
    self.sequence_model = sequence_model

  def detect_anomalies(self,
                       text: str,
                       anomaly_detected_cb: abc.Callable):
    """Detects anomalies in input text and notifies about them via callback.

    The input text is tokenized and the tokens are passed one-by-one
    to the model to repeatedly obtain the next token prediction. In every
    iteration the last token must be a valid next token as per the model,
    i.e., with probability greater than NEXT_TOKEN_PROB_THRESHOLD,
    otherwise anomaly is detected and notified via anomaly_detected_cb.

    Args:
      text: Input text that is to be tokenized by self.text_vectorizer.
      anomaly_detected_cb: A callback to be called when anomaly is detected.
    """
    sequence = []
    predictions = None
    states = None
    predicted_token_ids = []
    token_ids = self.text_vectorizer([text]).numpy().ravel().tolist()
    for token_id in token_ids:
      sequence += [token_id]
      if predictions is not None and token_id not in predicted_token_ids:
        anomaly_detected_cb(sequence, token_id, predicted_token_ids)
      predictions, states = self.sequence_model(
        inputs=tf.constant([[token_id]]), states=states, return_state=True
      )
      predictions = predictions[:, -1][0]
      predicted_token_ids = tf.where(
        predictions >= NEXT_TOKEN_PROB_THRESHOLD
      ).numpy().ravel().tolist()

  def get_config(self) -> dict[str, typing.Any]:
    config = super().get_config()
    config.update({
      "text_vectorizer" : tf_saving.serialize_keras_object(
        self.text_vectorizer
      ),
      "sequence_model" : tf_saving.serialize_keras_object(
        self.sequence_model
      ),
    })
    return config

  @classmethod
  def from_config(cls,
                  config: dict[str, typing.Any]) -> "TextSequenceAnalyzer":
    text_vectorizer = tf_saving.deserialize_keras_object(
      config["text_vectorizer"]
    )
    sequence_model = tf_saving.deserialize_keras_object(
      config["sequence_model"]
    )
    return cls(text_vectorizer, sequence_model)


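if os.path.exists("text_sequence_analyzer.keras"):
  text_sequence_analyzer = tf_models.load_model(
    "text_sequence_analyzer.keras"
  )
  # Use the loaded vectorizer too, so that token ids and the vocabulary used
  # by on_anomaly_detected match the loaded sequence model.
  text_vectorizer = text_sequence_analyzer.text_vectorizer
  sequence_model = text_sequence_analyzer.sequence_model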
else:
  text_sequence_analyzer = TextSequenceAnalyzer(
    text_vectorizer, sequence_model
  )
  text_sequence_analyzer.save("text_sequence_analyzer.keras")

### Anomaly detection

In this section the `test_for_anomalies` function demonstrates anomaly detection.

Note that *false positives* can occur even in text generated by `generate_bt_log`; this happens when `sequence_model` is under-trained on otherwise valid Bluetooth state sequences. Such *false positives* can also be provoked by combining a "high" `NEXT_TOKEN_PROB_THRESHOLD` with a "low" `EPOCHS`: the model then has a high bias on `train_ds`, and valid next tokens receive probabilities below the threshold. Setting too small values for `BT_LOG_SIZE`, `SEQUENCE_LENGTH` or `EARLY_STOP_PATIENCE` can likewise lead to an under-trained model and *false positives*. On the other hand, an overfitting model can also produce *false positives*, because by focusing too much on certain log messages it assigns them high probabilities and leaves too little probability for other valid next messages.

Interestingly, in the above-mentioned cases anomaly detection can serve as an indicator of an under-trained model.

Also important is the regular case of *false positives*: new normal data that simply has to be learned. This case is demonstrated when `detect_anomalies` is called with `bt_log_data_call_app_disconnected`, which contains a transition that is not part of `BT_STATE_TRANSITIONS`: `call_app -> disconnected`.

An actual anomaly is demonstrated when `detect_anomalies` is called with `bt_log_data_call_app_discovery`, which contains an invalid transition: `call_app -> discovery`.

In [None]:
def on_anomaly_detected(sequence: list[int],
                        token_id: int,
                        predicted_token_ids: list[int]):
  vocabulary = text_vectorizer.get_vocabulary()
  sequence_str = SEPARATOR.join([
    vocabulary[n_token_id] for n_token_id in sequence
  ])
  token_id_str = vocabulary[token_id]
  predicted_token_ids_str = SEPARATOR.join([
    vocabulary[p_token_id] for p_token_id in predicted_token_ids
  ])
  print("ANOMALY DETECTED")
  print("-" * 80)
  print("SEQUENCE:")
  text_wrapper = textwrap.TextWrapper()
  sequence_lines = text_wrapper.wrap(sequence_str)
  for line in sequence_lines:
    print(line)
  print("-" * 80)
  print(f"{token_id_str}\nNOT IN")
  predicted_token_ids_lines = text_wrapper.wrap(
    predicted_token_ids_str
  )
  for line in predicted_token_ids_lines:
    print(line)
  print("\n\n")


def test_for_anomalies(text_sequence_analyzer: tf_models.Model,
                       anomaly_detected_cb: abc.Callable):
  bt_log_data = generate_bt_log(new_bt_log_gen(), SEQUENCE_LENGTH)[0]
  # There should be no anomalies with generate_bt_log
  # unless the model is undertrained.
  text_sequence_analyzer.detect_anomalies(bt_log_data, anomaly_detected_cb)

  bt_log_data_call_app_disconnected = (
    "0-unknown\n1-unknown\n2-unknown\n"
    "1-discovery\n1-pairing\n1-connected\n"
    "1-connected_hf\n1-call_app\n1-disconnected"
  )
  text_sequence_analyzer.detect_anomalies(
    bt_log_data_call_app_disconnected, anomaly_detected_cb
  )

  bt_log_data_call_app_discovery = (
    "0-unknown\n1-unknown\n2-unknown\n"
    "1-discovery\n1-pairing\n1-connected\n"
    "1-connected_hf\n1-call_app\n1-discovery"
  )
  text_sequence_analyzer.detect_anomalies(
    bt_log_data_call_app_discovery, anomaly_detected_cb
  )


test_for_anomalies(text_sequence_analyzer, on_anomaly_detected)

### Corrective action

`BT_STATE_TRANSITIONS` is updated with the missing `call_app -> disconnected` transition, so that newly generated logs include it and `sequence_model` can learn that this transition is not anomalous.

In [11]:
if "disconnected" not in BT_STATE_TRANSITIONS["call_app"]:
  BT_STATE_TRANSITIONS["call_app"].append("disconnected")

### Incremental training and evaluation

This section performs incremental training and evaluation by reusing code from the previous sections. Provided the code in the [Corrective action](#corrective-action) section has been executed, after some incremental training no *false positives* should be detected here: `call_app -> disconnected` should no longer be reported, whereas the actual anomaly `call_app -> discovery` still should be.

In [None]:
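if os.path.exists("text_sequence_analyzer.keras"):
  text_sequence_analyzer = tf_models.load_model(
    "text_sequence_analyzer.keras"
  )
  text_vectorizer = text_sequence_analyzer.text_vectorizer
  sequence_model = text_sequence_analyzer.sequence_model
  # Recompile defensively: the nested model's compile state may not be
  # restored by loading, and fit() requires a compiled model.
  sequence_model.compile(loss="sparse_categorical_crossentropy",
                         optimizer="adam")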
bt_log_data = generate_bt_log(bt_log_gen)
bt_log_sequence = text_vectorizer(bt_log_data)[0]
train_ds = to_dataset(bt_log_sequence[:-VALIDATION_SPLIT],
                      shuffle=True, random_seed=RANDOM_SEED)
valid_ds = to_dataset(bt_log_sequence[-VALIDATION_SPLIT:])
train(sequence_model, train_ds, valid_ds,
      callbacks=[early_stop_cb])

test_for_anomalies(text_sequence_analyzer, on_anomaly_detected)
text_sequence_analyzer.save("text_sequence_analyzer.keras")

## Global state reset

In [13]:
tf_backend.clear_session()

## Towards a comprehensive end-to-end solution

To develop an end-to-end solution, tasks 1 and 2 from the [Motivation](#motivation) section can be implemented in addition to this notebook's sequence-based method. However, a *generic* solution for log message filtering, classification or clustering is hard to achieve. A subject for further investigation is therefore some *customized configuration* that would provide log message filtering, classification or clustering in a generic way.

## References

### A survey on the application of deep learning for anomaly detection in logs

<pre style="white-space: pre-wrap;">
Patrick Himler, Max Landauer, Florian Skopik, Markus Wurzenberger,
Anomaly detection in log-event sequences: A federated deep learning approach and open challenges,
Machine Learning with Applications,
Volume 16,
2024,
100554,
ISSN 2666-8270,
https://doi.org/10.1016/j.mlwa.2024.100554.
(https://www.sciencedirect.com/science/article/pii/S2666827024000306)
Abstract: Anomaly Detection (AD) is an important area to reliably detect malicious behavior and attacks on computer systems. Log data is a rich source of information about systems and thus provides a suitable input for AD. With the sheer amount of log data available today, for years Machine Learning (ML) and more recently Deep Learning (DL) have been applied to create models for AD. Especially when processing complex log data, DL has shown some promising results in recent research to spot anomalies. It is necessary to group these log lines into log-event sequences, to detect anomalous patterns that span over multiple log lines. This work uses a centralized approach using a Long Short-Term Memory (LSTM) model for AD as its basis which is one of the most important approaches to represent long-range temporal dependencies in log-event sequences of arbitrary length. Therefore, we use past information to predict whether future events are normal or anomalous. For the LSTM model we adapt a state of the art open source implementation called LogDeep. For the evaluation, we use a Hadoop Distributed File System (HDFS) data set, which is well studied in current research. In this paper we show that without padding, which is a commonly used preprocessing step that strongly influences the AD process and artificially improves detection results and thus accuracy in lab testing, it is not possible to achieve the same high quality of results shown in literature. With the large quantity of log data, issues arise with the transfer of log data to a central entity where model computation can be done. Federated Learning (FL) tries to overcome this problem, by learning local models simultaneously on edge devices and overcome biases due to a lack of heterogeneity in training data through exchange of model parameters and finally arrive at a converging global model. 
Processing log data locally takes privacy and legal concerns into account, which could improve coordination and collaboration between researchers, cyber security companies, etc., in the future. Currently, there are only few scientific publications on log-based AD which use FL. Implementing FL gives the advantage of converging models even if the log data are heterogeneously distributed among participants as our results show. Furthermore, by varying individual LSTM model parameters, the results can be greatly improved. Further scientific research will be necessary to optimize FL approaches.
Keywords: Log event sequences; Anomaly detection; Deep learning; LSTM; Federated learning
</pre>

### GitHub repos
- [GitHub - ageron/handson-ml3: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.](https://github.com/ageron/handson-ml3)
  - [handson-ml3/16_nlp_with_rnns_and_attention.ipynb at main · ageron/handson-ml3 · GitHub](https://github.com/ageron/handson-ml3/blob/main/16_nlp_with_rnns_and_attention.ipynb)

### Guides and tutorials
- [Save, serialize, and export models  |  TensorFlow Core](https://www.tensorflow.org/guide/keras/serialization_and_saving#custom_objects)
- [Text generation with an RNN  |  TensorFlow](https://www.tensorflow.org/text/tutorials/text_generation)
- [Working with preprocessing layers  |  TensorFlow Core](https://www.tensorflow.org/guide/keras/preprocessing_layers)

### Keras

<pre style="white-space: pre-wrap;">
Chollet, François and others. (2015). Keras. Retrieved from https://keras.io
</pre>

- [Getting started with Keras](https://keras.io/getting_started/#tensorflow--keras-2-backwards-compatibility)

### Matplotlib

<pre style="white-space: pre-wrap;">
J. D. Hunter, "Matplotlib: A 2D Graphics Environment," in Computing in Science & Engineering, vol. 9, no. 3, pp. 90-95, May-June 2007, doi: 10.1109/MCSE.2007.55.
</pre>

### TensorFlow

<pre style="white-space: pre-wrap;">
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo,
Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis,
Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow,
Andrew Harp, Geoffrey Irving, Michael Isard, Rafal Jozefowicz, Yangqing Jia,
Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Mike Schuster,
Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Jonathon Shlens,
Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker,
Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas,
Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke,
Yuan Yu, and Xiaoqiang Zheng.
TensorFlow: Large-scale machine learning on heterogeneous systems,
2015. Software available from tensorflow.org.
</pre>

- [Introduction to TensorFlow](https://www.tensorflow.org/learn)