Implementation of a Transformer-based approach to anomaly detection in system logs.
Project dependencies are managed with Poetry. Install them with

```shell
poetry install
```

To activate the environment, use:

```shell
poetry shell
```

If a CUDA-capable GPU is available on the system, replace the following lines in `pyproject.toml`

```toml
[tool.poetry.dependencies]
torch = { version = "^2.0.1", source = "torch-cpu" }

[[tool.poetry.source]]
name = "torch-cpu"
url = "https://download.pytorch.org/whl/cpu"
priority = "explicit"
```

with

```toml
[tool.poetry.dependencies]
torch = "^2.0.1"
```

The evaluation is done on the HDFS log dataset. The dataset can be retrieved with the script fetch_hdfs.sh via
```shell
./HDFS/fetch_hdfs.sh
```

Beforehand you might want to make the script executable with

```shell
chmod +x ./HDFS/fetch_hdfs.sh
```

We apply log parsing using the Drain3 parser to bring the log data into a processable form. To do so, run the following commands from the project root:
```shell
cd ./HDFS
python3 preprocess.py
```

Evaluation metrics and performance are captured through PyTorch Lightning. The script currently uses the MLflow experiment tracker to track metrics; a local MLflow instance can be set up reasonably fast.
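The Drain3 parsing step above collapses raw log lines onto shared templates by masking variable tokens (block ids, IPs, numbers). A minimal, hypothetical sketch of that masking idea in plain Python — this is an illustration of what template mining produces, not the actual Drain3 API:

```python
import re

# Hypothetical stand-in for Drain-style template extraction: mask the
# variable parts of an HDFS log line so structurally equal lines collapse
# onto one template.
def to_template(line: str) -> str:
    line = re.sub(r"blk_-?\d+", "<*>", line)                         # block ids
    line = re.sub(r"\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?", "<*>", line)  # IP[:port]
    line = re.sub(r"(?<![\w<])\d+(?![\w>])", "<*>", line)            # bare numbers
    return line

log = "Received block blk_-1608999687919862906 of size 91178 from /10.250.19.102"
print(to_template(log))  # Received block <*> of size <*> from /<*>
```

Lines differing only in block id, size, or source address all map to the same template string, which is what makes the parsed logs usable as token sequences for the model.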
If you want to change the logger, you can swap the MLFlowLogger objects in ./HDFS/BERT-AE/training.py for one of Lightning's other logger implementations:
```python
logger = MLFlowLogger(
    experiment_name=ae_trainer_config.experiment_name,
    run_name=f"{run_name}",
    log_model=True)
```

To access the logged models, just start the MLflow server with
```shell
poetry shell
mlflow server
```

The server should be available at http://127.0.0.1:5000.
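If the tracker runs as a separate server like this rather than the default local file store, the logger in training.py can be pointed at it explicitly. A configuration sketch, assuming the local server address above and a made-up experiment name:

```python
from pytorch_lightning.loggers import MLFlowLogger

# Point the logger at the locally running MLflow server.
# The experiment name here is hypothetical; reuse the one from your config.
logger = MLFlowLogger(
    experiment_name="bert-ae-hdfs",
    tracking_uri="http://127.0.0.1:5000",
    log_model=True,
)
```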
The training of the models is done through the training.py script in their
respective directories; the same scripts also run the evaluation.
The scripts should be called from the project root.
The hyperparameters of the models are documented in a separate README in their respective directories.
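For an autoencoder-style model, the evaluation in these scripts ultimately comes down to thresholding a per-sequence anomaly score, typically the reconstruction error. A toy, framework-free sketch of that decision rule — the error values and threshold below are invented for illustration, not taken from the project:

```python
# Toy sketch: flag log sequences whose reconstruction error exceeds a
# threshold. In the real training.py the scores come from the trained model;
# here both the errors and the threshold are made up.
def flag_anomalies(errors: list[float], threshold: float) -> list[bool]:
    return [e > threshold for e in errors]

errors = [0.02, 0.03, 0.41, 0.05]  # per-sequence reconstruction errors (made up)
print(flag_anomalies(errors, threshold=0.1))  # [False, False, True, False]
```

The flagged sequences are then compared against the HDFS ground-truth labels to compute the reported evaluation metrics.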