X-WAD is a tool for explainable web anomaly detection using Transformer-based Language Models (TLMs) to identify and explain malicious network traffic patterns.
This work was initially developed during my master's thesis, "Explainable Anomaly Detection in Web Data using Transformer-based Language Models".
Demo video: `demo_explainability.mp4`
- Supports both CLM and MLM paradigms for anomaly detection and explainability
- Natively supported models: ModernBERT (MLM) and SmolLM2-360M (CLM)
- Evaluation on SRBH2020 (CAPEC) dataset
Install the required dependencies:
```sh
# Optionally create a virtual environment
python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
```

This tool supports optimizations specific to NVIDIA GPUs based on the Ampere architecture or newer. The `flash-attn` package is required for these optimizations; however, it is not included directly in the requirements file for compatibility reasons. If you want to use them, install the package manually.
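If you want to try the GPU optimization, `flash-attn`'s own installation instructions recommend building without build isolation. A minimal sketch, assuming a CUDA-enabled PyTorch is already installed:

```shell
# Requires an Ampere-or-newer NVIDIA GPU, the CUDA toolkit, and PyTorch already installed
pip install flash-attn --no-build-isolation
```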
To download the models automatically, run the `setup.sh` script.
To obtain the SRBH2020 dataset needed for evaluation, download it from the link above and extract the file `data_capec_multilabel.csv` into the directory `datasets/srbh/original/`. The location of the dataset directory can be changed through the `.env` file.
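As a sketch of how the relocation works, the expected CSV path can be derived from an environment variable loaded from `.env`. Note that `DATASET_DIR` is an assumed variable name here; check the project's `.env` file for the actual key:

```python
import os
from pathlib import Path

# DATASET_DIR is an assumed environment-variable name, loaded from .env
# (e.g. via python-dotenv); it defaults to the repository's datasets/ folder.
dataset_dir = Path(os.environ.get("DATASET_DIR", "datasets"))

# Where the evaluation scripts would then expect the extracted CSV.
csv_path = dataset_dir / "srbh" / "original" / "data_capec_multilabel.csv"
print(csv_path)
```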
The tool is based on three phases:
- The model needs to be fine-tuned or trained from scratch on the specific dataset through the scripts in the `experiments` folder
- (optional) The model can be evaluated on the dataset through the same scripts in the `experiments` folder
- (optional) Samples can be explained by the trained model with the `explain.py` script
You can run any experiment script directly from the `experiments/srbh2020/` directory, for example:
```sh
cd experiments/srbh2020/
python srbh_modernbert.py --do-preprocess --do-split --do-train --do-test --do-val --do-plot
```

Each experiment supports the following optional flags:

- `--do-preprocess`: Run data preprocessing
- `--do-split`: Split the dataset into train/val/test sets
- `--do-train`: Train the model
- `--do-val`: Run evaluation on the test set
- `--do-test`: Generate predictions on the test set
- `--do-plot`: Generate visualization plots
NOTE: After the dataset has been preprocessed and split the first time, it is not necessary to do it again. Just remove `--do-preprocess --do-split`; otherwise the dataset splits will change.
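A typical workflow therefore looks like this (using `srbh_modernbert.py` from the example above; these commands assume you are in `experiments/srbh2020/` and the dataset is in place):

```shell
# First run: preprocess and split the dataset, then train and evaluate
python srbh_modernbert.py --do-preprocess --do-split --do-train --do-test --do-val --do-plot

# Subsequent runs: omit preprocessing and splitting so the splits stay identical
python srbh_modernbert.py --do-train --do-test --do-val --do-plot
```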
To run all experiments with a single command, use the `run_all.py` script:

```sh
python run_all.py
```

Note: By default the script assumes that the dataset is in the `datasets/` folder.
This project is still a prototype and lacks a dedicated UI. To visualize the explainability output, edit `explain.py` to specify the model to use (which should have been trained previously) and the sample to be explained. By default, the file is configured to explain a sample from the srbh fix dataset using SmolLM2-360M.
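As a purely hypothetical illustration, the part of `explain.py` you would edit might resemble the fragment below. The variable names and paths here are assumptions, not the actual contents of the script; open `explain.py` to find the real ones:

```python
# Hypothetical configuration fragment -- the real variable names in
# explain.py may differ; consult the script itself.
MODEL_PATH = "models/srbh_fix_smol"  # assumed path to a previously trained SmolLM2-360M checkpoint
SAMPLE_INDEX = 0                     # assumed index of the srbh fix sample to explain
```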
```sh
python explain.py
```

```
x-wad/
├── experiments/
│   └── srbh2020/
│       ├── srbh_modernbert.py       # ModernBERT original
│       ├── srbh_smol.py             # SmolLM2-360M original
│       ├── srbh_fix_modernbert_s.py # Fixed ModernBERT variant
│       ├── srbh_fix_smol_s.py       # Fixed SmolLM2-360M variant
│       └── tools.py                 # Common utilities for srbh
├── run_all.py                       # Script to run all experiments
├── requirements.txt                 # Python dependencies
└── README.md                        # This file
```
Copyright 2026 Matteo Bitussi
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.