<font size=20>A novel pattern-based edit distance for automatic log parsing</font>

This notebook lets users reproduce the experiments described in the paper "A novel pattern-based edit distance for automatic log parsing" by M. Raynal, M.O. Buob and G. Quénot, presented at the International Conference on Pattern Recognition (ICPR) 2022.

# Installation

To install the `pattern_clustering` module, please follow [these steps](https://github.com/nokia/pattern-clustering/wiki/Experiments-with-Drain-and-Logmine).

# Loading datasets and hyperparameters

## Dataset

In [None]:
import pathlib
import os
from utils import load_data

ROOT_PATH = "."
ROOT_LOG_PATH = pathlib.Path(ROOT_PATH) / "logs"

(templates_dict, logs_dict) = load_data(ROOT_LOG_PATH)
assert templates_dict
assert logs_dict

(templates_dict_modified, logs_dict_modified) = load_data(ROOT_LOG_PATH, load_modified=True)
assert templates_dict_modified
assert logs_dict_modified

## Hyperparameters

In [None]:
import pathlib
from utils import load_pattern_collection

PARAMETERS_PATH = pathlib.Path(ROOT_PATH) / "parameters/"

FUNDAMENTAL_COLLECTION = load_pattern_collection(PARAMETERS_PATH / "fundamental_collection.json")
assert FUNDAMENTAL_COLLECTION
BASIC_COLLECTION = load_pattern_collection(PARAMETERS_PATH / "basic_collection.json")
assert BASIC_COLLECTION
SPECIFIC_COLLECTION = load_pattern_collection(PARAMETERS_PATH / "specific_collection.json")
assert SPECIFIC_COLLECTION

In [None]:
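# Threshold values for the pattern clustering and logmine algorithms.
# The commented-out list is a wider sweep of thresholds; the active line restricts the run to a single value.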
# MAX_DIST_VALUES = [0.001, 0.003, 0.005, 0.007, 0.01, 0.015, 0.02, 0.025, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.12, 0.14, 0.16, 0.18, 0.2, 0.22, 0.24, 0.26, 0.28, 0.3, 0.32, 0.36, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]
MAX_DIST_VALUES = [0.1]
# Parameters for the drain algorithm
DRAIN_SIM_TH_LIST = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]
DRAIN_DEPTH_LIST = [3, 4, 5, 6]
DRAIN_PARAMS_LIST = [
    (sim_th, depth) for sim_th in DRAIN_SIM_TH_LIST for depth in DRAIN_DEPTH_LIST
]

LOGMINE_REPO_PATH = os.path.join(ROOT_PATH, "../../..", "logmine")
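
The Drain grid above pairs every similarity threshold with every tree depth. To see how many Drain configurations this implies, the same grid can be rebuilt with `itertools.product`. This is only an illustrative sketch: the two lists are copied from the cell above, and the lowercase names are introduced here for the example.

```python
import itertools

# Values copied from the hyperparameter cell above.
sim_th_list = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]
depth_list = [3, 4, 5, 6]

# Same (sim_th, depth) pairs, in the same order, as the nested comprehension:
# the threshold varies in the outer loop, the depth in the inner loop.
drain_grid = list(itertools.product(sim_th_list, depth_list))

print(len(drain_grid))  # 12 thresholds x 4 depths = 48 configurations
print(drain_grid[0])    # (0.01, 3)
```

Since every configuration is evaluated, shortening either list is the simplest way to reduce the running time of the Drain experiments.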

# Experiments

## Minimal collection

In this setting, every algorithm receives the same pattern collection as a parameter, whatever the log being parsed.

In [None]:
from utils import eval_on_all_logs_and_save, METRICS


eval_on_all_logs_and_save(
    root_log_path=ROOT_LOG_PATH,
    output_path="results_minimal_collection.json",
    metrics=METRICS,
    templates_dict=templates_dict_modified,
    fundamental_collection=FUNDAMENTAL_COLLECTION,
    basic_collection=BASIC_COLLECTION,
    specific_collection=None,
    max_dist_pc_list=MAX_DIST_VALUES,
    max_dist_logmine_list=MAX_DIST_VALUES,
    drain_params_list=DRAIN_PARAMS_LIST,
    logmine_repo_path=LOGMINE_REPO_PATH,
)

## Specific collection

In [None]:
from utils import eval_on_all_logs_and_save, METRICS


eval_on_all_logs_and_save(
    root_log_path=ROOT_LOG_PATH,
    output_path="results_specific_collection.json",
    metrics=METRICS,
    templates_dict=templates_dict_modified,
    fundamental_collection=FUNDAMENTAL_COLLECTION,
    basic_collection=BASIC_COLLECTION,
    specific_collection=SPECIFIC_COLLECTION,
    max_dist_pc_list=MAX_DIST_VALUES,
    max_dist_logmine_list=MAX_DIST_VALUES,
    drain_params_list=DRAIN_PARAMS_LIST,
    logmine_repo_path=LOGMINE_REPO_PATH,
)
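
Each call to `eval_on_all_logs_and_save` is expected to write its results to the file given by `output_path`. The layout of that file is not described in this notebook, so the sketch below assumes only that it is valid JSON and reports its top-level shape. `summarize_json` is a hypothetical helper introduced for illustration; it is not part of `utils`.

```python
import json
import pathlib


def summarize_json(path):
    """Describe the top level of a JSON file, or return None if it is missing."""
    path = pathlib.Path(path)
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        # For an object, report its sorted keys.
        return ("dict", sorted(data))
    if isinstance(data, list):
        # For an array, report its length.
        return ("list", len(data))
    # For a scalar, report the type name and the value itself.
    return (type(data).__name__, data)


print(summarize_json("results_minimal_collection.json"))
```

The helper returns `None` rather than raising when the file is absent, so the cell can be run before the experiments have finished.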