<font size=20>A novel pattern-based edit distance for automatic log parsing</font>

This notebook allows users to reproduce the experiments described in the paper "A novel pattern-based edit distance for automatic log parsing" by M. Raynal, M.O. Buob and G. Quénot, presented at the International Conference on Pattern Recognition (ICPR) 2022.

# Installation

To install the `pattern_clustering` module, please follow [these steps](https://github.com/nokia/pattern-clustering/wiki/Experiments-with-Drain-and-Logmine).

# Loading datasets and hyperparameters

## Dataset

In [2]:
import os
from utils import load_data

ROOT_PATH = "../../"
ROOT_LOG_PATH = os.path.join(ROOT_PATH, "logs/")

(templates_dict, logs_dict) = load_data(ROOT_LOG_PATH)
assert templates_dict
assert logs_dict

(templates_dict_modified, logs_dict_modified) = load_data(ROOT_LOG_PATH, load_modified=True)
assert templates_dict_modified
assert logs_dict_modified

## Hyperparameters

In [6]:
import pathlib
from utils import load_pattern_collection

PARAMETERS_PATH = pathlib.Path(ROOT_PATH) / "parameters/"

FUNDAMENTAL_COLLECTION = load_pattern_collection(PARAMETERS_PATH / "fundamental_collection.json")
assert FUNDAMENTAL_COLLECTION
BASIC_COLLECTION = load_pattern_collection(PARAMETERS_PATH / "basic_collection.json")
assert BASIC_COLLECTION
SPECIFIC_COLLECTION = load_pattern_collection(PARAMETERS_PATH / "specific_collection.json")
assert SPECIFIC_COLLECTION

# Experiments

### LogMine 

In [7]:
from utils import METRICS, evaluate_logmine_clustering, to_logmine_params

LOGMINE_REPO_PATH = os.path.join(ROOT_PATH, "..", "logmine")
APACHE_LOG_PATH = os.path.join(ROOT_LOG_PATH, "Apache/Apache_2k.log")

print(
    evaluate_logmine_clustering(
        LOGMINE_REPO_PATH,
        APACHE_LOG_PATH,
        templates_dict["Apache"],
        METRICS,
        max_dist=0.06,
        logmine_regexps=to_logmine_params(BASIC_COLLECTION)
    )
)

{'parsing accuracy': 0.2905, 'adjusted rand index': 0.9628006226024521, 'time': 0.5753438472747803, 'number of clusters': 4}
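
The result above pairs a low parsing accuracy with a high adjusted Rand index, which can look contradictory. The sketch below is one way to see why the two metrics can diverge. It is not the code behind `METRICS` in `utils`: it assumes the definitions commonly used in log parsing benchmarks (a line counts as correctly parsed only if its predicted cluster contains exactly the same lines as its ground-truth group) and the standard pair-counting form of the adjusted Rand index. The function names and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict
from math import comb


def parsing_accuracy(truth, predicted):
    """Fraction of lines whose predicted cluster equals their ground-truth group.

    Assumed definition: a line is correct only if the set of lines sharing its
    predicted label is identical to the set of lines sharing its true label.
    """
    def groups(labels):
        members = defaultdict(set)
        for index, label in enumerate(labels):
            members[label].add(index)
        return members

    truth_groups = groups(truth)
    predicted_groups = groups(predicted)
    correct = sum(
        1
        for index in range(len(truth))
        if truth_groups[truth[index]] == predicted_groups[predicted[index]]
    )
    return correct / len(truth)


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index computed from pair counts of the contingency table."""
    n = len(truth)
    if n < 2:
        return 1.0  # No pair of lines to compare.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_predicted = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = pairs_truth * pairs_predicted / comb(n, 2)
    maximum = (pairs_truth + pairs_predicted) / 2
    if maximum == expected:
        return 1.0  # Degenerate case, e.g. both partitions have a single cluster.
    return (pairs_both - expected) / (maximum - expected)


# Toy example (hypothetical labels): 10 lines, 3 true templates.
truth = ["A"] * 6 + ["B"] * 3 + ["C"]
# The predicted clustering merges templates B and C into one cluster.
predicted = [0] * 6 + [1] * 4

print(parsing_accuracy(truth, predicted))                # 0.6
print(round(adjusted_rand_index(truth, predicted), 3))   # 0.865
```

Under these assumed definitions, merging two small templates makes every line of both templates count as wrong for parsing accuracy, while most pairs of lines are still grouped consistently, so the adjusted Rand index stays high.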


### Drain

In [8]:
from utils import METRICS, evaluate_drain_clustering

print(
    evaluate_drain_clustering(
        APACHE_LOG_PATH,
        templates_dict["Apache"],
        METRICS,
        BASIC_COLLECTION,
        show_clusters=False,
        sim_th=0.03,
        depth=3
    )
)

{'parsing accuracy': 1.0, 'adjusted rand index': 1.0, 'time': 0.5610649585723877, 'number of clusters': 6}


### Pattern clustering

In [9]:
import string
from utils import evaluate_fast_pattern_clustering, make_map_name_dfa_densities
from pattern_clustering.multi_grep import MultiGrepFunctorLargest

PC_BASIC_COLLECTION = {**FUNDAMENTAL_COLLECTION, **BASIC_COLLECTION}
ALPHABET = set(string.printable)
MAP_NAME_DFA, MAP_NAME_DENSITY = make_map_name_dfa_densities(PC_BASIC_COLLECTION, ALPHABET)

print(
    evaluate_fast_pattern_clustering(
        APACHE_LOG_PATH,
        templates_dict["Apache"],
        METRICS,
        MAP_NAME_DFA,
        MultiGrepFunctorLargest,
        MAP_NAME_DENSITY,
        0.15,
        show_clusters=False,
    )
)

{'parsing accuracy': 0.3065, 'adjusted rand index': 0.5517378285183966, 'time': 9.207051515579224, 'number of clusters': 4}
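
To compare the three runs on the Apache dataset side by side, the printed dictionaries can be gathered into a small table. The sketch below copies the values from the outputs shown in this notebook; the `results` dictionary and the table layout are illustrative, and the timings depend on the machine, so a rerun may produce different numbers.

```python
# Values copied from the outputs printed earlier in this notebook.
results = {
    "LogMine": {
        "parsing accuracy": 0.2905,
        "adjusted rand index": 0.9628006226024521,
        "time": 0.5753438472747803,
        "number of clusters": 4,
    },
    "Drain": {
        "parsing accuracy": 1.0,
        "adjusted rand index": 1.0,
        "time": 0.5610649585723877,
        "number of clusters": 6,
    },
    "Pattern clustering": {
        "parsing accuracy": 0.3065,
        "adjusted rand index": 0.5517378285183966,
        "time": 9.207051515579224,
        "number of clusters": 4,
    },
}

columns = ["parsing accuracy", "adjusted rand index", "time", "number of clusters"]

# Build a fixed-width text table, one row per method.
lines = [f"{'method':<20}" + "".join(f"{column:>22}" for column in columns)]
for method, metrics in results.items():
    lines.append(
        f"{method:<20}" + "".join(f"{metrics[column]:>22.4g}" for column in columns)
    )
print("\n".join(lines))

# Method with the highest parsing accuracy for these particular settings.
best = max(results, key=lambda method: results[method]["parsing accuracy"])
print(best)  # Drain
```

These figures reflect only the hyperparameters used in the cells above (for example `max_dist=0.06` for LogMine and `sim_th=0.03` for Drain) on a single dataset.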
