Datasets

Loghub_2k

The loghub_2k datasets are sampled from the loghub logs and contain 2,000 lines of log messages for each log. The message templates are extracted using regular expressions and then manually validated and annotated. The loghub_2k datasets were first used to benchmark log parsers in the work "Tools and Benchmarks for Automated Log Parsing" (ICSE 2019).
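To illustrate how a message template relates to a raw log message, here is a minimal sketch that turns a template into a regular expression and checks a message against it. It assumes templates mark variable parts with a `<*>` placeholder; that convention, the helper names, and the sample message are illustrative assumptions and are not taken from this README.

```python
import re


def template_to_regex(template: str, wildcard: str = "<*>") -> re.Pattern:
    """Compile a template into an anchored regex.

    Assumption: variable parts are written as `wildcard` (here "<*>").
    Literal text is escaped; each wildcard becomes a lazy capture group.
    """
    literal_parts = [re.escape(part) for part in template.split(wildcard)]
    return re.compile("^" + "(.*?)".join(literal_parts) + "$")


def extract_parameters(template: str, message: str):
    """Return the captured variable parts, or None if the message does not match."""
    match = template_to_regex(template).match(message)
    return match.groups() if match else None


# Hypothetical template and message, for illustration only.
template = "Connection from <*> closed"
message = "Connection from 10.0.0.1 closed"
params = extract_parameters(template, message)
```

A template annotated as ground truth for a log line should match that line in this way, which is one simple consistency check that could be run over annotated data.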

Loghub_2k_corrected

The loghub_2k_corrected datasets were developed in the work "Guidelines for Assessing the Accuracy of Log Message Template Identification Techniques" (ICSE 2022), which refines the original loghub_2k datasets by fixing some of their incorrect ground-truth event templates.

Loghub

Loghub provides a large collection of system log datasets, which are freely accessible for AI-driven log analytics research. The raw logs can be accessed at https://github.com/logpai/loghub.

LogPub

Loghub provides large-scale raw logs but lacks annotated event templates at that scale. To evaluate log parsers in a more rigorous and practical setting, LogPub provides large-scale manual annotations for the raw logs in Loghub. The LogPub datasets can be accessed at https://github.com/logpai/LogPub.