loghub/Spark at master · mkober/loghub

Files

README.md
Spark_2k.log
Spark_2k.log_structured.csv
Spark_2k.log_templates.csv

README.md

Spark

Apache Spark (https://spark.apache.org) is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. Spark has been widely deployed in industry for big data processing.

The log set was collected by aggregating logs from the Spark system in our lab at CUHK, which comprises 32 machines. The logs are aggregated at the machine level. However, three machines have been repaired, and some logs were unfortunately lost. The logs are large (over 2GB) and cover both normal and abnormal application runs. They are provided as-is, without further modification or labelling.

Download

The raw logs are available for download at https://github.com/logpai/loghub.
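For a first look at the small CSV samples in this directory (Spark_2k.log_structured.csv and Spark_2k.log_templates.csv), something like the following Python sketch may be useful. This README does not describe the column layout of those files, so the sketch assumes only that each CSV starts with a header row. The helper names `column_names` and `tally_column` are hypothetical and not part of loghub.

```python
import csv
from collections import Counter
from typing import Iterable


def column_names(lines: Iterable[str]) -> list[str]:
    """Return the header row of a CSV source (empty list if there is none)."""
    reader = csv.reader(lines)
    return next(reader, [])


def tally_column(lines: Iterable[str], column: str) -> Counter:
    """Count how often each value occurs in one column of a CSV with a header row.

    `column` must be a name from the file's own header; no specific
    column names are assumed here.
    """
    reader = csv.DictReader(lines)
    names = reader.fieldnames  # reads the header row on first access
    if names is None or column not in names:
        raise KeyError(f"column {column!r} not found; available: {names}")
    return Counter(row[column] for row in reader)
```

To use it, open a file with `open("Spark_2k.log_structured.csv", newline="", encoding="utf-8")`, pass the handle to `column_names` to see which columns exist, then reopen the file and pass it to `tally_column` with one of those names.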

Citation

If you use this dataset from loghub in your research, please cite the following paper.

Jieming Zhu, Shilin He, Pinjia He, Jinyang Liu, Michael R. Lyu. Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics. IEEE International Symposium on Software Reliability Engineering (ISSRE), 2023.
