DOMI_dataset

DOMI_dataset is the dataset for the paper "Detecting Outlier Machine Instances through One Dimensional CNN Gaussian Mixture Variational AutoEncoder".

Data

Dataset Information

Dataset name	Time points per instance	Metrics per instance	Matrix shape per instance
DOMI_dataset	288	19	19 * 288
Training set size (instances)	Outlier ratio in training set (%)	Testing set size (instances)	Outlier ratio in testing set (%)
54630	18.62	27315	22.26

DOMI_dataset

DOMI_dataset is a server machine dataset collected from a top global Internet company. It covers 1821 machines monitored over one and a half months, sampled at equally spaced 5-minute intervals. Every instance, named M-X@D-Y (machine X on day Y), is a T * M matrix, where M and T are the number of metrics and the number of time points in one day, respectively. In our dataset, each machine has 19 metrics (i.e., M=19), and each day has 288 time points (i.e., T=288).

We divide the overall dataset into two parts: the first month for training and the following half month for testing. For the testing set, we provide labels indicating which machine instances are outliers, as well as interpretation labels for the outlier instances.
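The figures in this README can be cross-checked with a little arithmetic. The sketch below is only a sanity check, assuming the "first month" is 30 days and the "second half month" is 15 days; the variable names are illustrative and not part of the dataset.

```python
# Figures taken from the dataset description in this README.
MACHINES = 1821
SAMPLING_MINUTES = 5
TRAIN_DAYS = 30  # assumption: "the first month" is 30 days
TEST_DAYS = 15   # assumption: "the second half month" is 15 days

# One instance is one machine on one day.
points_per_day = 24 * 60 // SAMPLING_MINUTES
train_instances = MACHINES * TRAIN_DAYS
test_instances = MACHINES * TEST_DAYS

print(points_per_day, train_instances, test_instances)
```

Under these assumptions the results match the time length per instance and the training and testing set sizes listed under Dataset Information.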

Thus DOMI_dataset is made up of the following parts:

train_data/: Training set.
test_data/: Testing set.
test_label/: The labels of the testing set, which indicate whether an instance is an outlier.
interpretation_label.txt: The ground truth lists of metrics that contribute to outlier judgment.

In detail, each file in train_data/, test_data/, and test_label/ contains the monitoring data or ground truth of one machine, so each directory holds 1821 files (i.e., 1821 machines). In train_data/ and test_data/, each line of a file holds the 19 monitored metrics at one timestamp. Each file in train_data/ has 8640 (=288 * 30) lines, and each file in test_data/ has 4320 (=288 * 15) lines. In test_label/, each line is the ground truth (1: outlier, 0: normal) of one machine instance (i.e., 288 lines of the corresponding file in test_data/), so each file contains 15 lines (i.e., 15 days). In interpretation_label.txt, each line is formatted as "X+Y,index1 of metric1,index2 of metric2,..." and gives the list of metric indices for Machine-X at Day-Y.
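The file layout described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the function names are hypothetical, the value separator inside each line is not stated in this README (the code assumes commas and/or whitespace), and metric indices are assumed to be integers. Whether the day number Y starts at 0 or 1 is also not stated, so X and Y are kept as strings.

```python
import re

T, M = 288, 19  # time points per day and metrics per machine, as stated above


def split_into_instances(text):
    """Split one machine file's text into per-day instances of T rows x M values."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: values are separated by commas and/or whitespace.
        values = [float(v) for v in re.split(r"[,\s]+", line) if v]
        if len(values) != M:
            raise ValueError(f"expected {M} metrics, got {len(values)}")
        rows.append(values)
    if len(rows) % T != 0:
        raise ValueError(f"{len(rows)} rows is not a multiple of {T}")
    # Each consecutive block of T rows is one machine instance (one day).
    return [rows[i:i + T] for i in range(0, len(rows), T)]


def parse_interpretation_line(line):
    """Parse 'X+Y,index1,index2,...' into (X, Y, [metric indices])."""
    key, *indices = line.strip().split(",")
    machine, day = key.split("+")
    return machine, day, [int(i) for i in indices if i.strip()]


# Synthetic two-day example; real files would be read from train_data/ or test_data/.
demo = "\n".join(",".join(["0.5"] * M) for _ in range(2 * T))
days = split_into_instances(demo)
print(len(days), len(days[0]), len(days[0][0]))
print(parse_interpretation_line("12+3,0,5"))
```

Applied to a file from test_data/, the i-th instance returned by `split_into_instances` would correspond to the i-th line of the matching file in test_label/.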
