Accurate Open-set Recognition for Memory Workload

This repository contains the code and datasets for "Accurate Open-set Recognition for Memory Workload", published in ACM Transactions on Knowledge Discovery from Data (TKDD), 2023.

Dataset

We used two real-world workload sequence datasets in our experiments. The following table summarizes them.

Dataset	# of known classes	# of unknown classes	# of training samples	# of known test samples	# of unknown test samples
SEC-seq	40	4	586,885	293,444	93,491
Memtest86-seq	31	3	433,334	216,696	77,018

Sample data for Memtest86-seq data

Because of its size (about 2TB) and the anonymity policy, we cannot upload the entire raw data for the second dataset. Instead, we upload sample raw data and the corresponding feature vectors for the second dataset (Memtest86-seq).

raw_data: [Download]
intermediate_data: [Download]
final_data: [Download]

Full feature vectors for Memtest86-seq data

We also upload the full feature vectors of the second dataset (Memtest86-seq). To train and test our model, we recommend using this full version of the feature vectors.

final_data_original: [Download]

Raw data description

The following are the first 10 rows of the P01 workload:

5,1,1,0,648
5,1,0,0,648
1,0,2,1,35778
1,1,2,1,35778
5,0,2,1,640
1,0,1,3,32991
5,1,2,1,640
1,1,1,3,32991
5,0,1,3,640
5,1,1,3,640

Each column indicates the following:

CMD - Command ID. Values: [1,3,5,6,7]
Rank - Rank number. Range: [0,1]
Bank Group - Bank group number. Range: [0,3]
Bank - Bank number. Range: [0,3]
Address - Row address when CMD is ACT, and column address when CMD is RDA or WRA. Range: [-1, 131071]

The CMD values decode as follows:

1 - ACT
3 - RDA
5 - WRA
6 - PRE
7 - PREA

More about the CMD field can be found in the Documentation.
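As an illustration of the row format described above, the following minimal sketch parses comma-separated rows into named fields and decodes the CMD value. The `Command` record and the `parse_workload` helper are hypothetical names introduced for this example; they are not part of the repository.

```python
import csv
import io
from typing import NamedTuple

# CMD decoding taken from the list above.
CMD_NAMES = {1: "ACT", 3: "RDA", 5: "WRA", 6: "PRE", 7: "PREA"}


class Command(NamedTuple):
    """One workload row (hypothetical record type for this sketch)."""
    cmd: str         # decoded command name
    rank: int
    bank_group: int
    bank: int
    address: int     # row address for ACT, column address for RDA/WRA


def parse_workload(text: str) -> list:
    """Parse comma-separated workload rows into Command records."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        cmd_id, rank, bank_group, bank, address = (int(v) for v in row)
        records.append(Command(CMD_NAMES[cmd_id], rank, bank_group, bank, address))
    return records


# First three rows of the P01 sample shown above.
SAMPLE = "5,1,1,0,648\n5,1,0,0,648\n1,0,2,1,35778\n"

for record in parse_workload(SAMPLE):
    print(record)
```

An unknown CMD value raises a `KeyError` in this sketch, which makes malformed rows visible early.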

Model

We provide the MLP and the SVD-based detectors trained on the second dataset in the models directory.

model_mlp.pt: a trained 2-layer MLP model
detectors.npy: SVD-based detectors for all classes. Due to the size limit, we provide a download link: [Download].
detectors_threshold.npy: thresholds for all classes
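To give a rough idea of how per-class SVD-based detectors and thresholds can work together, here is a generic sketch on synthetic data. It assumes a detector is a low-rank basis for one class and that a sample is flagged as unknown when its reconstruction error exceeds every class threshold. This is an assumption for illustration only: the actual logic lives in svd_detector.py, and the layout of detectors.npy and detectors_threshold.npy is not described here. All function names below are hypothetical.

```python
import numpy as np


def fit_detector(X: np.ndarray, rank: int) -> np.ndarray:
    """Return the top-`rank` right singular vectors of one class's samples."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:rank]  # shape: (rank, n_features)


def reconstruction_error(x: np.ndarray, basis: np.ndarray) -> float:
    """Distance between x and its projection onto the span of `basis`."""
    projected = basis.T @ (basis @ x)
    return float(np.linalg.norm(x - projected))


def is_unknown(x, bases, thresholds) -> bool:
    """Flag x as unknown if its error exceeds every class threshold."""
    return all(reconstruction_error(x, b) > t for b, t in zip(bases, thresholds))


# Synthetic data: class 0 lies along the first axis, class 1 along the second.
rng = np.random.default_rng(0)
class0 = np.outer(rng.normal(size=50), [1.0, 0.0, 0.0])
class1 = np.outer(rng.normal(size=50), [0.0, 1.0, 0.0])

bases = [fit_detector(class0, rank=1), fit_detector(class1, rank=1)]
thresholds = [0.1, 0.1]  # illustrative values, not the released thresholds

print(is_unknown(np.array([2.0, 0.0, 0.0]), bases, thresholds))  # lies in class 0's subspace
print(is_unknown(np.array([0.0, 0.0, 2.0]), bases, thresholds))  # far from both subspaces
```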

Code Information

The code in this directory is implemented in Python 3.7. This repository contains the code for Acorn, an ACcurate Open-set recognition method for woRkload sequeNce.

The code of Acorn is organized as follows:
- main.py: the code related to training a classification model, constructing unknown class detectors, and measuring accuracies for our metrics.
- svd_detector.py: the code related to constructing unknown class detectors and evaluating test samples with the detectors.
- model.py: the code related to a known classification model (MLP).
- preprocess.py: the code that preprocesses raw data and creates feature vectors.
- utils/workload_to_sep.py: the code that extracts the cmd field from the raw workloads for convenience in later steps.
- utils/calculate_ngrams.py: the code that calculates n-grams for the training samples.
- utils/distributed_coverage_search.py: the code that creates the n-gram set and calculates n-gram vectors.
- utils/bank_access_count.py: the code that counts access to each bank and creates bank access vectors.
- utils/row_col_address_access.py: the code that counts address access and creates address access vectors.
- utils/dataloader.py: the code related to loading workload sequence data and extracting feature vectors.
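The feature-extraction utilities listed above revolve around two kinds of counts: n-grams over the CMD sequence and accesses per bank. The sketch below shows one plausible way to compute both on the first 10 rows of P01. It is not the repository's implementation; `count_ngrams`, `bank_access_counts`, and the flat index layout are assumptions made for this example, and n = 3 is used only to keep the output small.

```python
from collections import Counter


def count_ngrams(cmds, n):
    """Count every contiguous length-n window in a CMD sequence."""
    return Counter(tuple(cmds[i:i + n]) for i in range(len(cmds) - n + 1))


def bank_access_counts(rows, n_ranks=2, n_bank_groups=4, n_banks=4):
    """Count accesses per (rank, bank group, bank) as one flat vector.

    Default sizes follow the documented ranges: rank in [0,1],
    bank group in [0,3], bank in [0,3].
    """
    counts = [0] * (n_ranks * n_bank_groups * n_banks)
    for _cmd, rank, bank_group, bank, _address in rows:
        counts[(rank * n_bank_groups + bank_group) * n_banks + bank] += 1
    return counts


# First 10 rows of the P01 workload: (CMD, Rank, Bank Group, Bank, Address).
ROWS = [
    (5, 1, 1, 0, 648), (5, 1, 0, 0, 648), (1, 0, 2, 1, 35778),
    (1, 1, 2, 1, 35778), (5, 0, 2, 1, 640), (1, 0, 1, 3, 32991),
    (5, 1, 2, 1, 640), (1, 1, 1, 3, 32991), (5, 0, 1, 3, 640),
    (5, 1, 1, 3, 640),
]

cmds = [row[0] for row in ROWS]
print(count_ngrams(cmds, 3).most_common(2))
print(bank_access_counts(ROWS))
```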

Usage

The required Python packages are listed in ./requirements.txt. If pip is installed on your system, run the following command to install them:

    pip install -r requirements.txt

How to extract feature vectors for sample data

Type the following command to extract feature vectors for the cmd field and the address-related fields:

    python preprocess.py

The script will create the following two folders:

current directory
├── final_data
└── intermediate_data

The preprocessed files are stored in:

current directory
└── final_data
    ├── 7-grams
    ├── 11-grams
    ├── 15-grams
    ├── data_split_ids
    ├── bank_access_counts
    └── row_col_address_access_counts
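After running the preprocessing step, a quick check like the following can confirm that the expected subfolders exist. The helper `missing_outputs` is a hypothetical name for this sketch; the folder names are copied from the tree above.

```python
from pathlib import Path

# Subfolders of final_data listed in the tree above.
EXPECTED_SUBDIRS = [
    "7-grams",
    "11-grams",
    "15-grams",
    "data_split_ids",
    "bank_access_counts",
    "row_col_address_access_counts",
]


def missing_outputs(root="."):
    """Return the expected final_data subfolders that do not exist under root."""
    final_dir = Path(root) / "final_data"
    return [name for name in EXPECTED_SUBDIRS if not (final_dir / name).is_dir()]


missing = missing_outputs(".")
print("missing:", missing if missing else "none")
```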

How to train and test unknown workload detection for sample data

Type the following command to train and test unknown workload detection:

    python main.py --data './final_data_original' --batch_size 128 --learning_rate 0.0001 --alpha 2 --only_test false

If you already have trained models and only want to test them, type the following command:

    python main.py --data './final_data_original' --batch_size 128 --learning_rate 0.0001 --alpha 2 --only_test true

Citation

Please cite this paper if you use our code.

@article{jang2023accurate,
  title={Accurate Open-set Recognition for Memory Workload},
  author={Jang, Jun-Gi and Shim, Sooyeon and Egay, Vladimir and Lee, Jeeyong and Park, Jongmin and Chae, Suhyun and Kang, U},
  journal={ACM Transactions on Knowledge Discovery from Data},
  volume={17},
  number={9},
  pages={1--14},
  year={2023},
}
