snudatalab/Acorn

Accurate Open-set Recognition for Memory Workload

This repository contains the code and datasets for "Accurate Open-set Recognition for Memory Workload", published in ACM Transactions on Knowledge Discovery from Data (TKDD), 2023.

Dataset

We use two real-world workload sequence datasets in our experiments. The following table summarizes them.

Dataset         # known classes   # unknown classes   # train    # test (known)   # test (unknown)
SEC-seq         40                4                   586,885    293,444          93,491
Memtest86-seq   31                3                   433,334    216,696          77,018
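The split sizes can be sanity-checked with a few lines of Python. The counts are copied from the table above; the sketch only computes two ratios per dataset: training size relative to the known-class test set, and the share of unknown-workload samples in the test set.

```python
# Sample counts copied from the dataset table: (train, test known, test unknown).
DATASETS = {
    "SEC-seq": (586_885, 293_444, 93_491),
    "Memtest86-seq": (433_334, 216_696, 77_018),
}

def split_stats(train, test_known, test_unknown):
    """Return (train / known-test ratio, unknown share of the whole test set)."""
    ratio = train / test_known
    unknown_share = test_unknown / (test_known + test_unknown)
    return ratio, unknown_share

for name, counts in DATASETS.items():
    ratio, share = split_stats(*counts)
    print(f"{name}: train/test-known = {ratio:.2f}, unknown share of test = {share:.1%}")
```

Both training sets are about twice the size of their known-class test sets (ratio 2.00), and unknown samples make up about 24.2% (SEC-seq) and 26.2% (Memtest86-seq) of the test data.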

Sample data for Memtest86-seq

Due to its size (about 2 TB) and the anonymity policy, we cannot upload the entire raw data of the second dataset. Instead, we upload sample raw data and the corresponding feature vectors for the second dataset (Memtest86-seq).

Full feature vectors for Memtest86-seq

We also upload the full feature vectors of the second dataset (Memtest86-seq). To train and test our model, we recommend using these full feature vectors.

Raw data description

The following are the first 10 rows of the P01 workload:

5,1,1,0,648
5,1,0,0,648
1,0,2,1,35778
1,1,2,1,35778
5,0,2,1,640
1,0,1,3,32991
5,1,2,1,640
1,1,1,3,32991
5,0,1,3,640
5,1,1,3,640

Each column is defined as follows:

  1. CMD - Command ID. Values: [1,3,5,6,7]
  2. Rank - Rank number. Range: [0,1]
  3. Bank Group - Bank group number. Range: [0,3]
  4. Bank - Bank number. Range: [0,3]
  5. Address - Row address when CMD is ACT, and column address when CMD is RDA or WRA. Range: [-1, 131071]

The CMD values decode as follows:

  • 1 - ACT (activate)
  • 3 - RDA (read with auto-precharge)
  • 5 - WRA (write with auto-precharge)
  • 6 - PRE (precharge)
  • 7 - PREA (precharge all banks)

More details about the CMD field can be found in the Documentation.
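The raw format described above can be read with a short parser. This is a minimal sketch that assumes each raw workload file is a headerless CSV with exactly the five columns listed above. The names `Command` and `parse_workload` are hypothetical and are not part of this repository.

```python
import csv
import io
from dataclasses import dataclass

# CMD decoding table from this README.
CMD_NAMES = {1: "ACT", 3: "RDA", 5: "WRA", 6: "PRE", 7: "PREA"}

@dataclass(frozen=True)
class Command:
    cmd: str         # decoded command name, e.g. "ACT"
    rank: int        # 0..1
    bank_group: int  # 0..3
    bank: int        # 0..3
    address: int     # row address for ACT, column address for RDA/WRA

def parse_workload(lines):
    """Parse headerless CSV rows of the form CMD,Rank,BankGroup,Bank,Address."""
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # ignore blank lines
        cmd_id, rank, bank_group, bank, address = (int(field) for field in row)
        records.append(Command(CMD_NAMES[cmd_id], rank, bank_group, bank, address))
    return records

# First 10 rows of the P01 workload, as shown above.
SAMPLE = """5,1,1,0,648
5,1,0,0,648
1,0,2,1,35778
1,1,2,1,35778
5,0,2,1,640
1,0,1,3,32991
5,1,2,1,640
1,1,1,3,32991
5,0,1,3,640
5,1,1,3,640
"""

for record in parse_workload(io.StringIO(SAMPLE)):
    print(record)
```

To read a file instead of the inline sample, pass an open file handle, for example `with open(path, newline="") as f: records = parse_workload(f)`, where `path` is a raw workload file. An unexpected CMD value raises a `KeyError`, and a row that does not have exactly five fields raises a `ValueError`.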

Model

We provide a trained MLP and SVD-based detectors for the second dataset in the models directory.

  • model_mlp.pt: a trained 2-layer MLP model
  • detectors.npy: SVD-based detectors for all classes. Due to the file size limit, this file is available through a download link: [Download].
  • detectors_threshold.npy: thresholds for all classes
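The sketch below illustrates one generic form of SVD-based unknown detection. It fits a low-rank subspace to each known class's feature vectors, then flags a test sample as unknown when its reconstruction error under the detector of its predicted class exceeds that class's threshold. This is an assumption-laden illustration, not the code in svd_detector.py. The rank `k`, the quantile-based thresholds, and the omission of mean-centering are choices made here, and the sketch makes no assumption about the layout of detectors.npy or detectors_threshold.npy. All function names are hypothetical.

```python
import numpy as np

def fit_detector(X, k):
    """Top-k right singular vectors (d x k) of the class feature matrix X (n x d)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)  # vt has shape (min(n, d), d)
    return vt[:k].T

def recon_error(V, x):
    """Norm of the component of x lying outside the column span of V."""
    return float(np.linalg.norm(x - V @ (V.T @ x)))

def fit_all(features, labels, k, quantile=0.95):
    """Fit one subspace and one error threshold (a training-error quantile) per class."""
    detectors, thresholds = {}, {}
    for c in np.unique(labels):
        Xc = features[labels == c]
        V = fit_detector(Xc, k)
        errors = [recon_error(V, x) for x in Xc]
        detectors[int(c)] = V
        thresholds[int(c)] = float(np.quantile(errors, quantile))
    return detectors, thresholds

def is_unknown(x, predicted_class, detectors, thresholds):
    """True if x is poorly reconstructed by the detector of its predicted class."""
    error = recon_error(detectors[predicted_class], x)
    return error > thresholds[predicted_class]

# Toy data: class 0 lies near span(e1, e2), class 1 near span(e3, e4) in R^5.
rng = np.random.default_rng(0)
X0 = np.hstack([rng.normal(size=(50, 2)), 0.01 * rng.normal(size=(50, 3))])
X1 = np.hstack([0.01 * rng.normal(size=(50, 2)), rng.normal(size=(50, 2)),
                0.01 * rng.normal(size=(50, 1))])
features = np.vstack([X0, X1])
labels = np.array([0] * 50 + [1] * 50)
detectors, thresholds = fit_all(features, labels, k=2)

print(is_unknown(np.array([1.0, -0.5, 0, 0, 0]), 0, detectors, thresholds))  # class-0 pattern
print(is_unknown(np.array([0, 0, 0, 0, 1.0]), 0, detectors, thresholds))     # unseen direction
```

In this setup, the known-class classifier (the MLP here) supplies `predicted_class`, and the per-class detector only decides whether to accept that prediction or reject the sample as unknown.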

Code Information

The code in this directory is implemented in Python 3.7. This repository contains the code for Acorn, an ACcurate Open-set recognition method for woRkload sequeNces.

  • The code of Acorn is in this directory.
    • main.py: trains a classification model, constructs unknown class detectors, and computes the accuracy metrics used in our evaluation.
    • svd_detector.py: constructs unknown class detectors and evaluates test samples with them.
    • model.py: defines the known-class classification model (MLP).
    • preprocess.py: preprocesses the raw data and creates feature vectors.
    • utils/workload_to_sep.py: extracts the CMD field from the raw workloads for use in later steps.
    • utils/calculate_ngrams.py: calculates n-grams for the training samples.
    • utils/distributed_coverage_search.py: creates the n-gram set and calculates n-gram vectors.
    • utils/bank_access_count.py: counts accesses to each bank and creates bank access vectors.
    • utils/row_col_address_access.py: counts address accesses and creates address access vectors.
    • utils/dataloader.py: loads workload sequence data and extracts feature vectors.
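To illustrate the kind of n-gram features that utils/calculate_ngrams.py and utils/distributed_coverage_search.py produce, here is a minimal sketch. It counts contiguous n-grams in a CMD sequence and turns them into a fixed-length count vector over a chosen n-gram set. The selection rule in the hypothetical `build_ngram_set` (keep the most frequent training n-grams) is a stand-in, and the repository's distributed coverage search may select n-grams differently. The example uses the CMD column of the 10 P01 rows shown earlier, with n = 2 because the sample is short. The preprocessing output folders suggest the repository uses 7-, 11-, and 15-grams.

```python
from collections import Counter

def ngram_counts(seq, n):
    """Count contiguous n-grams (as tuples) in a sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def build_ngram_set(train_seqs, n, size):
    """Hypothetical rule: keep the `size` most frequent n-grams across training sequences."""
    total = Counter()
    for seq in train_seqs:
        total.update(ngram_counts(seq, n))
    return [gram for gram, _ in total.most_common(size)]

def ngram_vector(seq, n, ngram_set):
    """Count of each selected n-gram in seq; n-grams outside the set are dropped."""
    counts = ngram_counts(seq, n)
    return [counts[gram] for gram in ngram_set]  # Counter returns 0 for missing keys

# CMD column of the 10 P01 rows shown above.
p01_cmds = [5, 5, 1, 1, 5, 1, 5, 1, 5, 5]
vocab = build_ngram_set([p01_cmds], n=2, size=2)
print(vocab, ngram_vector(p01_cmds, 2, vocab))
```

Fixing the n-gram set on the training data keeps every sample's vector the same length, so the vectors can be fed directly to a classifier such as the MLP.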

Usage

The required Python packages are listed in ./requirements.txt. If pip is installed on your system, run the following command to install them:

    pip install -r requirements.txt

How to extract feature vectors for sample data

Run the following command to extract feature vectors for the CMD field and the address-related fields:

    python preprocess.py

The script will create the following two folders:

current directory
├── final_data
└── intermediate_data

and the preprocessed files are stored in:

current directory
└── final_data
    ├── 7-grams
    ├── 11-grams
    ├── 15-grams
    ├── data_split_ids
    ├── bank_access_counts
    └── row_col_address_access_counts
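The bank_access_counts and row_col_address_access_counts folders hold address-related features. The sketch below shows one plausible way to build such features from raw rows. It produces a 32-entry per-bank count vector (2 ranks × 4 bank groups × 4 banks, from the column ranges above) and equal-width histograms of row addresses (ACT) and column addresses (RDA/WRA). Every choice here is an assumption rather than a description of utils/bank_access_count.py or utils/row_col_address_access.py: each row counts as one bank access (PRE/PREA rows included), 16 bins are used, and -1 is treated as "no address".

```python
# Field ranges from the raw data description: Rank in [0,1], Bank Group and Bank
# in [0,3], Address in [-1, 131071].
NUM_RANKS, NUM_BANK_GROUPS, NUM_BANKS = 2, 4, 4
ADDRESS_SPACE = 131072
ACT, RDA, WRA = 1, 3, 5

def bank_access_vector(rows):
    """Count rows per bank, flattening (rank, bank group, bank) into an index in [0, 32)."""
    counts = [0] * (NUM_RANKS * NUM_BANK_GROUPS * NUM_BANKS)
    for _cmd, rank, bank_group, bank, _address in rows:
        counts[(rank * NUM_BANK_GROUPS + bank_group) * NUM_BANKS + bank] += 1
    return counts

def address_histograms(rows, num_bins=16):
    """Equal-width histograms of row (ACT) and column (RDA/WRA) addresses."""
    width = ADDRESS_SPACE // num_bins
    row_hist, col_hist = [0] * num_bins, [0] * num_bins
    for cmd, _rank, _bank_group, _bank, address in rows:
        if address < 0:
            continue  # assumed: -1 means the command carries no address
        b = min(address // width, num_bins - 1)
        if cmd == ACT:
            row_hist[b] += 1
        elif cmd in (RDA, WRA):
            col_hist[b] += 1
    return row_hist, col_hist

# The 10 P01 rows shown above, as (cmd, rank, bank_group, bank, address) tuples.
P01_ROWS = [
    (5, 1, 1, 0, 648), (5, 1, 0, 0, 648), (1, 0, 2, 1, 35778), (1, 1, 2, 1, 35778),
    (5, 0, 2, 1, 640), (1, 0, 1, 3, 32991), (5, 1, 2, 1, 640), (1, 1, 1, 3, 32991),
    (5, 0, 1, 3, 640), (5, 1, 1, 3, 640),
]
banks = bank_access_vector(P01_ROWS)
row_hist, col_hist = address_histograms(P01_ROWS)
print({i: c for i, c in enumerate(banks) if c})
print(row_hist, col_hist)
```

On the 10 P01 rows, all four ACT rows fall into row-address bin 4 (addresses 32,768 to 40,959), and all six WRA rows fall into column-address bin 0.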

How to train and test unknown workload detection for sample data

Run the following command to train and test unknown workload detection:

    python main.py --data './final_data_original' --batch_size 128 --learning_rate 0.0001 --alpha 2 --only_test false

If you already have trained models and only want to test them, run the following command:

    python main.py --data './final_data_original' --batch_size 128 --learning_rate 0.0001 --alpha 2 --only_test true
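The commands above pass --only_test as the string true or false. A common pitfall with Python's argparse is that `type=bool` converts any non-empty string, including "false", to True, so such flags need an explicit converter. The sketch below is a hypothetical parser that mirrors the flags in the commands above. It is not the repository's actual main.py. The defaults are copied from the example commands, and parsing --alpha as a float is an assumption.

```python
import argparse

def str2bool(value):
    """Convert 'true'/'false'-style strings to bool (argparse's type=bool would not)."""
    v = value.lower()
    if v in ("true", "t", "yes", "1"):
        return True
    if v in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")

def build_parser():
    """Hypothetical mirror of the command-line flags used in the examples above."""
    p = argparse.ArgumentParser(description="Hypothetical mirror of main.py's flags")
    p.add_argument("--data", default="./final_data_original")
    p.add_argument("--batch_size", type=int, default=128)
    p.add_argument("--learning_rate", type=float, default=0.0001)
    p.add_argument("--alpha", type=float, default=2.0)  # assumed numeric type
    p.add_argument("--only_test", type=str2bool, default=False)
    return p

args = build_parser().parse_args(["--only_test", "false", "--batch_size", "128"])
print(args.only_test, args.batch_size)  # False 128
```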

Citation

Please cite the following paper if you use our code.

@article{jang2023accurate,
  title={Accurate Open-set Recognition for Memory Workload},
  author={Jang, Jun-Gi and Shim, Sooyeon and Egay, Vladimir and Lee, Jeeyong and Park, Jongmin and Chae, Suhyun and Kang, U},
  journal={ACM Transactions on Knowledge Discovery from Data},
  volume={17},
  number={9},
  pages={1--14},
  year={2023},
}
