README.md: 64 changes (43 additions, 21 deletions)
@@ -1,15 +1,17 @@
# MLPerf™ Storage Benchmark Suite
MLPerf Storage is a benchmark suite to characterize the performance of storage systems that support machine learning workloads.

-- [Overview](#Overview)
-- [Installation](#Installation)
-- [Configuration](#Configuration)
-- [Workloads](#Workloads)
-  - [U-Net3D](#U-Net3D)
-  - [BERT](#BERT)
-  - [DLRM](#DLRM)
-- [Parameters](#Parameters)
-- [Releases](#Releases)
+- [Overview](#overview)
+- [Installation](#installation)
+- [Configuration](#configuration)
+- [Workloads](#workloads)
+  - [U-Net3D](#u-net3d)
+  - [BERT](#bert)
+  - [DLRM](#dlrm)
+- [Parameters](#parameters)
+  - [CLOSED](#closed)
+  - [OPEN](#open)
+- [Submission Rules](#submission-rules)
## Overview

This section describes how to use the MLPerf™ Storage Benchmark to measure the performance of a storage system supporting a compute cluster running AI/ML training tasks.
@@ -69,6 +71,7 @@ The working directory structure is as follows
```
|---storage
|---benchmark.sh
+|---report.py
|---dlio_benchmark
|---storage-conf
|---workload(folder contains configs of all workloads)
@@ -165,21 +168,21 @@ For running benchmark on `unet3d` workload with data located in `unet3d_data` directory
./benchmark.sh run --workload unet3d --num-accelerators 4 --results-dir unet3d_results --param dataset.data_folder=unet3d_data
```

-4. Reports are generated from the benchmark results
+4. The benchmark submission report is generated by aggregating the individual run results.

```bash
./benchmark.sh reportgen -h

Usage: ./benchmark.sh reportgen [options]
-Generate a report from the benchmark results. Supports single host and multi host run.
+Generate a report from the benchmark results.


Options:
-h, --help Print this message
-r, --results-dir Location to the results directory
```

-For multi-host run, the results need to be in the following structure.
+The results directory must have the following structure and must include 5 runs.

```
sample-results
@@ -200,7 +203,7 @@ sample-results
|---host-n
|---summary.json
.....
-|---run-n
+|---run-5
|---host-1
|---summary.json
|---host-2
@@ -210,20 +213,21 @@ sample-results
|---summary.json
```

-To generate multi host report,
+To generate the benchmark report,

```bash
./benchmark.sh reportgen --results-dir sample-results/
```

+For reference, a sample result directory structure can be found [here](https://github.com/johnugeorge/mlperf-storage-sample-results).
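
Before generating the report, it can help to confirm that every run directory contains a `summary.json` for each host. A minimal sketch, assuming the `sample-results` layout shown above:

```bash
# Illustrative sanity check: count per-host summary.json files in each run.
for run in sample-results/run-*/; do
  echo "$run: $(find "$run" -name summary.json | wc -l) host summaries"
done
```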

## Workloads
Currently, the storage benchmark suite supports benchmarking of 3 deep learning workloads
- Image segmentation using U-Net3D model ([unet3d](./storage-conf/workloads/unet3d.yaml))
- Natural language processing using BERT model ([bert](./storage-conf/workloads/bert.yaml))
- Recommendation using DLRM model (TODO)

-### U-Net3D Workload
+### U-Net3D

Calculate minimum dataset size required for the benchmark run

@@ -243,14 +247,14 @@ Run the benchmark.
./benchmark.sh run --workload unet3d --num-accelerators 8 --param dataset.num_files_train=3200
```

-All results will be stored in ```results/unet3d/$DATE-$TIME``` folder or in the directory when overriden using `--results-dir`(or `-r`) argument. To generate the final report, one can do
+All results will be stored in the ```results/unet3d/$DATE-$TIME``` folder, or in the directory specified with the `--results-dir` (or `-r`) argument. To generate the final report, one can do

```bash
./benchmark.sh reportgen --results-dir results/unet3d/$DATE-$TIME
```
This will generate ```mlperf_storage_report.json``` in the output folder.
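
The report is plain JSON, so any JSON tool can inspect it; for example, assuming `jq` is installed:

```bash
# Pretty-print the generated report (illustrative; requires jq).
jq . results/unet3d/$DATE-$TIME/mlperf_storage_report.json
```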

-### BERT Workload
+### BERT

Calculate minimum dataset size required for the benchmark run

@@ -269,21 +273,22 @@ Run the benchmark
./benchmark.sh run --workload bert --num-accelerators 8 --param dataset.num_files_train=350
```

-All results will be stored in ```results/bert/$DATE-$TIME``` folder or in the directory when overriden using `--results-dir`(or `-r`) argument. To generate the final report, one can do
+All results will be stored in the ```results/bert/$DATE-$TIME``` folder, or in the directory specified with the `--results-dir` (or `-r`) argument. To generate the final report, one can do

```bash
./benchmark.sh reportgen -r results/bert/$DATE-$TIME
```
This will generate ```mlperf_storage_report.json``` in the output folder.


-### DLRM Workload
+### DLRM

To be added

## Parameters

-Below table displays the list of configurable paramters for the benchmark.
+### CLOSED
+The table below lists the configurable parameters for the benchmark in the CLOSED category.

| Parameter | Description |Default|
| ------------------------------ | ------------------------------------------------------------ |-------|
@@ -293,10 +298,27 @@ Below table displays the list of configurable paramters for the benchmark.
| dataset.data_folder | The path where dataset is stored | --|
| **Reader params** | | |
| reader.read_threads | Number of threads to load the data | --|
-| reader.computation_threads | Number of threads to preprocess the data(only for bert) | --|
+| reader.computation_threads | Number of threads to preprocess the data (only for BERT) | --|
| **Checkpoint params** | | |
| checkpoint.checkpoint_folder | The folder to save the checkpoints | --|
| **Storage params** | | |
| storage.storage_root | The storage root directory | ./|
| storage.storage_type | The storage type |local_fs|
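
For example, several CLOSED-category parameters can be overridden in a single run; this sketch assumes the script accepts repeated `--param` flags, and the paths are illustrative:

```bash
./benchmark.sh run --workload unet3d --num-accelerators 8 \
  --param dataset.data_folder=/mnt/training-data \
  --param reader.read_threads=8 \
  --param checkpoint.checkpoint_folder=/mnt/checkpoints
```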


+### OPEN
+In addition to what can be changed in the CLOSED category, the following parameters can be changed in the OPEN category.

+| Parameter | Description |Default|
+| ------------------------------ | ------------------------------------------------------------ |-------|
+| framework | The machine learning framework |PyTorch for 3D U-Net, TensorFlow for BERT |
+| **Dataset params** | | |
+| dataset.format | Format of the dataset | .npz for 3D U-Net, tfrecord for BERT|
+| dataset.num_samples_per_file | Number of samples per file (only for TensorFlow with tfrecord datasets) | 1 for 3D U-Net, 313532 for BERT|
+| **Reader params** | | |
+| reader.data_loader | Data loader type (TensorFlow, PyTorch, or custom) | PyTorch for 3D U-Net, TensorFlow for BERT|
+| reader.transfer_size | Number of bytes in the read buffer (only for TensorFlow) | 262144 for BERT|
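
An OPEN-category run might swap the data loader or read buffer size. The `--category` flag below is an assumption (the `CATEGORIES` array in benchmark.sh suggests such an option exists), and the values are illustrative:

```bash
# Hypothetical invocation: the --category flag and values are assumptions.
./benchmark.sh run --workload unet3d --category open \
  --param reader.data_loader=tensorflow \
  --param reader.transfer_size=262144
```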

+## Submission Rules

+MLPerf™ Storage Benchmark submission rules are described in this [doc](https://docs.google.com/document/d/1QOaCLiWb82H9cwdVX5KyeDZWt0781y4SgMQPhoij-b4/edit). If you have questions, please contact [Storage WG chairs](https://mlcommons.org/en/groups/research-storage/).
benchmark.sh: 10 changes (8 additions, 2 deletions)
@@ -10,7 +10,7 @@ WORKLOADS=("unet3d" "bert")
UNET3D_CONFIG_FILE=${CONFIG_PATH}/workload/unet3d.yaml
BERT_CONFIG_FILE=${CONFIG_PATH}/workload/bert.yaml
# Currently only "closed" category is supported
CATEGORIES=("closed")
CATEGORIES=("closed" "open")
DEFAULT_CATEGORY="closed"
CLOSED_CATEGORY_PARAMS=(
# dataset params
@@ -25,6 +25,12 @@ CLOSED_CATEGORY_PARAMS=(
OPEN_CATEGORY_PARAMS=(
# all closed params
"${CLOSED_CATEGORY_PARAMS[@]}"
+# framework params
+"framework"
+# dataset params
+"dataset.format" "dataset.num_samples_per_file"
+# reader params
+"reader.data_loader" "reader.transfer_size"
)
HYDRA_OUTPUT_CONFIG_DIR="configs"
EXTRA_PARAMS=(
@@ -272,7 +278,7 @@ configview() {

postprocess() {
local results_dir=$1
-python3 report.py --result-dir $results_dir --multi-host --create-report
+python3 report.py --result-dir $results_dir
}

main() {
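
For context, the `CLOSED_CATEGORY_PARAMS` and `OPEN_CATEGORY_PARAMS` arrays above act as allow-lists for `--param` keys. A minimal sketch of such a check, not the script's actual implementation:

```bash
# Sketch only: validate a --param key against the chosen category's allow-list.
validate_param() {
  local key=$1 category=$2
  local allowed=("${CLOSED_CATEGORY_PARAMS[@]}")
  [ "$category" = "open" ] && allowed=("${OPEN_CATEGORY_PARAMS[@]}")
  for p in "${allowed[@]}"; do
    [ "$p" = "$key" ] && return 0
  done
  echo "Parameter '$key' is not allowed in the '$category' category" >&2
  return 1
}
```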