Release - SuperBench v0.4.0 (#278)
__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
abuccts and yukirora committed Dec 30, 2021
1 parent 682ed06 commit ff563b6
Showing 68 changed files with 2,012 additions and 1,621 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -7,15 +7,15 @@
[![Docker Pulls](https://img.shields.io/docker/pulls/superbench/superbench.svg)](https://hub.docker.com/r/superbench/superbench/tags)
[![License](https://img.shields.io/github/license/microsoft/superbenchmark.svg)](LICENSE)

| Azure Pipelines | Build Status |
| :---: | :---: |
| cpu-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cpu-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=77&branchName=main) |
| cuda-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cuda-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=80&branchName=main) |
| Azure Pipelines | Build Status |
|--------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| cpu-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cpu-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=77&branchName=main) |
| cuda-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cuda-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=80&branchName=main) |
| ansible-integration-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/ansible-integration-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=82&branchName=main) |

__SuperBench__ is a validation and profiling tool for AI infrastructure.

📢 [v0.3.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.3.0) has been released!
📢 [v0.4.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.4.0) has been released!

## _Check [aka.ms/superbench](https://aka.ms/superbench) for more details._

18 changes: 9 additions & 9 deletions dockerfile/rocm4.0-pytorch1.7.0.dockerfile
@@ -63,26 +63,26 @@ RUN mkdir -p /root/.ssh && \
echo -e "* soft nofile 1048576\n* hard nofile 1048576" >> /etc/security/limits.conf && \
echo -e "root soft nofile 1048576\nroot hard nofile 1048576" >> /etc/security/limits.conf

# Install OFED
ENV OFED_VERSION=5.2-2.2.3.0
RUN cd /tmp && \
wget -q http://content.mellanox.com/ofed/MLNX_OFED-${OFED_VERSION}/MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tgz && \
tar xzf MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tgz && \
PATH=/usr/bin:${PATH} MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64/mlnxofedinstall --user-space-only --without-fw-update --force --all && \
rm -rf MLNX_OFED_LINUX-${OFED_VERSION}*

# Install OpenMPI
ENV OPENMPI_VERSION=4.0.5
RUN cd /tmp && \
wget -q https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz && \
tar xzf openmpi-${OPENMPI_VERSION}.tar.gz && \
cd openmpi-${OPENMPI_VERSION} && \
./configure --enable-orterun-prefix-by-default && \
./configure --enable-orterun-prefix-by-default --with-ucx=/usr --enable-mca-no-build=btl-uct && \
make -j $(nproc) all && \
make install && \
ldconfig && \
rm -rf /tmp/openmpi-${OPENMPI_VERSION}*

# Install OFED
ENV OFED_VERSION=5.2-2.2.3.0
RUN cd /tmp && \
wget -q http://content.mellanox.com/ofed/MLNX_OFED-${OFED_VERSION}/MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tgz && \
tar xzf MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tgz && \
PATH=/usr/bin:${PATH} MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64/mlnxofedinstall --user-space-only --without-fw-update --force --all && \
rm -rf MLNX_OFED_LINUX-${OFED_VERSION}*

# Install HPC-X
RUN cd /opt && \
wget -q https://azhpcstor.blob.core.windows.net/azhpc-images-store/hpcx-v2.8.3-gcc-MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tbz && \
2 changes: 1 addition & 1 deletion dockerfile/rocm4.2-pytorch1.7.0.dockerfile
@@ -69,7 +69,7 @@ RUN cd /tmp && \
wget -q https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz && \
tar xzf openmpi-${OPENMPI_VERSION}.tar.gz && \
cd openmpi-${OPENMPI_VERSION} && \
./configure --enable-orterun-prefix-by-default && \
./configure --enable-orterun-prefix-by-default --with-ucx=/opt/ucx --enable-mca-no-build=btl-uct && \
make -j $(nproc) all && \
make install && \
ldconfig && \
2 changes: 1 addition & 1 deletion docs/getting-started/installation.mdx
@@ -61,7 +61,7 @@ You can clone the source from GitHub and build it.
:::note Note
You should checkout corresponding tag to use release version, for example,

`git clone -b v0.3.0 https://github.com/microsoft/superbenchmark`
`git clone -b v0.4.0 https://github.com/microsoft/superbenchmark`
:::

```bash
2 changes: 1 addition & 1 deletion docs/getting-started/run-superbench.md
@@ -27,7 +27,7 @@ sb deploy -f remote.ini --host-password [password]
:::note Note
You should deploy corresponding Docker image to use release version, for example,

`sb deploy -f local.ini -i superbench/superbench:v0.3.0-cuda11.1.1`
`sb deploy -f local.ini -i superbench/superbench:v0.4.0-cuda11.1.1`

You should note that version of git repo only determines version of sb CLI, and not the sb container. You should define the container version even if you specified a release version for the git clone.

2 changes: 1 addition & 1 deletion docs/superbench-config.mdx
@@ -70,7 +70,7 @@ superbench:
<TabItem value='example'>

```yaml
version: v0.3
version: v0.4
superbench:
enable: benchmark_1
monitor:
33 changes: 31 additions & 2 deletions docs/user-tutorial/benchmarks/micro-benchmarks.md
@@ -60,11 +60,40 @@ Large scale matmul operation using `torch.matmul` with one GPU.

### `cublas-function`

TODO
#### Introduction

Measure the performance of most common Nvidia cuBLAS functions with parameters in models training including ResNet, VGG, DenseNet, LSTM, BERT, and GPT-2.

The supported functions for cuBLAS are as follows:
- cublasSgemm
- cublasSgemmStridedBatched
- cublasGemmStridedBatchedEx
- cublasGemmEx
- cublasCgemm3mStridedBatched
- cublasCgemm

#### Metrics

| Name | Unit | Description |
|----------------------------------------------------------|-----------|-------------------------------------------------------------------|
| cublas-function/name_${function_name}_${parameters}_time | time (us) | The mean time to execute the cublas function with the parameters. |

### `cudnn-function`

TODO
#### Introduction

Measure the performance of most common Nvidia cuDNN functions with parameters in models training including ResNet, VGG, DenseNet, LSTM, BERT, and GPT-2.

The supported functions for cuDNN are as follows:
- cudnnConvolutionBackwardFilter
- cudnnConvolutionBackwardData
- cudnnConvolutionForward

#### Metrics

| Name | Unit | Description |
|---------------------------------------------------------|-----------|------------------------------------------------------------------|
| cudnn-function/name_${function_name}_${parameters}_time | time (us) | The mean time to execute the cudnn function with the parameters. |

### `tensorrt-inference`

3 changes: 3 additions & 0 deletions docs/user-tutorial/container-images.mdx
@@ -29,6 +29,7 @@ available tags are listed below for all stable versions.

| Tag | Description |
| ----------------- | ---------------------------------- |
| v0.4.0-cuda11.1.1 | SuperBench v0.4.0 with CUDA 11.1.1 |
| v0.3.0-cuda11.1.1 | SuperBench v0.3.0 with CUDA 11.1.1 |
| v0.2.1-cuda11.1.1 | SuperBench v0.2.1 with CUDA 11.1.1 |
| v0.2.0-cuda11.1.1 | SuperBench v0.2.0 with CUDA 11.1.1 |
@@ -38,6 +39,8 @@ available tags are listed below for all stable versions.

| Tag | Description |
| --------------------------- | ---------------------------------------------- |
| v0.4.0-rocm4.2-pytorch1.7.0 | SuperBench v0.4.0 with ROCm 4.2, PyTorch 1.7.0 |
| v0.4.0-rocm4.0-pytorch1.7.0 | SuperBench v0.4.0 with ROCm 4.0, PyTorch 1.7.0 |
| v0.3.0-rocm4.2-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.2, PyTorch 1.7.0 |
| v0.3.0-rocm4.0-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.0, PyTorch 1.7.0 |

2 changes: 1 addition & 1 deletion docs/user-tutorial/data-diagnosis.md
@@ -64,7 +64,7 @@ superbench:
example:
```yaml
# SuperBench rules
version: v0.3
version: v0.4
superbench:
rules:
failure-rule:
2 changes: 1 addition & 1 deletion setup.py
@@ -165,7 +165,7 @@ def run(self):
'pytest>=6.2.2',
'types-pyyaml',
'vcrpy>=4.1.1',
'yapf>=0.30.0',
'yapf==0.31.0',
],
'nvidia': ['py3nvml>=0.2.6'],
'ort': [
2 changes: 1 addition & 1 deletion superbench/__init__.py
@@ -6,5 +6,5 @@
Provide hardware and software benchmarks for AI systems.
"""

__version__ = '0.3.0'
__version__ = '0.4.0'
__author__ = 'Microsoft'
31 changes: 19 additions & 12 deletions superbench/analyzer/data_diagnosis.py
@@ -5,12 +5,13 @@

import re
from typing import Callable
from pathlib import Path

import pandas as pd

from superbench.common.utils import logger
from superbench.analyzer.diagnosis_rule_op import RuleOp, DiagnosisRuleType
import superbench.analyzer.file_handler as file_handler
from superbench.analyzer import file_handler


class DataDiagnosis():
@@ -31,10 +32,15 @@ def _get_metrics_by_benchmarks(self, metrics_list):
"""
benchmarks_metrics = {}
for metric in metrics_list:
benchmark = metric.split('/')[0]
if benchmark not in benchmarks_metrics:
benchmarks_metrics[benchmark] = set()
benchmarks_metrics[benchmark].add(metric)
if '/' not in metric:
logger.warning(
'DataDiagnosis: get_metrics_by_benchmarks - {} does not have benchmark_name'.format(metric)
)
else:
benchmark = metric.split('/')[0]
if benchmark not in benchmarks_metrics:
benchmarks_metrics[benchmark] = set()
benchmarks_metrics[benchmark].add(metric)
return benchmarks_metrics

def _check_rules(self, rule, name):
@@ -133,6 +139,7 @@ def _get_criteria(self, rule_file, baseline_file):
if re.search(metric_regex, metric):
self._sb_rules[rule]['metrics'][metric] = self._get_baseline_of_metric(baseline, metric)
self._enable_metrics.append(metric)
self._enable_metrics.sort()
except Exception as e:
logger.error('DataDiagnosis: get criteria failed - {}'.format(str(e)))
return False
@@ -171,8 +178,8 @@ def _run_diagnosis_rules_for_single_node(self, node):
issue_label = True
if issue_label:
# Add category information
general_cat_str = ','.join(categories)
details_cat_str = ','.join(details)
general_cat_str = ','.join(sorted(list(categories)))
details_cat_str = ','.join(sorted((details)))
details_row = [general_cat_str, details_cat_str]
return details_row, summary_data_row

@@ -236,15 +243,15 @@ def run(self, raw_data_file, rule_file, baseline_file, output_dir, output_format
try:
self._raw_data_df = file_handler.read_raw_data(raw_data_file)
self._metrics = self._get_metrics_by_benchmarks(list(self._raw_data_df.columns))
logger.info('DataDiagnosis: Begin to processe {} nodes'.format(len(self._raw_data_df)))
logger.info('DataDiagnosis: Begin to process {} nodes'.format(len(self._raw_data_df)))
data_not_accept_df, label_df = self.run_diagnosis_rules(rule_file, baseline_file)
logger.info('DataDiagnosis: Processed finished')
outpout_path = ''
output_path = ''
if output_format == 'excel':
output_path = output_dir + '/diagnosis_summary.xlsx'
file_handler.output_excel(self._raw_data_df, data_not_accept_df, outpout_path, self._sb_rules)
output_path = str(Path(output_dir) / 'diagnosis_summary.xlsx')
file_handler.output_excel(self._raw_data_df, data_not_accept_df, output_path, self._sb_rules)
elif output_format == 'json':
output_path = output_dir + '/diagnosis_summary.jsonl'
output_path = str(Path(output_dir) / 'diagnosis_summary.jsonl')
file_handler.output_json_data_not_accept(data_not_accept_df, output_path)
else:
logger.error('DataDiagnosis: output failed - unsupported output format')
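The `data_diagnosis.py` hunks change three behaviors: metric columns without a `/` are now skipped with a warning instead of becoming a bogus benchmark key, category strings are built from sorted labels so the summary is deterministic, and output paths are joined with `pathlib` (which also retires the misspelled, always-empty `outpout_path` that the old code passed to `output_excel`). The sketch below re-creates those three behaviors in isolation and is not the SuperBench implementation: the function names are hypothetical, the standard `logging` module stands in for SuperBench's `logger`, and `summary_path` raises on an unknown format where the real `run` logs an error.

```python
import logging
from pathlib import Path
from typing import Dict, Iterable, Set

logger = logging.getLogger(__name__)


def group_metrics_by_benchmark(metrics: Iterable[str]) -> Dict[str, Set[str]]:
    """Group metric names by the benchmark name before the first '/'."""
    grouped: Dict[str, Set[str]] = {}
    for metric in metrics:
        if '/' not in metric:
            # Columns without a benchmark prefix are skipped, not grouped.
            logger.warning('%s does not have benchmark_name', metric)
            continue
        benchmark = metric.split('/')[0]
        grouped.setdefault(benchmark, set()).add(metric)
    return grouped


def join_categories(categories: Iterable[str]) -> str:
    """Join labels in sorted order so the result does not depend on set order."""
    return ','.join(sorted(categories))


def summary_path(output_dir: str, output_format: str) -> str:
    """Build the summary file path with pathlib instead of string concatenation."""
    file_names = {'excel': 'diagnosis_summary.xlsx', 'json': 'diagnosis_summary.jsonl'}
    if output_format not in file_names:
        # Hypothetical choice for this sketch: raise instead of logging.
        raise ValueError('unsupported output format: {}'.format(output_format))
    return str(Path(output_dir) / file_names[output_format])


if __name__ == '__main__':
    # Hypothetical column names, including one without a benchmark prefix.
    columns = ['kernel-launch/event_overhead:0', 'mem-bw/H2D_Mem_BW:0', 'index']
    print(group_metrics_by_benchmark(columns))
    print(join_categories({'Mem', 'KernelLaunch'}))  # KernelLaunch,Mem
    print(summary_path('outputs', 'excel'))
```

Sorting before joining is what makes two runs over the same data produce byte-identical category columns, which matters when diagnosis summaries are diffed across runs.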
25 changes: 17 additions & 8 deletions superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py
@@ -129,10 +129,11 @@ def export_torchvision_model(self, model_name, batch_size=1):
if not self.check_torchvision_model(model_name):
return ''
file_name = str(self._onnx_model_path / (model_name + '.onnx'))
input_shape = (batch_size, 3, 224, 224)
model = getattr(torchvision.models, model_name)(pretrained=False).eval().cuda()
dummy_input = torch.randn((batch_size, 3, 224, 224), device='cuda')
torch.onnx.export(
getattr(torchvision.models, model_name)(pretrained=False).eval().cuda(),
torch.randn(input_shape, device='cuda'),
model,
dummy_input,
file_name,
opset_version=10,
operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
@@ -147,6 +148,10 @@ def export_torchvision_model(self, model_name, batch_size=1):
}
},
)

del model
del dummy_input
torch.cuda.empty_cache()
return file_name

def export_benchmark_model(self, model_name, batch_size=1, seq_length=512):
@@ -163,13 +168,13 @@ def export_benchmark_model(self, model_name, batch_size=1, seq_length=512):
if not self.check_benchmark_model(model_name):
return
file_name = str(self._onnx_model_path / (model_name + '.onnx'))
input_shape, dtype = (batch_size, seq_length), torch.int64
model = self.benchmark_models[model_name]().eval().cuda()
dummy_input = torch.ones((batch_size, seq_length), dtype=torch.int64, device='cuda')
if model_name == 'lstm':
input_shape += (self.lstm_input_size, )
dtype = None
dummy_input = torch.ones((batch_size, seq_length, self.lstm_input_size), device='cuda')
torch.onnx.export(
self.benchmark_models[model_name]().eval().cuda(),
torch.ones(input_shape, dtype=dtype, device='cuda'),
model,
dummy_input,
file_name,
opset_version=10,
do_constant_folding=True,
@@ -185,4 +190,8 @@ def export_benchmark_model(self, model_name, batch_size=1, seq_length=512):
}
},
)

del model
del dummy_input
torch.cuda.empty_cache()
return file_name
4 changes: 2 additions & 2 deletions superbench/benchmarks/micro_benchmarks/cublas_function.py
@@ -291,8 +291,8 @@ def _process_raw_result(self, cmd_idx, raw_output):
raw_data = raw_data.split(',')
raw_data.pop()
raw_data = [float(item) for item in raw_data]
self._result.add_result(metric, statistics.mean(raw_data))
self._result.add_raw_data(metric, raw_data)
self._result.add_result(metric.lower() + '_time', statistics.mean(raw_data))
self._result.add_raw_data(metric.lower() + '_time', raw_data)
if 'Error' in line:
error = True
except BaseException as e:
5 changes: 3 additions & 2 deletions superbench/benchmarks/micro_benchmarks/cudnn_function.py
@@ -6,6 +6,7 @@
import os
import json
import yaml
import statistics

from superbench.common.utils import logger
from superbench.benchmarks import Platform, BenchmarkRegistry, ReturnCode
@@ -424,8 +425,8 @@ def _process_raw_result(self, cmd_idx, raw_output):
raw_data = raw_data.split(',')
raw_data.pop()
raw_data = [float(item) for item in raw_data]
self._result.add_result(metric, sum(raw_data) / len(raw_data))
self._result.add_raw_data(metric, raw_data)
self._result.add_result(metric.lower() + '_time', statistics.mean(raw_data) * 1000)
self._result.add_raw_data(metric.lower() + '_time', raw_data)
if 'Error' in line:
error = True
except BaseException as e:
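Both `_process_raw_result` hunks follow the same pattern. The raw line is a comma-separated list of per-run times ending in a trailing comma, so the empty last item is popped before conversion to `float`. The metric key is then lowercased and suffixed with `_time`, and the mean is stored as the result. The cuDNN hunk additionally multiplies the mean by 1000. Because the `micro-benchmarks.md` metrics tables in this commit give the unit as `time (us)`, this sketch assumes cuDNN reports milliseconds and cuBLAS reports microseconds; the diff itself does not state the raw units. The function names and metric names are hypothetical.

```python
import statistics
from typing import Dict, List


def parse_raw_times(raw_data: str) -> List[float]:
    """Parse '1.0,2.0,3.0,' into floats, dropping the empty trailing item."""
    items = raw_data.split(',')
    items.pop()  # the line ends with ',', so the last item is ''
    return [float(item) for item in items]


def summarize(metric: str, raw_data: str, scale: float = 1.0) -> Dict[str, float]:
    """Return {'<lowercased metric>_time': mean * scale}."""
    values = parse_raw_times(raw_data)
    return {metric.lower() + '_time': statistics.mean(values) * scale}


if __name__ == '__main__':
    # cuBLAS: values assumed to already be in microseconds.
    print(summarize('name_cublasSgemm_params', '10.0,12.0,14.0,'))
    # cuDNN: values assumed to be in milliseconds, scaled to microseconds.
    print(summarize('name_cudnnConvolutionForward_params', '0.5,1.5,', scale=1000))
```

Using `statistics.mean` in both places, as the commit does, replaces the hand-written `sum(raw_data) / len(raw_data)` in the cuDNN code and keeps the two benchmarks consistent.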
@@ -249,7 +249,7 @@ def __prepare_general_ib_command_params(self):
msg_size = '-s ' + str(self._args.msg_size)
# Add GPUDirect for ib command
gpu_enable = ''
if self._args.gpu_index:
if self._args.gpu_index is not None:
gpu = GPU()
if gpu.vendor == 'nvidia':
gpu_enable = ' --use_cuda={gpu_index}'.format(gpu_index=str(self._args.gpu_index))
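The `gpu_index` hunk fixes a common Python pitfall: the integer `0` is falsy, so `if self._args.gpu_index:` treated GPU 0 the same as "no GPU requested" and silently dropped the GPUDirect flag. The minimal sketch below contrasts the two checks using hypothetical helper functions that only build the NVIDIA `--use_cuda` fragment shown in the hunk; vendor detection through `GPU()` is left out.

```python
from typing import Optional


def gpu_flag_old(gpu_index: Optional[int]) -> str:
    # Old check: gpu_index == 0 is falsy, so GPU 0 is treated as "not set".
    if gpu_index:
        return ' --use_cuda={}'.format(gpu_index)
    return ''


def gpu_flag_new(gpu_index: Optional[int]) -> str:
    # New check: only None means "not set", so GPU 0 is honored.
    if gpu_index is not None:
        return ' --use_cuda={}'.format(gpu_index)
    return ''


if __name__ == '__main__':
    for idx in (None, 0, 1):
        print(idx, repr(gpu_flag_old(idx)), repr(gpu_flag_new(idx)))
```

The same reasoning applies to any optional integer argument whose valid range includes 0, such as a rank or a device index.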
28 changes: 25 additions & 3 deletions superbench/config/amd_mi100_hpe.yaml
@@ -3,7 +3,7 @@
# Server:
# - Product: HPE Apollo 6500

version: v0.3
version: v0.4
superbench:
enable: null
var:
@@ -99,9 +99,31 @@ superbench:
copy_type:
- sm
- dma
ort-inference:
<<: *default_local_mode
ib-traffic:
enable: false
modes:
- name: mpi
proc_num: 1
mca:
btl: tcp,self
pml: ob1
btl_tcp_if_include: ens17f0
gpcnet-network-test:
enable: false
modes:
- name: mpi
proc_num: 1
mca:
pml: ucx
btl: ^uct
btl_tcp_if_include: ens17f0
tcp-connectivity:
enable: false
modes:
- name: local
parallel: no
parameters:
port: 22
ort-models:
enable: false
modes:
2 changes: 1 addition & 1 deletion superbench/config/amd_mi100_z53.yaml
@@ -4,7 +4,7 @@
# - Product: G482-Z53
# - Link: https://www.gigabyte.cn/FileUpload/Global/MicroSite/553/G482-Z53.html

version: v0.3
version: v0.4
superbench:
enable: null
var: