Release - SuperBench v0.4.0 (#278)

__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
microsoft · Dec 30, 2021 · ff563b6
1 parent 682ed06
commit ff563b6

Showing 68 changed files with 2,012 additions and 1,621 deletions.
diff --git a/README.md b/README.md
@@ -7,15 +7,15 @@
 [![Docker Pulls](https://img.shields.io/docker/pulls/superbench/superbench.svg)](https://hub.docker.com/r/superbench/superbench/tags)
 [![License](https://img.shields.io/github/license/microsoft/superbenchmark.svg)](LICENSE)
 
-| Azure Pipelines | Build Status |
-| :---: | :---: |
-| cpu-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cpu-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=77&branchName=main) |
-| cuda-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cuda-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=80&branchName=main) |
+| Azure Pipelines          | Build Status                                                                                                                                                                                                            |
+|--------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| cpu-unit-test            | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cpu-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=77&branchName=main)            |
+| cuda-unit-test           | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cuda-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=80&branchName=main)           |
 | ansible-integration-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/ansible-integration-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=82&branchName=main) |
 
 __SuperBench__ is a validation and profiling tool for AI infrastructure.
 
-📢 [v0.3.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.3.0) has been released!
+📢 [v0.4.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.4.0) has been released!
 
 ## _Check [aka.ms/superbench](https://aka.ms/superbench) for more details._
 

diff --git a/dockerfile/rocm4.0-pytorch1.7.0.dockerfile b/dockerfile/rocm4.0-pytorch1.7.0.dockerfile
@@ -63,26 +63,26 @@ RUN mkdir -p /root/.ssh && \
     echo -e "* soft nofile 1048576\n* hard nofile 1048576" >> /etc/security/limits.conf && \
     echo -e "root soft nofile 1048576\nroot hard nofile 1048576" >> /etc/security/limits.conf
 
+# Install OFED
+ENV OFED_VERSION=5.2-2.2.3.0
+RUN cd /tmp && \
+    wget -q http://content.mellanox.com/ofed/MLNX_OFED-${OFED_VERSION}/MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tgz && \
+    tar xzf MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tgz && \
+    PATH=/usr/bin:${PATH} MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64/mlnxofedinstall --user-space-only --without-fw-update --force --all && \
+    rm -rf MLNX_OFED_LINUX-${OFED_VERSION}*
+
 # Install OpenMPI
 ENV OPENMPI_VERSION=4.0.5
 RUN cd /tmp && \
     wget -q https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz && \
     tar xzf openmpi-${OPENMPI_VERSION}.tar.gz && \
     cd openmpi-${OPENMPI_VERSION} && \
-    ./configure --enable-orterun-prefix-by-default && \
+    ./configure --enable-orterun-prefix-by-default --with-ucx=/usr --enable-mca-no-build=btl-uct && \
     make -j $(nproc) all && \
     make install && \
     ldconfig && \
     rm -rf /tmp/openmpi-${OPENMPI_VERSION}*
 
-# Install OFED
-ENV OFED_VERSION=5.2-2.2.3.0
-RUN cd /tmp && \
-    wget -q http://content.mellanox.com/ofed/MLNX_OFED-${OFED_VERSION}/MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tgz && \
-    tar xzf MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tgz && \
-    PATH=/usr/bin:${PATH} MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64/mlnxofedinstall --user-space-only --without-fw-update --force --all && \
-    rm -rf MLNX_OFED_LINUX-${OFED_VERSION}*
-
 # Install HPC-X
 RUN cd /opt && \
     wget -q https://azhpcstor.blob.core.windows.net/azhpc-images-store/hpcx-v2.8.3-gcc-MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tbz && \

diff --git a/dockerfile/rocm4.2-pytorch1.7.0.dockerfile b/dockerfile/rocm4.2-pytorch1.7.0.dockerfile
@@ -69,7 +69,7 @@ RUN cd /tmp && \
     wget -q https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz && \
     tar xzf openmpi-${OPENMPI_VERSION}.tar.gz && \
     cd openmpi-${OPENMPI_VERSION} && \
-    ./configure --enable-orterun-prefix-by-default && \
+    ./configure --enable-orterun-prefix-by-default --with-ucx=/opt/ucx --enable-mca-no-build=btl-uct && \
     make -j $(nproc) all && \
     make install && \
     ldconfig && \

diff --git a/docs/getting-started/installation.mdx b/docs/getting-started/installation.mdx
@@ -61,7 +61,7 @@ You can clone the source from GitHub and build it.
 :::note Note
 You should checkout corresponding tag to use release version, for example,
 
-`git clone -b v0.3.0 https://github.com/microsoft/superbenchmark`
+`git clone -b v0.4.0 https://github.com/microsoft/superbenchmark`
 :::
 
 ```bash

diff --git a/docs/getting-started/run-superbench.md b/docs/getting-started/run-superbench.md
@@ -27,7 +27,7 @@ sb deploy -f remote.ini --host-password [password]
 :::note Note
 You should deploy corresponding Docker image to use release version, for example,
 
-`sb deploy -f local.ini -i superbench/superbench:v0.3.0-cuda11.1.1`
+`sb deploy -f local.ini -i superbench/superbench:v0.4.0-cuda11.1.1`
 
 You should note that version of git repo only determines version of sb CLI, and not the sb container. You should define the container version even if you specified a release version for the git clone.
 

diff --git a/docs/superbench-config.mdx b/docs/superbench-config.mdx
@@ -70,7 +70,7 @@ superbench:
 <TabItem value='example'>
 
 ```yaml
-version: v0.3
+version: v0.4
 superbench:
   enable: benchmark_1
   monitor:

diff --git a/docs/user-tutorial/benchmarks/micro-benchmarks.md b/docs/user-tutorial/benchmarks/micro-benchmarks.md
@@ -60,11 +60,40 @@ Large scale matmul operation using `torch.matmul` with one GPU.
 
 ### `cublas-function`
 
-TODO
+#### Introduction
+
+Measure the performance of most common Nvidia cuBLAS functions with parameters in models training including ResNet, VGG, DenseNet, LSTM, BERT, and GPT-2.
+
+The supported functions for cuBLAS are as follows:
+ - cublasSgemm
+ - cublasSgemmStridedBatched
+ - cublasGemmStridedBatchedEx
+ - cublasGemmEx
+ - cublasCgemm3mStridedBatched
+ - cublasCgemm
+
+#### Metrics
+
+| Name                                                     | Unit      | Description                                                       |
+|----------------------------------------------------------|-----------|-------------------------------------------------------------------|
+| cublas-function/name_${function_name}_${parameters}_time | time (us) | The mean time to execute the cublas function with the parameters. |
 
 ### `cudnn-function`
 
-TODO
+#### Introduction
+
+Measure the performance of most common Nvidia cuDNN functions with parameters in models training including ResNet, VGG, DenseNet, LSTM, BERT, and GPT-2.
+
+The supported functions for cuDNN are as follows:
+ - cudnnConvolutionBackwardFilter
+ - cudnnConvolutionBackwardData
+ - cudnnConvolutionForward
+
+#### Metrics
+
+| Name                                                    | Unit      | Description                                                      |
+|---------------------------------------------------------|-----------|------------------------------------------------------------------|
+| cudnn-function/name_${function_name}_${parameters}_time | time (us) | The mean time to execute the cudnn function with the parameters. |
 
 ### `tensorrt-inference`
 

diff --git a/docs/user-tutorial/container-images.mdx b/docs/user-tutorial/container-images.mdx
@@ -29,6 +29,7 @@ available tags are listed below for all stable versions.
 
 | Tag               | Description                        |
 | ----------------- | ---------------------------------- |
+| v0.4.0-cuda11.1.1 | SuperBench v0.4.0 with CUDA 11.1.1 |
 | v0.3.0-cuda11.1.1 | SuperBench v0.3.0 with CUDA 11.1.1 |
 | v0.2.1-cuda11.1.1 | SuperBench v0.2.1 with CUDA 11.1.1 |
 | v0.2.0-cuda11.1.1 | SuperBench v0.2.0 with CUDA 11.1.1 |
@@ -38,6 +39,8 @@ available tags are listed below for all stable versions.
 
 | Tag                         | Description                                    |
 | --------------------------- | ---------------------------------------------- |
+| v0.4.0-rocm4.2-pytorch1.7.0 | SuperBench v0.4.0 with ROCm 4.2, PyTorch 1.7.0 |
+| v0.4.0-rocm4.0-pytorch1.7.0 | SuperBench v0.4.0 with ROCm 4.0, PyTorch 1.7.0 |
 | v0.3.0-rocm4.2-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.2, PyTorch 1.7.0 |
 | v0.3.0-rocm4.0-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.0, PyTorch 1.7.0 |
 

diff --git a/docs/user-tutorial/data-diagnosis.md b/docs/user-tutorial/data-diagnosis.md
@@ -64,7 +64,7 @@ superbench:
 example:
 ```yaml
 # SuperBench rules
-version: v0.3
+version: v0.4
 superbench:
   rules:
     failure-rule:

diff --git a/setup.py b/setup.py
@@ -165,7 +165,7 @@ def run(self):
             'pytest>=6.2.2',
             'types-pyyaml',
             'vcrpy>=4.1.1',
-            'yapf>=0.30.0',
+            'yapf==0.31.0',
         ],
         'nvidia': ['py3nvml>=0.2.6'],
         'ort': [

diff --git a/superbench/__init__.py b/superbench/__init__.py
@@ -6,5 +6,5 @@
 Provide hardware and software benchmarks for AI systems.
 """
 
-__version__ = '0.3.0'
+__version__ = '0.4.0'
 __author__ = 'Microsoft'
diff --git a/superbench/analyzer/data_diagnosis.py b/superbench/analyzer/data_diagnosis.py
@@ -5,12 +5,13 @@
 
 import re
 from typing import Callable
+from pathlib import Path
 
 import pandas as pd
 
 from superbench.common.utils import logger
 from superbench.analyzer.diagnosis_rule_op import RuleOp, DiagnosisRuleType
-import superbench.analyzer.file_handler as file_handler
+from superbench.analyzer import file_handler
 
 
 class DataDiagnosis():
@@ -31,10 +32,15 @@ def _get_metrics_by_benchmarks(self, metrics_list):
         """
         benchmarks_metrics = {}
         for metric in metrics_list:
-            benchmark = metric.split('/')[0]
-            if benchmark not in benchmarks_metrics:
-                benchmarks_metrics[benchmark] = set()
-            benchmarks_metrics[benchmark].add(metric)
+            if '/' not in metric:
+                logger.warning(
+                    'DataDiagnosis: get_metrics_by_benchmarks - {} does not have benchmark_name'.format(metric)
+                )
+            else:
+                benchmark = metric.split('/')[0]
+                if benchmark not in benchmarks_metrics:
+                    benchmarks_metrics[benchmark] = set()
+                benchmarks_metrics[benchmark].add(metric)
         return benchmarks_metrics
 
     def _check_rules(self, rule, name):
@@ -133,6 +139,7 @@ def _get_criteria(self, rule_file, baseline_file):
                             if re.search(metric_regex, metric):
                                 self._sb_rules[rule]['metrics'][metric] = self._get_baseline_of_metric(baseline, metric)
                                 self._enable_metrics.append(metric)
+            self._enable_metrics.sort()
         except Exception as e:
             logger.error('DataDiagnosis: get criteria failed - {}'.format(str(e)))
             return False
@@ -171,8 +178,8 @@ def _run_diagnosis_rules_for_single_node(self, node):
                 issue_label = True
         if issue_label:
             # Add category information
-            general_cat_str = ','.join(categories)
-            details_cat_str = ','.join(details)
+            general_cat_str = ','.join(sorted(list(categories)))
+            details_cat_str = ','.join(sorted((details)))
             details_row = [general_cat_str, details_cat_str]
             return details_row, summary_data_row
 
@@ -236,15 +243,15 @@ def run(self, raw_data_file, rule_file, baseline_file, output_dir, output_format
         try:
             self._raw_data_df = file_handler.read_raw_data(raw_data_file)
             self._metrics = self._get_metrics_by_benchmarks(list(self._raw_data_df.columns))
-            logger.info('DataDiagnosis: Begin to processe {} nodes'.format(len(self._raw_data_df)))
+            logger.info('DataDiagnosis: Begin to process {} nodes'.format(len(self._raw_data_df)))
             data_not_accept_df, label_df = self.run_diagnosis_rules(rule_file, baseline_file)
             logger.info('DataDiagnosis: Processed finished')
-            outpout_path = ''
+            output_path = ''
             if output_format == 'excel':
-                output_path = output_dir + '/diagnosis_summary.xlsx'
-                file_handler.output_excel(self._raw_data_df, data_not_accept_df, outpout_path, self._sb_rules)
+                output_path = str(Path(output_dir) / 'diagnosis_summary.xlsx')
+                file_handler.output_excel(self._raw_data_df, data_not_accept_df, output_path, self._sb_rules)
             elif output_format == 'json':
-                output_path = output_dir + '/diagnosis_summary.jsonl'
+                output_path = str(Path(output_dir) / 'diagnosis_summary.jsonl')
                 file_handler.output_json_data_not_accept(data_not_accept_df, output_path)
             else:
                 logger.error('DataDiagnosis: output failed - unsupported output format')
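The data-diagnosis changes above come down to three small patterns: skip metric names that carry no `benchmark/` prefix, sort before joining so the output is deterministic, and build output paths with `pathlib` instead of string concatenation. The following is a minimal standalone sketch of those patterns, not the SuperBench implementation. The function names `group_metrics_by_benchmark`, `join_sorted`, and `summary_path` and the sample metric names are hypothetical.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def group_metrics_by_benchmark(metrics_list):
    """Group metric names by the text before the first '/' (hypothetical helper)."""
    benchmarks_metrics = {}
    for metric in metrics_list:
        if '/' not in metric:
            # No benchmark prefix: warn and skip instead of treating
            # the whole name as a benchmark.
            logger.warning('%s does not have benchmark_name', metric)
            continue
        benchmark = metric.split('/')[0]
        benchmarks_metrics.setdefault(benchmark, set()).add(metric)
    return benchmarks_metrics


def join_sorted(items):
    """Join a set in sorted order so repeated runs give the same string."""
    return ','.join(sorted(items))


def summary_path(output_dir, file_name):
    """Join directory and file name; a trailing '/' on output_dir is harmless."""
    return str(Path(output_dir) / file_name)


# Sample names below are made up for illustration.
grouped = group_metrics_by_benchmark(['bench-a/x', 'bench-a/y', 'bench-b/z', 'hostname'])
print(sorted(grouped))
print(join_sorted({'b', 'a', 'c'}))
print(summary_path('out/', 'diagnosis_summary.xlsx'))
```

Sorting matters here because `categories` is joined from a set-like collection, whose iteration order is not guaranteed to be stable across runs.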

diff --git a/superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py b/superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py
@@ -129,10 +129,11 @@ def export_torchvision_model(self, model_name, batch_size=1):
         if not self.check_torchvision_model(model_name):
             return ''
         file_name = str(self._onnx_model_path / (model_name + '.onnx'))
-        input_shape = (batch_size, 3, 224, 224)
+        model = getattr(torchvision.models, model_name)(pretrained=False).eval().cuda()
+        dummy_input = torch.randn((batch_size, 3, 224, 224), device='cuda')
         torch.onnx.export(
-            getattr(torchvision.models, model_name)(pretrained=False).eval().cuda(),
-            torch.randn(input_shape, device='cuda'),
+            model,
+            dummy_input,
             file_name,
             opset_version=10,
             operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
@@ -147,6 +148,10 @@ def export_torchvision_model(self, model_name, batch_size=1):
                 }
             },
         )
+
+        del model
+        del dummy_input
+        torch.cuda.empty_cache()
         return file_name
 
     def export_benchmark_model(self, model_name, batch_size=1, seq_length=512):
@@ -163,13 +168,13 @@ def export_benchmark_model(self, model_name, batch_size=1, seq_length=512):
         if not self.check_benchmark_model(model_name):
             return
         file_name = str(self._onnx_model_path / (model_name + '.onnx'))
-        input_shape, dtype = (batch_size, seq_length), torch.int64
+        model = self.benchmark_models[model_name]().eval().cuda()
+        dummy_input = torch.ones((batch_size, seq_length), dtype=torch.int64, device='cuda')
         if model_name == 'lstm':
-            input_shape += (self.lstm_input_size, )
-            dtype = None
+            dummy_input = torch.ones((batch_size, seq_length, self.lstm_input_size), device='cuda')
         torch.onnx.export(
-            self.benchmark_models[model_name]().eval().cuda(),
-            torch.ones(input_shape, dtype=dtype, device='cuda'),
+            model,
+            dummy_input,
             file_name,
             opset_version=10,
             do_constant_folding=True,
@@ -185,4 +190,8 @@ def export_benchmark_model(self, model_name, batch_size=1, seq_length=512):
                 }
             },
         )
+
+        del model
+        del dummy_input
+        torch.cuda.empty_cache()
         return file_name
diff --git a/superbench/benchmarks/micro_benchmarks/cublas_function.py b/superbench/benchmarks/micro_benchmarks/cublas_function.py
@@ -291,8 +291,8 @@ def _process_raw_result(self, cmd_idx, raw_output):
                     raw_data = raw_data.split(',')
                     raw_data.pop()
                     raw_data = [float(item) for item in raw_data]
-                    self._result.add_result(metric, statistics.mean(raw_data))
-                    self._result.add_raw_data(metric, raw_data)
+                    self._result.add_result(metric.lower() + '_time', statistics.mean(raw_data))
+                    self._result.add_raw_data(metric.lower() + '_time', raw_data)
                 if 'Error' in line:
                     error = True
         except BaseException as e:

diff --git a/superbench/benchmarks/micro_benchmarks/cudnn_function.py b/superbench/benchmarks/micro_benchmarks/cudnn_function.py
@@ -6,6 +6,7 @@
 import os
 import json
 import yaml
+import statistics
 
 from superbench.common.utils import logger
 from superbench.benchmarks import Platform, BenchmarkRegistry, ReturnCode
@@ -424,8 +425,8 @@ def _process_raw_result(self, cmd_idx, raw_output):
                     raw_data = raw_data.split(',')
                     raw_data.pop()
                     raw_data = [float(item) for item in raw_data]
-                    self._result.add_result(metric, sum(raw_data) / len(raw_data))
-                    self._result.add_raw_data(metric, raw_data)
+                    self._result.add_result(metric.lower() + '_time', statistics.mean(raw_data) * 1000)
+                    self._result.add_raw_data(metric.lower() + '_time', raw_data)
                 if 'Error' in line:
                     error = True
         except BaseException as e:
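The cuBLAS and cuDNN hunks above unify metric naming: the name is lowercased and suffixed with `_time`, and the reported value is the mean of the raw samples. A rough sketch of that post-processing follows. It assumes the raw line is comma-separated with a trailing comma (inferred from the `pop()` call). The helper name `summarize`, the sample metric names, and the `scale` parameter are hypothetical. The cuDNN factor of 1000 presumably converts milliseconds to the microseconds listed in the docs table, but that is an inference from the diff.

```python
import statistics


def summarize(metric, raw_line, scale=1.0):
    """Return (unified metric name, mean of samples * scale). Hypothetical helper."""
    values = raw_line.split(',')
    values.pop()  # drop the empty string left by the trailing comma
    samples = [float(item) for item in values]
    return metric.lower() + '_time', statistics.mean(samples) * scale


# Metric names and sample values are made up for illustration.
print(summarize('name_cublasSgemm_demo', '10.0,20.0,30.0,'))
print(summarize('name_cudnnConvolutionForward_demo', '0.5,1.5,', scale=1000))
```

Note that in the cuDNN hunk only the summarized result is scaled; the raw data is stored unscaled.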

diff --git a/superbench/benchmarks/micro_benchmarks/ib_validation_performance.py b/superbench/benchmarks/micro_benchmarks/ib_validation_performance.py
@@ -249,7 +249,7 @@ def __prepare_general_ib_command_params(self):
             msg_size = '-s ' + str(self._args.msg_size)
         # Add GPUDirect for ib command
         gpu_enable = ''
-        if self._args.gpu_index:
+        if self._args.gpu_index is not None:
             gpu = GPU()
             if gpu.vendor == 'nvidia':
                 gpu_enable = ' --use_cuda={gpu_index}'.format(gpu_index=str(self._args.gpu_index))
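The one-line change above matters because GPU index `0` is falsy in Python, so a truthiness check silently drops the flag for the first GPU. A minimal sketch of the difference follows. The helper names `gpu_flag_truthy` and `gpu_flag_fixed` are hypothetical, and the GPU vendor check from the real code is omitted.

```python
def gpu_flag_truthy(gpu_index):
    """Old behaviour: index 0 is treated the same as 'no GPU requested'."""
    return ' --use_cuda={}'.format(gpu_index) if gpu_index else ''


def gpu_flag_fixed(gpu_index):
    """New behaviour: only None means 'no GPU requested'."""
    return ' --use_cuda={}'.format(gpu_index) if gpu_index is not None else ''


for index in (None, 0, 1):
    print(repr(index), repr(gpu_flag_truthy(index)), repr(gpu_flag_fixed(index)))
```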

diff --git a/superbench/config/amd_mi100_hpe.yaml b/superbench/config/amd_mi100_hpe.yaml
@@ -3,7 +3,7 @@
 # Server:
 #   - Product: HPE Apollo 6500
 
-version: v0.3
+version: v0.4
 superbench:
   enable: null
   var:
@@ -99,9 +99,31 @@ superbench:
         copy_type:
           - sm
           - dma
-    ort-inference:
-      <<: *default_local_mode
+    ib-traffic:
+      enable: false
+      modes:
+        - name: mpi
+          proc_num: 1
+          mca:
+            btl: tcp,self
+            pml: ob1
+            btl_tcp_if_include: ens17f0
+    gpcnet-network-test:
       enable: false
+      modes:
+        - name: mpi
+          proc_num: 1
+          mca:
+            pml: ucx
+            btl: ^uct
+            btl_tcp_if_include: ens17f0
+    tcp-connectivity:
+      enable: false
+      modes:
+        - name: local
+          parallel: no
+      parameters:
+        port: 22
     ort-models:
       enable: false
       modes:

diff --git a/superbench/config/amd_mi100_z53.yaml b/superbench/config/amd_mi100_z53.yaml
@@ -4,7 +4,7 @@
 #   - Product: G482-Z53
 #   - Link: https://www.gigabyte.cn/FileUpload/Global/MicroSite/553/G482-Z53.html
 
-version: v0.3
+version: v0.4
 superbench:
   enable: null
   var: