
Run benchmark failed (superbenchmark-0.8.0) #518

Closed
edisonchan opened this issue Apr 14, 2023 · 2 comments

edisonchan commented Apr 14, 2023

What's the issue, what's expected?:

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
[2023-04-14 19:17:19,195 u22:21920][ansible.py:79][INFO] Run succeed, return code 0.
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/cublas-function/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/cudnn-function/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/gemm-flops/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/gpu-burn/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/mem-bw/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/nccl-bw:default/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/nccl-bw:gdr-only/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/ort-inference/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchm
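
For what it's worth, any of the flagged files can be sanity-checked with Python's built-in JSON validator (path copied from the log above); an empty or truncated file will fail to parse, which would be consistent with the runner's "Invalid content in JSON file" message:

python3 -m json.tool /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/cublas-function/rank0/results.json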

How to reproduce it?:
OS: Ubuntu 22.04.2
GPU: GeForce RTX 3060 x1

wget https://github.com/microsoft/superbenchmark/archive/refs/tags/v0.8.0.tar.gz
tar xf v0.8.0.tar.gz
cd superbenchmark-0.8.0/
python3 -m venv --system-site-packages ./venv
source ./venv/bin/activate
python3 -m pip install .
python3 -m pip install --upgrade pip setuptools==65.7
make postinstall
cp superbench/config/default.yaml sb.yaml # then change proc_num: 8 to proc_num: 1 (see the check after this block)
nano local.ini
set +H
sb deploy -f local.ini --host-password=mysshpassword
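
A quick sanity check for the sb.yaml edit above, assuming proc_num appears in the copied default config the same way it does upstream:

grep -n 'proc_num' sb.yaml # the edited entries should now read proc_num: 1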

docker images # check docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
superbench/superbench latest 36fe2cd49200 2 hours ago 19.5GB

docker run -it --rm --gpus all -e NVIDIA_VISIBLE_DEVICES=0 --shm-size=1g --ulimit memlock=-1 superbench/superbench
nvidia-smi #it works.
exit

sb run -f local.ini -c sb.yaml --host-password=mysshpassword

Log message or snapshot?:

See attached screenshot.

Additional information:
2023-04-14_19-16-21.tar.gz


abuccts commented Apr 15, 2023

Is there any error in the stdout/stderr of sb deploy?

As shown in the output of sb run, it seems no CUDA GPUs were found in the container. Can you also check whether both /dev/nvidiactl and /dev/nvidia-uvm exist on the host?
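
For reference, a minimal way to check the device nodes on the host (standard NVIDIA driver paths assumed); the in-container GPU check is the nvidia-smi step already shown in the reproduction steps:

ls -l /dev/nvidiactl /dev/nvidia-uvm # both device nodes should exist on the host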


edisonchan commented Apr 16, 2023

> Is there any error in the stdout/stderr of sb deploy?
>
> As shown in the output of sb run, it seems no CUDA GPUs were found in the container. Can you also check whether both /dev/nvidiactl and /dev/nvidia-uvm exist on the host?

There is no error while running sb deploy.

It's all fine now after killing all running containers.
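
For anyone hitting the same symptom, a rough sketch of the cleanup described above, assuming every running container on the host is safe to remove:

docker ps # list running containers first
docker ps -q | xargs -r docker rm -f # force-remove them; -r skips the call when nothing is running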
