
Run benchmark failed (superbenchmark-0.8.0) #518

Closed
edisonchan opened this issue Apr 14, 2023 · 2 comments

edisonchan commented Apr 14, 2023

What's the issue, what's expected?:

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
[2023-04-14 19:17:19,195 u22:21920][ansible.py:79][INFO] Run succeed, return code 0.
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/cublas-function/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/cudnn-function/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/gemm-flops/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/gpu-burn/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/mem-bw/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/nccl-bw:default/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/nccl-bw:gdr-only/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/ort-inference/rank0/results.json
[2023-04-14 19:17:19,199 u22:21920][runner.py:275][ERROR] Invalid content in JSON file: /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchm
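
For what it's worth, any of the flagged files can be sanity-checked with Python's built-in JSON validator (path copied from the log above); an empty or truncated file will fail to parse, which would be consistent with the runner's "Invalid content in JSON file" message:

python3 -m json.tool /home/edison/Downloads/superbenchmark-0.8.0/outputs/2023-04-14_19-16-21/nodes/u22/benchmarks/cublas-function/rank0/results.json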

How to reproduce it?:
OS: Ubuntu 22.04.2
GPU: GeForce RTX 3060 x1

wget https://github.com/microsoft/superbenchmark/archive/refs/tags/v0.8.0.tar.gz
tar xf v0.8.0.tar.gz
cd superbenchmark-0.8.0/
python3 -m venv --system-site-packages ./venv
source ./venv/bin/activate
python3 -m pip install .
python3 -m pip install --upgrade pip setuptools==65.7
make postinstall
cp superbench/config/default.yaml sb.yaml # then change proc_num: 8 to proc_num: 1 (see the check after this block)
nano local.ini
set +H
sb deploy -f local.ini --host-password=mysshpassword
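
A quick sanity check for the sb.yaml edit above, assuming proc_num appears in the copied default config the same way it does upstream:

grep -n 'proc_num' sb.yaml # the edited entries should now read proc_num: 1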

docker images # check docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
superbench/superbench latest 36fe2cd49200 2 hours ago 19.5GB

docker run -it --rm --gpus all -e NVIDIA_VISIBLE_DEVICES=0 --shm-size=1g --ulimit memlock=-1 superbench/superbench
nvidia-smi #it works.
exit

sb run -f local.ini -c sb.yaml --host-password=mysshpassword

Log message or snapshot?:

See attached screenshot.

Additional information:
2023-04-14_19-16-21.tar.gz


abuccts commented Apr 15, 2023

Is there any error in the stdout/stderr of sb deploy?

As shown in the output of sb run, it seems no CUDA GPUs were found in the container. Can you also check whether both /dev/nvidiactl and /dev/nvidia-uvm exist on the host?
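
For reference, a minimal way to check the device nodes on the host (standard NVIDIA driver paths assumed); the in-container GPU check is the nvidia-smi step already shown in the reproduction steps:

ls -l /dev/nvidiactl /dev/nvidia-uvm # both device nodes should exist on the host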


edisonchan commented Apr 16, 2023

> Is there any error in the stdout/stderr of sb deploy?
>
> As shown in the output of sb run, it seems no CUDA GPUs were found in the container. Can you also check whether both /dev/nvidiactl and /dev/nvidia-uvm exist on the host?

There is no error while running sb deploy.

It's all fine now after killing all running containers.
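
For anyone hitting the same symptom, a rough sketch of the cleanup described above, assuming every running container on the host is safe to remove:

docker ps # list running containers first
docker ps -q | xargs -r docker rm -f # force-remove them; -r skips the call when nothing is running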
