Commit 83a316d

xuzhao9 authored and facebook-github-bot committed
Enable the a10g instance ci (#1476)
Summary:
Enable the PR CI on AWS A10G to save CI costs.

Fixes #1470

Pull Request resolved: #1476

Reviewed By: weiwangmeta

Differential Revision: D43995490

Pulled By: xuzhao9

fbshipit-source-id: 56496a247500fd141fa6f3cf1f8701a03a5d37fe
1 parent c78f1f3 commit 83a316d

6 files changed: 92 additions, 46 deletions

.github/workflows/pr-a10g.yml

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+name: TorchBench PR Test on A10G
+on:
+  pull_request:
+  workflow_dispatch:
+  push:
+    branches:
+      - main
+
+env:
+  CONDA_ENV: "pr-test"
+
+jobs:
+  pr-test:
+    # AWS A10G GPU instance label: linux.g5.4xlarge.nvidia.gpu
+    # OS version: Amazon Linux 2
+    runs-on: [self-hosted, linux.g5.4xlarge.nvidia.gpu]
+    timeout-minutes: 1440 # 24 hours
+    steps:
+      - name: Checkout TorchBench
+        uses: actions/checkout@v3
+      - name: Install Conda and basic packages
+        run: |
+          bash scripts/install_basics_amzn_linux_2.sh
+      - name: Install NVIDIA Driver
+        run: |
+          bash scripts/setup_nvda_11.7.sh
+      - name: GPU Tuning
+        run: |
+          sudo nvidia-smi -pm 1
+      - name: Setup Conda Env
+        run: |
+          . ${HOME}/miniconda3/etc/profile.d/conda.sh
+          conda activate
+          python utils/python_utils.py --create-conda-env "${CONDA_ENV}"
+          conda activate "${CONDA_ENV}"
+          python utils/cuda_utils.py --install-torch-deps
+      - name: Install PyTorch nightly
+        run: |
+          . ${HOME}/miniconda3/etc/profile.d/conda.sh
+          conda activate "${CONDA_ENV}"
+          python utils/cuda_utils.py --install-torch-nightly
+      - name: Install TorchBench
+        run: |
+          . ${HOME}/miniconda3/etc/profile.d/conda.sh
+          conda activate "${CONDA_ENV}"
+          python install.py
+      - name: Validate benchmark components (Worker)
+        run: |
+          . ${HOME}/miniconda3/etc/profile.d/conda.sh
+          conda activate "${CONDA_ENV}"
+          python -m components.test.test_subprocess
+          python -m components.test.test_worker
+      - name: Validate benchmark components (Model)
+        run: |
+          . ${HOME}/miniconda3/etc/profile.d/conda.sh
+          conda activate "${CONDA_ENV}"
+          python test.py
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
+  cancel-in-progress: true
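
Reviewer note: the `concurrency.group` expression in this workflow decides which runs cancel each other. The sketch below is not part of the commit; `concurrency_group` is a hypothetical helper that mimics how the expression would resolve, assuming the usual GitHub Actions expression semantics (`a || b` yields `b` when `a` is empty, and a boolean comparison renders as `true` or `false`). The short SHA in the sample calls is for illustration only; `github.sha` is the full commit hash.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): emulate the workflow's
# concurrency group string for a given event.
# Args: workflow name, PR number (empty if none), commit SHA, event name.
concurrency_group() {
  local workflow="$1" pr_number="$2" sha="$3" event_name="$4"
  # Mirrors `github.event.pull_request.number || github.sha`:
  # fall back to the SHA when there is no PR number.
  local ref="${pr_number:-$sha}"
  # Mirrors `github.event_name == 'workflow_dispatch'`.
  local manual="false"
  if [ "$event_name" = "workflow_dispatch" ]; then
    manual="true"
  fi
  printf '%s-%s-%s\n' "$workflow" "$ref" "$manual"
}

# Sample values for illustration only.
concurrency_group "TorchBench PR Test on A10G" "1476" "83a316d" "pull_request"
concurrency_group "TorchBench PR Test on A10G" "" "83a316d" "push"
```

Under these assumptions, repeated pushes to the same PR share one group, so `cancel-in-progress: true` cancels the older run, while manually dispatched runs land in a separate group from automatic runs on the same commit.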

scripts/install_basics.sh

Lines changed: 0 additions & 39 deletions
This file was deleted.

scripts/install_basics_amzn_linux_2.sh

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+CONDA=https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
+filename=$(basename "$CONDA")
+wget "$CONDA"
+chmod +x "$filename"
+./"$filename" -b -u
+
+sudo yum makecache --refresh
+sudo yum install -y git jq \
+    vim wget curl ninja-build cmake \
+    libglvnd-glx libsndfile
+
+. ${HOME}/miniconda3/etc/profile.d/conda.sh
+conda activate
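
Reviewer note: the script above does not check that the tools it installs ended up on `PATH`. A minimal sketch of such a check follows, assuming it runs after the install step; `missing_commands` is a hypothetical helper, not something this commit adds, and the command names passed to it are only the ones whose binary name plainly matches the package name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): print every command name
# from the argument list that cannot be found on PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
    fi
  done
}

# Prints nothing when all of these are available.
missing_commands git jq wget curl cmake
```

A CI step could fail early when the output is non-empty, which would surface a broken package install before the much longer PyTorch and TorchBench steps run.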

scripts/install_conda.sh

Lines changed: 0 additions & 5 deletions
This file was deleted.

scripts/setup_ci.sh

Lines changed: 0 additions & 2 deletions
This file was deleted.

scripts/setup_nvda_11.7.sh

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+#!/usr/bin/env bash
+set -ex -o pipefail
+
+# Setup NVIDIA Driver
+DRIVER_FN="NVIDIA-Linux-x86_64-515.76.run"
+wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
+sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
+nvidia-smi
+
+# Setup CUDA 11.7 and cuDNN 8.5.0.96
+wget -q https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run -O cuda_11.7.0_515.43.04_linux.run
+sudo bash ./cuda_11.7.0_515.43.04_linux.run --toolkit --silent
+wget -q https://ossci-linux.s3.amazonaws.com/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz -O cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
+tar xJf cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
+cd cudnn-linux-x86_64-8.5.0.96_cuda11-archive && \
+    sudo cp include/* /usr/local/cuda-11.7/include && \
+    sudo cp lib/* /usr/local/cuda-11.7/lib64 && \
+    sudo ldconfig
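
Reviewer note: the script runs `nvidia-smi` to confirm the driver but does not confirm the copied cuDNN headers. The sketch below shows one way to read the version back, assuming the archive ships `include/cudnn_version.h` with the `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` macros, as cuDNN 8 releases do. `cudnn_version_from_header` is a hypothetical helper, not part of this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): print MAJOR.MINOR.PATCH
# as declared by the #define lines in a cudnn_version.h file.
cudnn_version_from_header() {
  local header="$1"
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { print major "." minor "." patch }
  ' "$header"
}

# Assumed location: the include directory the script copies headers into.
header="/usr/local/cuda-11.7/include/cudnn_version.h"
if [ -f "$header" ]; then
  cudnn_version_from_header "$header"
fi
```

The header carries only three version components, so for the archive used here the check would be expected to report 8.5.0 rather than the full 8.5.0.96 build string.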
