Closed
Labels
bug: Something isn't working; cml-runner: Subcommand; p0-critical: Max priority (ASAP)
Description
I have the following YAML workflow:
```yaml
on:
  push:
    branches:
      - GPU-debug
jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: iterative/setup-cml@v1
      - uses: actions/checkout@v2
      - name: Deploy runner on EC2
        env:
          PERSONAL_ACCESS_TOKEN: ${{ secrets.REPO_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-1
        run: |
          cml-runner \
            --repo https://github.com/sergeychuvakin/DVC_CML_sanbox \
            --token=$PERSONAL_ACCESS_TOKEN \
            --cloud aws \
            --cloud-region us-west-1 \
            --cloud-type=g3.4xlarge \
            --labels=cml-runner \
            --idle-timeout 30
  model-training:
    timeout-minutes: 5000
    needs: [deploy-runner]
    runs-on: [self-hosted, cml-runner]
    container:
      image: docker://dvcorg/cml:0-dvc2-base0-gpu
      options: --gpus all
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.7'
      - name: Train model
        env:
          repo_token: ${{ secrets.REPO_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          nvidia-smi
        shell: bash
```

I get the following error while the container image is being built:
I tried different images, namely docker://dvcorg/cml:0-dvc2-base0-gpu and docker://dvcorg/cml:0-dvc2-base1-gpu, and both gave me the same error.
When I removed `options: --gpus all`, this error went away, but then `nvidia-smi` was not found.
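When the container does start (for example, without `--gpus all`), you can confirm from inside the job whether the GPU tooling is reachable by checking whether `nvidia-smi` is on `PATH` and whether it exits cleanly. The following is a minimal Python sketch that assumes Python 3.7+ (the version the workflow sets up). `check_nvidia_smi` is a hypothetical helper and is not part of CML, DVC, or GitHub Actions. It cannot detect the case where the container fails to start at all.

```python
import shutil
import subprocess
from typing import Optional


def check_nvidia_smi(timeout: float = 30.0) -> dict:
    """Hypothetical diagnostic: report whether nvidia-smi is on PATH and runs.

    Returns a dict with:
      found  -- True if nvidia-smi was located on PATH
      ok     -- True if it ran and exited with status 0
      output -- its stdout (or stderr / error text on failure)
    """
    path: Optional[str] = shutil.which("nvidia-smi")
    if path is None:
        # The symptom seen when the image runs without --gpus all.
        return {"found": False, "ok": False, "output": ""}
    try:
        proc = subprocess.run(
            [path], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Binary exists but could not be executed or hung.
        return {"found": True, "ok": False, "output": str(exc)}
    return {
        "found": True,
        "ok": proc.returncode == 0,
        "output": proc.stdout or proc.stderr,
    }


if __name__ == "__main__":
    report = check_nvidia_smi()
    if not report["found"]:
        print("nvidia-smi not found on PATH")
    elif not report["ok"]:
        print("nvidia-smi found but failed:\n" + report["output"])
    else:
        print(report["output"])
```

In the workflow above, this sketch could stand in for the bare `nvidia-smi` call in the "Train model" step so that the job log says which of the two cases occurred.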
Thanks in advance!
0x2b3bfa0

