--gpus option not working on recently updated docker image #698

@sergeychuvakin

I have the following YAML workflow:

on:
  push:
    branches:
      - GPU-debug

jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: iterative/setup-cml@v1
      - uses: actions/checkout@v2
      - name: Deploy runner on EC2
        env:
          PERSONAL_ACCESS_TOKEN: ${{ secrets.REPO_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-1
        run: |
          cml-runner \
              --repo https://github.com/sergeychuvakin/DVC_CML_sanbox \
              --token=$PERSONAL_ACCESS_TOKEN \
              --cloud aws \
              --cloud-region us-west-1 \
              --cloud-type=g3.4xlarge \
              --labels=cml-runner \
              --idle-timeout 30
    
  model-training:
    timeout-minutes: 5000
    needs: [deploy-runner]
    runs-on: [self-hosted, cml-runner]
    container:
      image: docker://dvcorg/cml:0-dvc2-base0-gpu
      options: --gpus all
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.7'
      - name: Train model
        env:
          repo_token: ${{ secrets.REPO_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          nvidia-smi
        shell: bash

I get the following error during the image build:

Screenshot 2021-08-17 at 17 47 12

I tried two different images:
docker://dvcorg/cml:0-dvc2-base0-gpu and
docker://dvcorg/cml:0-dvc2-base1-gpu.
Both gave me the same error.

When I removed the --gpus all option, this error went away, but then nvidia-smi was not found:

Screenshot 2021-08-17 at 17 50 38
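
To narrow down whether the problem is in the image or on the EC2 host, a check along the following lines could be run directly on the runner (for example as a step in a job that targets the cml-runner label without the container block). This is only a sketch: the helper name check_tool is made up for illustration, and it assumes that a working --gpus all setup needs both the NVIDIA driver and an NVIDIA container runtime on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Is the NVIDIA driver tooling installed on the host itself?
check_tool nvidia-smi

# Is Docker present, and which container runtimes does it know about?
check_tool docker
if command -v docker >/dev/null 2>&1; then
  docker info --format '{{json .Runtimes}}' 2>/dev/null \
    || echo "docker: daemon not reachable"
fi
```

If nvidia-smi is missing on the host, or the runtime list shows no nvidia entry, the failure is more likely to come from the instance setup than from the dvcorg/cml image tag.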

Thanks in advance!

Labels

bug (Something isn't working), cml-runner (Subcommand), p0-critical (Max priority (ASAP))
