cml-cloud-run with gpus

I'm testing out using an EC2 GPU w/ the cloud container cml-gpu-py3-cloud-runner. I wanted to make sure I'm on the right track:

```yaml
name: train-my-model

on: [push]

jobs:
  deploy-cloud-runner:
    runs-on: [ubuntu-latest]
    container: docker://dvcorg/cml-gpu-cloud-runner

    steps:
      - name: deploy
        env:
          repo_token: ${{ secrets.REPO_TOKEN }} 
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          echo "Deploying..."
          MACHINE="CML-$(openssl rand -hex 12)"
          docker-machine create \
              --driver amazonec2 \
              --amazonec2-instance-type g3s.xlarge \
              --amazonec2-region us-east-2 \
              --amazonec2-zone a \
              --amazonec2-vpc-id vpc-76f1f01e \
              --amazonec2-ssh-user ubuntu \
              $MACHINE
          eval "$(docker-machine env --shell sh $MACHINE)"
          ( 
          docker-machine ssh $MACHINE "sudo mkdir -p /docker_machine && sudo chmod 777 /docker_machine" && \
          docker-machine scp -r -q ~/.docker/machine/ $MACHINE:/docker_machine && \
          docker run --name runner -d \
            -v /docker_machine/machine:/root/.docker/machine \
            -e RUNNER_IDLE_TIMEOUT=120 \
            -e DOCKER_MACHINE=${MACHINE} \
            -e RUNNER_LABELS=cml \
            -e repo_token=$repo_token \
            -e NVIDIA_VISIBLE_DEVICES=all \
            -e RUNNER_REPO=https://github.com/andronovhopf/test_cloud \
           dvcorg/cml-gpu-py3-cloud-runner && \
               sleep 20 && echo "Deployed $MACHINE"
          ) || (echo y | docker-machine rm $MACHINE && exit 1)
  train:
    needs: deploy-cloud-runner
    runs-on: [self-hosted,cml]
    
    steps:
      - uses: actions/checkout@v2

      - name: cml_run
        env:
          repo_token: ${{ secrets.REPO_TOKEN }} 
        run: |
          nvidia-smi

```

[This isn't working yet](https://github.com/andronovhopf/test_cloud/runs/766747576?check_suite_focus=true); looks to be issues getting the drivers setup on the self-hosted runner. I'm betting I have a flag wrong somewhere in the `deploy` job. I tried adding the flag `--gpus all` to `docker run` but that didn't work. Any ideas?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

cml-cloud-run with gpus #117

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

cml-cloud-run with gpus #117

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions