Automatically re-run command when GPU isn't available #1069

Merged 4 commits on Jun 14, 2023

mattt (Member) commented Jun 7, 2023

Fixes #590

I was able to reproduce the described behavior with the following example project:

# cog.yaml
build:
  python_version: "3.8"
  gpu: true
  python_packages:
    - torch
    - numpy
predict: "predict.py:Predictor"

# predict.py
from cog import BasePredictor
import torch

class Predictor(BasePredictor):
    def predict(self, size: int) -> int:
        n = torch.randn([size])

        if torch.cuda.is_available():
            n = n.cuda()

        return n.sum().item()

With the latest version of Cog, cog predict fails on my local machine with the following error:

$ cog predict -i size=100
Starting Docker image cog-gpu-project-base and running setup()...
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ⅹ Failed to start container: exit status 125

This PR updates the cog run and cog predict subcommands to automatically re-run themselves in the event of this error, bringing them in line with cog build which already has this behavior.
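The retry idea described above can be sketched roughly as follows. This is an illustrative Python sketch, not Cog's actual code (the real change lives in Go under pkg/cli). The function names, the injectable `runner` parameter, and the choice to match on a substring of the Docker error are all assumptions made for the example; only the error text and the `docker run --gpus` flag come from Docker itself.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Substring of the Docker daemon error quoted above.
MISSING_DEVICE_DRIVER = "could not select device driver"

# A runner takes a command line and returns (exit code, stderr text).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def docker_run_args(image: str, gpus: bool) -> List[str]:
    """Build a `docker run` command line, requesting GPUs only if asked."""
    args = ["docker", "run", "--rm"]
    if gpus:
        args += ["--gpus", "all"]
    return args + [image]


def run_subprocess(args: Sequence[str]) -> Tuple[int, str]:
    """Default runner: execute the command and capture its stderr."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def run_with_gpu_fallback(
    image: str, runner: Runner = run_subprocess
) -> Tuple[int, bool]:
    """Try with GPUs first; retry once without them if the driver is missing.

    Returns (exit code, whether GPUs were requested in the final attempt).
    """
    code, stderr = runner(docker_run_args(image, gpus=True))
    if code != 0 and MISSING_DEVICE_DRIVER in stderr:
        # Hypothetical fallback: the same command, minus the GPU request.
        code, _ = runner(docker_run_args(image, gpus=False))
        return code, False
    return code, True
```

Any other failure is returned as-is rather than retried, so the fallback only applies to the specific missing-driver error.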

Signed-off-by: Mattt Zmuda <mattt@replicate.com>
Review comments on pkg/cli/predict.go and pkg/cli/run.go (outdated, resolved)
zeke (Member) commented Jun 7, 2023

I took this for a spin locally on my M1 Mac but ran into this (unrelated) error:

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ⅹ Failed to start container: exit status 125

zeke (Member) commented Jun 7, 2023

Trying from a CPU GitHub Codespace now...

zeke (Member) commented Jun 7, 2023

Hey, it worked on the non-GPU Codespace!

Starting Docker image cog-cog-run-without-gpu-base and running setup()...
Error response from daemon: page not found
Running prediction...
-7

Not sure what that "page not found" error is about, though.

mattt (Member, Author) commented Jun 7, 2023

> I took this for a spin locally on my M1 Mac but ran into this (unrelated) error:

@zeke Thanks for trying this out! This looks like the original error. Something I ran into during development was that I wasn't actually calling the new version of Cog. In my case, cog resolved to the Homebrew installation. Try doing a which cog to make sure it's the right one, and do [sudo] make install as necessary.
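The `which cog` check suggested above can also be scripted. The following is a small sketch using Python's standard `shutil.which`; the helper name `is_expected_binary` and the idea of comparing against an expected install path are assumptions for illustration, not part of Cog.

```python
import os
import shutil
from typing import Optional


def is_expected_binary(
    name: str, expected: str, path: Optional[str] = None
) -> bool:
    """Return True if `name` resolves on the search path to `expected`.

    `path` defaults to the PATH environment variable, as with `which`.
    """
    found = shutil.which(name, path=path)
    if found is None:
        # The command is not on the search path at all.
        return False
    # Compare real paths so symlinks and `~` do not cause false mismatches.
    expected_real = os.path.realpath(os.path.expanduser(expected))
    return os.path.realpath(found) == expected_real
```

If this returns False for the freshly built binary, an earlier entry on the search path (for example a package-manager install) is shadowing it.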

@zeke
Copy link
Member

zeke commented Jun 10, 2023

I'm pretty sure I was calling ~/go/bin/cog explicitly, but maybe I forgot that time. I can try it again.

hongchaodeng (Contributor) left a comment

Nice PR. I added some minor suggestions and hope they're helpful :)

Two review comments on pkg/cli/predict.go (outdated, resolved)
zeke (Member) left a comment

Let's get this out the door!

mattt force-pushed the mattt/cog-run-without-gpu branch from ef1ace7 to 6f88cdd on June 14, 2023 13:19
mattt enabled auto-merge (squash) on June 14, 2023 13:19
mattt merged commit f84611e into main on Jun 14, 2023
22 checks passed
mattt deleted the mattt/cog-run-without-gpu branch on June 14, 2023 13:28
zeke added a commit to zeke/cog-stable-diffusion that referenced this pull request Jun 14, 2023

Successfully merging this pull request may close these issues.

Make cog run and cog predict work on GPU images, even if you don't have a GPU
6 participants