non tpu inference #4

Closed
darkacorn opened this issue Feb 14, 2024 · 1 comment

Comments

@darkacorn

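We need some samples that can actually run inference on vision/image inputs on a GPU. Running the image sample script on a non-TPU machine only produces repeated TPU metadata lookup failures: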

(lwm) ➜ LWM git:(main) ./scripts/run_sample_image.sh
WARNING: Logging before InitGoogle() is written to STDERR
I0000 00:00:1707909968.893833 11548 common_lib.cc:148] Failed to fetch URL on try 1 out of 6: Couldn't connect to server
I0000 00:00:1707909972.473827 11548 common_lib.cc:148] Failed to fetch URL on try 2 out of 6: Couldn't connect to server
I0000 00:00:1707909976.025912 11548 common_lib.cc:148] Failed to fetch URL on try 3 out of 6: Couldn't connect to server
^C^CI0000 00:00:1707909979.577878 11548 common_lib.cc:148] Failed to fetch URL on try 4 out of 6: Couldn't connect to server
^CI0000 00:00:1707909983.129666 11548 common_lib.cc:148] Failed to fetch URL on try 5 out of 6: Couldn't connect to server
I0000 00:00:1707909986.681892 11548 common_lib.cc:148] Failed to fetch URL on try 6 out of 6: Couldn't connect to server
Failed to get 'tpu-env' from instance metadata: INTERNAL: Couldn't connect to server
=== Source Location Trace: ===
learning/45eac/tfrc/runtime/common_lib.cc:145
learning/45eac/tfrc/runtime/common_lib.cc:162
learning/45eac/tfrc/runtime/common_lib.cc:188

I0000 00:00:1707909990.233946 11548 common_lib.cc:148] Failed to fetch URL on try 1 out of 6: Couldn't connect to server
I0000 00:00:1707909993.785913 11548 common_lib.cc:148] Failed to fetch URL on try 2 out of 6: Couldn't connect to server
I0000 00:00:1707909997.337871 11548 common_lib.cc:148] Failed to fetch URL on try 3 out of 6: Couldn't connect to server
I0000 00:00:1707910000.890123 11548 common_lib.cc:148] Failed to fetch URL on try 4 out of 6: Couldn't connect to server
I0000 00:00:1707910004.442720 11548 common_lib.cc:148] Failed to fetch URL on try 5 out of 6: Couldn't connect to server
I0000 00:00:1707910007.994498 11548 common_lib.cc:148] Failed to fetch URL on try 6 out of 6: Couldn't connect to server
Failed to get 'tpu-env' from instance metadata: INTERNAL: Couldn't connect to server
=== Source Location Trace: ===
learning/45eac/tfrc/runtime/common_lib.cc:145
learning/45eac/tfrc/runtime/common_lib.cc:162
learning/45eac/tfrc/runtime/common_lib.cc:188

@wilson1yan
Contributor

Hi, inference should work for GPU as well. Perhaps try these installation instructions to see if it works?

conda create -n lwm python=3.10
conda activate lwm  # activate the new environment so pip installs into it
pip install -U "jax[cuda12_pip]==0.4.23" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install -r requirements.txt

Then run bash scripts/run_sample_image.sh (with the paths filled in). I tested this on an A100 GPU with CUDA 12.3.
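Before rerunning the script, it may help to confirm that JAX actually sees the GPU after the install above. The following is a minimal sketch, not part of the LWM repository: the helper names `pick_accelerator` and `report_jax_devices` are hypothetical, and treating "cuda" and "rocm" as GPU platform names is an assumption made for robustness across JAX versions.

```python
def pick_accelerator(platforms):
    """Return "gpu", "tpu", or "cpu" for a list of JAX platform strings, else None."""
    # Assumption: some JAX versions may report "cuda" or "rocm" instead of "gpu".
    gpu_names = {"gpu", "cuda", "rocm"}
    if any(p in gpu_names for p in platforms):
        return "gpu"
    for fallback in ("tpu", "cpu"):
        if fallback in platforms:
            return fallback
    return None


def report_jax_devices():
    """Print the devices JAX can see and return the preferred platform (or None)."""
    try:
        import jax  # imported lazily so pick_accelerator works without JAX installed
    except ImportError:
        print("JAX is not installed in this environment")
        return None
    try:
        devices = jax.devices()
    except RuntimeError as err:
        # Backend initialization can fail, e.g. with a mismatched CUDA setup.
        print("JAX could not initialize a backend:", err)
        return None
    print("default backend:", jax.default_backend())
    print("devices:", devices)
    return pick_accelerator([d.platform for d in devices])


if __name__ == "__main__":
    if report_jax_devices() != "gpu":
        print("No GPU visible to JAX; check the CUDA-enabled jax/jaxlib install")
```

If this reports only CPU devices, the CUDA build of JAX is probably not the one installed in the active environment, and the sample script is unlikely to run on the GPU either.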
