[jax] absl issues #14907

Closed
stas00 opened this issue Dec 23, 2021 · 3 comments

Comments

@stas00
Contributor

stas00 commented Dec 23, 2021

Update:

The problem was that jax wasn't detecting the GPU even though one was present.

The solution is to install the CUDA build of jax:

pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_releases.html 

More details: #14907 (comment)

This issue will auto-close when #14909 is merged.


Original:

$ python -c "import transformers.testing_utils"
INFO:absl:Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
INFO:absl:Unable to initialize backend 'gpu': NOT_FOUND: Could not find registered platform with name: "cuda". Available platform names are: Interpreter Host
INFO:absl:Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

These messages come from the absl-py package, which I don't know anything about.

Could we please fix this? It is a JAX issue, but it affects everybody, not only JAX users.

The only way I found to turn the messages off is to explicitly set USE_JAX=0.
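Another option that may be worth trying is to raise the threshold of the logger that emits these lines. The `INFO:absl:` prefix matches the standard library's default `LEVEL:name:message` format, which suggests the records go through a stdlib logger named `absl`; that is an assumption drawn from the output above, not something confirmed in this thread. The helper name `quiet_logger` is hypothetical, and the demo uses only the standard library, so it runs without jax or absl-py installed.

```python
import io
import logging


def quiet_logger(name: str = "absl", level: int = logging.ERROR) -> logging.Logger:
    """Raise the threshold of a named stdlib logger (hypothetical helper).

    Assumption: the noisy records are emitted on the logger called `name`.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


# Demo: capture what would reach the root handler, in the same format
# as the output quoted in this issue.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
root = logging.getLogger()
previous_level = root.level
root.addHandler(handler)
root.setLevel(logging.INFO)

demo = quiet_logger("absl")
demo.info("Unable to initialize backend 'tpu'")          # dropped: below ERROR
demo.warning("No GPU/TPU found, falling back to CPU.")  # dropped: below ERROR
demo.error("still visible")                              # passes the threshold

captured = stream.getvalue()

# Restore the root logger so the demo leaves no side effects behind.
root.removeHandler(handler)
root.setLevel(previous_level)
print(captured, end="")
```

In a real session the call would most likely need to happen after the library that owns the logger has been imported, and it only hides the messages; it does not change which backend jax selects.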

I tried upgrading the libraries:

pip install jax jaxlib absl-py -U

but the issue is still there. It probably came in with a recent release of one of these libraries.

#12434 seems to be related, but it was never resolved.

transformers was set up to carefully avoid loading any of torch/tf/jax until one of them is actually used, but that doesn't seem to work here.

Thank you.

@patil-suraj

@stas00
Contributor Author

stas00 commented Dec 23, 2021

So this is one of the triggers:

python -c "import jax; jax.default_backend()"

and it is looking for TPUs:

TF_CPP_MIN_LOG_LEVEL=0 python -c "import jax; jax.default_backend()"
2021-12-23 10:06:40.249749: I external/org_tensorflow/tensorflow/core/tpu/tpu_initializer_helper.cc:94] libtpu.so already in use by another process. Run "$ sudo lsof -w /dev/accel0" to figure out which process is using the TPU. Not attempting to load libtpu.so in this process.
2021-12-23 10:06:40.249776: I external/org_tensorflow/tensorflow/core/tpu/tpu_api_dlsym_initializer.cc:116] Libtpu path is: libtpu.so
2021-12-23 10:06:40.251821: I external/org_tensorflow/tensorflow/core/tpu/tpu_executor_dlsym_initializer.cc:68] Libtpu path is: libtpu.so
2021-12-23 10:06:40.709202: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:171] XLA service 0x55bc1aee0390 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2021-12-23 10:06:40.709222: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:179]   StreamExecutor device (0): Interpreter, <undefined>
2021-12-23 10:06:40.711252: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:165] TfrtCpuClient created.
2021-12-23 10:06:40.711669: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
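To check from Python which backend jax ended up with, rather than reading the log, something like the following could be used. It is a minimal sketch: `detect_jax_backend` is a hypothetical helper name, and the only jax call it relies on is `jax.default_backend()`, the same one used in the command above. It returns `None` when jax is not installed so that it can run anywhere.

```python
from typing import Optional


def detect_jax_backend() -> Optional[str]:
    """Return the name of jax's default backend, or None if jax is missing.

    Hypothetical helper; importing jax here is what triggers the backend
    probing (and the absl log lines) shown in this issue.
    """
    try:
        import jax
    except ImportError:
        return None
    return jax.default_backend()


backend = detect_jax_backend()
if backend is None:
    print("jax is not installed")
elif backend == "cpu":
    print("jax fell back to CPU; a GPU, if present, was not detected")
else:
    print(f"jax is using backend: {backend}")
```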

@stas00
Contributor Author

stas00 commented Dec 23, 2021

The solution is to install the CUDA build of jax:

pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_releases.html 

I'm not sure how we could help users with this, since our automatic dependency installer can't know whether the CUDA build is needed.
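One rough heuristic, sketched below, would be to look for `nvidia-smi` on the PATH and print the matching install command as a hint. This is only an illustration: the function names are hypothetical, the presence of `nvidia-smi` is an assumed proxy for "an NVIDIA GPU with drivers is available" and can be wrong in either direction, and the CPU command is an assumption modeled on the upgrade command used earlier in this thread. The CUDA command is the one quoted above.

```python
import shutil

# Command quoted in this issue for the CUDA build.
CUDA_INSTALL = (
    'pip install --upgrade "jax[cuda]" '
    "-f https://storage.googleapis.com/jax-releases/jax_releases.html"
)
# Assumption: a plain upgrade is enough for CPU-only machines.
CPU_INSTALL = "pip install --upgrade jax jaxlib"


def looks_like_nvidia_host() -> bool:
    """Heuristic only: treat a discoverable nvidia-smi binary as a GPU hint."""
    return shutil.which("nvidia-smi") is not None


def suggest_jax_install(has_nvidia_gpu: bool) -> str:
    """Return the install command to suggest (hypothetical helper)."""
    return CUDA_INSTALL if has_nvidia_gpu else CPU_INSTALL


print(suggest_jax_install(looks_like_nvidia_host()))
```

Printing a hint rather than installing anything keeps the choice with the user, which seems safer given that the heuristic cannot tell which CUDA version is present.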

It's still looking for TPUs though:

python -c "import transformers.testing_utils"
INFO:absl:Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
INFO:absl:Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.

But at least it finds the GPU now.

@stas00
Contributor Author

stas00 commented Dec 31, 2021

I posted the solution at the top of the OP, and #14909 got merged, so I'm closing this one.

@stas00 stas00 closed this as completed Dec 31, 2021