TPU VM Training Error - tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'SentencepieceOp' in binary running #965
Comments
For context, the same thing runs perfectly on a standard gcloud VM, starting TPUs the ol' way (TensorFlow 2.7). However, I strongly prefer TPU VMs and would love to get it running.
Hi @TambourineMan42, I'm now seeing the same error message on a TPU VM. Were you able to solve the problem? I was able to run the pre-training on a "normal" TPU in combination with a VM, and I didn't get that strange error message there.
@craffel, do you have an idea what is going wrong here? I'm also using TF 2.8 on the TPU VM (the exact version I've been using on the normal VM), and my custom SentencePiece model is stored in a GCP bucket.
@adarob, I'm running into the same error too.
@broken, can you PTAL?
The error indicates that tensorflow-text is not installed on the TPU VMs, which the documentation appears to confirm. It is installed on the standard gcloud VMs. I was actually discussing releases recently with that team, specifically about making the tf-text package available. I'll reach out again for more current info. In the immediate term, I think you will need to create your own VM image with the tensorflow-text package installed. Edit: or can you just …
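A quick way to test the "not installed" hypothesis is to ask the same Python interpreter that runs training whether it can locate the tensorflow_text module at all. The sketch below uses only the standard library; module_available is a hypothetical helper name. Note that find_spec only locates the package: it does not import it, so it cannot detect a wheel whose compiled ops fail to load against the installed TF. The missing SentencepieceOp is a custom op that tensorflow_text registers when it is actually imported.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module `name` can be located.

    find_spec does not import the module, so a broken binary extension
    (e.g. an undefined-symbol ImportError) is NOT detected here.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Run this with the interpreter used for training on the TPU VM.
    print("tensorflow_text found:", module_available("tensorflow_text"))
```

If this prints False, installing tensorflow-text (or baking it into a custom VM image) is the next step; if it prints True, actually running import tensorflow_text is the real test.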
It's odd, since pip should already install tensorflow-text as a dependency of t5. Can you check the TF and TF-Text versions on the VM? Are they the same major and minor version (i.e., tf 2.8.x and tf-text 2.8.x)?
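The version check asked for here can be scripted with importlib.metadata (Python 3.8+), reading the installed distribution versions without importing TensorFlow itself. This is a sketch under assumptions: TF_DISTRIBUTIONS is a hypothetical, non-exhaustive list of pip distributions that provide the tensorflow module, and the major.minor comparison mirrors the rule of thumb stated above rather than an official compatibility check.

```python
from importlib import metadata
from typing import Optional, Tuple

# Hypothetical, non-exhaustive list of pip distributions that ship `tensorflow`.
TF_DISTRIBUTIONS = ("tensorflow", "tf-nightly", "tensorflow-cpu", "tensorflow-gpu")


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.8.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def installed_version(*dist_names: str) -> Optional[Tuple[str, str]]:
    """Return (distribution, version) for the first installed name, else None."""
    for name in dist_names:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def versions_compatible(tf_version: str, text_version: str) -> bool:
    """True when TF and TF-Text share the same major and minor version."""
    return major_minor(tf_version) == major_minor(text_version)


if __name__ == "__main__":
    tf_dist = installed_version(*TF_DISTRIBUTIONS)
    text_dist = installed_version("tensorflow-text")
    print("TensorFlow:", tf_dist)
    print("TensorFlow Text:", text_dist)
    if tf_dist and text_dist:
        print("major.minor match:", versions_compatible(tf_dist[1], text_dist[1]))
```

Reporting the distribution name matters too: as the later exchange in this thread shows, a matching major.minor (tf-nightly 2.7.0 with tensorflow-text 2.7.3) can still fail with undefined symbols when TF is a nightly build rather than the stable one tensorflow-text was compiled against.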
Hi @broken, unfortunately there are some caveats: I can't just install … Updating the TensorFlow version results in an error: TensorFlow (once updated to 2.8) can no longer find my local TPU. I could verify this via this script: before the update, the script detects my TPU; after updating TF to 2.8, it does not. … which I tried to build … I'm currently using a v4-8 TPU VM with the …
Can you try …
I've tried it on a fresh instance:
stefan@t1v-n--w-0:~$ pip show tf-nightly
Name: tf-nightly
Version: 2.7.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /usr/local/lib/python3.8/dist-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras-nightly, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, six, tb-nightly, termcolor, tf-estimator-nightly, typing-extensions, wheel, wrapt
Required-by:
stefan@t1v-n--w-0:~$ pip install --no-deps tensorflow-text==2.7.3
Defaulting to user installation because normal site-packages is not writeable
Collecting tensorflow-text==2.7.3
Downloading tensorflow_text-2.7.3-cp38-cp38-manylinux2010_x86_64.whl (4.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 62.9 MB/s eta 0:00:00
Installing collected packages: tensorflow-text
Successfully installed tensorflow-text-2.7.3
stefan@t1v-n--w-0:~$ python3
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow_text
2022-02-24 00:14:28.722217: I tensorflow/core/tpu/tpu_api_dlsym_initializer.cc:116] Libtpu path is: libtpu.so
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/stefan/.local/lib/python3.8/site-packages/tensorflow_text/__init__.py", line 20, in <module>
from tensorflow_text.core.pybinds import tflite_registrar
ImportError: /home/stefan/.local/lib/python3.8/site-packages/tensorflow_text/core/pybinds/tflite_registrar.so: undefined symbol: _ZN4absl12lts_2021032420raw_logging_internal21internal_log_functionE
The undefined-symbol errors come from using tensorflow-text with a TF build it wasn't compiled against. In this case, tensorflow-text was built against stable TF, not nightly, and the different build environments can produce different symbol tables. What I find odd is that your tf-nightly version is 2.7.0, but tf-nightly versions are generally of the form … Can you tell me what you get for …
Unfortunately, it just outputs "unknown":
>>> import tensorflow as tf
2022-02-24 08:35:58.067887: I tensorflow/core/tpu/tpu_api_dlsym_initializer.cc:116] Libtpu path is: libtpu.so
>>> print(tf.__git_version__)
unknown
Apparently the TF 2.8 TPU VM image already has tensorflow_text installed. Can you use it, or does it need to be the older image?
I don't know if this will be useful, but I had a similar problem on Kaggle's TPU VM. Iterating over a Keras dataset threw these errors (there were no errors on CPU). The dataset pipeline mapped a NumPy function via tf.py_function. I fixed the errors by removing the @tf.function decorator from a sub-function of that function.
Describe the bug
When I run any of the fine-tuning scripts with my own training TSV file on a TPU VM (using v3-8 and v2-alpha-pod), training ends prematurely (it fails to even start).
To Reproduce
Steps to reproduce the behavior:
pip install t5[gcp]
to installing both mesh and text-to-text-transfer-transformer from source.
Expected behavior
Here is the entire stack trace: