Chapter 11, Part 1: TextVectorization with output_mode="tf_idf" #190

liganega · 2021-12-05T05:07:26Z

The 24th code cell raises the following error when run on a PC with Anaconda (Python 3.8.5 + TensorFlow 2.6 or 2.7), while it runs fine on Google Colab.

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-25-6747a8415a37> in <module>
----> 1 text_vectorization.adapt(text_only_train_ds)
      2 
      3 tfidf_2gram_train_ds = train_ds.map(lambda x, y: (text_vectorization(x), y))
      4 tfidf_2gram_val_ds = val_ds.map(lambda x, y: (text_vectorization(x), y))
      5 tfidf_2gram_test_ds = test_ds.map(lambda x, y: (text_vectorization(x), y))

~\anaconda3\lib\site-packages\keras\engine\base_preprocessing_layer.py in adapt(self, data, batch_size, steps)
    242       with data_handler.catch_stop_iteration():
    243         for _ in data_handler.steps():
--> 244           self._adapt_function(iterator)
    245           if data_handler.should_sync:
    246             context.async_wait()
...

When output_mode is anything other than "tf_idf", everything works. The error first occurred with TF 2.6, and it still happens with TF 2.7.

text_vectorization = TextVectorization(
    ngrams=2,
    max_tokens=20000,
    output_mode="tf_idf",
)

The Python environment is as follows:

OS: Windows 11 (no WSL2)
Anaconda + TF 2.6 or 2.7

I am wondering WHY!

Many thanks in advance for any help.

liganega · 2021-12-06T13:51:07Z

I was wrong about Google Colab. The same error occurs there too.
(I don't know why it worked yesterday, or maybe I only believed it did.)

liganega · 2021-12-06T14:36:41Z

I think I found the main reason for the confusion.

Whether the error occurs depends on whether TensorFlow is running with GPU support.

with GPU support: Yes, the error occurs.
without GPU support: No error.
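
To check which of the two cases applies on a given machine, one can ask TensorFlow which GPUs it sees. This is only a minimal sketch; the variable names `gpus` and `has_gpu` are illustrative and not taken from the notebook.

```python
import tensorflow as tf

# List the GPUs visible to TensorFlow; an empty list means CPU-only execution.
gpus = tf.config.list_physical_devices("GPU")
has_gpu = len(gpus) > 0

print("TensorFlow version:", tf.__version__)
print("GPU visible to TensorFlow:", has_gpu)
```

If `has_gpu` is True, the observations above suggest that adapt() with output_mode="tf_idf" is likely to fail in this setup.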

jmbo1190 · 2021-12-10T15:49:18Z

Hello, I've observed the same error, on GPU only, in the cell with the code from Listing 11.11 (training and testing the TF-IDF bigram model).

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No, using code from Deep Learning with Python, 2nd edition, Listing 11.11 Training and testing the TF-IDF bigram model
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04, release='5.4.0-90-generic', version='#101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021', machine='x86_64', processor='x86_64'
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: NA
TensorFlow installed from (source or binary): binary:
https://conda.anaconda.org/conda-forge/linux-64/tensorflow-base-2.6.2-cuda112py37h8d33417_2.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/tensorflow-estimator-2.6.2-cuda112py37h474db6c_2.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/tensorflow-2.6.2-cuda112py37h474db6c_2.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/tensorflow-gpu-2.6.2-cuda112py37h0bbbad9_2.tar.bz2
TensorFlow version (use command below): unknown 2.6.2
Python version: Python 3.7.12
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: 11.2/8201
GPU model and memory: GeForce GTX 1660 SUPER, 5944MiB

You can collect some of this information using our environment capture
script
You can also obtain the TensorFlow version with:

TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior
Error message:

/tmp/ipykernel_901495/519244469.py in <module>
----> 1 text_vectorization.adapt(text_only_train_ds)
      2 
      3 tfidf_2gram_train_ds = train_ds.map(
      4     lambda x, y: (text_vectorization(x), y),
      5     num_parallel_calls=4)

see full log below

Describe the expected behavior
The code should run without errors.

Contributing

Do you want to contribute a PR? (yes/no): no
Briefly describe your candidate solution(if contributing):

Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/Jupyter/any notebook.
Original code from repo

Other info / logs Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.

GPU run log: chapter11_part01_introduction_reprex.md
CPU run produces no error

jasonbrancazio · 2021-12-16T18:25:35Z

I just ran into similar issues running on Google Colab with a GPU.

Stacktrace:

/usr/local/lib/python3.7/dist-packages/keras/engine/base_preprocessing_layer.py in adapt(self, data, batch_size, steps)
    242       with data_handler.catch_stop_iteration():
    243         for _ in data_handler.steps():
--> 244           self._adapt_function(iterator)
    245           if data_handler.should_sync:
    246             context.async_wait()

/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/traceback_utils.py in error_handler(*args, **kwargs)
    151     except Exception as e:
    152       filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153       raise e.with_traceback(filtered_tb) from None
    154     finally:
    155       del filtered_tb

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     57     ctx.ensure_initialized()
     58     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 59                                         inputs, attrs, num_outputs)
     60   except core._NotOkStatusException as e:
     61     if name is not None:

InvalidArgumentError: 2 root error(s) found.
  (0) INVALID_ARGUMENT:  During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
	 [[{{node map/TensorArrayUnstack/TensorListFromTensor/_42}}]]
	 [[Func/map/while/body/_1/input/_50/_58]]
  (1) INVALID_ARGUMENT:  During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
	 [[{{node map/TensorArrayUnstack/TensorListFromTensor/_42}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_adapt_step_77044]

Function call stack:
adapt_step -> adapt_step

SteffenBauer · 2022-05-28T19:40:07Z

I just stumbled over the same problem when trying to run the code in a JupyterLab notebook on a Jetson Nano (TF 2.7).

I managed to get it to work there by specifying CPU as the device for the adapt operation:

with tf.device("cpu"):
    text_vectorization.adapt(text_only_train_ds)
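
For anyone who wants to try this outside the notebook, a self-contained sketch of the same workaround might look like the following. The three-sentence corpus and the name `text_only_ds` are stand-ins for the IMDB data used in the chapter (assumptions, not the book's code), and "/CPU:0" is just the fully qualified form of the "cpu" device name used above.

```python
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization

# Hypothetical toy corpus standing in for the text-only IMDB training dataset.
texts = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "the cat ate the mat",
]
text_only_ds = tf.data.Dataset.from_tensor_slices(texts).batch(2)

# Same layer configuration as in the issue.
text_vectorization = TextVectorization(
    ngrams=2,
    max_tokens=20000,
    output_mode="tf_idf",
)

# Pin adapt() to the CPU so its internal ops are not placed on the GPU.
with tf.device("/CPU:0"):
    text_vectorization.adapt(text_only_ds)

vocab = text_vectorization.get_vocabulary()

# As in the notebook, apply the layer inside Dataset.map.
features_ds = text_only_ds.map(lambda x: text_vectorization(x))
first_batch = next(iter(features_ds))

print("vocabulary size:", len(vocab))
print("first batch shape:", first_batch.shape)
```

Only the adapt() call is wrapped here, matching the fix described above; the later Dataset.map calls are left unchanged.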

ifond · 2022-05-28T19:40:36Z

YES!!! I have received your E-mail——Steven Lee
