This repository has been archived by the owner on Sep 6, 2022. It is now read-only.

Inference is giving ValueError: When input_signature is provided, all inputs to the Python function must be convertible to tensors: #32

Open
tumusudheer opened this issue Jun 12, 2020 · 5 comments

@tumusudheer

tumusudheer commented Jun 12, 2020

Hi,

I started training the model on the entire Common Voice dataset linked from the GitHub page.
I'm using TensorFlow 2.2.0 with Python 3.6 on a single GTX 1080 Ti GPU. The training command was:
python run_rnnt.py --mode train --data_dir data_trail/preprocessed --batch_size 8 --eval_size 100
I got an OOM error after about 18k steps (still in epoch 0), when the loss was about 116.7. The accuracy graph in TensorBoard shows about 0.42.

Since a checkpoint is saved every 1000 steps, I tried to run transcription with one of the saved checkpoints:
python transcribe_file.py --checkpoint model/checkpoint_15000_109.9516.hdf5 --i data_trail/clips/common_voice_en_19945797.wav

But I get the following error:

2020-06-12 11:19:38.910255: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-12 11:19:38.929092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-06-12 11:19:38.929267: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-06-12 11:19:38.930665: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-12 11:19:38.931896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-12 11:19:38.932090: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-12 11:19:38.933532: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-12 11:19:38.934265: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-12 11:19:38.937197: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-12 11:19:38.938298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-06-12 11:19:38.938559: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
2020-06-12 11:19:38.943923: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3398040000 Hz
2020-06-12 11:19:38.944538: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4f70350 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-12 11:19:38.944555: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-06-12 11:19:39.013200: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2bfda90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-12 11:19:39.013246: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-06-12 11:19:39.014703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-06-12 11:19:39.014784: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-06-12 11:19:39.014824: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-12 11:19:39.014860: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-12 11:19:39.014896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-12 11:19:39.014931: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-12 11:19:39.014962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-12 11:19:39.014991: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-12 11:19:39.017390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-06-12 11:19:39.017452: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-06-12 11:19:39.020326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-12 11:19:39.020350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-06-12 11:19:39.020361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-06-12 11:19:39.022881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9907 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-06-12 11:19:41.880417: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-12 11:19:41.984154: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
Traceback (most recent call last):
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2293, in _convert_inputs_to_signature
    value, dtype_hint=spec.dtype)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1341, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 321, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 262, in constant
    allow_broadcast=True)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 270, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "transcribe_file.py", line 59, in <module>
    main(args)
  File "transcribe_file.py", line 38, in main
    decoded = decoder_fn(log_melspec)[0]
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 648, in _call
    *args, **kwds)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2238, in canonicalize_function_inputs
    self._flat_input_signature)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2299, in _convert_inputs_to_signature
    format_error_message(inputs, input_signature))
ValueError: When input_signature is provided, all inputs to the Python function must be convertible to tensors:
  inputs: (
    tf.Tensor(
[[[ -9.891962   -10.041118   -10.170887   ...  -2.5574753   -3.1098373
    -2.8594036 ]
  [ -4.2638397   -3.8721824   -3.818324   ...  -1.2381899   -1.4718239
    -1.2757974 ]
  [ -3.8065548   -3.9217172   -3.9833403  ...  -2.5127609   -2.4093955
    -1.8164482 ]
  ...
  [  0.26996142   0.24929267   0.10105902 ...  -1.764302    -1.2930858
    -1.6539826 ]
  [ -1.3995155   -1.8580544   -2.5036726  ...  -1.9249303   -2.1395605
    -1.7865329 ]
  [ -2.521644    -2.1898646   -2.1456     ...  -2.134868    -2.5040653
    -2.1412349 ]]], shape=(1, 166, 240), dtype=float32),
    None)
  input_signature: (
    TensorSpec(shape=(None, None, 240), dtype=tf.float32, name=None),
    TensorSpec(shape=(), dtype=tf.int32, name=None))

Is this because, here, hparams is not a tensor but a JSON object?
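
Going only by the error output above: the inputs tuple holds the spectrogram tensor followed by None, while input_signature expects two tensors (a float32 spectrogram and a scalar int32). The first traceback also fails on converting None. That points at a missing second argument to decoder_fn rather than at the type of hparams, though this is an inference from the log and not confirmed against the repo code. A minimal pure-Python sketch for spotting such missing arguments before the call follows; find_none_leaves is a hypothetical helper, not part of this repo or of TensorFlow, and the call_args value is a stand-in for the real arguments.

```python
from typing import Any, List


def find_none_leaves(inputs: Any, path: str = "inputs") -> List[str]:
    """Return the paths of all None leaves in a nested tuple/list/dict."""
    if inputs is None:
        return [path]
    if isinstance(inputs, dict):
        found = []
        for key, value in inputs.items():
            found.extend(find_none_leaves(value, f"{path}[{key!r}]"))
        return found
    if isinstance(inputs, (list, tuple)):
        found = []
        for index, value in enumerate(inputs):
            found.extend(find_none_leaves(value, f"{path}[{index}]"))
        return found
    # Anything else (tensor, array, number) counts as a present leaf.
    return []


# Stand-in for the failing call: the first argument plays the role of the
# log-mel spectrogram, the second argument was never supplied and is None.
call_args = ([[0.0] * 240], None)
missing = find_none_leaves(call_args)
print(missing)  # ['inputs[1]']
```

If the check reports inputs[1], the likely next step is to look at what the second parameter of the decorated function is meant to be and pass a value for it explicitly.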

@omerasif-itu

Have you managed to fix the issue?

@tumusudheer
Author

Hi,

No, I didn't fix this issue. I'm not sure how to fix it.

@stefan-falk

@tumusudheer What did the word-error-rate look like during your training?

Mine does not look very promising:

[image: word-error-rate plot, not preserved]

@VictorChen2012

VictorChen2012 commented Sep 29, 2020

In reply to @tumusudheer's comment above that the issue was not fixed:

The issue was finally fixed by following the second solution in the corresponding issue in the TensorFlow repo:

iterator = iter(train_dataset)
@tf.function(input_signature=[iterator.element_spec])
def train_step(dataset_inputs):
    def step_fn(inputs):
        # ... 
for batch, inputs in enumerate(train_dataset):
    loss, metrics_results = train_step(next(iterator))

@li563042811

Hi,
I ran into the same problem and tried to fix it with the second solution in tensorflow/tensorflow#29911 (comment), but it didn't work. Here is the error:

Traceback (most recent call last):
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2293, in _convert_inputs_to_signature
    value, dtype_hint=spec.dtype)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1341, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 321, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 262, in constant
    allow_broadcast=True)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 270, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (PerReplica:{
0: <tf.Tensor: shape=(1, 310, 240), dtype=float32, numpy=
array([[[-8.555949 , -8.693979 , -8.79496 , ..., -1.2911978,...

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_rnnt.py", line 598, in <module>
    app.run(main)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run_rnnt.py", line 557, in main
    eval_metrics=[accuracy_fn, wer_fn])
  File "run_rnnt.py", line 357, in run_training
    checkpoint_model()
  File "run_rnnt.py", line 322, in checkpoint_model
    metrics=eval_metrics)
  File "run_rnnt.py", line 444, in run_evaluate
    loss, metrics_results = eval_step(inputs)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 648, in _call
    *args, **kwds)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2238, in canonicalize_function_inputs
    self._flat_input_signature)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2299, in _convert_inputs_to_signature
    format_error_message(inputs, input_signature))
ValueError: When input_signature is provided, all inputs to the Python function must be convertible to tensors:
  inputs: (
    (PerReplica:{
0: <tf.Tensor: shape=(1, 310, 240), dtype=float32, numpy=
array([[[-8.555949 , -8.693979 , -8.79496 , ..., -1.2911978,...
