Autokeras timeseries_forecaster official Tutorial : Colab script works with CPU, but not with GPU : CudnnRNN "Fail to find the dnn implementation." #1638

Metawhy · 2021-10-13T10:25:42Z

Bug Description

When we call the ".fit()" method on the AutoKeras TimeseriesForecaster model, it throws:

UnknownError:    Fail to find the dnn implementation.
	 [[{{node CudnnRNN}}]]
	 [[model/bidirectional/backward_lstm/PartitionedCall]] [Op:__inference_train_function_16009]

Function call stack:
train_function -> train_function -> train_function

This looks similar to issues such as tensorflow/tensorflow#36508, but every solution I tested failed.

Bug Reproduction

Simply switching the Colab runtime from CPU to GPU triggers the error.

It can be reproduced with the official AutoKeras tutorial Colab notebook: https://colab.research.google.com/github/keras-team/autokeras/blob/master/docs/ipynb/timeseries_forecaster.ipynb

Code for reproducing the bug, including 4 different solutions tested independently and then together, without success:

https://colab.research.google.com/drive/1HOpCzGvjU3t3Mg1Ptscshr2rHvWJobHX?usp=sharing

Data used by the code:

The standard data from the AutoKeras tutorial example: https://archive.ics.uci.edu/ml/machine-learning-databases/00360/AirQualityUCI.zip

Manually downloaded and uploaded when "tf.keras.utils.get_file()" does not work: AirQualityUCI.csv

Expected Behavior

The notebook should run without the error, as it does when the session uses a CPU.

Setup Details

Include the details about the versions of:

OS type and version: Colab
Python: 3.7
autokeras: 1.0.16
keras-tuner: 1.0.4
scikit-learn: 0.22.2.post1
numpy: 1.19.5
pandas: 1.1.5
tensorflow: 2.5.0
cuda: 11.1.105
cudnn: 7.6.5

Additional context

I tried the solutions below (code included), but none of them worked.

Solution 0: try to install the TensorFlow GPU version suited to autokeras

!pip3 install tensorflow==2.5.0 --upgrade 
!pip3 install tensorflow-gpu==2.5.0 --upgrade #https://pypi.org/project/tensorflow-gpu/#history

Solution 1: seen many times on GitHub and Stack Overflow

From https://stackoverflow.com/questions/54473254/cudnnlstm-unknownerror-fail-to-find-the-dnn-implementation

import tensorflow as tf

# List the GPUs visible to TensorFlow (experimental API alias).
gpus_exp = tf.config.experimental.list_physical_devices('GPU')
if len(gpus_exp) > 0:
    print(f'Len gpus_exp={len(gpus_exp)}, changing memory growth param')
    try:
        # From https://blog.csdn.net/ljyljyok/article/details/107619881
        # Iterate over gpus_exp (the original looped over an undefined `gpus`).
        for gpu in gpus_exp:
            tf.config.experimental.set_memory_growth(gpu, True)
    except Exception as e:
        # Invalid device or cannot modify virtual devices once initialized.
        print('Did not manage to change memory growth param. Error:')
        print(e)

print('\n')

Solution 1 bis

From https://blog.csdn.net/ljyljyok/article/details/107619881

# Try alternative if did not find with first method, but should be equivalent
gpus = tf.config.list_physical_devices('GPU')
if len(gpus) > 0:
    print(f'Len gpus={len(gpus)}, changing memory growth param')
    try:
        for gpu in gpus:
            # set_memory_growth lives under tf.config.experimental;
            # tf.config.set_memory_growth would raise AttributeError here.
            tf.config.experimental.set_memory_growth(gpu, True)
    except Exception as e:
        # Invalid device or cannot modify virtual devices once initialized.
        print('Did not manage to change memory growth param. Error:')
        print(e)

Solution 2

From https://soowankim.github.io/2020-05-29/Keras-Cudnn-failed-initialize/

import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

Solution 3

From https://leimao.github.io/blog/TensorFlow-cuDNN-Failure/ | https://www.titanwolf.org/Network/q/7812eb9a-c361-44c4-ad9f-dd4a437ba164/y

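# PS: Seems to be for TF 1.X
import tensorflow as tf
import autokeras as ak

tf_config = tf.compat.v1.ConfigProto()
tf_config.gpu_options.allow_growth = True  # allocate GPU memory on demand
tf_interactive_session = tf.compat.v1.InteractiveSession(config=tf_config)
tf_session = tf.compat.v1.Session(config=tf_config)

# Attempt to hand the session to autokeras. autokeras does not appear to
# document a `session` attribute, so this assignment may have no effect.
ak.session = tf_session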

AschHarwood · 2021-10-20T12:21:12Z

I'm having the exact same problem in Colab.

I used the exact code from the timeseries tutorial with my tabular data:

predict_from = 1
predict_until = 10
lookback = 3
clf = ak.TimeseriesForecaster(
    lookback=lookback,
    predict_from=predict_from,
    predict_until=predict_until,
    max_trials=1,
    objective="val_loss",
)

clf.fit(
    x=x_train,
    y=y_train,
    validation_data=(x_test, y_test),
    batch_size=32,
    epochs=10,
)

Output:

Search: Running Trial #1

Hyperparameter    |Value             |Best Value So Far 
timeseries_bloc...|True              |?                 
timeseries_bloc...|lstm              |?                 
timeseries_bloc...|2                 |?                 
regression_head...|0                 |?                 
optimizer         |adam              |?                 
learning_rate     |0.001             |?                 

Epoch 1/10
2021-10-20 11:51:51.322633: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-10-20 11:51:52.437468: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-10-20 11:51:52.438853: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at cudnn_rnn_ops.cc:1553 : Unknown: Fail to find the dnn implementation.
2021-10-20 11:51:52.441772: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-10-20 11:51:52.442871: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at cudnn_rnn_ops.cc:1553 : Unknown: Fail to find the dnn implementation.
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
/tmp/ipykernel_8334/638650037.py in <module>
      4     validation_data=(x_test, y_test),
      5     batch_size=32,
----> 6     epochs=10,
      7 )

17 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

UnknownError:    Fail to find the dnn implementation.
	 [[{{node CudnnRNN}}]]
	 [[model/bidirectional/backward_lstm/PartitionedCall]] [Op:__inference_train_function_477706]

Function call stack:
train_function -> train_function -> train_function
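
The log above spells out the rule that fails here: the cuDNN library loaded at runtime (8.0.5) must have the same major version as the one TensorFlow was compiled against (8.1.0) and an equal or higher minor version. Below is a minimal sketch of that check. The helper names `parse_version`, `cudnn_runtime_ok` and `compiled_cudnn_version` are made up for illustration, and the runtime version still has to be read from the log by hand. `tf.sysconfig.get_build_info()` is assumed to be available (TensorFlow 2.3 or later) and may report the compiled cuDNN version only as a major number.

```python
def parse_version(text):
    """Turn '8.0.5' into (8, 0, 5); missing components default to 0."""
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def cudnn_runtime_ok(runtime, compiled):
    """Apply the rule from the log: same major, runtime minor >= compiled minor."""
    r_major, r_minor, _ = parse_version(runtime)
    c_major, c_minor, _ = parse_version(compiled)
    return r_major == c_major and r_minor >= c_minor


def compiled_cudnn_version():
    """Return the cuDNN version TensorFlow reports it was built with, or None.

    Assumes tf.sysconfig.get_build_info() exists; the value may be as
    coarse as '8' and is absent on CPU-only builds.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.sysconfig.get_build_info().get("cudnn_version")


# Versions taken from the log above: runtime 8.0.5, compiled against 8.1.0.
print(cudnn_runtime_ok("8.0.5", "8.1.0"))  # False
```

If the check returns False, the mismatch is in the installed libraries rather than in the notebook code, which would be consistent with the memory-growth workarounds not helping.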

younader · 2022-01-10T21:15:35Z

Same issue for me: I can run on CPU but not on GPU. Without any code modifications, running the text classification notebook on Colab throws:

UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/conv1d/conv1d (defined at /usr/local/lib/python3.7/dist-packages/autokeras/utils/utils.py:88) ]]
[[gradient_tape/model/embedding/embedding_lookup/Reshape/_76]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/conv1d/conv1d (defined at /usr/local/lib/python3.7/dist-packages/autokeras/utils/utils.py:88) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_5467]

Function call stack:
train_function -> train_function
