-
Notifications
You must be signed in to change notification settings - Fork 126
Closed
Labels
staleNo recent activityNo recent activity
Description
I try following
https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#training-the-model
in google colab
but when training is started it always fails
2021-11-24 04:51:47.954507: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-11-24 04:51:47.958479: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Traceback (most recent call last):
File "model_main_tf2.py", line 115, in <module>
tf.compat.v1.app.run()
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "model_main_tf2.py", line 112, in main
record_summaries=FLAGS.record_summaries)
File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 603, in train_loop
train_input, unpad_groundtruth_tensors)
File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 394, in load_fine_tune_checkpoint
_ensure_model_is_built(model, input_dataset, unpad_groundtruth_tensors)
File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 176, in _ensure_model_is_built
labels,
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 1286, in run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 2849, in call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/mirrored_strategy.py", line 671, in _call_for_each_replica
self._container_strategy(), fn, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/mirrored_run.py", line 86, in call_for_each_replica
return wrapped(args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 950, in _call
return self._stateless_fn(*args, **kwds)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3040, in __call__
filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 1964, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 596, in call
ctx=ctx)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/conv1_conv/Conv2D (defined at /local/lib/python3.7/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py:1346) ]]
[[Loss/RPNLoss/BalancedPositiveNegativeSampler_1/Cast_8/_588]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/conv1_conv/Conv2D (defined at /local/lib/python3.7/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py:1346) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference__dummy_computation_fn_44910]
its there any way I can work around it?
I try using lower version TensorFlow like 2.4.0 but not working
the information about google colab CUDA is like this
!nvcc --version
!nvidia-smi
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
Wed Nov 24 04:45:06 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 40C P8 26W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
and tensoflow version that I install using pip
import tensorflow as tf
print(tf.__version__)
sys_details = tf.sysconfig.get_build_info()
cuda_version = sys_details["cuda_version"]
print(cuda_version)
cudnn_version = sys_details["cudnn_version"]
print(cudnn_version)
2.7.0
11.2
8
Metadata
Metadata
Assignees
Labels
staleNo recent activityNo recent activity