Hi, I am trying to run the model on the CIFAR100 dataset on a machine with 4 Tesla V100 GPUs, and I am getting the following out-of-memory error.
2022-08-05 10:00:26.833817: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.00MiB (rounded to 9437184)requested by op
2022-08-05 10:00:26.835182: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:491] *********************************************************************************x**************x***
2022-08-05 10:00:26.835281: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2130] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 9437184 bytes.
BufferAssignment OOM Debugging.
The complete running logs can be found here. Please help me solve this issue.
===============
For your information, I was previously getting the error "RuntimeError: Visible devices cannot be modified after being initialized". To get past it, I added the following code snippet to main.py, adapted from https://www.tensorflow.org/guide/gpu, and that resolved the RuntimeError (but not the out-of-memory error above).
"""Main file for running the example."""
import os
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
imports ...
FLAGS = flags.FLAGS
...
def main(argv):
  del argv
  # Hide any GPUs from TensorFlow. Otherwise TF might reserve memory and make
  # it unavailable to JAX.
  # tf.config.experimental.set_visible_devices([], "GPU")
  gpus = tf.config.list_physical_devices('GPU')
  if gpus:
    # Restrict TensorFlow to only use the first GPU
    try:
      tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
      logical_gpus = tf.config.list_logical_devices('GPU')
      print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
    except RuntimeError as e:
      # Visible devices must be set before GPUs have been initialized
      print(e)
  # if gpus:
  #   # Create 2 virtual GPUs with 1GB memory each
  #   try:
  #     tf.config.set_logical_device_configuration(
  #         gpus[0],
  #         [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
  #          tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
  #     logical_gpus = tf.config.list_logical_devices('GPU')
  #     print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  #   except RuntimeError as e:
  #     # Virtual devices must be set before GPUs have been initialized
  #     print(e)
  if FLAGS.exp_id:
    ...
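For reference, here is a minimal sketch of how the JAX GPU memory environment variables could be set in one place before JAX is imported. The helper name configure_jax_gpu_memory and the 0.5 memory fraction are hypothetical and only for illustration; they are not from main.py or from the failing run. The two variables are the ones JAX documents for GPU memory allocation, and I have not verified that changing them fixes the OOM above.

```python
import os


def configure_jax_gpu_memory(preallocate=False, mem_fraction=None, env=None):
    """Set JAX's GPU memory environment variables (hypothetical helper).

    Call this before importing jax; the settings are read when the GPU
    backend is initialized, so setting them later has no effect.
    """
    env = os.environ if env is None else env
    # "false" makes JAX allocate GPU memory on demand instead of
    # reserving a large share of it up front.
    env["XLA_PYTHON_CLIENT_PREALLOCATE"] = "true" if preallocate else "false"
    if mem_fraction is not None:
        if not 0.0 < mem_fraction <= 1.0:
            raise ValueError("mem_fraction must be in (0, 1]")
        # Fraction of total GPU memory that JAX may preallocate.
        env["XLA_PYTHON_CLIENT_MEM_FRACTION"] = f"{mem_fraction:.2f}"
    return env


if __name__ == "__main__":
    # Assumed value for illustration only.
    configure_jax_gpu_memory(preallocate=False, mem_fraction=0.5)
    print(os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"])
```

Passing a plain dict as env makes it possible to check the resulting settings without touching the real process environment.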