RESOURCE_EXHAUSTED: Out of memory while trying to allocate # bytes. #20

Closed
vgthengane opened this issue Aug 5, 2022 · 0 comments

Hi, I am trying to run the model on the CIFAR-100 dataset and I am getting the following error. I have 4 Tesla V100 GPUs.

2022-08-05 10:00:26.833817: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.00MiB (rounded to 9437184)requested by op 
2022-08-05 10:00:26.835182: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:491] *********************************************************************************x**************x***
2022-08-05 10:00:26.835281: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2130] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 9437184 bytes.
BufferAssignment OOM Debugging.

The complete running logs can be found here. Please help me resolve this issue.
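
In case it is relevant: JAX's XLA backend preallocates most of each GPU's memory by default, so an OOM on a small 9 MiB allocation can happen when TensorFlow is holding memory on the same device. A minimal sketch of the JAX-side environment variables involved (my assumption about the setup, not something taken from the logs) looks like this:

# Sketch only: these JAX/XLA settings must be applied before JAX initializes
# its GPU backend, i.e. before the first import of jax.
import os

# Stop XLA from preallocating ~90% of GPU memory up front.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
# Or cap the fraction of GPU memory XLA may preallocate instead.
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.5"

import jax
print(jax.devices())  # the backend initializes here, using the settings above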

===============

For your information, I was initially getting a RuntimeError: Visible devices cannot be modified after being initialized error. Hence, I added the following code snippet (taken from https://www.tensorflow.org/guide/gpu) to main.py, and it resolved that error.

"""Main file for running the example."""

import os
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

from absl import flags  # assumed: flags comes from absl, as in the original main.py
import tensorflow as tf
...

FLAGS = flags.FLAGS
...

def main(argv):
  del argv

  # Hide any GPUs from TensorFlow. Otherwise TF might reserve memory and make
  # it unavailable to JAX.
  # tf.config.experimental.set_visible_devices([], "GPU")

  gpus = tf.config.list_physical_devices('GPU')
  if gpus:
    # Restrict TensorFlow to only use the first GPU
    try:
      tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
      logical_gpus = tf.config.list_logical_devices('GPU')
      print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
    except RuntimeError as e:
      # Visible devices must be set before GPUs have been initialized
      print(e)

  # if gpus:
  #   # Create 2 virtual GPUs with 1GB memory each
  #   try:
  #     tf.config.set_logical_device_configuration(
  #         gpus[0],
  #         [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
  #         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
  #     logical_gpus = tf.config.list_logical_devices('GPU')
  #     print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  #   except RuntimeError as e:
  #     # Virtual devices must be set before GPUs have been initialized
  #     print(e)


  if FLAGS.exp_id:
     ...
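
For completeness, the commented-out line above (tf.config.experimental.set_visible_devices([], "GPU")) is the original main.py approach: hide every GPU from TensorFlow so JAX keeps all four V100s. A hedged sketch of that variant, assuming TF does no GPU compute here (which I have not verified), is:

import tensorflow as tf

try:
  # Must run before TensorFlow initializes the GPUs (i.e. before any TF GPU op).
  tf.config.set_visible_devices([], "GPU")
except RuntimeError as e:
  # Raised if the GPUs were already initialized earlier in the program.
  print(e)

# If TF does need a GPU, an alternative is to let it grow memory on demand
# instead of reserving the whole device up front:
# for gpu in tf.config.list_physical_devices("GPU"):
#   tf.config.experimental.set_memory_growth(gpu, True)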