Debug issue on Graph Generation #59

Closed
kshitijrajsharma opened this issue Jan 31, 2023 · 1 comment
Labels: bug (Something isn't working), component : backend

Comments

@kshitijrajsharma
Member

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/celery/app/trace.py", line 734, in protected_call
    return self.run(*args, **kwargs)
  File "/app/core/tasks.py", line 95, in train_model
    raise ex
  File "/app/core/tasks.py", line 58, in train_model
    final_accuracy, final_model_path = train(
  File "/usr/local/lib/python3.8/dist-packages/hot_fair_utilities/training/train.py", line 56, in train
    run_main_train_code(cfg)
  File "/usr/local/lib/python3.8/dist-packages/hot_fair_utilities/training/run_training.py", line 279, in run_main_train_code
    history = the_model.fit(
  File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Graph execution error:

OOM when allocating tensor with shape[16,64,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/decoder_stage2b_relu/Relu-0-1-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_function_2701887]
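
For scale, here is a rough sketch of how large the tensor named in the OOM message is. It assumes that "type float" means 32-bit floats (4 bytes per element); the helper name `tensor_bytes` is made up for illustration and is not part of hot_fair_utilities or TensorFlow.

```python
from math import prod

# Shape copied from the OOM message: shape[16,64,64,64].
shape = (16, 64, 64, 64)

# Assumption: "type float" in the message means float32, i.e. 4 bytes per element.
bytes_per_element = 4


def tensor_bytes(shape, bytes_per_element):
    """Return the memory needed for one dense tensor of the given shape."""
    return prod(shape) * bytes_per_element


size = tensor_bytes(shape, bytes_per_element)
print(f"{size} bytes = {size / 2**20:.0f} MiB")
```

Under that assumption the failed allocation is only about 16 MiB, which would suggest the GPU was already close to full when this request was made, rather than this one tensor being unusually large. If the leading dimension (16) is the batch size, which is a guess, a smaller batch would shrink this tensor proportionally.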

kshitijrajsharma added the bug and component : backend labels on Jan 31, 2023
@kshitijrajsharma
Member Author

Couldn't reproduce this issue either; will reopen if encountered again.
