
batch out of range & loss value becomes 'nan' when running monocular depth estimation #1832

Open
Gacha76 opened this issue Apr 13, 2024 · 6 comments


Gacha76 commented Apr 13, 2024

Issue Type

Bug

Source

source

Keras Version

Keras 2.10

Custom Code

No

OS Platform and Distribution

Windows 11

Python version

3.10.13

GPU model and memory

RTX 3050 6GB

Current Behavior?

When calling .fit() to train the model, the 1st epoch runs as expected and ends once all batches have been processed.

From the 2nd epoch onward, the batch counter runs past the expected number of batches and the loss becomes nan. Once the epoch completes, the output looks normal again, but the same behavior recurs in the 3rd epoch and every epoch after that.

[Screenshot 2024-04-13 111550]

YouTube tutorials that run the same Colab notebook provided by the Keras team appear to train properly without any issues, but this is not the case when I run the notebook locally or on Colab, on both CPU and GPU.
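One way to get this symptom, where a later epoch yields more batches than the first one, is a `keras.utils.Sequence`-style data generator whose `__getitem__` shrinks `self.batch_size` in place for the final partial batch while `__len__` computes the batch count from that same attribute. This thread does not establish that the tutorial's generator does this, so treat it as a hypothesis worth checking in the notebook. The plain-Python sketch below (no Keras import; `BuggyBatcher`, `FixedBatcher` and `run_epochs` are hypothetical names) shows how such a mutation changes the batch count on the second pass and serves the wrong items in the last batch of the first pass. It models only the generator, not how `fit()` counts steps.

```python
import math


class BuggyBatcher:
    """Hypothetical Sequence-style generator that overwrites batch_size for the last batch."""

    def __init__(self, n_items, batch_size):
        self.indices = list(range(n_items))
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch, derived from the *current* batch_size.
        return math.ceil(len(self.indices) / self.batch_size)

    def __getitem__(self, index):
        if (index + 1) * self.batch_size > len(self.indices):
            # Bug: this mutation persists, so __len__ grows in later epochs,
            # and the slice below is computed with the new, smaller size.
            self.batch_size = len(self.indices) - index * self.batch_size
        return self.indices[index * self.batch_size:(index + 1) * self.batch_size]


class FixedBatcher(BuggyBatcher):
    """Same interface, but never modifies batch_size."""

    def __getitem__(self, index):
        start = index * self.batch_size
        # Slicing stops at the end of the list, so the last batch is simply shorter.
        return self.indices[start:start + self.batch_size]


def run_epochs(batcher, epochs):
    """Collect the batches served per epoch, re-reading len() at the start of each epoch."""
    return [[batcher[i] for i in range(len(batcher))] for _ in range(epochs)]


if __name__ == "__main__":
    for cls in (BuggyBatcher, FixedBatcher):
        epochs = run_epochs(cls(n_items=10, batch_size=4), epochs=2)
        print(cls.__name__, [[len(b) for b in epoch] for epoch in epochs])
```

With 10 items and a batch size of 4, this prints `BuggyBatcher [[4, 4, 2], [2, 2, 2, 2, 2]]` (items 8 and 9 are never served in the first epoch, while items 4 and 5 are served twice) and `FixedBatcher [[4, 4, 2], [4, 4, 2]]`. Keeping the batch size fixed and letting the slice clamp at the end removes the shared mutable state between `__len__` and `__getitem__`.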

Standalone code to reproduce the issue or tutorial link

https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/depth_estimation.ipynb

Relevant log output

No response

@sachinprasadhs
Collaborator

You seem to be using an older Keras version on your local system. Keras 3, with multi-backend support, is available; you can upgrade the Keras package and try again.

pip install -U keras
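To confirm which Keras version is actually installed before and after the upgrade, the package metadata can be read with the standard-library `importlib.metadata` module, which does not import Keras and therefore does not initialize a backend. This is a minimal sketch; `major_version` and `installed_keras_major` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Parse the major component of a version string such as "2.10.0"."""
    return int(version_string.split(".")[0])


def installed_keras_major():
    """Return the installed Keras major version, or None if Keras is not installed."""
    try:
        return major_version(version("keras"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = installed_keras_major()
    if major is None:
        print("Keras is not installed")
    elif major < 3:
        print(f"Keras {major}.x found (pre-Keras 3)")
    else:
        print(f"Keras {major}.x found")
```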

@Gacha76
Author

Gacha76 commented Apr 16, 2024

The screenshot below is from Keras 3. The batch counter no longer goes out of range, but the loss still becomes nan.

[Screenshot 2024-04-16 164217]
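To locate exactly where the loss first turns nan, Keras provides `keras.callbacks.TerminateOnNaN`, which stops training at the first batch whose loss is NaN or infinite; it can be passed to `fit()` as `callbacks=[keras.callbacks.TerminateOnNaN()]`. The sketch below applies the same check in plain Python to a list of per-batch loss values, for example ones collected from training logs. `first_non_finite` is a hypothetical helper, not a Keras API.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or infinite loss, or None if every value is finite."""
    for i, loss in enumerate(losses):
        if not math.isfinite(loss):
            return i
    return None


if __name__ == "__main__":
    losses = [0.52, 0.47, 0.44, float("nan"), float("nan")]
    print(first_non_finite(losses))  # 3
```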

Gacha76 closed this as not planned on Apr 16, 2024
Gacha76 reopened this on Apr 16, 2024

@sachinprasadhs
Collaborator

The tutorial you are referring to has not been migrated to Keras 3 yet, possibly because of dependencies on TensorFlow or Keras 2 APIs.

I was able to run the tutorial successfully for 1 epoch with TensorFlow 2.15, which uses Keras 2.15 under the hood.
The working Gist is attached here for reference: https://colab.sandbox.google.com/gist/sachinprasadhs/5aead85438db273c01e72ec257d6c09e/depth_estimation.ipynb

@Gacha76
Author

Gacha76 commented Apr 17, 2024

It works for 1 epoch for me as well, in both Keras 2 and Keras 3. The issue appears only when I train the model for more than 1 epoch, which produces the behavior described above. Also, because the loss becomes undefined (nan), the network outputs nothing but a black image, as shown here.

[Screenshot 2024-04-17 090156]

@sachinprasadhs
Collaborator

The published tutorial shows output for more epochs.
Since the tutorial has not yet been migrated to Keras 3, we can look into this once it has been; however, the Keras team does not have enough bandwidth to migrate tutorials.
Community contributions are welcome.
