RuntimeWarning: invalid value encountered in less #59

Closed
rafaleo opened this issue Mar 16, 2020 · 1 comment

Comments


rafaleo commented Mar 16, 2020

I'm training the attention model on different data, and I've run into a strange error after several epochs of training:

Using TensorFlow backend.
num samples: 29024
input seq: 29024
Found 5000 unique input tokens.
target seq: 29024 | inp: 29024
Found 5000 unique output tokens.
encoder_data.shape: (29024, 11)
encoder_data[0]: [ 0  0  0  0  0  0  0  0  0  0 43]
decoder_data[0]: [  3 266   1   0   0   0   0   0   0   0   0   0   0   0]
decoder_data.shape: (29024, 14)
Loading word vectors...
Found 400000 word vectors.
Filling pre-trained embeddings...
OUTPUT size: (29024, 14, 5001)
2020-03-16 09:37:03.339535: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
C:\Users\cp\Anaconda3\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Train on 23219 samples, validate on 5805 samples
Epoch 1/50
23219/23219 [==============================] - 1056s 45ms/step - loss: 1.6740 - acc: 0.4022 - val_loss: 2.0796 - val_acc: 0.3890

Epoch 00001: val_loss improved from inf to 2.07957, saving model to ./large_files/weights/engpol-30k-epoch.01-loss.2.08.hdf5
Epoch 2/50
23219/23219 [==============================] - 1019s 44ms/step - loss: 1.2243 - acc: 0.5152 - val_loss: 1.8456 - val_acc: 0.4375

Epoch 00002: val_loss improved from 2.07957 to 1.84557, saving model to ./large_files/weights/engpol-30k-epoch.02-loss.1.85.hdf5
Epoch 3/50
23219/23219 [==============================] - 1051s 45ms/step - loss: 0.9595 - acc: 0.5739 - val_loss: 1.7147 - val_acc: 0.4640

Epoch 00003: val_loss improved from 1.84557 to 1.71466, saving model to ./large_files/weights/engpol-30k-epoch.03-loss.1.71.hdf5
Epoch 4/50
23219/23219 [==============================] - 1099s 47ms/step - loss: 0.7664 - acc: 0.6238 - val_loss: 1.6391 - val_acc: 0.4823

Epoch 00004: val_loss improved from 1.71466 to 1.63908, saving model to ./large_files/weights/engpol-30k-epoch.04-loss.1.64.hdf5
Epoch 5/50
23219/23219 [==============================] - 1021s 44ms/step - loss: 0.6217 - acc: 0.6725 - val_loss: 1.6114 - val_acc: 0.4919

Epoch 00005: val_loss improved from 1.63908 to 1.61137, saving model to ./large_files/weights/engpol-30k-epoch.05-loss.1.61.hdf5
Epoch 6/50
23219/23219 [==============================] - 1021s 44ms/step - loss: 0.5111 - acc: 0.7154 - val_loss: 1.6024 - val_acc: 0.5002

Epoch 00006: val_loss improved from 1.61137 to 1.60242, saving model to ./large_files/weights/engpol-30k-epoch.06-loss.1.60.hdf5
Epoch 7/50
23219/23219 [==============================] - 1034s 45ms/step - loss: nan - acc: 0.4895 - val_loss: nan - val_acc: 0.0000e+00
C:\Users\cp\Anaconda3\lib\site-packages\keras\callbacks\callbacks.py:709: RuntimeWarning: invalid value encountered in less
  if self.monitor_op(current, self.best):

Epoch 00007: val_loss did not improve from 1.60242
Epoch 8/50
 9796/23219 [===========>..................] - ETA: 11:31 - loss: nan - acc: 0.0000e+00
Traceback (most recent call last)
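The warning in the title comes from the checkpoint callback itself: with a `val_loss` monitor, `self.monitor_op` is `np.less`, and comparing a NaN value against the best loss so far can emit this RuntimeWarning depending on the NumPy build. The comparison is False either way, so the warning is only a symptom of the loss having become NaN. A minimal sketch of that comparison:

```python
import numpy as np

# What ModelCheckpoint does internally once val_loss is NaN: monitor_op is
# np.less for a "val_loss" monitor, and comparing NaN with a finite float
# may emit "RuntimeWarning: invalid value encountered in less" depending on
# the NumPy build. The result is False either way, so the checkpoint is
# simply skipped; the warning itself is harmless.
current, best = float("nan"), 1.60242
print(np.less(current, best))  # -> False (possibly with the RuntimeWarning)
```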

I've checked the input data and it seems fine: no missing values. What could cause this issue during training? How can I monitor what went wrong? Is it possible that some value goes to infinity (given the current format of the matrix data)? The problem apparently always occurs when the validation loss is close to converging.
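One way to monitor this is sketched below, assuming standalone Keras 2.x with the TensorFlow backend (as in the log) and placeholder names for the model and arrays rather than the course's exact code: check the arrays for non-finite values before calling fit(), clip gradient norms so a late large update cannot blow the weights up to inf/NaN, and add TerminateOnNaN so the run stops as soon as the loss diverges. A loss that turns NaN only after the model has nearly converged often points to exploding gradients or a log(0) inside the cross-entropy rather than to bad input data.

```python
import numpy as np
from keras.callbacks import TerminateOnNaN, ModelCheckpoint
from keras.optimizers import Adam

def train_with_nan_guards(model, inputs, targets, weights_path):
    """Re-train `model` with basic NaN guards.

    `model`, `inputs` (the list of arrays passed to fit), `targets` and
    `weights_path` are placeholders for whatever the training script built.
    """
    # 1) Make sure nothing non-finite is being fed to fit().
    for i, arr in enumerate(list(inputs) + [targets]):
        assert np.isfinite(arr).all(), "array %d contains NaN or inf" % i

    # 2) Clip gradient norms and use a modest learning rate so a single
    #    large update cannot push the weights to inf once the loss is low.
    model.compile(optimizer=Adam(lr=1e-4, clipnorm=1.0),
                  loss="categorical_crossentropy",
                  metrics=["acc"])

    # 3) Keep checkpointing on val_loss, but stop the run the moment the
    #    loss turns NaN instead of burning more epochs.
    callbacks = [
        TerminateOnNaN(),
        ModelCheckpoint(weights_path, monitor="val_loss", save_best_only=True),
    ]
    return model.fit(inputs, targets,
                     validation_split=0.2, epochs=50, callbacks=callbacks)
```

If the inputs check out, lowering the learning rate or tightening clipnorm once the loss has plateaued is often enough to keep it from diverging.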

@lazyprogrammer (Owner) commented

Please use the course Q&A for course-related questions.
