While training Donut on Synthetic Shift Outliers, an exception occurred:
gradient for model/donut/p_x_given_z/mean/dense/bias:0 has numeric issue : Tensor had NaN values
[[Node: quiet_donut_trainer_9/CheckNumerics_13 = CheckNumerics[T=DT_FLOAT, message="gradient for model/donut/p_x_given_z/mean/dense/bias:0 has numeric issue", _device="/job:localhost/replica:0/task:0/device:CPU:0"](quiet_donut_trainer_9/clip_by_norm_13/truediv)]]
It looks like the gradients become 0, so something is going wrong in the training. Setting `optimizer_params = {'epsilon': 1e-05}` in QuietDonutTrainer (increasing epsilon for the AdamOptimizer) removed the error for me. But it's worth discussing whether and how much we should tune Donut's parameters.
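The epsilon here is the denominator term of the Adam update. A scalar sketch of that update (not Donut's or TensorFlow's actual implementation) shows why a larger epsilon bounds the step size when gradients shrink toward zero:

```python
import math

def adam_step(grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # eps keeps the denominator away from zero when v_hat collapses
    return lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# For a tiny gradient, sqrt(v_hat) is about |grad|, so with eps=1e-8 the
# step is nearly the full learning rate no matter how small the gradient
# is; eps=1e-5 dominates the denominator and damps the step instead.
step_small_eps, _, _ = adam_step(grad=1e-6, m=0.0, v=0.0, t=1, eps=1e-8)
step_large_eps, _, _ = adam_step(grad=1e-6, m=0.0, v=0.0, t=1, eps=1e-5)
```

This only covers the optimizer side; the NaN itself may originate earlier (e.g. in the clip_by_norm division seen in the trace), but raising epsilon made training stable in practice.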
* Adapt Donut to missing data
* Always use 'num_epochs', replace NaNs with 0 in RNN_EBM and LSTM_Enc_Dec
* Remove use_zero to let the detectors decide what happens with NaNs
* Lower epochs of LSTM_Enc_Dec
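The NaN replacement mentioned for RNN_EBM and LSTM_Enc_Dec can be sketched as a small preprocessing helper (the function name is hypothetical, not from the codebase):

```python
import math

def fill_missing(window):
    """Replace NaN entries with 0.0 so the detector sees finite inputs."""
    return [0.0 if math.isnan(x) else x for x in window]

fill_missing([1.0, float("nan"), 3.0])  # -> [1.0, 0.0, 3.0]
```

Dropping use_zero then means each detector applies (or skips) this kind of filling itself, rather than the shared pipeline forcing one policy on all of them.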