COM prediction values are NaN #128

yuan0821 · 2022-10-15T10:26:50Z

Hi! The predicted COM values in com3d0.mat are all NaN. The parameters were set as shown below. Is there a problem with how the video data is loaded?
With the dannce demo video data the result was fine, but with my own 30 s of data the result is NaN. Could anyone help me check what the problem is? Thank you so much!

`

(dannce113) F:\testdannce120\dannce\demo\new919>com-train .\com_config_919.yaml
2022-10-15 18:13:11.561307: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
downfac not found in io.yaml file, falling back to main config
extension not found in io.yaml file, falling back to main config
io_config not found in io.yaml file, falling back to main config
crop_height not found in io.yaml file, falling back to main config
crop_width not found in io.yaml file, falling back to main config
n_channels_in not found in io.yaml file, falling back to main config
camnames not found in io.yaml file, falling back to main config
n_views not found in io.yaml file, falling back to main config
n_channels_out not found in io.yaml file, falling back to main config
batch_size not found in io.yaml file, falling back to main config
sigma not found in io.yaml file, falling back to main config
epochs not found in io.yaml file, falling back to main config
verbose not found in io.yaml file, falling back to main config
loss not found in io.yaml file, falling back to main config
lr not found in io.yaml file, falling back to main config
net not found in io.yaml file, falling back to main config
vid_dir_flag not found in io.yaml file, falling back to main config
metric not found in io.yaml file, falling back to main config
num_validation_per_exp not found in io.yaml file, falling back to main config
debug not found in io.yaml file, falling back to main config
max_num_samples not found in io.yaml file, falling back to main config
train_mode not found in io.yaml file, falling back to main config
com_finetune_weights not found in io.yaml file, falling back to main config
com_train_dir set to: .\COM\train_results
com_predict_dir set to: .\COM\predict_results
dannce_train_dir set to: .\DANNCE\train_results\AVG
dannce_predict_dir set to: .\DANNCE\predict_results
exp set to: [{'label3d_file': './20221015_173137_Label3D_dannce.mat'}]
downfac set to: 4
extension set to: .avi
io_config set to: io.yaml
crop_height set to: [0, 1152]
crop_width set to: [0, 1920]
n_channels_in set to: 1
camnames set to: ['Camera1', 'Camera2', 'Camera3']
n_views set to: 3
n_channels_out set to: 1
batch_size set to: 2
sigma set to: 18
epochs set to: 10
verbose set to: 1
loss set to: mask_nan_keep_loss
lr set to: 5e-5
net set to: unet2d_fullbn
vid_dir_flag set to: False
metric set to: mse
num_validation_per_exp set to: 10
debug set to: False
max_num_samples set to: 100
train_mode set to: finetune
com_finetune_weights set to: ..\markerless_mouse_1\COM\weights
base_config set to: .\com_config_919.yaml
viddir set to: videos
gpu_id set to: 0
immode set to: vid
mono set to: False
mirror set to: False
num_train_per_exp set to: None
augment_hue set to: False
augment_brightness set to: False
augment_hue_val set to: 0.05
augment_bright_val set to: 0.05
augment_rotation_val set to: 5
data_split_seed set to: None
valid_exp set to: None
dsmode set to: nn
augment_shift set to: False
augment_zoom set to: False
augment_shear set to: False
augment_rotation set to: False
augment_shear_val set to: 5
augment_zoom_val set to: 0.05
augment_shift_val set to: 0.05
start_batch set to: 0
chunks set to: None
lockfirst set to: None
load_valid set to: None
drop_landmark set to: None
raw_im_h set to: None
raw_im_w set to: None
n_instances set to: 1
start_sample set to: 0
write_npy set to: None
use_npy set to: False
com_predict_weights set to: None
com_debug set to: None
com_exp set to: None
Setting vid_dir_flag to True.
Setting extension to .avi.
Setting chunks to {'Camera1': array([0]), 'Camera2': array([0]), 'Camera3': array([0])}.
Setting n_channels_in to 3.
Setting raw_im_h to 2560.
Setting raw_im_w to 2560.
Experiment 0 using videos in .\videos
Experiment 0 using camnames: ['Camera1', 'Camera2', 'Camera3']
{'0_Camera1': array([0]), '0_Camera2': array([0]), '0_Camera3': array([0])}
./20221015_173137_Label3D_dannce.mat
Using nn downsampling
TRAIN EXPTS: [0]
Initializing Network...
2022-10-15 18:13:14.584720: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2022-10-15 18:13:14.646942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2022-10-15 18:13:14.654599: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2022-10-15 18:13:14.654802: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2022-10-15 18:13:14.655261: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2022-10-15 18:13:14.655673: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2022-10-15 18:13:14.718061: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2022-10-15 18:13:14.718307: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2022-10-15 18:13:14.719724: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2022-10-15 18:13:14.720154: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2022-10-15 18:13:14.720614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-10-15 18:13:14.732399: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-15 18:13:14.736139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2022-10-15 18:13:14.736483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-10-15 18:13:17.018957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-10-15 18:13:17.022789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2022-10-15 18:13:17.050138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2022-10-15 18:13:17.083234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7433 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:41:00.0, compute capability: 8.6)
E:\anaconda\envs\dannce113\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:375: UserWarning: The lr argument is deprecated, use learning_rate instead.
"The lr argument is deprecated, use learning_rate instead.")
COMPLETE

2022-10-15 18:13:18.002074: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing.
2022-10-15 18:13:18.002236: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started.
2022-10-15 18:13:18.002834: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1611] Profiler found 1 GPUs
2022-10-15 18:13:18.004376: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cupti64_112.dll'; dlerror: cupti64_112.dll not found
2022-10-15 18:13:18.005366: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cupti.dll'; dlerror: cupti.dll not found
2022-10-15 18:13:18.005806: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1661] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
2022-10-15 18:13:18.005906: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session tear down.
2022-10-15 18:13:18.005966: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1752] function cupti_interface_->Finalize()failed with error CUPTI could not be loaded or symbol could not be found.
Loading data
Loading new video: .\videos\Camera1\0.avi for 0_Camera1
Loading new video: .\videos\Camera2\0.avi for 0_Camera2
Loading new video: .\videos\Camera3\0.avi for 0_Camera3
f:\testdannce120\dannce\dannce\engine\generator_aux.py:261: RuntimeWarning: invalid value encountered in true_divide
y /= np.max(np.max(y, axis=1), axis=1)[:, np.newaxis, np.newaxis, :]
Loading new video: .\videos\Camera1\0.avi for 0_Camera1
Loading new video: .\videos\Camera2\0.avi for 0_Camera2
Loading new video: .\videos\Camera3\0.avi for 0_Camera3
2022-10-15 18:13:34.113113: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/10
2022-10-15 18:13:36.067056: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2022-10-15 18:13:39.639960: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8302
2022-10-15 18:13:44.926827: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2022-10-15 18:13:44.927055: W tensorflow/stream_executor/gpu/asm_compiler.cc:56] Couldn't invoke ptxas.exe --version
2022-10-15 18:13:44.931278: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2022-10-15 18:13:44.941195: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2022-10-15 18:13:45.323821: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2022-10-15 18:13:45.324555: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
1/1 [==============================] - 16s 16s/step - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 2/10
2022-10-15 18:13:50.695089: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing.
2022-10-15 18:13:50.696455: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started.
2022-10-15 18:13:50.697011: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1661] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
1/1 [==============================] - ETA: 0s - loss: 0.0000e+002022-10-15 18:13:50.843437: I tensorflow/core/profiler/lib/profiler_session.cc:66] Profiler session collecting data.
2022-10-15 18:13:50.843756: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1752] function cupti_interface_->Finalize()failed with error CUPTI could not be loaded or symbol could not be found.
2022-10-15 18:13:50.893555: I tensorflow/core/profiler/internal/gpu/cupti_collector.cc:673] GpuTracer has collected 0 callback api events and 0 activity events.
2022-10-15 18:13:50.922415: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session tear down.
1/1 [==============================] - 1s 644ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 3/10
1/1 [==============================] - 0s 364ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 4/10
1/1 [==============================] - 0s 365ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 5/10
1/1 [==============================] - 0s 350ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 6/10
1/1 [==============================] - 0s 365ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 7/10
1/1 [==============================] - 0s 380ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 8/10
1/1 [==============================] - 0s 367ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 9/10
1/1 [==============================] - 0s 350ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 10/10
1/1 [==============================] - 0s 367ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00
Renaming weights file with best epoch description
Saving full model at end of training
`
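
A note on the `RuntimeWarning: invalid value encountered in true_divide` in the training log above: it is raised by the max-normalization line in generator_aux.py. Below is a minimal NumPy sketch of how that kind of line can produce NaN. If a target map is entirely zero, its max is 0 and the division becomes 0/0. The array shape, the helper names `normalize_targets` and `normalize_targets_guarded`, and the guarded variant are assumptions for illustration, not dannce code, and the log alone does not establish why the targets would be empty for this dataset.

```python
import numpy as np


def normalize_targets(y):
    """Per-sample, per-channel max normalization, mirroring the line flagged in the log."""
    y = y.astype(np.float64).copy()
    # Silence the warning here only so the demo output stays readable.
    with np.errstate(invalid="ignore", divide="ignore"):
        y /= np.max(np.max(y, axis=1), axis=1)[:, np.newaxis, np.newaxis, :]
    return y


def normalize_targets_guarded(y):
    """Hypothetical variant: leave all-zero maps at zero instead of computing 0/0."""
    y = y.astype(np.float64).copy()
    peak = np.max(np.max(y, axis=1), axis=1)[:, np.newaxis, np.newaxis, :]
    # Divide only where the peak is positive; other entries keep their value (0).
    np.divide(y, peak, out=y, where=peak > 0)
    return y


# Two 4x4 single-channel targets (batch, height, width, channels):
# sample 0 has a peak, sample 1 is entirely zero.
targets = np.zeros((2, 4, 4, 1))
targets[0, 1, 2, 0] = 0.5

plain = normalize_targets(targets)
guarded = normalize_targets_guarded(targets)

print("sample 0 has NaN:", np.isnan(plain[0]).any())
print("sample 1 all NaN:", np.isnan(plain[1]).all())
print("guarded has NaN:", np.isnan(guarded).any())
```

If the real targets behave like the second sample here, it may be worth checking that the labeled COM points fall inside the cropped frame (`crop_height`, `crop_width`) before looking at the network itself.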


yuan0821 · 2022-10-15T10:29:58Z

Below is my com-predict log.

(dannce113) F:\testdannce120\dannce\demo\new919>com-predict .\com_config_919.yaml
2022-10-15 18:14:07.426269: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
downfac not found in io.yaml file, falling back to main config
extension not found in io.yaml file, falling back to main config
io_config not found in io.yaml file, falling back to main config
crop_height not found in io.yaml file, falling back to main config
crop_width not found in io.yaml file, falling back to main config
n_channels_in not found in io.yaml file, falling back to main config
camnames not found in io.yaml file, falling back to main config
n_views not found in io.yaml file, falling back to main config
n_channels_out not found in io.yaml file, falling back to main config
batch_size not found in io.yaml file, falling back to main config
sigma not found in io.yaml file, falling back to main config
epochs not found in io.yaml file, falling back to main config
verbose not found in io.yaml file, falling back to main config
loss not found in io.yaml file, falling back to main config
lr not found in io.yaml file, falling back to main config
net not found in io.yaml file, falling back to main config
vid_dir_flag not found in io.yaml file, falling back to main config
metric not found in io.yaml file, falling back to main config
num_validation_per_exp not found in io.yaml file, falling back to main config
debug not found in io.yaml file, falling back to main config
max_num_samples not found in io.yaml file, falling back to main config
train_mode not found in io.yaml file, falling back to main config
com_finetune_weights not found in io.yaml file, falling back to main config
com_train_dir set to: .\COM\train_results\
com_predict_dir set to: .\COM\predict_results\
dannce_train_dir set to: .\DANNCE\train_results\AVG\
dannce_predict_dir set to: .\DANNCE\predict_results\
exp set to: [{'label3d_file': './20221015_173137_Label3D_dannce.mat'}]
downfac set to: 4
extension set to: .avi
io_config set to: io.yaml
crop_height set to: [0, 1152]
crop_width set to: [0, 1920]
n_channels_in set to: 1
camnames set to: ['Camera1', 'Camera2', 'Camera3']
n_views set to: 3
n_channels_out set to: 1
batch_size set to: 2
sigma set to: 18
epochs set to: 10
verbose set to: 1
loss set to: mask_nan_keep_loss
lr set to: 5e-5
net set to: unet2d_fullbn
vid_dir_flag set to: False
metric set to: mse
num_validation_per_exp set to: 10
debug set to: False
max_num_samples set to: 100
train_mode set to: finetune
com_finetune_weights set to: ..\markerless_mouse_1\COM\weights\
base_config set to: .\com_config_919.yaml
viddir set to: videos
gpu_id set to: 0
immode set to: vid
mono set to: False
mirror set to: False
start_batch set to: 0
start_sample set to: 0
dsmode set to: nn
com_predict_weights set to: None
num_train_per_exp set to: None
chunks set to: None
lockfirst set to: None
load_valid set to: None
augment_hue set to: False
augment_brightness set to: False
augment_hue_val set to: 0.05
augment_bright_val set to: 0.05
augment_rotation_val set to: 5
drop_landmark set to: None
raw_im_h set to: None
raw_im_w set to: None
n_instances set to: 1
write_npy set to: None
use_npy set to: False
data_split_seed set to: None
valid_exp set to: None
com_debug set to: None
com_exp set to: None
augment_rotation set to: False
augment_shear set to: False
augment_zoom set to: False
augment_shift set to: False
augment_shear_val set to: 5
augment_zoom_val set to: 0.05
augment_shift_val set to: 0.05
Setting vid_dir_flag to True.
Setting extension to .avi.
Setting chunks to {'Camera1': array([0]), 'Camera2': array([0]), 'Camera3': array([0])}.
Setting n_channels_in to 3.
Setting raw_im_h to 2560.
Setting raw_im_w to 2560.
Using the following *dannce.mat files: .\20221015_173137_Label3D_dannce.mat
Using camnames: ['Camera1', 'Camera2', 'Camera3']
Initializing Network...
2022-10-15 18:14:10.325801: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2022-10-15 18:14:10.349421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2022-10-15 18:14:10.349690: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2022-10-15 18:14:10.352770: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2022-10-15 18:14:10.353266: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2022-10-15 18:14:10.354040: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2022-10-15 18:14:10.355596: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2022-10-15 18:14:10.355709: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2022-10-15 18:14:10.356146: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2022-10-15 18:14:10.356725: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2022-10-15 18:14:10.357199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-10-15 18:14:10.357985: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-15 18:14:10.361644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2022-10-15 18:14:10.361924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-10-15 18:14:10.754055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-10-15 18:14:10.754230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2022-10-15 18:14:10.755581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2022-10-15 18:14:10.756272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7433 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:41:00.0, compute capability: 8.6)
E:\anaconda\envs\dannce113\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:375: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
"The `lr` argument is deprecated, use `learning_rate` instead.")
Loading weights from .\COM\train_results\weights.0-0.00000.hdf5
COMPLETE

Predicting on sample 0
Loading new video: videos\Camera1\0.avi for Camera1
Loading new video: videos\Camera2\0.avi for Camera2
Loading new video: videos\Camera3\0.avi for Camera3
2022-10-15 18:14:12.697986: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-10-15 18:14:12.947341: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2022-10-15 18:14:13.646958: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8302
2022-10-15 18:14:14.571302: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2022-10-15 18:14:14.571485: W tensorflow/stream_executor/gpu/asm_compiler.cc:56] Couldn't invoke ptxas.exe --version
2022-10-15 18:14:14.575616: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2022-10-15 18:14:14.576118: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2022-10-15 18:14:14.605683: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2022-10-15 18:14:14.606093: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
Predicting on sample 1
Predicting on sample 2
Predicting on sample 3
Predicting on sample 4
Predicting on sample 5
Predicting on sample 6
Predicting on sample 7
Predicting on sample 8
Predicting on sample 9
Predicting on sample 10
Predicting on sample 11
Predicting on sample 12
Predicting on sample 13
Predicting on sample 14
Predicting on sample 15
Predicting on sample 16
Predicting on sample 17
Predicting on sample 18
Predicting on sample 19
Predicting on sample 20
Predicting on sample 21
Predicting on sample 22
Predicting on sample 23
Predicting on sample 24
Predicting on sample 25
Predicting on sample 26
Predicting on sample 27
Predicting on sample 28
Predicting on sample 29
Predicting on sample 30
Predicting on sample 31
Predicting on sample 32
Predicting on sample 33
Predicting on sample 34
Predicting on sample 35
Predicting on sample 36
Predicting on sample 37
Predicting on sample 38
Predicting on sample 39
Predicting on sample 40
Predicting on sample 41
Predicting on sample 42
Predicting on sample 43
Predicting on sample 44
Predicting on sample 45
Predicting on sample 46
Predicting on sample 47
Predicting on sample 48
Predicting on sample 49
Predicting on sample 50
Predicting on sample 51
Predicting on sample 52
Predicting on sample 53
Predicting on sample 54
Predicting on sample 55
Predicting on sample 56
Predicting on sample 57
Predicting on sample 58
Predicting on sample 59
Predicting on sample 60
Predicting on sample 61
Predicting on sample 62
Predicting on sample 63
Predicting on sample 64
Predicting on sample 65
Predicting on sample 66
Predicting on sample 67
Predicting on sample 68
Predicting on sample 69
Predicting on sample 70
Predicting on sample 71
Predicting on sample 72
Predicting on sample 73
Predicting on sample 74
Predicting on sample 75
Predicting on sample 76
Predicting on sample 77
Predicting on sample 78
Predicting on sample 79
Predicting on sample 80
Predicting on sample 81
Predicting on sample 82
Predicting on sample 83
Predicting on sample 84
Predicting on sample 85
Predicting on sample 86
Predicting on sample 87
Predicting on sample 88
Predicting on sample 89
Predicting on sample 90
Predicting on sample 91
Predicting on sample 92
Predicting on sample 93
Predicting on sample 94
Predicting on sample 95
Predicting on sample 96
Predicting on sample 97
Predicting on sample 98
Predicting on sample 99
using median to get 3D COM
E:\anaconda\envs\dannce113\lib\site-packages\numpy\lib\nanfunctions.py:1114: RuntimeWarning: All-NaN slice encountered
overwrite_input=overwrite_input)
Saving 3D COM to .\COM\predict_results\com3d0.mat
done!
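
The `All-NaN slice encountered` warning near the end of the predict log is what `np.nanmedian` emits when every value it is asked to reduce is NaN. In that case the median itself is NaN, which is consistent with an all-NaN com3d0.mat. The sketch below reproduces the behavior on a small made-up array. The layout (samples, coordinates, camera pairs) and the variable names are assumptions for illustration and are not taken from the dannce source.

```python
import warnings

import numpy as np

# Hypothetical layout: (n_samples, 3 coordinates, n_camera_pairs)
pairwise = np.full((3, 3, 3), np.nan)
# Sample 0: every camera pair produced an estimate.
pairwise[0] = [[1.0, 1.2, 1.1], [5.0, 5.1, 4.9], [9.0, 9.0, 9.3]]
# Sample 1: only the first camera pair produced an estimate.
pairwise[1, :, 0] = [2.0, 6.0, 10.0]
# Sample 2: no pair produced an estimate, so it stays all NaN.

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # NaN entries are ignored; a slice with nothing left yields NaN.
    com3d = np.nanmedian(pairwise, axis=2)

nan_rows = np.isnan(com3d).all(axis=1)

print(com3d)
print("samples with no valid estimate:", np.flatnonzero(nan_rows).tolist())
print("warnings:", sorted({str(w.message) for w in caught}))
```

Counting such rows in the saved output is a quick way to tell "a few frames failed" apart from "every frame failed", which point to different causes.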

histun · 2023-01-26T15:15:17Z

Have you fixed this issue?

I've used 1) the provided weights (weights.250-0.00036.hdf5) as well as 2) newly generated finetuned weights (using the provided pretrained weights.rat.COM.hdf5 and the dannce.mat). However, I always get NaN values from com-predict (99 NaNs out of 100 predictions).

I thought it might have to do with my environment settings, but 1) predicting with dannce using the provided weights (weights.12000-0.00014.hdf5) and 2) finetuning/predicting (weights.rat.MAX.6cam.hdf5 with the dannce.mat) both seem to work fine.

I'm puzzled by the problems I'm having with COM.
I was wondering if anyone has any ideas.

histun · 2023-01-30T15:04:47Z

I fixed this issue by reinstalling dannce from the development branch, which has TF 2.4.
Since I have an RTX Ada GPU, I installed CUDA 11.8 and cuDNN 8.7.0 from the NVIDIA website, following their installation guide.
After this, com-predict with the demo dataset and weights worked fine, without NaN data.
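
For anyone comparing their setup against the one described above, here is a small sketch for printing the package versions installed in the active conda environment. It uses only the Python standard library. The helper name `installed_version` and the list of package names are assumptions for illustration, and a matching version number does not by itself confirm that the CUDA and cuDNN libraries load correctly.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# TensorFlow may be installed under either distribution name.
for name in ("tensorflow", "tensorflow-gpu", "numpy"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this inside the environment used for com-predict shows which TensorFlow build is actually active there.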

v738301 mentioned this issue Nov 7, 2022

Zero training/validation errors but completely wrong in labeled images. #130

Closed

spoonsso closed this as completed May 3, 2023
