REG_Loss is nan when training SiamRPN #46

wsuny · 2019-10-21T13:47:57Z

When training SiamRPN on the VID and GOT10K datasets, I run into a problem: REG_Loss becomes nan. It was fine at the beginning of the epoch.

The training output is as follows:

PROGRESS: 0.18%

Epoch: [1][180/3125] lr : 0.0010000 Batch Time: 0.283 Data Time:0.012 CLS_Loss:0.45886 REG_Loss:2.81824 Loss:3.27710
Progress: 180 / 93750 [0%], Speed: 0.283 s/iter, ETA 0:07:21 (D:H:M)

PROGRESS: 0.19%

Epoch: [1][190/3125] lr : 0.0010000 Batch Time: 0.282 Data Time:0.012 CLS_Loss:0.45153 REG_Loss:2.76308 Loss:3.21461
Progress: 190 / 93750 [0%], Speed: 0.282 s/iter, ETA 0:07:19 (D:H:M)

PROGRESS: 0.20%

Epoch: [1][200/3125] lr : 0.0010000 Batch Time: 0.281 Data Time:0.011 CLS_Loss:0.44463 REG_Loss:nan Loss:nan
Progress: 200 / 93750 [0%], Speed: 0.281 s/iter, ETA 0:07:17 (D:H:M)

PROGRESS: 0.21%

Epoch: [1][210/3125] lr : 0.0010000 Batch Time: 0.280 Data Time:0.011 CLS_Loss:0.43784 REG_Loss:nan Loss:nan
Progress: 210 / 93750 [0%], Speed: 0.280 s/iter, ETA 0:07:17 (D:H:M)

/home/xgd/sunyang/SiamDW/siamese_tracking/../lib/dataset/siamrpn.py:291: RuntimeWarning: invalid value encountered in log
delta[2] = np.log(tw / (w + eps) + eps)
/home/xgd/sunyang/SiamDW/siamese_tracking/../lib/dataset/siamrpn.py:292: RuntimeWarning: invalid value encountered in log
delta[3] = np.log(th / (h + eps) + eps)
/home/xgd/sunyang/SiamDW/siamese_tracking/../lib/dataset/siamrpn.py:291: RuntimeWarning: invalid value encountered in log
delta[2] = np.log(tw / (w + eps) + eps)
/home/xgd/sunyang/SiamDW/siamese_tracking/../lib/dataset/siamrpn.py:292: RuntimeWarning: invalid value encountered in log
delta[3] = np.log(th / (h + eps) + eps)
Warning: NaN or Inf found in input tensor.
Warning: NaN or Inf found in input tensor.

It seems that the gradient exploded, and I cannot figure out why. Can you help me?

JudasDie · 2019-10-23T08:17:40Z

Hi~ Thanks for your interest. I haven't run into this problem so far. It seems that the input to the log function is 0 or a negative number. You could add some code to avoid it, e.g. if w == 0, set delta[2] = 0.
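A guard along these lines might look like the sketch below. It is not the repository's code: `safe_log_ratio` is a hypothetical helper, the value of `EPS` is assumed, and it uses the standard-library `math.log` instead of `np.log` to stay self-contained. Falling back to 0 for a non-positive argument is just the choice proposed in this comment.

```python
import math

EPS = 1e-6  # assumed value; the eps used in siamrpn.py may differ


def safe_log_ratio(target_size, anchor_size, eps=EPS):
    """Return log(target_size / (anchor_size + eps) + eps), or 0.0 if that argument is <= 0.

    Hypothetical helper illustrating the guard; not part of the SiamDW repo.
    """
    ratio = target_size / (anchor_size + eps) + eps
    if ratio <= 0:
        # A zero or negative argument has no real logarithm; use a neutral delta.
        return 0.0
    return math.log(ratio)


# A valid box gives the usual regression target ...
print(safe_log_ratio(64.0, 32.0))
# ... while a negative width (as in a broken annotation) falls back to 0.0.
print(safe_log_ratio(-5.0, 32.0))
```

Note that a width of exactly 0 still passes the check here, because the added eps keeps the argument slightly positive; the result is then a large negative but finite delta.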

wsuny · 2019-10-23T09:10:30Z

Now it is OK when I train SiamRPN only on VID, but REG_Loss is still NaN when I train on GOT10K. It seems to be caused by the GOT10K annotations. Did you train SiamRPN on GOT10K?

JudasDie · 2019-10-23T09:18:40Z

Oh, I remember that the GOT10K labels contain 0 values (a bug in their dataset). You only need to add a guard like the one mentioned above to the dataloader:

# for got10k
        if float(tw) < 0: tw = 0
        if float(th) < 0: th = 0

hwpengms · 2019-10-23T10:27:39Z

Zhipeng, could you help fix this bug in the repo? Thanks, Houwen

JudasDie · 2019-11-19T14:16:21Z

Hi, have you fixed this problem?

wsuny · 2019-11-19T16:11:54Z

I have fixed it, thank you!

JudasDie · 2019-11-20T01:29:05Z

Glad to hear that. You can email me (zhangzhipeng2017@ia.ac.cn) to discuss further.

bluoluo · 2020-03-28T11:45:26Z

The GOT-10K dataset does have bugs.
Here are instructions.
It can be worked around temporarily with this script.
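One way to see which annotations are affected is to scan the ground-truth boxes for non-positive widths or heights before training. The sketch below assumes each annotation line has the form `x,y,w,h` (comma-separated, as I believe GOT-10k's per-sequence ground-truth files are); `find_bad_boxes` is a hypothetical helper for illustration, not the script referred to above.

```python
def find_bad_boxes(lines):
    """Return (line_index, (x, y, w, h)) for every box whose width or height is <= 0.

    Assumes comma-separated "x,y,w,h" lines; hypothetical helper for illustration.
    """
    bad = []
    for idx, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x, y, w, h = (float(v) for v in line.split(","))
        if w <= 0 or h <= 0:
            bad.append((idx, (x, y, w, h)))
    return bad


# Made-up example annotations, not taken from the dataset.
sample = ["10,20,30,40", "5,5,0,12", "7,8,-3,9"]
for idx, box in find_bad_boxes(sample):
    print(f"line {idx}: degenerate box {box}")
```

Frames flagged this way could then be skipped or clamped in the dataloader, depending on what suits the training setup.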

JudasDie closed this as completed Nov 20, 2019
