REG_Loss is nan when training SiamRPN #46

Closed
wsuny opened this issue Oct 21, 2019 · 8 comments

Comments

@wsuny

wsuny commented Oct 21, 2019

When training SiamRPN on the VID and GOT10K datasets, REG_Loss becomes nan. It was fine at the beginning of the epoch.

The training log is as follows:

PROGRESS: 0.18%

Epoch: [1][180/3125] lr : 0.0010000 Batch Time: 0.283 Data Time:0.012 CLS_Loss:0.45886 REG_Loss:2.81824 Loss:3.27710
Progress: 180 / 93750 [0%], Speed: 0.283 s/iter, ETA 0:07:21 (D:H:M)

PROGRESS: 0.19%

Epoch: [1][190/3125] lr : 0.0010000 Batch Time: 0.282 Data Time:0.012 CLS_Loss:0.45153 REG_Loss:2.76308 Loss:3.21461
Progress: 190 / 93750 [0%], Speed: 0.282 s/iter, ETA 0:07:19 (D:H:M)

PROGRESS: 0.20%

Epoch: [1][200/3125] lr : 0.0010000 Batch Time: 0.281 Data Time:0.011 CLS_Loss:0.44463 REG_Loss:nan Loss:nan
Progress: 200 / 93750 [0%], Speed: 0.281 s/iter, ETA 0:07:17 (D:H:M)

PROGRESS: 0.21%

Epoch: [1][210/3125] lr : 0.0010000 Batch Time: 0.280 Data Time:0.011 CLS_Loss:0.43784 REG_Loss:nan Loss:nan
Progress: 210 / 93750 [0%], Speed: 0.280 s/iter, ETA 0:07:17 (D:H:M)

/home/xgd/sunyang/SiamDW/siamese_tracking/../lib/dataset/siamrpn.py:291: RuntimeWarning: invalid value encountered in log
delta[2] = np.log(tw / (w + eps) + eps)
/home/xgd/sunyang/SiamDW/siamese_tracking/../lib/dataset/siamrpn.py:292: RuntimeWarning: invalid value encountered in log
delta[3] = np.log(th / (h + eps) + eps)
/home/xgd/sunyang/SiamDW/siamese_tracking/../lib/dataset/siamrpn.py:291: RuntimeWarning: invalid value encountered in log
delta[2] = np.log(tw / (w + eps) + eps)
/home/xgd/sunyang/SiamDW/siamese_tracking/../lib/dataset/siamrpn.py:292: RuntimeWarning: invalid value encountered in log
delta[3] = np.log(th / (h + eps) + eps)
Warning: NaN or Inf found in input tensor.
Warning: NaN or Inf found in input tensor.

It seems that the gradient exploded, and I cannot figure out why. Can you help me?

@JudasDie
Contributor

Hi, thanks for your interest. I haven't run into this problem so far. It seems the input to the log function is 0 or a negative number. You could add some code to avoid it, e.g. if w == 0, set delta[2] = 0.
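
To narrow down which inputs can trigger the warning, here is a minimal sketch of the expression from the traceback, using only the standard library. The value of `eps` and the helper names `log_argument` and `encode_size` are assumptions for illustration; the thread does not show the real `eps` or the surrounding code. NumPy reports "invalid value encountered in log" for a negative argument, so a zero target size alone would not produce it as long as `eps` is positive.

```python
import math

EPS = 1e-6  # assumed value; the thread does not show the real eps


def log_argument(target_size, anchor_size, eps=EPS):
    """Argument passed to log in the line quoted in the warning."""
    return target_size / (anchor_size + eps) + eps


def encode_size(target_size, anchor_size, eps=EPS):
    """Return the log-ratio, or None when the argument is not positive."""
    arg = log_argument(target_size, anchor_size, eps)
    if arg <= 0:
        return None  # np.log would give nan (negative) or -inf (zero) here
    return math.log(arg)


zero_box = encode_size(0.0, 64.0)       # argument equals eps, still positive
negative_box = encode_size(-3.0, 64.0)  # argument is negative
```

With these assumed values, the zero-size case stays finite and only the negative size is rejected, which suggests looking for negative widths or heights in the annotations.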

@wsuny
Author

wsuny commented Oct 23, 2019

Now it is OK when I train SiamRPN only on VID, but it is still NaN when I train on GOT10K. It seems to be caused by the GOT10K annotations. Did you train SiamRPN on GOT10K?
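
One way to test the annotation hypothesis is to scan the ground-truth boxes for non-positive sizes. The sketch below assumes each annotation line has the form `x,y,w,h` (comma-separated), which may not match the files you actually load; `find_bad_boxes` is a hypothetical helper, not part of the repository.

```python
def find_bad_boxes(lines):
    """Return (line_index, w, h) for every box whose width or height is <= 0.

    Assumes each non-empty line is "x,y,w,h"; adjust the parsing if your
    annotation files use a different layout.
    """
    bad = []
    for index, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x, y, w, h = (float(v) for v in line.split(","))
        if w <= 0 or h <= 0:
            bad.append((index, w, h))
    return bad


# Made-up sample lines, for illustration only.
sample = ["10,20,30,40", "5,5,0,12", "7,8,-2.5,9", ""]
bad = find_bad_boxes(sample)
```

To check a real file, pass the result of `open(path).readlines()` for each sequence's annotation file.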

@JudasDie
Contributor

Oh, I remember that the GOT10K labels contain 0 (a bug in their dataset). You only need to add the aforementioned code to the dataloader.

        # For GOT10K: clamp negative target sizes so the log argument cannot go negative.
        if float(tw) < 0: tw = 0
        if float(th) < 0: th = 0
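
A self-contained sketch of how the clamp combines with the log encoding from the warning is given below. `encode_wh` and the default `eps` are assumptions for illustration; the repository's actual function and `eps` value are not shown in this thread.

```python
import math


def encode_wh(tw, th, w, h, eps=1e-6):
    """Encode target size (tw, th) relative to anchor size (w, h).

    eps is an assumed value. Negative target sizes are clamped to 0
    first, mirroring the GOT10K workaround above.
    """
    tw = max(float(tw), 0.0)
    th = max(float(th), 0.0)
    dw = math.log(tw / (w + eps) + eps)
    dh = math.log(th / (h + eps) + eps)
    return dw, dh


# A box with a negative width, as might appear in a faulty annotation.
dw, dh = encode_wh(-4.0, 32.0, 64.0, 32.0)
```

Note that a clamped size encodes to `log(eps)`, a large negative regression target rather than nan. If that distorts training, skipping such samples in the dataloader may be a better choice than clamping.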

@hwpengms
Contributor

hwpengms commented Oct 23, 2019 via email

@JudasDie
Contributor

Hi, have you fixed this problem?

@wsuny
Author

wsuny commented Nov 19, 2019 via email

@JudasDie
Contributor

I have fixed it, thank you!

Glad to hear that. You can email me (zhangzhipeng2017@ia.ac.cn) for further discussion.

@bluoluo

bluoluo commented Mar 28, 2020

The GOT-10K dataset does have bugs.
Here are instructions.
They can be worked around temporarily with this script.
