Please kindly help us about Not convergent network #12

yaoanderson · 2019-05-17T04:50:44Z

Hi sowson,

We are running your darknet fork with OpenCL on our MacBook Pro, but training behaves strangely: the log shows Obj: 0.500000, No Obj: 0.500000 for hundreds of training iterations, and training never converges.
Our data is the dataset from this article: https://timebutt.github.io/static/how-to-train-yolov2-to-detect-custom-objects/

My yolov2.cfg is as follows:
[net]

# Testing
#batch=1
#subdivisions=1

# Training
batch=32
subdivisions=4
height=416
width=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 80200
policy=steps
steps=40000,60000
scales=.1,.1
.....
.....
[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
#anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
#anchors = 5,11, 9,19, 51,62, 104,114, 181,209, 279,376, 400,289, 357,377, 390,388
anchors = 6,14, 70,82, 176,190, 291,375, 382,377
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.3
rescore=1
....

Please help us with this issue, thanks.
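
For reference, the filters=30 value in the last [convolutional] layer is consistent with the [region] settings above: in YOLOv2 that layer needs num * (coords + 1 + classes) output channels. A minimal sketch of the arithmetic (the function name region_filters is illustrative and not part of darknet):

```python
def region_filters(num: int, classes: int, coords: int = 4) -> int:
    """Output channels needed by the conv layer that feeds a YOLOv2 [region] layer."""
    # Each of the `num` anchors predicts `coords` box values,
    # one objectness score, and `classes` class scores.
    return num * (coords + 1 + classes)


# Values taken from the cfg in this post: num=5, classes=1, coords=4.
filters = region_filters(num=5, classes=1, coords=4)
print(filters)
```

With the values from this cfg the result agrees with filters=30, so the layer width is unlikely to be the cause of the non-convergence.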

yaoanderson · 2019-05-20T13:03:05Z

Why does training not converge on the data above?

sowson · 2019-05-23T22:51:38Z

@yaoanderson thank you very much for this issue... please git pull to get a few changes I made thanks to your issue... then I would recommend burn_in=10 instead of 1000... and please let me know if that is better :D... I tested it and it should be :D. Thanks again!
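
Some background on why burn_in can matter. In upstream darknet, as far as I know, the learning rate during burn-in is learning_rate * (batch / burn_in) ^ power with power defaulting to 4, and the steps policy multiplies the rate by each scale once its step is reached. The sketch below is a Python approximation of that schedule under those assumptions; this fork may differ, and current_rate is an illustrative name rather than a darknet function.

```python
def current_rate(batch_num, learning_rate=0.001, burn_in=1000, power=4.0,
                 steps=(40000, 60000), scales=(0.1, 0.1)):
    """Approximate learning rate at a given batch for policy=steps with burn-in."""
    # Assumed burn-in behaviour: polynomial ramp from 0 up to learning_rate.
    if batch_num < burn_in:
        return learning_rate * (batch_num / burn_in) ** power
    # Assumed steps policy: apply each scale once its step has been reached.
    rate = learning_rate
    for step, scale in zip(steps, scales):
        if step > batch_num:
            break
        rate *= scale
    return rate


# Compare the two burn_in settings discussed in this thread.
for burn_in in (1000, 10):
    rates = [current_rate(b, burn_in=burn_in) for b in (10, 100, 500)]
    print(f"burn_in={burn_in}: {rates}")
```

Under these assumptions, burn_in=1000 leaves the rate around 1e-7 at batch 100, while burn_in=10 reaches the full 0.001 from batch 10 onward, which may explain why the early log looked stuck.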

yaoanderson · 2019-05-24T00:11:14Z

Thanks so much sowson, I will try it now and give you feedback on this problem.

yaoanderson · 2019-05-24T00:33:01Z

Hi @sowson ,

I continued training my network with the newest code (git pull, cmake and make) and burn_in=10.
Before updating:

After updating:

The learning rate now looks as expected, thanks so much. I will continue training my network and report my final result later.

By the way, can I continue training from the result the old code produced at round 960, as in the screenshot above? Or should I retrain from round 1 with your new code?

yaoanderson · 2019-05-24T05:23:06Z

Hi @sowson,
It seems the new code converges now, perfect!!!
I have another question: we train the network with OpenCL rather than CUDA, and it looks very slow. Is that expected? (CUDA is almost 47 times faster than OpenCL.)

Your OpenCL code:

CUDA code:

Could you please speed up your OpenCL code? :)

sowson · 2019-05-24T05:34:23Z

On my DreamPC it is not so slow :D...

yaoanderson · 2019-05-24T06:36:02Z

Hi @sowson thanks for your reply.

This is my PC hardware config.

I train my network with this command, without any --gpu parameter: sudo ./darknet detector train cfg/nfpa.data cfg/yolov2-nfpa.cfg backup/yolov2-nfpa_10.weights

Is that different from your PC? Do you have any idea about my PC or the command I run?

sowson · 2019-05-24T06:38:58Z

wow! :D check this out: add -i 1 (index of your gpu 0 is intel, 1 is radeon) :D
sudo ./darknet detector train cfg/nfpa.data cfg/yolov2-nfpa.cfg backup/yolov2-nfpa_10.weights -i 1

sowson · 2019-05-24T07:03:07Z

Btw, I do not know why the error does not go as low as in the CUDA version, but after 1200 rounds I have...

Thanks again for this issue, it helped me in development a lot! 👍

yaoanderson · 2019-05-24T07:40:58Z

> wow! :D check this out: add -i 1 (index of your gpu 0 is intel, 1 is radeon) :D
> sudo ./darknet detector train cfg/nfpa.data cfg/yolov2-nfpa.cfg backup/yolov2-nfpa_10.weights -i 1

So the default is to use device 0 (Intel), right?

yaoanderson · 2019-05-24T07:49:02Z

@sowson Another question:

What about your anchors values? I am not sure how to generate values that work well for our training. Do you use the default values from yolov2-voc.cfg (anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071)? I generated mine with kmeans.py, scaling by 416 (the image width = height), and got anchors = 6,14, 70,82, 176,190, 291,375, 382,377, but I do not know why my values are so much larger than the default anchors. Do you have any idea?

yaoanderson · 2019-05-24T07:50:23Z

> Btw, I do not know why the error does not go as low as in the CUDA version, but after 1200 rounds I have...
>
> Thanks again for this issue, it helped me in development a lot! 👍

Thanks so much sowson. Your case with the screenshot is perfect, very nice and vivid. :)

sowson · 2019-05-24T08:03:38Z

> > wow! :D check this out: add -i 1 (index of your gpu 0 is intel, 1 is radeon) :D
> > sudo ./darknet detector train cfg/nfpa.data cfg/yolov2-nfpa.cfg backup/yolov2-nfpa_10.weights -i 1
>
> So the default is to use device 0 (Intel), right?

That is correct.

sowson · 2019-05-24T08:04:23Z

> @sowson Another question:
>
> What about your anchors values? I am not sure how to generate values that work well for our training. Do you use the default values from yolov2-voc.cfg (anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071)? I generated mine with kmeans.py, scaling by 416 (the image width = height), and got anchors = 6,14, 70,82, 176,190, 291,375, 382,377, but I do not know why my values are so much larger than the default anchors. Do you have any idea?

I wish I knew what the anchors values mean :D

yaoanderson · 2019-05-24T13:51:53Z

Hi @sowson, I solved it: for yolov2, setting the anchors to values < 13 works. Thanks so much for your help. :)
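
One possible reading of this fix: the YOLOv2 [region] layer expects anchors in units of output grid cells rather than pixels. With a 416x416 input and an overall network stride of 32, the grid is 13x13, so anchor sizes should stay below 13. Assuming the kmeans.py output above was in pixels at the 416 scale (an assumption, since the script is not shown in this thread), dividing by the stride would convert it. The helper name pixels_to_grid is illustrative.

```python
def pixels_to_grid(anchors_px, stride=32):
    """Convert (width, height) anchors from pixels to grid-cell units."""
    # stride=32 is assumed: 416 / 32 = 13 cells per side for a YOLOv2-style net.
    return [(w / stride, h / stride) for w, h in anchors_px]


# Pixel-scale anchors quoted earlier in this thread.
anchors_px = [(6, 14), (70, 82), (176, 190), (291, 375), (382, 377)]
anchors_grid = pixels_to_grid(anchors_px)

# Format as a cfg line for the [region] section.
print("anchors = " + ", ".join(f"{w:.4f},{h:.4f}" for w, h in anchors_grid))
```

After the conversion every value falls below 13, in the same range as the default yolov2-voc.cfg anchors.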

yaoanderson closed this as completed May 24, 2019

sowson mentioned this issue Mar 3, 2020

wrong dimensions from yolov3 #25

Closed
