
Please kindly help us about Not convergent network #12

Closed
yaoanderson opened this issue May 17, 2019 · 15 comments

@yaoanderson

Hi sowson,
[screenshot]
We are running your darknet with OpenCL on our MacBook Pro, but our training behaves strangely: the output stays at Obj: 0.500000, No Obj: 0.500000 for hundreds of iterations, and the training never converges.
Our data is the dataset from this article: https://timebutt.github.io/static/how-to-train-yolov2-to-detect-custom-objects/
Our network is shown below:
[screenshot]

My yolov2.cfg:
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=32
subdivisions=4
height=416
width=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 80200
policy=steps
steps=40000,60000
scales=.1,.1
.....
.....
[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
#anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
#anchors = 5,11, 9,19, 51,62, 104,114, 181,209, 279,376, 400,289, 357,377, 390,388
anchors = 6,14, 70,82, 176,190, 291,375, 382,377
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.3
rescore=1
....
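As a quick consistency check on the cfg above, here is a small sketch (not code from this repository) of the usual YOLOv2 sizing rule: the 1x1 convolution feeding a [region] layer must output num × (coords + 1 + classes) channels, because each of the num anchors predicts coords box values, one objectness score, and classes class scores. The function name region_filters is hypothetical.

```python
def region_filters(num: int, classes: int, coords: int = 4) -> int:
    """Channels the last 1x1 convolution must output for a YOLOv2 [region] layer.

    Each of the `num` anchors predicts `coords` box values,
    1 objectness score, and `classes` class scores.
    """
    return num * (coords + 1 + classes)


# Values from the [region] section above: num=5, classes=1, coords=4.
print(region_filters(num=5, classes=1))  # 30, matching filters=30
```

The same rule gives 5 × (4 + 1 + 20) = 125 for the 20-class VOC setup.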

Please kindly help us with this issue. Thanks.

@yaoanderson
Author

[screenshot]
Why does the training not converge on the data above?

@sowson
Owner

sowson commented May 23, 2019

@yaoanderson thank you very much for this issue... please git pull to get a few fixes I found thanks to your issue... I would also recommend burn_in=10 instead of 1000... please let me know if that is better :D... I tested it and it should be :D. Thanks again!
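Why a smaller burn_in matters can be sketched with the learning-rate schedule used by upstream darknet, assuming this fork keeps it: while the iteration count is below burn_in, the rate is learning_rate × (iteration / burn_in)^power, with power defaulting to 4 (an assumption worth checking in parser.c of your checkout); after that, the steps policy multiplies the rate by each entry of scales once the matching entry of steps is reached. The Python below is a hypothetical re-implementation for illustration, not the fork's C code.

```python
def current_rate(iteration, learning_rate=0.001, burn_in=1000, power=4.0,
                 steps=(40000, 60000), scales=(0.1, 0.1)):
    """Learning rate at a given iteration (hypothetical re-implementation).

    Defaults mirror the [net] section of the yolov2.cfg in this issue;
    power=4 is an assumed default, not read from this fork's source.
    """
    # Warm-up: ramp from 0 towards learning_rate as (iteration / burn_in) ** power.
    if iteration < burn_in:
        return learning_rate * (iteration / burn_in) ** power
    # "steps" policy: apply each scale once its step has been reached.
    rate = learning_rate
    for step, scale in zip(steps, scales):
        if step > iteration:
            break
        rate *= scale
    return rate


for burn_in in (1000, 10):
    rates = [current_rate(it, burn_in=burn_in) for it in (5, 100, 500)]
    print(f"burn_in={burn_in}: " + ", ".join(f"{r:.2e}" for r in rates))
```

With burn_in=1000, the rate at iteration 100 is only 0.001 × 0.1^4 = 1e-7, so the first few hundred iterations barely update the weights; with burn_in=10, the full 0.001 is reached by iteration 10. This is one plausible way the smaller burn_in helps early training; the fixes pulled from git may matter as well.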

@yaoanderson
Author

Thanks so much, sowson. I will try it now and give you feedback on this problem.

@yaoanderson
Author

yaoanderson commented May 24, 2019

Hi @sowson,

I continued training my network with the newest code (git pull, then cmake and make) and burn_in=10.
Before updating:
[screenshot]
After updating:
[screenshot]
The learning rate now looks as expected. Thanks so much; I will continue training my network and share my final result later.

By the way, can I resume training from the weights the old code produced at iteration 960, as in the screenshot above? Or should I retrain my network from iteration 1 with your new code?

@yaoanderson
Author

Hi @sowson,
It seems that the new code converges now, perfect!!!
I have another question: we train on OpenCL rather than CUDA, and it is very slow. Is that expected? (CUDA is almost 47 times faster than OpenCL.)

Your code on OpenCL:
[screenshot]
CUDA code:
[screenshot]

Could you please speed up your OpenCL code? : )

@sowson
Owner

sowson commented May 24, 2019

On my DreamPC it is not that slow :D...
[screenshot: Screen 2019-05-24 07-19-44]

@yaoanderson
Author

yaoanderson commented May 24, 2019

Hi @sowson, thanks for your reply.

This is my PC hardware configuration:
[screenshot]
I train my network with this command, without any --gpu parameter: sudo ./darknet detector train cfg/nfpa.data cfg/yolov2-nfpa.cfg backup/yolov2-nfpa_10.weights

Is this different from your setup? Do you have any idea what could be wrong with my PC or the command?

@sowson
Owner

sowson commented May 24, 2019

wow! :D check this out: add -i 1 (the index of your GPU: 0 is Intel, 1 is Radeon) :D
sudo ./darknet detector train cfg/nfpa.data cfg/yolov2-nfpa.cfg backup/yolov2-nfpa_10.weights -i 1

@sowson
Owner

sowson commented May 24, 2019

Btw, I do not know why the loss does not go as low as in the CUDA version, but after 1200 rounds I have...
[screenshot: Screen 2019-05-24 09-00-14]
Thanks again for this issue, it helped me in development a lot! 👍

@yaoanderson
Author

So the default is index 0, the Intel GPU, right?

@yaoanderson
Author

@sowson Another question:

  1. What anchors values do you use? I am not sure how to generate values that work better for our training. Do you use the default values from yolov2-voc.cfg (anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071)? I generated my values with kmeans.py, scaled by 416 (the image width and height), and got anchors = 6,14, 70,82, 176,190, 291,375, 382,377, but I do not know why my values are so much larger than the default anchors. Do you have any idea about this?
    [screenshot]

@yaoanderson
Author

Thanks so much, sowson. Your example with the screenshot is perfect, very clear. :)

@sowson
Owner

sowson commented May 24, 2019

That is correct: the default index is 0 (Intel).

@sowson
Owner

sowson commented May 24, 2019

I wish I knew what the anchors values mean :D

@yaoanderson
Author

Hi @sowson, I solved it: for YOLOv2, setting the anchor values below 13 works. Very nice, thanks so much for your help. :)
[screenshot]
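The fix in the last comment is consistent with how YOLOv2's [region] layer interprets anchors: the values are widths and heights measured in output-grid cells, not pixels. With width=height=416 and the usual YOLOv2 downsampling by 32, the output grid is 13×13, so one cell is 416 / 13 = 32 pixels, and an anchor larger than 13 would be bigger than the whole image. (The stride of 32 assumes the standard YOLOv2 backbone; the cfg layers elided above are not shown.) A minimal sketch of the conversion, using the kmeans values from this thread; the function name pixels_to_grid_anchors is hypothetical:

```python
def pixels_to_grid_anchors(anchors_px, input_size=416, grid_size=13):
    """Convert (width, height) anchors from pixels to grid-cell units.

    Hypothetical helper: assumes a square input and a grid_size x grid_size output.
    """
    cell = input_size / grid_size  # pixels covered by one grid cell (32 for 416 / 13)
    return [(w / cell, h / cell) for w, h in anchors_px]


# Pixel anchors from the kmeans.py run described in this thread.
kmeans_px = [(6, 14), (70, 82), (176, 190), (291, 375), (382, 377)]
grid_anchors = pixels_to_grid_anchors(kmeans_px)

# Every converted anchor now fits inside the 13 x 13 grid.
assert all(w < 13 and h < 13 for w, h in grid_anchors)
print("anchors = " + ", ".join(f"{w:.4f},{h:.4f}" for w, h in grid_anchors))
```

The default yolov2-voc.cfg anchors (1.3221, 1.73145, ...) are already in these grid units, which is why they look so much smaller than pixel values.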
