
any instructions about train custom images? #1

Closed
albertyou2 opened this issue May 4, 2017 · 8 comments

@albertyou2

@pierluigiferrari Hi, great work here.

I want to use this project to train on a small image set for an experiment.
I have an image set that is organized in the KITTI data format, but I don't know how to train on it. Could you please give some simple instructions about training?
Thank you!

@pierluigiferrari
Owner

pierluigiferrari commented May 6, 2017

@albertyou2 I've just updated the README and included instructions in this notebook.

Check it out and let me know if this helps. If you have further questions let me know and I'd be more than happy to clarify and improve the documentation.

I'm also interested in training a model on the KITTI datasets myself, so I'll probably look into their data format soon. In case neither of the two parser methods provided by BatchGenerator is compatible with the KITTI annotations, you could follow the instructions in the README and write an additional parser method that can handle them. I might also do that at some point.
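In case it's useful, here is a rough sketch of what such a parser might extract from one line of a KITTI label file. The field layout follows the official KITTI object annotation format (class name, truncation, occlusion, alpha, then the 2D bounding box as left, top, right, bottom); the function name is just illustrative, not part of BatchGenerator:

```python
# Hypothetical sketch of parsing one KITTI label line. In the KITTI object
# annotation format, fields 4-7 hold the 2D bounding box as
# (left, top, right, bottom) in pixel coordinates.
def parse_kitti_line(line):
    """Return (class_name, xmin, ymin, xmax, ymax) for one KITTI label line."""
    fields = line.split()
    class_name = fields[0]
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    return class_name, xmin, ymin, xmax, ymax

# Example line in KITTI label format:
sample = "Car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57"
print(parse_kitti_line(sample))
```

A full parser method would loop over all label files, map class names to integer IDs, and collect the boxes per image in whatever format the generator expects.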

@albertyou2
Author

Hi @pierluigiferrari,
Thank you very much.
I'm sorry for the late reply, I've been very busy these days. I will check it out as soon as I can!
Thank you again

@albertyou2
Author

Hi @pierluigiferrari,
I have tested your project successfully, both training and testing! Thank you again.
I used your code to detect some logos, and it works fine!

Now I've run into a new problem:
The detection accuracy is very low for small objects. The objects can be very small, around 10x10 pixels.
I just want to know whether SSD is able to do this job.
If it is, which arguments should I fine-tune?

Thank you very much

@pierluigiferrari
Owner

@albertyou2 two things you could try to improve small object detection (this is not meant as an exhaustive list):

  1. Decrease the scaling factors: You could decrease all of them, not just the smallest one. Calculate what fraction of your input image size the smallest and the largest objects in your dataset will be and set the scaling factors accordingly. For example, if your input images have size 300x300, your smallest objects are about 10x10 pixels and your largest objects are about 60x60 pixels, then you could choose your smallest scaling factor to be 0.033 (or a tiny bit larger than that) and your largest scaling factor to be 0.2. This might improve the matching. However, this alone might not lead to a huge improvement. You might also have to try the following in addition:
  2. You could try using a predictor layer that sits on top of an earlier layer of the network, either by adding an additional predictor layer to the existing ones or by changing the lowest level predictor layer to sit on a lower layer of the network. This would both increase the spatial resolution of that predictor layer, so the overall coverage for small boxes would be better, and it would lead to a lower level of abstraction as input for the predictor layer, which might also be beneficial. Presumably, less abstraction is needed (or even useful) to detect 10x10 objects than is needed to detect 200x200 objects.
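For the numbers in point 1, a quick back-of-the-envelope computation of the scale range (assuming six predictor layers, as in SSD300, and linear spacing between the smallest and largest scale; purely illustrative):

```python
# Illustrative scaling-factor computation: 300x300 input, smallest objects
# ~10x10 px, largest ~60x60 px, scales spaced linearly across the predictor
# layers. The layer count of 6 is an assumption, matching SSD300.
img_size = 300
min_obj, max_obj = 10, 60
n_predictor_layers = 6

s_min = min_obj / img_size                      # ~0.033
s_max = max_obj / img_size                      # 0.2
step = (s_max - s_min) / (n_predictor_layers - 1)
scales = [s_min + i * step for i in range(n_predictor_layers)]
print([round(s, 3) for s in scales])
```

The same arithmetic works for any input size and object-size range; only the endpoints change.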

These two measures together might help improve the detection performance on small objects, although I can't guarantee it.

Another question: What base network architecture are you using? Are you using the original SSD300? If yes, then I cannot recommend trying to train that from scratch. I'm not sure if anything good could come out of that without pre-trained weights, considering that there is no dropout or batch normalization in the reduced VGG-16 and the overall network is quite deep.

If you are using a more shallow network architecture like the SSD7 included in the repo, then the above might work.

Another question would be how many different logos you are trying to detect. If the number of distinct logos is very large, then the capacity of a small network like SSD7 might not be enough and you might need a wider (more filters per conv layer) and/or deeper (more conv layers) network.

@albertyou2
Author

Hi @pierluigiferrari,
Thank you so much!
I will try these suggestions soon!

"Another question: What base network architecture are you using? Are you using the original SSD300?"
Yes,I 'm using SSD300 to do my job.But I will try SSD512 without retrained model.I think larger input size will increase the size of the object which will be detected ,so the accuracy will be better.

"Another question would be how many different logos you are trying to detect"
The class number of my logo image dataset is 22. I think this is not very large.

Thank you again

@albertyou2
Author

@pierluigiferrari
I followed your suggestions and trained again, and the accuracy for small object detection is now better! It reaches 56%. Thank you!

I'm now wondering whether using a smaller network (SSD7) on this small dataset would give a better result?

@pierluigiferrari
Owner

pierluigiferrari commented May 25, 2017

@albertyou2 that also depends on how much data you have and how heavily you use data augmentation. If you only have a couple hundred or a few thousand images, a deep and wide model like SSD300 will be overkill. If you have tens of thousands or hundreds of thousands of images, then SSD300 or SSD512 will be suitable models. And of course, more data augmentation is always better, as long as the generated data is representative of the original data.

Now, when it comes to training SSD300 or SSD512 from scratch, consider the following important points:

When Wei Liu et al. turned the original VGG-16 into their reduced atrous version, they removed the dropout layers and loaded weights that were pre-trained on the large ImageNet localization dataset. They didn't need the regularization because they initialized the base network with pre-trained weights anyway. If, however, you're trying to train the entire SSD300 completely from scratch, then that might be a problem. There are no dropout layers, no batch normalization, and no other techniques in the SSD300 that would improve learning for such a deep and wide network.

If you have enough data, a smaller network like SSD7 will not yield better results, but at the same time training the original SSD300 from scratch (i.e. without loading pre-trained weights for the VGG-16 base network) is not optimal either.

But there is not really a need to stick to the original SSD300/512 architecture with the reduced atrous VGG-16 base network if you want to train from scratch. You could modify the base network or even build something completely different.

For example, I would definitely include a batch normalization layer after every convolution layer, as SSD7 does. That alone might help quite a bit. I would also use ELUs instead of ReLUs, since ReLUs can die. Or, to take it a step further, you could use a ResNet architecture. It wouldn't have to be a super-deep ResNet, but the general design is far superior to the more primitive VGG design.
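To make that concrete, here is a minimal sketch of a Conv → BatchNorm → ELU block in the Keras functional API (this uses tf.keras; the layer names and filter counts are made up for illustration and are not taken from the repo):

```python
# Minimal sketch: batch normalization after every convolution, ELU activations.
# Filter counts, layer names, and depth are illustrative only.
from tensorflow.keras.layers import Conv2D, BatchNormalization, ELU, Input
from tensorflow.keras.models import Model

def conv_bn_elu(x, filters, name):
    """One convolution block with batch norm and ELU, as suggested above."""
    x = Conv2D(filters, (3, 3), padding='same', name=name)(x)
    x = BatchNormalization(name=name + '_bn')(x)
    return ELU(name=name + '_elu')(x)

inputs = Input(shape=(300, 300, 3))
x = conv_bn_elu(inputs, 32, 'conv1')
x = conv_bn_elu(x, 48, 'conv2')
model = Model(inputs=inputs, outputs=x)
```

The same block can then be stacked to whatever depth the dataset supports.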

As always, these suggestions aren't guaranteed to get better results, but I believe they are worth a shot.

And another thing: Since adding a lower level detection layer worked, you could try taking this experiment further in the same direction. You could add another, even lower level detection layer to test whether or not that will yield further improvements.
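A quick illustration of why an earlier, higher-resolution feature map helps here: the number of default-box positions grows quadratically with the feature-map resolution. The strides below are illustrative values for a 300x300 input, not taken from the repo:

```python
# Box positions per predictor layer at different effective strides of the
# backbone, assuming a 300x300 input. Lower stride = earlier layer = denser
# coverage, which is what small objects need.
input_size = 300
positions = {}
for stride in (4, 8, 16, 32):
    fmap = input_size // stride           # feature-map side length
    positions[stride] = fmap * fmap       # box positions per aspect ratio
    print(f"stride {stride:2d}: {fmap}x{fmap} feature map -> {positions[stride]} positions")
```

So a predictor on a stride-4 feature map sees roughly four times as many box positions as one on a stride-8 map, at the cost of more computation and less abstract features.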

@albertyou2
Author

@pierluigiferrari
Thank you so much!
I will try your suggestions again!
