Any instructions about training custom images? #1
Comments
@albertyou2 I've just updated the README and included instructions in this notebook. Check it out and let me know if this helps. If you have further questions let me know and I'd be more than happy to clarify and improve the documentation. I'm also interested in training a model on the KITTI datasets myself, so I'll probably look into their data format soon. In case neither of the two parser methods provided by …
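As a rough illustration of how KITTI-format labels could be fed to a CSV-based annotation parser, here is a minimal Python sketch. It is not code from this repo; the directory paths, the image extension, the column order, and the class-name-to-ID mapping are made-up placeholders you would need to adapt to your own dataset and parser.

```python
# Minimal sketch: convert KITTI .txt label files into a flat CSV with one row
# per ground-truth box, which a generic CSV annotation parser could consume.
# Paths, column order, and class IDs below are hypothetical.

import csv
import glob
import os

# Hypothetical class mapping -- adjust to whatever classes your dataset uses.
CLASS_IDS = {'Car': 1, 'Pedestrian': 2, 'Cyclist': 3}

def kitti_labels_to_csv(label_dir, image_ext, out_csv):
    """Read every KITTI label file in `label_dir` and write one CSV row per
    object: image_name, xmin, xmax, ymin, ymax, class_id."""
    with open(out_csv, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['image_name', 'xmin', 'xmax', 'ymin', 'ymax', 'class_id'])
        for label_path in sorted(glob.glob(os.path.join(label_dir, '*.txt'))):
            image_name = os.path.splitext(os.path.basename(label_path))[0] + image_ext
            with open(label_path) as labels:
                for line in labels:
                    fields = line.split()
                    obj_type = fields[0]
                    if obj_type not in CLASS_IDS:  # skip 'DontCare' and unused classes
                        continue
                    # KITTI stores the 2D box as left, top, right, bottom (pixels).
                    left, top, right, bottom = map(float, fields[4:8])
                    writer.writerow([image_name,
                                     int(round(left)), int(round(right)),
                                     int(round(top)), int(round(bottom)),
                                     CLASS_IDS[obj_type]])

kitti_labels_to_csv('training/label_2', '.png', 'kitti_train_labels.csv')
```

Each output row then describes exactly one ground-truth box, which is the flat per-box format that most CSV annotation parsers expect.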
hi @pierluigiferrari |
Hi @pierluigiferrari, now I've run into a new problem: … Thank you very much.
@albertyou2 two things you could try to improve small object detection (this is not meant as an exhaustive list): …
These two measures together might help improve the detection performance on small objects, although I can't guarantee it.

Another question: What base network architecture are you using? Are you using the original SSD300? If yes, then I cannot recommend trying to train that from scratch. I'm not sure if anything good could come out of that without pre-trained weights, considering that there is no dropout or batch normalization in the reduced VGG-16 and the overall network is quite deep. If you are using a more shallow network architecture like the SSD7 included in the repo, then the above might work.

Another question would be how many different logos you are trying to detect. If the number of distinct logos is very large, then the capacity of a small network like SSD7 might not be enough and you might need a wider (more filters per conv layer) and/or deeper (more conv layers) network.
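As a minimal sketch of the "wider and/or deeper" point, here is a plain tf.keras example, not code from this repo; the `width` and `depth` parameters and the `build_backbone` helper are made-up names for illustration.

```python
# Sketch: a small conv backbone whose capacity is controlled by two knobs,
# `width` (filters per conv layer) and `depth` (number of conv blocks).

from tensorflow.keras import layers, models

def build_backbone(input_shape=(300, 300, 3), width=32, depth=4):
    """Stack `depth` conv blocks, doubling the filter count with each block."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    filters = width
    for _ in range(depth):
        x = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D(pool_size=(2, 2))(x)
        filters *= 2  # wider with every block
    return models.Model(inputs=inputs, outputs=x)

# A small backbone for a handful of logo classes ...
small = build_backbone(width=16, depth=4)
# ... versus a wider, deeper one for many classes.
large = build_backbone(width=64, depth=6)
```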
hi @pierluigiferrari "Another question: What base network architecture are you using? Are you using the original SSD300?" "Another question would be how many different logos you are trying to detect" Thank you again |
@pierluigiferrari I'm now wondering: if I use a smaller network (SSD7) on this small dataset, will I get a better result?
@albertyou2 that also depends on how much data you have and how heavily you use data augmentation. If you only have a couple hundred or a few thousand images, a deep and wide model like SSD300 will be overkill. If you have tens of thousands or hundreds of thousands of images, then SSD300 or SSD512 will be suitable models. And of course, more data augmentation is always better, as long as the generated data is representative of the original data.

Now, when it comes to training SSD300 or SSD512 from scratch, consider the following important points: When Wei Liu et al. turned the original VGG-16 into their reduced atrous version, they removed the dropout layers and loaded weights that were pre-trained on the large ImageNet localization dataset. They didn't need the regularization because they initialized the base network with pre-trained weights anyway. If, however, you're trying to train the entire SSD300 completely from scratch, then that might be a problem. There are no dropout layers, no batch normalization, and no other techniques in the SSD300 that would improve learning for such a deep and wide network.

If you have enough data, a smaller network like SSD7 will not yield better results, but at the same time training the original SSD300 from scratch (i.e. without loading pre-trained weights for the VGG-16 base network) is not optimal either. However, there is no real need to stick to the original SSD300/512 architecture with the reduced atrous VGG-16 base network if you want to train from scratch. You could modify the base network or even build something completely different. For example, I would definitely include a batch normalization layer after every convolution layer, as SSD7 does (see the sketch after this comment). That alone might help quite a bit. I would also use ELUs instead of ReLUs, since ReLUs can die. Or, to take it a step further, you could use a ResNet architecture. It wouldn't have to be a super-deep ResNet, but the general design is far superior to the more primitive VGG design. As always, these suggestions aren't guaranteed to get better results, but I believe they are worth a shot.

And another thing: Since adding a lower level detection layer worked, you could try taking this experiment further in the same direction. You could add another, even lower level detection layer to test whether or not that will yield further improvements.
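For the batch-normalization and ELU suggestion above, here is a minimal sketch of a Conv → BatchNorm → ELU block in plain tf.keras. It is not taken from this repo, and the `conv_bn_elu` helper name is made up for illustration.

```python
# Sketch: a convolution followed by batch normalization and an ELU activation,
# the pattern suggested above for networks trained from scratch.

from tensorflow.keras import layers

def conv_bn_elu(x, filters, kernel_size=(3, 3), strides=(1, 1)):
    """Convolution -> batch normalization -> ELU."""
    x = layers.Conv2D(filters, kernel_size, strides=strides,
                      padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ELU()(x)
    return x

# Example usage: a few such blocks stacked on an input tensor.
inputs = layers.Input(shape=(300, 300, 3))
x = conv_bn_elu(inputs, 32)
x = conv_bn_elu(x, 64, strides=(2, 2))
```

Dropping the convolution's bias (`use_bias=False`) is a common choice when batch normalization directly follows the convolution, since the normalization's own offset makes the bias redundant.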
@pierluigiferrari |
@pierluigiferrari Hi, great work here.
I want to use this project to train on a small image set for an experiment.
I have an image set that is organized in the KITTI data format, but I don't know how to train on it. Could you please give some simple instructions about training?
Thank you!