
Support for Deep Residual Net (ResNet) reference models for ILSVRC #60

Merged: 5 commits from the resnet branch into master, Feb 7, 2016

Conversation

@beniz (Collaborator) commented Feb 4, 2016

This is support for the state-of-the-art nets just released by https://github.com/KaimingHe/deep-residual-networks. They are implemented as DeepDetect neural net templates: resnet_50, resnet_101, and resnet_152 are now available from the API.

Note: training successfully tested on resnet_18 and resnet_50

To use the nets in predict mode:

  • Service creation:
curl -X PUT "http://localhost:8080/services/imageserv" -d "{\"mllib\":\"caffe\",\"description\":\"image classification service\",\"type\":\"supervised\",\"parameters\":{\"input\":{\"connector\":\"image\"},\"mllib\":{\"template\":\"resnet_50\",\"nclasses\":1000}},\"model\":{\"templates\":\"../templates/caffe/\",\"repository\":\"/path/to/model\"}}"

Note that template is set to resnet_50.

  • Image classification:
curl -X POST "http://localhost:8080/predict" -d "{\"service\":\"imageserv\",\"parameters\":{\"input\":{\"width\":224,\"height\":224},\"output\":{\"best\":5}},\"data\":[\"http://i.ytimg.com/vi/0vxOhd4qlnA/maxresdefault.jpg\"]}"

@beniz (Collaborator, Author) commented Feb 5, 2016

Added resnet_18, which is more manageable on a single GPU. Successfully tested for both prediction and training.

Typical training call chain for resnet_18:

  • Service creation:
curl -X PUT "http://localhost:8080/services/imageserv" -d "{\"mllib\":\"caffe\",\"description\":\"image classification service\",\"type\":\"supervised\",\"parameters\":{\"input\":{\"connector\":\"image\"},\"mllib\":{\"template\":\"resnet_18\",\"nclasses\":5}},\"model\":{\"templates\":\"../templates/caffe/\",\"repository\":\"/path/to/model\"}}"
  • Training:
curl -X POST "http://localhost:8080/train" -d "{\"service\":\"imageserv\",\"async\":true,\"parameters\":{\"mllib\":{\"gpu\":true,\"resume\":false,\"net\":{\"batch_size\":32,\"test_batch_size\":2},\"solver\":{\"weight_decay\":0.0001,\"solver_type\":\"SGD\",\"test_interval\":500,\"iterations\":40000,\"base_lr\":0.01,\"test_initialization\":true}},\"input\":{\"connector\":\"image\",\"test_split\":0.1,\"shuffle\":true,\"width\":224,\"height\":224},\"output\":{\"measure\":[\"acc\",\"mcll\",\"f1\"]}},\"data\":[\"/path/to/data\"]}"

Tested on a K40.
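
Since the train call above passes "async":true, the job can then be polled through dd's GET /train resource. A minimal Python sketch, assuming the server is at localhost:8080 and the job id returned by the train call was 1:

import time
import requests

while True:
    r = requests.get("http://localhost:8080/train",
                     params={"service": "imageserv", "job": 1}).json()
    status = r.get("head", {}).get("status")
    print(status, r.get("body", {}).get("measure"))
    if status != "running":  # e.g. "finished", or an error state
        break
    time.sleep(30)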

@beniz (Collaborator, Author) commented Feb 6, 2016

ResNets 18, 50, 101, and 152 are now ready for training.

Training tips (to be updated):

  • the learning rate is a big deal: 0.01 is the best observed starting value for resnet_18, 0.001 for resnet_50 (see the helper sketched after this list)
  • resnets deeper than 50 layers do not fit on a 12GB K40 and appear to require a multi-GPU setup
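
The learning-rate tips can be folded into a small helper; a sketch only (the values are the heuristics observed above, not hard science), producing the "solver" block used by the train call earlier in this thread:

# Suggested starting learning rates observed in practice (see tips above).
SUGGESTED_BASE_LR = {"resnet_18": 0.01, "resnet_50": 0.001}

def solver_params(template, iterations=40000):
    # Mirrors the solver settings of the training call shown above; the
    # 0.001 fallback for other depths is an assumption, not a tested value.
    return {
        "base_lr": SUGGESTED_BASE_LR.get(template, 0.001),
        "solver_type": "SGD",
        "weight_decay": 0.0001,
        "test_interval": 500,
        "test_initialization": True,
        "iterations": iterations,
    }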

@beniz (Collaborator, Author) commented Feb 7, 2016

Added resnet_32 as a middle ground and tested it against various image datasets. Ready to join the rest of the model templates...

beniz added a commit that referenced this pull request Feb 7, 2016
Support for Deep Residual Net (ResNet) reference models for ILSVRC
@beniz merged commit 4bbfc24 into master Feb 7, 2016
@GiuliaP commented May 2, 2016

Hi @beniz,
first of all, thank you very much for this wonderful software. I'm trying to finetune resnet-18 (with a K40). To understand the different options for learning the network layers (for instance, as in CaffeNet or VGG, where I can learn fc8, fc7, fc6 or beyond, with learning rates that differ from base_lr), may I ask (1) why the lr_mult parameters are missing from the trainval.prototxt files you provided for resnet, and (2) where I should put them?
Thank you

@beniz (Collaborator, Author) commented May 2, 2016

hi @GiuliaP, I guess you are referring to this file https://github.com/beniz/deepdetect/blob/master/templates/caffe/resnet_18/resnet_18.prototxt

So, two things:

  • when using the finetuning option in dd, the server automatically removes the last layer (softmax) and finetunes from there with the same learning rate for every layer. This is equivalent to initializing the network with pre-trained weights and learning from there. We do this by default because it is what appears to work best in practice (for us; no hard science here)
  • when you need to modify the per-layer learning rate, you are correct that you have to add lr_mult parameters to the layers. Look at https://github.com/beniz/deepdetect/blob/master/templates/caffe/googlenet/googlenet.prototxt, which you can use as a reference example of how to set these lr_mult parameters. Note that there are two per layer, as per Caffe's design: the first is for the filters/weights, the second for the biases, see http://caffe.berkeleyvision.org/tutorial/layers.html (a scripted version is sketched after this list).
    The resnet models have no lr_mult simply because we are not using them and they were not provided in the original templates. If you add these parameters, you can PR the change and we'll be happy to update the current templates.
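
Since the templates are plain Caffe protocol buffers, the lr_mult edits can also be scripted rather than done by hand. A sketch using pycaffe's protobuf bindings (the 1.0/2.0 weights/biases split follows the usual Caffe convention seen in the googlenet template; adapt the layer selection to your net):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open("resnet_18.prototxt") as f:
    text_format.Merge(f.read(), net)

for layer in net.layer:
    if layer.type not in ("Convolution", "InnerProduct"):
        continue
    del layer.param[:]                    # drop any existing ParamSpec
    layer.param.add().lr_mult = 1.0       # first spec: filters / weights
    # Second spec is for the bias; skip convolutions declared with
    # bias_term: false (common in resnets, where BN follows the conv).
    if not (layer.type == "Convolution"
            and layer.convolution_param.HasField("bias_term")
            and not layer.convolution_param.bias_term):
        layer.param.add().lr_mult = 2.0   # biases

with open("resnet_18_lrmult.prototxt", "w") as f:
    f.write(text_format.MessageToString(net))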

@GiuliaP commented May 4, 2016

Hi @beniz, thank you for the detailed answer; it's as I expected. If I manage to come up with working models, I'll open a PR!

However, at present I am stuck because I cannot find anywhere on the internet the .caffemodel of resnet-18 or resnet-32 trained on ImageNet, only resnet-50 and above. Having access only to two separate machines, each with a K40, I would like to fine-tune smaller models, also because I need to train on many datasets. I have tried to fine-tune resnet-50, but I would also like to test smaller ones.

@beniz (Collaborator, Author) commented May 4, 2016

I cannot find anywhere on the internet the .caffemodel of resnet-18 or resnet-32 trained on ImageNet, only resnet-50 and above

Weights for resnet_18 and resnet_32 are not available AFAIK but we could compute them for the community if there's enough demand for it, cc @revilokeb

Note that resnet_18 and resnet_32 have just been cut out of resnet_50. As such, it is possible these two smaller architectures could be reviewed and improved a bit. If you have feedback on this, do not hesitate, as it is better to do it before starting training on ImageNet.

Now, a hint that derives from the above note is that you may be able to take the appropriate weights from resnet_50 (i.e. the first few layers) and copy them into weights for resnet_18 and resnet_32. If you'd like to tackle this, open a dedicated issue and we'll provide some help if you have difficulties. Typically, Caffe models are stored as protocol buffers, so it should be pretty easy to write some Python code to extract the relevant layers; a sketch follows below. But note that these layers may not be as relevant as training from scratch for these architectures.
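
A sketch of that weight-copying idea ("net surgery" in Caffe terms), initializing resnet_18 from the layers it shares with resnet_50; the file names are placeholders, and layers are matched by name and blob shape:

import caffe

src = caffe.Net("resnet_50.prototxt", "ResNet-50-model.caffemodel", caffe.TEST)
dst = caffe.Net("resnet_18.prototxt", caffe.TEST)

for name, blobs in src.params.items():
    if name not in dst.params:
        continue                          # layer absent from the cut-out
    for i, blob in enumerate(blobs):
        if (i < len(dst.params[name])
                and dst.params[name][i].data.shape == blob.data.shape):
            dst.params[name][i].data[...] = blob.data  # copy weights/biases
        else:
            print("shape mismatch, skipping", name, i)

dst.save("resnet_18_init.caffemodel")     # starting point for finetuning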

I have tried to fine-tune resnet-50 but I would like also to test smaller ones.

Finetuning resnet_50 works well in practice and doesn't take too long (for us) on a K40, but I guess it depends on the size of your 'smaller' datasets and the number of classes (which typically drives convergence).

@GiuliaP commented May 4, 2016

Thanks for the feedback. In this case I'll try to benchmark resnet-50 on my data and, if I notice remarkable benefits with respect to other networks, I may focus on resnet-18/32 and try what you suggest, unless the models have been released in the meantime. For online training these two would indeed be better.

@revilokeb commented:

@GiuliaP FB has trained ResNets of various sizes in torch and is making their weights available (incl. 18, 34, also 200): https://github.com/facebook/fb.resnet.torch/tree/master/pretrained

There is also a torch2caffe converter available here: https://github.com/facebook/fb-caffe-exts#torch2caffe

I haven't tried this path myself, but if you decide to go down this road I would be interested to hear how it goes :-)

@beniz (Collaborator, Author) commented May 4, 2016

Ah great, thanks @revilokeb. @GiuliaP make sure the architectures (e.g. the 18) are exactly the same when using converted weights. You can modify the prototxt accordingly.

@czhang96 commented:

Correct me if I'm wrong, but doesn't this implementation of resnet-18 create a huge FC layer? In resnet-50 (of which I believe this implementation is just a cut-out), the layer before the last 7x7 pooling is 7x7x2048, which is then followed by a 2048x1000 FC layer. Here, however, the layer before the last 7x7 pooling is something along the lines of 28x28x512, which even after 7x7 stride-1 pooling will create a giant FC layer (225792x1000).

I believe it would be better to gradually scale the activations down to 7x7x512 by the end of the 18 layers (through occasional stride-2 convolutions), or at least to use a 7x7 stride-4 pooling, etc.
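
The rough arithmetic behind this point (225792 = 21 x 21 x 512, i.e. a 21x21 pooled output; the exact side depends on the pooling's padding and rounding convention in the prototxt):

def pool_out(size, kernel=7, stride=1):
    return (size - kernel) // stride + 1

# resnet-50: the global 7x7 pool takes 7x7x2048 down to 1x1x2048,
# so the FC layer holds 2048 * 1000 ~= 2M weights.
print(1 * 1 * 2048 * 1000)

# the cut-out: ~28x28x512 before a 7x7 stride-1 pool
side = pool_out(28)                # 22 here, 21 with other conventions
print(side * side * 512 * 1000)    # hundreds of millions of FC weights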

@beniz (Collaborator, Author) commented Jun 24, 2016

Yes, the resnet_18 is a cut-out that works well in practice. This is possibly due to the considerations on wide resnets from http://arxiv.org/abs/1605.07146.

To scale down the top of the network, you can look at the resnet with 18 layers from the torch implementation, https://github.com/facebook/fb.resnet.torch/blob/master/models/resnet.lua, or take a look at a generator for Caffe (untested), https://github.com/soeaver/caffe-model/blob/master/resnet.py

There's a resnet (and more) generator coming up for DD but it is in no state to be shared at the moment. If you ever scale down the net with good results, please PR the changes so that others can benefit from the change.

@OranjeeGeneral commented:

First off, thank you for making the ResNet models available in your repository. However, I think something is a bit odd with your training ones: when you run them in Caffe, it says all the nodes do not need backward computation, which surely can't be right; usually you would only see this on the data layer. That smells to me like they are for inference, not for training.

@beniz (Collaborator, Author) commented Sep 8, 2016

We've trained many resnets from scratch and finetuned them. It's best if you post your exact API calls, etc...
EDIT: also see this comment from elsewhere today: KaimingHe/deep-residual-networks#6 (comment)

@OranjeeGeneral commented:

Thanks, I managed to fix it by adding two accuracy layers to the TEST phase. It might all be because I changed the TEST input layer as well.
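
For anyone hitting the same issue, a sketch of what such TEST-phase accuracy layers look like, appended through the protobuf text format (the fc1000/label bottom blob names are hypothetical; use the names from your own prototxt):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open("resnet_18.prototxt") as f:
    text_format.Merge(f.read(), net)

# Merge appends to repeated fields, so this adds two layers to the net:
text_format.Merge('''
layer {
  name: "accuracy_top1"
  type: "Accuracy"
  bottom: "fc1000"
  bottom: "label"
  top: "accuracy_top1"
  include { phase: TEST }
}
layer {
  name: "accuracy_top5"
  type: "Accuracy"
  bottom: "fc1000"
  bottom: "label"
  top: "accuracy_top5"
  accuracy_param { top_k: 5 }
  include { phase: TEST }
}
''', net)

with open("resnet_18_acc.prototxt", "w") as f:
    f.write(text_format.MessageToString(net))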

@beniz deleted the resnet branch September 16, 2016
@msukoz commented Oct 25, 2016

@beniz Thank you for your patient reply. Could you give me some tips on combining Faster R-CNN with ResNet, if you have any ideas? I am trying to finetune a 50-layer Faster R-CNN with ResNet, but I am having some trouble with the layers: do I need to modify the name of the fully connected layer, and will it converge without the layer parameters (such as batch_norm_param{})?

@achaiah commented Nov 18, 2016

@OranjeeGeneral Would you mind sharing your fix? I'm not able to train a single resnet that converges. What made the difference?

Thanks!

@mrgloom commented Nov 24, 2016

I came here from this thread KaimingHe/deep-residual-networks#6

I have tried your ResNet-18 with batch size 16 and learning rates from 0.1 to 0.00001, but with no success.

My task has 20k images and 2 classes.
https://github.com/mrgloom/kaggle-dogs-vs-cats-solution

ResNet-18.zip

@beniz (Collaborator, Author) commented Nov 24, 2016

Have you tried a simpler network, e.g. GoogLeNet, and did it converge? If you are using dd, post your exact API calls.

@mrgloom commented Nov 24, 2016

Yes, I have successfully trained AlexNet, GoogLeNet, VGG, etc.; see my repo https://github.com/mrgloom/kaggle-dogs-vs-cats-solution
I use the NVIDIA branch caffe-0.15, but ResNet uses common building blocks, no custom layers.
Also, maybe you can try to run DD on this data: https://www.kaggle.com/c/dogs-vs-cats ?

@beniz (Collaborator, Author) commented Nov 24, 2016

Ok, FYI our resnet-18 should be redone; it's on my to-do list, I just got caught up elsewhere. However, we've used it with good success before.

Two suggestions:

  • try the resnet-50 instead

  • use https://github.com/jay-mahadeokar/pynetbuilder to produce a proper resnet-18, or any depth you like, and try it out. You may have to modify the input layers of these generated nets; just look at the existing templates for an example.

Let me know how this goes!

@beniz (Collaborator, Author) commented Nov 24, 2016

Also note that these are for dd and its custom Caffe version. We don't support NVIDIA's version...

@mrgloom commented Nov 26, 2016

@beniz (Collaborator, Author) commented Nov 26, 2016

This means the problem is elsewhere. We can't help you outside dd, good luck :)

@achaiah commented Nov 27, 2016

I also want to report that I tried multiple nets from https://github.com/jay-mahadeokar/pynetbuilder/tree/master/models/imagenet on NVIDIA's branch with no success. I know you don't support NVIDIA, but I just figured I'd let you know.

@beniz (Collaborator, Author) commented Nov 27, 2016

What about the original resnet-50?

@achaiah commented Nov 27, 2016

I haven't found an original resnet-50 yet, but I did find a resnet-18 that works perfectly. I would love to try out all the variations from https://github.com/jay-mahadeokar/pynetbuilder/tree/master/models/imagenet but don't know how to make them work in NVIDIA's branch.

@beniz (Collaborator, Author) commented Nov 27, 2016

resnet-50 and above are available pre-trained from Caffe, TF, and dd. They can easily be finetuned.
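
For reference, a hedged sketch of the finetuning path through dd's API (Python requests this time), assuming a server on localhost:8080 and a pre-trained ResNet-50 .caffemodel already placed in the model repository; the weights filename and nclasses are placeholders:

import requests

service = {
    "mllib": "caffe",
    "description": "resnet_50 finetuning",
    "type": "supervised",
    "parameters": {
        "input": {"connector": "image", "width": 224, "height": 224},
        "mllib": {
            "template": "resnet_50",
            "nclasses": 20,                          # your number of classes
            "finetuning": True,                      # dd's finetuning option
            "weights": "ResNet-50-model.caffemodel"  # file in the repository
        },
    },
    "model": {"templates": "../templates/caffe/",
              "repository": "/path/to/model"},
}
print(requests.put("http://localhost:8080/services/imageserv",
                   json=service).json())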

@achaiah commented Nov 28, 2016

@beniz Do you know where I can find a vanilla resnet-50 network definition for Caffe? I haven't seen any available publicly. I see pre-trained models listed, but without the network definition I can't finetune them.

@beniz (Collaborator, Author) commented Nov 28, 2016

Have you looked at https://github.com/beniz/deepdetect/blob/master/README.md ? There are links to several pre-trained resnet models.

@achaiah commented Nov 29, 2016

Pretrained, yes... but I can't find the prototxt for any of them.

@beniz (Collaborator, Author) commented Nov 29, 2016

Read the documentation, they are in the repository as templates. Resnets are used all over the place.

@achaiah commented Nov 30, 2016

Thanks, I did find the templates; however, they don't work with the pre-trained models in NVIDIA's branch, due to the different BN layer implementations. They also don't converge above 60% on the 17Flowers dataset; not sure why.

@beniz (Collaborator, Author) commented Nov 30, 2016

Can't help you with that, they work fine with dd, that's all we guarantee.

@mrgloom commented Dec 6, 2016

ResNet-18 works on a fresh caffe-master. Is there any model pretrained on ImageNet available?

@miquelmarti commented:

I am also looking for pretrained weights for resnet-18/32 for Caffe. The conversions from Facebook's Torch implementation do not work properly.

@liam0949 commented:

@beniz hi, I used your ResNet-50 train prototxt for finetuning with Kaiming's pre-trained model, but it seems that the parameters don't match.

@beniz (Collaborator, Author) commented Jan 13, 2017

@xiaojimi we finetune resnet_50 very often on a variety of tasks with excellent convergence. You'll need to share your API calls, server logs and list your model directory.

@nnop commented Sep 12, 2017

Could you please provide the pretrained ResNet-18/34 models?

@apereiracv commented:

@miquelmarti I am thinking of trying to convert resnet-18/34 from torch; weren't you able to do it? Did you find any other pretrained weights somewhere else?

@miquelmarti commented:

@olaff09 I did not try too hard; you can give it another try, as it has to be possible. I just lack sufficient knowledge of torch.

@beniz (Collaborator, Author) commented Sep 12, 2017

What would make ResNet-18/34 so useful: the lower memory per image at prediction time, along with speed?
Edit: a useful prototxt may be found here: https://github.com/marvis/pytorch-caffe-darknet-convert/blob/master/cfg/resnet-18.prototxt

@caesar84 commented:

hi there,
I am working with MATLAB 2017b, which supports importing models from Caffe; if there are .prototxt and .caffemodel files, then all is good. I would be grateful if there are any for resnet-18 or resnet-34.
