
Support for Deep Residual Net (ResNet) reference models for ILSVRC #60

Merged: 5 commits from the resnet branch into master, Feb 7, 2016

Conversation

@beniz (Collaborator) commented Feb 4, 2016

This is support for the state-of-the-art nets just released by https://github.com/KaimingHe/deep-residual-networks. They are implemented as DeepDetect neural net templates: resnet_50, resnet_101, and resnet_152 are now available from the API.

Note: training successfully tested on resnet_18 and resnet_50

To use the nets in predict mode:

  • Service creation:
curl -X PUT "http://localhost:8080/services/imageserv" -d "{\"mllib\":\"caffe\",\"description\":\"image classification service\",\"type\":\"supervised\",\"parameters\":{\"input\":{\"connector\":\"image\"},\"mllib\":{\"template\":\"resnet_50\",\"nclasses\":1000}},\"model\":{\"templates\":\"../templates/caffe/\",\"repository\":\"/path/to/model\"}}"

Note that template is set to resnet_50.

  • Image classification:
curl -X POST "http://localhost:8080/predict" -d "{\"service\":\"imageserv\",\"parameters\":{\"input\":{\"width\":224,\"height\":224},\"output\":{\"best\":5}},\"data\":[\"http://i.ytimg.com/vi/0vxOhd4qlnA/maxresdefault.jpg\"]}"

@beniz (Collaborator, Author) commented Feb 5, 2016

Added resnet_18, which is more manageable on a single GPU. Successfully tested for both prediction and training.

Typical training call chain for resnet_18:

  • Service creation:
curl -X PUT "http://localhost:8080/services/imageserv" -d "{\"mllib\":\"caffe\",\"description\":\"image classification service\",\"type\":\"supervised\",\"parameters\":{\"input\":{\"connector\":\"image\"},\"mllib\":{\"template\":\"resnet_18\",\"nclasses\":5}},\"model\":{\"templates\":\"../templates/caffe/\",\"repository\":\"/path/to/model\"}}"
  • Training:
curl -X POST "http://localhost:8080/train" -d "{\"service\":\"imageserv\",\"async\":true,\"parameters\":{\"mllib\":{\"gpu\":true,\"resume\":false,\"net\":{\"batch_size\":32,\"test_batch_size\":2},\"solver\":{\"weight_decay\":0.0001,\"solver_type\":\"SGD\",\"test_interval\":500,\"iterations\":40000,\"base_lr\":0.01,\"test_initialization\":true}},\"input\":{\"connector\":\"image\",\"test_split\":0.1,\"shuffle\":true,\"width\":224,\"height\":224},\"output\":{\"measure\":[\"acc\",\"mcll\",\"f1\"]}},\"data\":[\"/path/to/data\"]}"

Tested on a K40.
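
Since the train call above passes "async":true, the job can then be polled through dd's GET /train resource. A minimal Python sketch, assuming the server is at localhost:8080 and the job id returned by the train call was 1:

import time
import requests

while True:
    r = requests.get("http://localhost:8080/train",
                     params={"service": "imageserv", "job": 1}).json()
    status = r.get("head", {}).get("status")
    print(status, r.get("body", {}).get("measure"))
    if status != "running":  # e.g. "finished", or an error state
        break
    time.sleep(30)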

@beniz (Collaborator, Author) commented Feb 6, 2016

ResNets 18, 50, 101, and 152 are now ready for training.

Training tips (to be updated):

  • the learning rate is a big deal: 0.01 is the best observed starting value for resnet_18, 0.001 for resnet_50 (see the helper sketched after this list)
  • resnets deeper than 50 layers do not fit on a 12GB K40 and appear to require a multi-GPU setup
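
The learning-rate tips can be folded into a small helper; a sketch only (the values are the heuristics observed above, not hard science), producing the "solver" block used by the train call earlier in this thread:

# Suggested starting learning rates observed in practice (see tips above).
SUGGESTED_BASE_LR = {"resnet_18": 0.01, "resnet_50": 0.001}

def solver_params(template, iterations=40000):
    # Mirrors the solver settings of the training call shown above; the
    # 0.001 fallback for other depths is an assumption, not a tested value.
    return {
        "base_lr": SUGGESTED_BASE_LR.get(template, 0.001),
        "solver_type": "SGD",
        "weight_decay": 0.0001,
        "test_interval": 500,
        "test_initialization": True,
        "iterations": iterations,
    }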

@beniz (Collaborator, Author) commented Feb 7, 2016

Added resnet_32 as a middle ground and tested it against various image datasets. Ready to join the rest of the model templates...

beniz added a commit that referenced this pull request Feb 7, 2016
Support for Deep Residual Net (ResNet) reference models for ILSVRC
@beniz merged commit 4bbfc24 into master Feb 7, 2016
@GiuliaP commented May 2, 2016

Hi @beniz,
first of all, thank you very much for this wonderful software. I'm trying to finetune resnet-18 (with a K40). To understand the different options for learning the network layers (for instance, as in CaffeNet or VGG, where I can learn fc8, fc7, fc6 or beyond, with learning rates that differ from base_lr), may I ask (1) why the lr_mult parameters are missing from the trainval.prototxt files you provided for resnet, and (2) where I should put them?
Thank you

@beniz (Collaborator, Author) commented May 2, 2016

hi @GiuliaP, I guess you are referring to this file https://github.com/beniz/deepdetect/blob/master/templates/caffe/resnet_18/resnet_18.prototxt

So, two things:

  • when using the finetuning option in dd, the server automatically removes the last layer (softmax) and finetunes from there with the same learning rate for every layer. This is equivalent to initializing the network with pre-trained weights and learning from there. We do this by default because it is what appears to work best in practice (for us; no hard science here)
  • when you need to modify the per-layer learning rate, you are correct that you have to add lr_mult parameters to the layers. Look at https://github.com/beniz/deepdetect/blob/master/templates/caffe/googlenet/googlenet.prototxt, which you can use as a reference example of how to set these lr_mult parameters. Note that there are two per layer, as per Caffe's design: the first is for the filters/weights, the second for the biases, see http://caffe.berkeleyvision.org/tutorial/layers.html (a scripted version is sketched after this list).
    The resnet models have no lr_mult simply because we are not using them and they were not provided in the original templates. If you add these parameters, you can PR the change and we'll be happy to update the current templates.
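
Since the templates are plain Caffe protocol buffers, the lr_mult edits can also be scripted rather than done by hand. A sketch using pycaffe's protobuf bindings (the 1.0/2.0 weights/biases split follows the usual Caffe convention seen in the googlenet template; adapt the layer selection to your net):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open("resnet_18.prototxt") as f:
    text_format.Merge(f.read(), net)

for layer in net.layer:
    if layer.type not in ("Convolution", "InnerProduct"):
        continue
    del layer.param[:]                    # drop any existing ParamSpec
    layer.param.add().lr_mult = 1.0       # first spec: filters / weights
    # Second spec is for the bias; skip convolutions declared with
    # bias_term: false (common in resnets, where BN follows the conv).
    if not (layer.type == "Convolution"
            and layer.convolution_param.HasField("bias_term")
            and not layer.convolution_param.bias_term):
        layer.param.add().lr_mult = 2.0   # biases

with open("resnet_18_lrmult.prototxt", "w") as f:
    f.write(text_format.MessageToString(net))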

@GiuliaP commented May 4, 2016

Hi @beniz, thank you for the detailed answer; it's as I expected. If I manage to come up with working models, I'll open a PR!

However, at present I am stuck because I cannot find anywhere on the internet the .caffemodel of resnet-18 or resnet-32 trained on ImageNet, only resnet-50 and above. Having access only to two separate machines, each with a K40, I would like to fine-tune smaller models, also because I need to train on many datasets. I have tried to fine-tune resnet-50, but I would also like to test smaller ones.

@beniz (Collaborator, Author) commented May 4, 2016

I cannot find anywhere on the internet the .caffemodel of resnet-18 or resnet-32 trained on ImageNet, only resnet-50 and above

Weights for resnet_18 and resnet_32 are not available AFAIK but we could compute them for the community if there's enough demand for it, cc @revilokeb

Note that resnet_18 and resnet_32 have just been cut out of resnet_50. As such, it is possible these two smaller architectures could be reviewed and improved a bit. If you have feedback on this, do not hesitate, as it is better to do it before starting training on ImageNet.

Now, a hint that derives from the above note is that you may be able to take the appropriate weights from resnet_50 (i.e. the first few layers) and copy them into weights for resnet_18 and resnet_32. If you'd like to tackle this, open a dedicated issue and we'll provide some help if you have difficulties. Typically, Caffe models are stored as protocol buffers, so it should be pretty easy to write some Python code to extract the relevant layers; a sketch follows below. But note that these layers may not be as relevant as training from scratch for these architectures.
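
A sketch of that weight-copying idea ("net surgery" in Caffe terms), initializing resnet_18 from the layers it shares with resnet_50; the file names are placeholders, and layers are matched by name and blob shape:

import caffe

src = caffe.Net("resnet_50.prototxt", "ResNet-50-model.caffemodel", caffe.TEST)
dst = caffe.Net("resnet_18.prototxt", caffe.TEST)

for name, blobs in src.params.items():
    if name not in dst.params:
        continue                          # layer absent from the cut-out
    for i, blob in enumerate(blobs):
        if (i < len(dst.params[name])
                and dst.params[name][i].data.shape == blob.data.shape):
            dst.params[name][i].data[...] = blob.data  # copy weights/biases
        else:
            print("shape mismatch, skipping", name, i)

dst.save("resnet_18_init.caffemodel")     # starting point for finetuning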

I have tried to fine-tune resnet-50 but I would like also to test smaller ones.

Finetuning resnet_50 works well in practice and doesn't take too long (for us) on a K40, but I guess it depends on the size of your 'smaller' datasets and the number of classes (which typically drives convergence).

@GiuliaP commented May 4, 2016

Thanks for the feedback. In this case I'll try to benchmark resnet-50 on my data and, if I notice remarkable benefits with respect to other networks, I may focus on resnet-18/32 and try what you suggest, unless the models have been released in the meantime. For online training these two would indeed be better.

@revilokeb commented:

@GiuliaP FB has trained ResNets of various sizes in torch and is making their weights available (incl. 18, 34, also 200): https://github.com/facebook/fb.resnet.torch/tree/master/pretrained

There is also a torch2caffe converter available here: https://github.com/facebook/fb-caffe-exts#torch2caffe

I haven't tried this path myself, but if you decide to go down this road I would be interested to hear how it goes :-)

@beniz (Collaborator, Author) commented May 4, 2016

Ah great, thanks @revilokeb. @GiuliaP make sure the architectures (e.g. the 18) are exactly the same when using converted weights. You can modify the prototxt accordingly.

@czhang96 commented:

Correct me if I'm wrong, but doesn't this implementation of resnet-18 create a huge FC layer? In resnet-50 (of which I believe this implementation is just a cut-out), the layer before the last 7x7 pooling is 7x7x2048, which is then followed by a 2048x1000 FC layer. Here, however, the layer before the last 7x7 pooling is something along the lines of 28x28x512, which even after 7x7 stride-1 pooling will create a giant FC layer (225792x1000).

I believe it would be better to gradually scale the activations down to 7x7x512 by the end of the 18 layers (through occasional stride-2 convolutions), or at least to use a 7x7 stride-4 pooling, etc.
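
The rough arithmetic behind this point (225792 = 21 x 21 x 512, i.e. a 21x21 pooled output; the exact side depends on the pooling's padding and rounding convention in the prototxt):

def pool_out(size, kernel=7, stride=1):
    return (size - kernel) // stride + 1

# resnet-50: the global 7x7 pool takes 7x7x2048 down to 1x1x2048,
# so the FC layer holds 2048 * 1000 ~= 2M weights.
print(1 * 1 * 2048 * 1000)

# the cut-out: ~28x28x512 before a 7x7 stride-1 pool
side = pool_out(28)                # 22 here, 21 with other conventions
print(side * side * 512 * 1000)    # hundreds of millions of FC weights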

@beniz (Collaborator, Author) commented Jun 24, 2016

Yes, the resnet_18 is a cut-out that works well in practice. This is possibly due to the considerations on wide resnets from http://arxiv.org/abs/1605.07146.

To scale down the top of the network, you can look at the resnet with 18 layers from the torch implementation, https://github.com/facebook/fb.resnet.torch/blob/master/models/resnet.lua, or take a look at a generator for Caffe (untested), https://github.com/soeaver/caffe-model/blob/master/resnet.py

There's a resnet (and more) generator coming up for DD but it is in no state to be shared at the moment. If you ever scale down the net with good results, please PR the changes so that others can benefit from the change.

@OranjeeGeneral commented:

First off, thank you for making the ResNet models available in your repository. However, I think something is a bit odd with your training ones: when you run them in Caffe, it says all the nodes do not need backward computation, which surely can't be right; usually you would only see this on the data layer. That smells to me like they are for inference, not for training.

@beniz (Collaborator, Author) commented Sep 8, 2016

We've trained many resnets from scratch and finetuned them. It's best if you post your exact API calls, etc...
EDIT: also see this comment from elsewhere today: KaimingHe/deep-residual-networks#6 (comment)

@OranjeeGeneral commented:

Thanks, I managed to fix it by adding two accuracy layers to the TEST phase. It might all be because I changed the TEST input layer as well.
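
For anyone hitting the same issue, a sketch of what such TEST-phase accuracy layers look like, appended through the protobuf text format (the fc1000/label bottom blob names are hypothetical; use the names from your own prototxt):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open("resnet_18.prototxt") as f:
    text_format.Merge(f.read(), net)

# Merge appends to repeated fields, so this adds two layers to the net:
text_format.Merge('''
layer {
  name: "accuracy_top1"
  type: "Accuracy"
  bottom: "fc1000"
  bottom: "label"
  top: "accuracy_top1"
  include { phase: TEST }
}
layer {
  name: "accuracy_top5"
  type: "Accuracy"
  bottom: "fc1000"
  bottom: "label"
  top: "accuracy_top5"
  accuracy_param { top_k: 5 }
  include { phase: TEST }
}
''', net)

with open("resnet_18_acc.prototxt", "w") as f:
    f.write(text_format.MessageToString(net))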

@beniz deleted the resnet branch September 16, 2016
@msukoz commented Oct 25, 2016

@beniz Thank you for your patient reply. Could you give me some tips on combining Faster R-CNN with ResNet, if you have any ideas? I am trying to finetune a 50-layer Faster R-CNN with ResNet, but I am having some trouble with the layers: do I need to modify the name of the fully connected layer, and will it converge without the layer parameters (such as batch_norm_param{})?

@achaiah commented Nov 18, 2016

@OranjeeGeneral Would you mind sharing your fix? I'm not able to train a single resnet that converges. What made the difference?

Thanks!

@mrgloom commented Nov 24, 2016

I came here from this thread KaimingHe/deep-residual-networks#6

I have tried your ResNet-18 with batch size 16 and learning rates from 0.1 to 0.00001, but with no success.

My task has 20k images and 2 classes.
https://github.com/mrgloom/kaggle-dogs-vs-cats-solution

ResNet-18.zip

@beniz (Collaborator, Author) commented Nov 24, 2016

Have you tried a simpler network, e.g. GoogLeNet, and did it converge? If you are using dd, post your exact API calls.

@mrgloom commented Nov 24, 2016

Yes, I have successfully trained AlexNet, GoogLeNet, VGG, etc.; see my repo https://github.com/mrgloom/kaggle-dogs-vs-cats-solution
I use the NVIDIA branch caffe-0.15, but ResNet uses common building blocks, no custom layers.
Also, maybe you can try to run DD on this data: https://www.kaggle.com/c/dogs-vs-cats ?

@beniz (Collaborator, Author) commented Nov 24, 2016

Ok, FYI our resnet-18 should be redone; it's on my to-do list, I just got caught up elsewhere. However, we've used it with good success before.

Two suggestions:

  • try the resnet-50 instead

  • use https://github.com/jay-mahadeokar/pynetbuilder to produce a proper resnet-18, or any depth you like, and try it out. You may have to modify the input layers of these generated nets; just look at the existing templates for an example.

Let me know how this goes!

@beniz (Collaborator, Author) commented Nov 24, 2016

Also note that these are for dd and its custom Caffe version. We don't support NVIDIA's version...

@mrgloom commented Nov 26, 2016

@beniz (Collaborator, Author) commented Nov 26, 2016

This means the problem is elsewhere. We can't help you outside dd, good luck :)

@achaiah commented Nov 27, 2016

I also want to report that I tried multiple nets from https://github.com/jay-mahadeokar/pynetbuilder/tree/master/models/imagenet on NVIDIA's branch with no success. I know you don't support NVIDIA, but I just figured I'd let you know.

@beniz (Collaborator, Author) commented Nov 27, 2016

What about the original resnet-50?

@achaiah commented Nov 27, 2016

I haven't found an original resnet-50 yet, but I did find a resnet-18 that works perfectly. I would love to try out all the variations from https://github.com/jay-mahadeokar/pynetbuilder/tree/master/models/imagenet but don't know how to make them work in NVIDIA's branch.

@beniz (Collaborator, Author) commented Nov 27, 2016

resnet-50 and above are available pre-trained from Caffe, TF, and dd. They can easily be finetuned.
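
For reference, a hedged sketch of the finetuning path through dd's API (Python requests this time), assuming a server on localhost:8080 and a pre-trained ResNet-50 .caffemodel already placed in the model repository; the weights filename and nclasses are placeholders:

import requests

service = {
    "mllib": "caffe",
    "description": "resnet_50 finetuning",
    "type": "supervised",
    "parameters": {
        "input": {"connector": "image", "width": 224, "height": 224},
        "mllib": {
            "template": "resnet_50",
            "nclasses": 20,                          # your number of classes
            "finetuning": True,                      # dd's finetuning option
            "weights": "ResNet-50-model.caffemodel"  # file in the repository
        },
    },
    "model": {"templates": "../templates/caffe/",
              "repository": "/path/to/model"},
}
print(requests.put("http://localhost:8080/services/imageserv",
                   json=service).json())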

@achaiah commented Nov 28, 2016

@beniz Do you know where I can find a vanilla resnet-50 network definition for Caffe? I haven't seen any available publicly. I see pre-trained models listed, but without the network definition I can't finetune them.

@beniz (Collaborator, Author) commented Nov 28, 2016

Have you looked at https://github.com/beniz/deepdetect/blob/master/README.md ? There are links to several pre-trained resnet models.

@achaiah commented Nov 29, 2016

Pretrained, yes... but I can't find the prototxt for any of them.

@beniz (Collaborator, Author) commented Nov 29, 2016

Read the documentation, they are in the repository as templates. Resnets are used all over the place.

@achaiah commented Nov 30, 2016

Thanks, I did find the templates; however, they don't work with the pre-trained models in NVIDIA's branch, due to the different BN layer implementations. They also don't converge above 60% on the 17Flowers dataset; not sure why.

@beniz (Collaborator, Author) commented Nov 30, 2016

Can't help you with that, they work fine with dd, that's all we guarantee.

@mrgloom commented Dec 6, 2016

ResNet-18 works on a fresh caffe-master. Is there any model pretrained on ImageNet available?

@miquelmarti commented:

I am also looking for pretrained weights for resnet-18/32 for Caffe. The conversions from Facebook's Torch implementation do not work properly.

@liam0949 commented:

@beniz hi, I used your ResNet-50 train prototxt for finetuning with Kaiming's pre-trained model, but it seems that the parameters don't match.

@beniz (Collaborator, Author) commented Jan 13, 2017

@xiaojimi we finetune resnet_50 very often on a variety of tasks with excellent convergence. You'll need to share your API calls, server logs and list your model directory.

@nnop commented Sep 12, 2017

Could you please provide the pretrained ResNet-18/34 models?

@apereiracv commented:

@miquelmarti I am thinking of trying to convert resnet-18/34 from torch; weren't you able to do it? Did you find any other pretrained weights somewhere else?

@miquelmarti commented:

@olaff09 I did not try too hard; you can give it another try, as it has to be possible. I just lack sufficient knowledge of torch.

@beniz (Collaborator, Author) commented Sep 12, 2017

What would make ResNet-18/34 so useful: the lower memory per image at prediction time, along with speed?
Edit: a useful prototxt may be found here: https://github.com/marvis/pytorch-caffe-darknet-convert/blob/master/cfg/resnet-18.prototxt

@caesar84 commented:

hi there,
I am working with MATLAB 2017b, which supports importing models from Caffe; if there are .prototxt and .caffemodel files, then all is good. I would be grateful if there are any for resnet-18 or resnet-34.
