solver configuration that converges more reliably. thanks to @ducha-aiki
in #3 for the suggestion of linearly decreasing the learning rate through training. note that the provided model was trained with the old solver configuration. in our experiments, this new solver configuration leads to model accuracy that is greater than or equal to that of the old configuration.
forresti committed Mar 26, 2016
1 parent 69c0afe commit 0bc03d9
Showing 2 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -20,7 +20,7 @@ Helpful hints:
1. **Getting the SqueezeNet model:** `git clone <this repo>`.
In this repository, we include Caffe-compatible files for the model architecture, the solver configuration, and the pretrained model (4.8MB uncompressed).

2. **Batch size.** For the SqueezeNet model in our paper, we used a batch size of 1024. If implemented naively on a single GPU, this may result in running out of memory. An effective workaround is to use hierarchical batching (sometimes called "delayed batching"). Caffe supports hierarchical batching by processing `train_val.prototxt>batch_size` training samples concurrently in memory. After `solver.prototxt>iter_size` iterations, the gradients are summed and the model is updated. Mathematically, the effective batch size is `batch_size * iter_size`. In the included prototxt files, we have set `(batch_size=32, iter_size=32)`, but any combination of batch_size and iter_size that multiplies to 1024 will produce equivalent results. In fact, with the same random number generator seed, the model will be fully reproducible if trained multiple times. Finally, note that in Caffe `iter_size` is applied while training on the training set but not while testing on the test set.
2. **Batch size.** We have experimented with batch sizes ranging from 32 to 1024. In this repo, our default batch size is 512. If implemented naively on a single GPU, a batch size this large may result in running out of memory. An effective workaround is to use hierarchical batching (sometimes called "delayed batching"). Caffe supports hierarchical batching by processing `train_val.prototxt>batch_size` training samples concurrently in memory. After `solver.prototxt>iter_size` iterations, the gradients are summed and the model is updated. Mathematically, the effective batch size is `batch_size * iter_size`. In the included prototxt files, we have set `(batch_size=32, iter_size=16)`, but any combination of batch_size and iter_size that multiplies to 512 will produce equivalent results (a sketch of where these two settings live appears after this list). In fact, with the same random number generator seed, the model will be fully reproducible if trained multiple times. Finally, note that in Caffe `iter_size` is applied while training on the training set but not while testing on the test set.

3. **Implementing Fire modules.** In the paper, we describe the `expand` portion of the Fire layer as a collection of 1x1 and 3x3 filters. Caffe does not natively support a convolution layer that has multiple filter sizes. To work around this, we implement `expand1x1` and `expand3x3` layers and concatenate the results together in the channel dimension.
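As a concrete illustration of hint 3, the expand stage of one Fire module can be written in Caffe prototxt roughly as sketched below. This is a sketch, not a copy of the shipped file: the layer names follow the `fire2/...` convention of the released prototxt, the filter counts are illustrative, and the ReLU layers are omitted for brevity.

```
layer {
  name: "fire2/expand1x1"
  type: "Convolution"
  bottom: "fire2/squeeze1x1"
  top: "fire2/expand1x1"
  convolution_param { num_output: 64  kernel_size: 1 }
}
layer {
  name: "fire2/expand3x3"
  type: "Convolution"
  bottom: "fire2/squeeze1x1"
  top: "fire2/expand3x3"
  convolution_param { num_output: 64  kernel_size: 3  pad: 1 }
}
layer {
  name: "fire2/concat"
  type: "Concat"   # joins its bottoms along the channel axis (axis 1) by default
  bottom: "fire2/expand1x1"
  bottom: "fire2/expand3x3"
  top: "fire2/concat"
}
```

Both expand branches read from the squeeze output, and `pad: 1` keeps the 3x3 branch's spatial dimensions equal to the 1x1 branch's, so the channel-wise concatenation is valid.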

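Regarding hint 2, the two halves of the hierarchical-batching setting live in different files: `batch_size` is a field of the data layer in `train_val.prototxt`, while `iter_size` sits in `solver.prototxt`. A minimal sketch, assuming an LMDB training source (the path below is a placeholder, not the repository's actual path):

```
# train_val.prototxt -- 32 images are held in memory per forward/backward pass
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param {
    source: "path/to/ilsvrc12_train_lmdb"   # placeholder path
    batch_size: 32
    backend: LMDB
  }
}

# solver.prototxt -- gradients from 16 such passes are accumulated before each update
iter_size: 16   # effective batch size = 32 * 16 = 512
```
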
8 changes: 4 additions & 4 deletions SqueezeNet_v1.0/solver.prototxt
@@ -8,12 +8,12 @@

test_iter: 2000 #not subject to iter_size
test_interval: 1000
base_lr: 0.08
base_lr: 0.04
display: 40
max_iter: 85000
iter_size: 32 #global batch size = batch_size * iter_size
max_iter: 170000
iter_size: 16 #global batch size = batch_size * iter_size
lr_policy: "poly"
power: 0.5
power: 1.0 #linearly decrease LR
momentum: 0.9
weight_decay: 0.0002
snapshot: 1000

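For reference on the solver change above: with `lr_policy: "poly"`, Caffe computes the learning rate as `base_lr * (1 - iter / max_iter) ^ power`, so `power: 1.0` ramps the rate linearly from `base_lr` down to 0 at `max_iter`. An annotated sketch of the new settings (the comments are editorial, not part of the shipped file):

```
base_lr: 0.04      # starting learning rate (halved relative to the old configuration)
max_iter: 170000   # doubled; the learning rate reaches 0 here
lr_policy: "poly"  # lr = base_lr * (1 - iter / max_iter) ^ power
power: 1.0         # a power of 1.0 makes the decay linear
```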