fixed documentation for convnet.md
updated image-net.sqlite3
liuliu committed Mar 27, 2014
1 parent e9586df commit 2542c03
Showing 6 changed files with 189 additions and 34 deletions.
2 changes: 1 addition & 1 deletion bin/cnnclassify.c
@@ -43,7 +43,7 @@ int main(int argc, char** argv)
chdir(argv[3]);
if(r)
{
ccv_convnet_t* convnet = ccv_convnet_read(0, argv[2]);
ccv_convnet_t* convnet = ccv_convnet_read(1, argv[2]);
int i, j, k = 0;
ccv_dense_matrix_t* images[32] = {
0
2 changes: 1 addition & 1 deletion bin/cnndraw.rb
@@ -18,7 +18,7 @@
print line
args = line.split " "
break if args[0] == 'elapsed'
for i in 0..args.length / 2
for i in 0..(args.length / 2 - 1)
draw += sprintf("-fill none -strokewidth 1 -stroke DodgerBlue -draw \"rectangle 15,%d,165,%d\" -fill DodgerBlue -draw \"rectangle 15,%d,%d,%d\" -strokewidth 0 -stroke none -fill red -draw 'text 18,%d \"%s\"' ", y, y + 16, y, (args[i * 2 + 1].to_f * 150).to_i + 15, y + 16, y + 13, labels[args[i * 2]])
y += 31
end
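The hunk above fixes an off-by-one bug: Ruby's `..` range operator is inclusive at both ends, so `0..args.length / 2` iterates one index past the last label/confidence pair. A minimal sketch (the classifier output values are hypothetical):

```ruby
# cnnclassify output lines pair a label with a confidence value, e.g.:
args = ["n02084071", "0.92", "n02121808", "0.05"]  # hypothetical labels/scores
pairs = args.length / 2                            # 2 pairs

(0..pairs).to_a          # inclusive range: [0, 1, 2] -- one index too many
(0..(pairs - 1)).to_a    # what the patch iterates:  [0, 1]
(0...pairs).to_a         # exclusive range, equivalent: [0, 1]
```

`0...pairs` (three dots, excluding the end) would have been an equivalent, arguably more idiomatic fix.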
2 changes: 1 addition & 1 deletion bin/cnnvldtr.rb
@@ -19,4 +19,4 @@
i += 1
end

print ((miss1.to_f / i.to_f * 10000).round / 100.0).to_s + "% (1), " + ((miss5.to_f / i.to_f * 10000).round / 100.0).to_s + "%(5)\n"
print ((miss1.to_f / i.to_f * 10000).round / 100.0).to_s + "% (1), " + ((miss5.to_f / i.to_f * 10000).round / 100.0).to_s + "% (5)\n"
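Besides the spacing fix, the expression above shows how cnnvldtr.rb rounds a miss rate to two decimal places: scale to basis points, round to an integer, then divide back. A small sketch (the counts are hypothetical):

```ruby
# Round a miss rate to two decimal places, the way cnnvldtr.rb formats it:
# scale by 10000, round to an integer, then divide by 100.0.
def pct(miss, total)
  ((miss.to_f / total.to_f * 10000).round / 100.0)
end

puts pct(397, 1000).to_s + "% (1), " + pct(1626, 10000).to_s + "% (5)"
# prints "39.7% (1), 16.26% (5)"
```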
100 changes: 86 additions & 14 deletions doc/convnet.md
@@ -44,18 +44,33 @@ Accuracy-wise:
The test is performed on the ILSVRC 2010 test dataset; as of this writing, I could not obtain the
validation dataset for ILSVRC 2012.

The training stopped to improve at around 60 epochs, at that time, the central patch obtained
39.71% of top-1 missing rate (lower is better). In Alex's paper, they reported 37.5% top-1
missing rate when averaging 10 patches, and 39% top-1 missing rate when using one patch.
The training stopped improving at around 60 epochs; at that point, the central patch from the test set
obtained a 39.71% top-1 missing rate (lower is better) and the training set obtained a 37.80%
top-1 missing rate. In Alex's paper, they reported a 37.5% top-1 missing rate when averaging 10 patches,
and a 39% top-1 missing rate when using the central patch on the test set.

By applying this patch: https://gist.github.com/liuliu/9420735
Assuming you have the ILSVRC 2010 test set files listed in image-net-test.txt, run

git am -3 9420935.patch
./cnnclassify image-net-test.txt ../samples/image-net.sqlite3 > image-net-classify.txt

For 32-bit float point image-net.sqlite3, the top-1 missing rate is 36.97%, 0.53% better than
Alex's result. For half precision image-net.sqlite3 (the one included in ./samples/), the top-1
missing rate is 39.8%, 0.3% worse than the 32-bit float point one. You can download the float
point one with ./samples/download-image-net.sh
Running the complete test set with this command takes about an hour on the GPU; without GPU support
enabled, it takes about a day on the CPU.

Assuming you have the ILSVRC 2010 ground truth data in LSVRC2010_test_ground_truth.txt

./cnnvldtr.rb LSVRC2010_test_ground_truth.txt image-net-classify.txt

will report the top-1 missing rate as well as the top-5 missing rate.
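The two metrics can be sketched as follows (a simplified reimplementation for illustration, not the script itself, with toy data): a top-k miss means the ground-truth label is absent from the k highest-confidence predictions.

```ruby
# predictions: one ranked label list per image; truth: ground-truth label per image.
def miss_rate(predictions, truth, k)
  misses = predictions.zip(truth).count { |ranked, gt| !ranked.first(k).include?(gt) }
  misses.to_f / truth.length
end

predictions = [[3, 1, 4, 1, 5], [2, 7, 1, 8, 2], [9, 9, 8, 2, 6]]  # toy data
truth       = [3, 1, 6]
miss_rate(predictions, truth, 1)  # two of three labels are not ranked first
miss_rate(predictions, truth, 5)  # every label appears within the top 5
```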

For the 32-bit floating-point image-net.sqlite3 on GPU, the top-1 missing rate is 36.82%, 0.68% better
than Alex's result, and the top-5 missing rate is 16.26%, 0.74% better than Alex's. For the half-precision
image-net.sqlite3 (the one included in ./samples/), the top-1 missing rate is 36.83% and the top-5
missing rate is 16.25%.

For the 32-bit floating-point image-net.sqlite3 on CPU, the top-1 missing rate is 37.34%, and the top-5
missing rate is 16.62%.

You can download the 32-bit floating-point one with ./samples/download-image-net.sh

Speed-wise:

@@ -64,7 +79,7 @@ frequency, and Samsung MZ-7TE500BW 500GiB SSD with clang, libdispatch, libatlas
Scientific Library.

The CPU version of forward pass (from RGB image input to the classification result) takes about
350ms per image. This is achieved with multi-threaded convolutional kernel computation. Decaf (
700ms per image. This is achieved with multi-threaded convolutional kernel computation. Decaf (
the CPU counterpart of Caffe) reported their forward pass at around 0.5s per image with
unspecified hardware over 10 patches (the same as ccv's cnnclassify implementation). I cannot
get a sensible number out of OverFeat on my machine (it reports about 1.4s for a forward pass, that
@@ -81,7 +96,7 @@ within 6 days on two GeForce 580, which suggests my time is in line with these implementations.
As a preliminary implementation, I didn't spend much time optimizing these operations in ccv, if
any at all. For example, [cuda-convnet](http://code.google.com/p/cuda-convnet/) implements its
functionalities in about 10,000 lines of code, Caffe implements with 14,000 lines of code, as of
this release, ccv implements with about 3,700 lines of code. For the future, the low-hanging
this release, ccv implements with about 4,300 lines of code. Going forward, the low-hanging
optimization opportunities include using SIMD instructions, doing FFT in densely convolved layers,
etc.

@@ -100,8 +115,6 @@ I downloaded the ImageNet dataset from this torrent:
Assuming you've downloaded / bought all these and installed them on your computer, get a hot tea; it
will take a while to get all the puzzles and riddles in place before the training starts.

Ready? Continue!

The ImageNet metadata for 2010 challenge can be downloaded from
http://www.image-net.org/challenges/LSVRC/2010/download-public

@@ -158,7 +171,8 @@ The generated image-net.sqlite3 file is about 600MiB in size because it contains
and resume. You can either open this file with sqlite command-line tool (it is a vanilla sqlite database
file), and do:

drop table function_state, momentum_data;
drop table function_state;
drop table momentum_data;
vacuum;

The file size will shrink to about 200MiB. You can achieve further reduction in file size by rewriting it into
@@ -174,3 +188,61 @@ practically anywhere and anyhow with proper attribution. As far as I can tell, t
data released under a commercial-friendly license (Caffe itself is released under the FreeBSD license, but
its pre-trained data is "research only", and OverFeat is released under a custom research-only license).

Differences between ccv's implementation, Caffe's and Alex's
------------------------------------------------------------

Although the network topology of ccv's implementation follows Alex's (as well as Caffe's) closely,
the reported results diverged significantly enough for me to document the differences in implementation
details.

Network Topology:

In ccv, the local response normalization layer follows the convolutional layer, and the pooling layer
comes after the local response normalization. This is briefly mentioned in Alex's paper; in Caffe,
the local response normalization layer follows the pooling layer.

The input dimension of ccv's implemented network is 225x225, while in Caffe it is 227x227. Alex's paper
mentions their input size is 224x224. The 225x225 size implies a 1-pixel padding around the input image,
so that with an 11x11 filter and a stride of 4, a 55x55 output is generated.
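The 55x55 figure follows from the usual convolution output-size formula (the formula is standard, not quoted from the source):

```ruby
# output = (input + 2 * padding - filter) / stride + 1
def conv_output(input, padding, filter, stride)
  (input + 2 * padding - filter) / stride + 1
end

conv_output(225, 1, 11, 4)   # ccv:   225x225 input, 1px padding  -> 55
conv_output(227, 0, 11, 4)   # Caffe: 227x227 input, no padding   -> 55
```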

Data Preparation:

Caffe's implementation resizes the image to 256x256 without retaining the aspect ratio. Alex's
implementation resizes the image so that the minimal dimension is 256 while retaining the aspect ratio
(at least as the paper implies) and then crops the image to 256x256. ccv's implementation resizes the
image so that the minimal dimension is 257 while retaining the aspect ratio (downsampling with
CCV_INTER_AREA interpolation and upsampling with CCV_INTER_CUBIC interpolation if needed). ccv's
implementation obtains the mean image from center-cropped 257x257 images.
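The aspect-retaining resize amounts to the following dimension arithmetic (a sketch only; the actual interpolation is done inside ccv, and the rounding mode here is an assumption):

```ruby
# Scale so the smaller dimension becomes `target`, keeping the aspect ratio.
def retained_resize(width, height, target = 257)
  scale = target.to_f / [width, height].min
  [(width * scale).round, (height * scale).round]
end

retained_resize(640, 480)   # landscape: the 480 side becomes 257
retained_resize(480, 640)   # portrait:  the 480 side becomes 257
```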

Data Augmentation:

Caffe's implementation randomly crops the image from 256x256 to 227x227. Alex's implementation randomly
crops the image from 256x256 to 224x224 and then applies color augmentation with a Gaussian random
coefficient sampled with sigma == 0.1. ccv's implementation randomly crops the image from the
aspect-retained sizes to 257x257, subtracts the mean image, and then randomly crops that to 225x225;
color augmentation is applied with a Gaussian random coefficient sampled with sigma == 0.001. All three
implementations use horizontal mirroring as a data augmentation technique.
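The crop-and-perturb recipe can be sketched like this (assumptions: uniform crop offsets and a Box-Muller transform for the Gaussian coefficient; ccv's actual sampling code may differ):

```ruby
# Pick a random top-left corner so a dst x dst crop fits inside a src x src image.
def random_crop_offset(src, dst, rng = Random.new)
  [rng.rand(src - dst + 1), rng.rand(src - dst + 1)]
end

# One Gaussian sample via the Box-Muller transform (Ruby's stdlib has no normal RNG).
def gaussian(sigma, rng = Random.new)
  u1, u2 = rng.rand, rng.rand
  sigma * Math.sqrt(-2.0 * Math.log(1.0 - u1)) * Math.cos(2.0 * Math::PI * u2)
end

x, y = random_crop_offset(257, 225)  # each offset falls in 0..32
coeff = gaussian(0.001)              # color augmentation coefficient, sigma == 0.001
```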

Averaged Classification:

Caffe averages the softmax output of 10 patches from the test image: it first resizes the image to
256x256 without retaining the aspect ratio, then crops the first 5 patches of size 227x227 from the top
left, top right, center, bottom left, and bottom right of the resized test image; the second 5 patches
are the horizontal mirrors of the first 5.

Alex's implementation averages the softmax output of 10 patches from the test image by first resizing
the image so that the minimal dimension is 256 while retaining the aspect ratio, and then center-cropping
to 256x256. The 10 patches of size 224x224 are sampled from the 256x256 crop the same way as in Caffe.

ccv's GPU implementation averages the softmax output of 30 patches from the test image. It first resizes
the image so that the minimal dimension is 257, then makes 3 crops, from the top left, center, and bottom
right, so that each cropped image is 257x257. The mean image is subtracted from each crop, and each crop
is then cropped again, from the top left, top right, center, bottom left, and bottom right, to 225x225.
This generates 15 patches, and each of them has a horizontally mirrored counterpart.
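The 30-patch bookkeeping can be made concrete (a sketch of the enumeration only; the crop names are taken from the description above):

```ruby
# 3 aspect-retained 257x257 crops x 5 sub-crops each x {original, mirrored} = 30.
def gpu_test_patches
  big = [:top_left, :center, :bottom_right]
  sub = [:top_left, :top_right, :center, :bottom_left, :bottom_right]
  big.product(sub).flat_map { |b, s| [[b, s, :original], [b, s, :mirrored]] }
end

gpu_test_patches.length   # 30 distinct patches to average over
```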

For efficiency, ccv's CPU implementation averages the softmax output of only 10 patches from the test
image. It first resizes the image so that the minimal dimension is 257. The mean image is upsampled to
the same size with CCV_INTER_CUBIC and subtracted from the resized image. The top left, top right,
center, bottom left, and bottom right patches of 225x225 are then extracted and horizontally mirrored
to generate the 10 patches.

19 changes: 15 additions & 4 deletions lib/cuda/cwc_convnet.cu
@@ -582,10 +582,17 @@ static void _cwc_convnet_alloc_reserved_for_classify(ccv_convnet_t* convnet, int
GPU(convnet)->scans = (float**)(GPU(convnet)->layers + convnet->count) + convnet->count * 2;
_cwc_convnet_alloc_scans(convnet, scan, batch * 30);
GPU(convnet)->backwards = 0;
GPU(convnet)->contexts[0].host.dor = GPU(convnet)->contexts[0].device.dor = 0;
_cwc_convnet_alloc_input(convnet, convnet->input.height, convnet->input.width, 0, batch * 6);
_cwc_convnet_alloc_c(convnet, 0, batch * tops);
_cwc_convnet_alloc_out(convnet, 0, batch * tops);
_cwc_convnet_alloc_context(convnet, 0);
GPU(convnet)->contexts[1].host.dor = GPU(convnet)->contexts[1].device.dor = 0;
GPU(convnet)->contexts[1].host.input = GPU(convnet)->contexts[1].device.input = 0;
GPU(convnet)->contexts[1].host.c = GPU(convnet)->contexts[1].device.c = 0;
GPU(convnet)->contexts[1].host.out = GPU(convnet)->contexts[1].device.out = 0;
GPU(convnet)->contexts[1].device.stream = 0;
GPU(convnet)->contexts[1].device.cublas = 0;
}

// allocate reserved for both forward and backward path
@@ -2908,8 +2915,10 @@ void cwc_convnet_compact(ccv_convnet_t* convnet)
for (i = 0; i < 2; i++)
{
cwc_convnet_context_t* context = GPU(convnet)->contexts + i;
cudaFreeHost(context->host.input);
cudaFree(context->device.input);
if (context->host.input)
cudaFreeHost(context->host.input);
if (context->device.input)
cudaFree(context->device.input);
if (context->host.c)
cudaFreeHost(context->host.c);
if (context->device.c)
@@ -2918,8 +2927,10 @@
cudaFreeHost(context->host.out);
if (context->device.out)
cudaFree(context->device.out);
cudaStreamDestroy(context->device.stream);
cublasDestroy(context->device.cublas);
if (context->device.stream)
cudaStreamDestroy(context->device.stream);
if (context->device.cublas)
cublasDestroy(context->device.cublas);
}
for (i = 0; i < convnet->count; i++)
{
98 changes: 85 additions & 13 deletions site/_posts/0000-01-01-doc-convnet.markdown
@@ -53,18 +53,33 @@ Accuracy-wise:
The test is performed on the ILSVRC 2010 test dataset; as of this writing, I could not obtain the
validation dataset for ILSVRC 2012.

The training stopped to improve at around 90 epochs, at that time, the central patch obtained
42.81% of top-1 missing rate (lower is better). In Alex's paper, they reported 37.5% top-1
missing rate when averaging 10 patches, and 39% top-1 missing rate when using one patch.
The training stopped improving at around 60 epochs; at that point, the central patch from the test set
obtained a 39.71% top-1 missing rate (lower is better) and the training set obtained a 37.80%
top-1 missing rate. In Alex's paper, they reported a 37.5% top-1 missing rate when averaging 10 patches,
and a 39% top-1 missing rate when using the central patch on the test set.

By applying this patch: <https://gist.github.com/liuliu/9420735>
Assuming you have the ILSVRC 2010 test set files listed in image-net-test.txt, run

git am -3 9420935.patch
./cnnclassify image-net-test.txt ../samples/image-net.sqlite3 > image-net-classify.txt

For 32-bit float point image-net.sqlite3, the top-1 missing rate is 39.51%, 2% shying from
Alex's result. For half precision image-net.sqlite3 (the one included in ./samples/), the top-1
missing rate is 39.8%, 0.3% worse than the 32-bit float point one. You can download the float
point one with ./samples/download-image-net.sh
Running the complete test set with this command takes about an hour on the GPU; without GPU support
enabled, it takes about a day on the CPU.

Assuming you have the ILSVRC 2010 ground truth data in LSVRC2010_test_ground_truth.txt

./cnnvldtr.rb LSVRC2010_test_ground_truth.txt image-net-classify.txt

will report the top-1 missing rate as well as the top-5 missing rate.

For the 32-bit floating-point image-net.sqlite3 on GPU, the top-1 missing rate is 36.82%, 0.68% better
than Alex's result, and the top-5 missing rate is 16.26%, 0.74% better than Alex's. For the half-precision
image-net.sqlite3 (the one included in ./samples/), the top-1 missing rate is 36.83% and the top-5
missing rate is 16.25%.

For the 32-bit floating-point image-net.sqlite3 on CPU, the top-1 missing rate is 37.34%, and the top-5
missing rate is 16.62%.

You can download the 32-bit floating-point one with ./samples/download-image-net.sh

Speed-wise:

@@ -73,7 +88,7 @@ frequency, and Samsung MZ-7TE500BW 500GiB SSD with clang, libdispatch, libatlas
Scientific Library.

The CPU version of forward pass (from RGB image input to the classification result) takes about
350ms per image. This is achieved with multi-threaded convolutional kernel computation. Decaf (
700ms per image. This is achieved with multi-threaded convolutional kernel computation. Decaf (
the CPU counterpart of Caffe) reported their forward pass at around 0.5s per image with
unspecified hardware over 10 patches (the same as ccv's cnnclassify implementation). I cannot
get a sensible number out of OverFeat on my machine (it reports about 1.4s for a forward pass, that
@@ -109,8 +124,6 @@ I downloaded the ImageNet dataset from this torrent:
Assuming you've downloaded / bought all these and installed them on your computer, get a hot tea; it
will take a while to get all the puzzles and riddles in place before the training starts.

Ready? Continue!

The ImageNet metadata for 2010 challenge can be downloaded from
<http://www.image-net.org/challenges/LSVRC/2010/download-public>

@@ -167,7 +180,8 @@ The generated image-net.sqlite3 file is about 600MiB in size because it contains
and resume. You can either open this file with sqlite command-line tool (it is a vanilla sqlite database
file), and do:

drop table function_state, momentum_data;
drop table function_state;
drop table momentum_data;
vacuum;

The file size will shrink to about 200MiB. You can achieve further reduction in file size by rewriting it into
@@ -183,3 +197,61 @@ practically anywhere and anyhow with proper attribution. As far as I can tell, t
data released under a commercial-friendly license (Caffe itself is released under the FreeBSD license, but
its pre-trained data is "research only", and OverFeat is released under a custom research-only license).

Differences between ccv's implementation, Caffe's and Alex's
------------------------------------------------------------

Although the network topology of ccv's implementation follows Alex's (as well as Caffe's) closely,
the reported results diverged significantly enough for me to document the differences in implementation
details.

Network Topology:

In ccv, the local response normalization layer follows the convolutional layer, and the pooling layer
comes after the local response normalization. This is briefly mentioned in Alex's paper; in Caffe,
the local response normalization layer follows the pooling layer.

The input dimension of ccv's implemented network is 225x225, while in Caffe it is 227x227. Alex's paper
mentions their input size is 224x224. The 225x225 size implies a 1-pixel padding around the input image,
so that with an 11x11 filter and a stride of 4, a 55x55 output is generated.

Data Preparation:

Caffe's implementation resizes the image to 256x256 without retaining the aspect ratio. Alex's
implementation resizes the image so that the minimal dimension is 256 while retaining the aspect ratio
(at least as the paper implies) and then crops the image to 256x256. ccv's implementation resizes the
image so that the minimal dimension is 257 while retaining the aspect ratio (downsampling with
CCV_INTER_AREA interpolation and upsampling with CCV_INTER_CUBIC interpolation if needed). ccv's
implementation obtains the mean image from center-cropped 257x257 images.

Data Augmentation:

Caffe's implementation randomly crops the image from 256x256 to 227x227. Alex's implementation randomly
crops the image from 256x256 to 224x224 and then applies color augmentation with a Gaussian random
coefficient sampled with sigma == 0.1. ccv's implementation randomly crops the image from the
aspect-retained sizes to 257x257, subtracts the mean image, and then randomly crops that to 225x225;
color augmentation is applied with a Gaussian random coefficient sampled with sigma == 0.001. All three
implementations use horizontal mirroring as a data augmentation technique.

Averaged Classification:

Caffe averages the softmax output of 10 patches from the test image: it first resizes the image to
256x256 without retaining the aspect ratio, then crops the first 5 patches of size 227x227 from the top
left, top right, center, bottom left, and bottom right of the resized test image; the second 5 patches
are the horizontal mirrors of the first 5.

Alex's implementation averages the softmax output of 10 patches from the test image by first resizing
the image so that the minimal dimension is 256 while retaining the aspect ratio, and then center-cropping
to 256x256. The 10 patches of size 224x224 are sampled from the 256x256 crop the same way as in Caffe.

ccv's GPU implementation averages the softmax output of 30 patches from the test image. It first resizes
the image so that the minimal dimension is 257, then makes 3 crops, from the top left, center, and bottom
right, so that each cropped image is 257x257. The mean image is subtracted from each crop, and each crop
is then cropped again, from the top left, top right, center, bottom left, and bottom right, to 225x225.
This generates 15 patches, and each of them has a horizontally mirrored counterpart.

For efficiency, ccv's CPU implementation averages the softmax output of only 10 patches from the test
image. It first resizes the image so that the minimal dimension is 257. The mean image is upsampled to
the same size with CCV_INTER_CUBIC and subtracted from the resized image. The top left, top right,
center, bottom left, and bottom right patches of 225x225 are then extracted and horizontally mirrored
to generate the 10 patches.
