fixed documentation for convnet.md
updated image-net.sqlite3
liuliu committed Mar 27, 2014
1 parent e9586df commit 2542c03
Showing 6 changed files with 189 additions and 34 deletions.
2 changes: 1 addition & 1 deletion bin/cnnclassify.c
@@ -43,7 +43,7 @@ int main(int argc, char** argv)
chdir(argv[3]);
if(r)
{
ccv_convnet_t* convnet = ccv_convnet_read(0, argv[2]);
ccv_convnet_t* convnet = ccv_convnet_read(1, argv[2]);
int i, j, k = 0;
ccv_dense_matrix_t* images[32] = {
0
2 changes: 1 addition & 1 deletion bin/cnndraw.rb
@@ -18,7 +18,7 @@
print line
args = line.split " "
break if args[0] == 'elapsed'
for i in 0..args.length / 2
for i in 0..(args.length / 2 - 1)
draw += sprintf("-fill none -strokewidth 1 -stroke DodgerBlue -draw \"rectangle 15,%d,165,%d\" -fill DodgerBlue -draw \"rectangle 15,%d,%d,%d\" -strokewidth 0 -stroke none -fill red -draw 'text 18,%d \"%s\"' ", y, y + 16, y, (args[i * 2 + 1].to_f * 150).to_i + 15, y + 16, y + 13, labels[args[i * 2]])
y += 31
end
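The hunk above fixes an off-by-one bug: Ruby's `..` range operator is inclusive at both ends, so `0..args.length / 2` iterates one index past the last label/confidence pair. A minimal sketch (the classifier output values are hypothetical):

```ruby
# cnnclassify output lines pair a label with a confidence value, e.g.:
args = ["n02084071", "0.92", "n02121808", "0.05"]  # hypothetical labels/scores
pairs = args.length / 2                            # 2 pairs

(0..pairs).to_a          # inclusive range: [0, 1, 2] -- one index too many
(0..(pairs - 1)).to_a    # what the patch iterates:  [0, 1]
(0...pairs).to_a         # exclusive range, equivalent: [0, 1]
```

`0...pairs` (three dots, excluding the end) would have been an equivalent, arguably more idiomatic fix.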
2 changes: 1 addition & 1 deletion bin/cnnvldtr.rb
@@ -19,4 +19,4 @@
i += 1
end

print ((miss1.to_f / i.to_f * 10000).round / 100.0).to_s + "% (1), " + ((miss5.to_f / i.to_f * 10000).round / 100.0).to_s + "%(5)\n"
print ((miss1.to_f / i.to_f * 10000).round / 100.0).to_s + "% (1), " + ((miss5.to_f / i.to_f * 10000).round / 100.0).to_s + "% (5)\n"
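Besides the spacing fix, the expression above shows how cnnvldtr.rb rounds a miss rate to two decimal places: scale to basis points, round to an integer, then divide back. A small sketch (the counts are hypothetical):

```ruby
# Round a miss rate to two decimal places, the way cnnvldtr.rb formats it:
# scale by 10000, round to an integer, then divide by 100.0.
def pct(miss, total)
  ((miss.to_f / total.to_f * 10000).round / 100.0)
end

puts pct(397, 1000).to_s + "% (1), " + pct(1626, 10000).to_s + "% (5)"
# prints "39.7% (1), 16.26% (5)"
```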
100 changes: 86 additions & 14 deletions doc/convnet.md
@@ -44,18 +44,33 @@ Accuracy-wise:
The test is performed on the ILSVRC 2010 test dataset; as of this writing, I could not obtain the
validation dataset for ILSVRC 2012.

The training stopped to improve at around 60 epochs, at that time, the central patch obtained
39.71% of top-1 missing rate (lower is better). In Alex's paper, they reported 37.5% top-1
missing rate when averaging 10 patches, and 39% top-1 missing rate when using one patch.
The training stopped improving at around 60 epochs; at that point, the central patch from the test set
obtained a 39.71% top-1 missing rate (lower is better) and the training set obtained a 37.80%
top-1 missing rate. In Alex's paper, they reported a 37.5% top-1 missing rate when averaging 10 patches,
and a 39% top-1 missing rate when using the central patch on the test set.

By applying this patch: https://gist.github.com/liuliu/9420735
Assuming you have the ILSVRC 2010 test set files listed in image-net-test.txt, run

git am -3 9420935.patch
./cnnclassify image-net-test.txt ../samples/image-net.sqlite3 > image-net-classify.txt

For 32-bit float point image-net.sqlite3, the top-1 missing rate is 36.97%, 0.53% better than
Alex's result. For half precision image-net.sqlite3 (the one included in ./samples/), the top-1
missing rate is 39.8%, 0.3% worse than the 32-bit float point one. You can download the float
point one with ./samples/download-image-net.sh
Running the complete test set with this command takes about an hour on the GPU; without GPU support
enabled, it takes about a day on the CPU.

Assuming you have the ILSVRC 2010 ground truth data in LSVRC2010_test_ground_truth.txt

./cnnvldtr.rb LSVRC2010_test_ground_truth.txt image-net-classify.txt

will report the top-1 missing rate as well as the top-5 missing rate.
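The two metrics can be sketched as follows (a simplified reimplementation for illustration, not the script itself, with toy data): a top-k miss means the ground-truth label is absent from the k highest-confidence predictions.

```ruby
# predictions: one ranked label list per image; truth: ground-truth label per image.
def miss_rate(predictions, truth, k)
  misses = predictions.zip(truth).count { |ranked, gt| !ranked.first(k).include?(gt) }
  misses.to_f / truth.length
end

predictions = [[3, 1, 4, 1, 5], [2, 7, 1, 8, 2], [9, 9, 8, 2, 6]]  # toy data
truth       = [3, 1, 6]
miss_rate(predictions, truth, 1)  # two of three labels are not ranked first
miss_rate(predictions, truth, 5)  # every label appears within the top 5
```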

For the 32-bit floating-point image-net.sqlite3 on GPU, the top-1 missing rate is 36.82%, 0.68% better
than Alex's result, and the top-5 missing rate is 16.26%, 0.74% better than Alex's. For the half-precision
image-net.sqlite3 (the one included in ./samples/), the top-1 missing rate is 36.83% and the top-5
missing rate is 16.25%.

For the 32-bit floating-point image-net.sqlite3 on CPU, the top-1 missing rate is 37.34%, and the top-5
missing rate is 16.62%.

You can download the 32-bit floating-point one with ./samples/download-image-net.sh

Speed-wise:

@@ -64,7 +79,7 @@ frequency, and Samsung MZ-7TE500BW 500GiB SSD with clang, libdispatch, libatlas
Scientific Library.

The CPU version of forward pass (from RGB image input to the classification result) takes about
350ms per image. This is achieved with multi-threaded convolutional kernel computation. Decaf (
700ms per image. This is achieved with multi-threaded convolutional kernel computation. Decaf (
the CPU counterpart of Caffe) reported their forward pass at around 0.5s per image with
unspecified hardware over 10 patches (the same as ccv's cnnclassify implementation). I cannot
get a sensible number out of OverFeat on my machine (it reports about 1.4s for a forward pass, that
@@ -81,7 +96,7 @@ within 6 days on two GeForce 580, which suggests my time is in line with these implementations.
As a preliminary implementation, I didn't spend much time optimizing these operations in ccv, if
any at all. For example, [cuda-convnet](http://code.google.com/p/cuda-convnet/) implements its
functionalities in about 10,000 lines of code, Caffe implements with 14,000 lines of code, as of
this release, ccv implements with about 3,700 lines of code. For the future, the low-hanging
this release, ccv implements with about 4,300 lines of code. Going forward, the low-hanging
optimization opportunities include using SIMD instructions, doing FFT in densely convolved layers,
etc.

@@ -100,8 +115,6 @@ I downloaded the ImageNet dataset from this torrent:
Assuming you've downloaded / bought all these and installed them on your computer, get a hot tea; it
will take a while to get all the puzzles and riddles in place before the training starts.

Ready? Continue!

The ImageNet metadata for 2010 challenge can be downloaded from
http://www.image-net.org/challenges/LSVRC/2010/download-public

@@ -158,7 +171,8 @@ The generated image-net.sqlite3 file is about 600MiB in size because it contains
and resume. You can either open this file with sqlite command-line tool (it is a vanilla sqlite database
file), and do:

drop table function_state, momentum_data;
drop table function_state;
drop table momentum_data;
vacuum;

The file size will shrink to about 200MiB. You can achieve further reduction in file size by rewriting it into
@@ -174,3 +188,61 @@ practically anywhere and anyhow with proper attribution. As far as I can tell, t
data released under a commercial-friendly license (Caffe itself is released under the FreeBSD license, but
its pre-trained data is "research only", and OverFeat is released under a custom research-only license).

Differences between ccv's implementation, Caffe's and Alex's
------------------------------------------------------------

Although the network topology of ccv's implementation follows Alex's (as well as Caffe's) closely,
the reported results diverged significantly enough for me to document the differences in implementation
details.

Network Topology:

In ccv, the local response normalization layer follows the convolutional layer, and the pooling layer
comes after the local response normalization. This is briefly mentioned in Alex's paper; in Caffe,
the local response normalization layer follows the pooling layer.

The input dimension of ccv's implemented network is 225x225, while in Caffe it is 227x227. Alex's paper
mentions their input size is 224x224. The 225x225 size implies a 1-pixel padding around the input image,
so that with an 11x11 filter and a stride of 4, a 55x55 output is generated.
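The 55x55 figure follows from the usual convolution output-size formula (the formula is standard, not quoted from the source):

```ruby
# output = (input + 2 * padding - filter) / stride + 1
def conv_output(input, padding, filter, stride)
  (input + 2 * padding - filter) / stride + 1
end

conv_output(225, 1, 11, 4)   # ccv:   225x225 input, 1px padding  -> 55
conv_output(227, 0, 11, 4)   # Caffe: 227x227 input, no padding   -> 55
```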

Data Preparation:

Caffe's implementation resizes the image to 256x256 without retaining the aspect ratio. Alex's
implementation resizes the image so that the minimal dimension is 256 while retaining the aspect ratio
(at least as the paper implies) and then crops the image to 256x256. ccv's implementation resizes the
image so that the minimal dimension is 257 while retaining the aspect ratio (downsampling with
CCV_INTER_AREA interpolation and upsampling with CCV_INTER_CUBIC interpolation if needed). ccv's
implementation obtains the mean image from center-cropped 257x257 images.
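The aspect-retaining resize amounts to the following dimension arithmetic (a sketch only; the actual interpolation is done inside ccv, and the rounding mode here is an assumption):

```ruby
# Scale so the smaller dimension becomes `target`, keeping the aspect ratio.
def retained_resize(width, height, target = 257)
  scale = target.to_f / [width, height].min
  [(width * scale).round, (height * scale).round]
end

retained_resize(640, 480)   # landscape: the 480 side becomes 257
retained_resize(480, 640)   # portrait:  the 480 side becomes 257
```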

Data Augmentation:

Caffe's implementation randomly crops the image from 256x256 to 227x227. Alex's implementation randomly
crops the image from 256x256 to 224x224 and then applies color augmentation with a Gaussian random
coefficient sampled with sigma == 0.1. ccv's implementation randomly crops the image from the
aspect-retained sizes to 257x257, subtracts the mean image, and then randomly crops that to 225x225;
color augmentation is applied with a Gaussian random coefficient sampled with sigma == 0.001. All three
implementations use horizontal mirroring as a data augmentation technique.
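The crop-and-perturb recipe can be sketched like this (assumptions: uniform crop offsets and a Box-Muller transform for the Gaussian coefficient; ccv's actual sampling code may differ):

```ruby
# Pick a random top-left corner so a dst x dst crop fits inside a src x src image.
def random_crop_offset(src, dst, rng = Random.new)
  [rng.rand(src - dst + 1), rng.rand(src - dst + 1)]
end

# One Gaussian sample via the Box-Muller transform (Ruby's stdlib has no normal RNG).
def gaussian(sigma, rng = Random.new)
  u1, u2 = rng.rand, rng.rand
  sigma * Math.sqrt(-2.0 * Math.log(1.0 - u1)) * Math.cos(2.0 * Math::PI * u2)
end

x, y = random_crop_offset(257, 225)  # each offset falls in 0..32
coeff = gaussian(0.001)              # color augmentation coefficient, sigma == 0.001
```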

Averaged Classification:

Caffe averages the softmax output of 10 patches from the test image: it first resizes the image to
256x256 without retaining the aspect ratio, then crops the first 5 patches of size 227x227 from the top
left, top right, center, bottom left, and bottom right of the resized test image; the second 5 patches
are the horizontal mirrors of the first 5.

Alex's implementation averages the softmax output of 10 patches from the test image by first resizing
the image so that the minimal dimension is 256 while retaining the aspect ratio, and then center-cropping
to 256x256. The 10 patches of size 224x224 are sampled from the 256x256 crop the same way as in Caffe.

ccv's GPU implementation averages the softmax output of 30 patches from the test image. It first resizes
the image so that the minimal dimension is 257, then makes 3 crops, from the top left, center, and bottom
right, so that each cropped image is 257x257. The mean image is subtracted from each crop, and each crop
is then cropped again, from the top left, top right, center, bottom left, and bottom right, to 225x225.
This generates 15 patches, and each of them has a horizontally mirrored counterpart.
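The 30-patch bookkeeping can be made concrete (a sketch of the enumeration only; the crop names are taken from the description above):

```ruby
# 3 aspect-retained 257x257 crops x 5 sub-crops each x {original, mirrored} = 30.
def gpu_test_patches
  big = [:top_left, :center, :bottom_right]
  sub = [:top_left, :top_right, :center, :bottom_left, :bottom_right]
  big.product(sub).flat_map { |b, s| [[b, s, :original], [b, s, :mirrored]] }
end

gpu_test_patches.length   # 30 distinct patches to average over
```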

For efficiency, ccv's CPU implementation averages the softmax output of only 10 patches from the test
image. It first resizes the image so that the minimal dimension is 257. The mean image is upsampled to
the same size with CCV_INTER_CUBIC and subtracted from the resized image. The top left, top right,
center, bottom left, and bottom right patches of 225x225 are then extracted and horizontally mirrored
to generate the 10 patches.

19 changes: 15 additions & 4 deletions lib/cuda/cwc_convnet.cu
@@ -582,10 +582,17 @@ static void _cwc_convnet_alloc_reserved_for_classify(ccv_convnet_t* convnet, int
GPU(convnet)->scans = (float**)(GPU(convnet)->layers + convnet->count) + convnet->count * 2;
_cwc_convnet_alloc_scans(convnet, scan, batch * 30);
GPU(convnet)->backwards = 0;
GPU(convnet)->contexts[0].host.dor = GPU(convnet)->contexts[0].device.dor = 0;
_cwc_convnet_alloc_input(convnet, convnet->input.height, convnet->input.width, 0, batch * 6);
_cwc_convnet_alloc_c(convnet, 0, batch * tops);
_cwc_convnet_alloc_out(convnet, 0, batch * tops);
_cwc_convnet_alloc_context(convnet, 0);
GPU(convnet)->contexts[1].host.dor = GPU(convnet)->contexts[1].device.dor = 0;
GPU(convnet)->contexts[1].host.input = GPU(convnet)->contexts[1].device.input = 0;
GPU(convnet)->contexts[1].host.c = GPU(convnet)->contexts[1].device.c = 0;
GPU(convnet)->contexts[1].host.out = GPU(convnet)->contexts[1].device.out = 0;
GPU(convnet)->contexts[1].device.stream = 0;
GPU(convnet)->contexts[1].device.cublas = 0;
}

// allocate reserved for both forward and backward path
@@ -2908,8 +2915,10 @@ void cwc_convnet_compact(ccv_convnet_t* convnet)
for (i = 0; i < 2; i++)
{
cwc_convnet_context_t* context = GPU(convnet)->contexts + i;
cudaFreeHost(context->host.input);
cudaFree(context->device.input);
if (context->host.input)
cudaFreeHost(context->host.input);
if (context->device.input)
cudaFree(context->device.input);
if (context->host.c)
cudaFreeHost(context->host.c);
if (context->device.c)
@@ -2918,8 +2927,10 @@
cudaFreeHost(context->host.out);
if (context->device.out)
cudaFree(context->device.out);
cudaStreamDestroy(context->device.stream);
cublasDestroy(context->device.cublas);
if (context->device.stream)
cudaStreamDestroy(context->device.stream);
if (context->device.cublas)
cublasDestroy(context->device.cublas);
}
for (i = 0; i < convnet->count; i++)
{
98 changes: 85 additions & 13 deletions site/_posts/0000-01-01-doc-convnet.markdown
@@ -53,18 +53,33 @@ Accuracy-wise:
The test is performed on the ILSVRC 2010 test dataset; as of this writing, I could not obtain the
validation dataset for ILSVRC 2012.

The training stopped to improve at around 90 epochs, at that time, the central patch obtained
42.81% of top-1 missing rate (lower is better). In Alex's paper, they reported 37.5% top-1
missing rate when averaging 10 patches, and 39% top-1 missing rate when using one patch.
The training stopped improving at around 60 epochs; at that point, the central patch from the test set
obtained a 39.71% top-1 missing rate (lower is better) and the training set obtained a 37.80%
top-1 missing rate. In Alex's paper, they reported a 37.5% top-1 missing rate when averaging 10 patches,
and a 39% top-1 missing rate when using the central patch on the test set.

By applying this patch: <https://gist.github.com/liuliu/9420735>
Assuming you have the ILSVRC 2010 test set files listed in image-net-test.txt, run

git am -3 9420935.patch
./cnnclassify image-net-test.txt ../samples/image-net.sqlite3 > image-net-classify.txt

For 32-bit float point image-net.sqlite3, the top-1 missing rate is 39.51%, 2% shying from
Alex's result. For half precision image-net.sqlite3 (the one included in ./samples/), the top-1
missing rate is 39.8%, 0.3% worse than the 32-bit float point one. You can download the float
point one with ./samples/download-image-net.sh
Running the complete test set with this command takes about an hour on the GPU; without GPU support
enabled, it takes about a day on the CPU.

Assuming you have the ILSVRC 2010 ground truth data in LSVRC2010_test_ground_truth.txt

./cnnvldtr.rb LSVRC2010_test_ground_truth.txt image-net-classify.txt

will report the top-1 missing rate as well as the top-5 missing rate.

For the 32-bit floating-point image-net.sqlite3 on GPU, the top-1 missing rate is 36.82%, 0.68% better
than Alex's result, and the top-5 missing rate is 16.26%, 0.74% better than Alex's. For the half-precision
image-net.sqlite3 (the one included in ./samples/), the top-1 missing rate is 36.83% and the top-5
missing rate is 16.25%.

For the 32-bit floating-point image-net.sqlite3 on CPU, the top-1 missing rate is 37.34%, and the top-5
missing rate is 16.62%.

You can download the 32-bit floating-point one with ./samples/download-image-net.sh

Speed-wise:

@@ -73,7 +88,7 @@ frequency, and Samsung MZ-7TE500BW 500GiB SSD with clang, libdispatch, libatlas
Scientific Library.

The CPU version of forward pass (from RGB image input to the classification result) takes about
350ms per image. This is achieved with multi-threaded convolutional kernel computation. Decaf (
700ms per image. This is achieved with multi-threaded convolutional kernel computation. Decaf (
the CPU counterpart of Caffe) reported their forward pass at around 0.5s per image with
unspecified hardware over 10 patches (the same as ccv's cnnclassify implementation). I cannot
get a sensible number out of OverFeat on my machine (it reports about 1.4s for a forward pass, that
@@ -109,8 +124,6 @@ I downloaded the ImageNet dataset from this torrent:
Assuming you've downloaded / bought all these and installed them on your computer, get a hot tea; it
will take a while to get all the puzzles and riddles in place before the training starts.

Ready? Continue!

The ImageNet metadata for 2010 challenge can be downloaded from
<http://www.image-net.org/challenges/LSVRC/2010/download-public>

@@ -167,7 +180,8 @@ The generated image-net.sqlite3 file is about 600MiB in size because it contains
and resume. You can either open this file with sqlite command-line tool (it is a vanilla sqlite database
file), and do:

drop table function_state, momentum_data;
drop table function_state;
drop table momentum_data;
vacuum;

The file size will shrink to about 200MiB. You can achieve further reduction in file size by rewriting it into
@@ -183,3 +197,61 @@ practically anywhere and anyhow with proper attribution. As far as I can tell, t
data released under a commercial-friendly license (Caffe itself is released under the FreeBSD license, but
its pre-trained data is "research only", and OverFeat is released under a custom research-only license).

Differences between ccv's implementation, Caffe's and Alex's
------------------------------------------------------------

Although the network topology of ccv's implementation follows Alex's (as well as Caffe's) closely,
the reported results diverged significantly enough for me to document the differences in implementation
details.

Network Topology:

In ccv, the local response normalization layer follows the convolutional layer, and the pooling layer
comes after the local response normalization. This is briefly mentioned in Alex's paper; in Caffe,
the local response normalization layer follows the pooling layer.

The input dimension of ccv's implemented network is 225x225, while in Caffe it is 227x227. Alex's paper
mentions their input size is 224x224. The 225x225 size implies a 1-pixel padding around the input image,
so that with an 11x11 filter and a stride of 4, a 55x55 output is generated.

Data Preparation:

Caffe's implementation resizes the image to 256x256 without retaining the aspect ratio. Alex's
implementation resizes the image so that the minimal dimension is 256 while retaining the aspect ratio
(at least as the paper implies) and then crops the image to 256x256. ccv's implementation resizes the
image so that the minimal dimension is 257 while retaining the aspect ratio (downsampling with
CCV_INTER_AREA interpolation and upsampling with CCV_INTER_CUBIC interpolation if needed). ccv's
implementation obtains the mean image from center-cropped 257x257 images.

Data Augmentation:

Caffe's implementation randomly crops the image from 256x256 to 227x227. Alex's implementation randomly
crops the image from 256x256 to 224x224 and then applies color augmentation with a Gaussian random
coefficient sampled with sigma == 0.1. ccv's implementation randomly crops the image from the
aspect-retained sizes to 257x257, subtracts the mean image, and then randomly crops that to 225x225;
color augmentation is applied with a Gaussian random coefficient sampled with sigma == 0.001. All three
implementations use horizontal mirroring as a data augmentation technique.

Averaged Classification:

Caffe averages the softmax output of 10 patches from the test image: it first resizes the image to
256x256 without retaining the aspect ratio, then crops the first 5 patches of size 227x227 from the top
left, top right, center, bottom left, and bottom right of the resized test image; the second 5 patches
are the horizontal mirrors of the first 5.

Alex's implementation averages the softmax output of 10 patches from the test image by first resizing
the image so that the minimal dimension is 256 while retaining the aspect ratio, and then center-cropping
to 256x256. The 10 patches of size 224x224 are sampled from the 256x256 crop the same way as in Caffe.

ccv's GPU implementation averages the softmax output of 30 patches from the test image. It first resizes
the image so that the minimal dimension is 257, then makes 3 crops, from the top left, center, and bottom
right, so that each cropped image is 257x257. The mean image is subtracted from each crop, and each crop
is then cropped again, from the top left, top right, center, bottom left, and bottom right, to 225x225.
This generates 15 patches, and each of them has a horizontally mirrored counterpart.

For efficiency, ccv's CPU implementation averages the softmax output of only 10 patches from the test
image. It first resizes the image so that the minimal dimension is 257. The mean image is upsampled to
the same size with CCV_INTER_CUBIC and subtracted from the resized image. The top left, top right,
center, bottom left, and bottom right patches of 225x225 are then extracted and horizontally mirrored
to generate the 10 patches.
