Merge pull request #7 from margottd/master

Graphs and ReadMe updated with TX2 and new backends

beniz committed Sep 16, 2019
2 parents bdaaf34 + b427e6b commit a687a9b
Showing 8 changed files with 58 additions and 9 deletions.
67 changes: 58 additions & 9 deletions README.md

This report documents the performances of the [DeepDetect](https://deepdetect.com/) Open Source Deep Learning server on a variety of platforms and popular or particularly effective neural network architectures. The full server source code is available from <https://github.com/beniz/deepdetect>.


### Reference platforms
These results should serve as a reference for users choosing the right NN model for their work on servers or embedded systems.

### Results Overview

Below are the measured performances, displayed in log scale.

#### With Caffe as a backend

The reported performances use a customized version of [Caffe](https://github.com/beniz/caffe) as backend.

<table style="width:100%">
<tr>
<th><img src="graph/gtx1080_log.png" width="450"></th>
<th><img src="graph/TX1_log.png" width="450"></th>
<th><img src="graph/TX2_caffe_log.png" width="450"></th>
</tr>
</table>
<table style="width:100%">
<tr>
<th><img src="graph/TK1_log.png" width="450"></th>
<th><img src="graph/Jetson-nano-log.png" width="450"></th>
<th><img src="graph/Raspi_log.png" width="450"></th>
</tr>
</table>
The reported performances are per image in ms. When batch size is greater than one, the reported value is the average time per image for that batch size. On GPUs and platforms with limited memory, not all batch sizes are applicable.
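As a quick illustration of how such a per-image figure is derived (the helper below is a sketch, not code from this benchmark), the wall-clock time of one batched inference run is divided by the batch size:

```python
def per_image_ms(batch_wall_time_s, batch_size):
    """Average per-image latency in milliseconds for one timed batch."""
    return batch_wall_time_s * 1000.0 / batch_size

# a hypothetical 128-image batch finishing in about 1.6 s
# averages roughly 12.5 ms per image
print(per_image_ms(1.6, 128))
```

Timing the whole batch and averaging is what makes large batch sizes look cheaper per image, since fixed per-call overhead is amortized.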

#### With TensorRT as a backend

<table style="width:100%">
<tr>
<th><img src="graph/TX2_TensorRT_log.png" width="450"></th>
</tr>
</table>

<details>
<summary>See linear-scale plot</summary>

![alt text](graph/TX2_TensorRT_linear.png)

</details>

#### With NCNN as a backend

The graph below shows the performance difference between the Raspberry Pi 3 and the Raspberry Pi 4 (2 GB) using NCNN as a backend.

<table style="width:100%">
<tr>
<th><img src="graph/NCNN_models_RPI3_RPI4.png" width="450"></th>
</tr>
</table>
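In DeepDetect, the backend is selected through the `mllib` field when a service is created. The sketch below builds such a service-creation body; the field layout follows the public DeepDetect API docs but should be checked against your server version, and the model repository path and class count are placeholders:

```python
import json

def make_service_payload(backend, model_repo, nclasses):
    """Build an illustrative DeepDetect service-creation body.

    `backend` is the mllib name, e.g. "caffe", "ncnn" or "tensorrt".
    `model_repo` and `nclasses` are placeholder values here.
    """
    return {
        "mllib": backend,
        "description": "image classification service",
        "type": "supervised",
        "parameters": {
            "input": {"connector": "image"},
            "mllib": {"nclasses": nclasses},
        },
        "model": {"repository": model_repo},
    }

payload = make_service_payload("ncnn", "/opt/models/squeezenet", 1000)
print(json.dumps(payload, indent=2))
```

The same benchmark model can then be timed under each backend simply by recreating the service with a different `mllib` value.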


### Discussion

At 1 TFLOPS of theoretical output, the TX1 is able to push squeezenet\_v1.0, squeezenet\_v1.1, mobilenet\_depthwise, googlenet, and shufflenet past 25 fps.
In extreme cases, the TX1 can compute up to 85 fps with a batch size of 16 or more for squeezenet\_v1.1. For a project with critical time constraints,
such as autonomous cars, the TX1 could prove to be a viable solution.

![alt text](graph/TX1_log.png)
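The fps figures quoted in this discussion follow directly from the per-image latencies. A minimal conversion, using the 25 fps threshold mentioned above, could look like this (sketch, not benchmark code):

```python
def to_fps(per_image_ms):
    """Convert an average per-image latency in ms to frames per second."""
    return 1000.0 / per_image_ms

def is_real_time(per_image_ms, threshold_fps=25.0):
    """True when the latency sustains at least `threshold_fps`."""
    return to_fps(per_image_ms) >= threshold_fps

# 85 fps corresponds to a per-image latency of 1000 / 85, i.e. under 12 ms
print(to_fps(40.0), is_real_time(40.0))
```

So "more than 25 fps" is equivalent to an average per-image time below 40 ms.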


- Jetson TX2 (1.5 TFLOPS 256 cores)

The Jetson TX2 offers 1.5 TFLOPS of output and is a great solution for fast and power-efficient embedded systems. The TX2 is equipped with an NVIDIA Pascal GPU.
This 7.5-watt module can push up to 80 fps at a batch size of 128 for the SqueezeNet model. In the same conditions, GoogLeNet reaches 68 fps.
At a batch size of 64, the Jetson TX2 can reach up to 50 fps. For a project with real-time computation, such as autonomous cars, the Jetson TX2 would be an ideal candidate.
The TX2's performance allows very fast computation at the edge.

![alt text](graph/TX2_caffe_log.png)
<details>
<summary>See linear-scale plot</summary>

![alt text](graph/TX2_caffe_linear.png)

</details>
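Reading the TX2 numbers the other way round, a sustained throughput fixes the wall-clock time of a whole batch: batch_size / fps seconds. A small sketch:

```python
def batch_seconds(sustained_fps, batch_size):
    """Wall-clock seconds to process one batch at the given throughput."""
    return batch_size / sustained_fps

# at a sustained 80 fps, a 128-image batch takes about 1.6 s end to end
print(batch_seconds(80, 128))
```

This matters for real-time pipelines: a large batch improves per-image throughput but delays the first result by the full batch duration.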


- Jetson Nano (500 GFLOPS 128 cores)

Weighing in at 5 W max while in operation, the Nano is a low-cost solution for embedded-system applications and AI at the edge.
It has a 500 GFLOPS output. For a batch size of 1, ShuffleNet and SqueezeNet reach 12 and 25 fps respectively.
The Jetson Nano can push up to 10 fps with a batch size of 2 or more for SqueezeNet-SSD-faces, SqueezeNet-SSD-voc, and ResNet18-ocr.
When pushed to a batch size of 64, the Nano can compute up to 48 fps for SqueezeNet and ResNet18-ocr. For a large-scale project, or one with budget constraints,
the Jetson Nano seems to be an interesting solution.

![alt text](graph/Jetson-nano-log.png)
<details>
<summary>See linear-scale plot</summary>

![alt text](graph/Jetson-nano.png)

</details>


- Raspberry Pi 3 Model B (24 GFLOPS GPU and 2.3 DMIPS/MHz CPU, at 35 USD)

The last in our book is the Raspberry Pi 3. At merely 4 watts under load, the Pi ought to be the preferred solution for remote sensing.
The downside lies in its ability to process images, at merely 1 fps at best.

![alt text](graph/Raspi_log.png)


## Networks comparison across platforms

The reported performances use a customized version of [Caffe](https://github.com/beniz/caffe) as backend.
The results of the comparison of each model across multiple platforms are displayed below.
The legend shows the batch sizes in a color-coded manner. Note that not all batch sizes are available for all architectures.
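One convenient way to read such a cross-platform comparison is as a speedup over a baseline platform. The sketch below uses hypothetical per-image latencies (placeholder numbers, not measurements from this report):

```python
def speedups(per_image_ms_by_platform, baseline):
    """Relative speedup of each platform over the baseline (higher is faster)."""
    base = per_image_ms_by_platform[baseline]
    return {p: base / ms for p, ms in per_image_ms_by_platform.items()}

# hypothetical per-image latencies in ms for one model
timings = {"GTX 1080": 2.0, "TX1": 12.0, "Raspberry Pi 3": 1000.0}
print(speedups(timings, baseline="Raspberry Pi 3"))
```

Expressing results this way makes the hardware trade-off explicit: the same model, same batch size, only the platform changes.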

Binary file modified graph/Jetson-nano-log.png
Binary file modified graph/Jetson-nano.png
Binary file added graph/NCNN_models_RPI3_RPI4.png
Binary file added graph/TX2_TensorRT_linear.png
Binary file added graph/TX2_TensorRT_log.png
Binary file added graph/TX2_caffe_linear.png
Binary file added graph/TX2_caffe_log.png