Merge pull request #7 from margottd/master

Graphs and ReadMe updated with TX2 and new backends

beniz committed Sep 16, 2019
2 parents bdaaf34 + b427e6b commit a687a9b
Showing 8 changed files with 58 additions and 9 deletions.
67 changes: 58 additions & 9 deletions README.md

This report documents the performances of the [DeepDetect](https://deepdetect.com/) Open Source Deep Learning server on a variety of platforms and popular or particularly effective neural network architectures. The full server source code is available from <https://github.com/beniz/deepdetect>.


### Reference platforms
These results should serve as a reference for users choosing the right NN model for their work on servers or embedded systems.

### Results Overview

Below are the measured performances, displayed in log scale.

#### With Caffe as a backend

The reported performances use a customized version of [Caffe](https://github.com/beniz/caffe) as backend.

<table style="width:100%">
<tr>
<th><img src="graph/gtx1080_log.png" width="450"></th>
<th><img src="graph/TX1_log.png" width="450"></th>
<th><img src="graph/TX2_caffe_log.png" width="450"></th>
</tr>
</table>
<table style="width:100%">
<tr>
<th><img src="graph/TK1_log.png" width="450"></th>
<th><img src="graph/Jetson-nano-log.png" width="450"></th>
<th><img src="graph/Raspi_log.png" width="450"></th>
</tr>
</table>
The reported performances are per image in ms. When batch size is greater than one, the reported value is the average time per image for that batch size. On GPUs and platforms with limited memory, not all batch sizes are applicable.
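As a quick illustration of how such a per-image figure is derived (the helper below is a sketch, not code from this benchmark), the wall-clock time of one batched inference run is divided by the batch size:

```python
def per_image_ms(batch_wall_time_s, batch_size):
    """Average per-image latency in milliseconds for one timed batch."""
    return batch_wall_time_s * 1000.0 / batch_size

# a hypothetical 128-image batch finishing in about 1.6 s
# averages roughly 12.5 ms per image
print(per_image_ms(1.6, 128))
```

Timing the whole batch and averaging is what makes large batch sizes look cheaper per image, since fixed per-call overhead is amortized.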

#### With TensorRT as a backend

<table style="width:100%">
<tr>
<th><img src="graph/TX2_TensorRT_log.png" width="450"></th>
</tr>
</table>

<details>
<summary>See linear-scale plot</summary>

![alt text](graph/TX2_TensorRT_linear.png)

</details>

#### With NCNN as a backend

The graph below shows the performance difference between the Raspberry Pi 3 and the Raspberry Pi 4 (2 GB) using NCNN as a backend.

<table style="width:100%">
<tr>
<th><img src="graph/NCNN_models_RPI3_RPI4.png" width="450"></th>
</tr>
</table>
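In DeepDetect, the backend is selected through the `mllib` field when a service is created. The sketch below builds such a service-creation body; the field layout follows the public DeepDetect API docs but should be checked against your server version, and the model repository path and class count are placeholders:

```python
import json

def make_service_payload(backend, model_repo, nclasses):
    """Build an illustrative DeepDetect service-creation body.

    `backend` is the mllib name, e.g. "caffe", "ncnn" or "tensorrt".
    `model_repo` and `nclasses` are placeholder values here.
    """
    return {
        "mllib": backend,
        "description": "image classification service",
        "type": "supervised",
        "parameters": {
            "input": {"connector": "image"},
            "mllib": {"nclasses": nclasses},
        },
        "model": {"repository": model_repo},
    }

payload = make_service_payload("ncnn", "/opt/models/squeezenet", 1000)
print(json.dumps(payload, indent=2))
```

The same benchmark model can then be timed under each backend simply by recreating the service with a different `mllib` value.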


### Discussion

At 1 TFLOPS of theoretical output, the TX1 is able to push squeezenet\_v1.0, squeezenet\_v1.1, mobilenet\_depthwise, googlenet, and shufflenet past 25 fps.
In extreme cases, the TX1 can compute up to 85 fps with a batch size of 16 or more for squeezenet\_v1.1. For a project with critical time constraints,
such as autonomous cars, the TX1 could prove to be a viable solution.

![alt text](graph/TX1_log.png)
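The fps figures quoted in this discussion follow directly from the per-image latencies. A minimal conversion, using the 25 fps threshold mentioned above, could look like this (sketch, not benchmark code):

```python
def to_fps(per_image_ms):
    """Convert an average per-image latency in ms to frames per second."""
    return 1000.0 / per_image_ms

def is_real_time(per_image_ms, threshold_fps=25.0):
    """True when the latency sustains at least `threshold_fps`."""
    return to_fps(per_image_ms) >= threshold_fps

# 85 fps corresponds to a per-image latency of 1000 / 85, i.e. under 12 ms
print(to_fps(40.0), is_real_time(40.0))
```

So "more than 25 fps" is equivalent to an average per-image time below 40 ms.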


- Jetson TX2 (1.5 TFLOPS 256 cores)

The Jetson TX2 offers 1.5 TFLOPS of output and is a great solution for fast and power-efficient embedded systems. The TX2 is equipped with an NVIDIA Pascal GPU.
This 7.5-watt module can push up to 80 fps at a batch size of 128 for the SqueezeNet model. In the same conditions, GoogLeNet reaches 68 fps.
At a batch size of 64, the Jetson TX2 can reach up to 50 fps. For a project with real-time computation, such as autonomous cars, the Jetson TX2 would be an ideal candidate.
The TX2's performance allows very fast computation at the edge.

![alt text](graph/TX2_caffe_log.png)
<details>
<summary>See linear-scale plot</summary>

![alt text](graph/TX2_caffe_linear.png)

</details>
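Reading the TX2 numbers the other way round, a sustained throughput fixes the wall-clock time of a whole batch: batch_size / fps seconds. A small sketch:

```python
def batch_seconds(sustained_fps, batch_size):
    """Wall-clock seconds to process one batch at the given throughput."""
    return batch_size / sustained_fps

# at a sustained 80 fps, a 128-image batch takes about 1.6 s end to end
print(batch_seconds(80, 128))
```

This matters for real-time pipelines: a large batch improves per-image throughput but delays the first result by the full batch duration.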


- Jetson Nano (500 GFLOPS 128 cores)

Weighing in at 5 W max while in operation, the Nano is a low-cost solution for embedded-system applications and AI at the edge.
It has a 500 GFLOPS output. For a batch size of 1, ShuffleNet and SqueezeNet reach 12 and 25 fps respectively.
The Jetson Nano can push up to 10 fps with a batch size of 2 or more for SqueezeNet-SSD-faces, SqueezeNet-SSD-voc, and ResNet18-ocr.
When pushed to a batch size of 64, the Nano can compute up to 48 fps for SqueezeNet and ResNet18-ocr. For a large-scale project, or one with budget constraints,
the Jetson Nano seems to be an interesting solution.

![alt text](graph/Jetson-nano-log.png)
<details>
<summary>See linear-scale plot</summary>

![alt text](graph/Jetson-nano.png)

</details>


- Raspberry Pi 3 Model B (24 GFLOPS GPU and 2.3 DMIPS/MHz CPU, at 35 USD)

The last in our book is the Raspberry Pi 3. At merely 4 watts under load, the Pi ought to be the preferred solution for remote sensing.
The downside lies in its ability to process images, at merely 1 fps at best.

![alt text](graph/Raspi_log.png)


## Networks comparison across platforms

The reported performances use a customized version of [Caffe](https://github.com/beniz/caffe) as backend.
The results of the comparison of each model across multiple platforms are displayed below.
The legend shows the batch sizes in a color-coded manner. Note that not all batch sizes are available for all architectures.
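One convenient way to read such a cross-platform comparison is as a speedup over a baseline platform. The sketch below uses hypothetical per-image latencies (placeholder numbers, not measurements from this report):

```python
def speedups(per_image_ms_by_platform, baseline):
    """Relative speedup of each platform over the baseline (higher is faster)."""
    base = per_image_ms_by_platform[baseline]
    return {p: base / ms for p, ms in per_image_ms_by_platform.items()}

# hypothetical per-image latencies in ms for one model
timings = {"GTX 1080": 2.0, "TX1": 12.0, "Raspberry Pi 3": 1000.0}
print(speedups(timings, baseline="Raspberry Pi 3"))
```

Expressing results this way makes the hardware trade-off explicit: the same model, same batch size, only the platform changes.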

Binary file modified graph/Jetson-nano-log.png
Binary file modified graph/Jetson-nano.png
Binary file added graph/NCNN_models_RPI3_RPI4.png
Binary file added graph/TX2_TensorRT_linear.png
Binary file added graph/TX2_TensorRT_log.png
Binary file added graph/TX2_caffe_linear.png
Binary file added graph/TX2_caffe_log.png