Added Ubuntu 18 instructions

gineshidalgo99 committed Feb 25, 2019
1 parent f4984c0 commit 36777bee483a18eb70ede9e27d3fc2e63c38d017
@@ -78,4 +78,4 @@ You might select multiple topics, delete the rest:
- Portable demo or compiled library?

10. If **speed performance** issue:
-    - Report OpenPose timing speed based on [this link](https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/installation.md#profiling-speed).
+    - Report OpenPose timing speed based on [this link](https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/speed_up_openpose.md#profiling-speed).
@@ -10,12 +10,27 @@ set(CMAKE_MACOSX_RPATH 1)


### CMAKE HEADERS
-if (${CMAKE_VERSION} VERSION_GREATER 3.0.0)
-  cmake_policy(SET CMP0048 NEW)
-  project(OpenPose VERSION ${OpenPose_VERSION})
-else (${CMAKE_VERSION} VERSION_GREATER 3.0.0)
-  project(OpenPose)
-endif (${CMAKE_VERSION} VERSION_GREATER 3.0.0)
+# Ubuntu 18 default. After 3.8, no need for find_CUDA
+# https://cmake.org/cmake/help/v3.10/module/FindCUDA.html
+# https://cmake.org/cmake/help/v3.10/command/project.html
+# https://devblogs.nvidia.com/building-cuda-applications-cmake/
+if (${CMAKE_VERSION} VERSION_GREATER 3.9.0)
+  cmake_policy(SET CMP0048 NEW)
+  project(OpenPose VERSION ${OpenPose_VERSION})
+  # # Not tested
+  # cmake_policy(SET CMP0048 NEW)
+  # set(CUDACXX /usr/local/cuda/bin/nvcc)
+  # project(OpenPose VERSION ${OpenPose_VERSION} LANGUAGES CXX CUDA)
+  # set(AUTO_FOUND_CUDA TRUE)
+  # # else
+  # set(AUTO_FOUND_CUDA FALSE)
+# Ubuntu 16 default
+elseif (${CMAKE_VERSION} VERSION_GREATER 3.0.0)
+  cmake_policy(SET CMP0048 NEW)
+  project(OpenPose VERSION ${OpenPose_VERSION})
+else (${CMAKE_VERSION} VERSION_GREATER 3.9.0)
+  project(OpenPose)
+endif (${CMAKE_VERSION} VERSION_GREATER 3.9.0)
cmake_minimum_required(VERSION 2.8.7 FATAL_ERROR) # min. cmake version recommended by Caffe


@@ -158,7 +158,7 @@ Output (format, keypoint index ordering, etc.) in [doc/output.md](doc/output.md)
## Speeding Up OpenPose and Benchmark
-Check the OpenPose Benchmark as well as some hints to speed up and/or reduce the memory requirements for OpenPose on [doc/speed_up_preserving_accuracy.md](doc/speed_up_preserving_accuracy.md).
+Check the OpenPose Benchmark as well as some hints to speed up and/or reduce the memory requirements for OpenPose on [doc/speed_up_openpose.md](doc/speed_up_openpose.md).
@@ -37,19 +37,19 @@ OpenPose - Frequently Asked Question (FAQ)
### Speed Up, Memory Reduction, and Benchmark
**Q: Low speed** - OpenPose is quite slow, is it normal? How can I speed it up?

-**A**: Check [doc/speed_up_preserving_accuracy.md](./speed_up_preserving_accuracy.md) to discover the approximate speed of your graphics card and some speed tips.
+**A**: Check [doc/speed_up_openpose.md](./speed_up_openpose.md) to discover the approximate speed of your graphics card and some speed tips.



### CPU Version Too Slow
**Q: The CPU version is insanely slow compared to the GPU version.**

-**A**: Check [doc/speed_up_preserving_accuracy.md#cpu-version](./speed_up_preserving_accuracy.md#cpu-version) to discover the approximate speed and some speed tips.
+**A**: Check [doc/speed_up_openpose.md#cpu-version](./speed_up_openpose.md#cpu-version) to discover the approximate speed and some speed tips.



### Profiling Speed and Estimating FPS without Display
-Check the [doc/installation.md#profiling-speed](./installation.md#profiling-speed) section.
+Check the [doc/speed_up_openpose.md#profiling-speed](./speed_up_openpose.md#profiling-speed) section.



@@ -109,7 +109,7 @@ COCO model will eventually be removed. BODY_25 model is faster, more accurate, a
### How to Measure the Latency Time?
**Q: How to measure/calculate/estimate the latency/lag time?**

-**A**: [Profile](https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/installation.md#profiling-speed) the OpenPose speed. For 1-GPU or CPU-only systems (use `--disable_multi_thread` for simplicity in multi-GPU systems for latency measurement), the latency will be roughly the sum of all the reported measurements.
+**A**: [Profile](https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/speed_up_openpose.md#profiling-speed) the OpenPose speed. For 1-GPU or CPU-only systems (use `--disable_multi_thread` for simplicity in multi-GPU systems for latency measurement), the latency will be roughly the sum of all the reported measurements.



Large diffs are not rendered by default.

@@ -0,0 +1,51 @@
OpenPose - Maximizing the OpenPose Speed
========================================================================================

## Contents
1. [OpenPose Benchmark](#openpose-benchmark)
2. [CPU Version](#cpu-version)
3. [Profiling Speed](#profiling-speed)
4. [Speed Up Preserving Accuracy](#speed-up-preserving-accuracy)
5. [Speed Up and Memory Reduction](#speed-up-and-memory-reduction)





## OpenPose Benchmark
Check the [OpenPose Benchmark](https://docs.google.com/spreadsheets/d/1-DynFGvoScvfWDA1P4jDInCkbD4lg0IKOYbXgEq0sK0/edit#gid=0) to discover the approximate expected speed of your graphics card.



### CPU Version
The CPU version runs at about 0.3 FPS on the COCO model, and at about 0.1 FPS (i.e., roughly 10-15 sec / frame) on the default BODY_25 model. To speed it up, switch to the COCO model and/or reduce the `net_resolution` as described in [Speed Up and Memory Reduction](#speed-up-and-memory-reduction). Counterintuitively, the BODY_25 model is about 5x slower than COCO on the CPU-only version, but about 40% faster on the GPU version.
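
As a rough planning aid, the FPS figures above can be turned into a wall-clock estimate. The sketch below is illustrative only: the function name and the 300-frame clip are hypothetical, and the FPS values are simply the approximate CPU figures quoted above, so real times will vary with the hardware.

```python
# Approximate CPU-only speeds quoted above (frames per second); not measured here.
APPROX_CPU_FPS = {"COCO": 0.3, "BODY_25": 0.1}


def estimated_seconds(num_frames, fps):
    """Return the approximate processing time in seconds for num_frames."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Hypothetical 300-frame clip (about 10 seconds of 30 FPS video).
for model, fps in APPROX_CPU_FPS.items():
    minutes = estimated_seconds(300, fps) / 60.0
    print(f"{model}: roughly {minutes:.0f} minutes on CPU")
```

Even a short clip takes tens of minutes on CPU at these speeds, which is why the model and resolution choices matter so much for the CPU-only version.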



### Profiling Speed
OpenPose displays the FPS in the basic GUI. However, more detailed speed metrics can be obtained from the command line while running OpenPose. To obtain them, compile OpenPose with the `PROFILER_ENABLED` flag enabled in CMake-gui. OpenPose will automatically display time measurements for each subthread after processing `F` frames (by default `F = 1000`, but it can be modified with the `--profile_speed` flag, e.g. `--profile_speed 100`).

- Time measurement for 1 graphics card: The FPS is determined by the slowest time displayed in your terminal (as OpenPose is multi-threaded). Times are in milliseconds, so `FPS = 1000/millisecond_measurement`.
- Time measurement for >1 graphics cards: Assuming `n` graphics cards, you will have to wait up to `n` x `F` frames to see the speed of each graphics card (as the frames are split among them). In addition, the FPS would be: `FPS = minFPS(speed_per_GPU/n, worst_time_measurement_other_than_GPUs)`, i.e., the FPS that corresponds to the larger of the per-GPU time divided by `n` and the worst non-GPU time. For < 4 GPUs, this is usually `FPS = speed_per_GPU/n`.

Make sure that the `wPoseExtractor` time is the slowest timing. Otherwise, the input producer (video/webcam codec issues with OpenCV, images too big, etc.) or the GUI display (use OpenGL support as detailed in [Speed Up Preserving Accuracy](#speed-up-preserving-accuracy)) might not be optimized.
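
The rules above can be sketched in a few lines of Python. This is a minimal sketch, not part of OpenPose: the function name, the stage names other than `wPoseExtractor`, and the timing values are made up for illustration, and it assumes that only the `wPoseExtractor` time is shared among the GPUs.

```python
def fps_from_timings(timings_ms, num_gpus=1, gpu_stage="wPoseExtractor"):
    """Estimate FPS from per-stage profiler times (milliseconds per frame).

    The GPU stage time is divided by the number of GPUs; the slowest
    resulting stage is the bottleneck that limits the FPS.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    effective_ms = {
        name: (ms / num_gpus if name == gpu_stage else ms)
        for name, ms in timings_ms.items()
    }
    bottleneck = max(effective_ms, key=effective_ms.get)
    return 1000.0 / effective_ms[bottleneck], bottleneck


# Hypothetical measurements; only the wPoseExtractor name comes from the text.
timings = {"producer": 8.0, "wPoseExtractor": 50.0, "gui": 10.0}

for gpus in (1, 2, 8):
    fps, stage = fps_from_timings(timings, num_gpus=gpus)
    print(f"{gpus} GPU(s): about {fps:.0f} FPS, limited by {stage}")
```

With enough GPUs the bottleneck moves away from `wPoseExtractor`, which is the situation the paragraph above warns about: adding GPUs no longer helps until the producer or display is optimized.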



## Speed Up Preserving Accuracy
Some tips to maximize the OpenPose runtime speed while preserving the accuracy (do not expect miracles, but they might help boost the frame rate a bit):

1. Enable the `WITH_OPENCV_WITH_OPENGL` flag in CMake to have a much faster GUI display. It reduces the lag and increases the speed of displaying images by telling OpenCV to render the images using OpenGL support. This speeds up display rendering by about 3x. E.g., it reduces the display time for HD resolution images from about 30 msec to about 3-10 msec. It requires OpenCV to be compiled with OpenGL support, and it causes a visual aspect-ratio artifact when rendering a folder with images of different resolutions. Note: The default OpenCV in Ubuntu 16 (from apt-get install) does include OpenGL support. However, the default one in Ubuntu 18 and the Windows portable binaries do not.
2. Switch from GPU rendering to CPU rendering to gain approximately +0.5 FPS (`--render_pose 1`).
3. Use cuDNN 5.1 or 7.2 (cuDNN 6 is ~10% slower).
4. Use the `BODY_25` model for simultaneously maximum speed and accuracy (both COCO and MPII models are slower and less accurate). However, it uses more GPU memory, so it may run out of memory more easily on low-memory GPUs.
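
The runtime tips above (points 2 and 4) can be combined on the command line. The sketch below only assembles an argument list and does not run anything. The helper name is hypothetical, the binary path is an assumption (a typical Ubuntu build location, adjust it to your installation), and `--model_pose` is assumed to be the flag that selects the model.

```python
import shlex


def build_openpose_command(
    binary="./build/examples/openpose/openpose.bin",  # assumed build location
    model="BODY_25",
    cpu_render=True,
    extra_args=(),
):
    """Assemble an OpenPose command line using the accuracy-preserving tips."""
    args = [binary, "--model_pose", model]  # --model_pose: assumed flag name
    if cpu_render:
        # CPU rendering, as suggested in point 2 above.
        args += ["--render_pose", "1"]
    args += list(extra_args)
    return args


command = build_openpose_command(extra_args=["--profile_speed", "100"])
print(shlex.join(command))
```

Keeping the flags in a list rather than a single string makes it easy to pass the result to `subprocess.run` later without quoting problems.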



## Speed Up and Memory Reduction
Some tips to increase the OpenPose speed further, at the cost of accuracy:

1. Reduce the `--net_resolution` (e.g., to 320x176) (lower accuracy). Note: For maximum accuracy, follow [doc/quick_start.md#maximum-accuracy-configuration](./quick_start.md#maximum-accuracy-configuration).
2. For face, reduce the `--face_net_resolution`. A resolution of 320x320 usually works reasonably well.
3. Points 1-2 will also reduce the GPU memory usage (or RAM usage for the CPU version).
4. Use the `BODY_25` model for maximum speed. Use `MPI_4_layers` model for minimum GPU memory usage (but lower accuracy, speed, and number of parts).
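
When picking a smaller `--net_resolution`, it helps to keep the aspect ratio of the input. The helper below is a hypothetical sketch: it assumes that both dimensions should be multiples of 16, so check the flag's help text before relying on it.

```python
def reduced_net_resolution(image_width, image_height, target_height, multiple=16):
    """Return a WIDTHxHEIGHT string close to the image aspect ratio.

    Both dimensions are rounded to the nearest multiple of `multiple`
    (assumed to be 16 here), with a minimum of one multiple.
    """
    if min(image_width, image_height, target_height) <= 0:
        raise ValueError("all dimensions must be positive")
    height = max(multiple, round(target_height / multiple) * multiple)
    scaled_width = height * image_width / image_height
    width = max(multiple, round(scaled_width / multiple) * multiple)
    return f"{width}x{height}"


# A 16:9 input with a target height of 176 gives the example from point 1.
print(reduced_net_resolution(1280, 720, 176))
```

The resulting string can be passed directly as the value of `--net_resolution`.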

This file was deleted.

@@ -40,7 +40,6 @@ CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_61,code=sm_61 \
-gencode arch=compute_62,code=sm_62 \
-gencode arch=compute_70,code=sm_70 \
-        -gencode arch=compute_71,code=sm_71 \
-gencode arch=compute_72,code=sm_72 \
-gencode arch=compute_75,code=sm_75 \
-gencode arch=compute_75,code=compute_75
@@ -40,7 +40,6 @@ CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_61,code=sm_61 \
-gencode arch=compute_62,code=sm_62 \
-gencode arch=compute_70,code=sm_70 \
-        -gencode arch=compute_71,code=sm_71 \
-gencode arch=compute_72,code=sm_72 \
-gencode arch=compute_72,code=compute_72
