improved experimental results

1 parent 1450f52 commit fc3e488bfde5b4bbc6053640d89fbe59f95d9e5d @shinpei0208 committed Apr 4, 2013
@@ -19,13 +19,23 @@ \section{Assumption}
This is one of the most recognized approaches to object detection.
See \cite{Felzenszwalb10} for details.
+\begin{figure}[t]
+ \begin{center}
+ \includegraphics[width=\hsize]{fig/deformable_model.eps}\\
+ \caption{Vehicle detection flow with deformable models.}
+ \label{fig:deformable_model}
+ \end{center}
+\end{figure}
+
Object detection often requires a machine learning phase to construct
the object models.
We assume that this learning phase has already been done a priori and
the object models are stored in the system.
Particularly we restrict our attention to vehicle detection in this
paper, utilizing the vehicle models provided by prior work
\cite{Niknejad12}.
+An overview of this approach is illustrated in
+Fig.~\ref{fig:deformable_model}.
Although these models achieve a high detection rate, the computational
cost of scoring the similarity between an input image and the models using HOG
features is very high.
@@ -2,17 +2,17 @@ \section{Conclusion}
\label{sec:conclusion}
In this paper, we have presented GPU implementations of HOG-based object
-detection and their performance evaluation.
+detection and their detailed performance evaluation.
Unlike preceding work that focused primarily on performance improvements,
our implementations are based on an analysis of the performance bottlenecks
introduced by the deformable models in HOG-based
object detection.
-This approach ensures that the GPU truly accelerates approapriate
+This approach ensures that the GPU truly accelerates appropriate
computational blocks.
-Our evaluation using a commodity GPU showed that our GPU implementation
-can speed up the existing HOG-based vehicle detection program tailored
-to the deformable models by 3x to 5x over traditional CPU
-implementations.
+Our experimental results using commodity GPUs showed that our GPU
+implementations can speed up the existing HOG-based vehicle detection
+program tailored to the deformable models by 3x to 5x over traditional
+CPU implementations.
Given that this performance improvement is obtained from the entire
program runtime rather than particular algorithm parts of the program,
our contribution is useful and significant for real-world applications
@@ -27,8 +27,8 @@ \section{Conclusion}
Our conclusion is that GPUs are a promising means of meeting the
performance required of vision-based object detection in the real world.
-In future work, we plan to complement this work with systemized
-coordinations of computations and I/O devices.
+In future work, we plan to complement this work with systematized
+coordination of computations and I/O devices.
Since real-world applications require camera sensors to obtain input
images while GPUs are compute devices off the host computer, the data
I/O latency could become a non-trivial bottleneck on the data bus.
Binary file not shown.
@@ -2,11 +2,11 @@ \section{Evaluation}
\label{sec:evaluation}
We now demonstrate performance improvements brought by our GPU
-implementations using the existing vehicle detection program
+implementations for the existing vehicle detection program
\cite{Niknejad12}.
-We further provide the details of performance comparisons among our GPU
-implementations and prior CPU implementations to discuss fundamental
-factors that allowed the GPU to outperform the CPU.
+We also discuss the details of performance comparisons between our GPU
+implementations and traditional CPU implementations, identifying the
+fundamental factors that allowed the GPU to outperform the CPU.
\subsection{Experimental Setup}
\label{sec:setup}
@@ -35,6 +35,14 @@ \subsection{Experimental Results}
\end{center}
\end{figure}
+\begin{figure}[t]
+ \begin{center}
+ \includegraphics[width=\hsize]{fig/double_exe_time.eps}\\
+ \caption{Computation times of the double-precision floating-point program.}
+ \label{fig:double_exe_time}
+ \end{center}
+\end{figure}
+
Fig.~\ref{fig:float_exe_time} shows the computation times of all variants
of the vehicle detection program configured to use single precision
for floating-point operations.
@@ -60,8 +68,8 @@ \subsection{Experimental Results}
Since the vehicle detection program is compute-intensive, as shown
in Listings~\ref{lst:score} to \ref{lst:hog}, the operating
frequency matters more than the architectural benefit.
-This is a useful finding toward the future development of image
-processing with GPUs.
+This is a useful finding toward the future development of GPU-based
+image processing.
As a result, the best performance is obtained from a setup that
uses the multicore implementation for the HOG calculation while using
@@ -74,14 +82,6 @@ \subsection{Experimental Results}
order-of-magnitude speed-up is reported for a particular part of the
program or the algorithm.
-\begin{figure}[t]
- \begin{center}
- \includegraphics[width=\hsize]{fig/double_exe_time.eps}\\
- \caption{computation times of the double precision floating point program.}
- \label{fig:double_exe_time}
- \end{center}
-\end{figure}
-
Fig.~\ref{fig:double_exe_time} shows the computation times of all variants
of the vehicle detection program configured to use double precision
for floating-point operations.
@@ -91,8 +91,9 @@ \subsection{Experimental Results}
as the generation of GPUs advances.
Another notable finding is that the TITAN GPU is slightly faster than
the K20Xm GPU for our vehicle detection program.
-Given that the TITAN GPU is a consumer price while the K20Xm is very
-expensive for supercomputing, we suggest that the vehicle detection
+This is due to the slightly higher operating frequency of the TITAN GPU.
+Since the TITAN GPU is sold at a consumer-market price while the K20Xm is a very
+expensive supercomputing device, we suggest that the vehicle detection
program use the TITAN GPU for better cost performance.
\begin{figure}[t]
@@ -118,19 +119,28 @@ \subsection{Experimental Results}
\begin{figure}[t]
\begin{center}
- %\includegraphics[width=\hsize]{fig/time_on_image_size.eps}\\
- ADD FIGURE HERE
- \caption{Impact of the block and thread shapes on computation times.}
- \label{fig:time_on_block_thread_shapes}
+ \includegraphics[width=0.6\hsize]{fig/breakdown_gpu.eps}\\
+ \caption{The breakdown of computation times of the GPU implementation.}
+ \label{fig:breakdown_gpu}
\end{center}
\end{figure}
+Fig.~\ref{fig:breakdown_gpu} shows the breakdown of computation times of
+the GPU implementation that achieves the best performance for the
+single-precision vehicle detection program.
+The memory copy overhead is often claimed to be a bottleneck in GPU
+programming \cite{Jablin_PLDI11}, but our analysis shows that this is
+not the case for the exhibited workload.
+This means that further advances in GPU technology will lead to faster
+implementations of the vehicle detection program, which encourages
+future work to use state-of-the-art GPUs.
\begin{figure}[t]
\begin{center}
%\includegraphics[width=\hsize]{fig/time_on_image_size.eps}\\
ADD FIGURE HERE
- \caption{The breakdown of computation times of the GPU implementation.}
- \label{fig:breakdown_gpu}
+ \caption{Impact of the block and thread shapes on computation times.}
+ \label{fig:time_on_block_thread_shapes}
\end{center}
\end{figure}
+