Commit
naishh committed Jun 26, 2012
1 parent d37fe7e commit cf58544
Showing 9 changed files with 43 additions and 42 deletions.
9 changes: 5 additions & 4 deletions ch0_abstract.tex
@@ -1,9 +1,10 @@
\begin{abstract}
% borrow from the conclusion
We build an automatic system that adds semantics to a urban scene.
A 3D model was extracted by combining the \emph{FIT3D toolbox\cite{FIT3D}} combined with a skyline detection algorithm.
The skyline detector extracts straight line segments from the upper edges of an
image. These are used to set the heights of the walls of the 3D model.
We build an automatic system that adds semantics to an urban scene.
A 3D model was extracted by combining the \emph{FIT3D toolbox\cite{FIT3D}} with a skyline detection algorithm.
The skyline detector extracts straight line segments from the upper edge of an
image. These are used to determine the wall heights of an extended 2D model extracted from
\emph{Openstreetmap\cite{Openstreetmap}}.
Next, windows are extracted by two different methods. The first method
searches for connected horizontal and vertical edges that originate from the window
frame. The second method works on rectified facades and consists of 1) the
26 changes: 13 additions & 13 deletions ch1_intro.tex
@@ -1,7 +1,7 @@
\section{Introduction}
When we humans look at an urban scene we can immediately tell which part
represents a building, a tree, a door, a window or a parked car.
Even if the scene suffers from high occlusion (a tree occluding the largest part
Even if the scene suffers from high occlusion (e.g. a tree occluding the largest part
of a building) or extreme perspective distortion (a building seen from the
corner of your eye), we perform this task with very high accuracy.
For a computer system, however, this task is far from trivial.\\
@@ -12,22 +12,21 @@ \section{Introduction}

The most important reason for our excellent visual perception is that we combine
a series of depth cues \cite{psy} (which enable us to experience depth) with top-down
processing (which enables us to classify objects).\\

processing (which enables us to classify objects).
One of the most important depth cues is stereopsis. We use two eyes and look at
the same scene from slightly different angles. This makes it possible to
triangulate the distance to an object with a high degree of accuracy
\cite{psy}\cite{hartley}. \\
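For an ideal rectified stereo pair this triangulation reduces to a one-line relation: depth is inversely proportional to disparity, $Z = fB/d$. A minimal sketch in Python (the focal length, baseline and disparity values are invented for illustration, not taken from the thesis):

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Ideal rectified stereo pair: depth Z = f * B / d."""
    return f_px * baseline_m / disparity_px

# Hypothetical numbers: 800 px focal length, 6.5 cm baseline, 10 px disparity.
z = depth_from_disparity(800, 0.065, 10)
print(z)  # 5.2 metres
```

The same relation also explains why stereo accuracy degrades with distance: a one-pixel disparity error matters far more for small disparities (far objects) than for large ones.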

Objects are classified using a widely supported theory in psychology: top down processing \cite{anderson}
If we want to perceive a window, we tap into the neurons that are activated according to
Objects are classified using a widely supported theory in psychology: top-down
processing \cite{anderson}. If we want to perceive a window, we tap into the neurons that are activated according to
our (generalized) description of a window. E.g. because windows are often
rectangular and their color stands out, we tap into the neurons that
perceive straight parallel lines, orthogonal corners and intense color changes.
These visual processes are extremely informative if we want to build a computer system
that acts accordingly. \\

This thesis is about our work of an automatic system that adds semantics to a
This thesis describes our work on an automatic system that adds semantics to an
urban scene, inspired by the human brain. We use stereopsis to generate a 3D
model of the building. This model is improved by detecting the building contour
using edge features. The 3D model is used to rectify a facade, which makes
@@ -94,7 +93,7 @@ \subsection{Application examples}
structure explicit. For a more accurate disambiguation, other types of
contextual information are desired. The semantic interpretation of the
facade can fulfill this need. In this context, window detection can be used
as a strong discriminator.
as a strong discriminator.\\

We can conclude that semantic interpretation plays an important role in the
understanding of urban scenes and is applied in a wide range of domains.
@@ -103,15 +102,16 @@ \subsection{Application examples}
\subsection{Thesis outline}
The outline of this thesis is as follows:\\

We start with explaining basic computer vision techniques and the 3rd party
software \emph{FIT3D toolbox} in Chapter 2. These techniques are the driving
We start by explaining basic computer vision techniques and the \emph{FIT3D toolbox} in Chapter 2.
These techniques are the driving
force behind the algorithms used in both skyline detection and window detection.
In Chapter 3 we explain a novel application of skyline detection: the detection
of building contours in urban scenes. Next we use this result in Chapter 4 to
extract a 3D model of a building.
In Chapter 5 we start a new topic: window detection. We propose a window
of building contours in urban scenes. Next, we use this result to
extract a 3D model of a building in Chapter 4.

In Chapter 5 we start a new topic: window detection. First, we propose a window
detection method that operates on an unrectified facade. The second method uses
a rectified facade. We discuss and compare two window alignment methods and two
a rectified facade. We discuss and compare two window alignment and
classification methods. We conclude in Chapter 6 and finish with additional
results in the Appendices.\\

32 changes: 14 additions & 18 deletions ch2_preliminaries.tex
@@ -3,7 +3,7 @@ \section{Preliminaries on Computer Vision}
\label{sec:ch2}
In this chapter we discuss the basic computer vision techniques that are used for
skyline detection and window detection. Furthermore, we discuss the 3rd
party software, the \emph{FIT3D toolbox} \cite{FIT3D} which is used in 3D building extraction and facade rectification.
party software, the \emph{FIT3D toolbox} \cite{FIT3D}, which is used for 3D building extraction and facade rectification.

\subsection{Hough transform}
\label{sec:prelimHough}
@@ -31,20 +31,22 @@ \subsubsection{Theory}

This means that a point in $(x,y)$ space appears as a sinusoidal
curve in the Hough parameter $(r,\theta)$ space. Furthermore, a line in
$(x,y)$ space appears as a point in $(r,\theta)$ space. Let's see an example, the
following image is transformed into the space $(r,\theta)$.
$(x,y)$ space appears as a point in $(r,\theta)$ space.\\
Let's look at an example: the image in Figure \ref{fig:HoughTransform_edge.eps}
is transformed into $(r,\theta)$ space.

%todo introduce the accumulator array,
%todo transform gif to eps
\fig{HoughTransform_edge.eps}{An input image, consisting of eight straight lines, for the Hough transform}{0.5}
\figsHor{HoughTransform_peaks}{HoughTransform_peaks1.eps}{HoughTransform_peaks2.eps}{Hough transform} {$(r, \theta)$ values}{$(r, \theta)$ accumulator array (quantized)}
\clearpage


As you can see, for every edge point
in Figure \ref{fig:HoughTransform_edge.eps}
a curve is generated in $(r,\theta)$ space, shown in Figure
\ref{fig:HoughTransform_peaks1.eps}.
On eight positions (white dots) the number of intersecting sinusoidal
On eight positions (dots) the number of intersecting sinusoidal
curves is high. These positions correspond to the eight separate straight
line segments in Figure \ref{fig:HoughTransform_edge.eps}.
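The point-to-curve mapping above can be checked numerically: all points on one straight line vote for the same $(r,\theta)$ cell. A minimal sketch in Python (the line and sample points are invented for illustration):

```python
import math

# Points on the line x + y = 10, so r = 10 / sqrt(2) at theta = 45 degrees.
points = [(0, 10), (2, 8), (5, 5), (9, 1)]

theta = math.radians(45)
rs = [x * math.cos(theta) + y * math.sin(theta) for x, y in points]

print([round(r, 3) for r in rs])  # all four values round to 7.071
```

Every point on the line contributes a sinusoid that passes through this single $(r,\theta)$ value, which is why collinear points produce a sharp peak in the accumulator.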

@@ -53,18 +55,14 @@ \subsubsection{Theory}

\subsubsection{Implementation}
The input of a Hough transform is a binary image. In our research it is the output of
the skyline detector (\ref{sec:skylinedetection}). In the case of window
detection (\ref{sec:windowDetection}) it is the output of an edge image.\\
the skyline detector (Chapter \ref{sec:skylinedetection}). In the case of window
detection (Chapter \ref{sec:windowDetection}) it is the output of an edge image.\\

The Hough transform develops an accumulator array of a quantized parameter space $(r, \theta)$.

It loops through the binary image and for each positive value
The Hough transform builds an accumulator array over a quantized parameter space $(r, \theta)$. It loops through the binary image and for each positive pixel
it generates all possible lines, quantized $(r, \theta)$ pairs, that intersect with this point.
For each candidate it casts a vote in the accumulator array.
Lines $(r, \theta)$ that receive a large number of votes,
i.e. the dots in Figure \ref{fig:HoughTransform_peaks1.eps} are the found straight lines in the $(x,y)$ space.

These positions are found by looking for local maxima in the accumulator array.
i.e. the dots in Figure \ref{fig:HoughTransform_peaks1.eps}, are the detected straight lines in $(x,y)$ space. These positions are found by looking for local maxima in the accumulator array.
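The voting loop described above can be sketched in a few lines. This is a minimal illustration in Python (the array sizes, test image and all names are made up; the thesis itself uses MATLAB's implementation):

```python
import math

def hough_lines(points, r_max, n_r=64, n_theta=180):
    """Vote for quantized (r, theta) pairs; points is a list of (x, y) edge pixels."""
    acc = [[0] * n_theta for _ in range(n_r)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            r = x * math.cos(theta) + y * math.sin(theta)
            # quantize r from [-r_max, r_max] into an index in [0, n_r)
            r_idx = int((r + r_max) * (n_r - 1) / (2 * r_max))
            if 0 <= r_idx < n_r:
                acc[r_idx][t] += 1
    return acc

# A horizontal line y = 3 sampled at 20 points: every point votes for one shared cell.
pts = [(x, 3) for x in range(20)]
acc = hough_lines(pts, r_max=20)
votes, r_idx, t_idx = max((v, r, t) for r, row in enumerate(acc) for t, v in enumerate(row))
print(votes)  # 20: the winning cell collects one vote per edge point
```

In a full implementation the peak search would be a proper local-maxima scan rather than a global `max`, but the voting principle is the same.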

\subsubsection{$\theta$ constrained Hough transform}
The accumulator array consists of two dimensions, $r$ and $\theta$.
@@ -76,13 +74,11 @@ \subsubsection{$\theta$ constrained Hough transform}
For example, the skyline of a building will appear approximately horizontal. If we
want to detect windows, we would like to detect edges in the horizontal and vertical directions.
This can easily be achieved by adjusting the $\theta$ range.
For example if one would detect lines in the horizontal direction of
a photograph of a building taken by a user, $\theta = [-10..0..10]$.Although
only $\theta = 0$ presents an exact horizontal line we broaden the interval
because the user hardly ever holds the camera exactly orthogonal.
For example, to detect lines in the horizontal direction one can use $\theta = [-10..0..10]$. An interval is used because in practice the lines
often deviate slightly from an exact horizontal line, where $\theta = 0$.
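Restricting the sweep is a one-line change: instead of the full $\theta$ range, only an interval around the expected orientation is scanned. A tiny sketch (Python, following the thesis convention that $\theta = 0$ corresponds to a horizontal line):

```python
# Full sweep versus a theta-constrained sweep for near-horizontal lines.
full_thetas = list(range(-90, 90))        # generic straight-line detection
horizontal_thetas = list(range(-10, 11))  # theta = [-10..0..10] as in the text

print(len(full_thetas), len(horizontal_thetas))  # 180 versus 21 candidate angles
```

Besides suppressing peaks of the wrong orientation, this shrinks the accumulator and the voting loop proportionally, so the constrained transform is also considerably cheaper.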

\subsubsection{MATLAB\cite{matlab} parameters}
We used a standard MATLAB\cite{matlab} implementation of the Hough transform for straight lines. This implementation comes with some interesting parameters:\\
We used a standard MATLAB\cite{matlab} implementation of the Hough transform. This implementation comes with some interesting parameters:\\

The \emph{MinimumLength} parameter specifies the minimum length that a line must have to be valid. This is especially useful if we want to detect a long straight skyline, or if we want to discard lines that are too small to form, for example, a window.\\
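The effect of such a minimum-length filter is easy to sketch (plain Python; the segments and threshold are made up, and the parameter name follows the thesis text, not necessarily MATLAB's exact API):

```python
import math

segments = [((0, 0), (120, 2)),    # long, skyline-like segment
            ((10, 10), (14, 13)),  # short noise segment (length 5)
            ((5, 40), (80, 41))]   # long segment

def length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)

minimum_length = 20
kept = [s for s in segments if length(s) >= minimum_length]
print(len(kept))  # 2 segments survive the filter
```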

@@ -171,7 +167,7 @@ \subsection{Camera calibration}
explained next.


\subsection{FIT3D toolbox \ref{FIT3D}}
\subsection{FIT3D toolbox \cite{FIT3D}}
\label{sec:prelimFIT3D}
%todo give FIT3D a proper intro
The \emph{FIT3D toolbox} \cite{FIT3D} is used for several purposes in this thesis.
6 changes: 3 additions & 3 deletions ch5_windowDetection.tex
@@ -328,7 +328,7 @@ \subsubsection{3D plane based rectification}
computationally very expensive, as each pixel needs to be projected. To keep the
computational cost to a minimum we project only the necessary data. Since we
are using Hough lines, we project only the coordinates of the endpoints of the found Hough lines.
This is allowed because the projective transformation is we apply is a affine
This is allowed because the projective transformation we apply is an affine
transformation, which preserves the
straightness of lines \cite{linearalgebra}. Note that this means we apply the edge detection and
Hough line extraction on the unrectified image.\\
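The endpoint shortcut relies exactly on that straightness property: mapping only the two endpoints and redrawing the segment yields the same line as mapping every pixel. A small self-check in Python (the transform matrix is a made-up rectifying transform, not the one computed in the thesis):

```python
def apply_affine(M, p):
    """Apply a 2x3 affine matrix M to point p = (x, y)."""
    x, y = p
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])

# Hypothetical rectifying transform: scale + shear + translation.
M = [[1.25, 0.25,  5.0],
     [0.0,  0.75, -2.0]]

a, b = (10.0, 40.0), (30.0, 40.0)          # endpoints of a detected Hough line
mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

# Straightness check: the projected midpoint lies exactly on the projected segment.
pa, pb = apply_affine(M, a), apply_affine(M, b)
pmid = apply_affine(M, mid)
print(pmid == ((pa[0] + pb[0]) / 2, (pa[1] + pb[1]) / 2))  # True
```

For a general projective (non-affine) homography, straightness is still preserved but midpoints are not, so the endpoint trick would still recover the correct line, just not its interior parameterization.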
@@ -734,7 +734,7 @@ \subsubsection{Basic window classification (based on line amount)}
non-window areas: the window classification. We developed two different methods for this.

Instead of classifying each block independently, we classify full rows and
columns of blocks as window or non-window areas. This approach results in a accurate
columns of blocks as window or non-window areas. This approach results in an accurate
classification, as it combines a full block row and block column as evidence for a single
window.
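A minimal sketch of this row/column scheme (Python; the grid of per-block line counts and the threshold are invented for illustration, and treating a block as window only when both its row and its column qualify is one plausible reading of the combination step):

```python
# Per-block line counts on a hypothetical 4x5 facade grid.
grid = [
    [0, 5, 0, 6, 0],
    [0, 4, 1, 5, 0],
    [0, 0, 0, 0, 0],
    [0, 6, 0, 4, 0],
]
T = 6  # illustrative evidence threshold per row/column

row_is_window = [sum(row) >= T for row in grid]
col_is_window = [sum(col) >= T for col in zip(*grid)]

# A block is classified as window only if both its row and its column qualify.
windows = [[r and c for c in col_is_window] for r in row_is_window]
print(sum(map(sum, windows)))  # 6 window blocks on this grid
```

Pooling evidence per row and column makes the decision robust to a single noisy block, which is the advantage claimed over independent per-block classification.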

@@ -1027,7 +1027,7 @@ \subsubsection{Improved window classification (based on shape of the histogram f
window areas are falsely classified as negatives; see the left side of
Figure \ref{fig:w_Dirk4Trans_ImClassRect.eps}. This is caused by a small area
between the windows that is classified as a non-window area. This could be
solved by by adding a minimum size constraint of a area to be threaded as a
solved by adding a minimum size constraint for an area to be treated as a
non-window area. In this way small negatively classified areas cannot interrupt
the adjacent windows.

4 changes: 2 additions & 2 deletions commandsPre.tex
@@ -46,11 +46,11 @@
\begin{figure}[!ht]
\centering
\subfigure[#5]{
\includegraphics[width=6cm]{img/#2}
\includegraphics[width=5cm]{img/#2}
\label{fig:#2}
}
\subfigure[#6]{
\includegraphics[width=6cm]{img/#3}
\includegraphics[width=5cm]{img/#3}
\label{fig:#3}
}
\caption{#4}
2 changes: 1 addition & 1 deletion header.tex
@@ -12,7 +12,7 @@
%\subject {subject comes here}
%\keywords {keywords come here}

\title {\LARGE \sc{[DRAFT]Semantic annotation of urban scenes:}\\Skyline and window detection}
\title {\LARGE \sc{Semantic annotation of urban scenes:}\\Skyline and window detection}
%\title{\huge{Dynamic Programming For \\Extensive Form Games With \\Imperfect
%Information}}

Binary file modified main.pdf
Binary file not shown.
4 changes: 3 additions & 1 deletion references.bib
@@ -274,7 +274,9 @@ @Book{anderson
}

@MISC{kovesi,
title = {P. D. Kovesi. MATLAB and Octave functions for computer vision and image processing. School of Computer Science \& Software Engineering, The University of Western Australia. Available from: }
author = {P. D. Kovesi},
title = {MATLAB and Octave functions for computer vision and image processing},
note = {School of Computer Science \& Software Engineering, The University of Western Australia},
howpublished = "\url{http://www.csse.uwa.edu.au/~pk/research/matlabfns/}"
}

2 changes: 2 additions & 0 deletions todalooMain.txt
@@ -10,6 +10,8 @@ How do I achieve this? By simply DOING it; if this does not work:


---------------------------------------------------------------------------------
ch1
explain and look up the L and H patterns

---------------------------------------------------------------------------------
todos are listed in 2 places!
