<h1 id="Constant-window-benchmarking-on-the-Middlebury-2014-dataset">Constant window benchmarking on the Middlebury 2014 dataset</h1>
<h2 id="Introduction">Introduction</h2>
<p>The results of the experiment <a href="ALG_005_EXP_001-VIS.ipynb" target="_blank" rel="noopener">ALG_005_EXP_001-VIS</a> showed that constant-weighted (CW) support windows can significantly improve matching accuracy. However, the experiment was not conclusive as to the best dimensions for these windows. A larger dataset was expected to give a clearer picture in this regard.</p>
<h2 id="Abstract">Abstract</h2>
<p>After the introduction of constant-weight (CW) windows to the pipeline, the algorithm was benchmarked on the 2014 version of the Middlebury Evaluation Framework. This was the first time this dataset was used in this project. Compared to the pipeline without CWs, the average error across the dataset improved by approximately 10 percentage points, at the expense of a 27-fold increase in the running time of the cost calculation.</p>
<h2 id="Relevant-theory">Relevant theory</h2>
<p>The algorithm used in this experiment was not an incremental development: it was identical to the one used in <a href="ALG_005_EXP_001-VIS.ipynb" target="_blank" rel="noopener">ALG_005_EXP_001-VIS</a>. However, a larger and newer dataset, Middlebury 2014, was used. A short discussion of this version of the Middlebury framework follows.</p>
<h3 id="Introduction-to-the-Middlebury-2014-Framework">Introduction to the Middlebury 2014 Framework</h3>
<p>The most recent version of the Middlebury Framework is that of 2014, which is coupled with a new dataset. The new dataset was obtained using structured light sensors (Scharstein and Hirschm&uuml;ller, 2014), providing subpixel-accurate ground truth images for indoor scenes. Apart from a meticulous optical setup and digital processing, manual rectification was also performed on the images (Scharstein <em>et al.</em>, 2014) to achieve higher accuracy. This iteration consists of a handful of images (33 coloured scenes) provided in three sizes: quarter, half and full (~6 megapixels). Of these, 10 scenes are for training, 10 are for evaluation (using the framework) and 13 are additional. Deep learning approaches typically need much larger datasets to optimise their architecture; therefore, the state-of-the-art results are still dominated to some extent by non-end-to-end algorithms (Scharstein and Hirschm&uuml;ller, 2014; Poggi <em>et al.</em>, 2020; Zhou, Meng and Cheng, 2020).</p>
<h3 id="Metrics">Metrics</h3>
<p>The latest online evaluation framework (version 3) provides a means for both dense and sparse benchmarking. Apart from &ldquo;universal&rdquo; metrics (average error, root mean squared error, runtime, runtime/megapixel), &ldquo;bad n&rdquo; metrics are used as well: the percentage of pixels whose absolute disparity error is greater than a threshold value of &ldquo;n&rdquo;. A further evaluation option is whether to include (&ldquo;all&rdquo;) or exclude (&ldquo;nonocc&rdquo;) occluded regions in the calculations. Relevant changes from the previous version (version 2) include the deprecation of the error calculated near discontinuities and a changed representation of occlusions.</p>
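<p>As an illustration of the &ldquo;bad n&rdquo; definition above, the following is a minimal sketch rather than the code of the evaluation framework. The function name <code>bad_n</code>, its arguments and the toy arrays are assumptions made for this example; the optional mask plays the role of the &ldquo;nonocc&rdquo; option.</p>

```python
import numpy as np


def bad_n(pred, gt, threshold, mask=None):
    """Fraction of evaluated pixels whose absolute disparity error exceeds `threshold`."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Evaluate only where a finite ground-truth disparity exists
    valid = np.isfinite(gt)
    if mask is not None:
        # Optional boolean mask, e.g. True for non-occluded pixels
        valid &= np.asarray(mask, dtype=bool)
    if not valid.any():
        return float("nan")
    err = np.abs(pred[valid] - gt[valid])
    return float(np.mean(err > threshold))


# Toy data (hypothetical values)
gt = np.array([[10.0, 12.0, 14.0], [16.0, 18.0, 20.0]])
pred = np.array([[10.5, 15.0, 14.0], [16.0, 11.0, 20.2]])
nonocc = np.array([[True, True, True], [True, False, True]])

bad1_all = bad_n(pred, gt, 1.0)                  # 2 of 6 pixels exceed the threshold
bad1_nonocc = bad_n(pred, gt, 1.0, mask=nonocc)  # 1 of 5 non-occluded pixels
```

<p>Multiplying the returned fraction by 100 gives the percentage reported by the metric.</p>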
<h3 id="File-format">File format</h3>
<p>While one might argue that the Middlebury 2014 framework is dated, the file format of the dataset is discussed here because, in the author&rsquo;s experience, accurate documentation on it was difficult to obtain.</p>
<p>The disparity files are stored in PFM format (Scharstein and Hirschm&uuml;ller, 2014). This is a floating-point image format that stores each sample as a 32-bit single-precision value (Debevec, no date). Unlike previous versions of the Middlebury datasets, the disparities are not scaled. For instance, in Middlebury 2003 the quarter-size disparities are provided in &ldquo;png&rdquo; format scaled up by a factor of 4, so that each intensity level represents a quarter of a disparity. With the new format this is no longer necessary.</p>
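<p>A PFM file can be read with a few lines of NumPy. The sketch below is not the loader used in this project; it assumes the simple three-line header (type, &ldquo;width height&rdquo;, scale) without comments, and the name <code>read_pfm</code> is hypothetical. A negative scale factor marks little-endian data, and rows are stored bottom-to-top.</p>

```python
import os
import tempfile

import numpy as np


def read_pfm(path):
    """Read a PFM file into a float32 array (H x W, or H x W x 3 for colour)."""
    with open(path, "rb") as f:
        magic = f.readline().decode("ascii").strip()
        if magic not in ("PF", "Pf"):
            raise ValueError("not a PFM file")
        channels = 3 if magic == "PF" else 1
        width, height = (int(v) for v in f.readline().decode("ascii").split())
        scale = float(f.readline().decode("ascii").strip())
        # A negative scale factor signals little-endian sample data
        dtype = "<f4" if scale < 0 else ">f4"
        data = np.frombuffer(f.read(), dtype=dtype, count=width * height * channels)
    shape = (height, width, 3) if channels == 3 else (height, width)
    # Rows are stored bottom-to-top, so flip to the usual top-to-bottom order
    return np.flipud(data.reshape(shape)).astype(np.float32)


# Round trip on a tiny synthetic map (values are made up for the example)
original = np.array([[1.0, 2.5, np.inf], [4.0, 5.0, 6.25]], dtype=np.float32)
with tempfile.NamedTemporaryFile(suffix=".pfm", delete=False) as tmp:
    tmp.write(b"Pf\n3 2\n-1.0\n")                           # header: type, width height, scale
    tmp.write(np.flipud(original).astype("<f4").tobytes())  # bottom row first
    tmp_path = tmp.name
loaded = read_pfm(tmp_path)
os.remove(tmp_path)
```

<p>Because the values are stored as floats, the loaded array holds disparities directly, with no scale factor to undo.</p>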
<p>Additionally, the value 0, previously used to represent occluded pixels, is replaced by infinity, and pixels predicted with this value are labelled &ldquo;invalid&rdquo;. When benchmarking, invalid pixels are handled separately: they are not included in the calculation of any of the aforementioned metrics. For instance, if a map has only one correctly estimated disparity and the rest of the map is &ldquo;invalid&rdquo; (values of infinity), it is regarded as 100% accurate. It is also important to note that the maps of &ldquo;unknown&rdquo; and &ldquo;occluded&rdquo; pixels are provided in a single &ldquo;png&rdquo; file, where a pixel intensity of 0 encodes the former and 128 the latter.</p>
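<p>The handling of invalid pixels described above can be sketched as follows. This is an illustration under stated assumptions, not the framework&rsquo;s implementation: the function name <code>bad_n_valid_only</code> is hypothetical, and the mask value 255 for non-occluded pixels is assumed (only 0 and 128 are described above).</p>

```python
import numpy as np

UNKNOWN, OCCLUDED = 0, 128  # mask intensities described in the text


def bad_n_valid_only(pred, gt, mask, threshold, include_occluded=True):
    """Bad-n fraction that skips unknown pixels and invalid (infinite) predictions."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = np.asarray(mask)
    evaluated = mask != UNKNOWN
    if not include_occluded:
        evaluated &= mask != OCCLUDED
    # Skip pixels without ground truth and pixels the algorithm marked invalid
    evaluated &= np.isfinite(gt) & np.isfinite(pred)
    if not evaluated.any():
        return float("nan")
    err = np.abs(pred[evaluated] - gt[evaluated])
    return float(np.mean(err > threshold))


# One correct estimate, everything else invalid (toy values)
gt = np.array([[5.0, 6.0], [7.0, 8.0]])
mask = np.array([[255, 255], [128, 255]], dtype=np.uint8)  # 255 assumed to mean non-occluded
pred = np.array([[5.0, np.inf], [np.inf, np.inf]])

score = bad_n_valid_only(pred, gt, mask, threshold=1.0)
```

<p>Here the single finite prediction is correct, so the error fraction is zero even though three quarters of the map carry no estimate. For this reason it is worth reporting the share of invalid pixels next to any such score.</p>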
<h2 id="Method">Method</h2>
<p>The scenes used for this experiment were part of the &ldquo;trainingQ&rdquo; set provided on the Middlebury Evaluation site, where &ldquo;Q&rdquo; stands for quarter resolution. This package includes additional images, bringing the total number of scenes to 15: &ldquo;Teddy&rdquo;, an improved version of the scene present in the 2003 dataset, and four further scenes (ArtL, MotorcycleE, PianoL, PlaytableP). The scenes with an L or E suffix have a discrepancy in lighting or exposure between the left and right views, posing a more difficult challenge to stereo correspondence algorithms.</p>
<p>Similarly to the first experiment on this pipeline (ALG_005_EXP_001-PatchMatch-MacLean_et_al-Numba), &ldquo;patches&rdquo; (support windows) of different sizes with uniform weight (CW=1) were used. The height of the patches ranged from 1 to 15, while the width ranged from 1 to 7. The gap and egap values were kept constant at -20 and -1 respectively, while match values from 10 to 110 were tested with a step of 20.</p>
<p>Please note that a separate notebook was not created for the code of this experiment; please refer to the "runalg" Python files in the ".\benchmarking\MiddEval\MiddEval3\" folder for further details.</p>
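<p>Aggregation over a uniform-weight window can be sketched as follows. This is a generic illustration and not the pipeline&rsquo;s code: the function name <code>aggregate_constant_window</code>, the zero padding at the image borders and the restriction to odd window sizes are all assumptions made for the example.</p>

```python
import numpy as np


def aggregate_constant_window(cost, height, width, weight=1.0):
    """Sum a per-pixel cost map over a height x width window with uniform weights."""
    if height % 2 == 0 or width % 2 == 0:
        raise ValueError("window dimensions must be odd")
    cost = np.asarray(cost, dtype=np.float64)
    pad_y, pad_x = height // 2, width // 2
    # Zero padding: border pixels simply aggregate fewer neighbours (an assumption)
    padded = np.pad(cost, ((pad_y, pad_y), (pad_x, pad_x)), mode="constant")
    out = np.zeros_like(cost)
    rows, cols = cost.shape
    # One shifted addition per window element, i.e. height * width terms per pixel
    for dy in range(height):
        for dx in range(width):
            out += weight * padded[dy:dy + rows, dx:dx + cols]
    return out


cost = np.arange(12, dtype=np.float64).reshape(3, 4)  # toy cost map
same = aggregate_constant_window(cost, 1, 1)          # a 1x1 window leaves the map unchanged
tall = aggregate_constant_window(cost, 3, 1)          # 3 rows high, 1 column wide
```

<p>The number of terms summed per pixel grows with the window area, which is why larger windows make the cost calculation slower.</p>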
<h2 id="Results-and-discussion">Results and discussion</h2>
<p>The best results were achieved with vertically elongated patches (9x3, 11x3, 13x3, etc.). The best average error rate at a threshold of 4 (equivalent to a disparity error of 1) was 46.01%, obtained with a match value of 30. This tendency in support window sizes was observable throughout the dataset, including the more challenging scenes mentioned above.</p>
<p>Most of the scenes performed best when the algorithm was initialized with a match value of 30. However, this was not the case for ArtL, MotorcycleE and PianoL. The suffixes of these scenes indicate a difference in lighting (L) or exposure (E) between the left and right images. These scenes performed better when the match value was at the maximum tested in this experiment (110). The reason was thought to be the greater difference between left and right pixel intensities, which calls for a higher tolerance for differences.</p>
<p>It is also important to note that with a 1x1 window the pipeline still performed reasonably well (56.45% mean error at threshold 4). As this was the first experiment conducted on this dataset, this was regarded as the baseline result. The improvement resulting from introducing CWs into the pipeline can be interpreted as the difference between the performance with the 1x1 support window and the best-performing one (11x3), which is approximately 10 percentage points. While this might be regarded as a significant improvement, it came at the expense of the running time of the cost calculation, which was 27 times higher. To improve the algorithm, it would be reasonable to experiment with more advanced support-window methods and, additionally, to modify the matching cost calculation so that it also considers image gradients. This was thought to make the pipeline more robust to lighting and exposure variations, bringing it closer to real-world scenarios.</p>
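<p>The improvement over the 1x1 baseline can be computed from a benchmarking log along the following lines. This is a sketch with made-up numbers: only the column names (<code>scene</code>, <code>match</code>, <code>kernel_size</code>, <code>bad4</code>) are taken from the notebook&rsquo;s data frame, while the toy rows and variable names are assumptions.</p>

```python
import pandas as pd

# Toy rows using the column names of the benchmarking log; the values are made up
log = pd.DataFrame({
    "scene": ["A", "B", "A", "B", "A", "B"],
    "match": [30, 30, 30, 30, 30, 30],
    "kernel_size": ["1x1", "1x1", "9x3", "9x3", "11x3", "11x3"],
    "bad4": [0.60, 0.50, 0.50, 0.40, 0.52, 0.40],
})

# Mean error of each configuration, averaged over the scenes
mean_err = log.groupby(["match", "kernel_size"])["bad4"].mean()

# Baseline (1x1 window) and best window for each match value
baseline = mean_err.xs("1x1", level="kernel_size")
best = mean_err.groupby(level="match").min()

# Improvement expressed in percentage points
improvement_pp = (baseline - best) * 100
```

<p>Expressing the difference in percentage points keeps it distinct from a relative improvement, which would divide the difference by the baseline error.</p>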
<h2 id="Conclusion">Conclusion</h2>
<p>The experiment showed that support weights, even the most primitive ones, can improve the accuracy of stereo correspondence. As this was the first time that the 2014 version of the Middlebury datasets was used, a baseline performance was established. Compared to the previous pipeline, the average error across the dataset improved by approximately 10 percentage points, at the expense of a 27-fold increase in the running time of the cost calculation.</p>
<h2 id="References">References</h2>
<ol>
<li>Daniel Scharstein and Heiko Hirschm&uuml;ller (2014) <em>Submit | Middlebury Stereo Evaluation - Version 3</em>. Available at: http://vision.middlebury.edu/stereo/submit3/ (Accessed: 17 October 2019).</li>
<li>Farid, H. and Simoncelli, E. P. (2004) &lsquo;Differentiation of discrete multidimensional signals&rsquo;, <em>IEEE Transactions on Image Processing</em>, 13(4), pp. 496&ndash;508. doi: 10.1109/TIP.2004.823819.</li>
<li>Paul Debevec (no date) <em>PFM Format Documentation</em>, <em>Paul Debevec&rsquo;s Website</em>. Available at: http://www.pauldebevec.com/Research/HDR/PFM/ (Accessed: 29 July 2020).</li>
<li>Poggi, M. <em>et al.</em> (2020) &lsquo;On the Synergies between Machine Learning and Stereo: a Survey&rsquo;. Available at: http://arxiv.org/abs/2004.08566 (Accessed: 29 May 2020).</li>
<li>Scharstein, D. <em>et al.</em> (2014) &lsquo;High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth&rsquo;, in <em>Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)</em>. Springer Verlag, pp. 31&ndash;42. doi: 10.1007/978-3-319-11752-2_3.</li>
<li>Scharstein, D. and Hirschm&uuml;ller, H. (2014) <em>Middlebury Stereo Evaluation - Version 3 </em>. Available at: http://vision.middlebury.edu/stereo/submit3/ (Accessed: 17 October 2019).</li>
<li>skimage (2020) <em>Module: morphology &mdash; skimage v0.17.2 docs</em>, <em>scikit-image website</em>. Available at: https://scikit-image.org/docs/stable/api/skimage.morphology.html (Accessed: 28 July 2020).</li>
<li>Zhou, K., Meng, X. and Cheng, B. (2020) &lsquo;Review of Stereo Matching Algorithms Based on Deep Learning&rsquo;, <em>Computational Intelligence and Neuroscience</em>. Edited by C. Y&aacute;&ntilde;ez-M&aacute;rquez. Hindawi, 2020, p. 8562323. doi: 10.1155/2020/8562323.</li>
</ol>

In [1]:
import pandas as pd
import ipywidgets as widgets
import numpy as np
import sys
import os

sys.path.append(os.path.join("..", ".."))
import glob
import cv2

import plotly.graph_objs as go
import plotly.express as px
from ipywidgets import HBox, VBox, Button

from components.utils import plotly_helpers as ph
from components.utils import utils as u

In [2]:

available_metrix = ['abs_error',
       'mse', 'avg', 'eucledian', 'bad1', 'bad2', 'bad4', 'bad8']

metrics_selector = widgets.Dropdown(
    options=[(m,m) for m in available_metrix],
    description='Metric:',
    value="bad4"
)


nonoccluded = widgets.Dropdown(
    options=[("Yes", False), ("No", True)],
    description='Nonoccluded:'
)


### Please select metrics and whether occlusions are counted as errors

In [3]:
VBox([metrics_selector, nonoccluded])


### Loading the data and building the plots

In [4]:

selected_file = os.path.join("..","..", "benchmarking", "MiddEval", "custom_log", "bm_benchmarking.csv")
#selected_file = "./fixed_csv2.csv"
df = ph.load_n_clean(selected_file)

##Filtering to selected occlusion parameter

df = df[df["are_occlusions_errors"] == nonoccluded.value]
# Assign the sorted result instead of sorting a filtered slice in place
df = df.sort_values(by=["scene", "match", "h", "w"])

number_of_samples = df.shape[0]

In [5]:
### Dashboard 1


from ipywidgets import Image, Layout

img_widget = Image(value=df["loaded_imgs"].iloc[0], 
                   layout=Layout(height='375px', width='450px'))

fig_a = ph.get_figure_widget(df, "scene", metrics_selector.value,
                             "Scene w.r.t. " + metrics_selector.value)
fig_b = ph.get_figure_widget(df, "match", "kernel_size", "Kernel sizes w.r.t. match values")


figs = [fig_a, fig_b]
ph.bind_hover_function(figs, img_widget, df)
ph.bind_brush_function(figs, df)

button = ph.get_reset_brush_button(figs)
dashboard1 = VBox([button, fig_a,
                  HBox([img_widget, fig_b])])


### Dashboard 2

# sort_values returns a new frame, so the result must be assigned to take effect
df = df.sort_values(by=["experiment_id"])
traced_fig_1, dfs_1 = ph.get_figure_widget_traced(df, "scene", metrics_selector.value, "experiment_id")

traced_fig_widget_1 = go.FigureWidget(traced_fig_1)



traced_fig_1_imw_1 = Image(value=df["loaded_imgs"].iloc[0], 
                   layout=Layout(height='375px', width='450px'))
traced_fig_1_imw_2 = Image(value=df["loaded_gts"].iloc[0], 
                   layout=Layout(height='375px', width='450px'))

#figs, img_widget, selected_scene_df
ph.bind_hover_function2([traced_fig_widget_1], traced_fig_1_imw_1, dfs_1, img_widget_groundtruth=traced_fig_1_imw_2)


turn_the_lights_on = ph.get_dropdown_widget(["On", "Off"], label="Turn plots:", values = [True, False])

ph.bind_dropdown_switch_traces_fn(turn_the_lights_on, traced_fig_widget_1)

dashboard2 = VBox([turn_the_lights_on, traced_fig_widget_1, HBox([traced_fig_1_imw_1,traced_fig_1_imw_2])])


### Dashboard 3


traced_fig_2, dfs_2 = ph.get_figure_widget_traced(df, "experiment_id", metrics_selector.value, "scene")

traced_fig_widget_2 = go.FigureWidget(traced_fig_2)

traced_fig_2_imw_1 = Image(value=df["loaded_imgs"].iloc[0], 
                   layout=Layout(height='375px', width='450px'))
traced_fig_2_imw_2 = Image(value=df["loaded_gts"].iloc[0], 
                   layout=Layout(height='375px', width='450px'))



#figs, img_widget, selected_scene_df
ph.bind_hover_function2([traced_fig_widget_2], traced_fig_2_imw_1, dfs_2, img_widget_groundtruth=traced_fig_2_imw_2)

turn_the_lights_on_2 = ph.get_dropdown_widget(["On", "Off"], label="Turn plots:", values = [True, False])

ph.bind_dropdown_switch_traces_fn(turn_the_lights_on_2, traced_fig_widget_2)


dashboard3 = VBox([turn_the_lights_on_2, traced_fig_widget_2, HBox([traced_fig_2_imw_1,traced_fig_2_imw_2])])


### To aid interaction with the plots, the best results in a tabular form are displayed below

In [6]:
df.pivot_table(index=["experiment_id", "kernel_size", "match"], values="bad4", aggfunc="mean").sort_values(by="bad4").head(10)

experiment_id,kernel_size,match,bad4
bm_30_9x3,9x3,30,0.460599
bm_30_11x3,11x3,30,0.460722
bm_30_13x3,13x3,30,0.460871
bm_30_15x3,15x3,30,0.460989
bm_30_9x5,9x5,30,0.461394
bm_30_7x3,7x3,30,0.461725
bm_30_13x5,13x5,30,0.461955
bm_30_11x5,11x5,30,0.462402
bm_30_7x5,7x5,30,0.462667
bm_30_15x5,15x5,30,0.462833


In [7]:
df.pivot_table(index="experiment_id", columns="scene", values="bad4", aggfunc="min").head(10)

experiment_id,Adirondack,ArtL,Jadeplant,Motorcycle,MotorcycleE,Piano,PianoL,Pipes,Playroom,Playtable,PlaytableP,Recycle,Shelves,Teddy,Vintage
bm_10_11x3,0.466294,0.695609,0.632247,0.235984,0.883453,0.392349,0.836646,0.313056,0.567141,0.367424,0.294489,0.350783,0.506069,0.238666,0.67068
bm_10_11x5,0.46398,0.697558,0.631817,0.246167,0.882871,0.39681,0.840726,0.313589,0.569975,0.363423,0.288926,0.354387,0.506464,0.233957,0.669932
bm_10_13x3,0.464741,0.696412,0.634718,0.247858,0.883076,0.393088,0.837339,0.317979,0.570226,0.362274,0.294714,0.354783,0.505422,0.250613,0.66725
bm_10_13x5,0.460656,0.698069,0.633164,0.255363,0.883065,0.399628,0.839492,0.31369,0.574668,0.357284,0.290419,0.356774,0.504263,0.244272,0.667215
bm_10_13x7,0.459162,0.700435,0.636706,0.263075,0.883001,0.404348,0.838654,0.325699,0.5865,0.36138,0.293895,0.360921,0.49832,0.239333,0.665627
bm_10_15x3,0.463806,0.694504,0.633597,0.253987,0.88357,0.398597,0.839259,0.319395,0.572803,0.362219,0.299415,0.358233,0.508066,0.256608,0.664506
bm_10_15x5,0.461326,0.697965,0.633062,0.26381,0.883226,0.403751,0.839716,0.320491,0.577588,0.357715,0.295789,0.359334,0.507051,0.250443,0.665899
bm_10_15x7,0.458455,0.697579,0.639273,0.270984,0.883112,0.408157,0.839433,0.329819,0.587927,0.361941,0.300306,0.363519,0.50019,0.246523,0.663611
bm_10_15x9,0.456875,0.700924,0.642143,0.2781,0.883012,0.412855,0.83957,0.340307,0.597659,0.366801,0.306519,0.367597,0.498737,0.251444,0.664412
bm_10_1x1,0.578891,0.711701,0.644057,0.338822,0.882435,0.478058,0.858989,0.392435,0.625311,0.482142,0.431799,0.428745,0.567097,0.297369,0.751031


### Dashboard 1: Scene w.r.t. {metric} (selection plot)
<ol>
    <li>The following figure allows the "lasso" tool to be used for selection.</li>
    <li>As a result, the relevant data points and their corresponding values will be highlighted in the figure in the bottom right corner.</li>
    <li>Pressing the "clear selection" button will reset the figure.</li>
    <li>Additionally, if a data point is hovered over, the corresponding disparity output will be displayed in the bottom left corner.</li>

</ol>

In [8]:
dashboard1


### Dashboard 2: Scenes w.r.t. {metric} with color coded "epochs"
An "epoch" in this context means an experiment with the same settings evaluated across every scene in the Middlebury 2014 training dataset.<br>
<ol>
    <li>The following figure allows all the plots to be turned on and off.</li>
    <li>Additionally, their visibility can be controlled by interacting with their legend entries on the right side of the plot.
    </li>
    <li> Therefore custom comparison can be made between different scenes, kernel sizes and match values. </li>
    <li> The figure in the bottom left corner shows the corresponding disparity map. </li>
    <li> The figure in the bottom right corner shows the corresponding ground truth disparity map. </li>
</ol>

In [9]:
dashboard2


### Dashboard 3: "Epoch" w.r.t. {metric} with color coded scenes
An "epoch" in this context means an experiment with the same settings evaluated across every scene in the Middlebury 2014 training dataset.<br>
<ol>
    <li>The following figure allows all the plots to be turned on and off.</li>
    <li>Additionally, their visibility can be controlled by interacting with their legend entries on the right side of the plot.
    </li>
    <li> Therefore custom comparison can be made between different scenes, kernel sizes and match values. </li>
    <li> The figure in the bottom left corner shows the corresponding disparity map. </li>
    <li> The figure in the bottom right corner shows the corresponding ground truth disparity map. </li>
</ol>

In [10]:
dashboard3
