Code Walk-through
=================

The goal of this page is to briefly introduce you to our **raynet** library.
It is mainly intended for those of you who want a better understanding of the
main modules implemented in our codebase and want to become familiar with it.

## Using different datasets as input

We want to be able to handle multiple datasets; however, not all of them can
be parsed using the same format. To this end, we created the
[Dataset](https://github.com/paschalidoud/raynet/blob/deb209064039596c321d7ddd4cb4a210b0e8a5d8/raynet/common/dataset.py#L8)
class, which handles various datasets in a generic manner. In principle, a dataset is defined
as a collection of scenes; however, different datasets follow different folder
conventions. So far, we have tested our implementation on two challenging
datasets:

* [Aerial dataset](https://www.sciencedirect.com/science/article/pii/S0924271614002354)
* [DTU Multi-view stereo benchmark](https://link.springer.com/article/10.1007/s11263-016-0902-9)

For example, the **Aerial Dataset** is represented by a directory that contains
one directory per scene. Every scene is in turn a directory with two inner
directories, one containing the views and one containing the camera poses. In
contrast, the **DTU Dataset** is represented by a directory that contains three
subdirectories: one for the camera poses, one for the raw images of every scene
and one for the ground-truth data of every scene. To handle these two layouts,
we created two wrapper classes around the `Dataset` class: `RestrepoDataset`
(for the Aerial Dataset) and `DTUDataset`. To create an instance of a `Dataset`
class, it suffices to specify two arguments:

* **dataset_directory**: The path to the folder containing the dataset.
* **select_neighbors_based_on**: Controls how neighbouring views are selected.
  We provide two options, based either on their geometric distance or on their
  order in the file system, selected with `distance` and `filesystem`
  respectively.

For instance, one can create a `RestrepoDataset` object just by writing

```python
In [1]: from raynet.common.dataset import RestrepoDataset
In [2]: dataset = RestrepoDataset(
   ...:     "/path/to/folder/containing/the/probabilistic_reconstruction_data_downsampled/",
   ...:     "distance"
   ...: )
```

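To make the two `select_neighbors_based_on` options more concrete, here is a minimal, self-contained sketch of how neighbouring views could be ranked. It is not raynet's implementation: the function name `select_neighbors`, the representation of camera centres as plain `(x, y, z)` tuples, and the assumption that views are stored in file-system order are all illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def select_neighbors(
    i: int, camera_centers: Sequence[Point], k: int, based_on: str = "distance"
) -> List[int]:
    """Return the indices of the k views treated as neighbours of view i.

    Hypothetical helper: camera_centers[j] is assumed to be the 3D position of
    the camera of view j, and views are assumed to be in file-system order.
    """
    others = [j for j in range(len(camera_centers)) if j != i]
    if based_on == "distance":
        # Rank the other views by the Euclidean distance between camera centres.
        ranked = sorted(
            others, key=lambda j: math.dist(camera_centers[i], camera_centers[j])
        )
    elif based_on == "filesystem":
        # Rank the other views by how close they are to view i in file order.
        ranked = sorted(others, key=lambda j: abs(j - i))
    else:
        raise ValueError(f"Unknown option: {based_on!r}")
    # sorted() is stable, so ties keep the lower index first.
    return ranked[:k]
```

With camera centres `[(0, 0, 0), (10, 0, 0), (1, 0, 0), (2, 0, 0)]`, the two nearest neighbours of view 0 are views 2 and 3 under `distance`, but views 1 and 2 under `filesystem`.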
Every dataset is defined as a set of scenes, and every scene can be specified
with a unique scene index. Therefore, the API of the **Dataset** class is

```python
get_scene(scene_idx)
```

which creates a
[Scene](https://github.com/paschalidoud/raynet/blob/deb209064039596c321d7ddd4cb4a210b0e8a5d8/raynet/common/scene.py#L2)
object. For the **DTU Dataset** we use the provided scene indices, while for the
**Aerial Dataset** we map the scenes to indices based on their alphabetical
order.

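The alphabetical indexing used for the Aerial Dataset can be illustrated with the sketch below. The helper names `scene_indices` and `list_scene_directories` are hypothetical. The sketch also assumes that indices start at 0 and that the scene names are the names of the scene sub-directories. raynet's own implementation may differ.

```python
import os
from typing import Dict, Iterable


def scene_indices(scene_names: Iterable[str]) -> Dict[int, str]:
    """Map each alphabetical position (starting at 0) to a scene name."""
    return {idx: name for idx, name in enumerate(sorted(scene_names))}


def list_scene_directories(dataset_directory: str) -> Dict[int, str]:
    """Apply the same mapping to the sub-directories of a dataset folder.

    Plain files in the dataset folder are ignored; only directories count
    as scenes in this sketch.
    """
    names = [entry.name for entry in os.scandir(dataset_directory) if entry.is_dir()]
    return scene_indices(names)
```

Because the mapping depends only on sorted names, it stays stable no matter in which order the file system lists the directories.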
A **Scene** is defined as a collection of raw images, camera poses and
ground-truth data, together with a bounding box that specifies the borders of
the scene. However, as with the datasets, not all scenes are represented using
the same format. Therefore, we implement two wrappers, one for scenes following
the format of the *Aerial Dataset* and one for scenes following the format of
the *DTU Dataset*. Their API is the following:

* `get_image(i)`: Returns the \(i^{th}\) image of the current scene.
* `get_images()`: Returns a list of
  [Image](https://github.com/paschalidoud/raynet/blob/deb209064039596c321d7ddd4cb4a210b0e8a5d8/raynet/common/image.py#L10)
  objects, one for every image of the current scene.
* `get_random_image()`: Returns an `Image` object for a random image of the
  current scene.
* `get_image_with_neighbors(i, neighbors)`: Returns a list of `Image` objects,
  where the first is the \(i^{th}\) image and the rest are its neighbouring
  views. The neighbouring views are selected based on the
  `select_neighbors_based_on` argument described above.
* `get_depth_for_pixel(i, y, x)`: Returns the ground-truth depth value of the
  \((x, y)\) pixel of the \(i^{th}\) image.
* `get_depth_map(i)`: Returns the ground-truth depth map for the \(i^{th}\) image.
* `get_depthmaps()`: Returns a list of numpy arrays containing the
  corresponding depth map for every image in the current scene.
* `get_pointcloud()`: Returns a
  [Pointcloud](https://github.com/paschalidoud/raynet/blob/deb209064039596c321d7ddd4cb4a210b0e8a5d8/raynet/pointcloud.py#L14)
  object containing the ground-truth point cloud of the current scene.

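The toy class below illustrates two conventions of the API above. The list returned by `get_image_with_neighbors` starts with the reference view. `get_depth_for_pixel(i, y, x)` takes the row `y` before the column `x`, so in this sketch it reads `depth_map[y][x]`. `ToyScene` is hypothetical and does not reflect raynet's data structures. Images are plain strings, depth maps are nested lists, and `neighbors` is assumed to be the number of neighbouring views.

```python
from typing import List, Sequence

DepthMap = List[List[float]]  # depth_map[y][x]


class ToyScene:
    """Hypothetical in-memory scene used only to illustrate API conventions."""

    def __init__(self, images: Sequence[str], depth_maps: Sequence[DepthMap]):
        if len(images) != len(depth_maps):
            raise ValueError("every image needs exactly one depth map")
        self._images = list(images)
        self._depth_maps = list(depth_maps)

    def get_image(self, i: int) -> str:
        return self._images[i]

    def get_image_with_neighbors(self, i: int, neighbors: int) -> List[str]:
        # Reference view first, then `neighbors` other views. Here the
        # neighbours are simply the first other views in file order (an
        # assumption of this sketch, not raynet's selection rule).
        others = [j for j in range(len(self._images)) if j != i][:neighbors]
        return [self._images[i]] + [self._images[j] for j in others]

    def get_depth_map(self, i: int) -> DepthMap:
        return self._depth_maps[i]

    def get_depth_for_pixel(self, i: int, y: int, x: int) -> float:
        # Row index (y) first, then column index (x).
        return self._depth_maps[i][y][x]
```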
If you want to use a dataset that follows a different format, you need to
implement a wrapper around the *Dataset* class and one around the *Scene* class,
based on your requirements.

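As a rough starting point, the skeleton below writes the methods listed above as abstract base classes. The class names `SceneWrapper` and `DatasetWrapper` and the use of Python's `abc` module are illustrative assumptions. Typing `neighbors` as an integer count is also an assumption. raynet's real base classes may be organised differently, so check `raynet/common/dataset.py` and `raynet/common/scene.py` before subclassing.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class SceneWrapper(ABC):
    """Hypothetical interface listing the Scene methods described above."""

    @abstractmethod
    def get_image(self, i: int) -> Any: ...

    @abstractmethod
    def get_images(self) -> List[Any]: ...

    @abstractmethod
    def get_random_image(self) -> Any: ...

    @abstractmethod
    def get_image_with_neighbors(self, i: int, neighbors: int) -> List[Any]: ...

    @abstractmethod
    def get_depth_for_pixel(self, i: int, y: int, x: int) -> float: ...

    @abstractmethod
    def get_depth_map(self, i: int) -> Any: ...

    @abstractmethod
    def get_depthmaps(self) -> List[Any]: ...

    @abstractmethod
    def get_pointcloud(self) -> Any: ...


class DatasetWrapper(ABC):
    """Hypothetical interface for a dataset stored in a new folder layout."""

    def __init__(self, dataset_directory: str, select_neighbors_based_on: str):
        if select_neighbors_based_on not in ("distance", "filesystem"):
            raise ValueError(
                f"Unknown neighbour selection: {select_neighbors_based_on!r}"
            )
        self.dataset_directory = dataset_directory
        self.select_neighbors_based_on = select_neighbors_based_on

    @abstractmethod
    def get_scene(self, scene_idx: int) -> SceneWrapper:
        """Parse the files of one scene and return them as a SceneWrapper."""
```

A concrete subclass only has to implement `get_scene` (plus a matching `SceneWrapper` subclass). Python refuses to instantiate a wrapper while any abstract method is still missing.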
## Training different networks

Now that we have described how to use different datasets as input, it is also
worth explaining how one can train different networks to perform the 3D
reconstruction task. We provide several built-in architectures that can be used
to extract features for measuring the similarity between patches from different
views. Parts of the code that define these architectures are shown below. The
full code can be found
[here](https://github.com/paschalidoud/raynet/blob/deb209064039596c321d7ddd4cb4a210b0e8a5d8/raynet/models.py#L1).

* **simple_cnn**: Each layer comprises a convolution, spatial batch
  normalization and a ReLU non-linearity. We repeat this scheme 5 times, but we
  remove the ReLU from the last layer in order to retain information encoded in
  both the negative and the positive range. The receptive field of this
  architecture is \(11 \times 11\).

```python
from keras.layers import Activation, BatchNormalization, Conv2D
from keras.models import Sequential

# Illustrative input (assumption): an 11x11 RGB patch, which the five "valid"
# 3x3 convolutions reduce to a 1x1x32 feature map.
input_shape = (11, 11, 3)

common_params = dict(
    filters=32,
    kernel_size=3
)

model = Sequential([
    Conv2D(input_shape=input_shape, **common_params),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    BatchNormalization()  # no ReLU after the last layer
])
```

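The stated receptive fields can be checked with the standard formula for a stack of \(L\) stride-1 convolutions with kernel sizes \(k_l\) and dilation rates \(d_l\). It is written out here only to make the figures easy to verify:

\[
r = 1 + \sum_{l=1}^{L} (k_l - 1)\, d_l
\]

For **simple_cnn**, \(L = 5\), \(k_l = 3\) and \(d_l = 1\) for every layer, so \(r = 1 + 5 \cdot 2 = 11\), i.e. an \(11 \times 11\) receptive field.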
* **simple_cnn_ln**: This architecture is the same as the one above, with the
  only difference that the spatial batch normalization is replaced by layer
  normalization. The receptive field of this architecture is also
  \(11 \times 11\).

```python
from keras.layers import Activation, Conv2D, LayerNormalization
from keras.models import Sequential

# LayerNormalization ships with recent Keras releases; older standalone Keras
# releases do not provide it and need a custom layer instead.
input_shape = (11, 11, 3)  # illustrative input (assumption): an 11x11 RGB patch

common_params = dict(
    filters=32,
    kernel_size=3,
)

model = Sequential([
    Conv2D(input_shape=input_shape, **common_params),
    LayerNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    LayerNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    LayerNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    LayerNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    LayerNormalization()  # no ReLU after the last layer
])
```
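The practical difference between the two normalizations is the axis over which statistics are computed. The plain-Python sketch below normalizes a toy batch of feature vectors. Batch normalization computes a mean and variance per feature across the batch, while layer normalization computes them per sample across that sample's features. The sketch deliberately ignores the spatial dimensions, the learnable scale and shift, and running statistics. The value of `eps` is an assumption.

```python
import math
from typing import List

Batch = List[List[float]]  # batch[n][c]: feature c of sample n


def _normalize(values: List[float], eps: float) -> List[float]:
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm(batch: Batch, eps: float = 1e-5) -> Batch:
    # Statistics per feature, computed across the samples of the batch.
    columns = [_normalize(list(col), eps) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]


def layer_norm(batch: Batch, eps: float = 1e-5) -> Batch:
    # Statistics per sample, computed across that sample's features.
    return [_normalize(sample, eps) for sample in batch]
```

For the batch `[[1, 2, 3], [3, 6, 9]]`, layer normalization maps both samples to (almost) the same vector, because the second is a scaled copy of the first. Batch normalization instead maps them to roughly `[-1, -1, -1]` and `[1, 1, 1]`. Unlike batch normalization, layer normalization does not depend on which other samples share the batch.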
* **dilated_cnn_receptive_field_25**: For this architecture we also use dilated
  convolutional layers, in order to increase the receptive field without
  increasing the number of parameters. Again, we employ ReLU non-linearities and
  remove the one after the last layer. The receptive field of this architecture
  is \(25 \times 25\).

```python
from keras.layers import Activation, BatchNormalization, Conv2D
from keras.models import Sequential

# Illustrative values (assumptions): a 25x25 RGB patch and no weight
# regularization; any Keras regularizer can be passed instead of None.
input_shape = (25, 25, 3)
kernel_regularizer = None

model = Sequential([
    Conv2D(
        filters=32,
        kernel_size=5,
        input_shape=input_shape,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=5,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=5,
        kernel_regularizer=kernel_regularizer,
        dilation_rate=2  # a 5x5 kernel spanning 9x9 pixels
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization()  # no ReLU after the last layer
])
```

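To see why dilation enlarges the receptive field without adding parameters, note that a \(k \times k\) kernel with dilation rate \(d\) covers \((k - 1)d + 1\) pixels per side, while its weight count depends only on \(k\). The plain-Python sketch below makes this concrete. The function names are ours, the layer list is transcribed from the snippet above, and the parameter count assumes Keras' default `use_bias=True`. It compares the dilated layer with an undilated convolution of the same extent and accumulates the receptive field of the whole stack.

```python
from typing import List, Tuple


def effective_kernel(kernel_size: int, dilation_rate: int = 1) -> int:
    # Number of input pixels spanned (per side) by a dilated kernel.
    return (kernel_size - 1) * dilation_rate + 1


def conv2d_params(kernel_size: int, in_channels: int, filters: int) -> int:
    # Weights plus one bias per filter; the dilation rate plays no role.
    return kernel_size * kernel_size * in_channels * filters + filters


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    # Stride-1 convolutions: each layer adds (effective kernel - 1) pixels.
    return 1 + sum(effective_kernel(k, d) - 1 for k, d in layers)


# (kernel_size, dilation_rate) of the seven convolutions above.
DILATED_CNN_25 = [(5, 1), (5, 1), (5, 2), (3, 1), (3, 1), (3, 1), (3, 1)]
```

For instance, `effective_kernel(5, 2)` is 9, yet `conv2d_params(5, 32, 32)` gives only 25,632 parameters for the dilated layer. An undilated \(9 \times 9\) layer with the same extent would need `conv2d_params(9, 32, 32)`, i.e. 82,976. `receptive_field(DILATED_CNN_25)` returns 25, matching the stated \(25 \times 25\).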
* **dilated_cnn_receptive_field_25_with_tanh**: This architecture is the same
  as the one above, with the only difference that the ReLU non-linearities are
  replaced by tanh non-linearities. Again, the receptive field is
  \(25 \times 25\).

```python
from keras.layers import Activation, BatchNormalization, Conv2D
from keras.models import Sequential

# Illustrative values (assumptions): a 25x25 RGB patch and no weight
# regularization; any Keras regularizer can be passed instead of None.
input_shape = (25, 25, 3)
kernel_regularizer = None

model = Sequential([
    Conv2D(
        filters=32,
        kernel_size=5,
        input_shape=input_shape,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=5,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=5,
        kernel_regularizer=kernel_regularizer,
        dilation_rate=2  # a 5x5 kernel spanning 9x9 pixels
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization()  # no tanh after the last layer
])
```

## Inferring 3D Reconstructions

We provide three factories that can be used to test previously trained models.
The `multi_view_cnn` factory can be used to test a Multi-View CNN model (namely
without the MRF) and estimates discretized depth maps at uniformly sampled depth
hypotheses. Similarly, the `multi_view_cnn_voxel_space` factory is the same as
the `multi_view_cnn` factory, with the only difference that it predicts
discretized depths on the voxel grid defined by the bounding box of the scene.
Finally, the `raynet` factory can be used to infer the 3D model of a scene using
our end-to-end trainable model. All factories share the same API:

```python
forward_pass(scene, images_range)
```

**Arguments**

* `scene`: A [Scene](https://github.com/paschalidoud/raynet/blob/deb209064039596c321d7ddd4cb4a210b0e8a5d8/raynet/common/scene.py#L2)
  object that specifies the scene to be processed.
* `images_range`: A tuple that specifies the indices of the views of the scene
  to be used for the reconstruction.

**Returns**

Given a `Scene` object and an image range that holds the views to be used for
the reconstruction, `forward_pass` predicts a depth map for every view.
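To illustrate what "discretized depth maps at uniformly sampled depth hypotheses" means, the sketch below spaces \(D\) hypotheses evenly between a near and a far depth. For every pixel, it then keeps the hypothesis with the highest score. The near/far bounds, the nested-list score layout `scores[y][x][d]` and the argmax decision rule are assumptions made for this example. They are not taken from raynet's factories.

```python
from typing import List


def uniform_depth_hypotheses(near: float, far: float, n: int) -> List[float]:
    """Return n depths evenly spaced in [near, far] (endpoints included)."""
    if n < 2:
        raise ValueError("need at least two hypotheses")
    step = (far - near) / (n - 1)
    return [near + d * step for d in range(n)]


def discretized_depth_map(
    scores: List[List[List[float]]], hypotheses: List[float]
) -> List[List[float]]:
    """Pick, for every pixel, the hypothesis with the highest score.

    scores[y][x][d] is a (hypothetical) score of hypothesis d at pixel (x, y);
    ties are resolved in favour of the nearest hypothesis.
    """
    return [
        [hypotheses[max(range(len(h)), key=h.__getitem__)] for h in row]
        for row in scores
    ]
```

With hypotheses `uniform_depth_hypotheses(1.0, 3.0, 5)`, i.e. `[1.0, 1.5, 2.0, 2.5, 3.0]`, a pixel whose scores peak at the second entry is assigned depth 1.5. Because every output depth is one of the hypotheses, the resolution of the depth map is limited by \(D\).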