# 15.2 - Exploration of ISPRS Detection Votes Dataset

In this notebook, the *ISPRSDetectionVotesDataset* class is explored with the prepared 3D point cloud patches of 100,000 points each. **The notebook must be located in a folder called "ISPRS", which in turn is in the folder "votenet" (which results from cloning the GitHub repository of VoteNet)**. The "ISPRS" folder must contain the files "ISPRS_detection_dataset.py", "ISPRS_utils.py", "model_util_ISPRS.py", and "pc_util.py".

The **ISPRSDetectionVotesDataset** class (defined in "ISPRS_detection_dataset.py") as well as the other Python files in this directory are based on the SUN RGB-D dataset class provided by VoteNet (and thus still contain references to the original implementation). Most of the relevant original code that was replaced is left in the files on purpose, commented out, to make the changes easier to see. (Some minor changes, however, were made directly in the code and might not be as easily recognizable.) All the different dataset classes of VoteNet (SUN RGB-D, ScanNet) hide the specifics of each dataset and provide a uniform way for the neural network to successively request the data items.

Each dataset class also has an accompanying configuration class, here the **ISPRSDatasetConfig** (defined in "model_util_ISPRS.py"), which stores the dataset-specific information, like the number of classes, the number of heading (orientation) bins, and the mean oriented bounding box sizes, and which contains functions for different transformations. These transformations are: size to class, class to size and residual, angle to class, class to angle and residual, and parameters to oriented bounding box. (VoteNet uses a combination of class label and residual values to train and predict bounding box sizes and their orientation angle. This is probably because it is more robust to regress small residual values combined with a prediction of a class, e.g. for the orientation angle, than to regress a large value directly. With 12 heading bins (classes) covering the full circle, each class represents a range of 30° (class 2, e.g., is centered at 60° and represents angles between 45° and 75°), and the regressed residual then only lies between -15° and 15°. Otherwise the regressed angles would range from 0° to 360°, or from 0° to 180° if one exploits that bounding boxes are symmetric, so that a bounding box rotated by 180° is the same as the original one.) The information needed for the configuration class (like the mean size of the bounding boxes) depends on how the data was prepared and stored, and is collected during data preparation.

**Disclaimer: Please be aware that the exercise on VoteNet as well as the provided files (like the ones mentioned above) are the result of a rather quick hack and are probably not free of errors.**

# ISPRS Detection Votes Dataset class

First, we take a look at the ISPRSDetectionVotesDataset class. Instances of this class are used to provide the input data to the network for training and prediction. Later in this notebook, we also explore the files that the ISPRSDetectionVotesDataset class takes as input, and which it converts to the input used by the network.

As you can see from the class definition of ISPRSDetectionVotesDataset ("ISPRS_detection_dataset.py"), the class is derived from the PyTorch class Dataset. PyTorch's documentation on the subject (https://pytorch.org/docs/stable/data.html) describes two ways to design dataset classes: map-style datasets and iterable-style datasets. The data items in a map-style dataset can be accessed with the index operator '[ ]' by providing the index number of the item to be retrieved. For this, the class needs to implement the *\_\_getitem\_\_()* and *\_\_len\_\_()* methods or protocols. The advantage is that the number of items in the Dataset object is known and that the items can be accessed randomly. The ISPRSDetectionVotesDataset class is a map-style dataset. An iterable-style dataset is defined as a subclass of the IterableDataset class and must implement the *\_\_iter\_\_()* method or protocol. An iterable-style dataset object can be iterated in a for-loop without using an index. This dataset style is typically favored when the items are generated on-the-fly, or when the items cannot be efficiently accessed randomly.
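
The difference between the two styles can be illustrated without PyTorch. The following is only a minimal sketch of the two protocols; the class names `ToyMapStyleDataset` and `ToyIterableStyleDataset` are made up for this illustration, and in real code the classes would derive from PyTorch's `Dataset` and `IterableDataset`, respectively.

```python
class ToyMapStyleDataset:
    """Map-style: items are addressed by index, and the length is known."""

    def __init__(self, num_items):
        self._items = [{"scan_idx": i} for i in range(num_items)]

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        # Raises IndexError for idx >= len(self), which ends a plain for-loop.
        return self._items[idx]


class ToyIterableStyleDataset:
    """Iterable-style: items are produced one after another, on the fly."""

    def __init__(self, num_items):
        self._num_items = num_items

    def __iter__(self):
        for i in range(self._num_items):
            yield {"scan_idx": i}


map_ds = ToyMapStyleDataset(5)
iter_ds = ToyIterableStyleDataset(5)

print(len(map_ds), map_ds[3])                   # known length and random access
print([item["scan_idx"] for item in iter_ds])   # sequential access only
```

A class that only defines *\_\_getitem\_\_()* can still be used in a for-loop, because Python then calls it with the indexes 0, 1, 2, ... until an IndexError is raised.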

Both dataset styles can be used in the DataLoader class, which is the core utility class for loading data in PyTorch, which, e.g., takes care of generating batches.

An instance of the ISPRSDetectionVotesDataset class can be constructed like any other object, as shown in the following code cell. The parameters for initializing the object still stem from the SUN RGB-D dataset class and do not all make sense for the ISPRS dataset. Currently, no data augmentation is implemented in the class, the data has no color information, and there are no different versions of the dataset. The respective Boolean input arguments should therefore be False, so that the code from the SUN RGB-D implementation does not do anything incorrect. The only interesting argument is the number of points for each data item, which is 100,000 per patch. (However, 100,000 is already the default value for the number of points.)

The built-in *len()* function, which calls the *\_\_len\_\_()* method of the dataset class, returns the number of data items, which is equivalent to the number of prepared patches.

In [1]:
from ISPRS_detection_dataset import ISPRSDetectionVotesDataset

isprs_dataset = ISPRSDetectionVotesDataset(use_height=True, num_points=100000)

len(isprs_dataset)

672

You could now use the dataset object to iterate over the items. Printing them, however, would result in long outputs. Therefore, the print function is commented out and the loop is stopped after one iteration with the break statement. Uncomment the print function and check the output. (Better not remove the break statement, as printing all data from the dataset might crash the notebook or result in long response times.)

In [2]:
for item in isprs_dataset:
    #print(item)
    break

As mentioned above, a single data item can be accessed with the index operator ([ ]) from the dataset object.

In [3]:
i100 = isprs_dataset[100]

The object that is returned is of type dictionary, which means the object stores key-value pairs.

In [4]:
type(i100)

dict

The following loop iterates over all keys of the dictionary of the data item and prints the keys. As you can see, VoteNet expects quite a lot of different inputs.

In [5]:
for key in i100:
    print(key)

point_clouds
center_label
heading_class_label
heading_residual_label
size_class_label
size_residual_label
sem_cls_label
box_label_mask
vote_label
vote_label_mask
scan_idx
max_gt_bboxes
patch_center
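
To get a quick overview of what is stored under each key, the shape and data type of every entry can be listed instead of the values themselves. The helper `summarize_item()` below is a hypothetical convenience function, not part of the provided files, and it is shown here on a small toy dictionary rather than on a real data item.

```python
import numpy as np


def summarize_item(item):
    """Return one (key, shape, dtype) tuple per entry of a data item."""
    summary = []
    for key, value in item.items():
        arr = np.asarray(value)
        summary.append((key, arr.shape, str(arr.dtype)))
    return summary


# Toy stand-in for a data item (only three of the keys, with fewer points).
toy_item = {
    "point_clouds": np.zeros((1000, 4), dtype=np.float32),
    "center_label": np.zeros((80, 3), dtype=np.float32),
    "scan_idx": np.array(100),
}

for key, shape, dtype in summarize_item(toy_item):
    print(f"{key:<14} {str(shape):<12} {dtype}")
```

In the notebook, the same call could be made as `summarize_item(i100)`.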


Next, we take a look at some of the data values provided in the data items.

## 3D Point Cloud

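The data stored with the key 'point_clouds' is the input point cloud as a tensor of shape (N, 3+C), here (100,000, 4), that contains the x,y,z-coordinates of the N points as well as one additional input feature. (C denotes the number of feature channels, so C is equal to one here.) Since the dataset object was created with use_height=True, this feature appears to be a height value derived from the z-coordinate rather than a measured intensity: in the output below, the fourth column differs from the z-coordinate by the same constant in every row.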

In [6]:
i100 = isprs_dataset[100]

i100['point_clouds'].shape

(100000, 4)

Be reminded that the point coordinates of each point cloud patch are translated, so that the center point of the patch is located in the origin of the (local) coordinate system. This translation is performed within the dataset class.

In [7]:
i100['point_clouds']

array([[ 29.037,  35.726,   2.804,  16.144],
       [ 11.471,  35.335,   2.178,  15.518],
       [-50.435,  -9.31 ,  -1.7  ,  11.64 ],
       ...,
       [-27.439,  42.479,   4.882,  18.222],
       [ 23.005,  50.7  ,   2.81 ,  16.15 ],
       [ 27.813,  33.252,   2.666,  16.006]], dtype=float32)
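
The fourth column of the printed rows can be checked against the z-coordinate. The sketch below first computes the difference between the two columns for the six rows shown above, and then illustrates how such a height channel could be produced. The helper `append_height()` is hypothetical; taking a low percentile of z as the floor level mirrors the original VoteNet SUN RGB-D loader, and whether the ISPRS class keeps this step unchanged is an assumption.

```python
import numpy as np

# The six rows printed above (x, y, z, fourth channel).
rows = np.array([
    [ 29.037,  35.726,  2.804, 16.144],
    [ 11.471,  35.335,  2.178, 15.518],
    [-50.435,  -9.31 , -1.7  , 11.64 ],
    [-27.439,  42.479,  4.882, 18.222],
    [ 23.005,  50.7  ,  2.81 , 16.15 ],
    [ 27.813,  33.252,  2.666, 16.006],
])

# Difference between the fourth channel and z: the same value in every row.
offset = rows[:, 3] - rows[:, 2]
print(np.round(offset, 3))


def append_height(xyz, floor_percentile=0.99):
    """Hypothetical helper: append z minus an estimated floor level as a 4th column."""
    floor_z = np.percentile(xyz[:, 2], floor_percentile)
    height = xyz[:, 2] - floor_z
    return np.concatenate([xyz, height[:, np.newaxis]], axis=1)


with_height = append_height(rows[:, :3])
print(with_height.shape)
```

A constant offset between the two columns is what a height-above-floor channel looks like; an intensity channel would not be tied to z in this way.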

## Oriented Bounding Boxes

**Center**

For an input point cloud patch, the network predicts a maximum of 80 oriented bounding boxes. You can find this value stored in the variable MAX_NUM_OBJ in the "ISPRS_detection_dataset.py" Python file. For training, the value of 'center_label' stores the x,y,z-coordinates of the ground truth center points of the oriented bounding boxes, or the vector (0,0,0) for non-existing bounding boxes (if there are fewer than 80 of them). The tensor is therefore always of shape (MAX_NUM_OBJ, 3).

In [8]:
i100['center_label'].shape

(80, 3)

As there are only 4 oriented bounding boxes for the current data item (or point cloud patch), only the first 8 elements are printed in the following.

In [9]:
i100['center_label'][0:8,]

array([[ 72.8173,  38.2576,   9.8878],
       [-46.2576,  -4.8597,  -3.3325],
       [-71.72  ,   8.8851,  -7.2575],
       [-57.7602,  53.7239,  -9.1138],
       [  0.    ,   0.    ,   0.    ],
       [  0.    ,   0.    ,   0.    ],
       [  0.    ,   0.    ,   0.    ],
       [  0.    ,   0.    ,   0.    ]], dtype=float32)

**Heading**

The network does not predict angles directly between 0° and 180° (or 360°), but rather uses a combination of a heading class and a heading residual to determine angles. The ground truth values for these are stored in 'heading_class_label' and 'heading_residual_label'. Each tensor holds 1 value per oriented bounding box and is therefore of shape (MAX_NUM_OBJ, ).

In [10]:
i100['heading_class_label'].shape

(80,)

As already mentioned, the heading values are classes and therefore the values are of type integer.

In [11]:
i100['heading_class_label'][0:8,]

array([1, 4, 4, 4, 0, 0, 0, 0])

From the class outputs above, we can see that the last 3 (of the 4) bounding boxes are oriented approximately in the same direction, whereas the first bounding box has a completely different orientation. The residuals for the headings then provide the finer heading information for the final angles. 

In [12]:
i100['heading_residual_label'][0:8,]

array([-0.09839877, -0.1772951 , -0.1610951 , -0.1053951 ,  0.        ,
        0.        ,  0.        ,  0.        ], dtype=float32)

These values are calculated from angles by the network implementation and do not need to be provided in this form. VoteNet provides specific functions for converting values back and forth in the ISPRSDatasetConfig class.

As described in the "ISPRS_detection_dataset.py" file, angles are provided in radians from 0 to 2pi (or -pi to pi), and the class centers are then at 0, 1\*(2pi/N), 2\*(2pi/N), ..., (N-1)\*(2pi/N), where N is the number of bins for the heading classes. (In the current implementation, N is equal to 12, so each bin is 2pi/12 ≈ 0.5236 rad, or 30°, wide.) The angle is then calculated as angle = class\*(2pi/N) + residual.
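
The relation angle = class\*(2pi/N) + residual can be tried out in a few lines. The two functions below are a minimal sketch written for this notebook; they follow the formula above and the binning of the original VoteNet code as far as it is described here, but they are not copied from the ISPRSDatasetConfig class and may differ from it in details.

```python
import math

NUM_HEADING_BIN = 12  # N in the text above


def angle2class(angle, num_bins=NUM_HEADING_BIN):
    """Split an angle (radians) into a bin index and a residual to the bin center."""
    angle = angle % (2.0 * math.pi)
    bin_width = 2.0 * math.pi / num_bins
    # Shift by half a bin so that bin k is centered on k * bin_width.
    shifted = (angle + bin_width / 2.0) % (2.0 * math.pi)
    class_id = min(int(shifted / bin_width), num_bins - 1)
    residual = shifted - (class_id * bin_width + bin_width / 2.0)
    return class_id, residual


def class2angle(class_id, residual, num_bins=NUM_HEADING_BIN):
    """Inverse of angle2class: bin center plus residual (radians)."""
    bin_width = 2.0 * math.pi / num_bins
    return class_id * bin_width + residual


# First ground truth box of the patch: class 1, residual -0.09839877
print(class2angle(1, -0.09839877))

# An angle of 100.81 degrees falls into the bin centered at 90 degrees (class 3)
print(angle2class(math.radians(100.81)))
```

With this definition the residual always lies within half a bin width of zero, i.e. between -15° and 15° for 12 bins.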

**Size**

As with the angles, the network uses size classes and size residuals to describe the size of an oriented bounding box. The meaning is different, however. The network keeps for each object class a mean size, which is determined from the training data by the data preparation process. Since each object belongs to a certain class, its size is determined by the mean size of this class and the residual size. So, the network does predict the class the object belongs to, and the size difference of this object (the residual) to the mean size of the class.

In [13]:
i100['size_class_label'][0:8,]

array([0, 0, 0, 0, 0, 0, 0, 0])

Since the dataset only contains one class, buildings, all size class labels are 0. But the residuals (in length, width, and height) are different.

In [14]:
i100['size_residual_label'][0:8,]

array([[42.385075 , 30.234741 ,  3.9852364],
       [21.256077 ,  4.4027414,  6.0556364],
       [ 9.053876 ,  7.3727417,  7.8656363],
       [29.935476 , 33.12374  ,  5.2822366],
       [ 0.       ,  0.       ,  0.       ],
       [ 0.       ,  0.       ,  0.       ],
       [ 0.       ,  0.       ,  0.       ],
       [ 0.       ,  0.       ,  0.       ]], dtype=float32)

As with the angles, the ISPRSDatasetConfig class provides functions to convert the sizes to classes and residuals and vice versa.

**Semantic class**

The 'sem_cls_label' holds the semantic class labels. Since all objects belong to the same class (class 0 for building), all values are 0.

In [15]:
i100['sem_cls_label'][0:8,]

array([0, 0, 0, 0, 0, 0, 0, 0])

**Box label mask** 

The 'box_label_mask' indicates whether an entry is a valid bounding box (value 1) or only padding (value 0). This cannot be determined from the other labels alone: the semantic class information, e.g., holds the value 0 for all entries, so it does not reveal whether an entry is a bounding box of class 0 or no valid bounding box at all.

In [16]:
i100['box_label_mask'][0:8,]

array([1., 1., 1., 1., 0., 0., 0., 0.], dtype=float32)
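
The padding to a fixed number of entries together with the mask can be sketched as follows. The helper `pad_boxes()` is hypothetical and only illustrates the idea for the center points; the dataset class itself fills several such arrays in its *\_\_getitem\_\_()* method.

```python
import numpy as np

MAX_NUM_OBJ = 80  # value named in the text above


def pad_boxes(centers, max_num_obj=MAX_NUM_OBJ):
    """Hypothetical helper: zero-pad (K, 3) centers to (max_num_obj, 3) and build the mask."""
    centers = np.asarray(centers, dtype=np.float32)
    k = centers.shape[0]
    if k > max_num_obj:
        raise ValueError("more boxes than max_num_obj")
    padded = np.zeros((max_num_obj, 3), dtype=np.float32)
    mask = np.zeros((max_num_obj,), dtype=np.float32)
    padded[:k] = centers
    mask[:k] = 1.0
    return padded, mask


# The four ground truth centers of the patch, as printed earlier.
gt_centers = [
    [ 72.8173, 38.2576,  9.8878],
    [-46.2576, -4.8597, -3.3325],
    [-71.72  ,  8.8851, -7.2575],
    [-57.7602, 53.7239, -9.1138],
]

center_label, box_label_mask = pad_boxes(gt_centers)

# The mask selects the valid entries again.
valid_centers = center_label[box_label_mask == 1]
print(center_label.shape, int(box_label_mask.sum()), valid_centers.shape)
```

The fixed shape allows several data items to be stacked into one batch, while the mask keeps the padded entries out of the loss computation.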

## Votes

In VoteNet, the points of the point cloud can vote for up to 3 objects. (Or we could say that a point can be part of up to 3 objects.) A vote for an object is given by the vector from the coordinates of the point itself to the coordinates of the center point of the object it votes for. In case the point votes only for 1 object, then the 3 coordinate triples (for the 3 vectors) are the same. Points that do not vote hold the coordinates (0,0,0) for each vote. The shape of the tensor for votes is therefore (N, 9).

In [17]:
i100['vote_label'][0:8,]

array([[ 0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,
         0.    ,  0.    ],
       [ 0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,
         0.    ,  0.    ],
       [ 4.1774,  4.4503, -1.6325,  4.1774,  4.4503, -1.6325,  4.1774,
         4.4503, -1.6325],
       [ 0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,
         0.    ,  0.    ],
       [ 0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,
         0.    ,  0.    ],
       [ 0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,
         0.    ,  0.    ],
       [ 0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,
         0.    ,  0.    ],
       [ 0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,
         0.    ,  0.    ]], dtype=float32)
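
The printed values allow a small consistency check: adding the vote of the point at index 2 to its coordinates should give the center of one of the ground truth boxes. The second part of the sketch shows how such vote labels could be assembled; the helper `make_vote_label()` and its per-point box index are assumptions made for this illustration and do not reflect how the data preparation actually assigns points to boxes.

```python
import numpy as np

point = np.array([-50.435, -9.31, -1.7])         # row 2 of 'point_clouds' (x, y, z)
vote = np.array([4.1774, 4.4503, -1.6325])       # first triple of row 2 of 'vote_label'
center = np.array([-46.2576, -4.8597, -3.3325])  # row 1 of 'center_label'

# point + vote lands on the box center
print(np.allclose(point + vote, center, atol=1e-3))


def make_vote_label(points_xyz, box_index, centers):
    """Hypothetical helper: box_index[i] is the box of point i, or -1 for no box."""
    n = points_xyz.shape[0]
    vote_label = np.zeros((n, 9), dtype=np.float32)
    inside = box_index >= 0
    offsets = centers[box_index[inside]] - points_xyz[inside]
    vote_label[inside] = np.tile(offsets, (1, 3))  # same vote repeated 3 times
    vote_label_mask = inside.astype(np.int64)
    return vote_label, vote_label_mask


toy_points = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]])
toy_centers = np.array([[1.0, 0.0, 0.0]])
toy_box_index = np.array([0, 0, -1])  # the last point belongs to no box

toy_votes, toy_mask = make_vote_label(toy_points, toy_box_index, toy_centers)
print(toy_votes.shape, toy_mask)
```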

The 'vote_label_mask' stores the value 1 to indicate that a point is in one of the object's oriented bounding boxes, or 0 otherwise.

In [18]:
i100['vote_label_mask'][0:8,]

array([0, 0, 1, 0, 0, 0, 0, 0])

If we sum up the 1s in the array, we get the number of vote points in the patch.

In [19]:
import numpy as np

np.sum(i100['vote_label_mask'])

5630

The *argwhere()* function returns the indexes for which a condition is fulfilled, here those of the points that have a vote label mask value of 1.

In [20]:
np.argwhere(i100['vote_label_mask']==1)

array([[    2],
       [   28],
       [   34],
       ...,
       [99948],
       [99977],
       [99990]])

## Scan Index

The scan index just holds some integer value that can be used to identify the patch and relate it to the original data. Here, the patch number is stored. (The returned array is actually just a scalar value with the index value.)

In [21]:
i100['scan_idx']

array(100)

## Patch Center

To merge the results of the patches back into a full, area-wide dataset, each patch holds the x,y,z-coordinates of its center point, by which the patch needs to be translated back.

In [22]:
i100['patch_center']

array([[4.96668485e+05, 5.41897510e+06, 2.63720000e+02]])
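
Translating local coordinates back to the original coordinate system is then a single addition. The sketch below assumes that the local coordinates were obtained by subtracting the patch center, as described for the point cloud earlier in this notebook, and uses two of the ground truth center points as an example.

```python
import numpy as np

# Patch center as printed above, shape (1, 3)
patch_center = np.array([[4.96668485e+05, 5.41897510e+06, 2.63720000e+02]])

# Two box centers in local patch coordinates (first two rows of 'center_label')
local_centers = np.array([
    [ 72.8173, 38.2576,  9.8878],
    [-46.2576, -4.8597, -3.3325],
])

# Broadcasting adds the (1, 3) center to every row of the (2, 3) array.
global_centers = local_centers + patch_center
print(global_centers)
```

Since the global coordinates are large numbers, double precision should be used for this step; in float32, the centimeter-level digits would be lost.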

# ISPRS Dataset Config

The ISPRSDatasetConfig class manages all the information of the dataset that the network needs, like the number of classes, the number of heading bins, the number of size clusters, a dictionary of class values and class names, and a dictionary of class names to mean bounding box sizes. 

The sizes are not stored per class, but rather per what is called a cluster of objects. The reasoning is simple: if there are objects of very different bounding box sizes, then the network has problems predicting their sizes correctly. As a consequence, a class can be split into several clusters, so that several clusters are associated with the same semantic class. In this example, there is just 1 cluster for the class buildings, even though buildings are typically of very different sizes. (Be careful, it may be that the terminology with cluster and class is not cleanly distinguished everywhere. It is only important to know that objects can be separated into several clusters depending on their sizes to improve prediction.)

A dataset configuration object can be instantiated like any other object.

In [23]:
from model_util_ISPRS import ISPRSDatasetConfig

isprs_config = ISPRSDatasetConfig()

Besides these parameters, the dataset configuration class contains methods to translate a size value to a size class (*size2class()*), a size class and residual to a size value (*class2size()*), a heading angle to class and residual (*angle2class()*), a heading class and residual to a heading angle (*class2angle()*), and a function that generates an oriented bounding box from all these parameters (*param2obb()*).

For the 4 bounding boxes of the patch, the following code translates the ground truth class information to angles.

In [24]:
for i in range(4):
    print(isprs_config.class2angle(i100['heading_class_label'][i], i100['heading_residual_label'][i]))

0.4252000007360094
1.9170999987240287
1.933300004732177
1.9890000013823959


Or convert some random angles to classes and residuals.

In [25]:
import math
import random

print('i   angle  degree   cls   residual')
print('----------------------------------')

for i in range(5):
    a = random.uniform(0.0, 2.0*math.pi)
    a_deg = a * 360.0 / (2.0*math.pi)
    c, r = isprs_config.angle2class(a)
    print(f'{i}   {a:.2f}   {a_deg:>6,.2f}   {c:>2}   {r:>5,.2f}')

i   angle  degree   cls   residual
----------------------------------
0   1.76   100.81    3    0.19
1   3.67   210.31    7    0.01
2   4.73   270.73    9    0.01
3   0.51    28.99    1   -0.02
4   4.79   274.71    9    0.08


This also works similarly for the sizes of the oriented bounding boxes.

In [26]:
for i in range(4):
    print(isprs_config.class2size(i100['size_class_label'][i], i100['size_residual_label'][i]))

[51.44799872 36.07119974  8.25459995]
[30.31900091 10.23919996 10.32499995]
[18.11680002 13.20920023 12.13499989]
[38.9984004  38.96019968  9.55160012]
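
If the size is the mean size of the class plus the residual, as described in the section on sizes, then subtracting the residuals from the sizes printed above must give the same mean size for all four boxes. The following check uses only values printed in this notebook; that *class2size()* performs exactly this addition is an assumption based on that description.

```python
import numpy as np

# 'size_residual_label' of the four boxes, as printed earlier
residuals = np.array([
    [42.385075 , 30.234741 , 3.9852364],
    [21.256077 ,  4.4027414, 6.0556364],
    [ 9.053876 ,  7.3727417, 7.8656363],
    [29.935476 , 33.12374  , 5.2822366],
])

# Output of class2size() for the same boxes, as printed above
sizes = np.array([
    [51.44799872, 36.07119974,  8.25459995],
    [30.31900091, 10.23919996, 10.32499995],
    [18.11680002, 13.20920023, 12.13499989],
    [38.9984004 , 38.96019968,  9.55160012],
])

# size = mean size + residual, so the difference is the mean size of the cluster
mean_size = sizes - residuals
print(np.round(mean_size, 3))
```

All four rows agree, which also shows the mean size that the single building cluster uses here.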


# Verification of the Data Classes

The Python file of the original implementation of the DetectionVotesDataset class contains a *main()* function, which can be used to visually verify the dataset classes. (Such a verification function that tests the basic functionality of a Python class can often be found at the end of class definition files.) It writes different files in PLY format, which can be downloaded and visualized in CloudCompare. However, this *main()* function must be executed in the terminal with the Python command from within the directory in which the Python file is located.

Based on this *main()* function, a more convenient function for the exercise named *write_ISPRSDetectionVotesDataset()* was implemented, which takes as argument the patch index and writes out the corresponding PLY files. (If you cannot see a PLY dataset in CloudCompare, e.g. the votes, then select the particular dataset and increase the point size in the properties panel or/and change the color in the menu Edit->Colors->Set unique.)

In [27]:
from ISPRS_detection_dataset import write_ISPRSDetectionVotesDataset

write_ISPRSDetectionVotesDataset(380)

[-44.19079971  27.02589989   7.23999977  13.45899983  10.3939999
   2.38000001  -3.0828    ]
[-43.87689972  10.42430019   4.49069977  13.47180004  11.09600004
   7.65859999  -3.089     ]
[-43.2112999   -1.12820005   6.35449982  12.27859993  11.05260023
   2.78300001  -3.0703    ]
[-42.73020172 -10.77900028   2.40199995  11.66420002   6.70699998
   9.14400019   3.14158531]
[-35.49240112 -25.31629944   0.76749998  18.91880007  13.86619982
   5.73599996  -2.36820001]
[-17.40789986  -8.15649986   3.08349991  13.50079985  11.57619986
   7.8749999   -0.8567    ]
[-17.78590012  23.91550064   5.5374999   45.83960028  12.13820013
   9.70500006  -2.35170001]
[-23.0135994   48.96530151   6.18030024  23.70499964  17.09779962
   8.39839997  -3.0365    ]
[ 5.40479994 16.66010094  4.65869999 21.65879984 18.20640024  8.45739998
 -0.77      ]
[24.62879944 -7.23409986  4.13180017 21.86179991 14.62700018  8.44340004
 -0.76530001]
[38.45080185 -1.60950005  2.81769991 17.13679953  6.43799999  2.6256
 -0.78

# User Provided Input Files

In the last part of this notebook, we take a quick look at the input files that the ISPRSDetectionVotesDataset class reads, since these files are the ones that need to be created by the user. (Of course, the file structure could also be different if one decides to implement the data loading from scratch.)

In [28]:
import os
import numpy as np

# change the (4 integer digit) number for a different patch
patch_num = "0380"

## 3D Point Cloud

Read the 3D point cloud from a compressed NumPy file (recognizable by the ending '.npz'). Loading it returns a dictionary-like object of arrays, from which the 3D point cloud is extracted with the key 'pc'. (There are no other keys in the file, so it contains no further data.) As can be seen from the coordinates, the array just contains the x,y,z-coordinates of the 3D point cloud. (No transformation of the coordinates was applied beforehand.)

In [29]:
pc_fp = os.path.join(os.path.expanduser("~/coursematerial/GIS/ISPRS/VoteNet/Patches100k"), "ISPRS_" + patch_num + "_pc.npz")
print(pc_fp)

pc = np.load(pc_fp, allow_pickle=True)

print(pc['pc'])

/home/jovyan/coursematerial/GIS/ISPRS/VoteNet/Patches100k/ISPRS_0380_pc.npz
[[4.97627335e+05 5.41963460e+06 2.69254000e+02]
 [4.97627325e+05 5.41963438e+06 2.69360000e+02]
 [4.97627227e+05 5.41963441e+06 2.69376000e+02]
 ...
 [4.97671790e+05 5.41960828e+06 2.71710000e+02]
 [4.97678566e+05 5.41962824e+06 2.74705000e+02]
 [4.97578869e+05 5.41965203e+06 2.71435000e+02]]


The utility module "pc_util" of VoteNet (which should be located in the same directory as this notebook) contains a helper function to write the 3D point cloud in PLY format, which can then be downloaded and visualized with CloudCompare.

In [30]:
import pc_util

pc_util.write_ply(pc['pc'], "exercise_pc.ply")

## Oriented Bounding Boxes

Next, the oriented bounding boxes are read from a (regular, non-compressed) NumPy file. 

In [31]:
obb_fp = os.path.join(os.path.expanduser("~/coursematerial/GIS/ISPRS/VoteNet/Patches100k"), "ISPRS_" + patch_num + "_bbox.npy")
print(obb_fp)

obb = np.load(obb_fp)

print(obb.shape)

/home/jovyan/coursematerial/GIS/ISPRS/VoteNet/Patches100k/ISPRS_0380_bbox.npy
(15, 8)


The oriented bounding box array contains, for each bounding box, the x,y,z-coordinates of the center point (array elements 0, 1, 2), the half-lengths (!) of the bounding box sizes (array elements 3, 4, 5), the heading angle (element 6), and the semantic class (element 7). **See also the definitions of the input file data in the header of the "ISPRS_detection_dataset.py" file.**

In [32]:
print(obb)

[[4.97583090e+05 5.41966151e+06 2.76555000e+02 6.72950000e+00
  5.19700000e+00 1.19000000e+00 3.08280000e+00 0.00000000e+00]
 [4.97583404e+05 5.41964491e+06 2.73805700e+02 6.73590000e+00
  5.54800000e+00 3.82930000e+00 3.08900000e+00 0.00000000e+00]
 [4.97584070e+05 5.41963336e+06 2.75669500e+02 6.13930000e+00
  5.52630000e+00 1.39150000e+00 3.07030000e+00 0.00000000e+00]
 [4.97584551e+05 5.41962371e+06 2.71717000e+02 5.83210000e+00
  3.35350000e+00 4.57200000e+00 3.14160000e+00 0.00000000e+00]
 [4.97591789e+05 5.41960917e+06 2.70082500e+02 9.45940000e+00
  6.93310000e+00 2.86800000e+00 2.36820000e+00 0.00000000e+00]
 [4.97609873e+05 5.41962633e+06 2.72398500e+02 6.75040000e+00
  5.78810000e+00 3.93750000e+00 8.56700000e-01 0.00000000e+00]
 [4.97609495e+05 5.41965840e+06 2.74852500e+02 2.29198000e+01
  6.06910000e+00 4.85250000e+00 2.35170000e+00 0.00000000e+00]
 [4.97604267e+05 5.41968345e+06 2.75495300e+02 1.18525000e+01
  8.54890000e+00 4.19920000e+00 3.03650000e+00 0.00000000e+00]


To write the bounding boxes to PLY files with the helper function *write_oriented_bbox()*, the half-lengths must first be converted to full lengths (as in the *\_\_getitem\_\_()* method of the *ISPRSDetectionVotesDataset* class).

In [33]:
obb[:,3:6] = obb[:,3:6]*2

And the heading angle must be inverted. (We cannot give a satisfying explanation for this, but it follows the function *viz_obb()* found in the *ISPRSDetectionVotesDataset* class. Maybe the code does not consistently use the same definition of the oriented bounding boxes.)

In [34]:
obb[:,6] = -obb[:,6]
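
Note that the two cells above modify `obb` in place, so running them a second time doubles the sizes again and flips the heading back. A variant that works on a copy avoids this. The helper `to_viz_obb()` is hypothetical and only bundles the two steps described above; it is shown on the first row of the bounding box file.

```python
import numpy as np


def to_viz_obb(obb_raw):
    """Hypothetical helper: return a copy with full side lengths and negated heading."""
    obb_viz = np.array(obb_raw, dtype=np.float64, copy=True)
    obb_viz[:, 3:6] *= 2.0   # half-lengths -> full lengths
    obb_viz[:, 6] *= -1.0    # flip the sign of the heading angle
    return obb_viz


# First row of the bounding box file, as printed above
raw = np.array([[4.97583090e+05, 5.41966151e+06, 2.76555e+02,
                 6.7295, 5.197, 1.19, 3.0828, 0.0]])

viz = to_viz_obb(raw)
print(viz[0, 3:7])   # sizes and heading after the conversion
print(raw[0, 3:7])   # the input array is unchanged
```

The converted sizes and heading agree with the first row printed by *write_ISPRSDetectionVotesDataset(380)* earlier in this notebook.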

Now, the *write_oriented_bbox()* function can be used to output the oriented bounding boxes.

In [35]:
pc_util.write_oriented_bbox(obb, "exercise_obbs.ply")

## Votes

Last, the votes are explored. Votes are input points of the 3D point cloud that vote for the center of up to 3 bounding boxes. The array contains as the first element the number of votes (between 0 and 3), and then 3 coordinate triples (x,y,z) for each vote. The array therefore has a shape of (N, 10). Since each point of the ISPRS dataset only votes for at most 1 bounding box, the first element is always 0 or 1. And for vote points, the coordinate triples are repeated 3 times to vote for the same object.

In [36]:
votes_fp = os.path.join(os.path.expanduser("~/coursematerial/GIS/ISPRS/VoteNet/Patches100k"), "ISPRS_" + patch_num + "_votes.npz")
print(votes_fp)

votes = np.load(votes_fp, allow_pickle=True)

print(votes['point_votes'].shape)

# extract the votes from the array, so that the variable can be used without the indirection
votes = votes['point_votes']

/home/jovyan/coursematerial/GIS/ISPRS/VoteNet/Patches100k/ISPRS_0380_votes.npz
(100000, 10)


The indexes for voting points can be retrieved from the whole array with the *argwhere()* function of NumPy, where the first element (column) is checked against the value 1. 

In [37]:
np.argwhere(votes[:,0]==1)

array([[ 3809],
       [ 3879],
       [ 3906],
       ...,
       [99969],
       [99976],
       [99978]])

Outputting the shape of the returned array gives the number of matching points. By checking against the values 0, 2, or 3 instead, the number of points that do not vote, vote for 2, or vote for 3 bounding boxes can be returned; the following cell counts the points that do not vote. The counts for all four values should add up to the number of points in the 3D point cloud.

In [38]:
np.argwhere(votes[:,0]==0).shape

(68078, 1)
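
Instead of calling *argwhere()* once per value, all counts can be obtained in one step with NumPy's *unique()* function. The helper `count_vote_values()` is hypothetical and is shown here on a small toy column; in the notebook it could be applied to `votes[:,0]`.

```python
import numpy as np


def count_vote_values(first_column):
    """Count how often each value occurs in the first column of the votes array."""
    values, counts = np.unique(first_column, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


toy_column = np.array([0., 1., 0., 0., 1., 0.])
counts = count_vote_values(toy_column)
print(counts)   # {0: 4, 1: 2}

# The counts must add up to the number of points.
print(sum(counts.values()) == toy_column.shape[0])
```

For the patch above, and provided that the first column only holds the values 0 and 1, the 68078 non-voting points leave 100000 - 68078 = 31922 vote points.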

As an example, the information of the vote point at index 3809 is given next.

In [39]:
votes[3809]

array([ 1.    ,  4.4058,  7.8291, -1.2763,  4.4058,  7.8291, -1.2763,
        4.4058,  7.8291, -1.2763])

With the *viz_votes()* function in "ISPRS_detection_dataset.py", the votes can be written in PLY format. Besides the 3D point cloud and the 3 vote vectors as first and second arguments, respectively, the function takes as third argument the first element of the votes array as the vote mask. So, the 3 vote vectors and the so-called vote mask must be separated beforehand with the slicing operator.

As the votes are just vectors from the points of the 3D point cloud to the centers of the oriented bounding boxes, the output file visualizes the center points of the bounding boxes that the vote points vote for. So, many of the votes point to exactly the same center point. (During prediction, however, the points will typically vote for points with different center coordinates, since the predictions are not as precise and show coordinate differences. This is the reason why the vote points for one and the same object need to be collected with some sphere neighborhood threshold.) In addition, the points that do vote (where the vote mask is not 0) are written to another file as the object points.

In [40]:
from ISPRS_detection_dataset import viz_votes

viz_votes(pc['pc'], votes[:,1:10], votes[:,0], "exercise_")

This concludes the thorough examination of the input data and the corresponding classes that VoteNet uses for training and prediction.

**Continue now with the next notebook (15.3) on adapting the VoteNet network architecture and training the network.**