### Correspondence Selection ###

The following process automatically compiles a training data set for learning a shadow attenuation model. The preceding segmentation step produced 'patches' of surface that are likely a similar material and are spatially connected to each other, lying partly in the shade and partly in the open. From those segments we select a well-distributed subset of pixels in the shade, then identify nearby pixels in the open that we hope are the same material.

In [69]:
import matplotlib.pyplot as plt
import numpy as np
import pickle

lidar_import = np.load('data/lidar_data.npz')
intensity = lidar_import['intensity']
dem = lidar_import['dem']

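# context manager ensures the file handle is closed after loading
with open('data/lidar_segmentation.p', 'rb') as segment_file:
    lidar_segment_import = pickle.load(segment_file)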

#### Selection Amounts ####

Each 'correspondence' will consist of one pixel in the shade, and at least one pixel in the open. Selecting more than one pixel in the open allows for some flexibility in assessing the learned models later in the pipeline.
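
As a rough illustration of the structure this produces, the sketch below builds a single correspondence by hand. The pixel coordinates are invented, and the (row, column) tuple layout is an assumption based on how pixels are used as array indices later in this notebook.

```python
# Hypothetical example: one shaded pixel mapped to its candidate open pixels.
# All coordinates are made up for illustration.
shade_pixel = (120, 45)
open_candidates = [(118, 52), (121, 53), (117, 55)]  # nearest candidate first

correspondences = {shade_pixel: open_candidates}

# Later stages can use only the nearest candidate, or every (shade, open) pair
best_match = correspondences[shade_pixel][0]
all_pairs = [(shade_pixel, candidate) for candidate in correspondences[shade_pixel]]

print(best_match)
print(len(all_pairs))
```

Keeping several candidates per shaded pixel means the choice between a single best match and multiple pairs can be deferred to the model assessment stage.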

In [70]:
rng = np.random.default_rng(131071)

number_of_correspondences = 200
number_of_candidate_pairs = 5

# build a list of (region_id, pixel_index) pairs from which to assemble training data;
# region IDs are assumed to run from 1 to len(lidar_segment_import)
training_pixels_in_shade = []
for i in range(number_of_correspondences):
    region_id = rng.integers(1, len(lidar_segment_import)+1)
    pixel_index = rng.integers(0, len(lidar_segment_import[region_id]['shade']))
    training_pixels_in_shade.append((region_id, pixel_index))

#### Candidate Pixel Identification ####

Once we have a batch of randomly selected shadow pixels, we need to pair them up with pixels in the open. The operations below iterate over each training pixel in the shade and perform the following steps:  
* Identify all pixels in the open with the same region ID
* Build a matrix of [x, y, z, intensity] values for each open pixel
* Find the *k* nearest candidate pixels to the current training pixel where *k* = `number_of_candidate_pairs`
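
The steps above can be sketched on toy data as follows. This is a minimal illustration rather than the notebook's exact code: the function name `nearest_open_pixels` and the feature values are hypothetical, and the guard against all-zero features is an added assumption.

```python
import numpy as np

def nearest_open_pixels(shade_features, open_features, k):
    """Return indices of the k rows of open_features closest to shade_features."""
    open_features = np.asarray(open_features, dtype=float)
    shade_features = np.asarray(shade_features, dtype=float)

    # Scale both sides by the per-feature maximum of the candidates so that
    # no single feature dominates the distance
    scale = np.max(np.abs(open_features), axis=0)
    scale[scale == 0] = 1.0  # assumption: leave all-zero features unscaled

    diffs = open_features / scale - shade_features / scale
    squared_distances = np.sum(diffs ** 2, axis=1)

    # Indices of the k smallest squared distances, nearest first
    return np.argsort(squared_distances)[:k]

# Toy data: columns are [x, y, z, intensity]
open_features = [
    [10, 10, 100.0, 50.0],
    [11, 10, 100.5, 52.0],
    [40, 40, 120.0, 90.0],
    [12, 11, 101.0, 51.0],
]
shade_features = [10, 11, 100.2, 20.0]

nearest = nearest_open_pixels(shade_features, open_features, k=2)
print(nearest)
```

The returned indices refer to rows of the candidate matrix, so they can be used to look up the matching pixel coordinates in the region's list of open pixels.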

**Note:** this part of the pipeline is highly customizable. In the code below, only lidar attributes are used, and the squared Euclidean distance determines *nearness*. Other features (e.g. from the spectral data) and other distance metrics could be used instead. In practical experiments, the current configuration has provided reasonable training data sets.
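
To give a sense of how the distance could be swapped out, here is a hedged sketch of a weighted distance with a configurable exponent. The helper `weighted_distances`, the weights, and the feature values are all hypothetical and are not part of this pipeline; the inputs are assumed to be already scaled.

```python
import numpy as np

def weighted_distances(candidates, target, weights, order=2):
    """Sum of weights * |difference| ** order for each candidate row."""
    diffs = np.abs(np.asarray(candidates, dtype=float) - np.asarray(target, dtype=float))
    return np.sum(np.asarray(weights, dtype=float) * diffs ** order, axis=1)

# Toy, pre-scaled features: columns are [x, y, z, intensity]
candidates = np.array([
    [0.0, 0.0, 1.0, 0.9],   # same location, very different intensity
    [0.3, 0.3, 1.0, 0.2],   # further away, same intensity
])
target = np.array([0.0, 0.0, 1.0, 0.2])

# Equal weights, squared differences (the configuration described above)
d_equal = weighted_distances(candidates, target, [1.0, 1.0, 1.0, 1.0])

# Ignore intensity entirely and rely on position and elevation
d_spatial = weighted_distances(candidates, target, [1.0, 1.0, 1.0, 0.0])

# Equal weights with absolute differences (Manhattan distance)
d_manhattan = weighted_distances(candidates, target, [1.0, 1.0, 1.0, 1.0], order=1)

print(np.argmin(d_equal), np.argmin(d_spatial), np.argmin(d_manhattan))
```

In this toy case, dropping the intensity feature changes which candidate counts as nearest, which is the kind of sensitivity worth checking before settling on a configuration.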

In [71]:

# building a collection of the form { shade_pixel: possible_open_pixels[] }
training_data = {}

for region_id, pixel_index in training_pixels_in_shade:
    
    training_pixel = lidar_segment_import[region_id]['shade'][pixel_index]
    training_data[training_pixel] = []

    # matrix of observations in associated 'open' patch adjacent to training pixel
    candidate_data = np.array([[pixel[0], pixel[1], dem[pixel], intensity[pixel]] 
                               for pixel in lidar_segment_import[region_id]['open']])
    
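    # scale feature values to candidate max to mitigate uneven weighting;
    # candidates and training point must be scaled identically for the distances to be meaningful
    candidate_max = np.max(candidate_data, axis=0)
    candidate_data = candidate_data / candidate_max
    training_point = np.array([training_pixel[0], training_pixel[1], dem[training_pixel], intensity[training_pixel]])
    training_point = training_point / candidate_max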

    # compute distances of candidate pixels from training pixel; select specified number of 
    # nearest pixels for inclusion in training output
    candidate_distances = np.sum((candidate_data - training_point) ** 2, axis=1)
    ascending_distance_indices = np.argsort(candidate_distances)
    
    for candidate_index in ascending_distance_indices[:number_of_candidate_pairs]:
        training_data[training_pixel].append(lidar_segment_import[region_id]['open'][candidate_index])


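# context manager ensures the output is flushed and the file is closed
with open('data/training_correspondences.p', 'wb') as output_file:
    pickle.dump(training_data, output_file)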