# Custom 3D style dataset parser

### The generated output includes the following annotation data:
*     bounding_box &nbsp;&nbsp;&nbsp; 4 x 1 float
*	  key_points_3D &nbsp;&nbsp;&nbsp;3 x k float (provide name sheet)
*	  key_points_2D &nbsp;&nbsp;&nbsp;2 x k float
*	  visibility &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1 x k int (0 occluded or 1 visible)
*	  rot_mat	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3 x 3 float
*	  trans_mat	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3 x 1 float
*	  intrinsics_mat &nbsp;&nbsp;&nbsp; 3 x 3 float

### Example application(s) (as demonstrated in Plum et al. 2023):
* _experimental feature / not used in presented results_

### Output structure:
* target_dir
    * Data_3D_Pose.hdf5 _(pose annotation file)_
    * label_names.txt
    * all generated images
    
### Notes:

* This script is used to automatically generate custom datasets to support object detection as well as 3D and 2D pose estimation, including camera data.

* **WARNING:** In this version, restrict the **Colony size** to a **maximum of 1 individual**!

* The script **excludes empty samples** (*no animals present*) automatically and provides an additional **occlusion vector**, indicating whether a key point is visible (1) or occluded (0).

In [1]:
import cv2
import json
import time
import threading
import queue
import sys
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pathlib

from os import listdir
from os.path import isfile, join

### Required parameters

Specify the location of your **generated dataset** and in which **output directory** you wish to save it.

**Notes:**
* Do not include trailing forward slashes in your paths (see examples below).
* Your **dataset** name should **NOT include underscores** as they are used to separate passes into their categories. Instead, use hyphens in your naming convention where applicable.
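
To illustrate why underscores in the dataset name cause problems, the following minimal sketch mirrors the splitting logic used further down in this notebook. The function name `classify_pass` and the example file names are hypothetical and only serve as an illustration.

```python
def classify_pass(file_name):
    """Assign a file to a pass category based on its underscore-separated parts."""
    parts = file_name.split("_")
    if len(parts) > 1 and parts[1] == "BatchData":
        return "colony"
    if len(parts) == 2:
        # annotation data is stored as json, everything else is treated as an image
        return "data" if parts[1].endswith(".json") else "image"
    if len(parts) > 2:
        pass_name = parts[2].split(".")[0]
        if pass_name in ("ID", "depth", "norm"):
            return pass_name
    return "unknown"


# hypothetical file names following a <dataset>_<iteration>_<pass> pattern
print(classify_pass("input-single_00.png"))     # image
print(classify_pass("input-single_00.json"))    # data
print(classify_pass("input-single_00_ID.png"))  # ID
# an underscore in the dataset name shifts every part by one position
print(classify_pass("input_single_00.png"))     # unknown
```

With a hyphenated dataset name every pass lands in its category, whereas the underscore variant is no longer recognised.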

In [2]:
# define location of dataset and return all files
dataset_location = "../example_data/input-single"
target_dir = "../example_data/3D"

# specify which labels to ignore. By default, all keypoints are written into the dataset
# in this example we omit all keypoints relating to wings. Refer to the base_rig documentation for naming conventions
omit_labels = ['w_1_l', 'w_1_l_end', 'w_2_l', 'w_2_l_end', 'w_1_r', 'w_1_r_end', 'w_2_r', 'w_2_r_end', 'root']

### Optional parameters

In [3]:
# set True to show processing results for each image (disables parallel processing)
DEBUG = False

# we can optionally remove occluded points from the dataframe
EXCLUDE_OCCLUDED_KEYPOINTS = True

In [4]:
all_files = [f for f in listdir(dataset_location) if isfile(join(dataset_location, f))]
all_files.sort()

# next, sort files into images, depth maps, segmentation maps, data, and colony info
# we only need the location and name of the data files, as all passes follow the same naming convention
dataset_data = []
dataset_img = []
dataset_ID = []
dataset_depth = []
dataset_norm = []
dataset_colony = None

for file in all_files:
    loc = dataset_location + "/" + file
    file_info = file.split("_")
    
    if file_info[1] == "BatchData":
        dataset_colony = loc
        
    elif len(file_info) == 2:
        # images are available in various formats, but annotation data is always written as json files
        if file_info[-1].split(".")[-1] == "json":
            dataset_data.append(loc)
        else:
            dataset_img.append(loc)
            
    elif file_info[2].split(".")[0] == "ID":
        dataset_ID.append(loc)
    elif file_info[2].split(".")[0]  == "depth":
        dataset_depth.append(loc)
    elif file_info[2].split(".")[0]  == "norm":
        dataset_norm.append(loc)
        
print("Found",len(dataset_data),"samples...")

# next sort the colony info into its IDs to determine the colony size and individual scales
# Opening colony (BatchData) JSON file
colony_file = open(dataset_colony)
 
# returns JSON object as a dictionary
colony = json.load(colony_file)
colony_file.close()

""" !!! requires IDs, model names, scales !!! """

print("Loaded colony file with seed",colony['Seed'],"and",len(colony['Subject Variations']),"individual(s).")
if len(colony['Subject Variations']) == 1:
    print("Generating single-animal dataset!")
else:
    print("WARNING! Multi-animal datasets are currently NOT supported!")

Found 10 samples...
Loaded colony file with seed 12345 and 1 individual(s).
Generating single-animal dataset!


Now that we have the cleaned colony info, we can start loading the data associated with each frame.
For simplicity, we will produce a list of lists, containing all individuals and their attributes for each frame.

We will therefore access "data" as **[iteration] [individual] [attribute]**, where attributes will include [ID,bbox_x_0,bbox_y_0,...]

The data files additionally contain **camera information**, such as **extrinsics** in the form of **translation** and **rotation** in the global coordinate frame, and camera **intrinsics**. For ease of use, the data files also contain the camera **View Projection Matrix**.
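
As a rough illustration of how the camera intrinsics relate to the exported field of view, the sketch below builds a 3 x 3 pinhole intrinsics matrix from an image resolution and a FOV. It assumes the FOV is given in degrees, the principal point lies at the image centre, and the image is square. The function name `intrinsics_from_fov` and the example values are hypothetical.

```python
import math


def intrinsics_from_fov(width_px, height_px, fov_deg):
    """Build a pinhole intrinsics matrix, assuming a centred principal point."""
    cx = width_px / 2
    cy = height_px / 2
    half_fov = math.radians(fov_deg) / 2
    # focal length in pixels: half the image extent divided by tan(FOV / 2)
    fx = cx / math.tan(half_fov)
    fy = cy / math.tan(half_fov)
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


# for a 90 degree FOV the focal length is roughly half the image width
K = intrinsics_from_fov(512, 512, 90.0)
for row in K:
    print(row)
```

A narrower FOV yields a larger focal length, which is why the focal lengths vary between samples when the FOV is randomised.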

In [5]:
camera_data = []
camera_data_types = []

# loading the first iteration data file to retrieve keys of camera info
exp_file = open(dataset_data[0])
exp_data = json.load(exp_file)
exp_file.close()

# get the types of camera information stored
camera_data_types = list(exp_data["iterationData"]["camera"].keys())

print("The following camera data has been included:\n",camera_data_types)

The following camera data has been included:
 ['FOV', 'Location', 'Rotation', 'View Matrix', 'View Projection Matrix']


For some animals we may not want to use all bones. We therefore retrieve the list of all labels and exclude the omitted ones from the pose data. As all animals use the same naming convention, we can simply read in one example and apply the same selection to all animals.

In [6]:
# loading the first entry of first iteration file to retrieve skeleton info
exp_file = open(dataset_data[0])
exp_data = json.load(exp_file)
exp_file.close()

# for simplicity we'll assume that at this stage all subjects use the same armature and therefore report the same keypoints
first_entry_key = list(exp_data["iterationData"]["subject Data"][0].keys())[0]
labels = list(exp_data["iterationData"]["subject Data"][0][first_entry_key]["keypoints"].keys())

print("\nOmitting labels:", omit_labels)

# removing all occurrences of omitted labels from the labels list to be used as keys below
labels = [x for x in labels if x not in omit_labels]

print("\nFinal labels:",labels)


Omitting labels: ['w_1_l', 'w_1_l_end', 'w_2_l', 'w_2_l_end', 'w_1_r', 'w_1_r_end', 'w_2_r', 'w_2_r_end', 'root']

Final labels: ['an_1_l', 'an_1_r', 'an_2_l', 'an_2_r', 'an_3_l', 'an_3_l_end', 'an_3_r', 'an_3_r_end', 'b_a_1', 'b_a_2', 'b_a_3', 'b_a_4', 'b_a_5', 'b_a_5_end', 'b_h', 'b_t', 'l_1_co_l', 'l_1_co_r', 'l_1_fe_l', 'l_1_fe_r', 'l_1_pt_l', 'l_1_pt_l_end', 'l_1_pt_r', 'l_1_pt_r_end', 'l_1_ta_l', 'l_1_ta_r', 'l_1_ti_l', 'l_1_ti_r', 'l_1_tr_l', 'l_1_tr_r', 'l_2_co_l', 'l_2_co_r', 'l_2_fe_l', 'l_2_fe_r', 'l_2_pt_l', 'l_2_pt_l_end', 'l_2_pt_r', 'l_2_pt_r_end', 'l_2_ta_l', 'l_2_ta_r', 'l_2_ti_l', 'l_2_ti_r', 'l_2_tr_l', 'l_2_tr_r', 'l_3_co_l', 'l_3_co_r', 'l_3_fe_l', 'l_3_fe_r', 'l_3_pt_l', 'l_3_pt_l_end', 'l_3_pt_r', 'l_3_pt_r_end', 'l_3_ta_l', 'l_3_ta_r', 'l_3_ti_l', 'l_3_ti_r', 'l_3_tr_l', 'l_3_tr_r', 'ma_l', 'ma_l_end', 'ma_r', 'ma_r_end']


Now that we have loaded the example annotation data and the batch / colony info, we can start plotting bounding boxes and joint locations, and check whether the camera attributes have been exported correctly.

Let's quickly define a few functions to parse the produced data.

In [7]:
# transform between sRGB and linear colour space (optional)

def to_linear(srgb):
    linear = np.float32(srgb) / 255.0
    less = linear <= 0.04045
    linear[less] = linear[less] / 12.92
    linear[~less] = np.power((linear[~less] + 0.055) / 1.055, 2.4)
    return linear * 255.0

    
def from_linear(linear):
    srgb = linear.copy()
    less = linear <= 0.0031308
    srgb[less] = linear[less] * 12.92
    srgb[~less] = 1.055 * np.power(linear[~less], 1.0 / 2.4) - 0.055
    return srgb * 255.0

def fix_bounding_boxes(coords,max_val = [1024,1024]):
    # fix bounding box coordinates so they do not reach beyond the image
    fixed_coords = []
    for c, coord in enumerate(coords):
        if c == 0 or c == 2:
            max_val_temp = max_val[0]
        else:
            max_val_temp = max_val[1]
            
        if coord >= max_val_temp:
            coord = max_val_temp
        elif coord <= 0:
            coord = 0
        
        fixed_coords.append(int(coord))
        
    return fixed_coords

# and compute the XYZ rotation matrix from roll, pitch, and yaw
def get_rotation_matrix(roll,pitch,yaw,degrees=True):
    # convert to radian
    if degrees:
        roll = np.radians(-roll)
        pitch = np.radians(pitch)
        yaw = np.radians(-yaw)
    # roll rotation 
    Rx = np.array([[1, 0, 0],
                   [0,np.cos(roll),-np.sin(roll)],
                   [0,np.sin(roll),np.cos(roll)]])
    # pitch rotation
    Ry = np.array([[np.cos(pitch),0,np.sin(pitch)],
                   [0, 1, 0],
                   [-np.sin(pitch),0,np.cos(pitch)]])
    # yaw rotation
    Rz = np.array([[np.cos(yaw),-np.sin(yaw),0],
                   [np.sin(yaw),np.cos(yaw),0],
                   [0, 0, 1]])
    #Rxyz = np.round(np.matmul(np.matmul(Rz,Ry),Rx),3)
    Rxyz = Ry @ Rz @ Rx
    return Rxyz

def parse_projection_components(iteration_data_file):
    
    ####### DOUBLE CHECK THESE VALUES GO INTO THE RIGHT MATRIX ELEMENTS ############
    
    # converts Unreal view projection into rotation and translation components
    input_matrix = iteration_data_file["iterationData"]["camera"]["View Matrix"]
    w = input_matrix["wPlane"]
    x = input_matrix["xPlane"]
    y = input_matrix["yPlane"]
    z = input_matrix["zPlane"]
    # now, assign the respective transposed values to the rotation...
    cam_rot = np.array([[x["x"],y["x"],z["x"]],
                        [x["y"],y["y"],z["y"]],
                        [x["z"],y["z"],z["z"]]])
    # and the translation
    """
    cam_trans = np.array([iteration_data_file["iterationData"]["camera"]["Location"]["x"],
                          iteration_data_file["iterationData"]["camera"]["Location"]["y"],
                          iteration_data_file["iterationData"]["camera"]["Location"]["z"]])
    """
    cam_trans = np.array([w["x"],
                          w["y"],
                          w["z"]])
    # There. Tried to do it differently, had a break down, now it works. 
    # Bon appetit
    return cam_rot,cam_trans

def parse_camera_intrinsics(batch_data_file, iteration_data_file):
    # first get the image resolution from the batch data file and the current FOV from the iteration data file
    res_px_X = batch_data_file["Image Resolution"]["x"]
    res_px_Y = batch_data_file["Image Resolution"]["y"]
    FOV = iteration_data_file["iterationData"]["camera"]["FOV"]
    
    # then compute the image centre and focal length in x and y respectively
    # https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html 
    
    cx = res_px_X / 2
    cy = res_px_Y / 2
    
    fx = cx / np.tan(np.radians(FOV)/2)
    fy = cy / np.tan(np.radians(FOV)/2)
    
    return cx, cy, fx, fy

## Generating 3D pose output files
Now comes the difficult part: getting all this data into the required format.

We're going to want an **.h5** formatted file, essentially one dataframe for the entire dataset with the following entries:

*	  file_name &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1 string (relative)
*	  rot_mat	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3 x 3 float
*	  trans_mat	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3 x 1 float
*	  intrinsics_mat &nbsp;&nbsp;&nbsp; 3 x 3 float
*     bounding_box &nbsp;&nbsp;&nbsp; 4 x 1 float
*	  key_points_3D &nbsp;&nbsp;&nbsp;3 x k float (provide name sheet)
*	  key_points_2D &nbsp;&nbsp;&nbsp;2 x k float
*	  visibility &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1 x k int (0 occluded or 1 visible)

To provide visibility info, we will check whether the subject is visible in the respective segmentation map at the given screen X & Y coordinates.
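
The lookup itself can be sketched as follows. This is a simplified illustration that assumes a binary mask in which non-zero pixels belong to the subject. It leaves out the dilation step applied further down, and the function name `keypoint_visibility` as well as the toy mask are hypothetical.

```python
import numpy as np


def keypoint_visibility(mask, points_xy):
    """Return 1 for every point that lands on a non-zero mask pixel, else 0."""
    height, width = mask.shape[:2]
    visibility = []
    for x, y in points_xy:
        col, row = int(x), int(y)
        inside = 0 <= col < width and 0 <= row < height
        # image arrays are indexed as [row, column], i.e. [y, x]
        visibility.append(1 if inside and mask[row, col] > 0 else 0)
    return visibility


# toy mask of height 4 and width 6 with a small filled rectangle
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 255

# one point on the subject, one on the background, one outside the image
points = [(2.4, 1.0), (0.0, 0.0), (7.0, 1.0)]
print(keypoint_visibility(mask, points))  # [1, 0, 0]
```

Note that the point outside the image is reported as occluded rather than raising an indexing error.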

In [8]:
out_df = pd.DataFrame(index=range(len(dataset_data)),columns=["file_name",
                                                      "cam_rot",
                                                      "cam_trans",
                                                      "cam_intrinsics",
                                                      "bounding_box",
                                                      "key_points_3D",
                                                      "key_points_2D",
                                                      "visibility"])

print("Number samples:",len(dataset_data))
print("Colony size:",len(colony['Subject Variations']))
print("body parts:",len(labels)," (including image X & Y, as well as world X Y Z coordinates)\n")
print("Resulting in a dataframe of shape:",out_df.shape)

output_file_names = ["" for i in range(len(dataset_data))]

Number samples: 10
Colony size: 1
body parts: 62  (including image X & Y, as well as world X Y Z coordinates)

Resulting in a dataframe of shape: (10, 8)


With all dataset-related parameters configured, we provide a multi-threaded parsing solution below to minimise the time it takes to bring the entire dataset into the required output format. Currently, we instantiate one processing thread per (virtual) CPU core, but you can adjust this value if you wish by changing:

```
threadList_export = createThreadList(#NumDesiredThreads)
```

**Note:** To see the process of mask generation from ID passes in action, set the **DEBUG** mode to **"True"**. This will however slow down the processing speed considerably and only run in single-threaded mode!
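
For readers who prefer not to manage a queue and lock by hand, the same fan-out pattern can be expressed with Python's standard `concurrent.futures` module. The sketch below is an alternative, not the approach used in this notebook. The function `process_sample` and the sample list are hypothetical placeholders for the per-image work.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_sample(sample):
    """Placeholder for the per-image work; returns the index with a result."""
    index, name = sample
    return index, name.upper()


# hypothetical work items, one per sample
samples = list(enumerate(["a.json", "b.json", "c.json"]))

# one worker per (virtual) CPU core, falling back to a single worker
num_workers = os.cpu_count() or 1

with ThreadPoolExecutor(max_workers=num_workers) as pool:
    # map() returns the results in the order of the inputs
    results = list(pool.map(process_sample, samples))

print(results)
```

Because the results come back in input order, each one can be written to its row of the output dataframe from the main thread once the pool has finished.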

In [9]:
def getThreads():
    """ Returns the number of available (virtual) CPU cores, falling back to 1 if it cannot be determined """
    return os.cpu_count() or 1

class exportThread(threading.Thread):
    def __init__(self, threadID, name, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.q = q

    def run(self):
        print("Starting " + self.name)
        process_detections(self.name, self.q)
        print("Exiting " + self.name)
        
def createThreadList(num_threads):
    threadNames = []
    for t in range(num_threads):
        threadNames.append("Thread_" + str(t))

    return threadNames

def process_detections(threadName, q):
    while not exitFlag_export:
        queueLock.acquire()
        if not workQueue_export.empty():
            
            data_input = q.get()
            i, data_loc, img, ID = data_input
            queueLock.release()
            
            display_img = cv2.imread(img)
            display_img_orig = display_img.copy()
            
            # compute visibility for each individual
            seg_img = cv2.imread(ID)
            seg_img_display = seg_img.copy()
            
            data_file = open(data_loc)
            # returns JSON object as a dictionary
            data = json.load(data_file)
            data_file.close()
            
            img_shape = display_img.shape
            
            # only add images that contain visible individuals
            is_empty = True
            
            img_name = target_dir + "/" + img.split('/')[-1][:-4] + "_synth" + ".png"
            # write the file path to the all_points array
            output_file_names[i] = str(os.path.basename(img))[:-4] + "_synth" + ".png"

            img_info = []
                
            # check if the size of the image and segmentation pass match
            if img_shape != seg_img.shape:
                print("Size mismatch of image and segmentation pass for sample",data_input[1].split("/")[-1],"!")
                incorrectly_formatted_images.append(i)
            else:
                for individual in data["iterationData"]["subject Data"]:
                    ind_key = list(individual.keys())[0]
                    ind_ID = int(ind_key)
                    # WARNING ID numbering begins at 1
                    
                    bbox_orig = [individual[ind_key]["2DBounds"]["xmin"],
                                 individual[ind_key]["2DBounds"]["ymin"],
                                 individual[ind_key]["2DBounds"]["xmax"],
                                 individual[ind_key]["2DBounds"]["ymax"]]
                    
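                    # image arrays are shaped (height, width, channels), so pass the limits as [max x, max y]
                    bbox = fix_bounding_boxes(bbox_orig, max_val=[display_img.shape[1], display_img.shape[0]])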
                    
                    # only process an individual if its bounding box width and height are not zero
                    if bbox[2] - bbox[0] == 0 or bbox[3] - bbox[1] == 0:
                        continue

                    try:
                        ID_mask = cv2.inRange(seg_img[bbox[1]:bbox[3],bbox[0]:bbox[2]], np.array([0, 0, ind_ID - 2]), np.array([0, 0, ind_ID + 2]))
                        indivual_occupancy = cv2.countNonZero(ID_mask)
                    except Exception:
                        if DEBUG: 
                            print("Individual fully occluded:",ind_ID,"in",dataset_ID[i])
                        indivual_occupancy = 1

                    #indivual_occupancy = np.count_nonzero((seg_img == [0, 0, int((individual[0]/len(colony['ID']))*255)]).all(axis = 2)) + np.count_nonzero((seg_img == [0, 0, int((individual[0]/len(colony['ID']))*255 - 1)]).all(axis = 2)) + np.count_nonzero((seg_img == [0, 0, int((individual[0]/len(colony['ID']))*255 + 1)]).all(axis = 2))
                    bbox_area = abs((bbox[2] - bbox[0]) * (bbox[3] - bbox[1])) + 1
                    bbox_occupancy = indivual_occupancy / bbox_area
                    #print("Individual", individual[0], "with bounding box occupancy ",bbox_occupancy)

                    # write all camera attributes to the output file
                    # for details, refer to https://ksimek.github.io/2013/08/13/intrinsic/

                    out_df.loc[i]["cam_rot"],out_df.loc[i]["cam_trans"] = parse_projection_components(data)

                    cx, cy, fx, fy = parse_camera_intrinsics(colony,data)
                    # https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
                    out_df.loc[i]["cam_intrinsics"] = np.array([[fx, 0,  cx],
                                                                [0,  fy, cy],
                                                                [0,  0,  1]])

                    visbility_img = []
                    XY_2D_points = []
                    XYZ_3D_points = []

                   #cv2.putText(display_img, "ID: " + str(int(individual[0])), (bbox[0] + 10,bbox[3] - 10), font, fontScale, fontColor, lineType)
                    if bbox_occupancy > visibility_threshold:
                        # let's binarise the image and dilate it to make sure all visible points are found
                        seg_bin = cv2.inRange(seg_img, np.array([0, 0, 1]), np.array([0,0, 3]))
                        kernel = np.ones((5,5), np.uint8)
                        seg_bin_dilated = cv2.dilate(seg_bin,kernel,iterations = 2)
                        if DEBUG:
                            cv2.imshow("dilated mask",seg_bin_dilated)
                            cv2.waitKey(0)

                        for point in range(len(labels)):
                            # get rid of all invalid points first. Those should simply stay NaN in the array
                            # image arrays are shaped (height, width, channels), so x is checked against the width
                            if (individual[ind_key]["keypoints"][labels[point]]["2DPos"]["x"] >= img_shape[1]
                                    or individual[ind_key]["keypoints"][labels[point]]["2DPos"]["x"] < 0
                                    or individual[ind_key]["keypoints"][labels[point]]["2DPos"]["y"] >= img_shape[0]
                                    or individual[ind_key]["keypoints"][labels[point]]["2DPos"]["y"] < 0):
                                continue

                            if individual[ind_key]["keypoints"][labels[point]]["2DPos"]["x"] < 0.1 or individual[ind_key]["keypoints"][labels[point]]["2DPos"]["y"] < 0.1:
                                # exclude all points that lie outside the image
                                visibility_point = 0
                                XY_2D_points.append([0,0])
                                XYZ_3D_points.append([0,0,0])
                            else:
                                # check if the 2D point is occluded in the segmentation image
                                # thanks opencv, of course this has to be indexed as Y,X... thanks, really.
                                if seg_bin_dilated[int(individual[ind_key]["keypoints"][labels[point]]["2DPos"]["y"]),
                                                   int(individual[ind_key]["keypoints"][labels[point]]["2DPos"]["x"])] == 255:                   
                                    visibility_point = 1
                                    XY_2D_points.append([individual[ind_key]["keypoints"][labels[point]]["2DPos"]["x"],
                                                         individual[ind_key]["keypoints"][labels[point]]["2DPos"]["y"]])
                                    XYZ_3D_points.append([individual[ind_key]["keypoints"][labels[point]]["3DPos"]["x"],
                                                          individual[ind_key]["keypoints"][labels[point]]["3DPos"]["y"],
                                                          individual[ind_key]["keypoints"][labels[point]]["3DPos"]["z"]])

                                    # draw 2D points for visualisation
                                    if DEBUG:
                                        seg_img_display = cv2.circle(seg_img_display, (int(individual[ind_key]["keypoints"][labels[point]]["2DPos"]["x"]),
                                                                                       int(individual[ind_key]["keypoints"][labels[point]]["2DPos"]["y"])), 
                                                                                       radius=0, color=(255, 100, 100), 
                                                                                       thickness=5)
                                else:
                                    visibility_point = 0
                                    XY_2D_points.append([0,0])
                                    XYZ_3D_points.append([0,0,0])

                            visbility_img.append(visibility_point)

                    # only show the masks in DEBUG mode and when a dilated mask has been computed
                    if DEBUG and bbox_occupancy > visibility_threshold:
                        cv2.imshow("Segmentation and points", seg_img_display)
                        cv2.imshow("Segmentation binarised and dilated", seg_bin_dilated)
                        cv2.waitKey(0) 

                    out_df.loc[i]["visibility"] = visbility_img

                # if no entries were found for the respective image, remove it from the output list and don't write out the image
                if all([ v == 0 for v in out_df.loc[i]["visibility"]]):
                    remove_empty_entries.append(i)
                else:    
                    out_df.loc[i]["file_name"] = output_file_names[i]
                    out_df.loc[i]["bounding_box"] = bbox
                    out_df.loc[i]["key_points_2D"] = XY_2D_points
                    out_df.loc[i]["key_points_3D"] = XYZ_3D_points
                    cv2.imwrite(img_name, display_img)

        else:
            queueLock.release()
            
# setup as many threads as there are (virtual) CPU cores
exitFlag_export = 0
# set the following to 1 (instead of getThreads()) to display segmentation maps and visible key points
if DEBUG:
    threadList_export = createThreadList(1)
else:
    threadList_export = createThreadList(getThreads())
print("Using", len(threadList_export), "threads for export...")
queueLock = threading.Lock()

# define paths to all images and set the maximum number of items in the queue equivalent to the number of images
workQueue_export = queue.Queue(len(dataset_img))
threads = []
threadID = 1

np.random.seed(seed=1)

font = cv2.FONT_HERSHEY_SIMPLEX
fontScale = 0.5
lineType = 2

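# remove entries that contain no data
remove_empty_entries = []

# keep track of samples whose image and segmentation pass sizes do not match
incorrectly_formatted_images = []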

# set true if generating dataset for animal without wings
exclude_wings = True

# we can additionally plot the points in the data files to check joint locations
plot_joints = True

# remember to define an export folder when saving out your dataset
generate_dataset = True

# determine the proportion of a bounding box that needs to be filled before considering the visibility as too low
# WARNING: At the moment the ID shown in segmentation maps does not always correspond to the ID in the data file (off by 1)
visibility_threshold = 0.001

timer = time.time()

# Create new threads
for tName in threadList_export:
    thread = exportThread(threadID, tName, workQueue_export)
    thread.start()
    threads.append(thread)
    threadID += 1

# Fill the queue with stacks
queueLock.acquire()
for i, (data, img, ID) in enumerate(zip(dataset_data , dataset_img, dataset_ID)):
    workQueue_export.put([i, data, img, ID])
queueLock.release()

# Wait for queue to empty
while not workQueue_export.empty():
    pass

# Notify threads it's time to exit
exitFlag_export = 1

# Wait for all threads to complete
for t in threads:
    t.join()
print("Exiting Main export Thread")

# close all windows if they were opened
cv2.destroyAllWindows()

print("Total time elapsed:",time.time()-timer,"seconds")

Using 12 threads for export...
Starting Thread_0
Starting Thread_1
Starting Thread_2
Starting Thread_3
Starting Thread_4
Starting Thread_5
Starting Thread_6
Starting Thread_7
Starting Thread_8Starting Thread_9
Starting Thread_10

Starting Thread_11
Exiting Thread_8
Exiting Thread_6
Exiting Thread_1
Exiting Thread_10
Exiting Thread_3
Exiting Thread_9
Exiting Thread_2
Exiting Thread_7
Exiting Thread_4
Exiting Thread_0
Exiting Thread_5
Exiting Thread_11
Exiting Main export Thread
Total time elapsed: 0.3549225330352783 seconds


Now, remove empty entries and dump it all into one **.h5** file.

In [10]:
# remove empty entries when training with a single animal
out_df.drop(out_df.index[remove_empty_entries], inplace=True)
# reset the indices of the updated dataframe
out_df.reset_index(drop=True, inplace=True)
out_df.to_hdf(os.path.join(target_dir, "Data_3D_Pose.hdf5"), key="df_with_missing", mode="w")

out_df

your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block0_values] [items->Index(['file_name', 'cam_rot', 'cam_trans', 'cam_intrinsics', 'bounding_box',
       'key_points_3D', 'key_points_2D', 'visibility'],
      dtype='object')]

  pytables.to_hdf(


Unnamed: 0,file_name,cam_rot,cam_trans,cam_intrinsics,bounding_box,key_points_3D,key_points_2D,visibility
0,input-single_00_synth.png,"[[0.9631354338199376, 0.15470715323881307, -0....","[21.713797499947457, -28.07079918287375, 159.2...","[[2076.99246791101, 0.0, 256.0], [0.0, 2076.99...","[0, 0, 506, 338]","[[-18.240633941729538, -13.81867652400733, 57....","[[64.03198535903246, 223.98150473832203], [57....","[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, ..."
1,input-single_01_synth.png,"[[-0.7373834923482336, -0.6075230715062317, 0....","[-23.256818275771145, 19.4930820681534, 179.98...","[[2074.7401265614826, 0.0, 256.0], [0.0, 2074....","[0, 272, 475, 481]","[[29.805996367260214, -33.63352241920913, 66.5...","[[156.6185300699741, 456.86018870096837], [178...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..."
2,input-single_02_synth.png,"[[-0.17515340765031187, -0.9840537391887164, 0...","[47.68707428150692, 5.235968654795805, 147.746...","[[2530.4281404216035, 0.0, 256.0], [0.0, 2530....","[50, 10, 512, 512]","[[14.431245483962773, 42.98263987719931, 38.09...","[[348.8072653973518, 454.9527574488973], [318....","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..."
3,input-single_03_synth.png,"[[-0.874734111833003, -0.46874041455975274, -0...","[74.06831784097388, -5.655559315019467, 179.73...","[[1572.0231376795268, 0.0, 256.0], [0.0, 1572....","[135, 74, 349, 417]","[[55.79065185601214, 37.78110726883063, 43.364...","[[279.84630145688413, 180.06852916920457], [29...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..."
4,input-single_04_synth.png,"[[0.6831799145733118, -0.35844447251935113, 0....","[-54.576999104327584, -26.34614998240263, 108....","[[1803.226557676136, 0.0, 256.0], [0.0, 1803.2...","[255, 74, 474, 512]","[[28.276725879940223, -27.135526864062484, 54....","[[423.67694975324616, 424.76737868267605], [42...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..."
5,input-single_05_synth.png,"[[0.4092028380091051, 0.9077464515065712, 0.09...","[-24.321859411475153, 35.60502790503329, 162.3...","[[2457.9755309073125, 0.0, 256.0], [0.0, 2457....","[0, 0, 217, 512]","[[-10.641655341475062, 15.1320498533644, 43.55...","[[44.29380093618792, 111.34142741787582], [51....","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..."
6,input-single_06_synth.png,"[[0.45234815948434015, 0.8890876645994118, 0.0...","[-37.155947631112035, 35.13992248160784, 129.5...","[[2011.545874946529, 0.0, 256.0], [0.0, 2011.5...","[106, 0, 463, 512]","[[-7.264965961438021, 47.304771244793926, 49.3...","[[364.1178467378925, 385.8169464690583], [345....","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..."
7,input-single_07_synth.png,"[[0.7034439525086023, 0.3931730858417975, 0.59...","[-12.800108120644502, 57.40484597773351, 137.7...","[[1363.9622903611182, 0.0, 256.0], [0.0, 1363....","[147, 114, 433, 367]","[[-33.37689914596733, 20.629531019154175, 43.6...","[[226.73826459320938, 159.99692492236375], [24...","[1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..."
8,input-single_08_synth.png,"[[0.2744059579495233, 0.9284236858925052, -0.2...","[15.99272048867316, 17.92685165164932, 168.953...","[[1677.359418337137, 0.0, 256.0], [0.0, 1677.3...","[212, 0, 448, 443]","[[-23.081692416354947, 11.970014185179561, 37....","[[403.63664284781987, 321.2175024977605], [393...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..."
9,input-single_09_synth.png,"[[0.7191430191077919, -0.5374472990780087, 0.4...","[-10.559842344564345, -46.70254969215734, 145....","[[2128.1362733094925, 0.0, 256.0], [0.0, 2128....","[211, 0, 512, 395]","[[-19.671893359689296, -24.11368244222806, 46....","[[417.2879557427745, 283.1024716449414], [408....","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..."


In [11]:
# and finally, dump all the labels into a lookup table
with open(os.path.join(target_dir,'label_names.txt'), 'w') as f:
    # write one label per line
    for label in labels:
        f.write(label.split(".")[0]+"\n")

print("All done!")

All done!


### Now that the dataset is generated, let's run a few checks to see if the data contains what we need

In [12]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def set_axes_equal(ax):
    '''Make axes of 3D plot have equal scale so that spheres appear as spheres,
    cubes as cubes, etc..  This is one possible solution to Matplotlib's
    ax.set_aspect('equal') and ax.axis('equal') not working for 3D.
    
    Function based on https://stackoverflow.com/questions/13685386/
    matplotlib-equal-unit-length-with-equal-aspect-ratio-z-axis-is-not-equal-to

    Input
      ax: a matplotlib axis, e.g., as output from plt.gca().
    '''

    x_limits = ax.get_xlim3d()
    y_limits = ax.get_ylim3d()
    z_limits = ax.get_zlim3d()

    x_range = abs(x_limits[1] - x_limits[0])
    x_middle = np.mean(x_limits)
    y_range = abs(y_limits[1] - y_limits[0])
    y_middle = np.mean(y_limits)
    z_range = abs(z_limits[1] - z_limits[0])
    z_middle = np.mean(z_limits)

    # The plot bounding box is a sphere in the sense of the infinity
    # norm, hence I call half the max range the plot radius.
    plot_radius = 0.5*max([x_range, y_range, z_range])

    ax.set_xlim3d([x_middle - plot_radius, x_middle + plot_radius])
    ax.set_ylim3d([y_middle - plot_radius, y_middle + plot_radius])
    ax.set_zlim3d([z_middle - plot_radius, z_middle + plot_radius])

We'll start off by creating a 3D scatter plot of an example sample. 

In [13]:
%matplotlib qt
# to open plot externally and use it interactively
show_entry = 0
cam_entry = 0

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

display_points_3D = out_df.loc[show_entry]["key_points_3D"]
display_img = cv2.imread(dataset_img[show_entry])

for i,xyz in enumerate(display_points_3D):
    if out_df.loc[show_entry]["visibility"][i] == 1:
        ax.scatter(xyz[0], xyz[1], xyz[2], marker='o',s=10)

"""
# also plot the camera location
ax.scatter(out_df.loc[cam_entry]["cam_trans"][0], 
           out_df.loc[cam_entry]["cam_trans"][1], 
           out_df.loc[cam_entry]["cam_trans"][2], marker='x')
"""

ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')

# use custom function to ensure equal axis proportions
set_axes_equal(ax)

# opens external plot
plt.title(out_df.loc[show_entry]["file_name"])
plt.show()

And finally show that the provided camera intrinsics and extrinsics allow us to project the 
subject 3D coordinates in world space into the equivalent screen 2D coordinates in pixel space.
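
The underlying pinhole projection can be sketched in isolation as follows. The example uses a made-up camera (identity rotation, a translation of 5 units along Z, and a focal length of 100 pixels) rather than values from the dataset, and the function name `project_points` is hypothetical.

```python
import numpy as np


def project_points(K, R, t, points_3d):
    """Project n world-space points (n x 3) into pixel coordinates (n x 2)."""
    points_3d = np.asarray(points_3d, dtype=float).T   # 3 x n
    P = K @ np.hstack([R, t.reshape(3, 1)])            # 3 x 4 projection matrix
    homogeneous = np.vstack([points_3d, np.ones(points_3d.shape[1])])
    projected = P @ homogeneous                        # 3 x n
    # divide by the third row to return to 2D pixel space
    return (projected[:2] / projected[2]).T


# made-up camera: 100 px focal length, principal point at (50, 50)
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

print(project_points(K, R, t, [[1.0, 2.0, 5.0]]))  # [[60. 70.]]
```

The point ends up 10 units in front of the camera, so its offsets of 1 and 2 units are scaled by 100 / 10 and shifted by the principal point.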

In [15]:
#%matplotlib qt
# file_name	cam_rot	cam_trans	cam_intrinsics	bounding_box	key_points_3D	key_points_2D	visibility
R = np.array(out_df.loc[cam_entry]["cam_rot"])
T = np.reshape(np.array(out_df.loc[cam_entry]["cam_trans"]),(3,1))
C = np.array(out_df.loc[cam_entry]["cam_intrinsics"])

#plt.plot(X_2d[0, :], X_2d[1, :], '.')  # plot the locations of the 3D keypoints in 2D as viewed from the camera
#plt.show()

fig = plt.figure()
ax = fig.add_subplot()

for i, x in enumerate(display_points_3D):
    X = np.reshape(np.array(out_df.loc[show_entry]["key_points_3D"][i]),(3,-1))

    # given the above data, it should be possible to project the 3D points into the corresponding image,
    # so they land in the correct position on the image 
    P = C @ np.hstack([R, T])  # projection matrix
    X_hom = np.vstack([X, np.ones(X.shape[1])])  # 3D points in homogeneous coordinates

    X_hom = P @ X_hom  # project the 3D points
    
    X_2d = X_hom[:2, :] / X_hom[2, :]  # convert them back to 2D pixel space
    
    gt_x_2d = out_df.loc[show_entry]["key_points_2D"][i][0]
    gt_y_2d = out_df.loc[show_entry]["key_points_2D"][i][1]
    
    ax.scatter(gt_x_2d, gt_y_2d, marker='o', s=20)
    # flip the projected Y coordinate using the image height (shape[0])
    ax.scatter(X_2d[0], display_img.shape[0]-X_2d[1], marker='^', s=8)

ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')

# image arrays are shaped (height, width, channels)
ax.set_xlim([0,display_img.shape[1]])
ax.set_ylim([0,display_img.shape[0]])
ax.set_aspect('equal')

ax.invert_yaxis()

# opens external plot
plt.title(str(out_df.loc[show_entry]["file_name"]) + " projected")
plt.show()