# Explore the dataset


In this notebook, we will perform an exploratory data analysis (EDA) on the processed Waymo dataset (data in the `processed` folder). In the first part, you will create a function to display an image and its bounding boxes.

In [2]:
from utils import get_dataset

In [3]:
dataset = get_dataset("/data/waymo/*.tfrecord")

INFO:tensorflow:Reading unweighted datasets: ['/data/waymo/*.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['/data/waymo/*.tfrecord']
INFO:tensorflow:Number of filenames to read: 103


## Write a function to display an image and the bounding boxes

Implement the `display_instances` function below. This function takes a batch as input and displays an image with its corresponding bounding boxes. The only requirement is that the classes should be color coded (e.g., vehicles in red, pedestrians in blue, cyclists in green).
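The core of this function is turning each groundtruth box into a matplotlib `Rectangle`. A minimal sketch, assuming `groundtruth_boxes` holds normalized `[ymin, xmin, ymax, xmax]` coordinates (the usual TensorFlow Object Detection API convention) and that class ids follow the Waymo labels 1 = vehicle, 2 = pedestrian, 4 = cyclist. The helper names `normalized_box_to_rect` and `box_color` are hypothetical, chosen for illustration:

```python
# Assumed Waymo class ids: 1 = vehicle, 2 = pedestrian, 4 = cyclist
CLASS_COLORS = {1: "red", 2: "green", 4: "blue"}


def normalized_box_to_rect(box, image_height, image_width):
    """Convert a normalized [ymin, xmin, ymax, xmax] box into the
    ((x, y), width, height) arguments of matplotlib's Rectangle, in pixels."""
    ymin, xmin, ymax, xmax = (float(v) for v in box)
    # x scales with the image width, y with the image height
    x, y = xmin * image_width, ymin * image_height
    return (x, y), (xmax - xmin) * image_width, (ymax - ymin) * image_height


def box_color(class_id, default="yellow"):
    """Look up the edge color for a class id, falling back to a default
    so that unexpected classes do not raise a KeyError."""
    return CLASS_COLORS.get(int(class_id), default)


# A box covering the central quarter of a 640 x 480 (width x height) image
xy, w, h = normalized_box_to_rect([0.25, 0.25, 0.75, 0.75],
                                  image_height=480, image_width=640)
print(xy, w, h)  # (160.0, 120.0) 320.0 240.0
```

The returned tuple can be unpacked straight into `Rectangle(xy, w, h, facecolor='none', edgecolor=box_color(cl))`. If the boxes turn out not to be normalized, the scaling step is the part to revisit.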

In [6]:
import numpy as np
import matplotlib
matplotlib.use('TkAgg')  # In this environment, images were not displayed without this backend
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def display_instances(batch, num_images=10):
    """
    This function takes a batch from the dataset and displays each image
    with its associated bounding boxes, color coded by class.
    """
    # ADD CODE HERE
    # Color mapping of Waymo classes: 1 = vehicle (red), 2 = pedestrian (green), 4 = cyclist (blue)
    colormap = {1: [1, 0, 0], 2: [0, 1, 0], 4: [0, 0, 1]}

    small_batch = batch.take(num_images)

    for sample in small_batch:
        image = sample['image'].numpy()
        print(sample['filename'])

        # Image tensors are laid out as (height, width, channels)
        height, width, _ = image.shape

        # Create figure and axes
        fig, ax = plt.subplots()
        ax.imshow(image)

        # Get groundtruth boxes and classes
        bboxes = sample['groundtruth_boxes'].numpy()
        classes = sample['groundtruth_classes'].numpy()

        # Create a rectangular patch for each box
        for cl, bb in zip(classes, bboxes):
            # Boxes are assumed to be normalized [ymin, xmin, ymax, xmax],
            # so scale them by the image size to get pixel coordinates
            y1, x1, y2, x2 = bb
            y1, y2 = y1 * height, y2 * height
            x1, x2 = x1 * width, x2 * width
            # Create and add the rectangle; unknown classes are drawn in yellow
            rec = Rectangle((x1, y1), x2 - x1, y2 - y1, facecolor='none',
                            edgecolor=colormap.get(int(cl), [1, 1, 0]))
            ax.add_patch(rec)

        plt.show()

## Display 10 images 

Using the dataset created in the second cell and the function you just coded, display 10 random images with the associated bounding boxes. You can use the methods `take` and `shuffle` on the dataset.
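Note that `shuffle(buffer_size)` in `tf.data` does not permute the whole dataset: it keeps a buffer of `buffer_size` elements, emits a random one, and refills the freed slot with the next input element. With a small buffer, the first outputs can only come from the first few records, which may be correlated frames. The pure-Python sketch below mimics that behavior for illustration only; it is not TensorFlow's implementation, and `buffered_shuffle` is a hypothetical name:

```python
import random


def buffered_shuffle(elements, buffer_size, seed=None):
    """Sketch of buffered shuffling in the spirit of tf.data.Dataset.shuffle:
    hold `buffer_size` elements, emit a random one, refill from the input."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    it = iter(elements)
    buffer = []
    # Fill the buffer with the first `buffer_size` elements
    for item in it:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    # Emit a random buffered element, then replace it with the next input
    for item in it:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # Drain what is left once the input is exhausted
    rng.shuffle(buffer)
    yield from buffer


# With a 10-element buffer, output k is drawn from inputs 0..9+k,
# so the first ten outputs can never exceed 18.
first_ten = list(buffered_shuffle(range(1000), buffer_size=10, seed=0))[:10]
print(max(first_ten) <= 18)  # True
```

In practice this means a call such as `dataset.shuffle(100).take(10)` mixes images only within a window of roughly 100 records; a larger buffer gives a more representative sample at the cost of memory.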

In [7]:
## STUDENT SOLUTION HERE
# Shuffle with a 100-element buffer, then keep 10 images
display_instances(dataset.shuffle(100).take(10))

tf.Tensor(b'segment-12012663867578114640_820_000_840_000_with_camera_labels_34.tfrecord', shape=(), dtype=string)
tf.Tensor(b'segment-10226164909075980558_180_000_200_000_with_camera_labels_81.tfrecord', shape=(), dtype=string)
tf.Tensor(b'segment-11236550977973464715_3620_000_3640_000_with_camera_labels_14.tfrecord', shape=(), dtype=string)
tf.Tensor(b'segment-10075870402459732738_1060_000_1080_000_with_camera_labels_38.tfrecord', shape=(), dtype=string)
tf.Tensor(b'segment-1022527355599519580_4866_960_4886_960_with_camera_labels_92.tfrecord', shape=(), dtype=string)
tf.Tensor(b'segment-11343624116265195592_5910_530_5930_530_with_camera_labels_9.tfrecord', shape=(), dtype=string)
tf.Tensor(b'segment-11126313430116606120_1439_990_1459_990_with_camera_labels_5.tfrecord', shape=(), dtype=string)
tf.Tensor(b'segment-10975280749486260148_940_000_960_000_with_camera_labels_41.tfrecord', shape=(), dtype=string)
tf.Tensor(b'segment-10793018113277660068_2714_540_2734_540_with_camera_labels_7.t

## Additional EDA

In this last part, you are free to perform any additional analysis of the dataset. What else would you like to know about the data?
For example, think about data distribution. So far, you have only looked at a single file...
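One way to look at the data distribution is to count annotated objects per class and per image over many samples instead of a single file. A minimal sketch, assuming each sample's `groundtruth_classes` can be converted to a sequence of integer class ids and that ids follow the Waymo labels used in the colormap above; `summarize_classes` and `class_fractions` are hypothetical helper names:

```python
from collections import Counter

# Assumed Waymo class ids, matching the colormap used above
CLASS_NAMES = {1: "vehicle", 2: "pedestrian", 4: "cyclist"}


def summarize_classes(per_image_classes):
    """Count objects per class and objects per image.

    `per_image_classes` is an iterable with one sequence of class ids per
    image, e.g. sample['groundtruth_classes'].numpy() for each sample."""
    class_counts = Counter()
    objects_per_image = []
    for classes in per_image_classes:
        ids = [int(c) for c in classes]
        # Unknown ids are kept under a generic name instead of being dropped
        class_counts.update(CLASS_NAMES.get(c, f"class_{c}") for c in ids)
        objects_per_image.append(len(ids))
    return class_counts, objects_per_image


def class_fractions(class_counts):
    """Convert raw counts into fractions of all annotated objects."""
    total = sum(class_counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in class_counts.items()}


# Toy input standing in for three images' groundtruth_classes
toy = [[1, 1, 2], [1], [1, 4, 2, 2]]
counts, per_image = summarize_classes(toy)
print(counts)     # Counter({'vehicle': 4, 'pedestrian': 3, 'cyclist': 1})
print(per_image)  # [3, 1, 4]
```

On the real data, something like `summarize_classes(s['groundtruth_classes'].numpy() for s in dataset.take(1000))` would cover 1000 samples; bounding the iteration with `take` is a precaution in case `get_dataset` repeats the data. The per-image counts can then be passed to `plt.hist` to see how crowded typical scenes are.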

In [9]:
import numpy as np
import matplotlib
matplotlib.use('TkAgg')  # In this environment, images were not displayed without this backend
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def display_instances_mosaic(batch, n_rows=4, n_cols=5):
    """
    This function takes a batch from the dataset and displays several images,
    each with its associated bounding boxes, in an n_rows x n_cols grid.
    """
    # ADD CODE HERE
    # Color mapping of Waymo classes: 1 = vehicle (red), 2 = pedestrian (green), 4 = cyclist (blue)
    colormap = {1: [1, 0, 0], 2: [0, 1, 0], 4: [0, 0, 1]}

    small_batch = batch.take(n_rows * n_cols)
    # Create figure and axes (20 images by default); squeeze=False keeps ax 2-D
    fig, ax = plt.subplots(n_rows, n_cols, figsize=(20, 10), squeeze=False)
    # Hide all axes first so that unused grid cells stay blank
    for axis in ax.flat:
        axis.axis('off')

    for i_image, sample in enumerate(small_batch):
        image = sample['image'].numpy()
        # Image tensors are laid out as (height, width, channels)
        height, width, _ = image.shape
        # Fill the grid row by row
        row, col = divmod(i_image, n_cols)
        ax[row, col].imshow(image)

        # Get groundtruth boxes and classes
        bboxes = sample['groundtruth_boxes'].numpy()
        classes = sample['groundtruth_classes'].numpy()

        # Create a rectangular patch for each box
        for cl, bb in zip(classes, bboxes):
            # Boxes are assumed to be normalized [ymin, xmin, ymax, xmax],
            # so scale them by the image size to get pixel coordinates
            y1, x1, y2, x2 = bb
            y1, y2 = y1 * height, y2 * height
            x1, x2 = x1 * width, x2 * width
            # Create and add the rectangle; unknown classes are drawn in yellow
            rec = Rectangle((x1, y1), x2 - x1, y2 - y1, facecolor='none',
                            edgecolor=colormap.get(int(cl), [1, 1, 0]))
            ax[row, col].add_patch(rec)

    plt.tight_layout()
    plt.show()

In [15]:
display_instances_mosaic(dataset)