<img src="https://10botics.com/logo_jnb.png" width="300"/>

# Understanding the data

## The data directory

In donkeycar, the data you collect is stored in a folder called `data` under `mycar`.

Use the file explorer on the left to open `mycar`, then execute the following cell.

In [None]:
%cd ~/mycar
!ls

Use the following command to change directory to `data` and list all the files and directories under it.

In [None]:
%cd ~/mycar/data
!ls

Each time you collect data, it is stored in a directory called a `Tub`. The directory name starts with `tub`, followed by a sequential number and then the date the data was collected (in YY-MM-DD format).
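As a rough sketch of how such a name can be taken apart, the cell below splits a tub directory name into its sequence number and collection date. It assumes the parts are joined with underscores (for example `tub_3_24-01-15`, a made-up name), and `parse_tub_name` is a hypothetical helper, not part of donkeycar. Adjust the pattern if your folder names look different.

```python
import re
from datetime import datetime

# Assumed layout: tub_<number>_<YY-MM-DD>; the underscores are an assumption
TUB_NAME = re.compile(r"^tub_(\d+)_(\d{2}-\d{2}-\d{2})$")

def parse_tub_name(name):
    """Return (sequence_number, collection_date), or None if the name does not match."""
    match = TUB_NAME.match(name)
    if match is None:
        return None
    number = int(match.group(1))
    collected = datetime.strptime(match.group(2), "%y-%m-%d").date()
    return number, collected

print(parse_tub_name("tub_3_24-01-15"))  # hypothetical tub name
```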

## Inside a Tub

Let's count how many tubs we have. We will import the `listdir` function from Python's `os` library to list everything inside the data folder.

In [None]:
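from os import listdir
from os.path import isdir, join

data_path = "/home/pi/mycar/data"

# Keep only directories, sorted by name so the order is predictable
tubs = sorted(d for d in listdir(data_path) if isdir(join(data_path, d)))

len(tubs)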

Let's use the first tub found in the previous cell.

In [None]:
tub = tubs[0]
print(f"Tub name = {tub}")

Let's check what's in this tub. Note that we use `$tub_path` below: in a notebook, the `$` prefix inserts the value of a Python variable into a magic or shell command.

In [None]:
tub_path = f"{data_path}/{tub}"

%cd $tub_path
!ls

Inside a tub, you will find the following items:

1. A folder called `images`
2. Catalog files
3. `manifest.json`
4. `meta.json`
5. A histogram file whose name ends with `_hist.png`

### The `images` folder



Let's `ls` the images folder and see the first 3 files in it (sorted by filename).

The Linux command `head -3` displays only the first 3 lines of the listing. You can change the number to display more lines.

In [None]:
%%bash
ls images | head -3
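
Note that `ls` sorts names as text, so `10_cam_image_array_.jpg` is listed before `2_cam_image_array_.jpg`. Here is a minimal sketch of listing the images in recording order, assuming every filename starts with its frame number as in the listing above. `list_images_in_order` is a hypothetical helper, not part of donkeycar, and the demo uses a temporary folder with empty placeholder files.

```python
import os
import tempfile

def frame_number(filename):
    """Read the leading integer from a name such as '12_cam_image_array_.jpg'."""
    return int(filename.split("_", 1)[0])

def list_images_in_order(images_dir):
    """Return the .jpg filenames in images_dir sorted by frame number."""
    names = [n for n in os.listdir(images_dir) if n.endswith(".jpg")]
    return sorted(names, key=frame_number)

# Demo on a temporary folder with empty placeholder files
with tempfile.TemporaryDirectory() as demo_dir:
    for i in (10, 2, 0, 1):
        open(os.path.join(demo_dir, f"{i}_cam_image_array_.jpg"), "w").close()
    ordered = list_images_in_order(demo_dir)
    print(ordered)
```

On the car, you could call `list_images_in_order(f"{tub_path}/images")` instead of using the temporary folder.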

We can use a library called `PIL` and its `Image` module to open one of the image files.

Feel free to change the filename to see different images inside the tub.

In [None]:
from PIL import Image
from IPython.display import display

# Open the first image in the tub and show it in the notebook
img = Image.open(f"{tub_path}/images/0_cam_image_array_.jpg")
display(img)

We can use the `size` attribute to check the dimensions of the image, given as (width, height). It should be 160 x 120 pixels.

In [None]:
# check the size of the image
img.size

We can write a `for` loop to display several images in a row.

In [None]:
for i in range(0,3):
    img = Image.open(f"{tub_path}/images/{i}_cam_image_array_.jpg")
    display(img)

In [None]:

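# Display function
# ================
def play_tub(tub_name, n_frames=300):
    """Play back the first n_frames images of a tub like a video."""
    import cv2
    from IPython.display import display
    from PIL import Image

    images_dir = f"{data_path}/{tub_name}/images"
    display_handle = display(None, display_id=True)
    for i in range(n_frames):
        image_path = f"{images_dir}/{i}_cam_image_array_.jpg"

        # cv2.imread returns None if the file is missing or unreadable
        image_bgr = cv2.imread(image_path)
        if image_bgr is None:
            continue

        # Convert the image from BGR (OpenCV's order) to RGB (PIL's order)
        image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

        image_pil = Image.fromarray(image_rgb)
        display_handle.update(image_pil)

play_tub(tub)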

### Catalog

A catalog file is a list of lines. Each line stores the filename of one picture taken during recording, together with the time, angle and throttle at that moment. By default, each catalog contains at most 1000 lines. When a catalog file reaches 1000 lines, another catalog file is created automatically.

In [None]:
!head -3 catalog_0.catalog
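
If each catalog is filled to exactly 1000 records, and both records and catalog files are numbered from 0 (assumptions based on the description above and on the name `catalog_0.catalog`), the file that holds a given record can be worked out with integer division. `catalog_for_index` is a hypothetical helper for illustration only.

```python
def catalog_for_index(index, max_len=1000):
    """Return (catalog filename, line position inside that file) for a record index."""
    catalog_number, position = divmod(index, max_len)
    return f"catalog_{catalog_number}.catalog", position

for index in (0, 999, 1000, 2345):
    print(index, catalog_for_index(index))
```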

What is the format of the catalog file used to store data?

It is called JSON Lines. Like CSV, it stores one record per line, but each line is a complete JSON object that can be parsed on its own.

For more details, check https://jsonlines.org/
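
Here is a minimal sketch of reading JSON Lines with Python's built-in `json` module. The two sample records and their field names are made up for illustration. Compare them with the output of the `head` command above to see the real fields in your catalog.

```python
import json

# Two made-up records in JSON Lines form: one JSON object per line
sample = (
    '{"index": 0, "image": "0_cam_image_array_.jpg", "angle": 0.0, "throttle": 0.25}\n'
    '{"index": 1, "image": "1_cam_image_array_.jpg", "angle": -0.5, "throttle": 0.5}\n'
)

def parse_json_lines(text):
    """Parse each non-empty line as its own JSON object."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

records = parse_json_lines(sample)
for record in records:
    print(record["image"], record["angle"], record["throttle"])
```

To read a real catalog, open the file and pass its contents to `parse_json_lines`, for example `parse_json_lines(open("catalog_0.catalog").read())`.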

### Manifest.json

The `manifest.json` file records how many catalogs are present in this tub. It also keeps track of the current index, so that recording resumes from this index if it continues later.

Note the field called `deleted_indexes`, which records which images have been deleted from this tub. We call this a soft delete because the images are only marked as deleted, not physically removed.

In [None]:
!head -10 manifest.json
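
Here is a sketch of what soft delete means in practice: the records stay on disk, and a reader skips any record whose index is listed in `deleted_indexes`. The records and the `index` key are made up for illustration, and this is not donkeycar's own implementation.

```python
def active_records(records, deleted_indexes):
    """Yield only the records whose index has not been marked as deleted."""
    deleted = set(deleted_indexes)  # set lookup is fast for long lists
    for record in records:
        if record["index"] not in deleted:
            yield record

# Made-up data: five records, two of them soft-deleted
records = [{"index": i, "image": f"{i}_cam_image_array_.jpg"} for i in range(5)]
deleted_indexes = [1, 3]

kept = list(active_records(records, deleted_indexes))
print([r["index"] for r in kept])
```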

### Meta.json

In [None]:
!head -10 meta.json

This file contains the following information:

- Size: The total size of this tub, in MB.
- uuid: The unique ID of this tub. A tub has no UUID until it is uploaded for remote training. If you delete this field, the tub will be uploaded to our server again the next time you use it for training. This can be useful if you suspect that the copy on our server is corrupted.

<hr/>

## Congratulations! You have finished this chapter.

This jupyter notebook is created by 10Botics. <br>
For permission to use in school, please contact info@10botics.com <br>
All rights reserved. 2024.