# GTSRB Dataset Conversion Script

- Dataset: [GTSRB Dataset](https://benchmark.ini.rub.de/gtsrb_dataset.html)
- The original dataset contains images in `.ppm` format, but __Keras__ supports images in `.jpeg`, `.jpg`, `.png`, `.bmp`, and `.gif` formats.
- So we convert every image to `.png` format and update the image filenames in the `.csv` files to match.
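The renaming rule described above can be sketched as a small helper. The name `png_name` is hypothetical and introduced only for illustration; it replaces just the final extension using `os.path.splitext`.

```python
import os

def png_name(path):
    """Return `path` with its final extension replaced by `.png`."""
    root, _ext = os.path.splitext(path)
    return root + ".png"

print(png_name("data/GTSRB/Final_Training/Images/00000/00000_00000.ppm"))
```

Note that `str.strip(".ppm")` is not equivalent: it removes any of the characters `.`, `p`, and `m` from both ends of the string, so a name such as `map.ppm` would be reduced to `a`.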

## Train
---

#### Download `GTSRB_Final_Training_Images` Dataset and Unzip

In [1]:
!wget https://sid.erda.dk/public/archives/daaeac0d7ce1152aea9b61d9f1e19370/GTSRB_Final_Training_Images.zip
!unzip -q GTSRB_Final_Training_Images.zip -d data/

--2024-03-13 14:18:20--  https://sid.erda.dk/public/archives/daaeac0d7ce1152aea9b61d9f1e19370/GTSRB_Final_Training_Images.zip
Resolving sid.erda.dk (sid.erda.dk)... 130.225.104.13
Connecting to sid.erda.dk (sid.erda.dk)|130.225.104.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 276294756 (263M) [application/zip]
Saving to: ‘GTSRB_Final_Training_Images.zip’


2024-03-13 14:24:48 (699 KB/s) - ‘GTSRB_Final_Training_Images.zip’ saved [276294756/276294756]



#### Change images to `png`

In [2]:
from PIL import Image
import glob
import os
from tqdm import tqdm

for img in tqdm(glob.glob("data/GTSRB/Final_Training/Images/*/*.ppm")):
    im = Image.open(img)
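    # splitext drops only the final extension; str.strip(".ppm") strips the characters ".", "p", "m" from both ends
    fname = os.path.splitext(img)[0] + ".png"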
    im.save(fname)

100%|████████████████████████████████████| 39209/39209 [01:12<00:00, 541.87it/s]


#### Remove .ppm files

In [3]:
for img in tqdm(glob.glob("data/GTSRB/Final_Training/Images/*/*.ppm")):
    os.remove(img)

100%|██████████████████████████████████| 39209/39209 [00:01<00:00, 24306.59it/s]
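Before moving on, it may be worth checking that no `.ppm` files remain and that the `.png` count matches the 39209 files reported by the progress bar above. The following is a minimal sketch; `count_extensions` is a hypothetical helper name, not part of the original notebook.

```python
from collections import Counter
from pathlib import Path

def count_extensions(root):
    """Count files under `root` (recursively), grouped by lowercase extension."""
    root = Path(root)
    if not root.is_dir():
        # Nothing to count if the folder has not been created yet.
        return Counter()
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())

counts = count_extensions("data/GTSRB/Final_Training/Images")
print(counts)  # after conversion and cleanup, ".ppm" should be absent
```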


#### Rename filenames in csv file

Using Stream EDitor (sed)
- https://www.cyberciti.biz/faq/how-to-use-sed-to-find-and-replace-text-in-files-in-linux-unix-shell/

In [6]:
!sed -i 's/\.ppm/.png/g' data/GTSRB/Final_Training/Images/*/*.csv && echo Done

Done
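The same substitution can be done in Python, which may be convenient where `sed -i` behaves differently (GNU and BSD `sed` disagree on the `-i` argument). This is a sketch under that assumption; `rename_ppm_in_csv` is a hypothetical helper name.

```python
import glob
import re

def rename_ppm_in_csv(csv_path):
    """Replace every '.ppm' with '.png' in the file, in place.

    Returns the number of replacements made.
    """
    # newline="" keeps the file's original line endings untouched.
    with open(csv_path, "r", newline="") as f:
        text = f.read()
    new_text, count = re.subn(r"\.ppm", ".png", text)
    with open(csv_path, "w", newline="") as f:
        f.write(new_text)
    return count

for csv_path in glob.glob("data/GTSRB/Final_Training/Images/*/*.csv"):
    rename_ppm_in_csv(csv_path)
```

Returning the replacement count makes it easy to spot a CSV file that was skipped or already converted (the count would be 0).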


## Test
---

#### Download `GTSRB_Final_Test_Images` Dataset and Unzip

In [7]:
!wget https://sid.erda.dk/public/archives/daaeac0d7ce1152aea9b61d9f1e19370/GTSRB_Final_Test_Images.zip
!unzip -q GTSRB_Final_Test_Images.zip -d data/

--2024-03-13 14:32:55--  https://sid.erda.dk/public/archives/daaeac0d7ce1152aea9b61d9f1e19370/GTSRB_Final_Test_Images.zip
Resolving sid.erda.dk (sid.erda.dk)... 130.225.104.13
Connecting to sid.erda.dk (sid.erda.dk)|130.225.104.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 88978620 (85M) [application/zip]
Saving to: ‘GTSRB_Final_Test_Images.zip’


2024-03-13 14:34:48 (780 KB/s) - ‘GTSRB_Final_Test_Images.zip’ saved [88978620/88978620]



#### Change images to `png`

In [9]:
from PIL import Image
import glob
import os
from tqdm import tqdm

for img in tqdm(glob.glob("data/GTSRB/Final_Test/Images/*.ppm")):
    im = Image.open(img)
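    # splitext drops only the final extension; str.strip(".ppm") strips the characters ".", "p", "m" from both ends
    fname = os.path.splitext(img)[0] + ".png"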
    im.save(fname)

100%|████████████████████████████████████| 12630/12630 [00:23<00:00, 542.71it/s]


#### Remove .ppm files

In [13]:
for img in tqdm(glob.glob("data/GTSRB/Final_Test/Images/*.ppm")):
    os.remove(img)

100%|██████████████████████████████████| 12630/12630 [00:00<00:00, 17935.33it/s]


#### Rename filenames in csv file

Using Stream EDitor (sed)
- https://www.cyberciti.biz/faq/how-to-use-sed-to-find-and-replace-text-in-files-in-linux-unix-shell/

In [14]:
!sed -i 's/\.ppm/.png/g' data/GTSRB/Final_Test/Images/*.csv && echo Done

Done


## Test Ground Truth
---
Only required if we need to compare predictions against their ground truth.

#### Download `GTSRB_Final_Test_GT` Dataset and Unzip

In [None]:
!wget https://sid.erda.dk/public/archives/daaeac0d7ce1152aea9b61d9f1e19370/GTSRB_Final_Test_GT.zip
!unzip -q GTSRB_Final_Test_GT.zip -d data/GTSRB/Final_Test/
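
Once the ground truth is unzipped, predictions can be compared against it. The sketch below rests on assumptions that are not confirmed by this notebook: that the archive contains a `;`-separated CSV file with `Filename` and `ClassId` columns, and that its filenames still end in `.ppm` (no `sed` step is run on it above). The helper names `load_ground_truth` and `accuracy` are hypothetical.

```python
import csv
import os

def load_ground_truth(csv_path):
    """Map each image filename (with a .png extension) to its integer class id.

    Assumes a ';'-separated CSV with 'Filename' and 'ClassId' columns.
    """
    labels = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f, delimiter=";"):
            root, _ext = os.path.splitext(row["Filename"])
            labels[root + ".png"] = int(row["ClassId"])
    return labels

def accuracy(predictions, labels):
    """Fraction of `predictions` (filename -> class id) that match `labels`."""
    if not predictions:
        return 0.0
    correct = sum(1 for name, cls in predictions.items() if labels.get(name) == cls)
    return correct / len(predictions)
```

Here `predictions` would be a dictionary built from the model's output, keyed by the converted `.png` filenames.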