# Chinese Traffic Sign Database Transformation & Packaging

This notebook handles the import, transformation and packaging of the Chinese Traffic Sign Dataset test set (http://www.nlpr.ia.ac.cn/pal/trafficdata/index.html) so it can be evaluated by the German Traffic Sign Dataset Classifier Spatial Transformer Network model proposed by the following repo (https://github.com/wolfapple/traffic-sign-recognition).

## Set-up the Environment

Import the required libraries

In [1]:
# Script specific libraries
import glob
import pickle
import numpy as np
import pandas as pd
import cv2
import os
from tqdm import tqdm
%matplotlib inline

Set the global variables

In [2]:
# Set global variables
# CHANGE THIS PATH TO WHERE CHINESE TEST SET IMAGES ARE LOCATED IN YOUR LOCAL PC TO WORK
test_images_path = "C:/Users/653211/Downloads/TSRD-Test/"
china_data_path = "china_data"

Import the data files and save them in appropriate data structures

In [3]:
# Read & format image annotations text file
china_test_annotation = pd.read_csv('TsignRecgTest1994Annotation.txt', sep=";", header=None, usecols = [i for i in range(8)])
china_test_annotation.columns = ["file_name", "width", "height", "c1", "c2", "c3", "c4", "label"]

## Format & Transform the Files Data

Visualize the Annotation Dataframe information

In [4]:
china_test_annotation

Unnamed: 0,file_name,width,height,c1,c2,c3,c4,label
0,000_0001_j.png,50,47,14,9,35,35,0
1,000_0002_j.png,85,89,23,15,64,66,0
2,000_0003_j.png,93,83,17,17,67,67,0
3,000_0004_j.png,181,171,27,25,146,140,0
4,000_0005_j.png,180,167,32,27,151,144,0
...,...,...,...,...,...,...,...,...
1989,056_1_0018_1_j.png,122,94,25,20,80,79,56
1990,056_1_0019_1_j.png,224,207,39,39,188,178,56
1991,056_1_0020_1_j.png,128,115,32,30,89,79,56
1992,057_1_0001_1_j.png,100,95,21,22,74,75,57


### Intersect Chinese Traffic Signals with German Traffic Signals

We need to replace the Chinese labels with the German Traffic Sign equivalent labels. Naturally only a determined subset of the Chinese traffic signs will intersect with their German twins. So all of the signals that belong only to the Chinese traffic signals will be dropped from the test set and only the intersected signals will be preserved.

In [5]:
# Create a copy for the intersected annotation Data Frame so we preserve the raw annotation
china_test_intersection = china_test_annotation.copy()
china_test_intersection

Unnamed: 0,file_name,width,height,c1,c2,c3,c4,label
0,000_0001_j.png,50,47,14,9,35,35,0
1,000_0002_j.png,85,89,23,15,64,66,0
2,000_0003_j.png,93,83,17,17,67,67,0
3,000_0004_j.png,181,171,27,25,146,140,0
4,000_0005_j.png,180,167,32,27,151,144,0
...,...,...,...,...,...,...,...,...
1989,056_1_0018_1_j.png,122,94,25,20,80,79,56
1990,056_1_0019_1_j.png,224,207,39,39,188,178,56
1991,056_1_0020_1_j.png,128,115,32,30,89,79,56
1992,057_1_0001_1_j.png,100,95,21,22,74,75,57


In [6]:
# Create a data dictionary to hold the equivalencies between Chinese and German labels
# Keys represent Chinese labels whereas associated values represent German labels
label_equiv = { 2: 1,
                4: 2,
                5: 3,
                6: 4,
                7: 5,
                53: 15,
                55: 17,
                24: 33,
                21: 35,
                22: 34,
                20: 36,
                26: 38,
                25: 39,
                27: 40  }

# Create a list of the Chinese labels to keep from Data Dictionary
labels_preserve = list(label_equiv.keys())
labels_preserve

[2, 4, 5, 6, 7, 53, 55, 24, 21, 22, 20, 26, 25, 27]

Drop all the rows that contain images that do not intersect with the German signals.

In [7]:
keep_elements = [x for x in china_test_annotation.label if x in labels_preserve]

china_test_intersection.set_index('label', inplace = True)
china_test_intersection.drop(china_test_intersection.index.difference(keep_elements),axis=0,inplace=True)
china_test_intersection.reset_index(inplace=True)
china_test_intersection = china_test_intersection [["file_name", "width", "height", "c1", "c2", "c3", "c4", "label"]]

china_test_intersection

Unnamed: 0,file_name,width,height,c1,c2,c3,c4,label
0,002_0001_j.png,186,170,37,33,141,138,2
1,002_0002_j.png,127,119,29,21,99,93,2
2,002_0003_j.png,271,252,46,35,224,218,2
3,002_0004_j.png,95,88,20,16,69,67,2
4,002_0005_j.png,103,94,25,21,76,74,2
...,...,...,...,...,...,...,...,...
511,055_1_0025_1_j.png,88,93,30,25,66,71,55
512,055_1_0026_1_j.png,93,97,21,23,74,75,55
513,055_1_0027_1_j.png,104,99,25,20,74,69,55
514,055_1_0028_1_j.png,141,134,24,19,110,103,55


Save the names of the resulting intersected files

In [8]:
china_test_file_names = (china_test_intersection.iloc[:,0]).tolist()
china_test_file_names

['002_0001_j.png',
 '002_0002_j.png',
 '002_0003_j.png',
 '002_0004_j.png',
 '002_0005_j.png',
 '002_0006_j.png',
 '002_0007_j.png',
 '002_0008_j.png',
 '002_0009_j.png',
 '002_0010_j.png',
 '002_0011_j.png',
 '002_0012_j.png',
 '002_0014.png',
 '002_0015.png',
 '002_0016.png',
 '002_0017.png',
 '002_0018.png',
 '002_0019.png',
 '002_0020.png',
 '002_0021.png',
 '002_0022.png',
 '002_0023.png',
 '002_0024.png',
 '002_0025.png',
 '002_0026.png',
 '002_0027.png',
 '002_0028.png',
 '002_0029.png',
 '002_0030.png',
 '002_0031.png',
 '004_0001_j.png',
 '004_0002_j.png',
 '004_0003_j.png',
 '004_0004_j.png',
 '004_0005_j.png',
 '004_0006_j.png',
 '004_0007_j.png',
 '004_0008_j.png',
 '004_0009_j.png',
 '004_0010_j.png',
 '004_0011_j.png',
 '004_0012_j.png',
 '004_0013_j.png',
 '004_0014_j.png',
 '004_0015_j.png',
 '004_0016_j.png',
 '004_0017_j.png',
 '004_0018_j.png',
 '004_0019_j.png',
 '004_0020_j.png',
 '004_0021_j.png',
 '004_0022_j.png',
 '004_0023_j.png',
 '004_0024_j.png',
 '004_0025

Save the names of the resulting intersected full paths of image files

In [9]:
china_test_file_paths = [f'{test_images_path}{file_name}' for file_name in china_test_file_names]
china_test_file_paths

['C:/Users/653211/Downloads/TSRD-Test/002_0001_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0002_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0003_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0004_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0005_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0006_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0007_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0008_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0009_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0010_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0011_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0012_j.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0014.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0015.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0016.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0017.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0018.png',
 'C:/Users/653211/Downloads/TSRD-Test/002_0019.png',
 'C:/Users/653211/Down

With the information of the full path name of each intersected image saved, we can proceed to replace all the original Chinese labels safely.

In [10]:
# Replace Chinese labels with their German label equivalent
china_test_intersection['label'] = china_test_intersection['label'].apply( lambda x : label_equiv[x])
china_test_intersection

Unnamed: 0,file_name,width,height,c1,c2,c3,c4,label
0,002_0001_j.png,186,170,37,33,141,138,1
1,002_0002_j.png,127,119,29,21,99,93,1
2,002_0003_j.png,271,252,46,35,224,218,1
3,002_0004_j.png,95,88,20,16,69,67,1
4,002_0005_j.png,103,94,25,21,76,74,1
...,...,...,...,...,...,...,...,...
511,055_1_0025_1_j.png,88,93,30,25,66,71,17
512,055_1_0026_1_j.png,93,97,21,23,74,75,17
513,055_1_0027_1_j.png,104,99,25,20,74,69,17
514,055_1_0028_1_j.png,141,134,24,19,110,103,17


With the intersected Data Frame completed, we can proceed to import all the intersected image files safely.

In [11]:
# Save all images in folder into a list of image channel information (raw pixel data)
china_test_features = [cv2.imread(file) for file in china_test_file_paths]
#china_test_features = [cv2.imread(file) for file in glob.glob(test_images_path)]

### Fill Image Sizes Information

Get the sizes of the original not-resized images and save them in a list of tuples.

In [12]:
# Create empty list to store size tuples
china_test_sizes = []

# Generate a list of 2-tuples containing (width, height) per image
for image in china_test_features:
    china_test_sizes.append(image.shape[:-1])

# Convert list of native Python tuples into a numpy array to match pickled data format
china_test_sizes = np.array(china_test_sizes, dtype='uint8')
print(type(china_test_sizes))    

<class 'numpy.ndarray'>


### Fill Image Channels Information

Main script expected pickled data contains (32 x 32) resized images so we resize the images to match the model input requirements.

In [13]:
# Main script expected pickled data contains (32 x 32) resized images
# We resized these images as well before packging to match expected format
china_test_features = list( map(lambda img : cv2.resize(img, (32, 32)), china_test_features) )

In [14]:
# Check the shape of an traffic sign image to validate reshaping
china_image_shape = china_test_features[0].shape[:-1]
print(china_image_shape)

(32, 32)


In [15]:
# Convert the list of image numpy arrays into a 4-dim numpy array to match pickled data format
china_test_features = np.array( china_test_features )

### Fill Labels Coordinates Information

Save labels into separate numpy array

In [16]:
# Save the coordinates columns as a separate numpy array
china_test_labels = china_test_intersection.iloc[:,7].values
china_test_labels = china_test_labels.astype('uint8')
china_test_labels

array([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,
        2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,
        2,  2,  2,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,  3,  3,  3,  3,
        3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  4,
        4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  5,  5,  5,
        5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,
        5,  5,  5,  5,  5, 36, 35, 35, 35, 35, 35, 35, 34, 34, 34, 34, 33,
       33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 39, 38, 38, 38, 38,
       38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38,
       38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38,
       38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38,
       38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 40, 40, 40, 40, 40,
       40, 40, 40, 40, 40

### Fill Image Coordinates Information

Save coordiantes into separate numpy array

In [17]:
# Save the coordinates columns as a separate numpy array
china_test_coords = china_test_intersection.iloc[:,3:7].values
china_test_coords = china_test_coords.astype('uint8')
china_test_coords

array([[ 37,  33, 141, 138],
       [ 29,  21,  99,  93],
       [ 46,  35, 224, 218],
       ...,
       [ 25,  20,  74,  69],
       [ 24,  19, 110, 103],
       [ 40,  34, 123, 119]], dtype=uint8)

## Package the Data

Package the 4 information data structures into a data dictionary (hash map) matching German dataset pickled data.

In [18]:
# Package the 4 information data structures into a data dictionary (hash map)
china_test = {'features': china_test_features,
              'labels': china_test_labels, 
              'sizes': china_test_sizes, 
              'coords': china_test_coords}

In [19]:
# Create folder to store Chinese data if it does not exist
if not os.path.isdir(china_data_path):
    os.makedirs(china_data_path)
    
# Save the test image data dictionary into a pickled file
with open(f"{china_data_path}/test.p", 'wb') as handle:
    pickle.dump(china_test, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [20]:
# Open pickled file for sanity check
with open(f"{china_data_path}/test.p", 'rb') as handle:
    read_pickle = pickle.load(handle)

# Print extracted info from pickled file as a sanity check
read_pickle

{'features': array([[[[ 38,  20,  34],
          [ 18,  18,  15],
          [ 22,  21,  16],
          ...,
          [ 15,  33,  51],
          [ 13,  32,  49],
          [ 17,  32,  50]],
 
         [[ 45,  27,  41],
          [ 99, 100,  96],
          [ 72,  70,  65],
          ...,
          [ 29,  45,  63],
          [ 17,  34,  51],
          [ 14,  29,  48]],
 
         [[ 47,  29,  43],
          [ 94,  96,  91],
          [ 77,  74,  69],
          ...,
          [ 17,  28,  47],
          [ 11,  23,  42],
          [ 10,  25,  43]],
 
         ...,
 
         [[ 32,  41,  60],
          [ 40,  49,  68],
          [ 40,  44,  64],
          ...,
          [ 18,  35,  61],
          [ 17,  34,  60],
          [ 18,  35,  61]],
 
         [[ 27,  36,  56],
          [ 36,  45,  64],
          [ 39,  43,  63],
          ...,
          [ 20,  37,  63],
          [ 18,  35,  61],
          [ 20,  37,  63]],
 
         [[ 25,  34,  53],
          [ 34,  43,  62],
          [ 31,  4

In [21]:
# Print the length of all the arrays in the data dictionary keys as a last sanity check
print(len(read_pickle['features']))
print(len(read_pickle['labels']))
print(len(read_pickle['sizes']))
print(len(read_pickle['coords']))

516
516
516
516


## Greyscale-Transformed Pickled File Generation

The pickled file data used by the script's model evaluation method expects a even more processed image data dictionary. This processing involves a CLAHE grayscale transformation, so we need to perform it and save its respective file as well in order to evaluate.

In [22]:
# Define a class to perform a CLAHE grayscale image just as German model
class CLAHE_GRAY:
    def __init__(self, clipLimit=2.5, tileGridSize=(8, 8)):
        self.clipLimit = clipLimit
        self.tileGridSize = tileGridSize

    def __call__(self, im):
        img_y = cv2.cvtColor(im, cv2.COLOR_RGB2YCrCb)[:, :, 0]
        clahe = cv2.createCLAHE(clipLimit=self.clipLimit,
                                tileGridSize=self.tileGridSize)
        img_y = clahe.apply(img_y)
        img_output = img_y.reshape(img_y.shape + (1,))
        return img_output

In [23]:
# Set components to create grayscale image data dictionary
X = china_test['features']
y = china_test['labels']

# Create instance of a CLAHE graysccale transformer
clahe = CLAHE_GRAY()

# Apply the transformer to every test image
for i in tqdm(range(len(X)), desc=f"Processing Chinese test dataset"):
    X[i] = clahe(X[i])

X = X[:, :, :, 0]

Processing Chinese test dataset: 100%|████████████████████████████████████████████| 516/516 [00:00<00:00, 14792.30it/s]


In [24]:
# Create folder to store Chinese data if it does not exist
if not os.path.isdir(china_data_path):
    os.makedirs(china_data_path)

# Save the test grayscale image data dictionary into a pickled file
with open(f"{china_data_path}/test_gray.p", "wb") as f:
    pickle.dump({"features": X.reshape(X.shape + (1,)), "labels": y}, f)

In [25]:
# Open pickled file for sanity check
with open(f"{china_data_path}/test_gray.p", 'rb') as handle:
    read_pickle = pickle.load(handle)

# Print extracted info from pickled file as a sanity check
read_pickle

{'features': array([[[[ 64],
          [ 32],
          [ 48],
          ...,
          [112],
          [ 96],
          [112]],
 
         [[ 96],
          [239],
          [143],
          ...,
          [187],
          [128],
          [ 80]],
 
         [[112],
          [223],
          [159],
          ...,
          [ 88],
          [ 48],
          [ 64]],
 
         ...,
 
         [[ 88],
          [171],
          [151],
          ...,
          [ 90],
          [ 76],
          [ 88]],
 
         [[ 64],
          [143],
          [143],
          ...,
          [108],
          [ 80],
          [112]],
 
         [[ 32],
          [112],
          [ 80],
          ...,
          [ 68],
          [ 64],
          [ 96]]],
 
 
        [[[ 64],
          [ 96],
          [159],
          ...,
          [151],
          [112],
          [112]],
 
         [[ 48],
          [ 80],
          [255],
          ...,
          [199],
          [159],
          [ 96]],
 
         

In [26]:
# Print the length of all the arrays in the data dictionary keys as a last sanity check
print(len(read_pickle['features']))
print(len(read_pickle['labels']))

516
516


## Use Greyscale-Transformed Pickled File to Evaluate Model on Chinese Test Set

Now that we have the final processed and packaged imgae test files for the Chinese dataset. We can proceed to evaluate the German dataset trained model performance on classifying the intersected signals of the Chinese Dataset.

For this, we need to run the evaluation script (evaluate.py) with the command-line-argument parameter 'data' set as 'china-data' so the script looks for our generated Chinese dataset files instead of the default route 'data' that looks for the German information.

After this you can just see this results of classification performance in the terminal.