# Data Preprocessing for PS-GAN using the Cityscapes dataset

Input: A set of images from the Cityscapes dataset and the corresponding JSON files with pedestrian bounding box annotations.

Output: For each pedestrian bounding box annotation in an image, the bounding box region is replaced with noise, and a 256 x 256 crop around the noisy region (centered where possible, shifted near the image borders so it stays inside the image) is saved to directory B. The corresponding crop of the original image is saved to directory A. The bounding box annotations are saved to a JSON file in the format required by PS-GAN.

Notes:  
- Noise consists of black, grey, and white pixels (values 0, 127, 255), each drawn with equal probability.
- Only bounding boxes whose width and height are both between 28 and 250 pixels (inclusive) are cropped. The lower bound of 28 differs from the paper, which specifies a minimum height of 70 and a minimum width of 25; the upper bound of 250 is our additional restriction.
- The paper appears to use `bbox`, whereas we use `bboxVis`.
- Train/val/test data is drawn from all cities (the splits are not separated by city as in the original Cityscapes dataset).
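
As a rough illustration of the noise described in the notes above, the sketch below fills a rectangular region of a dummy image with pixels drawn uniformly from black (0), grey (127), and white (255). The helper name `make_noise_patch`, the seeded generator, the dummy image, and the example box are assumptions for illustration; the notebook's own implementation is in `replace_person_with_noise`, which also extends the patch by one pixel on the right and bottom.

```python
import numpy as np

def make_noise_patch(height, width, rng):
    """Return a (height, width, 3) uint8 patch of black/grey/white pixels."""
    # One value per pixel; each of the three levels is equally likely.
    levels = rng.choice(np.array([0, 127, 255], dtype=np.uint8), size=(height, width))
    # Repeat the single channel three times so the patch is grey in all channels.
    return np.dstack((levels, levels, levels))

rng = np.random.default_rng(0)                      # seeded only for repeatability
image = np.full((64, 64, 3), 200, dtype=np.uint8)   # dummy image, not Cityscapes data
x, y, w, h = 10, 20, 30, 28                         # hypothetical bounding box
patch = make_noise_patch(h, w, rng)
image[y:y + h, x:x + w, :] = patch
```

Pixels outside the box keep their original values, so only the annotated region is replaced.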

To Do:
- Figure out a way to include people who are more than 250 pixels wide or tall
- Verify that people smaller than 28 pixels are too small to be recognized; otherwise, figure out how to include them
- Include people who are annotated as "riders" (cyclists, wheelchair users, etc.)
- Change the bboxes_AB JSON format from {x,y,w,h} to {x1,y1,x2,y2} when the code is changed
- Write an explanation of how the train/val/test sets were computed
- Write an explanation of the directories [A,B,AB,bboxes_AB,images,bboxes]
- Write an explanation of how to process the data from start to finish


Code:
- specify the depth of the U-net


## Original Images  
Training Dataset:
- Total number of input images: 2975
- Total number of people saved as cropped output images: 8191
- Total number of people annotations in input data: 16526
- Percent people included: 0.5

Validation Dataset:
- 500 Images

Cityscapes Test Dataset:
- 1525 Images
- Bounding Box annotations not provided


## Cropped Images
Dataset splits (cropped images, using w >= 25 and h >= 70):  
- training data: 5383  
- validation data: 1042  
- test data: 1168  

Dataset splits 'split_w28h28' (cropped images, using w >= 28 and h >= 28):
- training data: 5667
- validation data: 1063
- test data: 1235

In [1]:
import os
import json
import pickle
import cv2
import numpy as np
import torch

from PIL import Image
from IPython.display import display

IMAGE_COUNT = 0 # Total number of input images processed
TOTAL_PEOPLE = 0 # Total number of people annotated in dataset
INCLUDED_PEOPLE = 0 # Total number of people saved as cropped image

IMG_WIDTH = 2048
IMG_HEIGHT = 1024

TRAIN = 0
VAL = 0
TEST = 0

## Load train/val/test splits

In [2]:
## Load Training Dataset
with open('train_images.pkl', 'rb') as f:
    train_images = pickle.load(f)

## Load Validation Dataset
with open('val_images.pkl', 'rb') as f:
    val_images = pickle.load(f)

## Load Test Dataset
with open('test_images.pkl', 'rb') as f:
    test_images = pickle.load(f)
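
Since each image should belong to exactly one split, a quick check after loading can be useful. The sketch below is one way such a check might look; `check_disjoint_splits` is a hypothetical helper, and the filenames are placeholders in the Cityscapes naming style rather than entries taken from the pickle files.

```python
def check_disjoint_splits(train, val, test):
    """Return the set of filenames that appear in more than one split."""
    train, val, test = set(train), set(val), set(test)
    return (train & val) | (train & test) | (val & test)

# Placeholder filenames for illustration only.
train_demo = ['aachen_000000_000019_leftImg8bit.png']
val_demo = ['bremen_000000_000019_leftImg8bit.png']
test_demo = ['ulm_000000_000019_leftImg8bit.png']

overlap = check_disjoint_splits(train_demo, val_demo, test_demo)
print('files in more than one split:', sorted(overlap))
```

If the pickles hold lists, converting them to sets once would also make the `image_path in train_images` lookups in the next cell faster.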

## Process images

In [3]:
def process_image_directory(directory_input, directory_output, split=True):
    global IMAGE_COUNT
    global TRAIN
    global VAL
    global TEST
    
    images = os.listdir('images/'+directory_input)
    if '.DS_Store' in images:
        images.remove('.DS_Store')
#     print(images)
    IMAGE_COUNT += len(images)

    for image_path in images:
        if split:
            ## Determine train/val/test
            if image_path in train_images:
                directory_output_img = 'train_' + directory_output
                TRAIN += 1
            elif image_path in val_images:
                directory_output_img = 'val_' + directory_output
                VAL += 1
            elif image_path in test_images:
                directory_output_img = 'test_' + directory_output
                TEST += 1
            else:
                ## Image is not listed in any split
                raise ValueError('ERROR file not found in any split: ' + image_path)
        else:
            directory_output_img = directory_output
        
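        ## Get Filename (drop the trailing 16 characters, e.g. a '_leftImg8bit.png' suffix)
        filename = image_path[:-16]

        ## Read Image (cv2.imread returns None if the file cannot be read)
        img = cv2.imread('images/'+directory_input+'/'+image_path)
        assert img is not None, 'could not read ' + image_path
        assert img.shape == (IMG_HEIGHT, IMG_WIDTH, 3)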

#         ## Display Image
#         img_display = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # Converting BGR to RGB for display
#         display(Image.fromarray(img_display))


        ## Get Bounding Box Annotations
        bbox_file = 'bboxes/'+directory_input+'/'+filename+'_gtBboxCityPersons.json'

        bbox_list = []
        with open(bbox_file) as f:
            data = json.load(f)
            for obj in data['objects']:
                if obj['label'] == 'pedestrian':
#                     print(obj)
                    bbox_list.append(obj['bboxVis'])
                    
        process_single_image(img, bbox_list, directory_output_img, filename)
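
The slice `image_path[:-16]` works if the image names end in `_leftImg8bit.png`, which is 16 characters long. The sketch below shows the filename mapping and the annotation filtering on an in-memory example. The helper names and the sample annotation are hypothetical; only the `objects`, `label`, and `bboxVis` fields read by the notebook are assumed.

```python
import json

IMAGE_SUFFIX = '_leftImg8bit.png'        # assumed Cityscapes image suffix
BBOX_SUFFIX = '_gtBboxCityPersons.json'  # suffix used by the notebook

def bbox_filename(image_name):
    """Map an image filename to its annotation filename."""
    if not image_name.endswith(IMAGE_SUFFIX):
        raise ValueError('unexpected image name: ' + image_name)
    return image_name[:-len(IMAGE_SUFFIX)] + BBOX_SUFFIX

def pedestrian_boxes(annotation):
    """Collect the visible boxes [x, y, w, h] of objects labelled 'pedestrian'."""
    return [obj['bboxVis'] for obj in annotation['objects']
            if obj['label'] == 'pedestrian']

# Made-up annotation containing only the fields the notebook reads.
sample = json.loads('''{"objects": [
    {"label": "pedestrian", "bboxVis": [100, 200, 40, 90]},
    {"label": "rider", "bboxVis": [300, 250, 50, 120]}
]}''')

print(bbox_filename('aachen_000000_000019_leftImg8bit.png'))
print(pedestrian_boxes(sample))
```

Checking the suffix explicitly, rather than slicing a fixed number of characters, would surface unexpected filenames early.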


In [4]:
def process_single_image(img, bbox_list, directory_output, filename='img'):
    global TOTAL_PEOPLE
    
    for i, bbox in enumerate(bbox_list):
        TOTAL_PEOPLE += 1
        
        x, y, w, h = bbox   ## x,y is top left coordinate in bbox
        
        ## handle annotations mistake - if bounding box goes past right edge, shift x to fit
        pixels_past_right_edge = x + w - IMG_WIDTH  
        if pixels_past_right_edge > 0:
            x = x - pixels_past_right_edge - 1
            
        ## handle annotations mistake - if bounding box goes past bottom edge, shift y to fit
        pixels_past_bottom_edge = y + h - IMG_HEIGHT 
        if pixels_past_bottom_edge > 0:
            y = y - pixels_past_bottom_edge - 1
        
        ## Create filename for cropped img
        img_path = directory_output+'/'+filename+'_'+str(i)
        
        ## Only process if person is not too small/large within img
        if w >= 28 and w <= 250 and h >= 28 and h <= 250:
            replace_person_with_noise(img_path, img, x, x+w, y, y+h, w, h)
        
    return
    
# process_single_image(img, bbox_list)    
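
The edge handling and size filter above can be checked in isolation. The sketch below repeats the same arithmetic as pure functions so it runs without any images; `fit_bbox` and `is_valid_size` are hypothetical names, and the example boxes are invented.

```python
IMG_WIDTH, IMG_HEIGHT = 2048, 1024   # frame size used in the notebook
MIN_SIZE, MAX_SIZE = 28, 250         # size bounds used in the notebook

def fit_bbox(x, y, w, h):
    """Shift a box left/up so that x + w and y + h stay below the image size."""
    over_right = x + w - IMG_WIDTH
    if over_right > 0:
        x = x - over_right - 1
    over_bottom = y + h - IMG_HEIGHT
    if over_bottom > 0:
        y = y - over_bottom - 1
    return x, y, w, h

def is_valid_size(w, h):
    """True when both sides fall within the accepted range (inclusive)."""
    return MIN_SIZE <= w <= MAX_SIZE and MIN_SIZE <= h <= MAX_SIZE

print(fit_bbox(2000, 100, 60, 120))
print(is_valid_size(60, 120), is_valid_size(20, 120))
```

The extra `- 1` leaves room for the one-pixel extension that `replace_person_with_noise` applies when it slices the noise region.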

In [5]:
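def save_bbox_json(x, y, w, h, img_path):
    ## NOTE: the 'w' and 'h' keys hold the end coordinates of the noise patch (x+w+1, y+h+1), not its width and height
    bbox_dict = {'x':x, 'y':y, 'w':x+w+1, 'h':y+h+1}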
    with open('bboxes_AB/'+img_path+'.json', 'w') as f:
        json.dump(bbox_dict, f)
    return
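
Note that `save_bbox_json` stores corner coordinates under the keys `w` and `h`. One way the planned `{x1,y1,x2,y2}` format from the To Do list might look is sketched below. The function name `bbox_to_corners` and the example values are assumptions, and PS-GAN would need to be changed to read these keys.

```python
import json

def bbox_to_corners(x, y, w, h):
    """Convert a top-left/size box to corner form, mirroring save_bbox_json."""
    # The +1 mirrors the notebook, whose noise patch spans one extra pixel.
    return {'x1': x, 'y1': y, 'x2': x + w + 1, 'y2': y + h + 1}

corners = bbox_to_corners(98, 68, 60, 120)   # invented example box
encoded = json.dumps(corners)
print(encoded)
```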

In [6]:
## Determine offsets (where to crop image wrt noisy patch)  
def compute_cropped_offsets(x_mid, y_mid):
    if x_mid < 128:
        x_left_offset =  x_mid
        x_right_offset = 256 - x_left_offset
    elif x_mid > (IMG_WIDTH - 128):
        x_right_offset =  (IMG_WIDTH - x_mid)
        x_left_offset = 256 - x_right_offset
    else:
        x_left_offset = 128
        x_right_offset = 256 - x_left_offset
    
    if y_mid < 128:
        y_top_offset =  y_mid
        y_bottom_offset = 256 - y_top_offset
    elif y_mid > (IMG_HEIGHT - 128):
        y_bottom_offset = (IMG_HEIGHT - y_mid)
        y_top_offset = 256 - y_bottom_offset
    else:
        y_top_offset = 128
        y_bottom_offset = 256 - y_top_offset
        
    return x_left_offset, x_right_offset, y_top_offset, y_bottom_offset


## Crop images (returns cropped original and noisy images with new x and y coordinates)
def crop_images(img, img_noisy, x_left, y_top, w, h):
    ## Find bbox mid point coordinates (coords on full-sized image)
    x_mid = x_left + (w // 2)
    y_mid = y_top + (h // 2)
    
    ## Determine offsets (where to crop image wrt noisy patch)
    x_left_offset, x_right_offset, y_top_offset, y_bottom_offset = compute_cropped_offsets(x_mid, y_mid)
        
    ## Crop original and noisy images around annotated person  
    cropped_img = img[y_mid-y_top_offset:y_mid+y_bottom_offset, x_mid-x_left_offset:x_mid+x_right_offset, :]
    cropped_noisy_img = img_noisy[y_mid-y_top_offset:y_mid+y_bottom_offset, x_mid-x_left_offset:x_mid+x_right_offset, :]
    
    ## Compute new X,Y coordinates after cropping image
    new_x = x_left - (x_mid - x_left_offset)
    new_y = y_top - (y_mid - y_top_offset)
    
    return cropped_img, cropped_noisy_img, new_x, new_y
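
To make the offset logic concrete, the sketch below computes the top-left corner of the crop window with a clamping formulation that should give the same window as `compute_cropped_offsets` for integer midpoints. `crop_window` is a hypothetical helper, and the midpoints are invented examples.

```python
IMG_WIDTH, IMG_HEIGHT = 2048, 1024
CROP = 256

def crop_window(x_mid, y_mid):
    """Return (x0, y0) of a CROP x CROP window that stays inside the image."""
    half = CROP // 2
    # Centre on the midpoint, then clamp so the window never leaves the image.
    x0 = min(max(x_mid - half, 0), IMG_WIDTH - CROP)
    y0 = min(max(y_mid - half, 0), IMG_HEIGHT - CROP)
    return x0, y0

for x_mid, y_mid in [(1000, 500), (50, 30), (2040, 1020)]:
    x0, y0 = crop_window(x_mid, y_mid)
    print((x_mid, y_mid), '->', (x0, y0, x0 + CROP, y0 + CROP))
```

Away from the borders the box midpoint sits at the centre of the crop; near a border the window is pinned to that border and the box appears off-centre.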
    

In [7]:
def replace_person_with_noise(img_path, img, x_left, x_right, y_top, y_bottom, w, h):
    global INCLUDED_PEOPLE
#     print()
#     print(img_path)

    img_noisy = img.copy()
    
    ## Create noise patch in Black & White & Grey
    randnoise_bw = np.random.choice([0,127,255], img_noisy[y_top:y_bottom+1, x_left:x_right+1, :].shape[:2])
    randnoise_bw = np.dstack((randnoise_bw,randnoise_bw,randnoise_bw))
    
    ## Add noise patch to image
    img_noisy[y_top:y_bottom+1, x_left:x_right+1, :] = randnoise_bw

    ## Crop images (returns cropped original and noisy images with new x and y coordinates)
    cropped_img, cropped_noisy_img, new_x, new_y = crop_images(img, img_noisy, x_left, y_top, w, h)

        
    ## Save/Display Final Cropped Images
    if cropped_img.shape == (256,256,3):
        # Save Images
        cv2.imwrite('A/'+img_path+'.png', cropped_img)
        cv2.imwrite('B/'+img_path+'.png', cropped_noisy_img)
        
        ## Save bbox json with new x,y coordinates
        save_bbox_json(new_x, new_y, w, h, img_path)

        INCLUDED_PEOPLE += 1

#         ## Display Image
#         display(Image.fromarray(cv2.cvtColor(cropped_img, cv2.COLOR_BGR2RGB)))
#         display(Image.fromarray(cv2.cvtColor(cropped_noisy_img, cv2.COLOR_BGR2RGB)))

    else:
        print(img_path)
        print('skip image -- cropped_img.shape', cropped_img.shape, 'w', w, 'h', h)
        try:
            display(Image.fromarray(cv2.cvtColor(cropped_img, cv2.COLOR_BGR2RGB)))
            display(Image.fromarray(cv2.cvtColor(cropped_noisy_img, cv2.COLOR_BGR2RGB)))
        except Exception:
            ## Display is best-effort (e.g. an empty crop cannot be converted)
            pass


    return
    

In [8]:
def create_directories(output_directory, split=True):
    ## Prefix the output directory with each split name when splitting
    if split:
        subdirectories = [prefix + output_directory for prefix in ('train_', 'val_', 'test_')]
    else:
        subdirectories = [output_directory]

    ## exist_ok=True makes the call safe to repeat and fills in any missing directory
    for root in ('A', 'B', 'AB', 'bboxes_AB'):
        for subdirectory in subdirectories:
            os.makedirs(os.path.join(root, subdirectory), exist_ok=True)
    return
    

---
## Example - Single Directory

In [10]:
IMAGE_COUNT = 0
INCLUDED_PEOPLE = 0
TOTAL_PEOPLE = 0

example_directory = 'demo'
example_directory_output = 'demo'
split = False

create_directories(example_directory_output, split)
            
process_image_directory(example_directory, example_directory_output, split)

print()
print('------ COMPLETE ------')


------ COMPLETE ------


In [11]:
print('Total number of people saved as cropped output images:', INCLUDED_PEOPLE)
print('Total number of people annotations in input data:', TOTAL_PEOPLE)
print('Percent people included:', round(INCLUDED_PEOPLE/TOTAL_PEOPLE,2))

Total number of people saved as cropped output images: 6
Total number of people annotations in input data: 11
Percent people included: 0.55


Total number of people saved as cropped output images: 1311
Total number of people annotations in input data: 4096
Percent people included: 0.32

---
## Run all images

In [None]:
IMAGE_COUNT = 0
INCLUDED_PEOPLE = 0
TOTAL_PEOPLE = 0

ignore_directories = ['.DS_Store', 'demo', 'test']

## Specify directory name to save output files
output_directory = 'split_w28h28_v3'

## Specify whether to split into train/val/test
split = True

## Create directories to save output files
create_directories(output_directory, split)
            
## Process all images in 'images' directory unless subfolder is included in 'ignore_directories' list         
for directory in os.listdir('images/'):
    if directory not in ignore_directories:
        print()
        print(directory)
        process_image_directory(directory, output_directory, split)
        
print()
print('------ COMPLETE ------')

In [None]:
print('Total number of input images:', IMAGE_COUNT)
print('Total number of people saved as cropped output images:', INCLUDED_PEOPLE)
print('Total number of people annotations in input data:', TOTAL_PEOPLE)
print('Percent people included:', round(INCLUDED_PEOPLE/TOTAL_PEOPLE,2))

In [None]:
all_test_images = os.listdir('A/test_' + output_directory)
all_train_images = os.listdir('A/train_' + output_directory)
all_val_images = os.listdir('A/val_' + output_directory)

print('output_directory', output_directory)
print('num train images',len(all_train_images))
print('num val images',len(all_val_images))
print('num test images',len(all_test_images))

In [None]:
sorted(os.listdir('images/'))

---
## Notes


split_w28h28  -- can use 5 conv layers, and add padding to final layer

Total number of input images: 3475
Total number of people saved as cropped output images: 7965
Total number of people annotations in input data: 19683
Percent people included: 0.4

num train images 5667
num val images 1063
num test images 1235


28

14
13
12
11
10
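
The numbers above are consistent with the standard convolution output-size formula, out = floor((n + 2p - k) / s) + 1, if one assumes a first layer with kernel 4, stride 2, padding 1 followed by layers with kernel 4, stride 1, padding 1. Those layer settings are an assumption made for this sketch and are not confirmed by the notebook; `conv_out` and `layer_sizes` are hypothetical helpers.

```python
def conv_out(n, kernel=4, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

def layer_sizes(n, num_layers):
    """Sizes after one stride-2 layer followed by stride-1 layers (assumed)."""
    sizes = [conv_out(n, stride=2)]
    for _ in range(num_layers - 1):
        sizes.append(conv_out(sizes[-1]))
    return sizes

print(layer_sizes(28, 5))
```

Under the same assumptions, the helper can be applied to other minimum box sizes to see how many layers they allow.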


-----
split_w30h30  -- can use existing architecture unchanged

Total number of input images: 3475
Total number of people saved as cropped output images: 7343
Total number of people annotations in input data: 19683
Percent people included: 0.37

30

15
14
13
12
10

split_w26h26  -- can use 4 conv layers

Total number of input images: 3475
Total number of people saved as cropped output images: 8696
Total number of people annotations in input data: 19683
Percent people included: 0.44


split_w26h70 -- can use 4 conv layers

Total number of input images: 3475
Total number of people saved as cropped output images: 7461
Total number of people annotations in input data: 19683
Percent people included: 0.38


26

13
12
11
10

split_w25h70_v1

Total number of input images: 3475
Total number of people saved as cropped output images: 7593
Total number of people annotations in input data: 19683
Percent people included: 0.39

train_all_bboxvis_v2

Total number of input images: 2975
Total number of people saved as cropped output images: 6312
Total number of people annotations in input data: 16526
Percent people included: 0.38

train_all_bboxvis_v1

Total number of input images: 2975
Total number of people saved as cropped output images: 6131
Total number of people annotations in input data: 16526
Percent people included: 0.37

---