In [2]:
import pandas as pd
import numpy as np
import os 
import json 
from skimage import io
import skimage
import tqdm
from collections import defaultdict

path_instagram = '.../...' #@@@ OVERRIDE: Path of folder with images downloaded from IG for test set

## TEST SET

### 1. Data Collection and Cleaning

The aim of this project is to perform fashion instance segmentation in the wild. Fashion instance segmentation means detecting each garment in a picture with a polygon or mask. "In the wild" refers to the quality of the images: the aim is to detect garments not in "street" photos but in "wild" images. Street photos are good-quality pictures taken outside the studio; they usually focus on one professional model and have sophisticated backgrounds, varied lighting conditions, and minor occlusion due to varied yet standard poses. "Wild" photos are the opposite. They have no constraints at all: they are user-created, so they may have heavy occlusion, bad lighting, cropping, and overall poor quality.

**Data Collection**  
The training and validation sets are built from publicly available fashion datasets.  
The test set, used to evaluate the segmentation model's performance, is built from scratch. First, a number of images were downloaded from Instagram. Different keywords were used as the selection criterion, resulting in a very wild dataset of 38,913 images that vary in size, occlusion, lighting, resolution, zoom, and content. Regarding this last aspect, many images contain no garment at all.

**Data Cleaning**  
To clean the wild dataset of 38,913 images, no-color images were removed first, to be consistent with the segmentation model, which takes RGB images as input. No-color images here include both black-and-white and greyscale images. Although the two terms are often used interchangeably, they mean different things: a black-and-white image consists of only two colors, black and white (typically 1 channel), while a greyscale image is composed exclusively of shades of gray, ranging from black to white (in this dataset often stored with 3 channels). The script to detect color and no-color images is: ```color_detector.py```

Secondly, a face detector, MTCNN - Multi-task Cascaded CNN [(Zhang, 2016)](https://arxiv.org/abs/1604.02878), was applied to get the ids of images with at least one person. The cascade structure makes this network fairly fast. The motivation behind this step is that many images in the initial IG dataset show objects or landscapes. They should be removed, since they don't contain any garment. Moreover, uploading to the annotation tool only images with at least one face increases the probability of finding garments and saves a lot of time. However, since it would be interesting to test the model's performance on photos of clothes that are not worn, some pictures satisfying this condition were retrieved from the list of no-face ids and added to the pool of face images. The script to detect face and no-face images is: ```face_detector.py```

After these two steps, the number of potential test images to annotate is 24,033 (about 62% of the images downloaded from IG).
  
Number of potential images to annotate for test set:

| Stage | Number of images |
|---|---|
| Initial number | 38,913 |
| Potential number | 24,033 |

In [None]:
all_ids = [i for i in sorted(os.listdir(path_instagram)) if not i.startswith('.')]
assert len(all_ids) == len(set(all_ids)) == 38913

#### 1. NO-COLOR
To train the model consistently, only color images are selected.  
Problem: how can an image be detected as black-and-white or greyscale?  
The algorithm checks the following 3 conditions:
1. ```Number of channels < 3```: a rare case; often images have 3 channels but look black-and-white because they consist of many shades of grey.
2. ```(R == G == B).all```: the three channels are identical; this is not always True for images that look greyscale.
3. ```Channel variance < threshold```: given a threshold T, if the variance between channels is lower than T, the image is considered greyscale.

SCRIPT: ```color_detector.py```
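
The three checks above could be combined roughly as follows. This is a minimal sketch and not the contents of `color_detector.py`: the function name `is_no_color`, the default `threshold`, and the choice of averaging the per-pixel variance across channels are all assumptions made for illustration.

```python
import numpy as np

def is_no_color(img, threshold=1.0):
    """Return True if `img` looks black-and-white or greyscale.

    `img` is an array of shape (H, W) or (H, W, C). `threshold` is a
    hypothetical cut-off on the mean variance across the R, G, B channels.
    """
    img = np.asarray(img)
    # Condition 1: fewer than 3 channels
    if img.ndim < 3 or img.shape[2] < 3:
        return True
    rgb = img[..., :3].astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Condition 2: the three channels are identical
    if np.array_equal(r, g) and np.array_equal(g, b):
        return True
    # Condition 3: the channels differ, but only slightly on average
    channel_variance = rgb.var(axis=2).mean()
    return bool(channel_variance < threshold)
```

A suitable threshold depends on the data, so it would need to be tuned on a handful of images labelled by hand.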

In [None]:
# Run the color detector, output: txt files with the ids of no-color images
ids_color_lists = ['ids_maybe_color', 'ids_bw_grey', 'ids_no_idea_color', 'ids_monocolor']
ids_problem_color = {}
for name in ids_color_lists:
    with open(name+'.txt', 'r') as f:
        ids_problem_color[name] = [i.strip() for i in f.readlines()]
        
for k,v in ids_problem_color.items():
    print(k, len(v))

In [None]:
# Remove from potential test images: ids_bw_grey
num_potential_imgs = len(all_ids)
print('# Potential Images TEST: {}'.format(num_potential_imgs))
ids_no_bw = [i for i in all_ids if i not in ids_problem_color['ids_bw_grey']]
assert (len(all_ids)-len(ids_problem_color['ids_bw_grey'])) == len(ids_no_bw)
num_potential_imgs = len(ids_no_bw)
print('# Potential Images TEST: {}'.format(num_potential_imgs))

#### 2. NO-FACE

Goal: avoid uploading to the annotation tool images of objects or landscapes that contain no garment.  
Method: Multi-task Cascaded CNN (MTCNN), a cascade of three convolutional networks that jointly learns face detection and facial keypoint alignment. Accuracy: ~95%

SCRIPT: ```face_detector.py```
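
The split into face and no-face ids might look like the sketch below. It is not the contents of `face_detector.py`: the function `split_by_face`, its two callables, and the `min_confidence` cut-off are hypothetical names introduced here so that the logic can be shown without committing to a specific detector library.

```python
def split_by_face(image_ids, load_image, detect_faces, min_confidence=0.9):
    """Split image ids into (ids_face, ids_no_face).

    `load_image(image_id)` returns an RGB array and `detect_faces(image)`
    returns a list of dicts, each with a 'confidence' score. Both callables
    and the `min_confidence` cut-off are hypothetical placeholders.
    """
    ids_face, ids_no_face = [], []
    for image_id in image_ids:
        detections = detect_faces(load_image(image_id))
        # An image counts as a face image if any detection is confident enough
        has_face = any(d['confidence'] >= min_confidence for d in detections)
        (ids_face if has_face else ids_no_face).append(image_id)
    return ids_face, ids_no_face
```

With the `mtcnn` Python package, `detect_faces` could be the `detect_faces` method of an `MTCNN()` instance, which returns a list of dictionaries with `box`, `confidence`, and `keypoints` entries. Whether the original script uses that package is an assumption.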

In [None]:
# Run the face detector, output: txt file with the ids of no-face images
with open('ids_no_face.txt', 'r') as f:
    ids_no_face = [i.strip() for i in f.readlines()]

print('ids_no_face', len(ids_no_face))

In [None]:
# Remove from potential test images: ids_no_face
print('# Potential Images TEST: {}'.format(num_potential_imgs))
ids_no_bw_face = [i for i in ids_no_bw if i not in ids_no_face]
assert (len(all_ids)-len(set(sorted(ids_no_face+ids_problem_color['ids_bw_grey'])))) == len(ids_no_bw_face)
num_potential_imgs = len(ids_no_bw_face)
print('# Potential Images TEST: {}'.format(num_potential_imgs))

### 2. Data Annotation

Annotator: [VGG Image Annotator (VIA)](https://www.robots.ox.ac.uk/~vgg/software/via/)  
Download: [Link](https://www.robots.ox.ac.uk/~vgg/software/via/downloads/via-2.0.11.zip)

Of the 24,033 candidate test images, only 500 have been annotated. The annotation tool used in this project is the Visual Geometry Group Image Annotator (VIA), software that runs offline in most modern web browsers.  

During the annotation phase, the following guidelines were applied: 
1. Skip images/instances if the clothes are not recognizable to a human; 
2. Skip images/instances if the clothes are too heavily occluded;
3. Skip images/instances if the clothes are too small (precise segmentation is impossible);
4. As per DeepFashion2, include in the segmentation polygon the area of the item that is occluded by another item or by a part of the human body.

The VIA annotation tool returns annotations in VIA format. However, the segmentation model needs as input annotations in COCO format. To convert the annotation file from VIA to COCO use the function in ```converters.ipynb```.
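
To give an idea of what the conversion involves, the sketch below turns a single VIA polygon region into a COCO-style annotation. It is not the function from `converters.ipynb`: the name `via_region_to_coco`, the `category_ids` mapping, and the assumption that the `clothing` attribute holds one label string are all hypothetical.

```python
def via_region_to_coco(region, image_id, annotation_id, category_ids):
    """Convert one VIA polygon region to a COCO-style annotation dict.

    `category_ids` is a hypothetical mapping from label name to COCO
    category id. Only the 'clothing' attribute is read here, for brevity.
    """
    shape = region['shape_attributes']
    xs, ys = shape['all_points_x'], shape['all_points_y']
    # COCO stores a polygon as a flat list [x1, y1, x2, y2, ...]
    polygon = [coord for point in zip(xs, ys) for coord in point]
    # Shoelace formula for the polygon area
    n = len(xs)
    area = abs(sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                   for i in range(n))) / 2
    x_min, y_min = min(xs), min(ys)
    label = region['region_attributes']['clothing']
    return {
        'id': annotation_id,
        'image_id': image_id,
        'category_id': category_ids[label],
        'segmentation': [polygon],
        # COCO bounding boxes are [x, y, width, height]
        'bbox': [x_min, y_min, max(xs) - x_min, max(ys) - y_min],
        'area': area,
        'iscrowd': 0,
    }
```

A full converter would also build the COCO `images` and `categories` lists and handle the `accessories` and `shoes` attributes.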

#### <font color='red'>The following cells contain code to manipulate the VIA JSON file with annotations. Customize or delete the code according to your needs.</font>

In [None]:
# MERGE JSONS: annotations were saved in several VIA JSON files; this code merges them into a single JSON
paths_DLCV = [
    '/Users/francescabianchessi/THESIS/ANNOTATING_DF2/DLCV_501.json',
    '/Users/francescabianchessi/THESIS/ANNOTATING_DF2/DLCV_502_1000.json',
    '/Users/francescabianchessi/THESIS/ANNOTATING_DF2/DLCV_1001_2009.json',
    '/Users/francescabianchessi/THESIS/ANNOTATING_DF2/DLCV_38000_38912.json'
]

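# Load every VIA project file, closing each file handle after reading
list_jsons = []
for p in paths_DLCV:
    with open(p) as f:
        list_jsons.append(json.load(f))

# Merge the per-image metadata; on duplicate keys, later files override earlier ones
z = {}
for js_part in list_jsons:
    z.update(js_part['_via_img_metadata'])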

new_js = {}
new_js['_via_settings'] = list_jsons[0]['_via_settings']
new_js['_via_settings']['project']['name'] = 'DLCV_merge'
new_js['_via_img_metadata'] = z
new_js['_via_attributes'] = list_jsons[0]['_via_attributes']
new_js['_via_data_format_version'] = list_jsons[0]['_via_data_format_version']
new_js['_via_image_id_list'] = sorted([l for j in list_jsons for l in j['_via_image_id_list']])

with open('DLCV_VIA_merge_pre_delete.json', 'w') as f:
    json.dump(new_js, f)

In [None]:
# REMOVE DATA: remove unannotated images still included in the merged JSON file
with open('/Users/francescabianchessi/THESIS/ANNOTATING_DF2/DLCV_merge_pre_delate.json') as f:
    js = json.load(f)
# Keep only the images that have at least one annotated region
meta_data = {k: v for k, v in js['_via_img_metadata'].items() if len(v['regions']) > 0}
image_list = [name for name in js['_via_image_id_list'] if name in meta_data]
new_js = {}
new_js['_via_settings'] = js['_via_settings']
new_js['_via_settings']['project']['name'] = 'DLCV_merge_post_delete'
new_js['_via_img_metadata'] = meta_data
new_js['_via_attributes'] = js['_via_attributes']
new_js['_via_data_format_version'] = js['_via_data_format_version']
new_js['_via_image_id_list'] = sorted(image_list)

with open('DLCV_merge_post_delete.json', 'w') as f:
    json.dump(new_js, f)

In [46]:
# TRACK annotation progress: save the annotations from VIA and check how many annotations you have done so far
path = '/Users/francescabianchessi/THESIS/ANNOTATING_DF2/via_test_reviewed.json'
j = json.load(open(path)) #'_via_img_metadata', '_via_attributes', '_via_data_format_version', '_via_image_id_list'

masks_count = []
for k, v in j['_via_img_metadata'].items():
    masks_count.append((int(k.split('.jpg')[0]), len(j['_via_img_metadata'][k]['regions'])))

print(sum([t[1] for t in masks_count]))

ids_segmented = []
for t in masks_count:
    if t[1] > 0:
        ids_segmented.append(t[0])
len(ids_segmented)

1893


500

In [None]:
# Check that every annotated region has at least one category attribute set;
# print the filename of any image with a region that has none
for d in j['_via_img_metadata'].values():
    for r in d['regions']:
        attrs = r.get('region_attributes', {})
        if not (attrs.get('clothing') or attrs.get('accessories') or attrs.get('shoes')):
            print(d['filename'])

## TRAIN - VALIDATION SET

### 1. Data Collection and Cleaning: DeepFashion2 

DeepFashion2 (DF2) is the only publicly available fashion dataset in the instance segmentation literature with both wild and street photos. The street photos are retrieved from online shopping/fashion websites and are referred to as "shop" images. The wild photos are pictures taken by consumers and are referred to as "user" images. The reason DF2 also contains wild images is that the final goal of the researchers behind this dataset was consumer-to-shop clothes retrieval. 

Idea: create a training/validation set with both street and wild images from DF2 to perform instance segmentation on IG wild test set.

DF2 makes available 191,961 images, a part of the original training set, and 32,153 images, half of the original validation set. All of them are annotated with the 13 categories defined by the DF2 authors. For a deeper analysis of this dataset, see ```summary_datasets.ipynb```.  
This project considers 30 categories, hence the selected DF2 images have been re-annotated according to the new categorization. 
The images to re-annotate were retrieved from the 32k validation set because its smaller size makes it easier to unzip and manipulate. 


### 2. Data Annotation 

Annotator: [VGG Image Annotator (VIA)](https://www.robots.ox.ac.uk/~vgg/software/via/)  
Download: [Link](https://www.robots.ox.ac.uk/~vgg/software/via/downloads/via-2.0.11.zip)

Of the 32k images available for the train/validation set, only 8,500 have been annotated. The annotation tool used in this project is the Visual Geometry Group Image Annotator (VIA), software that runs offline in most modern web browsers.

During the annotation phase, the following guidelines were applied: 
1. Skip images/instances if the clothes are not recognizable to a human; 
2. Skip images/instances if the clothes are too heavily occluded;
3. Skip images/instances if the clothes are too small (precise segmentation is impossible);
4. As per DeepFashion2, include in the segmentation polygon the area of the item that is occluded by another item or by a part of the human body.

The VIA annotation tool returns annotations in VIA format. However, the segmentation model needs as input annotations in COCO format. To convert the annotation file from VIA to COCO use the function in ```converters.ipynb```.

#### <font color='red'>The following cells contain code to track the progress made during annotation.</font>

#### USERS

In [2]:
# AIM: TRACK - Track annotation's progress
c = '/Users/francescabianchessi/THESIS/ANNOTATING_DF2/df2_user_6510.json'
jcon = json.load(open(c)) #'_via_img_metadata', '_via_attributes', '_via_data_format_version', '_via_image_id_list'
print(len(jcon['_via_image_id_list']), len(jcon['_via_img_metadata']))

masks_count = []
limit = 6510 # ID of last image you have annotated
removed = 0
for k,v in tqdm.tqdm(jcon['_via_img_metadata'].items()):
    if int(k) <= limit:
        try:  # if the first region has no 'clothing' attribute, assume the others have none either
            v['regions'][0]['region_attributes']['clothing']
            masks_count.append((int(k.split('.jpg')[0]), len(v['regions'])))
        except (IndexError, KeyError):  # no regions, or no 'clothing' attribute
            removed += 1

num_masks = sum([t[1] for t in masks_count])
done = limit - removed
print("""
| Tot    | {}   |
| No Use | {}   |
| Done   | {}   | {:.2%} | {:.2%}

| # Masks      | {}
| AVG(# Masks) | {}
""".format(limit, removed, done, done/limit, done/5500, num_masks, num_masks/done ))

5511 5511


100%|██████████| 5511/5511 [00:00<00:00, 181449.17it/s]


| Tot    | 6510   |
| No Use | 6   |
| Done   | 6504   | 99.91% | 118.25%

| # Masks      | 8822
| AVG(# Masks) | 1.3563960639606396






#### SHOP

In [12]:
# AIM: TRACK - Track annotation's progress
s = '/Users/francescabianchessi/THESIS/ANNOTATING_DF2/post_delete.json'
jshop = json.load(open(s)) #'_via_img_metadata', '_via_attributes', '_via_data_format_version', '_via_image_id_list'
print(len(jshop['_via_image_id_list']), len(jshop['_via_img_metadata']))

masks_count = []
limit = 3001 # ID of last image you have annotated
removed = 0
for k,v in tqdm.tqdm(jshop['_via_img_metadata'].items()):
    if int(k) <= limit:
        try:  # if the first region has no 'clothing' attribute, assume the others have none either
            v['regions'][0]['region_attributes']['clothing']
            masks_count.append((int(k.split('.jpg')[0]), len(v['regions'])))
        except (IndexError, KeyError):  # no regions, or no 'clothing' attribute
            removed += 1

num_masks = sum([t[1] for t in masks_count])
done = limit - removed
print("""
| Tot    | {}   |
| No Use | {}   |
| Done   | {}   | {:.2%} | {:.2%}

| # Masks      | {}
| AVG(# Masks) | {}
""".format(limit, removed, done, done/limit, done/5500, num_masks, num_masks/done ))

3001 3001


100%|██████████| 3001/3001 [00:00<00:00, 347940.80it/s]


| Tot    | 3001   |
| No Use | 0   |
| Done   | 3001   | 100.00% | 54.56%

| # Masks      | 6902
| AVG(# Masks) | 2.299900033322226




