# Prepare Mask Datasets From Kaggle

## 1. Convert Dataset Format to VIA

- **Face Mask Detection Dataset**  
<https://www.kaggle.com/wobotintelligence/face-mask-detection-dataset>

In [1]:
!python "./datasets/Non-standard/Mask/Face Mask Detection Dataset/to_via.py"
!ls -l "./datasets/Non-standard/Mask/Face Mask Detection Dataset/images/via_region_data.json"

Convert completed.
-rw-rw-r-- 1 wzt wzt 5902672 Feb 22 21:58 './datasets/Non-standard/Mask/Face Mask Detection Dataset/images/via_region_data.json'


- **Face Mask Detection**  
<https://www.kaggle.com/andrewmvd/face-mask-detection>

In [2]:
!python "./datasets/Non-standard/Mask/Face Mask Detection/to_via.py"
!ls -l "./datasets/Non-standard/Mask/Face Mask Detection Dataset/images/via_region_data.json"

Convert completed.
-rw-rw-r-- 1 wzt wzt 5902672 Feb 22 21:58 './datasets/Non-standard/Mask/Face Mask Detection Dataset/images/via_region_data.json'
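
The `to_via.py` scripts are not shown here, so the following is only a minimal sketch of the kind of entry a `via_region_data.json` file typically holds, assuming the VIA 2.x layout (entries keyed by file name plus file size) and the `class` region attribute used later in this notebook. The helper name `make_via_entry` and the sample values are hypothetical.

```python
import json


def make_via_entry(filename, size_bytes, boxes):
    """Build one VIA entry from (class_name, x, y, width, height) tuples.

    Hypothetical helper: it illustrates the JSON layout only and is not
    taken from the to_via.py scripts.
    """
    regions = [
        {
            "shape_attributes": {
                "name": "rect",
                "x": x,
                "y": y,
                "width": w,
                "height": h,
            },
            "region_attributes": {"class": cls},
        }
        for cls, x, y, w, h in boxes
    ]
    # Assumption: VIA 2.x keys each entry by file name followed by file size.
    key = f"{filename}{size_bytes}"
    entry = {
        "filename": filename,
        "size": size_bytes,
        "regions": regions,
        "file_attributes": {},
    }
    return key, entry


key, entry = make_via_entry("0001.jpg", 12345, [("face_with_mask", 10, 20, 50, 60)])
print(json.dumps({key: entry}, indent=2))
```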


## 2. Split Each Dataset into Training and Validation Sets

- **Face Mask Detection Dataset**  
<https://www.kaggle.com/wobotintelligence/face-mask-detection-dataset>

In [5]:
!python "./datasets/Non-standard/Mask/Face Mask Detection Dataset/train_val_split.py"
!ls -l "./datasets/tmp/00_VIA-mask"

Split completed.
total 104
drwxrwxr-x 2 wzt wzt 81920 Apr  8 23:44 train
drwxrwxr-x 2 wzt wzt 20480 Apr  8 23:44 val


- **Face Mask Detection**  
<https://www.kaggle.com/andrewmvd/face-mask-detection>

In [6]:
!python "./datasets/Non-standard/Mask/Face Mask Detection/train_val_split.py"
!ls -l "./datasets/tmp/01_VIA-mask"

Split completed.
total 40
drwxrwxr-x 2 wzt wzt 36864 Apr  8 23:46 train
drwxrwxr-x 2 wzt wzt  4096 Apr  8 23:46 val
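
The `train_val_split.py` scripts are not reproduced in this notebook. As a rough illustration of the idea, a reproducible split over the entries of a VIA dictionary could look like the sketch below. The function name `split_via`, the 20% validation fraction, and the fixed seed are all assumptions, not values taken from those scripts.

```python
import random


def split_via(via_data, val_fraction=0.2, seed=0):
    """Split a VIA dict (key -> entry) into train and val dicts.

    Hypothetical sketch: val_fraction and seed are assumed values.
    """
    keys = sorted(via_data)               # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(keys)
    n_val = int(len(keys) * val_fraction)
    val_keys = set(keys[:n_val])
    train = {k: v for k, v in via_data.items() if k not in val_keys}
    val = {k: v for k, v in via_data.items() if k in val_keys}
    return train, val


demo = {f"img{i}.jpg": {"filename": f"img{i}.jpg", "regions": []} for i in range(10)}
train, val = split_via(demo)
print(len(train), len(val))
```

Each image ends up in exactly one of the two subsets, so the validation set stays disjoint from the training set.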


## 3. Merge Two Datasets

In [7]:
import os
import via


src_root1 = "./datasets/tmp/00_VIA-mask"
src_root2 = "./datasets/tmp/01_VIA-mask"
dst_root = "./datasets/tmp/02_VIA-mask"

for train_or_val in ["train", "val"]:
    # src
    src_img_dir1 = os.path.join(src_root1, train_or_val)
    src_img_dir2 = os.path.join(src_root2, train_or_val)
    
    src_via_dataset1 = via.ViaDataset(os.path.join(src_img_dir1, 'via_region_data.json'))
    src_via_dataset2 = via.ViaDataset(os.path.join(src_img_dir2, 'via_region_data.json'))
    
    # dst
    dst_img_dir = os.path.join(dst_root, train_or_val)
    print("Merging...")
    os.makedirs(dst_img_dir, exist_ok=True)
    via.merge_images(src_img_dir1, src_via_dataset1, src_img_dir2, src_via_dataset2, dst_img_dir)
    print("Merge completed")

Merging...
Merge completed
Merging...
Merge completed


In [32]:
!ls -l "./datasets/tmp/02_VIA-mask"

total 152
drwxrwxr-x 2 wzt wzt 135168 Feb 22 23:51 train
drwxrwxr-x 2 wzt wzt  20480 Feb 22 23:51 val
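
How `via.merge_images` handles two images with the same name is not visible here. The sketch below shows one cautious way to merge two VIA dictionaries at the JSON level, refusing to overwrite on a key collision. The function name `merge_via` and the fail-on-duplicate policy are assumptions and may differ from what the `via` module does.

```python
def merge_via(data1, data2):
    """Merge two VIA dicts, raising if the same key appears in both.

    Hypothetical sketch; the real via.merge_images may rename or
    overwrite duplicates instead.
    """
    merged = dict(data1)
    for key, entry in data2.items():
        if key in merged:
            raise ValueError(f"duplicate VIA key: {key}")
        merged[key] = entry
    return merged


a = {"a.jpg100": {"filename": "a.jpg", "regions": []}}
b = {"b.jpg200": {"filename": "b.jpg", "regions": []}}
print(sorted(merge_via(a, b)))
```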


## 4. Add Person Class

In [3]:
import os
import cv2
import numpy as np
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
import via


ckpt_dir = '/home/wzt/PFD/person_detection/checkpoint'

def add_person_class(img_dir, via_dataset):
    """Append a 'person' rect region for every person detected in each annotated image."""
    # The COCO keypoint R-CNN model detects people only, so every predicted box is a person.
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_X_101_32x8d_FPN_3x.yaml"))
    cfg.MODEL.WEIGHTS = os.path.join(ckpt_dir, "keypoint_rcnn_X_101_32x8d_FPN_3x.pkl")
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.9  # keep only high-confidence detections
    predictor = DefaultPredictor(cfg)

    for anno in via_dataset.annotations:
        filepath = os.path.join(img_dir, anno.filename)
        im = cv2.imread(filepath)  # None if the file is missing or unreadable
        if im is None:
            continue
        outputs = predictor(im)

        regions = anno.regions
        for box in outputs["instances"].pred_boxes:
            # Each box is (x1, y1, x2, y2); VIA rects store x, y, width, height.
            x1, y1, x2, y2 = box.cpu().tolist()
            regions.append({
                'shape_attributes': {
                    'name': 'rect',
                    'x': x1,
                    'y': y1,
                    'width': x2 - x1,
                    'height': y2 - y1
                },
                'region_attributes': {'class': 'person'}
            })
    return via_dataset
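
The conversion inside the loop, from a corner-format box `(x1, y1, x2, y2)` to a VIA rect with `x`, `y`, `width`, and `height`, can be isolated and checked without a model. The sketch below does that with a hypothetical helper, `xyxy_to_via_rect`. Unlike the function above, it rounds to integer pixel coordinates, which is an assumption about what downstream tools prefer rather than something this notebook does.

```python
def xyxy_to_via_rect(box, class_name="person"):
    """Convert an (x1, y1, x2, y2) box to a VIA rect region.

    Hypothetical helper. Rounding to integers is an added assumption;
    the notebook's add_person_class keeps the raw float values.
    """
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    return {
        "shape_attributes": {
            "name": "rect",
            "x": x1,
            "y": y1,
            "width": x2 - x1,
            "height": y2 - y1,
        },
        "region_attributes": {"class": class_name},
    }


region = xyxy_to_via_rect([10.2, 20.7, 110.6, 220.1])
print(region["shape_attributes"])
```

Computing the width and height from the already rounded corners keeps `x + width` equal to the rounded right edge.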

In [4]:
# root
src_root = "./datasets/tmp/02_VIA-mask"
dst_root = "./datasets/tmp/03_VIA-person_and_mask"

for train_or_val in ["train", "val"]:
    # src
    src_img_dir = os.path.join(src_root, train_or_val)
    src_json_file = os.path.join(src_img_dir, 'via_region_data.json')
    
    # dst
    dst_img_dir = os.path.join(dst_root, train_or_val)
    dst_json_file = os.path.join(dst_img_dir, 'via_region_data.json')
    os.makedirs(dst_img_dir, exist_ok=True)

    # add
    print("Adding...")
    via_dataset = via.ViaDataset(src_json_file)
    add_person_class(src_img_dir, via_dataset)
    via_dataset.save(dst_json_file)
    via.copy_images(src_img_dir, via_dataset, dst_img_dir)
    print("Completed")

Adding...
Completed
Adding...
Completed


In [5]:
!ls -l "./datasets/tmp/03_VIA-person_and_mask"
!ls -l "./datasets/tmp/03_VIA-person_and_mask/train/via_region_data.json"
!ls -l "./datasets/tmp/03_VIA-person_and_mask/val/via_region_data.json"

total 152
drwxrwxr-x 2 wzt wzt 131072 Apr 20 02:10 train
drwxrwxr-x 2 wzt wzt  20480 Apr 20 02:13 val
-rw-rw-r-- 1 wzt wzt 10937763 Apr 20 02:10 ./datasets/tmp/03_VIA-person_and_mask/train/via_region_data.json
-rw-rw-r-- 1 wzt wzt 1934510 Apr 20 02:13 ./datasets/tmp/03_VIA-person_and_mask/val/via_region_data.json


## 5. Filter Out Unused Categories

In [36]:
import os
import via


# root
src_root = "./datasets/tmp/03_VIA-person_and_mask"
dst_root = "./datasets/tmp/04_VIA-person_and_mask"

# Source class name -> target class name
categories_map = {
    "face_with_mask": "face_with_mask",
    "with_mask": "face_with_mask",
    "mask_weared_incorrect": "face_with_mask",
    "person": "person",
}

for train_or_val in ["train", "val"]:
    # src
    src_img_dir = os.path.join(src_root, train_or_val)
    src_json_file = os.path.join(src_img_dir, 'via_region_data.json')
    
    # dst
    dst_img_dir = os.path.join(dst_root, train_or_val)
    dst_json_file = os.path.join(dst_img_dir, 'via_region_data.json')
    
    print("Filtering...")
    via_dataset = via.ViaDataset(src_json_file)
    via_dataset.map_class(categories_map)
    
    os.makedirs(dst_img_dir, exist_ok=True)
    via_dataset.save(dst_json_file)
    via.copy_images(src_img_dir, via_dataset, dst_img_dir)
    print("Filter completed")


Filtering...
Filter completed
Filtering...
Filter completed


In [37]:
!ls -l "./datasets/tmp/04_VIA-person_and_mask/train/via_region_data.json"
!ls -l "./datasets/tmp/04_VIA-person_and_mask/val/via_region_data.json"

-rw-rw-r-- 1 wzt wzt 2603603 Feb 23 00:09 ./datasets/tmp/04_VIA-person_and_mask/train/via_region_data.json
-rw-rw-r-- 1 wzt wzt 1315751 Feb 23 00:09 ./datasets/tmp/04_VIA-person_and_mask/val/via_region_data.json
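
The implementation of `ViaDataset.map_class` is not shown. Given the section title, a plausible reading is that it renames classes according to `categories_map` and drops regions whose class is not listed. The sketch below implements that reading on a plain list of VIA regions. The function name `map_classes` and the drop-if-unlisted behavior are assumptions.

```python
def map_classes(regions, categories_map):
    """Rename region classes via categories_map and drop unlisted classes.

    Hypothetical sketch of what a class-mapping step might do; it is
    not the via module's actual implementation.
    """
    kept = []
    for region in regions:
        cls = region["region_attributes"].get("class")
        if cls not in categories_map:
            continue  # assumed behavior: unlisted classes are filtered out
        new_region = dict(region)
        new_region["region_attributes"] = {
            **region["region_attributes"],
            "class": categories_map[cls],
        }
        kept.append(new_region)
    return kept


regions = [
    {"shape_attributes": {}, "region_attributes": {"class": "with_mask"}},
    {"shape_attributes": {}, "region_attributes": {"class": "hat"}},
]
mapping = {"with_mask": "face_with_mask", "person": "person"}
print([r["region_attributes"]["class"] for r in map_classes(regions, mapping)])
```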


## 6. Convert Person Dataset (COCO) to Person Dataset (VIA)

In [3]:
import os
import via

dst_root = './datasets/tmp/05_VIA-person/'
category_to_class = {0: 'person'}

for train_or_val in ["train", "val"]:
    # src
    coco_img_dir = "/home/wzt/PFD/COCO/{}2017".format(train_or_val)
    coco_json_file = "/home/wzt/PFD/COCO/{}2017_person.json".format(train_or_val)

    # dst
    dst_img_dir = os.path.join(dst_root, train_or_val)
    dst_json_file = os.path.join(dst_img_dir, 'via_region_data.json')
    
    # Load the COCO JSON file
    print("Loading...")
    via_dataset = via.ViaDataset()
    via_dataset.load_coco(coco_json_file, coco_img_dir, category_to_class)
    print("COCO json file loaded")
    
    # Copy the images referenced by this JSON file to the destination directory
    print("Copying...")
    os.makedirs(dst_img_dir, exist_ok=True)
    via_dataset.save(dst_json_file)
    via.copy_images(coco_img_dir, via_dataset, dst_img_dir)
    print("COCO images copied")


Loading...
COCO json file loaded
Copying...
COCO images copied
Loading...
COCO json file loaded
Copying...
COCO images copied


In [4]:
!ls -l "./datasets/tmp/05_VIA-person/train/via_region_data.json"
!ls -l "./datasets/tmp/05_VIA-person/val/via_region_data.json"

-rw-rw-r-- 1 wzt wzt 602967362 Apr  5 12:10 ./datasets/tmp/05_VIA-person/train/via_region_data.json
-rw-rw-r-- 1 wzt wzt 25037769 Apr  5 12:10 ./datasets/tmp/05_VIA-person/val/via_region_data.json
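
`ViaDataset.load_coco` belongs to the local `via` module and its internals are not shown. The core of such a conversion can be sketched as follows: COCO stores each box as `[x, y, width, height]`, which maps directly onto a VIA rect. The function name `coco_to_via`, keying entries by file name only, the `-1` size placeholder, and keeping images that have no matching annotations are all assumptions made for this illustration.

```python
from collections import defaultdict


def coco_to_via(coco, category_to_class):
    """Convert a loaded COCO dict to a VIA-style dict keyed by file name.

    Hypothetical sketch; annotations whose category_id is not in
    category_to_class are skipped.
    """
    regions = defaultdict(list)
    for ann in coco["annotations"]:
        cls = category_to_class.get(ann["category_id"])
        if cls is None:
            continue
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        regions[ann["image_id"]].append({
            "shape_attributes": {"name": "rect", "x": x, "y": y, "width": w, "height": h},
            "region_attributes": {"class": cls},
        })
    return {
        img["file_name"]: {
            "filename": img["file_name"],
            "size": -1,  # placeholder: the file size is not known from the COCO JSON
            "regions": regions.get(img["id"], []),
            "file_attributes": {},
        }
        for img in coco["images"]
    }


coco = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "annotations": [{"image_id": 1, "category_id": 0, "bbox": [1, 2, 3, 4]}],
}
print(coco_to_via(coco, {0: "person"})["a.jpg"]["regions"])
```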


## 7. Merge Person Dataset (VIA) and Person_and_Mask Dataset (VIA)

In [44]:
import os
import via


src_root1 = "./datasets/tmp/04_VIA-person_and_mask"
src_root2 = "./datasets/tmp/05_VIA-person"
dst_root = "./datasets/tmp/06_VIA-person_and_mask"

for train_or_val in ["train", "val"]:
    # src
    src_img_dir1 = os.path.join(src_root1, train_or_val)
    src_img_dir2 = os.path.join(src_root2, train_or_val)
    
    src_via_dataset1 = via.ViaDataset(os.path.join(src_img_dir1, 'via_region_data.json'))
    src_via_dataset2 = via.ViaDataset(os.path.join(src_img_dir2, 'via_region_data.json'))
    
    # dst
    dst_img_dir = os.path.join(dst_root, train_or_val)
    print("Merging...")
    os.makedirs(dst_img_dir, exist_ok=True)
    via.merge_images(src_img_dir1, src_via_dataset1, src_img_dir2, src_via_dataset2, dst_img_dir)
    print("Merge completed")

Merging...
Merge completed
Merging...
Merge completed


In [47]:
!ls -l "./datasets/tmp/06_VIA-person_and_mask/train/via_region_data.json"
!ls -l "./datasets/tmp/06_VIA-person_and_mask/val/via_region_data.json"

-rw-rw-r-- 1 wzt wzt 605570963 Feb 23 00:43 ./datasets/tmp/06_VIA-person_and_mask/train/via_region_data.json
-rw-rw-r-- 1 wzt wzt 26353518 Feb 23 00:43 ./datasets/tmp/06_VIA-person_and_mask/val/via_region_data.json
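
File sizes alone say little about whether the merge produced the expected class balance. One quick sanity check is to count regions per class in the merged JSON. The sketch below does so with the standard library only; the function name `count_classes` is hypothetical, and it assumes the `class` region attribute used throughout this notebook. It accepts `regions` stored either as a list (VIA 2.x) or as a dict (VIA 1.x).

```python
from collections import Counter


def count_classes(via_data):
    """Count regions per class in a loaded VIA dict (hypothetical helper)."""
    counts = Counter()
    for entry in via_data.values():
        regions = entry.get("regions", [])
        if isinstance(regions, dict):  # VIA 1.x stores regions as a dict
            regions = regions.values()
        for region in regions:
            counts[region["region_attributes"].get("class", "<none>")] += 1
    return counts


demo = {
    "a.jpg": {"regions": [
        {"region_attributes": {"class": "person"}},
        {"region_attributes": {"class": "face_with_mask"}},
    ]},
    "b.jpg": {"regions": {"0": {"region_attributes": {"class": "person"}}}},
}
print(count_classes(demo))
```

To run it on the merged output, load the file with `json.load` and pass the resulting dictionary to `count_classes`.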
