Skip to content

gsisaoki/FSErasing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

67 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

facesyn_sample

FSErasing: Improving Face Recognition with Data Augmentation Using Face Parsing

Introduction

We proposes a data augmentation method, called Face Semantic Erasing (FSErasing), for face recognition using face parsing. Face recognition models are trained with face images erased random face semantic regions such as hair, cheek, forehead, nose, and eye. We also propose the original face semantic labels with 25 classes, which include 9 additional classes: right_cheek, left_cheek, right_chin, left_chin, right_forehead, left_forehead, middle_forehead, around_right_eye, around_left_eye.

This repository contains the following used for the results in our paper:

  • implementation of FSErasing
  • implementation of the visualization method for face recognition models using face parsing, which called Face Semantic Class Activation Mapping (FS-CAM) in this repositoty
  • our original semantic labels with 25 classes for detailed face parsing

Requirements

  • Python 3.x (recommended >= 3.8.8)
  • numpy (recommended >= 1.19.2)
  • pytorch (recommemded >= 1.8.1)
  • torchvision (recommended >= 0.9.1)
  • pandas (recommended >= 1.2.4)
  • opencv-python (recommended >= 4.5.1.48)
  • scipy (recommended >= 1.6.2)
  • scikit-learn (recommended >= 0.24.1)
  • tqdm (recommended >= 4.60.0)
  • matplotlib (recommended >= 3.4.1)
  • scikit-image (recommended >= 0.18.1, only required for cam.rise.RISE)
  • kornia (recommended >= 0.5.8, only required for cam.groupcam.GroupCAM )

Dataset

Downloading the dataset

You can download the detailed face semantic labels (with 25 classes) for FaceSynthetics dataset 1 from the link below.

Google Drive (739MB, unzip: 1.1GB)

Note that the face images and landmark labels are NOT included in our distributed files. They are available for download at the official GitHub repository of FaceSynthetics (full dataset of 100,000 images).

Dataset layout

The detailed face semantic labels are contained in a single .zip file. We recommend that move the unzipped files and folders: labels_25, labels_10, anno_list.csv into the folder downloaded at the official repository of FaceSynthetics.

detailed_facesynthetics.zip
├── labels_25
|   └── {frame_id}_seg.png   # Segmentation image, where each pixel has an integer value mapping to the categories below (0 to 24)
├── labels_10
|   └── {frame_id}_seg.png   # Segmentation image, where each pixel has an integer value mapping to the categories below (0 to 9)
└── anno_list.csv            # .csv file, described the frame ID and abailability of our detailed labels with 25 classes

The .csv file has table data with 2 columns and 100,000 rows, like the following.

frame_id with_25
0 1
1 1
... ...
10 0
11 1
12 0
... ...
99999 1

Our detailed labels are automatically annotated based on the 468 landmarks estimated using Face Mesh (Google Mediapipe) 2, and there are 11,916 images for which annotation failed due to landmark detection errors or other reasons. More information is available from our paper. Then, you can get the list of paths of images and detailed semantic labels with 25 classes by running the following commands.

import pandas as pd

df = pd.read_csv('anno_list.csv')
id_list = df[df['with_25'] == 1]['frame_id'].values

image_paths = [f'./images/{x:06d}.png' for x in id_list]         # list of paths of 88,084 images
label_paths = [f'./labels_25/{x:06d}_seg.png' for x in id_list]   # list of paths of 88,084 detailed labels with 25 classes

Class index assignment

The int value of each pixel in the segmentation image assigned accorsing to the following table.

value (class ID) class name
0 Background
1 Right_cheek
2 Left_cheek
3 Right_chin
4 Left_chin
5 Right_forehead
6 Left_forehead
7 Middle_forehead
8 Around_right_eye
9 Around_left_eye
10 Nose
11 Right_eye
12 Left_eye
13 Right_blow
14 Left_blow
15 Right_ear
16 Left_ear
17 Mouth
18 Upper_lip
19 Lower_lip
20 Neck
21 Hair
22 Clothing
23 Glasses
24 Headware

Pre-trained models

Face parsing model

The pre-trained face parsing model is available from the link below.

Google Drive (443MB)

The network architecture is based on U-Net 3, which encoder is replaced ResNet-18 4. The model is trained using 88,084 face images in FaceSythetics dataset 1 with our detailed semantic labels. If you want to obtain more details of experimental conditions, please check Section 5.1.2 in our paper.

Note that the images and labels used for training are aligned using similarity transformation based on 5 facial landmarks and size of 112 × 112 pixels. The alignment method is followed the general one for face recognition, such as introduced at insightface/recognition/arcface_torch/eval_ijbc.py.

Face recognition model

The pre-trained face recognition model is available from the link below.

Google Drive (309MB)

We use ResNet-34 as the network architecture, which is improved version of ResNet by the authors of ArcFace paper 5 and suitable for face recognition using a smaller input image than that in general image recognition tasks.

Sample codes

FSErasing sample

coming soon

FS-CAM sample

coming soon

Footnotes

  1. E. Wood, T. Baltrusaitis, C. Hewitt, S. Dziadzio, T.J. Cashman, and J. Shotton, "Fake It Till You Make It: Face analysis in the wild using synthetic data alone," Proc. Int'l Conf. Computer Vision (ICCV), pp. 3681--3691, Oct. 2021. 2

  2. Y. Kartynnik, A, Ablavatski, I. Grishchenko, and M. Grundmann, "Real-time facial surface geometry from monocular video on mobile GPUs," arXiv, abs/1907.06724, Jun. 2019.

  3. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," Proc. Int'l Conf. Medical Image Computing and Computer Assisted Intervention, Springer, LNCS, vol. 9351, pp. 234--241, Oct. 2015.

  4. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 770--778, Jun. 2016.

  5. J. Deng, J. Guo, and S. Zafeiriou, "ArcFace: Additive angular margin loss for deep face recognition," arXiv, abs/1801.07698v1, Jan. 2018.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages