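The entire training and inference pipeline is implemented with the Detectron2 library from FAIR (Facebook AI Research).  
The model is the Keypoint R-CNN pretrained in Detectron2, fine-tuned on the competition data.  
For the final submission, the models trained this way that scored in the 34 range on the public leaderboard were ensembled.  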

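The COCO keypoint dataset format stores, in addition to each keypoint's x and y coordinates, a visibility value v that indicates whether the keypoint is visible in the image. When an augmentation cropped a keypoint out of the image, its v value was therefore set to 0. Detectron2 also uses bounding-box coordinates for the keypoint detection task, so the bounding box was taken as the region spanned by the minimum and maximum x and y coordinates of the given keypoints.  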

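Training on the original dataset alone already gave a fairly good score, so after finding a suitable learning rate and iteration count, the focus shifted to augmentation to improve performance further.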

[Training]
1. keypoint_rcnn_X_101_32x8d_FPN_3x(Pretrained)
2. Learning Rate: 0.001
3. Iteration: 10,000

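[Preprocessing]
1. Applied a variety of augmentations with the Albumentations library, mainly Crop and Rotate.
2. Read the x, y coordinates from the csv file and converted them to (x, y, v) to match the COCO keypoint dataset annotation format.
3. Set v to 0 for any keypoint that an augmentation cropped out of the image.
4. Because the bounding-box region is also used in training, built the bounding box from the minimum and maximum x, y coordinates of the keypoints.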
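
Steps 2 to 4 above can be sketched in isolation as follows. This is a minimal illustration, not the code used for the submission: the function name `to_coco_annotation` and the sample coordinates are hypothetical, and it assumes a keypoint counts as cropped out when it falls outside `[0, width] x [0, height]`. As in the description above, the box is taken over all keypoints, including those marked v=0.

```python
def to_coco_annotation(xy_flat, width, height):
    """Convert [x1, y1, x2, y2, ...] into COCO-style [x, y, v, ...] plus an XYXY box."""
    xs, ys = xy_flat[0::2], xy_flat[1::2]
    keypoints = []
    for x, y in zip(xs, ys):
        inside = 0 <= x <= width and 0 <= y <= height
        # v=2: labeled and visible, v=0: not labeled (here: cropped out of the image).
        keypoints.extend([x, y, 2 if inside else 0])
    # Box spanned by the minimum and maximum keypoint coordinates.
    bbox = [min(xs), min(ys), max(xs), max(ys)]
    return keypoints, bbox


# Three hypothetical keypoints; the second one has a negative x after a crop.
kps, box = to_coco_annotation([10.0, 20.0, -5.0, 40.0, 30.0, 15.0], width=100, height=50)
print(kps)
print(box)
```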

[Postprocessing]
1. For some images no keypoints are detected at inference time; these rows are first filled with 0 and then filled with another model's values during ensembling.
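
The write-up does not spell out how the ensemble was computed, so the following is only a sketch under one assumption: predictions are averaged across models, and a row that a model left as all zeros is treated as missing and skipped. The function name `ensemble_ignore_missing` and the toy numbers are hypothetical.

```python
import numpy as np


def ensemble_ignore_missing(predictions):
    """Average per-model predictions, skipping rows a model left as all zeros.

    predictions: array-like of shape (n_models, n_images, n_values).
    """
    preds = np.asarray(predictions, dtype=float)
    detected = np.any(preds != 0, axis=2)        # (n_models, n_images)
    weights = detected[..., None].astype(float)  # broadcast over the value axis
    counts = weights.sum(axis=0)                 # (n_images, 1): models that detected the image
    summed = (preds * weights).sum(axis=0)       # (n_images, n_values)
    # Rows that no model detected stay at zero.
    return np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)


model_a = [[1.0, 2.0], [0.0, 0.0]]  # second image: nothing detected, filled with 0
model_b = [[3.0, 4.0], [5.0, 6.0]]
result = ensemble_ignore_missing([model_a, model_b])
print(result)
```

With this scheme a zero-filled row never drags the average toward the image origin; it is simply replaced by whatever the remaining models predicted.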


## Environment

Detectron2 does not currently officially support Windows, so it must either be used on Linux or installed from a GitHub repository that has been modified for Windows.

## Code

### Augmentation

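The Albumentations library was used; it is reported to be faster than alternatives such as the transforms provided by PyTorch, and it can also transform keypoints together with the image.  
The transforms are predefined in transform_dict, and different augmentation combinations were tried by simply adding their string keys to the list below, measuring model performance for each.
The augmented images are saved to a separate folder and used from there.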

In [None]:
import os

import numpy as np
import pandas as pd
from tqdm import tqdm
import cv2

import albumentations as A

keypoint_params = A.KeypointParams(format="xy", label_fields=["class_labels"], remove_invisible=False, angle_in_degrees=True)
transform_dict = {
    "Original": A.Compose([A.RandomCrop(height=1080, width=1920, p=1)], keypoint_params=keypoint_params),
    "CenterCrop_1": A.Compose([A.CenterCrop(height=720, width=1280, p=1)], keypoint_params=keypoint_params),
    "CenterCrop_2": A.Compose([A.CenterCrop(height=960, width=960, p=1)], keypoint_params=keypoint_params),
    "RandomCrop_1": A.Compose([A.RandomCrop(height=540, width=720, p=1)], keypoint_params=keypoint_params),
    "RandomCrop_2": A.Compose([A.RandomCrop(height=720, width=960, p=1)], keypoint_params=keypoint_params),
    "RandomCrop_3": A.Compose([A.RandomCrop(height=960, width=1280, p=1)], keypoint_params=keypoint_params),
    "RandomCrop_4": A.Compose([A.RandomCrop(height=720, width=1280, p=1)], keypoint_params=keypoint_params),
    "RandomSquare_1": A.Compose([A.RandomCrop(height=960, width=960, p=1)], keypoint_params=keypoint_params),
    "RandomSquare_2": A.Compose([A.RandomCrop(height=720, width=720, p=1)], keypoint_params=keypoint_params),
    "RandomSquare_3": A.Compose([A.RandomCrop(height=540, width=540, p=1)], keypoint_params=keypoint_params),
    "RandomSquare_4": A.Compose([A.RandomCrop(height=420, width=420, p=1)], keypoint_params=keypoint_params),
    "Rotate45": A.Compose([A.Rotate(limit=45, p=1)], keypoint_params=keypoint_params),
    "Rotate45_CenterCrop": A.Compose([A.Rotate(limit=45, p=1), A.CenterCrop(height=720, width=1280, p=1)], keypoint_params=keypoint_params),
    "Rotate45_RandomCrop_1": A.Compose([A.Rotate(limit=45, p=1), A.RandomCrop(height=540, width=720, p=1)], keypoint_params=keypoint_params),
    "Rotate45_RandomCrop_2": A.Compose([A.Rotate(limit=45, p=1), A.RandomCrop(height=720, width=960, p=1)], keypoint_params=keypoint_params),
    "Rotate45_RandomCrop_3": A.Compose([A.Rotate(limit=45, p=1), A.RandomCrop(height=960, width=960, p=1)], keypoint_params=keypoint_params),
    "Rotate45_RandomCrop_4": A.Compose([A.Rotate(limit=45, p=1), A.RandomCrop(height=720, width=1280, p=1)], keypoint_params=keypoint_params),
    "Random_ScaleCrop": A.Compose([A.RandomCrop(height=960, width=960, p=1), A.RandomScale(scale_limit=0.35, always_apply=True)], keypoint_params=keypoint_params,),
    "RandomBrightnessContrast": A.Compose([A.RandomBrightnessContrast(always_apply=True)], keypoint_params=keypoint_params),
    "Rescale_RandomCrop_1": A.Compose([A.RandomScale(scale_limit=0.3, p=1), A.RandomCrop(height=720, width=960, p=1)], keypoint_params=keypoint_params),
    "Rescale_RandomCrop_2": A.Compose([A.RandomScale(scale_limit=0.3, p=1), A.RandomCrop(height=540, width=720, p=1)], keypoint_params=keypoint_params),
    "Rescale_RandomCrop_3": A.Compose([A.RandomScale(scale_limit=0.3, p=1), A.RandomCrop(height=720, width=1280, p=1)], keypoint_params=keypoint_params),
    "Rescale_CenterCrop": A.Compose([A.RandomScale(scale_limit=0.3, p=1), A.CenterCrop(height=720, width=1280, p=1)], keypoint_params=keypoint_params),
    "Rescale_Rotate45_RandomCrop_1": A.Compose([A.Rotate(limit=45, p=1), A.RandomScale(scale_limit=0.3, p=1), A.RandomCrop(height=720, width=960, p=1)], keypoint_params=keypoint_params,),
    "Rescale_Rotate45_RandomCrop_2": A.Compose([A.Rotate(limit=45, p=1), A.RandomScale(scale_limit=0.3, p=1), A.RandomCrop(height=540, width=720, p=1)], keypoint_params=keypoint_params,),
    "Rescale_Rotate45_RandomCrop_3": A.Compose([A.Rotate(limit=45, p=1), A.RandomScale(scale_limit=0.3, p=1), A.RandomCrop(height=720, width=1280, p=1)], keypoint_params=keypoint_params,),
    "Rescale_Rotate45_CenterCrop": A.Compose([A.Rotate(limit=45, p=1), A.RandomScale(scale_limit=0.3, p=1), A.CenterCrop(height=720, width=1280, p=1)], keypoint_params=keypoint_params,),
}


def main():
    data_path = "./data"
    src_path = os.path.join(data_path, "original")
    src_image_path = os.path.join(src_path, "train_imgs")
    src_df = pd.read_csv(os.path.join(src_path, "original.csv"))

    keypoints_labels = list(map(lambda x: x[:-2], src_df.columns[1:].tolist()[::2]))
    image_list = src_df.iloc[:, 0].to_numpy()
    keypoints_list = src_df.iloc[:, 1:].to_numpy()
    paired_keypoints_list = []
    for keypoint in keypoints_list:
        a_keypoints = []
        for i in range(0, keypoint.shape[0], 2):
            a_keypoints.append((float(keypoint[i]), float(keypoint[i + 1])))
        paired_keypoints_list.append(a_keypoints)
    paired_keypoints_list = np.array(paired_keypoints_list)

    dst_name = "augmented_3"
    dst_path = os.path.join(data_path, dst_name)
    dst_image_path = os.path.join(dst_path, "train_imgs")

    os.makedirs(dst_path, exist_ok=True)
    os.makedirs(dst_image_path, exist_ok=True)

    augmented_image_list = []
    augmented_keypoints_list = []
    for image_name, paired_keypoints in tqdm(zip(image_list, paired_keypoints_list)):
        src_image = cv2.imread(os.path.join(src_image_path, image_name))

        transform_names = [
            "Original",
            "RandomCrop_1",
            "RandomCrop_2",
            "RandomSquare_1",
            "CenterCrop_1",
            "Rotate45",
            "Rotate45_CenterCrop",
            "Rotate45_RandomCrop_1",
            "Rotate45_RandomCrop_2",
            "Rotate45_CenterCrop",
            "Rescale_RandomCrop_1",
            "Rescale_RandomCrop_2",
            "Rescale_RandomCrop_3",
            "Rescale_CenterCrop",
            "Rescale_Rotate45_RandomCrop_1",
            "Rescale_Rotate45_RandomCrop_2",
            "Rescale_Rotate45_RandomCrop_3",
            "Rescale_Rotate45_CenterCrop",
        ]

        for transform_name in transform_names:
            augmented = transform_dict[transform_name](
                image=src_image, keypoints=paired_keypoints, class_labels=keypoints_labels
            )
            augmented_image = augmented["image"]
            augmented_keypoints = np.array(augmented["keypoints"]).flatten()
            augmented_name = f"{transform_name}_{image_name}"

            cv2.imwrite(os.path.join(dst_image_path, augmented_name), augmented_image)
            augmented_image_list.append(augmented_name)
            augmented_keypoints_list.append(augmented_keypoints)

    dst_df = pd.DataFrame(columns=src_df.columns)
    dst_df["image"] = augmented_image_list
    dst_df.iloc[:, 1:] = augmented_keypoints_list
    dst_df.to_csv(os.path.join(dst_path, dst_name + ".csv"), index=False)


if __name__ == "__main__":
    main()

3250it [17:30,  3.06it/s]

### Utils

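Functions needed for building the dataset, saving results, and so on are collected in one module and imported where needed.  

**train_val_split**: Splits the data into training and validation sets. Because augmented and original images are mixed together, it is implemented so that the validation set contains only original images.

**get_data_dicts**: The function Detectron2 calls to build the dataset. Detectron2 consumes data as a list of dictionaries, so this function fills in the expected keys with the corresponding data and returns the result.

**draw_keypoints** and **save_samples**: After the trained model has run inference on the test images, these functions pick random images from the results, draw the predicted keypoints on them, and save the visualizations.

**fix_random_seed**: Fixes the random seed.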
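
The idea behind the validation split described above can be shown with a simplified standalone sketch. It is not the function used below: the name `split_originals_for_validation`, the `n_val_per_group` parameter, and the file names are hypothetical, and it holds out a fixed number of images per group, whereas train_val_split draws two random indices per group. It assumes, as in the augmentation script, that saved files are prefixed with the transform name.

```python
import random
from collections import defaultdict


def split_originals_for_validation(file_names, n_val_per_group=1, seed=42):
    """Hold out a few un-augmented images per group; everything else is for training."""
    groups = defaultdict(list)
    for name in file_names:
        # Group key: the file name without its last '-'-separated token.
        groups[name.rsplit("-", 1)[0]].append(name)

    rng = random.Random(seed)
    train, valid = [], []
    for key, names in groups.items():
        held_out = set()
        if key.startswith("Original"):
            # Only groups of original images may contribute validation samples.
            held_out = set(rng.sample(names, min(n_val_per_group, len(names))))
        for name in names:
            (valid if name in held_out else train).append(name)
    return train, valid


names = [
    "Original_a-1-001.jpg",
    "Original_a-1-002.jpg",
    "RandomCrop_1_a-1-001.jpg",
    "RandomCrop_1_a-1-002.jpg",
]
train, valid = split_originals_for_validation(names)
print(valid)
```

Keeping augmented images out of the validation set means the validation score is measured on the same kind of images as the test set.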

In [None]:
import os
import random
import numpy as np
import pandas as pd
from tqdm import tqdm
import cv2

import neptune

import torch

from detectron2.structures import BoxMode
from detectron2.engine import HookBase
from detectron2.utils.events import get_event_storage


def train_val_split(imgs, keypoints, random_state=42):
    d = dict()
    for file in imgs:
        key = ''.join(file.split('-')[:-1])

        if key not in d.keys():
            d[key] = [file]
        else:
            d[key].append(file)
            
    np.random.seed(random_state)
    trains = []
    validations = []
    for key, value in d.items():
        r = np.random.randint(len(value), size=2)
        for i in range(len(value)):
            if "Origin" in key and i in r:
                validations.append(np.where(imgs == value[i])[0][0])
            else:
                trains.append(np.where(imgs == value[i])[0][0])
    return (
        imgs[trains], imgs[validations],
        keypoints[trains], keypoints[validations]
    )


def get_data_dicts(data_dir, imgs, keypoints):
    # train_dir = os.path.join(data_dir, "augmented" if phase=="train" else "train_imgs")
    train_dir = os.path.join(data_dir, "train_imgs")
    dataset_dicts = []

    for idx, item in tqdm(enumerate(zip(imgs, keypoints))):
        img, keypoint = item[0], item[1]

        record = {}
        filepath = os.path.join(train_dir, img)
        record["height"], record["width"] = cv2.imread(filepath).shape[:2]
        record["file_name"] = filepath
        record["image_id"] = idx

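        # COCO-style triplets (x, y, v): v=2 for a keypoint inside the image,
        # v=0 for one that an augmentation moved outside of it.
        # Coordinates are used as-is; COCO-format integer pixel indices would need +0.5.
        keypoints_v = []
        for x_, y_ in zip(keypoint[0::2], keypoint[1::2]):
            inside = 0 <= x_ <= record["width"] and 0 <= y_ <= record["height"]
            keypoints_v.extend([x_, y_, 2 if inside else 0])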

        x = keypoint[0::2]
        y = keypoint[1::2]
        x_min, x_max = min(x), max(x)
        y_min, y_max = min(y), max(y)

        obj = {"bbox": [x_min, y_min, x_max, y_max], "bbox_mode": BoxMode.XYXY_ABS, "category_id": 0, "keypoints": keypoints_v}

        record["annotations"] = [obj]
        dataset_dicts.append(record)

    return dataset_dicts


def draw_keypoints(image, keypoints, color=(0, 0, 255), diameter=5):
    keypoints_ = keypoints.copy()
    if len(keypoints_) == 48:
        keypoints_ = [[keypoints_[i], keypoints_[i + 1]] for i in range(0, len(keypoints_), 2)]

    assert isinstance(image, np.ndarray), "image argument is not a numpy array."
    image_ = np.copy(image)
    for x, y in keypoints_:
        cv2.circle(image_, (int(x), int(y)), diameter, color, -1)

    return image_


def save_samples(dst_path, image_path, csv_path, mode="random", size=None, index=None):
    df = pd.read_csv(csv_path)

    if mode == "random":
        assert size is not None, "mode argument is random, but size argument is not given."
        choice_idx = np.random.choice(len(df), size=size, replace=False)
    if mode == "choice":
        assert index is not None, "mode argument is choice, but index argument is not given."
        choice_idx = index

    for idx in choice_idx:
        image_name = df.iloc[idx, 0]
        # Plain float array so draw_keypoints can index it by position.
        keypoints = df.iloc[idx, 1:].to_numpy(dtype=float)
        image = cv2.imread(os.path.join(image_path, image_name), cv2.IMREAD_COLOR)

        combined = draw_keypoints(image, keypoints)
        cv2.imwrite(os.path.join(dst_path, "sample" + image_name), combined)


def fix_random_seed(random_seed=423):
    torch.manual_seed(random_seed)
    torch.cuda.manual_seed(random_seed)
    torch.cuda.manual_seed_all(random_seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(random_seed)
    random.seed(random_seed)

### Trainer

The Trainer used with Detectron2 subclasses DefaultTrainer with a few changes.

In [None]:
# Some basic setup:
# Setup detectron2 logger
from detectron2.utils.logger import setup_logger

# import some common libraries
import os

# import some common detectron2 utilities
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultTrainer

from detectron2.evaluation import COCOEvaluator, DatasetEvaluators

setup_logger()


class Trainer(DefaultTrainer):
    """
    We use the "DefaultTrainer" which contains a number pre-defined logic for
    standard training workflow. They may not work for you, especially if you
    are working on a new research project. In that case you can use the cleaner
    "SimpleTrainer", or write your own training loop.
    """

    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        """
        Create evaluator(s) for a given dataset.
        This uses the special metadata "evaluator_type" associated with each builtin dataset.
        For your own dataset, you can simply create an evaluator manually in your
        script and do not have to worry about the hacky if-else logic here.
        """
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        evaluator_list = []
        evaluator_type = MetadataCatalog.get(dataset_name).evaluator_type
        if evaluator_type in ["coco", "coco_panoptic_seg"]:
            evaluator_list.append(COCOEvaluator(dataset_name, cfg, True, output_folder))
        if len(evaluator_list) == 0:
            raise NotImplementedError("no Evaluator for the dataset {} with the type {}".format(dataset_name, evaluator_type))
        if len(evaluator_list) == 1:
            return evaluator_list[0]
        return DatasetEvaluators(evaluator_list)

### Train

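This code trains the model and then runs inference.  
Detectron2 includes a horizontal flip transform by default, so keypoint_flip_map was added to the metadata so that left/right keypoints are handled correctly when an image is flipped. 
The learning rate was set to 0.001 and the number of iterations to 10000.
The COCO keypoint evaluation scores predictions with OKS (Object Keypoint Similarity) during validation. Using sigma values approximated from the original COCO keypoint sigmas made no difference to the results compared with using 1 for every keypoint, so 1 was used.  
When running the trained model on the test images, a few images occasionally yield no usable keypoints; those rows are first filled with 0 and later filled with predictions from another model.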
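
To make the role of the OKS sigmas concrete, here is a small sketch of the OKS score for a single pose, following the COCO definition with per-keypoint constant k_i = 2 * sigma_i and the object area as the squared scale. The function name `oks` and the sample values are hypothetical; the actual evaluation is done by the COCO evaluator, not by this function.

```python
import numpy as np


def oks(pred_xy, gt_xy, visibility, area, sigmas):
    """Object Keypoint Similarity between one predicted and one ground-truth pose."""
    pred_xy = np.asarray(pred_xy, dtype=float)   # (K, 2)
    gt_xy = np.asarray(gt_xy, dtype=float)       # (K, 2)
    sigmas = np.asarray(sigmas, dtype=float)     # (K,), one sigma per keypoint
    labeled = np.asarray(visibility) > 0         # only labeled keypoints count
    if not labeled.any():
        return 0.0
    d2 = ((pred_xy - gt_xy) ** 2).sum(axis=1)    # squared pixel distances
    variances = (2.0 * sigmas) ** 2              # k_i^2 with k_i = 2 * sigma_i
    e = d2 / (2.0 * area * variances)
    return float(np.exp(-e)[labeled].mean())


gt = [[50.0, 50.0], [60.0, 40.0]]
pred = [[53.0, 54.0], [60.0, 40.0]]              # first keypoint is 5 px off
perfect = oks(gt, gt, [2, 2], area=100.0, sigmas=[1.0, 1.0])
lenient = oks(pred, gt, [2, 2], area=100.0, sigmas=[1.0, 1.0])
strict = oks(pred, gt, [2, 2], area=100.0, sigmas=[0.05, 0.05])
print(perfect, lenient, strict)
```

A larger sigma makes the score more tolerant of the same pixel error, so sigma = 1 is a very lenient setting compared with sigmas well below 1.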

In [None]:
import os

import numpy as np
import pandas as pd
import cv2
from tqdm import tqdm

from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog, DatasetCatalog

from utils import train_val_split, get_data_dicts, save_samples
from Trainer import Trainer


def main():
    data_name = "augmented_2"
    data_path = os.path.join("./data", data_name)
    csv_name = data_name + ".csv"
    train_df = pd.read_csv(os.path.join(data_path, csv_name))

    keypoint_names = list(map(lambda x: x[:-2], train_df.columns.to_list()[1::2]))
    keypoint_flip_map = [
        ("left_eye", "right_eye"),
        ("left_ear", "right_ear"),
        ("left_shoulder", "right_shoulder"),
        ("left_elbow", "right_elbow"),
        ("left_wrist", "right_wrist"),
        ("left_hip", "right_hip"),
        ("left_knee", "right_knee"),
        ("left_ankle", "right_ankle"),
        ("left_palm", "right_palm"),
        ("left_instep", "right_instep"),
    ]

    image_list = train_df.iloc[:, 0].to_numpy()
    keypoints_list = train_df.iloc[:, 1:].to_numpy()
    train_imgs, valid_imgs, train_keypoints, valid_keypoints = train_val_split(image_list, keypoints_list, random_state=42)

    image_set = {"train": train_imgs, "valid": valid_imgs}
    keypoints_set = {"train": train_keypoints, "valid": valid_keypoints}

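    hyper_params = {
        "augmented_ver": data_name,
        "learning_rate": 0.001,
        "num_epochs": 10000,  # Passed to cfg.SOLVER.MAX_ITER, so this counts iterations, not epochs.
        "batch_size": 256,  # Passed to cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE (RoIs per image).
        "description": "Final training"
    }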

    for phase in ["train", "valid"]:
        DatasetCatalog.register(
            "keypoints_" + phase, lambda phase=phase: get_data_dicts(data_path, image_set[phase], keypoints_set[phase])
        )
        MetadataCatalog.get("keypoints_" + phase).set(thing_classes=["human"])
        MetadataCatalog.get("keypoints_" + phase).set(keypoint_names=keypoint_names)
        MetadataCatalog.get("keypoints_" + phase).set(keypoint_flip_map=keypoint_flip_map)
        MetadataCatalog.get("keypoints_" + phase).set(evaluator_type="coco")

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_X_101_32x8d_FPN_3x.yaml"))
    cfg.DATASETS.TRAIN = ("keypoints_train",)
    cfg.DATASETS.TEST = ("keypoints_valid",)
    cfg.DATALOADER.NUM_WORKERS = 16  # On Windows environment, this value must be 0.
    cfg.SOLVER.IMS_PER_BATCH = 2  # Images per iteration; up to IMS_PER_BATCH * ROI_HEADS.BATCH_SIZE_PER_IMAGE RoIs are sampled per iteration.
    cfg.SOLVER.BASE_LR = hyper_params["learning_rate"]  # Learning Rate.
    cfg.SOLVER.MAX_ITER = hyper_params["num_epochs"]  # Max iteration.
    cfg.SOLVER.GAMMA = 0.8
    cfg.SOLVER.STEPS = [3000, 4000, 5000, 6000, 7000, 8000]  # The iteration number to decrease learning rate by GAMMA.
    # cfg.SOLVER.LR_SCHEDULER_NAME = "WarmupMultiStepLR"
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_X_101_32x8d_FPN_3x.yaml")
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = hyper_params["batch_size"]  # Number of RoIs sampled per image to train the ROI heads.
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
    cfg.MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS = 24
    cfg.TEST.KEYPOINT_OKS_SIGMAS = np.ones(24, dtype=float).tolist()  # Flat list with one sigma per keypoint.
    cfg.TEST.EVAL_PERIOD = 5000  # Evaluation would occur for every cfg.TEST.EVAL_PERIOD value.
    cfg.OUTPUT_DIR = os.path.join("./output", data_name)

    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
    trainer = Trainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()

    # Inference should use the config with parameters that are used in training
    # cfg now already contains everything we've set previously. We changed it a little bit for inference:
    cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # set a custom testing threshold
    predictor = DefaultPredictor(cfg)

    test_dir = os.path.join("data", "test_imgs")
    test_list = os.listdir(test_dir)
    test_list.sort()
    except_list = []

    files = []
    preds = []
    for file in tqdm(test_list):
        filepath = os.path.join(test_dir, file)
        # print(filepath)
        im = cv2.imread(filepath)
        outputs = predictor(im)
        outputs = outputs["instances"].to("cpu").get("pred_keypoints").numpy()
        files.append(file)
        pred = []
        try:
            for out in outputs[0]:
                pred.extend([float(e) for e in out[:2]])
        except IndexError:
            pred.extend([0] * 48)
            except_list.append(filepath)
        preds.append(pred)

    df_sub = pd.read_csv("./data/sample_submission.csv")
    df = pd.DataFrame(columns=df_sub.columns)
    df["image"] = files
    df.iloc[:, 1:] = preds

    df.to_csv(os.path.join(cfg.OUTPUT_DIR, f"{data_name}_submission.csv"), index=False)
    if except_list:
        print(
            "No keypoints were detected in the following images. The corresponding rows are filled with 0."
        )
        print(*except_list)
    save_samples(cfg.OUTPUT_DIR, test_dir, os.path.join(cfg.OUTPUT_DIR, f"{data_name}_submission.csv"), mode="random", size=5)


if __name__ == "__main__":
    main()
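
The keypoint_flip_map registered in the training script tells Detectron2 which keypoint names trade places under a horizontal flip. What that means for the data can be sketched with plain NumPy; the function name `hflip_keypoints` and the three-keypoint example are hypothetical, and this is an illustration of the idea rather than Detectron2's own implementation.

```python
import numpy as np


def hflip_keypoints(keypoints, image_width, names, flip_map):
    """Mirror (K, 3) keypoints [x, y, v] horizontally and swap left/right slots."""
    keypoints = np.asarray(keypoints, dtype=float)
    index = {name: i for i, name in enumerate(names)}
    order = np.arange(len(names))
    for left, right in flip_map:
        # After the flip, the "left" slot must hold what used to be "right", and vice versa.
        order[index[left]], order[index[right]] = index[right], index[left]
    flipped = keypoints.copy()
    flipped[:, 0] = image_width - flipped[:, 0]  # mirror the x coordinate
    return flipped[order]


names = ["nose", "left_eye", "right_eye"]
flip_map = [("left_eye", "right_eye")]
kps = [[50, 10, 2], [60, 8, 2], [42, 9, 2]]
flipped = hflip_keypoints(kps, image_width=100, names=names, flip_map=flip_map)
print(flipped)
```

Without the swap, a flipped image would label the eye on the person's right side as "left_eye", which is why unpaired keypoints such as the nose are left out of the map.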
