
CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation

Yunyao Mao, Wengang Zhou, Zhenbo Lu, Jiajun Deng, and Houqiang Li

Accepted by ECCV 2022 (Oral). [Paper Link]

This repository contains the official Python (PyTorch) implementation of CMD.

Abstract

In 3D action recognition, there exists rich complementary information between skeleton modalities. Nevertheless, how to model and utilize this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. Different from classic distillation solutions that transfer the knowledge of a fixed and pre-trained teacher to the student, in this work, the knowledge is continuously updated and bidirectionally distilled between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. On the one hand, the neighboring similarity distribution is introduced to model the knowledge learned in each modality, where the relational information is naturally suitable for the contrastive frameworks. On the other hand, asymmetrical configurations are used for teacher and student to stabilize the distillation process and to transfer high-confidence information between modalities. By derivation, we find that the cross-modal positive mining in previous works can be regarded as a degenerated version of our CMD. We perform extensive experiments on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II datasets. Our approach outperforms existing self-supervised methods and sets a series of new records.
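For intuition, below is a minimal, self-contained PyTorch sketch of one direction of this distillation: each modality's knowledge is modeled as a softmax distribution of similarities to the anchors in its memory bank, and a KL divergence pulls the student's distribution toward the (sharper, lower-temperature) teacher's. This is an illustrative sketch, not the code in this repository; all names (neighbor_log_probs, distill, q_joint, k_motion, the bank tensors), the temperature values, and the memory-bank handling are assumptions.

    import torch
    import torch.nn.functional as F

    def neighbor_log_probs(z, bank, tau):
        # Cosine similarities between the embeddings z (B x D) and every
        # anchor in a modality's memory bank (K x D), turned into a
        # log-distribution over the K anchors via a temperatured softmax.
        z = F.normalize(z, dim=1)
        bank = F.normalize(bank, dim=1)
        return F.log_softmax(z @ bank.t() / tau, dim=1)

    def distill(z_student, bank_student, z_teacher, bank_teacher,
                tau_student=0.10, tau_teacher=0.05):
        # One direction of the distillation: KL divergence pulling the
        # student's neighboring similarity distribution toward the
        # teacher's. The teacher temperature is the smaller (sharper) of
        # the two, mirroring the asymmetric teacher/student configuration.
        # The two banks are assumed index-aligned, i.e. they store keys of
        # the same past samples in the same order, so the distributions
        # are defined over the same set of anchors.
        log_p_student = neighbor_log_probs(z_student, bank_student, tau_student)
        with torch.no_grad():  # gradients reach the student branch only
            p_teacher = neighbor_log_probs(z_teacher, bank_teacher, tau_teacher).exp()
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

    # Bidirectional use between two skeleton modalities, e.g. joint and motion,
    # with queries q_* from the query encoders and keys k_* from the key encoders:
    # loss = distill(q_joint, bank_joint, k_motion, bank_motion) + \
    #        distill(q_motion, bank_motion, k_joint, bank_joint)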

Requirements

python==3.8.13
torch==1.8.1+cu111
torchvision==0.9.1+cu111
tensorboard==2.9.0
scikit-learn==1.1.1
tqdm==4.64.0
numpy==1.22.4
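Assuming a CUDA 11.1 environment, one way to install the pinned dependencies is via pip, pulling the CUDA builds of torch and torchvision from PyTorch's stable wheel index:

    pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
    pip install tensorboard==2.9.0 scikit-learn==1.1.1 tqdm==4.64.0 numpy==1.22.4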

Data Preprocessing

Please refer to skeleton-contrast

Training and Testing

Please refer to the bash scripts

Pretrained Models

NTU-60 and NTU-120: pretrained_models

Citation

If you find this work useful for your research, please consider citing our work:

@inproceedings{Mao_2022_CMD,
    title={CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation},
    author={Mao, Yunyao and Zhou, Wengang and Lu, Zhenbo and Deng, Jiajun and Li, Houqiang},
    booktitle={European Conference on Computer Vision (ECCV)},
    year={2022}
}

Acknowledgment

The framework of our code is based on skeleton-contrast.
