facebookresearch/dmae_st

Skeleton-based action recognition

Repository for learning pose-aware video representations for downstream tasks.

Repository structure

Code is stored in src/, and all runnable scripts live in that directory. This is an in-development repository that will contain multiple different experiments. The scripts for each experiment type are differentiated by the first word of the script name; for example, all experiments related to masked autoencoders (mae) are named mae_{script_name}.py.

Bash scripts containing the settings for each of these scripts are kept in scripts/. Some are runnable Python files that submit SLURM jobs via submitit; others call a script in scripts/launch_scripts that submits the SLURM job.

Experiments create and store all of their outputs in a new directory under experiments/.

Current experiment types

(1) Transformer baseline action prediction (not under development)

Prefix: tsfmr_
Experiment files: src/tsfmr

Code is written from scratch, except for the transformer model implementation, which is borrowed from minGPT.

Predicts action labels from 3D pose information on NTURGB+D.

(2) Masked autoencoders for video representations

Prefix: mae_
Experiment files: src/mae

Code is adapted from D36893220 to run on the AWS cluster, without internal build tools.

Pretrains a representation using masked autoencoders on video data, which is then used for action classification on the K400 dataset.

How to run:

bash scripts/mae_pretrain.sh
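Video masked-autoencoder pretraining hides most patches and reconstructs them; a common choice is "tube" masking, where one spatial mask is shared across all frames of a clip. The sketch below is an illustrative NumPy implementation of that idea, not the code this repository uses.

```python
import numpy as np

def random_tube_mask(num_frames, num_patches, mask_ratio, rng):
    """Sample one spatial keep/mask decision and repeat it over time
    ("tube" masking). Returns a boolean array of shape
    (num_frames, num_patches) where True marks a masked patch."""
    num_masked = int(num_patches * mask_ratio)
    order = rng.permutation(num_patches)
    masked = np.zeros(num_patches, dtype=bool)
    masked[order[:num_masked]] = True
    # The same spatial mask is broadcast across every frame.
    return np.broadcast_to(masked, (num_frames, num_patches)).copy()

mask = random_tube_mask(num_frames=8, num_patches=196, mask_ratio=0.9,
                        rng=np.random.default_rng(0))
```

With a 14x14 patch grid (196 patches) and a 0.9 mask ratio, 176 patches per frame are hidden, and every frame hides the same ones.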

(3) PoseConv3D

Prefix: pc3d_
Experiment files: src/pc3d

Ideas are taken from the PoseConv3D paper. Currently, this contains only a data-processing procedure that extracts bounding boxes from a video dataset.

(4) Directed Masked Autoencoders

Prefix: dmae_
Experiment files: src/dmae

All new files, models, experiments, and utilities that enable directed masking for a masked-autoencoder approach to video pretraining.
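"Directed" masking, as opposed to uniform random masking, biases which patches are hidden. One plausible formulation, sketched below as an assumption rather than this repository's implementation, samples masked patches with probability proportional to a per-patch importance score (e.g. a pose or person heatmap pooled over each patch).

```python
import numpy as np

def directed_mask(scores, mask_ratio, rng):
    """Sample patches to mask with probability proportional to a
    non-negative importance score, so masking is directed toward
    salient regions. `scores` is a 1-D array with one value per
    patch; returns a boolean mask (True = masked)."""
    num_patches = scores.shape[0]
    num_masked = int(num_patches * mask_ratio)
    probs = scores / scores.sum()
    chosen = rng.choice(num_patches, size=num_masked,
                        replace=False, p=probs)
    masked = np.zeros(num_patches, dtype=bool)
    masked[chosen] = True
    return masked
```

With uniform scores this reduces to ordinary random masking; with scores concentrated on person regions, those patches are masked far more often, forcing the decoder to reconstruct them.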

Environment setup

All required packages are listed in requirements.txt.

Notes

  1. torchvision needs to be built from source with the ffmpeg backend to properly decode video data for masked-autoencoder data loading. Follow the torchvision installation instructions in the section regarding the Video Backend.
