[Project] Add Actionclip Project #2470

Dai-Wenxun · 2023-05-11T04:08:52Z

ActionCLIP Project

ActionCLIP: A New Paradigm for Video Action Recognition

Abstract

The canonical approach to video action recognition dictates a neural model to do a classic and standard 1-of-N majority vote task. They are trained to predict a fixed set of predefined categories, limiting their transferable ability on new datasets with unseen concepts. In this paper, we provide a new perspective on action recognition by attaching importance to the semantic information of label texts rather than simply mapping them into numbers. Specifically, we model this task as a video-text matching problem within a multimodal learning framework, which strengthens the video representation with more semantic language supervision and enables our model to do zero-shot action recognition without any further labeled data or parameters requirements. Moreover, to handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train, prompt and fine-tune". This paradigm first learns powerful representations from pre-training on a large amount of web image-text or video-text data. Then it makes the action recognition task to act more like pre-training problems via prompt engineering. Finally, it end-to-end fine-tunes on target datasets to obtain strong performance. We give an instantiation of the new paradigm, ActionCLIP, which not only has superior and flexible zero-shot/few-shot transfer ability but also reaches a top performance on general action recognition task, achieving 83.8% top-1 accuracy on Kinetics-400 with a ViT-B/16 as the backbone.

Usage

Setup Environment

Please refer to Installation to install MMAction2.

The commands below assume that your working directory is $MMACTION2/projects/actionclip.

Add the current folder to PYTHONPATH so that Python can find the project code. Run the following command in the current directory, and run it again each time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH
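To confirm that the export took effect, the variable can be inspected from Python. This is only a sketch: `is_on_pythonpath` is a hypothetical helper name and not part of MMAction2.

```python
import os


def is_on_pythonpath(directory, pythonpath=None):
    """Return True if `directory` is listed in a PYTHONPATH-style string.

    `is_on_pythonpath` is a hypothetical helper for illustration only.
    """
    if pythonpath is None:
        # Fall back to the value exported in the current shell.
        pythonpath = os.environ.get("PYTHONPATH", "")
    target = os.path.realpath(directory)
    # Entries are separated by os.pathsep (":" on Linux); skip empty ones.
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(os.path.realpath(entry) == target for entry in entries)


if __name__ == "__main__":
    # Run from $MMACTION2/projects/actionclip after the export above.
    print(is_on_pythonpath(os.getcwd()))
```

If this prints False in a fresh shell, the export has not been run in that shell yet.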

Data Preparation

Prepare the Kinetics400 dataset by following the instructions.

Create a symbolic link ./data in the current directory that points to $MMACTION2/data, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

ln -s ../../data ./data
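The same link can be created from Python, which may be convenient in a setup script. A minimal sketch, assuming a platform that supports symbolic links; `link_data_dir` is a hypothetical helper and not part of MMAction2.

```python
import os


def link_data_dir(source, link_name="data"):
    """Create `link_name` as a symlink to `source`, like `ln -s source link_name`.

    Hypothetical helper for illustration. If the link already exists,
    its current target is returned and nothing is changed.
    """
    if os.path.islink(link_name):
        return os.readlink(link_name)
    # Raises FileExistsError if `link_name` is a regular file or directory.
    os.symlink(source, link_name, target_is_directory=True)
    return os.readlink(link_name)


if __name__ == "__main__":
    # Equivalent to `ln -s ../../data ./data` in the project directory.
    print(link_data_dir("../../data", "data"))
```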

Testing commands

To test with a single GPU:

mim test mmaction configs/actionclip_vit-base-p32-res224-clip-pre_1x1x8_k400-rgb.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/actionclip_vit-base-p32-res224-clip-pre_1x1x8_k400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs using Slurm:

mim test mmaction configs/actionclip_vit-base-p32-res224-clip-pre_1x1x8_k400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION
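When testing is scripted, the three invocations above can be assembled programmatically. The following is a minimal sketch: `build_test_command` is a hypothetical helper, not part of MMAction2 or MIM, and it only emits the flags that appear in the commands above.

```python
import shlex

CONFIG = "configs/actionclip_vit-base-p32-res224-clip-pre_1x1x8_k400-rgb.py"


def build_test_command(checkpoint, launcher=None, gpus=8,
                       gpus_per_node=8, partition=None):
    """Return the argument list for one of the `mim test` commands above.

    Hypothetical helper; `launcher` is None (single GPU), "pytorch" or "slurm".
    """
    cmd = ["mim", "test", "mmaction", CONFIG, "--checkpoint", checkpoint]
    if launcher == "pytorch":
        cmd += ["--launcher", "pytorch", "--gpus", str(gpus)]
    elif launcher == "slurm":
        if partition is None:
            raise ValueError("the slurm launcher needs a partition name")
        cmd += ["--launcher", "slurm", "--gpus", str(gpus),
                "--gpus-per-node", str(gpus_per_node),
                "--partition", partition]
    elif launcher is not None:
        raise ValueError(f"unsupported launcher: {launcher!r}")
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command line; pass the list to subprocess.run to execute it.
    print(shlex.join(build_test_command("vit-b-32-8f.pth", launcher="pytorch")))
```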

Results

Kinetics400

frame sampling strategy	backbone	top-1 acc (%)	top-5 acc (%)	testing protocol	config	ckpt
1x1x8	ViT-B/32	77.6	93.8	8 clips x 1 crop	config	ckpt[1]
1x1x8	ViT-B/16	80.3	95.2	8 clips x 1 crop	config	ckpt[1]
1x1x16	ViT-B/16	81.1	95.6	16 clips x 1 crop	config	ckpt[1]
1x1x32	ViT-B/16	81.3	95.8	32 clips x 1 crop	config	ckpt[1]

[1] The models are ported from the ActionCLIP repo and tested on our data. Currently, only testing of ActionCLIP models is supported. Because our testing data differs, the reported test accuracy differs from that of the original repository (about one point lower on average). Please refer to this issue for more details.
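For readers unfamiliar with the two accuracy columns, the sketch below shows the usual definition of top-k accuracy on toy scores. It is a plain-Python illustration with made-up numbers, not the metric implementation used by MMAction2, and `topk_accuracy` is a name chosen here.

```python
def topk_accuracy(scores, targets, k=1):
    """Percentage of samples whose true class is among the k highest scores."""
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += target in ranked[:k]
    return 100.0 * hits / len(targets)


# Toy example: three samples, three classes (values are illustrative only).
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_targets = [1, 1, 2]
top1 = topk_accuracy(toy_scores, toy_targets, k=1)  # 2 of 3 samples correct
top2 = topk_accuracy(toy_scores, toy_targets, k=2)  # all 3 samples correct
```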

Zero-Shot Prediction

We offer two methods for zero-shot prediction: calling the ActionCLIP model directly, or going through the MMAction2 inference API. The test.mp4 can be downloaded from here.

# Method 1: call the ActionCLIP model directly.
import torch
import clip
from models.load import init_actionclip
from mmaction.utils import register_all_modules

register_all_modules(True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = init_actionclip('ViT-B/32-8', device=device)

video_anno = dict(filename='test.mp4', start_index=0)
video = preprocess(video_anno).unsqueeze(0).to(device)

template = 'The woman is {}'
labels = ['singing', 'dancing', 'performing']
text = clip.tokenize([template.format(label) for label in labels]).to(device)

with torch.no_grad():
    video_features = model.encode_video(video)
    text_features = model.encode_text(text)

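# L2-normalize both embeddings so that their dot product is a cosine similarity.
video_features /= video_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
# Scale the similarities by 100 and apply softmax to get per-label probabilities.
similarity = (100 * video_features @ text_features.T).softmax(dim=-1)
probs = similarity.cpu().numpy()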

print("Label probs:", probs)  # [[9.995e-01 5.364e-07 6.666e-04]]
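The last three steps of the snippet above (normalize, scale, softmax) can be traced by hand on toy vectors. The sketch below uses plain Python with made-up 2-D features instead of real model outputs; the function names are chosen here for illustration and are not part of the project code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def match_probs(video_feature, text_features, scale=100.0):
    """Cosine similarity between one video and each text, scaled, then softmax."""
    v = l2_normalize(video_feature)
    logits = []
    for text_feature in text_features:
        t = l2_normalize(text_feature)
        logits.append(scale * sum(a * b for a, b in zip(v, t)))
    return softmax(logits)


# Toy features (illustrative only): the first text is parallel to the video.
toy_probs = match_probs([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
```

Because the similarities are multiplied by 100 before the softmax, even a moderate gap in cosine similarity yields a very peaked distribution, which is consistent with the near one-hot probabilities printed above.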

# Method 2: use the MMAction2 inference API.
import mmengine
from mmaction.utils import register_all_modules
from mmaction.apis import inference_recognizer, init_recognizer

register_all_modules(True)

config_path = 'configs/actionclip_vit-base-p32-res224-clip-pre_1x1x8_k400-rgb.py'
checkpoint_path = 'https://download.openmmlab.com/mmaction/v1.0/projects/actionclip/actionclip_vit-base-p32-res224-clip-pre_1x1x8_k400-rgb/vit-b-32-8f.pth'
template = 'The woman is {}'
labels = ['singing', 'dancing', 'performing']

# Override the labels and the prompt template; by default the K400 label list is used.
config = mmengine.Config.fromfile(config_path)
config.model.labels_or_label_file = labels
config.model.template = template

model = init_recognizer(config=config, checkpoint=checkpoint_path)

pred_result = inference_recognizer(model, 'test.mp4')
probs = pred_result.pred_scores.item.cpu().numpy()
print("Label probs:", probs)  # [9.995e-01 5.364e-07 6.666e-04]
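Either method returns one probability per label, in the order of the `labels` list. A small sketch of turning that array into a ranked prediction follows; it reuses the probabilities from the comment above as literal values, and `rank_labels` is a hypothetical helper name.

```python
def rank_labels(labels, probs):
    """Pair each label with its probability, sorted from most to least likely."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


labels = ['singing', 'dancing', 'performing']
# Values copied from the example output above.
probs = [9.995e-01, 5.364e-07, 6.666e-04]

ranked = rank_labels(labels, probs)
best_label, best_prob = ranked[0]
print(f"Top prediction: {best_label} ({best_prob:.4f})")
```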

Citation

@article{wang2021actionclip,
  title={Actionclip: A new paradigm for video action recognition},
  author={Wang, Mengmeng and Xing, Jiazheng and Liu, Yong},
  journal={arXiv preprint arXiv:2109.08472},
  year={2021}
}

Conflicts: mmaction/apis/inference.py

codecov · 2023-05-11T04:21:30Z

Codecov Report

Patch and project coverage have no change.

Comparison is base (30c3380) 76.96% compared to head (6c0391c) 76.96%.

❗ Current head 6c0391c differs from pull request most recent head ad1eb31. Consider uploading reports for the commit ad1eb31 to get more accurate results

Additional details and impacted files

@@           Coverage Diff            @@
##           dev-1.x    #2470   +/-   ##
========================================
  Coverage    76.96%   76.96%           
========================================
  Files          159      159           
  Lines        12598    12598           
  Branches      2116     2116           
========================================
  Hits          9696     9696           
  Misses        2393     2393           
  Partials       509      509

Flag	Coverage Δ
unittests	`76.96% <0.00%> (ø)`

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
mmaction/apis/inference.py	`42.04% <0.00%> (ø)`


Dai-Wenxun and others added 21 commits May 11, 2023 12:04

update

8aeab48

update configs

39bbfa5

update model

263a010

update

b394a35

fix lint

522046e

Conflicts: mmaction/apis/inference.py

update demo

0b50ddd

fix lint

1c8854a

fix

a049180

fix lint

43808dc

update configs

b58028f

fix

4daa058

delete

c56e15a

Update README.md

2abb0a8

Update README.md

a13b81d

Update README.md

6584262

fix lint

0c7a2cc

fix readme

6746f73

fix load.py

b625eee

Update README.md

3e75049

update README

af78d1b

update README

3a55e96

mm-assistant bot assigned Dai-Wenxun May 11, 2023

Update README.md

ad1eb31

cir7 approved these changes May 15, 2023


cir7 merged commit b94df49 into open-mmlab:dev-1.x May 15, 2023
13 of 15 checks passed

Dai-Wenxun deleted the cp branch May 18, 2023 06:33
