Add Video Swin Transformer #2369
Conversation
Amazing work here 🎉 Thanks @innat! I still need to test presets.
The model looks very good overall, just some nits about exporting some layers and models.
```python
return {
    "videoswin_base_kinetics400": copy.deepcopy(
        backbone_presets["videoswin_base_kinetics400"]
    ),
```
The backbone base model has more than one checkpoint:
- kinetics-400-base (current)
- kinetics-400-base-imagenet22k
- kinetics-600-base-imagenet22k
- something-something-v2

How do we facilitate the `presets` method for all of these?
```python
def presets(cls):
    """Dictionary of preset names and configurations."""
    return {
        "videoswin_base_kinetics400": copy.deepcopy(
            backbone_presets["videoswin_base_kinetics400"]
        ),
        "videoswin_base_kinetics400_imagenet22k": copy.deepcopy(
            backbone_presets["videoswin_base_kinetics400_imagenet22k"]
        ),
        ...
    }
```
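One way to avoid repeating this boilerplate for every checkpoint would be to build the dictionary in a loop (a sketch only; the extra preset keys are hypothetical and would have to match the actual entries registered in `backbone_presets`, and `classproperty` is assumed to be the decorator already used for presets in this codebase):

```python
import copy

# Hypothetical preset keys mirroring the checkpoints listed above;
# they must match the real entries in backbone_presets.
BASE_PRESET_NAMES = (
    "videoswin_base_kinetics400",
    "videoswin_base_kinetics400_imagenet22k",
    "videoswin_base_kinetics600_imagenet22k",
    "videoswin_base_something_something_v2",
)

@classproperty
def presets(cls):
    """Dictionary of preset names and configurations."""
    return {
        name: copy.deepcopy(backbone_presets[name])
        for name in BASE_PRESET_NAMES
    }
```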
Summarizing the weight check:
- Backbones (tolerance 1e-4)
- Classifier (tolerance 1e-5)
- notebook-1 for kinetics-400 (tiny, small, base, base-imagenet22k)

@tirthasheshpatel @divyashreepathihalli Note: in notebook-1, the torchvision lib is used to load the video-swin API and the PyTorch weights they offer, whereas in notebook-2, the raw official code and weights are loaded.
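For reference, the core of such a numerical check might look like this (a minimal sketch; `keras_model` and `torch_model` stand in for the models loaded in the notebooks, and the tensor layout conversion may differ depending on which API is used):

```python
import numpy as np
import torch

# Dummy input clip: (batch, frames, height, width, channels).
video = np.random.rand(1, 32, 224, 224, 3).astype("float32")

keras_out = keras_model.predict(video)
with torch.no_grad():
    # torchvision's video models typically expect (batch, channels,
    # frames, height, width), so the input is transposed here; the
    # exact layout depends on the loaded model.
    torch_out = torch_model(
        torch.from_numpy(video).permute(0, 4, 1, 2, 3)
    ).numpy()

# Tolerances from the summary: 1e-4 for backbones, 1e-5 for the classifier.
np.testing.assert_allclose(keras_out, torch_out, atol=1e-4)
```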
ONNX

I noticed others also tried to export this model to ONNX format but failed and reported it to the official repo (see the tickets there). So I tried with this implementation using the torch backend, and it works as expected.

```python
model = VideoClassifier(
    backbone=backbone,
    num_classes=num_classes,
    activation=None,
    pooling='avg',
)
model.eval()

batch_size = 1
# Input to the model
x = torch.randn(batch_size, 32, 224, 224, 3, requires_grad=True)
torch_out = model(x)
```

Using the torch official guideline:

```python
torch.onnx.export(
    model,                        # model being run
    x,                            # model input (or a tuple for multiple inputs)
    "vswin.onnx",                 # where to save the exported model
    export_params=True,
    opset_version=10,
    do_constant_folding=True,
    input_names=['input'],        # the model's input names
    output_names=['output'],      # the model's output names
    dynamic_axes={
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'},
    },
)
```

```python
import onnx
import onnxruntime

def to_numpy(tensor):
    if tensor.requires_grad:
        tensor = tensor.detach()
    tensor = tensor.cpu()
    return tensor.numpy()

onnx_model = onnx.load("vswin.onnx")
onnx.checker.check_model(onnx_model)

ort_session = onnxruntime.InferenceSession(
    "vswin.onnx", providers=["CPUExecutionProvider"]
)

# Compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)
```

Logit checking:

```python
np.testing.assert_allclose(
    to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05
)
```
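Since `dynamic_axes` marks the batch dimension as dynamic, the exported graph can also be sanity-checked with a different batch size (a quick sketch reusing the session above; `num_classes` is whatever the classifier was built with):

```python
# Re-run with a larger batch to confirm the dynamic batch axis works.
x4 = torch.randn(4, 32, 224, 224, 3)
ort_outs_4 = ort_session.run(
    None, {ort_session.get_inputs()[0].name: to_numpy(x4)}
)
print(ort_outs_4[0].shape)  # expected: (4, num_classes)
```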
Let's move the video_swin layers into the model folder itself. Everything else LGTM!
Sorry, could you please elaborate?
All model-specific layers should be inside the model folder. Only generic layers go under the layers folder.
I think the test is failing due to an unrelated issue.
Thank you for this awesome contribution!!!
What does this PR do?
Fixes #2262
Who can review?
Anyone in the community is free to review the PR once the tests have passed.