Add support of MViTv2 video variants (#6373)

* Extending to support MViTv2
* Fix docs, mypy and linter
* Refactor the relative positional code.
* Code refactoring.
* Rename vars.
* Update docs.
* Replace assert with exception.
* Update docs.
* Minor refactoring.
* Remove the square input limitation.
* Moving methods around.
* Modify the shortcut in the attention layer.
* Add ported weights.
* Introduce a `residual_cls` config on the attention layer.
* Make the patch_embed kernel/padding/stride configurable.
* Apply changes from code-review.
* Remove stale todo.
pytorch · Aug 10, 2022 · 7e8186e
1 parent 6908129

Showing 4 changed files with 369 additions and 34 deletions.
diff --git a/docs/source/models/video_mvit.rst b/docs/source/models/video_mvit.rst
@@ -12,7 +12,7 @@ The MViT model is based on the
 Model builders
 --------------
 
-The following model builders can be used to instantiate a MViT model, with or
+The following model builders can be used to instantiate a MViT v1 or v2 model, with or
 without pre-trained weights. All the model builders internally rely on the
 ``torchvision.models.video.MViT`` base class. Please refer to the `source
 code
@@ -24,3 +24,4 @@ more details about this class.
     :template: function.rst
 
     mvit_v1_b
+    mvit_v2_s
diff --git a/test/expect/ModelTester.test_mvit_v2_s_expect.pkl b/test/expect/ModelTester.test_mvit_v2_s_expect.pkl
diff --git a/test/test_models.py b/test/test_models.py
@@ -309,6 +309,9 @@ def _check_input_backprop(model, inputs):
     "mvit_v1_b": {
         "input_shape": (1, 3, 16, 224, 224),
     },
+    "mvit_v2_s": {
+        "input_shape": (1, 3, 16, 224, 224),
+    },
 }
 # speeding up slow models:
 slow_models = [