Add VideoModelZoo models #1130
Conversation
This is looking good, thanks a lot for the PR, Bruno!
I have done a first pass and made a few comments.
Also, lint is failing; can you have a look?
```python
if stride != 1 or self.inplanes != planes * block.expansion:
    ds_stride = stride
    # 2D convolutions should not be downsampled along temporal axis
    if conv_builder.__name__ == "video_2d_conv":
```
This is not a pattern that I generally see in the PyTorch codebase.
What about the following: instead of having `conv_builder` be a function that returns an `nn.Module` instance, make it a class that inherits from `nn.Module`?
So you could have something like

```python
class Conv3dNoTemporal(nn.Conv3d):
    ...

class Conv3dTheOtherOne(nn.Conv3d):
    ...
```

and then, in this check, you do

```python
if isinstance(conv_builder, Conv3dNoTemporal):
    ds_stride = (1, stride, stride)
```

?
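For concreteness, here is a minimal, self-contained sketch of that class-based pattern. The constructor parameters, `Conv3dSimple`, and `downsample_stride` are illustrative names, not the PR's actual code; and since the builder is passed around as a class (it is instantiated later), `issubclass` rather than `isinstance` would be the appropriate check:

```python
import torch.nn as nn


class Conv3dNoTemporal(nn.Conv3d):
    """1x3x3 convolution; never strides along the temporal axis."""

    def __init__(self, in_planes, out_planes, stride=1, padding=1):
        super(Conv3dNoTemporal, self).__init__(
            in_planes, out_planes, kernel_size=(1, 3, 3),
            stride=(1, stride, stride),
            padding=(0, padding, padding), bias=False)


class Conv3dSimple(nn.Conv3d):
    """3x3x3 convolution; strides along time and space alike."""

    def __init__(self, in_planes, out_planes, stride=1, padding=1):
        super(Conv3dSimple, self).__init__(
            in_planes, out_planes, kernel_size=(3, 3, 3),
            stride=(stride, stride, stride),
            padding=(padding, padding, padding), bias=False)


def downsample_stride(conv_builder, stride):
    # conv_builder is the class itself here, not an instance,
    # so we check the type with issubclass.
    if issubclass(conv_builder, Conv3dNoTemporal):
        return (1, stride, stride)
    return stride


print(downsample_stride(Conv3dNoTemporal, 2))  # (1, 2, 2)
print(downsample_stride(Conv3dSimple, 2))      # 2
```

The upside of this pattern is that the dispatch is done on types rather than on the `__name__` string, which survives renames and wrapping.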
```python
class VideoTrunkBuilder(nn.Module):

    def __init__(self, block, conv_makers, model_depth,
```
I wonder if we want to expose `model_depth` or just `layers`, as in the resnet equivalent? If we expose `layers`, then the `BLOCK_CONFIG` can probably go away, and we hard-code the layer configuration inside the function getters.
But that would give us quite a lot of constructors, some 8-10 for each architecture... isn't this, to an extent, cleaner?
Well, it depends. When creating a simple training script, we will customize it to take a `depth` argument to be passed to the model, but not all models support that, so this will make experimenting with other potential backbones harder.
But don't worry about this; I can make a pass afterwards and modify a few things.
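To make the trade-off concrete, here is a toy sketch contrasting the two APIs, with a stand-in `VideoResNet` class; all names and configurations are illustrative, not the PR's actual code:

```python
class VideoResNet(object):
    """Stand-in for the real model class, which builds the network."""

    def __init__(self, block, conv_makers, layers):
        self.layers = layers


# model_depth style: one generic constructor plus a lookup table.
BLOCK_CONFIG = {
    18: (2, 2, 2, 2),
    34: (3, 4, 6, 3),
}


def video_resnet(block, conv_makers, model_depth):
    return VideoResNet(block, conv_makers, BLOCK_CONFIG[model_depth])


# layers style: the table goes away and each getter hard-codes its
# configuration, mirroring the 2D resnet convention (resnet18, ...).
def r3d_18(block=None, conv_makers=None):
    return VideoResNet(block, conv_makers, layers=(2, 2, 2, 2))
```

The `layers` style costs one getter per depth (the 8-10 constructors objected to above), but matches how the other torchvision models are exposed.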
@fmassa I've taken a look and modified the diff to reflect the comments. There are two things that I'm not 100% sure about; please check the unresolved comments for potential issues.
I have a few more comments. Also, tests seem to be failing?
LGTM, thanks!
Can you also add some quick docstrings to the models, with paper references, etc.?
Thanks!
@fmassa added the docstrings; double-check and merge if everything is OK :) Thanks a lot for the help!
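For reference, the kind of docstring being asked for might look like the sketch below; the constructor name, signature, and pretrained-weights claim are illustrative, not necessarily what the PR ships:

```python
def r2plus1d_18(pretrained=False, progress=True, **kwargs):
    """Construct an 18-layer R(2+1)D network.

    Reference: D. Tran et al., "A Closer Look at Spatiotemporal
    Convolutions for Action Recognition", CVPR 2018.
    https://arxiv.org/abs/1711.11248

    Args:
        pretrained (bool): if True, load weights pre-trained on Kinetics
            (assuming such weights are published for this model).
        progress (bool): if True, display a download progress bar.
    """
    ...
```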
Codecov Report

```
@@            Coverage Diff             @@
##           master    #1130      +/-   ##
==========================================
+ Coverage   64.89%   65.58%    +0.68%
==========================================
  Files          68       79       +11
  Lines        5413     5805      +392
  Branches      835      888       +53
==========================================
+ Hits         3513     3807      +294
- Misses       1643     1731       +88
- Partials      257      267       +10
```

Continue to review full report at Codecov.
Thanks a lot!
Following up on #1077 in adding video capabilities, here is a port of the VMZ models, described in Du Tran's "A Closer Look at Spatiotemporal Convolutions for Action Recognition". All of the models have been tested on Kinetics and come within 1% of the reported results or better.
This is a very early commit, mostly looking for feedback from maintainers.
cc @fmassa @dutran