
Add VideoModelZoo models #1130

Merged: 8 commits merged into pytorch:master on Jul 26, 2019

Conversation

@bjuncek (Contributor) commented Jul 16, 2019

Following up on #1077 in adding video capabilities, here is a port of the VMZ models described in Du Tran's "A Closer Look at Spatiotemporal Convolutions for Action Recognition". All of the models have been tested on Kinetics and come within 1% of the reported results, or better.

This is a very early commit, mostly looking for feedback from maintainers.

cc @fmassa @dutran
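For context on how these models are meant to be used, here is a rough usage sketch. The constructor name r2plus1d_18 and the default number of classes are assumptions about the eventual API, not something specified in this PR; video models consume 5D clips shaped (batch, channels, frames, height, width).

```python
# Hypothetical usage sketch; the constructor name is illustrative,
# not necessarily the exact entry point added in this PR.
import torch
from torchvision.models.video import r2plus1d_18  # assumed constructor name

model = r2plus1d_18()
model.eval()

# Video models take 5D clips: (batch, channels, frames, height, width).
clip = torch.randn(1, 3, 16, 112, 112)
with torch.no_grad():
    logits = model(clip)  # shape: (1, num_classes)
```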

@fmassa (Member) left a comment

This is looking good, thanks a lot for the PR, Bruno!

I have done a first pass and made a few comments.
Also, lint is failing; can you have a look?

torchvision/models/video/video_stems.py (2 resolved review comments, outdated)
if stride != 1 or self.inplanes != planes * block.expansion:
    ds_stride = stride
    # 2D convolutions should not be downsampled along temporal axis
    if conv_builder.__name__ == "video_2d_conv":
@fmassa (Member) commented:

This is not a pattern that I generally see in the PyTorch codebase.

What about the following: instead of having conv_builder be a function that returns an nn.Module instance, make it a class that inherits from nn.Module?

So you could have something like

class Conv3dNoTemporal(nn.Conv3d):
    ...

class Conv3dTheOtherOne(nn.Conv3d):
    ...

and then, in this check, you do

if isinstance(conv_builder, Conv3dNoTemporal):
    ds_stride = (1, stride, stride)

?
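A minimal, self-contained sketch of this class-based approach; the class and function names below are hypothetical stand-ins, not the merged code. Since conv_builder would typically be passed as the class itself rather than an instance, issubclass is used for the check:

```python
import torch.nn as nn

# Hypothetical convolution builder classes, following the suggestion above.
class Conv3DSimple(nn.Conv3d):
    """Plain 3x3x3 convolution."""
    def __init__(self, in_planes, out_planes, stride=1, padding=1):
        super().__init__(in_planes, out_planes, kernel_size=3,
                         stride=stride, padding=padding, bias=False)

class Conv3DNoTemporal(nn.Conv3d):
    """1x3x3 convolution: no mixing or downsampling along the temporal axis."""
    def __init__(self, in_planes, out_planes, stride=1, padding=1):
        super().__init__(in_planes, out_planes, kernel_size=(1, 3, 3),
                         stride=(1, stride, stride),
                         padding=(0, padding, padding), bias=False)

# The string comparison on __name__ then becomes a type check, e.g. inside _make_layer:
def get_downsample_stride(conv_builder, stride):
    if issubclass(conv_builder, Conv3DNoTemporal):
        return (1, stride, stride)  # keep full temporal resolution
    return stride
```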

torchvision/models/video/video_trunk.py (resolved review comment, outdated)

class VideoTrunkBuilder(nn.Module):

    def __init__(self, block, conv_makers, model_depth,
@fmassa (Member) commented:

I wonder if we want to expose model_depth or just layers, as in the resnet equivalent? If we expose layers, then BLOCK_CONFIG can probably go away, and we can hard-code the layer configuration inside the function getters.

@bjuncek (Contributor Author) replied:

But that would give us 8-10 constructors for each architecture... isn't this cleaner, to an extent?

@fmassa (Member) replied:

Well, it depends. In a simple training script we would add a depth argument that gets passed to the model, but not all models support that, which would make experimenting with other potential backbones harder.

But don't worry about this, I can make a pass afterwards and modify a few things.
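To make the trade-off discussed in this thread concrete, here is a rough sketch of the two constructor styles; make_trunk, the BLOCK_CONFIG values, and the getter names are illustrative assumptions, not the merged code:

```python
# Sketch contrasting the two constructor APIs discussed above.
def make_trunk(block, conv_makers, layers, **kwargs):
    """Placeholder for the trunk builder; takes explicit per-stage block counts."""
    return {"block": block, "conv_makers": conv_makers, "layers": layers, **kwargs}

# Option A (as in this PR): expose model_depth, resolved through a shared table.
BLOCK_CONFIG = {
    18: [2, 2, 2, 2],
    34: [3, 4, 6, 3],
}

def r3d(model_depth, **kwargs):
    return make_trunk("BasicBlock", ["Conv3DSimple"] * 4,
                      BLOCK_CONFIG[model_depth], **kwargs)

# Option B (resnet-style): one getter per depth with the layer configuration
# hard-coded, so BLOCK_CONFIG can go away.
def r3d_18(**kwargs):
    return make_trunk("BasicBlock", ["Conv3DSimple"] * 4, [2, 2, 2, 2], **kwargs)
```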

torchvision/models/video/video_trunk.py (resolved review comment, outdated)
@bjuncek (Contributor Author) commented Jul 22, 2019

@fmassa I've taken a look and modified the diff to reflect the comments. There are two things that I'm not 100% sure about; please check the unresolved comments for potential issues.

@fmassa (Member) left a comment

I have a few more comments. Also, the tests seem to be failing?

torchvision/models/video/_utils.py (3 resolved review comments, 2 outdated)
@fmassa (Member) left a comment

LGTM, thanks!

Can you also add some quick docstrings to the models, with paper references, etc.?

Thanks!
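As an illustration of the kind of docstring being requested, a hypothetical example follows; the function name and wording are assumptions, while the paper reference matches the one cited in the PR description:

```python
def r2plus1d_18(**kwargs):
    """Construct an 18-layer R(2+1)D network.

    Reference: Tran et al., "A Closer Look at Spatiotemporal Convolutions
    for Action Recognition", CVPR 2018 (https://arxiv.org/abs/1711.11248).
    """
    ...
```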

@bjuncek (Contributor Author) commented Jul 25, 2019

@fmassa added the docstrings; double-check and merge if everything is OK :)

Thanks a lot for the help!

@codecov-io commented Jul 25, 2019

Codecov Report

Merging #1130 into master will increase coverage by 0.68%.
The diff coverage is 76.34%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1130      +/-   ##
==========================================
+ Coverage   64.89%   65.58%   +0.68%     
==========================================
  Files          68       79      +11     
  Lines        5413     5805     +392     
  Branches      835      888      +53     
==========================================
+ Hits         3513     3807     +294     
- Misses       1643     1731      +88     
- Partials      257      267      +10
Impacted Files                             Coverage Δ
torchvision/models/video/_utils.py         100% <100%> (ø)
torchvision/models/__init__.py             100% <100%> (ø) ⬆️
torchvision/models/video/__init__.py       100% <100%> (ø)
torchvision/models/video/mixed_conv.py     59.25% <59.25%> (ø)
torchvision/models/video/video_stems.py    63.63% <63.63%> (ø)
torchvision/models/video/video_trunk.py    74.48% <74.48%> (ø)
torchvision/models/video/r3d.py            85.71% <85.71%> (ø)
torchvision/models/video/r2plus1d.py       85.71% <85.71%> (ø)
torchvision/datasets/svhn.py               66% <0%> (-1.31%) ⬇️
torchvision/datasets/cifar.py              78.16% <0%> (-0.5%) ⬇️
... and 32 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8837e0e...9cee7d0. Read the comment docs.

@fmassa (Member) left a comment

Thanks a lot!

@fmassa fmassa merged commit 7c95f97 into pytorch:master Jul 26, 2019