Video transforms #1353

Merged
merged 5 commits into from
Sep 24, 2019

Conversation

stephenyan1231
Contributor

@stephenyan1231 stephenyan1231 commented Sep 19, 2019

This PR replaces #1306 because the commit history of that one is polluted.

New features

Implement the following transforms for video clips

  • RandomCropVideo
  • RandomResizedCropVideo
  • CenterCropVideo
  • NormalizeVideo
  • ToTensorVideo
  • RandomHorizontalFlipVideo
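
As a rough illustration of what three of these transforms do, here is a minimal sketch written with plain tensor operations. It assumes a float clip laid out as (C, T, H, W), which matches the `permute(3, 0, 1, 2)` in the `ToTensorVideo` snippet quoted later in this thread. The function names `center_crop_clip`, `hflip_clip`, and `normalize_clip` are hypothetical and are not the names or the implementation used in this PR.

```python
import torch


def center_crop_clip(clip, size):
    """Crop the central (th, tw) window from every frame of a (C, T, H, W) clip."""
    th, tw = size
    h, w = clip.shape[-2], clip.shape[-1]
    if th > h or tw > w:
        raise ValueError("crop size must not exceed the frame size")
    top = (h - th) // 2
    left = (w - tw) // 2
    return clip[..., top:top + th, left:left + tw]


def hflip_clip(clip):
    """Mirror every frame left-to-right by flipping the width (last) axis."""
    return clip.flip(-1)


def normalize_clip(clip, mean, std):
    """Normalize each channel of a float (C, T, H, W) clip with per-channel stats."""
    mean = torch.as_tensor(mean, dtype=clip.dtype).view(-1, 1, 1, 1)
    std = torch.as_tensor(std, dtype=clip.dtype).view(-1, 1, 1, 1)
    return (clip - mean) / std


clip = torch.rand(3, 8, 32, 40)  # C=3, T=8, H=32, W=40 (made-up sizes)
cropped = center_crop_clip(clip, (24, 24))
flipped = hflip_clip(cropped)
normalized = normalize_clip(flipped, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
```

The same crop window and flip are applied to all frames at once, which is the main difference from applying an image transform frame by frame with independent random parameters.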

Unit test

  • affected image transforms
    • test/test_transforms.py
  • new unit test of video transforms
    • test/test_transforms_video

@codecov-io

codecov-io commented Sep 19, 2019

Codecov Report

Merging #1353 into master will increase coverage by 0.51%.
The diff coverage is 90.75%.

@@            Coverage Diff             @@
##           master    #1353      +/-   ##
==========================================
+ Coverage   65.47%   65.98%   +0.51%     
==========================================
  Files          75       77       +2     
  Lines        5827     5932     +105     
  Branches      892      900       +8     
==========================================
+ Hits         3815     3914      +99     
- Misses       1742     1746       +4     
- Partials      270      272       +2
Impacted Files | Coverage Δ
torchvision/transforms/__init__.py | 100% <100%> (ø) ⬆️
torchvision/transforms/transforms.py | 80.94% <84.61%> (+0.55%) ⬆️
torchvision/transforms/transforms_video.py | 88.88% <88.88%> (ø)
torchvision/transforms/functional_video.py | 95.23% <95.23%> (ø)

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f677ea3...161848b. Read the comment docs.

@@ -434,17 +434,17 @@ def __init__(self, size, padding=None, pad_if_needed=False, fill=0, padding_mode
         self.padding_mode = padding_mode

     @staticmethod
-    def get_params(img, output_size):
+    def get_params(w, h, output_size):
Contributor

I have some reservations with respect to changing the API of the existing transforms, but I wonder how often this particular one is used externally.

Should we issue a warning maybe (cc @fmassa)?

Member

yeah, we should not be doing a BC-breaking change here. There are ways of achieving the same thing without breaking BC, see for example https://github.com/pytorch/vision/pull/1104/files#diff-fc1f220b470714d05cf3ea6acf9fed59R34
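
One way to keep the existing `get_params(img, output_size)` signature while still supporting video clips is to hide the size lookup behind a small helper. The sketch below is only an illustration of that idea and is not necessarily what the linked PR does: `_get_size` is a hypothetical helper, and it assumes that tensor-like inputs expose `.shape` with height and width as the last two dimensions, while PIL-like images expose `.size` as `(width, height)`.

```python
import random


def _get_size(img):
    """Return (width, height) for a PIL-like image or a tensor-like clip.

    Hypothetical helper. `.shape` is checked first because tensors and arrays
    also have a `.size` attribute that does not mean (width, height).
    """
    shape = getattr(img, "shape", None)
    if shape is not None:
        return shape[-1], shape[-2]
    return img.size


def get_params(img, output_size):
    """Pick a random crop box (i, j, th, tw); the signature is unchanged."""
    w, h = _get_size(img)
    th, tw = output_size
    if w == tw and h == th:
        return 0, 0, h, w
    i = random.randint(0, h - th)  # top edge of the crop
    j = random.randint(0, w - tw)  # left edge of the crop
    return i, j, th, tw
```

External callers that pass an image keep working as before, and a video transform can pass the clip itself instead of unpacking `w` and `h` at the call site.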

Contributor

@bjuncek bjuncek left a comment

One high-level reservation that I have is that @fmassa et al. were looking into introducing batched tensors, which would render this unnecessary, but I don't know what the status of that is.

Member

@fmassa fmassa left a comment

Thanks for the PR Zhicheng!

I'm thinking about a way of unifying the video and image cases. I'll come back with a proposal in the next day or so.

    _is_tensor_video_clip(clip)
    if not clip.dtype == torch.uint8:
        raise TypeError("clip tensor should have data type uint8. Got %s" % str(clip.dtype))
    return clip.float().permute(3, 0, 1, 2) / 255.0
Member

I think I'll be using memory_format in the data reading functionality, so that this permutation is maybe handled automatically for us, in a safer way.

Member

And I'm also thinking about creating a new transform for performing image type conversions, like https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/image/convert_image_dtype , which would let us perform the scaling for different dtypes
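
For reference, the conversion under review can be sketched as a standalone function. This is a minimal version under stated assumptions: the function name `to_tensor_clip` is hypothetical, and the inline 4D-tensor check stands in for `_is_tensor_video_clip`, whose body is not shown in this thread. The input is assumed to be a uint8 clip laid out as (T, H, W, C).

```python
import torch


def to_tensor_clip(clip):
    """Convert a uint8 (T, H, W, C) clip to a float32 (C, T, H, W) clip in [0, 1]."""
    # Assumed stand-in for the _is_tensor_video_clip check in the snippet above.
    if not torch.is_tensor(clip) or clip.ndimension() != 4:
        raise TypeError("clip should be a 4D torch.Tensor")
    if clip.dtype != torch.uint8:
        raise TypeError("clip tensor should have data type uint8. Got %s" % str(clip.dtype))
    # Move channels first, then scale 0..255 to 0..1.
    return clip.float().permute(3, 0, 1, 2) / 255.0


frames = torch.randint(0, 256, (8, 32, 40, 3), dtype=torch.uint8)  # T, H, W, C
clip = to_tensor_clip(frames)
```

Because the division by 255 is tied to uint8 input, any other dtype is rejected here; a separate dtype-conversion transform, as suggested in the comment above, would be one way to handle other input ranges.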

@fmassa
Member

fmassa commented Sep 24, 2019

I will be merging this PR as is for now to unblock @stephenyan1231 , but I'll be making changes to how things are structured in a follow-up PR.

@fmassa fmassa merged commit 64917bc into pytorch:master Sep 24, 2019
facebook-github-bot pushed a commit to facebookresearch/ClassyVision that referenced this pull request Oct 16, 2019
Summary:
Pull Request resolved: #62

The current dependency, torchvision 0.4.0, was released in August.
It is missing quite a few PRs that were merged after that release and that are needed for video classification, such as

- pytorch/vision#1437
- pytorch/vision#1431
- pytorch/vision#1423
- pytorch/vision#1418
- pytorch/vision#1408
- pytorch/vision#1376
- pytorch/vision#1363
- pytorch/vision#1353
- pytorch/vision#1303

This will fail the CI test when a diff uses changes made in those PRs.
Before a new official version of TorchVision is released, we can temporarily use the nightly torchvision to get all the recent PRs, and unblock the PR merging.
We plan to use a fixed version of TorchVision later.

Reviewed By: vreis

Differential Revision: D17944239

fbshipit-source-id: 86ff540e3fc4f08ef767e84ef103525db5158201
@fmassa fmassa mentioned this pull request Oct 31, 2019
fmassa pushed a commit that referenced this pull request Oct 31, 2019
* video transforms

* [video transforms]in ToTensorVideo, divide value by 255.0

* [video transforms] fix a bug

* fix linting

* Make changes backwards-compatible
@fepegar
Contributor

fepegar commented May 15, 2020

Are these documented?

@fepegar
Contributor

fepegar commented May 15, 2020

I suppose that not yet but they will be :) #1429

@fmassa
Member

fmassa commented May 22, 2020

@fepegar exactly, the video transforms will probably be unified with the image transforms, so that you can seamlessly use the same transform for both data types.

@pulkitkumar95

Hey @fmassa, any update on the unification and the documentation update for the video transforms?

@fmassa
Member

fmassa commented Jun 22, 2020

@pulkitkumar95 unification is happening, but a bit slower than initially planned. See #2282 for the approach we will be tackling.

6 participants