
Conversation

@scotts (Contributor) commented Nov 21, 2025

Implements torchcodec.transforms.RandomCrop and also accepts torchvision.transforms.v2.RandomCrop. The key difference between this capability and Resize is that we need to:

  1. Compute a random location in the image to crop.
  2. Ensure that location matches exactly what TorchVision computes.

Short version of how we accomplish this:

  1. If you give us the TorchVision object, we call make_params() on it to get the computed location (see the sketch just below).
  2. If you don't, we do the same calculation in TorchCodec. We'll need to use testing and code review to make sure the two stay aligned.
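
A minimal sketch of the TorchVision path (the tensor shape and crop size here are illustrative, not TorchCodec's actual code; make_params() and the "top"/"left" params keys are the same ones used in the test discussion later in this thread):

    import torch
    from torchvision.transforms import v2

    torch.manual_seed(0)
    tv_crop = v2.RandomCrop(size=(100, 150))
    # make_params() draws the crop location with the same RNG logic that
    # calling the transform would use, so TorchCodec can reuse it directly.
    params = tv_crop.make_params([torch.empty(3, 270, 480)])
    top, left = params["top"], params["left"]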

Working on this transform also made me realize that DecoderTransform and its subclasses should not be dataclasses. I initially thought they would just be bags of values, but they're growing to have significant methods and internal state not exposed to users. In a follow-up PR, I'll refactor them into normal classes, much like the TorchVision versions; that felt too disruptive to do in this PR.
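
To illustrate the distinction (purely a sketch with hypothetical names, not the planned API):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ResizeSpec:  # a pure bag of values: a dataclass fits
        height: int
        width: int

    class RandomCropSketch:  # behavior plus private state: a normal class fits
        def __init__(self, size: tuple[int, int]):
            self.size = size
            self._top: Optional[int] = None   # internal state, not user-facing
            self._left: Optional[int] = None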

@meta-cla bot added the CLA Signed label Nov 21, 2025
- int x = checkedToPositiveInt(cropTransformSpec[3]);
- int y = checkedToPositiveInt(cropTransformSpec[4]);
+ int x = checkedToNonNegativeInt(cropTransformSpec[3]);
+ int y = checkedToNonNegativeInt(cropTransformSpec[4]);
@scotts (Contributor Author)

The location (0, 0) is a valid image location. 🤦

@scotts scotts marked this pull request as ready for review November 22, 2025 03:05
@scotts scotts changed the title from "[WIP] Implement RandomCrop transform" to "Implement RandomCrop transform" Nov 22, 2025
@NicolasHug (Contributor) left a comment

Thanks @scotts, this looks great!

]:
frame_random_crop = decoder_random_crop[frame_index]
frame_random_crop_tv = decoder_random_crop_tv[frame_index]
assert_frames_equal(frame_random_crop, frame_random_crop_tv)
@NicolasHug (Contributor) Nov 28, 2025

I forgot to mention it earlier, but almost everywhere in this file we rely on our own assert_frames_equal, which is deliberately less strict on other platforms like GPU (not supported yet by transforms), Mac, Windows, and potentially soon aarch64.

But since we want bitwise equality against TV here, we should probably use torch.testing.assert_close instead: it ensures strict bitwise equality on uint8.

(this can be a TODO)
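
For reference, a small sketch of why assert_close is bitwise-strict here: for integer dtypes such as uint8, its default tolerances are rtol=0 and atol=0, so a single mismatching value fails:

    import torch

    a = torch.zeros(2, 2, dtype=torch.uint8)
    b = a.clone()
    b[0, 0] = 1
    # Default tolerances for uint8 are rtol=0, atol=0, so this raises.
    torch.testing.assert_close(a, b)  # AssertionError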

left=tc_random_crop._left,
height=tc_random_crop.size[0],
width=tc_random_crop.size[1],
)
@scotts (Contributor Author)

[Accessing RandomCrop's internal state]

We want to know what random values RandomCrop used so we can apply the exact same logic with the TorchVision functional, but the actual values are stored deep in the C++ layer, in SingleStreamDecoder's copy of the transform objects. Just recording the values on the DecoderTransform object seems simpler, with fewer consequences.

I'm happy to consider cleaner solutions here, but I can't think of one. Because we only access _top and _left during testing, the TorchCodec version of RandomCrop is still stateless in the ways that matter, same as the TorchVision transform.
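
For context, the test pattern this enables looks roughly like the quoted snippet above, completed (a sketch; frame_full, frame_random_crop, and tc_random_crop are the names used in the test):

    from torchvision.transforms import v2

    frame_tv = v2.functional.crop(
        frame_full,
        top=tc_random_crop._top,
        left=tc_random_crop._left,
        height=tc_random_crop.size[0],
        width=tc_random_crop.size[1],
    )
    assert_frames_equal(frame_random_crop, frame_tv)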

Contributor

We have two alternative options:

  1. Create a seeded v2.RandomCrop() object and apply it to the full_frame in each iteration of the loop. This makes sure the RNG is consistently the same for all frames:

        for frame_index in ...:
            # ...
            torch.manual_seed(seed)
            frame_tv = v2.RandomCrop(size=(height, width))(decoder_full[frame_index])
            assert_frames_equal(frame_random_crop, frame_tv)

  2. Same idea, but call make_params() once up front and apply the crop with the functional inside the loop:

        torch.manual_seed(seed)
        tv_crop = v2.RandomCrop(size=(height, width))
        params = tv_crop.make_params(torch.empty(3, video.get_height(), video.get_width()))

        for frame_index in ...:
            # ...
            frame_full = decoder_full[frame_index]
            frame_tv = v2.functional.crop(
                frame_full,
                top=params["top"],
                left=params["left"],
                height=params["height"],
                width=params["width"],
            )
            assert_frames_equal(frame_random_crop, frame_tv)

I far prefer the first one as it's simpler. The second one is kind of replicating the internal implementation of a TV transform.


Both suggestions above are stricter tests than the current one. The current one is merely testing that TV's crop will crop at the same place as TC's crop when provided the same crop parameters (_top and _left). But because _top and _left come from TC, this test isn't validating that the RNG of TC is the same as the RNG of TV! Both options above ensure that.

# record them for testing purposes only.
_top: Optional[int] = None
_left: Optional[int] = None

@scotts (Contributor Author)

For where these values are used, see my comment with the text [Accessing RandomCrop's internal state].


# location.
_ = random_crop._make_transform_spec((1000, 1000))
assert first_top != random_crop._top
assert first_left != random_crop._left
Contributor

I think we'll be able to remove _top and _left, so we may need to re-implement this test in another way.

Separately, I think we will also want to make sure that changing the input dims changes the output behavior. The current test passes (1000, 1000) twice.
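
A sketch of how that second check might look (names follow the snippet above; a different input size is not strictly guaranteed to yield a different location for every seed, so this is illustrative):

    _ = random_crop._make_transform_spec((1000, 1000))
    top_a, left_a = random_crop._top, random_crop._left

    # Changing the input dims should (typically) change the crop location.
    _ = random_crop._make_transform_spec((500, 2000))
    assert (top_a, left_a) != (random_crop._top, random_crop._left)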

@NicolasHug (Contributor) left a comment

Thanks @scotts!

we want that to still work.
Note: This method is the moral equivalent of TorchVision's
`Transformer.make_params()`.
Contributor

Nit, transformers are a different thing.

Suggested change:
- `Transformer.make_params()`.
+ `Transform.make_params()`.

@scotts scotts merged commit 38fa96c into meta-pytorch:main Dec 3, 2025
64 checks passed
@scotts scotts deleted the random_crop branch December 3, 2025 20:27