
Conversation

@scotts (Contributor) commented Nov 21, 2025

Implements torchcodec.transforms.RandomCrop and also accepts torchvision.transforms.v2.RandomCrop. The key difference between this capability and Resize is that we need to:

  1. Compute a random location in the image to crop.
  2. Ensure that location matches exactly what TorchVision computes.

Short version of how we accomplish this:

  1. If you give us the TorchVision object, we call make_params() on it to get the computed location (see the sketch just below).
  2. If you don't, we do the same calculation in TorchCodec. We'll need to use testing and code review to make sure the two stay aligned.
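
A minimal sketch of the TorchVision path (the tensor shape and crop size here are illustrative, not TorchCodec's actual code; make_params() and the "top"/"left" params keys are the same ones used in the test discussion later in this thread):

    import torch
    from torchvision.transforms import v2

    torch.manual_seed(0)
    tv_crop = v2.RandomCrop(size=(100, 150))
    # make_params() draws the crop location with the same RNG logic that
    # calling the transform would use, so TorchCodec can reuse it directly.
    params = tv_crop.make_params([torch.empty(3, 270, 480)])
    top, left = params["top"], params["left"]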

Working on this transform also made me realize that DecoderTransform and its subclasses should not be dataclasses. I initially thought they would just be bags of values, but they're growing to have significant methods and internal state not exposed to users. In a follow-up PR, I'll refactor them into normal classes, much like the TorchVision versions; that felt too disruptive to do in this PR.
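
To illustrate the distinction (purely a sketch with hypothetical names, not the planned API):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ResizeSpec:  # a pure bag of values: a dataclass fits
        height: int
        width: int

    class RandomCropSketch:  # behavior plus private state: a normal class fits
        def __init__(self, size: tuple[int, int]):
            self.size = size
            self._top: Optional[int] = None   # internal state, not user-facing
            self._left: Optional[int] = None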

@meta-cla bot added the CLA Signed label Nov 21, 2025
- int x = checkedToPositiveInt(cropTransformSpec[3]);
- int y = checkedToPositiveInt(cropTransformSpec[4]);
+ int x = checkedToNonNegativeInt(cropTransformSpec[3]);
+ int y = checkedToNonNegativeInt(cropTransformSpec[4]);
@scotts (Contributor Author)

The location (0, 0) is a valid image location. 🤦

@scotts scotts marked this pull request as ready for review November 22, 2025 03:05
@scotts scotts changed the title from "[WIP] Implement RandomCrop transform" to "Implement RandomCrop transform" Nov 22, 2025
@NicolasHug (Contributor) left a comment

Thanks @scotts, this looks great!

]:
frame_random_crop = decoder_random_crop[frame_index]
frame_random_crop_tv = decoder_random_crop_tv[frame_index]
assert_frames_equal(frame_random_crop, frame_random_crop_tv)
@NicolasHug (Contributor) Nov 28, 2025

I forgot to mention it earlier, but almost everywhere in this file we rely on our own assert_frames_equal, which is deliberately less strict on other platforms like GPU (not supported yet by transforms), Mac, Windows, and potentially soon aarch64.

But since we want bitwise equality against TV here, we should probably use torch.testing.assert_close instead: it ensures strict bitwise equality on uint8.

(this can be a TODO)
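
For reference, a small sketch of why assert_close is bitwise-strict here: for integer dtypes such as uint8, its default tolerances are rtol=0 and atol=0, so a single mismatching value fails:

    import torch

    a = torch.zeros(2, 2, dtype=torch.uint8)
    b = a.clone()
    b[0, 0] = 1
    # Default tolerances for uint8 are rtol=0, atol=0, so this raises.
    torch.testing.assert_close(a, b)  # AssertionError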

left=tc_random_crop._left,
height=tc_random_crop.size[0],
width=tc_random_crop.size[1],
)
@scotts (Contributor Author)

[Accessing RandomCrop's internal state]

We want to know what random values RandomCrop used so we can apply the exact same logic with the TorchVision functional, but the actual values are stored deep in the C++ layer, in SingleStreamDecoder's copy of the transform objects. Just recording the values on the DecoderTransform object seems simpler, with fewer consequences.

I'm happy to consider cleaner solutions here, but I can't think of one. Because we only access _top and _left during testing, the TorchCodec version of RandomCrop is still stateless in the ways that matter, same as the TorchVision transform.
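
For context, the test pattern this enables looks roughly like the quoted snippet above, completed (a sketch; frame_full, frame_random_crop, and tc_random_crop are the names used in the test):

    from torchvision.transforms import v2

    frame_tv = v2.functional.crop(
        frame_full,
        top=tc_random_crop._top,
        left=tc_random_crop._left,
        height=tc_random_crop.size[0],
        width=tc_random_crop.size[1],
    )
    assert_frames_equal(frame_random_crop, frame_tv)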

Contributor

We have two alternative options:

  1. Create a seeded v2.RandomCrop() object and apply it to the full_frame in each iteration of the loop. This makes sure the RNG is consistently the same for all frames:

        for frame_index in ...:
            # ...
            torch.manual_seed(seed)
            frame_tv = v2.RandomCrop(size=(height, width))(decoder_full[frame_index])
            assert_frames_equal(frame_random_crop, frame_tv)

  2. Same idea, but call make_params() once up front and apply the crop with the functional inside the loop:

        torch.manual_seed(seed)
        tv_crop = v2.RandomCrop(size=(height, width))
        params = tv_crop.make_params(torch.empty(3, video.get_height(), video.get_width()))

        for frame_index in ...:
            # ...
            frame_full = decoder_full[frame_index]
            frame_tv = v2.functional.crop(
                frame_full,
                top=params["top"],
                left=params["left"],
                height=params["height"],
                width=params["width"],
            )
            assert_frames_equal(frame_random_crop, frame_tv)

I far prefer the first one as it's simpler. The second one is kind of replicating the internal implementation of a TV transform.


Both suggestions above are stricter tests than the current one. The current one is merely testing that TV's crop will crop at the same place as TC's crop when provided the same crop parameters (_top and _left). But because _top and _left come from TC, this test isn't validating that the RNG of TC is the same as the RNG of TV! Both options above ensure that.

# record them for testing purposes only.
_top: Optional[int] = None
_left: Optional[int] = None

@scotts (Contributor Author)

For where these values are used, see my comment with the text [Accessing RandomCrop's internal state].


# location.
_ = random_crop._make_transform_spec((1000, 1000))
assert first_top != random_crop._top
assert first_left != random_crop._left
Contributor

I think we'll be able to remove _top and _left, so we may need to re-implement this test in another way.

Separately, I think we will also want to make sure that changing the input dims changes the output behavior. The current test passes (1000, 1000) twice.
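
A sketch of how that second check might look (names follow the snippet above; a different input size is not strictly guaranteed to yield a different location for every seed, so this is illustrative):

    _ = random_crop._make_transform_spec((1000, 1000))
    top_a, left_a = random_crop._top, random_crop._left

    # Changing the input dims should (typically) change the crop location.
    _ = random_crop._make_transform_spec((500, 2000))
    assert (top_a, left_a) != (random_crop._top, random_crop._left)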

@NicolasHug (Contributor) left a comment

Thanks @scotts!

we want that to still work.
Note: This method is the moral equivalent of TorchVision's
`Transformer.make_params()`.
Contributor

Nit, transformers are a different thing.

Suggested change:
- `Transformer.make_params()`.
+ `Transform.make_params()`.

@scotts scotts merged commit 38fa96c into meta-pytorch:main Dec 3, 2025
64 checks passed
@scotts scotts deleted the random_crop branch December 3, 2025 20:27