Implement RandomCrop transform #1070
```diff
-        int x = checkedToPositiveInt(cropTransformSpec[3]);
-        int y = checkedToPositiveInt(cropTransformSpec[4]);
+        int x = checkedToNonNegativeInt(cropTransformSpec[3]);
+        int y = checkedToNonNegativeInt(cropTransformSpec[4]);
```
The location (0, 0) is a valid image location. 🤦
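A quick sketch (mine, not PR code) of the distinction the fix above makes: a crop origin check must accept zero and reject only negatives. The helper name is a hypothetical Python analogue of the C++ `checkedToNonNegativeInt`:

```python
def checked_to_non_negative_int(value):
    # Hypothetical Python analogue of checkedToNonNegativeInt:
    # crop origins may legitimately be 0, so only negatives are rejected.
    i = int(value)
    if i < 0:
        raise ValueError(f"expected a non-negative value, got {i}")
    return i

origin_x = checked_to_non_negative_int(0)  # (0, 0) is a valid image location
origin_y = checked_to_non_negative_int(0)
```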
NicolasHug left a comment
Thanks @scotts , this looks great!
```python
]:
    frame_random_crop = decoder_random_crop[frame_index]
    frame_random_crop_tv = decoder_random_crop_tv[frame_index]
    assert_frames_equal(frame_random_crop, frame_random_crop_tv)
```
I forgot to mention it earlier, but almost everywhere in this file we might want to use torch.testing.assert_close instead of our own assert_frames_equal, because assert_frames_equal is less strict on other platforms like GPU (not yet supported by transforms), Mac, Windows, and potentially soon aarch64.
Since we want bitwise equality against TV, we should probably use torch.testing.assert_close: it ensures strict bitwise equality on uint8.
(this can be a TODO)
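A quick illustration (my sketch, not from the PR) of why `assert_close` gives bitwise equality on uint8: for integer dtypes its default tolerances are `rtol=0` and `atol=0`, so even a one-count difference fails:

```python
import torch

a = torch.tensor([10, 20, 30], dtype=torch.uint8)
b = a.clone()
torch.testing.assert_close(a, b)  # passes: identical uint8 tensors

c = a.clone()
c[0] += 1  # off by one count
try:
    # Default tolerances for integer dtypes are rtol=0, atol=0.
    torch.testing.assert_close(a, c)
    raised = False
except AssertionError:
    raised = True
```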
test/test_transform_ops.py (Outdated)
```python
            left=tc_random_crop._left,
            height=tc_random_crop.size[0],
            width=tc_random_crop.size[1],
        )
```
[Accessing RandomCrop's internal state]
We want to know what random values RandomCrop used so we can apply the exact same logic with the TorchVision functional, but the actual values are stored deep in the C++ layer, in SingleStreamDecoder's copy of the transform objects. Just storing the values on the DecoderTransform object seems simpler, with fewer consequences.
I'm happy to consider cleaner solutions here, but I can't think of one. Because we only access _top and _left during testing, the TorchCodec version of RandomCrop is still stateless in the ways that matter, same as the TorchVision transform.
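A minimal sketch of the approach described above, with hypothetical names and sampling logic: the transform stays functionally stateless, but stashes the sampled crop origin so tests can replay the same crop:

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import torch

@dataclass
class RandomCrop:
    # Hypothetical sketch: the sampled origin is recorded only so tests
    # can apply the exact same crop via TorchVision's functional API.
    size: Tuple[int, int]  # (height, width)
    _top: Optional[int] = None   # recorded for testing purposes only
    _left: Optional[int] = None

    def _make_transform_spec(self, input_dims):
        input_h, input_w = input_dims
        crop_h, crop_w = self.size
        # Sample a uniformly random, valid crop origin.
        top = int(torch.randint(0, input_h - crop_h + 1, (1,)))
        left = int(torch.randint(0, input_w - crop_w + 1, (1,)))
        self._top, self._left = top, left  # stash for test inspection
        return (crop_h, crop_w, top, left)

crop = RandomCrop(size=(100, 100))
spec = crop._make_transform_spec((1000, 1000))
```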
We have two alternative options:

- Create a seeded v2.RandomCrop() object and apply it to the full frame in each iteration of the loop. This makes sure the RNG is consistently the same for all frames:

```python
for frame_index in ...:
    # ...
    torch.manual_seed(seed)
    frame_tv = v2.RandomCrop(size=(height, width))(decoder_full[frame_index])
    assert_frames_equal(frame_random_crop, frame_tv)
```

- Call make_params() once outside the loop, then apply the functional crop with those parameters:

```python
torch.manual_seed(seed)
tv_crop = v2.RandomCrop(size=(height, width))
params = tv_crop.make_params(torch.empty(3, video.get_height(), video.get_width()))
for frame_index in ...:
    # ...
    frame_full = decoder_full[frame_index]
    frame_tv = v2.functional.crop(
        frame_full,
        top=params["top"],
        left=params["left"],
        height=params["height"],
        width=params["width"],
    )
    assert_frames_equal(frame_random_crop, frame_tv)
```

I far prefer the first one as it's simpler. The second one is kinda replicating the internal implementation of a TV transform.
Both suggestions above are stricter tests than the current one. The current one is merely testing that TV's crop will crop at the same place as TC's crop when provided the same crop parameters (`_top` and `_left`).
But because `_top` and `_left` come from TC, this test isn't validating that the RNG of TC is the same as the RNG of TV! Both options above ensure that.
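The reason re-seeding inside the loop works (a sketch of the general principle, not PR code): after `torch.manual_seed`, the next draws from torch's RNG are identical, so every iteration samples the same crop parameters:

```python
import torch

torch.manual_seed(0)
first = torch.randint(0, 900, (2,))   # e.g. a (top, left) draw

torch.manual_seed(0)
second = torch.randint(0, 900, (2,))  # re-seeding replays the same draw
```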
```python
    # record them for testing purposes only.
    _top: Optional[int] = None
    _left: Optional[int] = None
```
For where these values are used, see my comment with the text [Accessing RandomCrop's internal state].
test/test_transform_ops.py (Outdated)
```python
    # location.
    _ = random_crop._make_transform_spec((1000, 1000))
    assert first_top != random_crop._top
    assert first_left != random_crop._left
```
I think we'll be able to remove _top and _left, so we may need to re-implement this test in another way.
Separately, I think we will also want to make sure that changing the input dims changes the output behavior. The current test passes (1000, 1000) twice.
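A hedged sketch of what a test with differing input dims could check; `sample_crop_origin` is a hypothetical stand-in for the transform's parameter sampling, whose valid range depends on the input dims:

```python
import torch

def sample_crop_origin(input_h, input_w, crop_h, crop_w):
    # Hypothetical stand-in for RandomCrop's parameter sampling:
    # the origin is uniform over the valid placement range, which
    # depends on the input dims, not just the crop size.
    top = int(torch.randint(0, input_h - crop_h + 1, (1,)))
    left = int(torch.randint(0, input_w - crop_w + 1, (1,)))
    return top, left

torch.manual_seed(0)
small = sample_crop_origin(1000, 1000, 100, 100)
torch.manual_seed(0)
large = sample_crop_origin(2000, 2000, 100, 100)
# Same seed, different input dims: the valid origin ranges differ,
# so the distributions of outputs differ too.
```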
NicolasHug left a comment
Thanks @scotts !
```python
    we want that to still work.

    Note: This method is the moral equivalent of TorchVision's
    `Transformer.make_params()`.
```
Nit, transformers are a different thing
```diff
-    `Transformer.make_params()`.
+    `Transform.make_params()`.
```
Implements `torchcodec.transforms.RandomCrop`, and also accepts `torchvision.transforms.v2.RandomCrop`. The key difference between this capability and `Resize` is that we need to compute a random crop location.

Short version of how we accomplish this: we create a TorchVision `RandomCrop` and call `make_params()` on it to get the computed location.
Working on this transform also made me realize that `DecoderTransform` and its subclasses should not be dataclasses. I initially thought they would just be bags of values, but they're growing to have significant methods and internal state not exposed to users. In a follow-up PR, I'll refactor these into normal classes, much like the TorchVision versions. That felt too disruptive to do in this PR.
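A sketch of the refactor direction described above (all names hypothetical): once a transform carries methods and internal state, a plain class keeps that state out of the generated dataclass surface like `__repr__`:

```python
# Hypothetical sketch: a normal class instead of a dataclass, so internal
# state such as the sampled crop origin stays out of the public surface.
class RandomCrop:
    def __init__(self, size):
        self.size = size   # public configuration
        self._top = None   # internal state, testing only
        self._left = None

    def __repr__(self):
        # Only public configuration is surfaced, much like
        # TorchVision's transform reprs.
        return f"{type(self).__name__}(size={self.size})"

crop = RandomCrop(size=(100, 100))
```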