
Conversation

@pmeier pmeier commented Apr 26, 2022

This PR refactors our prototype transforms functional tests. They are currently located in test/test_prototype_transforms_functional.py. To ease the reviewing process I've added a new test/test_prototype_transforms_kernels.py module that contains the refactored tests from this PR. In the end that should replace most parts of the old file, but doing it in a separate module avoids GH diff hell.

Status quo

Our current implementation was the first attempt to automate our tests. I took some inspiration from the OpInfo framework from PyTorch core. The basic idea is to define a FunctionalInfo for each functional that stores some metadata about it.

The most important info (and for now the only metadata we store) is the sample_inputs_fn. It yields the call arguments the common tests invoke the functional with.
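
For illustration, here is a minimal sketch of that idea, written from the description above; the container name SampleInput and all implementation details are assumptions, not the exact code in test/test_prototype_transforms_functional.py:

# Minimal sketch of the FunctionalInfo idea described above. The names and
# fields here are illustrative assumptions; only the general shape matches
# the actual test module.
import dataclasses
from typing import Any, Callable, Dict, Iterator, Tuple

import torch


@dataclasses.dataclass
class SampleInput:
    args: Tuple[Any, ...] = ()
    kwargs: Dict[str, Any] = dataclasses.field(default_factory=dict)


@dataclasses.dataclass
class FunctionalInfo:
    functional: Callable
    # the central piece of metadata: a generator of call arguments
    sample_inputs_fn: Callable[[], Iterator[SampleInput]]

    @property
    def name(self) -> str:
        return self.functional.__name__

    def sample_inputs(self) -> Iterator[SampleInput]:
        yield from self.sample_inputs_fn()

    def __call__(self, sample_input: SampleInput) -> Any:
        return self.functional(*sample_input.args, **sample_input.kwargs)


def horizontal_flip_image_tensor(image: torch.Tensor) -> torch.Tensor:
    # stand-in kernel so the sketch is self-contained
    return image.flip(-1)


def sample_inputs_horizontal_flip_image_tensor() -> Iterator[SampleInput]:
    yield SampleInput(args=(torch.rand(3, 32, 32),))


FUNCTIONAL_INFOS = [
    FunctionalInfo(horizontal_flip_image_tensor, sample_inputs_horizontal_flip_image_tensor),
]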

With that we can write common tests that can be @pytest.mark.parametrize'd over the kernel-call-args combinations. For example, a test that checks the torch.jit.script'ed output against its eager counterpart looks like this

@pytest.mark.parametrize(
    ("functional_info", "sample_input"),
    [
        pytest.param(functional_info, sample_input, id=f"{functional_info.name}-{idx}")
        for functional_info in FUNCTIONAL_INFOS
        for idx, sample_input in enumerate(functional_info.sample_inputs())
    ],
)
def test_eager_vs_scripted(functional_info, sample_input):
    eager = functional_info(sample_input)
    scripted = jit.script(functional_info.functional)(*sample_input.args, **sample_input.kwargs)
    torch.testing.assert_close(eager, scripted)

Pros / Cons

This architecture has two main benefits over manually writing these tests:

  1. It is trivial to add a new common test for all functionals.
  2. Writing comprehensive tests for a new functional is reduced to defining call args.

Plus, and in contrast to the OpInfos from PyTorch core, the tests are easier to debug, since we use @pytest.mark.parametrize instead of for loops in the test body to iterate over the call args. If one of our tests fails, we can reproduce the parametrization from the log, whereas in PyTorch core one needs to manually find which call args are responsible for the failure.

However, there are also downsides:

  1. Using @pytest.mark.parametrize instantiates everything upfront. Especially with tensor inputs, this can quickly add up to a big chunk of memory. Right now test/test_prototype_transforms_functional.py includes ~23k tests that instantiate tensors during collection (pytest --co test/test_prototype_transforms_functional.py::test_eager_vs_scripted). They all come from a single common test, namely eager vs. scripted. If we add more tests or more ops, this number will grow fast. This is the reason why PyTorch core does not rely on parametrization for their sample inputs, but rather falls back to for loops inside the test.
  2. There is only one kind of sample input. Although each test might have different needs with regard to the sample inputs, we only have one generating function. This means that we need to support all needs with it, which in turn unnecessarily blows up the time to run all tests. For example, to test whether the output of the scripted affine_image_tensor kernel matches its eager counterpart, we only need to test a single set of affine parameters (as long as there is no branching based on them). However, for reference testing, we should test multiple parameter sets to make sure the kernel actually behaves like its reference.

Design goal

This PR sets out to solve the problems detailed above while retaining all the positive aspects of the current implementation.

  1. Introduce the TensorLoader class: it wraps another callable that will ultimately create the tensor, but it knows the shape, dtype, and possibly other feature metadata ahead of time. The device will only be passed at runtime to allow us to parametrize over different devices. With this we can continue to rely on the tensor attributes during sample input generation, e.g.

    height, width = image.shape[-2:]

    height, width = bounding_box.image_size

    without actually instantiating the tensors.

    At test time, the tensor can simply be instantiated with TensorLoader(...).load(device). For convenience, ArgsKwargs was made aware of TensorLoader and got a .load(device) method as well. With these, the common tests will look somewhat like this:

    @pytest.mark.parametrize(
        ("info", "args_kwargs"), [
            (info, args_kwargs) 
            for info in KERNEL_INFOS 
            for args_kwargs in info.sample_inputs
        ]
    )
    @pytest.mark.parametrize("device", ["cpu", "cuda"])
    def test_smoke(info, args_kwargs, device):
        args, kwargs = args_kwargs.load(device)
    
        result = info.kernel(*args, **kwargs)

    This approach of "lazy loading" is similar to the concept of lazy tensors, although it strips out everything we don't need. To avoid confusion, I preferred the term "load" over "lazy" here. A minimal sketch of the loader idea is shown after this list.

  2. Introduce a reference_inputs_fn alongside the sample_inputs_fn. As the name implies, the former will only be used for reference tests and should be comprehensive with respect to the tested values. In contrast, the sample_inputs_fn should only cover all valid code paths. This is on par with what PyTorch core does with their OpInfo framework, although they have even more diverse sample inputs functions, like the error_inputs_func.
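
As referenced in item 1 above, here is a minimal sketch of the loader idea. TensorLoader, ArgsKwargs.load, and make_image_loaders are names used in this PR, but the fields and implementation shown here are illustrative assumptions:

# Minimal sketch of the TensorLoader / ArgsKwargs.load idea from item 1.
# The concrete fields and implementation below are assumptions.
import dataclasses
from typing import Callable, Iterator, Sequence, Tuple

import torch


@dataclasses.dataclass
class TensorLoader:
    # callable that creates the actual tensor; only invoked at test time
    fn: Callable[[Sequence[int], torch.dtype, torch.device], torch.Tensor]
    # metadata known ahead of time, so sample input generation can use
    # e.g. loader.shape[-2:] without instantiating anything
    shape: Tuple[int, ...]
    dtype: torch.dtype

    def load(self, device) -> torch.Tensor:
        return self.fn(self.shape, self.dtype, torch.device(device))


class ArgsKwargs:
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def load(self, device):
        # replace every TensorLoader among the args / kwargs with the loaded tensor
        def maybe_load(value):
            return value.load(device) if isinstance(value, TensorLoader) else value

        args = tuple(maybe_load(arg) for arg in self.args)
        kwargs = {name: maybe_load(value) for name, value in self.kwargs.items()}
        return args, kwargs


def make_image_loaders(*, sizes=((32, 32),), dtypes=(torch.float32,)) -> Iterator[TensorLoader]:
    for size in sizes:
        for dtype in dtypes:
            yield TensorLoader(
                fn=lambda shape, dtype, device: torch.rand(shape, dtype=dtype, device=device),
                shape=(3, *size),
                dtype=dtype,
            )

With this in place, a KernelInfo can carry both a sample_inputs_fn (covering the code paths) and a more exhaustive reference_inputs_fn from item 2, and neither ever materializes tensors at collection time.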

Limitations

There are two things that are not included in the current design:

  1. Reference tests against fixed inputs. The reference tests explained above work by using one function that generates sample inputs and passing them to the kernel as well as to a reference function. Thus, we only need to parametrize over one set of sample inputs, and the reference outputs only get computed at runtime. To support fixed references, we would need a map from sample inputs to their fixed references, which would defeat the part of the design that avoids instantiating tensors at collection. Fortunately, we don't need to force everything into the framework proposed here and can simply have separate "manual" tests for this. Of course this also applies to testing error / warning inputs, although in the future we could also integrate that into the framework.
  2. Tests for high-level functionals aka dispatchers. As the name implies, they only dispatch, so it would be quite a waste of resources to test them again with sample inputs if we already tested the kernels. I think we should implement something like "if I put in a PIL Image, the PIL kernel gets called and its output gets returned" (a rough sketch of such a check is shown after this list). However, this is out of scope for this PR.
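
As referenced in item 2, a rough sketch of what such a dispatch check could look like. This is purely hypothetical and not part of this PR; the dispatcher and kernel names as well as the patch target are assumptions:

# Hypothetical dispatcher test, not part of this PR. It assumes a dispatcher
# F.horizontal_flip and a PIL kernel horizontal_flip_image_pil; the patch
# target depends on how the dispatcher actually imports its kernels.
from unittest import mock

import PIL.Image


def test_pil_input_dispatches_to_pil_kernel():
    pil_image = PIL.Image.new("RGB", (32, 32))

    with mock.patch(
        "torchvision.prototype.transforms.functional.horizontal_flip_image_pil"
    ) as pil_kernel:
        from torchvision.prototype.transforms import functional as F

        output = F.horizontal_flip(pil_image)

    # the PIL kernel was called with the PIL input and its output was returned as-is
    pil_kernel.assert_called_once_with(pil_image)
    assert output is pil_kernel.return_value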

Todo

This PR mostly introduces the new framework while adding some kernels as examples. There are three ways to add the remaining ones:

  1. Finish everything in this PR.
  2. Merge this PR and add follow-up PRs for the remaining kernels. This is possible without ripping large gaps in our CI since I've added the new tests in a new module. The last PR of this series could be a clean up to remove all the then duplicated tests in the old module.
  3. Use this PR as feature branch and add more PRs against this until we are finished.

My preference is 3. -> 2. -> 1. but I'll leave that up to the reviewers. Here is the list of what kernels are done or missing:

  • clamp_bounding_box
  • convert_bounding_box_format
  • convert_color_space
    • image
  • adjust_brightness
    • image
  • adjust_contrast
    • image
  • adjust_gamma
    • image
  • adjust_hue
    • image
  • adjust_saturation
    • image
  • adjust_sharpness
    • image
  • autocontrast
    • image
  • equalize
    • image
  • invert
    • image
  • posterize
    • image
  • solarize
    • image
  • affine
    • bounding_box
    • image
    • segmentation_mask
  • center_crop
    • bounding_box
    • image
    • segmentation_mask
  • crop
    • bounding_box
    • image
    • segmentation_mask
  • elastic
    • bounding_box
    • image
    • segmentation_mask
  • five_crop
    • image
  • horizontal_flip
    • bounding_box
    • image
    • segmentation_mask
  • pad
    • bounding_box
    • image
    • segmentation_mask
  • perspective
    • bounding_box
    • image
    • segmentation_mask
  • resize
    • bounding_box
    • image
    • segmentation_mask
  • resized_crop
    • bounding_box
    • image
    • segmentation_mask
  • rotate
    • bounding_box
    • image
    • segmentation_mask
  • ten_crop
    • bounding_box
    • image
    • segmentation_mask
  • vertical_flip
    • bounding_box
    • image
    • segmentation_mask

__all__ = ["assert_close"]


class PILImagePair(TensorLikePair):
pmeier (author) commented:

This is a superset of what the old ImagePair did. It includes the options to only test the aggregated difference or check the percentage of differing pixels. That is on par with what we are currently doing in our stable functional tests:

vision/test/common_utils.py

Lines 172 to 174 in a67cc87

def _assert_approx_equal_tensor_to_pil(
    tensor, pil_image, tol=1e-5, msg=None, agg_method="mean", allowed_percentage_diff=None
):
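
For context, here is a standalone sketch of those two relaxed comparison modes. This is not the actual PILImagePair implementation, just an illustration of the checks it adds:

# Illustration of the two relaxed comparison modes described above; not the
# actual PILImagePair implementation.
import torch
import torchvision.transforms.functional as F


def assert_tensor_close_to_pil(
    actual, pil_image, *, tol=1e-5, agg_method="mean", allowed_percentage_diff=None
):
    # compare in float to avoid wrap-around for integer image dtypes
    expected = F.pil_to_tensor(pil_image).to(torch.float32)
    abs_diff = (actual.to(torch.float32) - expected).abs()

    if allowed_percentage_diff is not None:
        # only check how many pixels differ at all, not by how much
        assert (abs_diff != 0).float().mean() <= allowed_percentage_diff
    else:
        # only check an aggregate of the difference, e.g. its mean
        assert getattr(torch, agg_method)(abs_diff) <= tol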

Comment on lines 166 to 180
def from_loader(loader_fn):
    def wrapper(*args, **kwargs):
        loader = loader_fn(*args, **kwargs)
        return loader.load(kwargs.get("device", "cpu"))

    return wrapper


def from_loaders(loaders_fn):
    def wrapper(*args, **kwargs):
        loaders = loaders_fn(*args, **kwargs)
        for loader in loaders:
            yield loader.load(kwargs.get("device", "cpu"))

    return wrapper
pmeier (author) commented:

These functions are mostly for "BC" with our current tests. With them we can turn make_*_loader{s} back into make_*. For example, make_images = from_loaders(make_image_loaders). This makes the transition period easier, since we don't need to touch the old files.

In the future, most tests should use the loader architecture. Those that don't could simply invoke TensorLoader(...).load(device) manually.
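
For instance, a usage sketch (it assumes make_image_loaders accepts the dtypes argument shown elsewhere in this PR):

# "BC" usage sketch for the wrappers above
import torch

make_images = from_loaders(make_image_loaders)

for image in make_images(dtypes=[torch.float32]):
    # the tensors are instantiated eagerly (on the default "cpu" device)
    # instead of being handed out as loaders
    assert isinstance(image, torch.Tensor)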

return self.kernel.__name__


def pil_reference_wrapper(pil_kernel):
pmeier (author) commented:

The reference defined in the KernelInfo will be passed the same inputs as the kernel. Since we use the PIL kernel as reference for its tensor counterpart, this is a simple wrapper that avoids defining the same kind of reference function over and over.
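
A possible shape of such a wrapper, sketched under the assumption that the image tensor is the first argument and that a to_pil_image-style conversion is used; the real implementation may differ:

# Sketch of the idea behind pil_reference_wrapper: adapt a PIL kernel so it can
# be fed the same (tensor image, *other args) inputs as the tensor kernel.
import functools

import torchvision.transforms.functional as F


def pil_reference_wrapper(pil_kernel):
    @functools.wraps(pil_kernel)
    def wrapper(image_tensor, *other_args, **kwargs):
        # convert only the image; the remaining arguments are passed through unchanged
        return pil_kernel(F.to_pil_image(image_tensor), *other_args, **kwargs)

    return wrapper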


def sample_inputs_horizontal_flip_image_tensor():
    for image_loader in make_image_loaders(dtypes=[torch.float32]):
        yield ArgsKwargs(image_loader.unwrap())
pmeier (author) commented:

Similar to make_image_loaders, these functions don't need to yield, but it makes the definition less verbose.


pmeier commented Sep 12, 2022

@vfdev-5 The failure comes from a test you wanted to fix:

=================================== FAILURES ===================================
_ test_correctness_elastic_image_or_mask_tensor[elastic_segmentation_mask-make_segmentation_masks-cpu] _
Traceback (most recent call last):
  File "/home/runner/work/vision/vision/test/test_prototype_transforms_functional.py", line 1643, in test_correctness_elastic_image_or_mask_tensor
    sample[..., in_box[3] - 1, in_box[0]] = torch.arange(20, 20 + c)
  File "/home/runner/work/vision/vision/torchvision/prototype/features/_feature.py", line 87, in __torch_function__
    output = func(*args, **kwargs)
IndexError: index 34 is out of bounds for dimension 1 with size 25
----------------------------- Captured stdout call -----------------------------
torch.Size([1, 64, 76])
torch.Size([4, 1, 64, 76])
torch.Size([0, 64, 76])
torch.Size([4, 0, 64, 76])
torch.Size([9, 64, 76])
torch.Size([4, 4, 64, 76])
torch.Size([1, 25, 18])

Given that this PR does not touch this test at all, my best guess is that it was flaky before and depended on a random seed. Plus, we seem to have missed removing a debug statement:

@pmeier pmeier marked this pull request as ready for review September 12, 2022 16:41

vfdev-5 commented Sep 12, 2022

The failure comes from a test you wanted to fix:

@pmeier it was fixed. The issue is that the input image size is too small for the given bounding boxes. Probably you changed that again somewhere.


pmeier commented Sep 13, 2022

@vfdev-5 You were right and I fixed that in a49f0db. However, now we get this failure:

=================================== FAILURES ===================================
_ test_correctness_perspective_segmentation_mask[startpoints0-endpoints0-cpu] __
Traceback (most recent call last):
  File "/Users/runner/work/vision/vision/test/test_prototype_transforms_functional.py", line 1496, in test_correctness_perspective_segmentation_mask
    torch.testing.assert_close(output_mask, expected_masks)
  File "/Users/runner/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/torch/testing/_comparison.py", line 1359, in assert_close
    msg=msg,
  File "/Users/runner/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/torch/testing/_comparison.py", line 1093, in assert_equal
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not equal!

Mismatched elements: 1 / 558 (0.2%)
Greatest absolute difference: 1 at index (0, 2, 17)
Greatest relative difference: inf at index (0, 2, 17)

In CI this is only failing on macOS, but this also fails for me locally. Given that we have only a single mismatched element, the test is probably flaky.


vfdev-5 commented Sep 13, 2022

I do not see from your commit where the size was fixed. A more general question: how do I find the code of a failing test? It is totally obfuscated to me :)


pmeier commented Sep 13, 2022

I do not see from your commit where the size was fixed.

The issue was that while refactoring the make_*s functions I didn't make sure that the "data dimensions", i.e. everything that is not a batch dimension, are constant. For example, before the last commit in this PR, setting the size parameter in make_segmentation_masks only applied to a part of the generated samples. This is why the failing test popped back up: you set the size and depended on it being at least that, but some samples were smaller.

A more general question: how do I find the code of a failing test? It is totally obfuscated to me :)

Not sure what you mean. Could you elaborate? Right now there is no failing test in the new tests, so I'll construct one to show what it looks like. Imagine horizontal_flip_bounding_box is not torch.jit.script'able. The failing test will look like this

________________ TestCommon.test_scripted_vs_eager[cpu-horizontal_flip_bounding_box35] _________________

Traceback (most recent call last):
  File "/home/philip/git/pytorch/torchvision/test/test_prototype_transforms_kernels.py", line 343, in test_scripted_vs_eager
    kernel_scripted = torch.jit.script(kernel_eager)
  File "/home/philip/.local/opt/mambaforge/envs/torchvision-dev/lib/python3.7/site-packages/torch/jit/_script.py", line 1344, in script
    qualified_name, ast, _rcb, get_default_args(obj)
RuntimeError: 
Ellipses followed by tensor indexing is currently not supported:
  File "/home/philip/git/pytorch/torchvision/torchvision/prototype/transforms/functional/_geometry.py", line 42
    )

    bounding_box[..., [0, 2]] = image_size[1] - bounding_box[..., [2, 0]]
                                                ~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

    return convert_bounding_box_format(


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/philip/git/pytorch/torchvision/test/test_prototype_transforms_kernels.py", line 345, in test_scripted_vs_eager
    raise AssertionError("Trying to `torch.jit.script` the kernel raised the error above.") from error
AssertionError: Trying to `torch.jit.script` the kernel raised the error above.

From the test name and parametrization you should find everything:

  1. The test is defined in the class TestCommon and is named test_scripted_vs_eager
  2. The test was run on the CPU
  3. It checked the kernel horizontal_flip_bounding_box and specifically the 35th sample input. (In this example all sample inputs fail since the scriptability of the kernel is not dependent on the input).

To debug, you can run this exact test with

pytest 'test/test_prototype_transforms_kernels.py::TestCommon::test_scripted_vs_eager[cpu-horizontal_flip_bounding_box35]'


vfdev-5 commented Sep 13, 2022

I was talking about test_correctness_perspective_segmentation_mask[startpoints0-endpoints0-cpu] from the failed macOS job.


pmeier commented Sep 13, 2022

Top-most error in the traceback:

File "/Users/runner/work/vision/vision/test/test_prototype_transforms_functional.py", line 1496, in test_correctness_perspective_segmentation_mask
    torch.testing.assert_close(output_mask, expected_masks)

To reproduce:

pytest 'test/test_prototype_transforms_functional.py::test_correctness_perspective_segmentation_mask[startpoints0-endpoints0-cpu]'

vfdev-5 left a review comment:

Good to me



def make_bounding_box(*, format, image_size=(32, 32), extra_dims=(), dtype=torch.int64):
def make_bounding_box_loader(*, extra_dims=(), format, image_size=None, dtype=torch.float32):
vfdev-5 commented:

Why doesn't the bounding box loader have a num_objects arg?

pmeier (author) replied:

Because we defined a bounding box as (*, 4), i.e. features.BoundingBox([0, 0, 10, 10], ...) is valid although it only has a single dimension. Thus, if you want to have multiple boxes, set extra_dims=(num_objects,)
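
For illustration (a sketch assuming the XYXY format; make_bounding_box_loader is the helper from the diff above):

from torchvision.prototype import features

# a single box: the loaded bounding box has shape (4,)
single_box = make_bounding_box_loader(format=features.BoundingBoxFormat.XYXY)

# multiple boxes: put the number of objects into extra_dims,
# e.g. five boxes -> shape (5, 4)
five_boxes = make_bounding_box_loader(format=features.BoundingBoxFormat.XYXY, extra_dims=(5,))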

@pmeier pmeier merged commit 0b5ebae into pytorch:main Sep 15, 2022
@pmeier pmeier deleted the prototype-functional-test branch September 15, 2022 09:30
facebook-github-bot pushed a commit that referenced this pull request Sep 15, 2022
Reviewed By: jdsgomes

Differential Revision: D39543278

fbshipit-source-id: 413bc5160188c5423d39d9f73387e9a5f25d8af7