[proto] Enable GPU tests on prototype #6665
Conversation
Also there might be an opportunity to simplify this using the generic Linux jobs (with GPU support) that @seemethere has built. You can find the documentation here: https://github.com/pytorch/test-infra/wiki/Writing-generic-linux-jobs
Ohh, this is nice and a much better option. TIL
@huydhn @osalpekar thanks for the review, I'll try to add that.
Failures in the CUDA vs CPU tests are expected; I'll xfail them in the PR that enables the CUDA tests, and we'll fix the inconsistency in a follow-up PR.
Unless I have missed something, these are all just closeness-related. Thus, we only have to adjust the tolerances in our test suite. Or, to put it differently: there is likely no bug in our implementation. I'm OK with doing that in a follow-up.
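For illustration, here is a minimal sketch of what loosening the tolerances could look like; the rtol/atol values, tensor shape, and the standalone closeness_kwargs dict are placeholders, not what the suite should actually adopt.

import torch
from torch.testing import assert_close

# Placeholder tolerances, looser than the float32 defaults used by assert_close.
closeness_kwargs = {"rtol": 1e-4, "atol": 1e-4}

output_cpu = torch.rand(3, 32, 32)
# Fall back to a CPU copy so the sketch also runs on machines without CUDA.
output_cuda = output_cpu.cuda() if torch.cuda.is_available() else output_cpu.clone()

# check_device=False lets us compare the CUDA output against its CPU reference.
assert_close(output_cuda, output_cpu, check_device=False, **closeness_kwargs)

In the actual suite, the adjusted values would presumably live in each kernel's closeness_kwargs (already forwarded via **info.closeness_kwargs in the snippet below) rather than in a shared constant.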
Given that there are very few differences between the GPU workflow and the CPU one, can we maybe merge the files?
try:
    assert_close(output_cuda, output_cpu, check_device=False, **info.closeness_kwargs)
except AssertionError:
    pytest.xfail("CUDA vs CPU tolerance issue to be fixed")
This effectively disables this test. Either we should add proper xfails to the KernelInfo's, or simply comment out this test with a FIXME note. Otherwise we are wasting resources.
This is a temporary three-line fix. If I understand correctly, what you suggest is to mark the specific tests that can vary on GPU, etc. Taking into account that you wanted to fix the problem, we can keep things as they are.
I agree, fixing the individual tests is overkill here. But as is, this test is running with no information gain. assert_close will either pass or raise an AssertionError. Since we catch that and turn it into an xfail, there is no way this test can fail at all. Thus, we are better off just disabling the test completely, e.g. by commenting it out as I suggested, to get the same information but without wasted CI resources.
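For reference, a minimal pytest sketch of the collection-time alternative; the kernel names are hypothetical, and the real suite would attach the mark through its KernelInfo machinery rather than a hand-written parametrize list. A strict xfail keeps reporting the known-bad case as XFAIL while the tolerance issue exists, and turns into a loud XPASS failure once it is fixed, so the mark cannot silently outlive the bug it documents.

import pytest
import torch
from torch.testing import assert_close

# Hypothetical set of kernels with a known CUDA vs CPU tolerance issue.
KERNELS_WITH_CUDA_TOLERANCE_ISSUES = {"hypothetical_flaky_kernel"}

@pytest.mark.parametrize(
    "kernel_name",
    [
        "well_behaved_kernel",
        pytest.param(
            "hypothetical_flaky_kernel",
            marks=pytest.mark.xfail(
                reason="CUDA vs CPU tolerance issue to be fixed", strict=True
            ),
        ),
    ],
)
def test_cuda_vs_cpu(kernel_name):
    output_cpu = torch.full((3, 8, 8), 0.5)
    # Simulate the small device mismatch the flagged kernels currently exhibit.
    offset = 1e-3 if kernel_name in KERNELS_WITH_CUDA_TOLERANCE_ISSUES else 0.0
    output_cuda = output_cpu + offset
    # No try/except here: a mismatch surfaces as XFAIL for the marked kernel and as
    # a plain failure for unmarked ones, so the test carries information either way.
    assert_close(output_cuda, output_cpu, check_device=False)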
I see your point, but I think it is OK to keep it as is here, since it still shows that the majority of ops are passing on CUDA.
As for wasted resources, running the cuda_vs_cpu tests takes around 7 seconds.
IMO, it is complicated to refactor the configuration for both the self-hosted and GHA runners. This can be done by someone else with better GHA knowledge.
Agreed. I can take that up in a follow-up.
Stamping to unblock.
Hey @vfdev-5! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py
@osalpekar I merged this PR to enable GPU tests for the prototype module, and now on another PR I have an issue with starting the container with:
Do you have any ideas why this could happen? Thanks
Summary:
* [proto][WIP] Enable GPU tests on prototype
* Update prototype-tests.yml
* tests on gpu as separate file
* Removed matrix setup
* Update prototype-tests-gpu.yml
* Update prototype-tests-gpu.yml
* Added --gpus=all flag
* Added xfail for cuda vs cpu tolerance issue
* Update prototype-tests-gpu.yml

Reviewed By: YosuaMichael
Differential Revision: D40588168
fbshipit-source-id: 884a4045b343f93517b27cc3303c5eb6131a8895
cc @seemethere @bjuncek @pmeier