Skip to content

Conversation

janeyx99
Copy link
Contributor

@janeyx99 janeyx99 commented May 3, 2021

This change temporarily disables CUDA testing on PRs, but keeps it on master.
This is likely to increase the number of reverts, but this is necessary as a stop-gap measure to cap the CI costs growth.

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented May 3, 2021

💊 CI failures summary and remediations

As of commit bf65dd8 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@seemethere seemethere added module: ci Related to continuous integration module: windows Windows support for PyTorch labels May 3, 2021
@facebook-github-bot
Copy link
Contributor

@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@janeyx99 janeyx99 force-pushed the swap-windows-ci branch from 1b53f42 to d10b701 Compare May 3, 2021 21:44
@facebook-github-bot
Copy link
Contributor

@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel
Copy link
Collaborator

ngimel commented May 3, 2021

Hm, given how many cuda-related PRs and especially builds are broken on windows only, I'm afraid it will increase churn a lot.

@zasdfgbnm
Copy link
Collaborator

Does this mean no CI will be running for CUDA+Windows for PR? I feel worried about it as in the past Windows CI has been useful. (I saw Windows only failures ~twice last month)

@ptrblck
Copy link
Collaborator

ptrblck commented May 3, 2021

This change temporarily disables CUDA testing on PRs, but keeps it on master.

If I understand it correctly, Windows CI would only be triggered after the PR was merged into the master branch?
How temporary would it be, as this could indeed increase the churn of reverting faulty PRs a lot, as already mentioned, and could additionally make fixes to PRs hard (they won't be tested before the re-merge of the "fixed PR" in CI).

@seemethere
Copy link
Member

I think based on our current cost for running Windows gpu executors it's just not tenable to run Windows CUDA tests on every PR.

After this PR lands I'll do a follow up PR to add a CI label that we can use to trigger windows CUDA testing on PRs if requested.

It'll be similar to #54018

@facebook-github-bot
Copy link
Contributor

@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@janeyx99 merged this pull request in 1d3a9bf.

@imaginary-person
Copy link
Contributor

imaginary-person commented May 7, 2021

If failures on Windows CUDA checks do not usually result from the fact that Windows CUDA CI checks use a Tesla T4 GPU (SM_75), whereas Linux CUDA CI checks use an SM_52 GPU, how about also having a (regular) Windows CI check with an SM_52 GPU, if an executor with an SM_52 GPU is significantly cheaper than an executor with an SM_75 GPU?

In that case, @seemethere can make two labels for Windows CUDA CI checks - one each for executors with an SM_52 & an SM_75 GPU respectively, so that one can choose the SM_52 one (cheaper), if the difference in CUDA Compute Capability would be deemed to not be a source of potential failures for an in-progress PR.

krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
This change temporarily disables CUDA testing on PRs, but keeps it on master.
This is likely to increase the number of reverts, but this is necessary as a stop-gap measure to cap the CI costs growth.

Pull Request resolved: pytorch#57493

Reviewed By: seemethere

Differential Revision: D28162697

Pulled By: janeyx99

fbshipit-source-id: 1bc529a405f7d63c07f4bd9f8ceca8da450743fc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged module: ci Related to continuous integration module: windows Windows support for PyTorch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants