[T145005253] Make Tests More Stable #1606

Closed
wants to merge 4 commits into from

q10 commented Feb 22, 2023

Summary:

  • Add support for retries in build steps that are known to fail due to occasional network connection failures
  • Add support for installing ROCm tooling and testing ROCm builds in the build scripts framework
  • Update the existing FBGEMM_GPU CI / build_amd_gpu job to use the build scripts framework
  • Fix the test annotations in jagged_tensor_ops_test.py so that the tests run correctly in CPU-only mode
  • Impose a 10-minute timeout on test suite runs (in practice, they generally complete within 3 minutes)
  • Add the ability to conditionally disable tests depending on whether they are running inside a GitHub runner
  • Disable the test_jagged_index_select_2d test on GitHub until we find the root cause of it hanging whenever it runs there (on both the GPU and CPU variants)
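
The last two items in the summary can be sketched with the standard `unittest` skip decorators. This is a minimal illustration, not the code from this PR: the helper `running_on_github` and the test class are hypothetical, and the sketch assumes detection through the `GITHUB_ACTIONS` environment variable, which GitHub Actions sets to `true` on its runners. The PR may detect the runner differently.

```python
import os
import unittest


def running_on_github() -> bool:
    """Return True when the process runs inside a GitHub Actions runner.

    Assumption: detection relies on the GITHUB_ACTIONS environment variable,
    which GitHub Actions sets to "true". The PR itself may use another check.
    """
    return os.environ.get("GITHUB_ACTIONS") == "true"


class JaggedIndexSelectExample(unittest.TestCase):  # hypothetical test class
    # The condition is evaluated once, when the class body is executed.
    @unittest.skipIf(running_on_github(), "Test hangs on GitHub runners")
    def test_jagged_index_select_2d(self) -> None:
        # Placeholder body; the real test lives in jagged_tensor_ops_test.py.
        self.assertEqual(1 + 1, 2)
```

With this pattern the test still runs locally and on other CI systems, and it is reported as skipped (not failed) on GitHub, which keeps the hang from blocking the rest of the suite.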

netlify bot commented Feb 22, 2023

Deploy Preview for pytorch-fbgemm-docs canceled.

🔨 Latest commit: af195ad
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/63fbcbd0fc68a900087d652e

q10 changed the title from "wip" to "[T145005253] Make Tests More Stable" Feb 23, 2023
q10 force-pushed the bm/T145005253/build-scripts-3 branch 8 times, most recently from 3579cba to 923f36c, February 25, 2023 10:08
- Add support for installing ROCm tooling in the build scripts framework
- Update the existing FBGEMM_GPU CI / build_amd_gpu job
- Fix the test annotations in `jagged_tensor_ops_test.py` so that the tests run correctly in CPU-only mode
- Add support for retries in build steps that are known to fail due to occasional network connection failures
- Impose timeouts of 15 minutes for running the test suites (in practice, they generally complete within 3 minutes)
- Add ability to conditionally disable tests depending on whether they are running inside the GitHub runner
- Disable the `test_jagged_index_select_2d` test on GitHub until we figure out the root cause of it hanging whenever it is run on GitHub (regardless of GPU or CPU variant)
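
The retry and timeout items above can be illustrated as follows. The PR implements them in its bash build scripts and CI configuration, which are not shown here, so this is only a Python sketch of the same idea. The function `run_with_retries`, its parameters, and its default values are all hypothetical.

```python
import subprocess
import time
from typing import Optional, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    timeout_seconds: Optional[float] = None,
) -> subprocess.CompletedProcess:
    """Run cmd, retrying when it exits with a non-zero status.

    Hypothetical helper: the name, parameters, and defaults are illustrative.
    A timeout is deliberately not retried; it propagates to the caller as
    subprocess.TimeoutExpired.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            # check=True raises CalledProcessError on a non-zero exit code.
            # timeout kills the child and raises TimeoutExpired when exceeded.
            return subprocess.run(cmd, check=True, timeout=timeout_seconds)
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise  # Out of attempts: surface the last failure.
            time.sleep(delay_seconds)  # Wait before the next attempt.
    raise AssertionError("unreachable")
```

In this sketch a network-bound step would be wrapped with `attempts=3`, while a test run would pass `timeout_seconds=15 * 60` to match the limit listed above. Only non-zero exits are retried, on the assumption that a hung test suite is unlikely to recover on a second attempt.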
q10 force-pushed the bm/T145005253/build-scripts-3 branch 2 times, most recently from 84f2141 to 1be5c4d, February 25, 2023 20:52
@facebook-github-bot

@q10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

- Consolidate ROCm testing under `run_fbgemm_gpu_tests`

- Annotate bash variables with `local` keyword where applicable
q10 force-pushed the bm/T145005253/build-scripts-3 branch from 1be5c4d to af195ad, February 26, 2023 21:14
@facebook-github-bot

@q10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

@q10 merged this pull request in eec6fd2.
