
Remove "NotImplemented" flags if the test is limited by hardware capacity #781

Closed
wants to merge 4 commits into from

Conversation

xuzhao9
Contributor

@xuzhao9 xuzhao9 commented Mar 9, 2022

When a test is flagged as "NotImplemented", there are actually two cases:

  1. The test itself doesn't implement or handle the configs, e.g., unsupervised-learning models like pytorch_struct don't have eval() tests, and the pyhpc models don't have train() tests.
  2. The test doesn't support running on our T4 CI GPU machine, but it runs fine on larger GPUs such as the V100 or A100.

This PR eliminates the second case, so that such tests can still run through the run.py or run_sweep.py interfaces. Instead of hard-coding the restriction, we flag the test as not_implemented in metadata.yaml, and the CI scripts test.py and test_bench.py read the metadata to determine which tests are not suitable to run on the CI machine.

This fixes #688, #626, and #598
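A minimal sketch of how such a metadata-driven skip could work. The metadata structure and helper names below are illustrative assumptions, not the actual TorchBench schema or API:

```python
import unittest

# Hypothetical parsed contents of a model's metadata.yaml
# (illustrative only -- not the real TorchBench schema).
METADATA = {
    "not_implemented": [
        # This combination is known not to fit on the T4 CI GPU.
        {"device": "cuda", "gpu": "T4", "test": "train"},
    ],
}

def is_skipped(metadata, device, gpu, test):
    """Return True if the metadata marks this (device, gpu, test)
    combination as not implemented on the current CI hardware."""
    for entry in metadata.get("not_implemented", []):
        # A missing key in an entry acts as a wildcard and matches anything.
        if (entry.get("device", device) == device
                and entry.get("gpu", gpu) == gpu
                and entry.get("test", test) == test):
            return True
    return False

class TestBench(unittest.TestCase):
    def test_train_cuda(self):
        # Skip rather than fail when the CI GPU lacks capacity for this test.
        if is_skipped(METADATA, device="cuda", gpu="T4", test="train"):
            self.skipTest("marked not_implemented for this CI GPU in metadata.yaml")
        ...  # run the actual benchmark here
```

The key design point is that the model code itself stays hardware-agnostic: only the CI harness consults the metadata, so the same test still runs unmodified on a V100 or A100.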

@facebook-github-bot
Contributor

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@xuzhao9 xuzhao9 requested a review from Chillee March 10, 2022 13:17
Member

@aaronenyeshi aaronenyeshi left a comment


LGTM. Thanks, this will keep the model code cleaner!

Development

Successfully merging this pull request may close these issues.

Gate CUDA failures behind some kind of flag?
3 participants