[ci] delete generate-test-matrix #73001

Conversation
Today, we have two pieces that conspire to determine what workflows we run:

- `generate_ci_workflows.py`, which takes a declarative description of what we want the workflow to do and uses jinja to generate a workflow yaml file
- `generate-test-matrix`, which runs at CI time to dynamically generate test jobs

This is bad:

- Having one layer of code generation is unfortunate; having two is confusing.
- You cannot tell from a workflow yaml file what test jobs will be run.
- We have to do a careful dance of plumbing the args to `generate-test-matrix` through env vars and other such ugliness.
- In cases where the build job fails and prevents `generate-test-matrix` from running, a ghost `test` job that doesn't actually exist noises up the HUD and our stats.
- A bunch of useless `generate-test-matrix` jobs (8 on PRs) noise up our signal.

As far as I can tell, this complexity is unnecessary: we have all the information we need to generate the test matrix statically. There does not appear to be any advantage in retaining `generate-test-matrix`, so I am removing it to simplify the CI.

The *only* place where we were actually doing something dynamic is in our windows gpu workflow, where we would check at runtime whether the workflow was triggered from a PR or from master and behave accordingly. This is more simply done by having two separate workflows with different trigger conditions, which avoids the madness of parsing labels and forking the behavior dynamically, something that has been a source of confusion in the past.
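A minimal sketch of the static approach, using hypothetical names (`CIWorkflow`, `generate_test_jobs`) that are illustrative rather than the actual `generate_ci_workflows.py` API: the declarative config enumerates every test job at generation time, so the jobs land verbatim in the generated workflow yaml and no matrix step has to run in CI.

```python
# Hypothetical sketch of static test-matrix generation; the class, field, and
# function names are illustrative, not the real generate_ci_workflows.py API.
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class CIWorkflow:
    build_environment: str
    test_runner_type: str
    num_test_shards: int = 1
    enable_default_test: bool = True

    def generate_test_jobs(self) -> List[Dict[str, Any]]:
        # Enumerate every test job up front, at workflow-generation time,
        # instead of computing them dynamically in a CI job.
        test_jobs: List[Dict[str, Any]] = []
        if self.enable_default_test:
            for shard in range(1, self.num_test_shards + 1):
                test_jobs.append({
                    "id": f"test_default_{shard}_{self.num_test_shards}",
                    "name": f"test (default, {shard}, {self.num_test_shards}, {self.test_runner_type})",
                    "shard": shard,
                    "num_shards": self.num_test_shards,
                    "runner": self.test_runner_type,
                })
        return test_jobs

# The jinja template would iterate over generate_test_jobs() and emit one
# concrete job per entry, so the yaml shows exactly what will run.
workflow = CIWorkflow(
    build_environment="linux-xenial-py3.7-gcc5.4",
    test_runner_type="linux.2xlarge",
    num_test_shards=2,
)
for job in workflow.generate_test_jobs():
    print(job["id"], "->", job["name"])
```

The windows gpu PR/master split then falls out naturally: rather than branching at runtime, the generator can simply emit two workflow files whose trigger conditions differ.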
💊 CI failures summary and remediations: as of commit 0bf0ef1 (more details on the Dr. CI page):

🕵️ 7 new failures recognized by patterns; the following CI failures do not appear to be due to upstream breakages.
This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
```python
test_jobs.append(
    {
        "id": f"test_default_{shard}_{config['num_shards']}",
        "name": f"test (default, {shard}, {self.num_test_shards}, {self.test_runner_type})",
```
It might also be beneficial to simplify the names here to not include the test_runner_type? That information is probably not very useful to most people, and we can derive it from the logs.
Yeah, at the moment I am just trying to replicate the same job name, to avoid churning metrics and the HUD. We can definitely change it if we want, though.
That's actually a long-term improvement I wanted to have in HUD: the ability to combine history over renames (for example, today we have old XLA job names and new XLA job names, and there is no continuation between them).
Yeah, one thing that might be interesting is that we are allowed to control the display name separately from the ID. So we can try to come up with some stable scheme for the ID and use that to identify things in HUD rather than the display name.
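A rough sketch of that idea (the helper names here are hypothetical, not anything HUD or the generator actually defines): keep a stable, machine-oriented ID that HUD keys on, and derive the human-facing display name separately, so the display can be renamed without breaking history.

```python
# Hypothetical sketch: stable job IDs for HUD, display names derived separately.
def job_id(config: str, shard: int, num_shards: int) -> str:
    # Stable scheme: deliberately excludes churn-prone details like runner type.
    return f"test_{config}_{shard}_{num_shards}"

def display_name(config: str, shard: int, num_shards: int, runner: str) -> str:
    # Free to change cosmetically, since HUD would key on job_id instead.
    return f"test ({config}, {shard}, {num_shards}, {runner})"

assert job_id("default", 1, 2) == "test_default_1_2"
assert display_name("default", 1, 2, "linux.2xlarge") == "test (default, 1, 2, linux.2xlarge)"
```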
Shellcheck failures are real
```python
enable_xla_test: YamlShellBool = "''"
enable_noarch_test: YamlShellBool = "''"
enable_force_on_cpu_test: YamlShellBool = "''"
enable_default_test: bool = True
```
We no longer need the explicit type annotation here, do we?
I have no idea lol, the rules for .github are different and stricter than for other folders. I can try to remove it.
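One general Python note that may be relevant here (not specific to the .github lint rules): if these are `@dataclass` fields, the annotation is load-bearing, since `dataclasses` only treats annotated class attributes as fields.

```python
from dataclasses import dataclass, fields

@dataclass
class Example:
    annotated: bool = True  # recognized as a dataclass field
    unannotated = False     # plain class attribute, NOT a field

print([f.name for f in fields(Example)])  # prints ['annotated']
```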
```python
if self.enable_nogpu_no_avx_test:
    configs["nogpu_NO_AVX"] = {"num_shards": 1, "runner": NOGPU_RUNNER_TYPE}
```
Unrelated: isn't NO_AVX dead (since we only have AVX2 and AVX512 now)?
We still run it on linux-bionic-cuda10.2-py3.9-gcc7 at least, so not sure.
@pytorchbot merge this
Summary: delete generate-test-matrix; test jobs are now generated statically by generate_ci_workflows.py.

Pull Request resolved: #73001
Test Plan: automation
Reviewed By: malfet, seemethere
Differential Revision: D34315415
fbshipit-source-id: 164281a10b0692312e90edebdda174c5175cdfdd
These accidentally got turned on by #73001. Turn them off.
Summary: Pull Request resolved: pytorch/pytorch#73064

These accidentally got turned on by pytorch/pytorch#73001. Turn them off.

Test Plan: Imported from OSS
Reviewed By: shannonzhu
Differential Revision: D34332530
Pulled By: suo
fbshipit-source-id: a6493b7d94465fa9141f1527648dbbec09c5706d
(cherry picked from commit b18c95e4a68e7d96e617edfb83a3e55780b49f4c)