Migrate windows runners to non-ephemeral instances

### 🚀 The feature, motivation and pitch

Due to increased pressure on our Windows runners and the high cost of launching and tearing down those instances, we want to migrate them from ephemeral to non-ephemeral.

One possible impact is that broken or misbehaving CI jobs leave a runner in a bad state that carries over to later jobs. Another is resource exhaustion as CI leftovers pile up on those instances: disk space especially, though memory is also a candidate.
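As an illustration of the disk-space concern, here is a minimal sketch of a free-space check a job could run on a long-lived runner. The function names and the 20% threshold are assumptions made for this example, not part of the current runner setup.

```python
import shutil


def is_low_on_disk(total_bytes: int, free_bytes: int, min_free_fraction: float = 0.2) -> bool:
    """Return True when the free share of the disk is below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def runner_is_low_on_disk(path: str = ".", min_free_fraction: float = 0.2) -> bool:
    """Check the volume that holds `path` (hypothetical helper, threshold is an assumption)."""
    usage = shutil.disk_usage(path)
    return is_low_on_disk(usage.total, usage.free, min_free_fraction)
```

A check like this could run before checkout, so a nearly full instance is cleaned up or flagged instead of failing mid-build.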

As a middle-of-the-road approach, non-ephemeral instances are currently rotated stochastically: older instances get higher priority for termination when demand is lower.
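One way to picture age-weighted rotation is the sketch below. It is not the actual scale-down logic in test-infra; the function name, the use of age in hours, and the weighting scheme are all assumptions made for illustration.

```python
import random
from typing import Dict, Optional


def pick_instance_to_terminate(
    age_hours_by_id: Dict[str, float],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one instance at random, with probability proportional to its age."""
    if not age_hours_by_id:
        raise ValueError("no instances to choose from")
    if any(age < 0 for age in age_hours_by_id.values()):
        raise ValueError("ages must be non-negative")
    rng = rng or random.Random()
    ids = list(age_hours_by_id)
    weights = [age_hours_by_id[i] for i in ids]
    if sum(weights) == 0:
        # All instances are brand new: fall back to a uniform choice.
        return rng.choice(ids)
    return rng.choices(ids, weights=weights, k=1)[0]
```

With this weighting, an instance that has been up for 9 hours is nine times as likely to be picked as one that has been up for 1 hour, so long-lived instances are recycled eventually without a hard age limit.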

The instance definitions can be found here: https://github.com/pytorch/test-infra/pull/4072

* ✅ migrate `windows.4xlarge` to `windows.4xlarge.nonephemeral` instances under `pytorch/pytorch` (#100377)
* 📣 migrate `windows.8xlarge.nvidia.gpu` to `windows.8xlarge.nvidia.gpu.nonephemeral` instances under `pytorch/pytorch` (#104404)
* ⏳ submit PRs to all repositories under the `pytorch/` organization to migrate `windows.4xlarge` to `windows.4xlarge.nonephemeral`
* ⏳ submit PRs to all repositories under the `pytorch/` organization to migrate `windows.8xlarge.nvidia.gpu` to `windows.8xlarge.nvidia.gpu.nonephemeral`
* ⏳ retire `windows.4xlarge` and `windows.8xlarge.nvidia.gpu`
* ⏳ evaluate and start work on adopting `windows.g5.4xlarge.nvidia.gpu` to replace `windows.8xlarge.nvidia.gpu.nonephemeral` in other repositories and use cases (proposed by @huydhn)

The reason for this phased approach is to narrow the set of possible causes to investigate if particular CI jobs misbehave.
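For the per-repository PRs, the label change itself is mechanical. The sketch below rewrites the two labels named above in workflow text; the helper name and the regex approach are assumptions for illustration, and any real migration should still be reviewed file by file.

```python
import re

# Old label -> new label, taken from the checklist above.
LABEL_MIGRATIONS = {
    "windows.4xlarge": "windows.4xlarge.nonephemeral",
    "windows.8xlarge.nvidia.gpu": "windows.8xlarge.nvidia.gpu.nonephemeral",
}

# Match a label only when it is not part of a longer dotted name, so
# already-migrated labels are left alone.
_LABEL_PATTERN = re.compile(
    r"(?<![\w.-])("
    + "|".join(re.escape(label) for label in LABEL_MIGRATIONS)
    + r")(?![\w.-])"
)


def migrate_runner_labels(workflow_text: str) -> str:
    """Return workflow text with ephemeral Windows labels replaced (hypothetical helper)."""
    return _LABEL_PATTERN.sub(lambda m: LABEL_MIGRATIONS[m.group(1)], workflow_text)
```

Because the pattern skips labels that already end in `.nonephemeral`, running the helper twice on the same file produces no further changes.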

### Alternatives

_No response_

### Additional context

_No response_

cc @seemethere @malfet @pytorch/pytorch-dev-infra

Migrate windows runners to non-ephemeral instances #101209