AWS was down, GHA infrastructure affected / recovering #165909

> NOTE: Remember to label this issue with "`ci: sev`".
> If you want autorevert to be disabled, keep the `ci: disable-autorevert` label.

 

## Current Status

Mitigated, queues are recovering.

AWS experienced a major outage (https://health.aws.amazon.com/health/status) this morning, which took most of our GHA infrastructure down with it.

We are still recovering and will post an update once our services are back up.

## Error looks like
*Provide some way users can tell that this SEV is causing their issue.*

## Incident timeline (all times pacific)
*Include when the incident began, when it was detected, mitigated, root caused, and finally closed.*

## User impact
*How does this affect users of PyTorch CI?*

## Root cause
*What was the root cause of this issue?*

## Mitigation
*How did we mitigate the issue?*

## Prevention/followups
*How do we prevent issues like this in the future?*
