Taskcluster: use Windows instances in multiple AWS regions #23652

Open
SimonSapin opened this issue Jun 27, 2019 · 1 comment
@SimonSapin SimonSapin commented Jun 27, 2019

The CI queue is slow right now because Windows tasks for the PR being tested keep being killed and restarted.

https://tools.taskcluster.net/groups/UowzfhVIR7q_1bl9hagSeQ/tasks/HA3OnWLmS-yP0wDfJiZyLA/runs/3/logs/public%2Flogs%2Flive.log

[taskcluster:error] AWS has issued a spot termination - need to abort task

The above is the fourth run of this particular task.

https://tools.taskcluster.net/aws-provisioner/servo-win2016/errors shows many errors with code Server.SpotInstanceTermination and InsufficientInstanceCapacity.

People on #taskcluster on IRC suggest that this problem is made worse by our worker type configuration at https://tools.taskcluster.net/aws-provisioner/servo-win2016/edit only specifying the us-west-2 region. With our usage spread over multiple AWS regions, we would be less subject to low availability in a given region.

However, AMIs are region-local and we manage the AMI for Windows CI ourselves, so it’d be nice not to increase the number of steps needed every time we want to update it.

Possible steps to fix this:

  • In etc/taskcluster/windows/build-ami.py, after the AMI is built:
  • (While we’re touching this file, also save the AMI’s admin password through taskcluster api secrets set so that manual copy-paste into 1Password is not needed.)
  • Make another script that queues a task for the servo-win2016-staging worker type.
  • Make a third script that obtains the ImageId from the servo-win2016-staging definition, copies the AMI into multiple regions, and updates servo-win2016 to use them.
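
The third step above could look roughly like the following. This is a minimal sketch, not the actual script: it assumes boto3 is available for the EC2 `copy_image` call, and it assumes the worker type definition carries a `regions` list whose entries hold a `region` name and a `launchSpec` with an `ImageId`. The function names `copy_ami_to_regions` and `with_regions` are hypothetical, and fetching or saving the worker type definition through the provisioner is left out.

```python
import copy


def default_ec2_client(region):
    # Imported lazily so the helpers below can be exercised without boto3.
    import boto3
    return boto3.client("ec2", region_name=region)


def copy_ami_to_regions(image_id, source_region, target_regions, name,
                        client_for_region=default_ec2_client):
    """Return {region: ImageId} after copying the AMI into each other region."""
    image_ids = {source_region: image_id}
    for region in target_regions:
        if region == source_region:
            continue  # The source AMI is already usable in its own region.
        # CopyImage is issued against a client for the *destination* region.
        response = client_for_region(region).copy_image(
            Name=name,
            SourceImageId=image_id,
            SourceRegion=source_region,
        )
        image_ids[region] = response["ImageId"]
    return image_ids


def with_regions(worker_type, image_ids):
    """Return a copy of the definition with one `regions` entry per AMI.

    Assumed layout (not confirmed by the issue):
    {"regions": [{"region": ..., "launchSpec": {"ImageId": ...}}]}
    """
    template = worker_type["regions"][0]
    regions = []
    for region, image_id in sorted(image_ids.items()):
        entry = copy.deepcopy(template)
        entry["region"] = region
        entry["launchSpec"]["ImageId"] = image_id
        regions.append(entry)
    updated = copy.deepcopy(worker_type)
    updated["regions"] = regions
    return updated
```

A copied AMI is not usable immediately, so a real script would probably wait for each copy to become available (for example with boto3's `image_available` waiter) before updating servo-win2016.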

@SimonSapin SimonSapin commented Jun 27, 2019

Docs for aws-provisioner are gone from https://docs.taskcluster.net/ because it is “deprecated”, even though its replacement won’t be ready for months :/

In the meantime, I was pointed at the source for the exact set of APIs and their schemas:
