Consider moving to reserved EC2 instances? #190
Comments
I'm not sure what changed, but this seems like it used to be much more reliable. I'm not opposed to reserved instances (I assume for 1-year periods), but this could make our AWS bill quite a bit higher.
Or in other words, my only objection would be financial. If it fits in the budget, let's do it.
@metajack Do you remember what the speed difference was between c4.4xlarge and c4.2xlarge? Should we re-run some tests? The 2xlarge is a little under half the price of the 4xlarge, and before we commit to reserved instances I'm curious how much time the bigger instances are saving us on our runs :-)
We should definitely do this ASAP. Perusing the buildbot bug tracker, it looks like these issues have been known and unfixed for 5 years, and that slave attach/detach is non-atomic and has all sorts of bad edge cases as a result. Moving to reserved instances would eliminate slave churn and probably get rid of all the trouble. Here's what I think should happen:
@larsbergstrom I don't think I ever did tests. I did try various
OK, I've done #1 and #2 from jack's steps above. It looks like getting some 4xlarge instances would save a ton of money, not cost much time, and also leave room for adding more of them as we add more tests. But we can also just get the 8xlarges reserved, too :-) Price info is from https://aws.amazon.com/ec2/instance-types/, assuming we stay with US-WEST-2. Raw data below:

- c4.8xlarge (36 vCPUs, info from linux-rel, $2.10/hour on-demand, $1.34/hour 1-year reserved)
  - compile: 11min 56sec
- c4.4xlarge (16 vCPUs, $0.67/hour, 1-year reserved upfront)
  - compile (scratch): 16.28min
  - test-wpt w/ 16 processes: 5min
  - test-css w/ 16 processes: 8m49s
- c4.2xlarge (8 vCPUs, $0.33/hour reserved upfront)
  - compile (scratch): 19.45min
  - test-wpt w/ 8 processes: 9.6min
  - test-css w/ 8 processes: 15m26s
- c4.xlarge (4 vCPUs, $0.11/hour reserved upfront)
  - compile (rm -rf target): 20m29s
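To make the cost/time tradeoff concrete, here's a quick back-of-the-envelope script (not from the thread, just illustrative arithmetic) computing the cost of a single from-scratch compile on each type, using the 1-year reserved hourly rates and compile times reported above:

```python
# Rough cost of one from-scratch compile per instance type.
# Rates are the 1-year reserved $/hour figures above; times are the
# reported compile times converted to decimal minutes.
instances = {
    # name: (reserved $/hour, from-scratch compile minutes)
    "c4.8xlarge": (1.34, 11.93),  # 11min 56sec
    "c4.4xlarge": (0.67, 16.28),
    "c4.2xlarge": (0.33, 19.45),
    "c4.xlarge":  (0.11, 20.48),  # 20min 29sec
}

for name, (rate, minutes) in instances.items():
    cost = rate * minutes / 60  # $/hour * hours per compile
    print(f"{name}: ${cost:.3f} per compile ({minutes:.1f} min)")
```

Per-compile cost differences across types are small compared to the hourly rate spread, which is why the smaller reserved instances look attractive despite the longer build times.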
Here's our usage data for the last two weeks on the linux builders w/ on-demand usage. Note that we're still having a LOT of trouble getting it to reliably spin up a second builder, so the 23/24/25-hour usage days are probably underreporting what we would have used if things were working properly. 1/19/16 - 24
One extra note: the default Ubuntu 14.04 images on Amazon only have 7.5GB of disk, which is barely enough for a single-flavor Servo build. The image brought over from Daala has ~400GB, which seems sufficient for the several different flavors/builds that each of the images brings up (a casual inspection of one of the builders being shared across targets showed about 55GB free).
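A quick way to sanity-check free space on a builder before trusting it with multi-flavor builds; a minimal sketch assuming GNU coreutils `df` and that `/` is the build volume (the 60 GiB threshold is a made-up number, not a project policy):

```shell
# Print free space on the root volume in whole GiB (GNU df).
avail_gib=$(df -BG --output=avail / | tail -n 1 | tr -dc '0-9')
echo "Available: ${avail_gib} GiB"

# Warn below a hypothetical 60 GiB floor for multi-flavor Servo builds.
if [ "${avail_gib}" -lt 60 ]; then
    echo "WARNING: low disk space for multi-flavor builds"
fi
```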
Add EC2 reserved instances

r? @metajack @Manishearth

Closes #190 (this is live)
We decided to pay for a full year upfront. On a c4.4xlarge, the effective hourly cost for that is $0.526; with half upfront it is $0.537, and with nothing upfront it is $0.621. Note that we currently pay $2.098/hour for our on-demand c4.8xlarge.
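Putting those effective hourly rates in annual terms (illustrative arithmetic, not from the thread; a reserved instance is billed for the full term whether or not it's busy, so 24×365 hours is the right multiplier):

```python
# Annualize the effective hourly rates quoted above and compare
# against the current on-demand c4.8xlarge bill.
HOURS_PER_YEAR = 24 * 365

rates = {
    "c4.8xlarge on-demand (current)":       2.098,
    "c4.4xlarge reserved, all upfront":     0.526,
    "c4.4xlarge reserved, partial upfront": 0.537,
    "c4.4xlarge reserved, no upfront":      0.621,
}

baseline = rates["c4.8xlarge on-demand (current)"] * HOURS_PER_YEAR
for name, rate in rates.items():
    yearly = rate * HOURS_PER_YEAR
    print(f"{name}: ${yearly:,.0f}/year (vs baseline: saves ${baseline - yearly:,.0f})")
```

Even the no-upfront reserved option cuts the per-builder bill by well over half relative to the on-demand c4.8xlarge.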
Does this affect the buildbot restarting instructions? In particular, I believe there are no more on-demand EC2 instances, so is the "graceful shutdown" step still necessary?
@aneeshusa Thanks! I've cleaned up the instructions and removed the section about the latent builders.
cc @edunham @metajack
We're finding that, particularly after any network hiccups, buildbot is really bad about spinning up a new EC2 latent instance. I also can't figure out how to connect one manually: if you do the obvious thing of spinning one up and starting buildbot on it, you get an error, because the master is not in the process of starting it up (the only time the latent slaves can connect). And the master doesn't seem to spin up new ones unless you restart it, which is really hard to time between homu runs without breaking other things.
Is there some command I'm missing here that I could be using instead?
Or, should we consider moving to EC2 reserved instances? Even using a smaller instance type would help, as we're basically running with only one EC2 instance anyway most of the time.
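For reference, the switch would roughly amount to replacing buildbot's EC2 latent slaves with plain slaves running on always-on reserved instances. A hypothetical `master.cfg` fragment sketching the difference (buildbot 0.8-era API from memory; slave names, passwords, and the AMI are placeholders, not our actual config):

```python
# Hypothetical master.cfg fragment -- before: an EC2 latent slave that the
# master must substantiate on demand (the failure mode described above).
from buildbot.buildslave import BuildSlave
from buildbot.buildslave.ec2 import EC2LatentBuildSlave

latent = EC2LatentBuildSlave(
    "linux1", "slavepass",   # placeholder name/password
    "c4.8xlarge",
    ami="ami-00000000",      # placeholder AMI
    region="us-west-2",
)

# After: a plain slave on a reserved instance. The instance stays up and
# the slave connects on boot, so there is no substantiation step for the
# master to get wrong after a network hiccup.
reserved = BuildSlave("linux1", "slavepass")
```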
We didn't see this as much before because we always had my Linode instance to "pick up the slack".