forked from codalab/codalab-worksheets
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Better handle EC2 spot interruptions in AWS Batch WorkerManager
1 parent 32cd704, commit 265f875
Showing 3 changed files with 26 additions and 1 deletion.
New file added by this commit:

```bash
#!/usr/bin/env bash

while true
do
    # This IP address comes from:
    # https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html#spot-instance-termination-notices
    # It's a special endpoint set up by AWS whereby AWS instances can view metadata about themselves.
    # One such piece of metadata is the termination time, which is only set when the spot instance is to be
    # pre-empted (you get a 404 otherwise).
    # This script was partially taken from https://stackoverflow.com/q/32613600/14089059 .
    # Fetch only the HTTP status code: 200 means a termination time has been set. Any other code
    # (404, or e.g. 401/000 on an error) must not be mistaken for a termination notice.
    status=$(curl -s -o /dev/null -w '%{http_code}' http://169.254.169.254/latest/meta-data/spot/termination-time)
    if [ "$status" = "200" ]
    then
        echo "EC2 spot instance scheduled for shutdown."
        echo "Sending SIGTERM to CodaLab workers"
        # Send SIGTERM to all cl-worker processes in the EC2 instance (no error if none are running).
        pkill -TERM -f "cl-worker"
        # Stop polling once the workers have been signalled, instead of spinning without a sleep.
        break
    else
        # Instance not yet marked for termination, so sleep and check again in 5 seconds.
        sleep 5
    fi
done
```
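The loop above polls the instance metadata endpoint and signals `cl-worker` processes once `spot/termination-time` is set. Instances configured to require IMDSv2 reject tokenless metadata requests with a 401, so here is a minimal sketch of an IMDSv2-aware variant, assuming such a configuration. The function names, the 60-second token TTL, the 2-second `--max-time`, and the `--watch` argument are hypothetical choices for illustration, not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical IMDSv2-aware variant of the spot-interruption watcher.
# Assumption: the instance may require IMDSv2 session tokens.

IMDS="http://169.254.169.254"

# Map an HTTP status from .../spot/termination-time to an action.
# 200 means a termination time is set; 404 means it is not.
# Anything else (e.g. 401 without a valid token, 000 if curl could not
# connect) is reported as "unknown" so workers are not killed by mistake.
classify_status() {
    case "$1" in
        200) echo "terminate" ;;
        404) echo "wait" ;;
        *)   echo "unknown" ;;
    esac
}

# Request a short-lived IMDSv2 session token (the 60-second TTL is an arbitrary choice).
get_token() {
    curl -s --max-time 2 -X PUT "$IMDS/latest/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60"
}

# Print only the HTTP status code of the termination-time item.
termination_status() {
    local token
    token=$(get_token)
    curl -s --max-time 2 -o /dev/null -w '%{http_code}' \
        -H "X-aws-ec2-metadata-token: $token" \
        "$IMDS/latest/meta-data/spot/termination-time"
}

# Poll every 5 seconds; send SIGTERM to cl-worker processes on a 200, then stop.
watch_for_interruption() {
    local action
    while true; do
        action=$(classify_status "$(termination_status)")
        case "$action" in
            terminate)
                echo "EC2 spot instance scheduled for shutdown."
                echo "Sending SIGTERM to CodaLab workers"
                pkill -TERM -f "cl-worker"
                return 0
                ;;
            unknown)
                echo "Could not read termination-time; retrying." >&2
                sleep 5
                ;;
            *)
                sleep 5
                ;;
        esac
    done
}

# Hypothetical flag: only start polling when invoked with --watch,
# so the functions can be sourced or tested without touching the network.
if [ "${1:-}" = "--watch" ]; then
    watch_for_interruption
fi
```

Invoked with the hypothetical `--watch` argument, this sketch polls like the commit's loop. The main design difference is that only an explicit 200 triggers SIGTERM, so a 401 from an IMDSv2-only instance or a failed connection is logged and retried rather than treated as a termination notice.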