[SPARK-4325] Use pssh. #80
Changes from all commits
658af93
29051d5
9a07db0
dc2a08a
8ee9e62
e308e55
8d0a903
ee5c085
a6c6b85
c0f60f6
8913fe1
658d88c
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,7 @@ | ||
#!/bin/bash | ||
|
||
yum install -y pssh | ||
|
||
# Make sure we are in the spark-ec2 directory | ||
cd /root/spark-ec2 | ||
|
||
|
@@ -9,6 +11,14 @@ source /root/.bash_profile | |
# Load the cluster variables set by the deploy script | ||
source ec2-variables.sh | ||
|
||
function approve_ssh_keys () { | ||
pssh --inline \ | ||
--host "localhost $(hostname) $MASTERS $SLAVES" \ | ||
--user root \ | ||
--extra-args "$SSH_OPTS" \ | ||
":" | ||
} | ||
|
||
# Set hostname based on EC2 private DNS name, so that it is set correctly | ||
# even if the instance is restarted with a different private DNS name | ||
PRIVATE_DNS=`wget -q -O - http://instance-data.ec2.internal/latest/meta-data/local-hostname` | ||
|
@@ -42,60 +52,28 @@ fi | |
echo "Setting executable permissions on scripts..." | ||
find . -regex "^.+.\(sh\|py\)" | xargs chmod a+x | ||
|
||
echo "Running setup-slave on master to mount filesystems, etc..." | ||
source ./setup-slave.sh | ||
|
||
echo "SSH'ing to master machine(s) to approve key(s)..." | ||
for master in $MASTERS; do | ||
echo $master | ||
ssh $SSH_OPTS $master echo -n & | ||
sleep 0.3 | ||
done | ||
ssh $SSH_OPTS localhost echo -n & | ||
ssh $SSH_OPTS `hostname` echo -n & | ||
wait | ||
|
||
# Try to SSH to each cluster node to approve their key. Since some nodes may | ||
# be slow in starting, we retry failed slaves up to 3 times. | ||
TODO="$SLAVES $OTHER_MASTERS" # List of nodes to try (initially all) | ||
TRIES="0" # Number of times we've tried so far | ||
echo "SSH'ing to other cluster nodes to approve keys..." | ||
while [ "e$TODO" != "e" ] && [ $TRIES -lt 4 ] ; do | ||
NEW_TODO= | ||
for slave in $TODO; do | ||
echo $slave | ||
ssh $SSH_OPTS $slave echo -n | ||
if [ $? != 0 ] ; then | ||
NEW_TODO="$NEW_TODO $slave" | ||
fi | ||
done | ||
TRIES=$[$TRIES + 1] | ||
if [ "e$NEW_TODO" != "e" ] && [ $TRIES -lt 4 ] ; then | ||
sleep 15 | ||
TODO="$NEW_TODO" | ||
echo "Re-attempting SSH to cluster nodes to approve keys..." | ||
else | ||
break; | ||
fi | ||
done | ||
# echo "SSH-ing to all cluster nodes to approve keys..." | ||
# approve_ssh_keys | ||
|
||
echo "RSYNC'ing /root/spark-ec2 to other cluster nodes..." | ||
for node in $SLAVES $OTHER_MASTERS; do | ||
echo $node | ||
rsync -e "ssh $SSH_OPTS" -az /root/spark-ec2 $node:/root & | ||
scp $SSH_OPTS ~/.ssh/id_rsa $node:.ssh & | ||
sleep 0.3 | ||
sleep 0.1 | ||
done | ||
wait | ||
|
||
# NOTE: We need to rsync spark-ec2 before we can run setup-slave.sh | ||
# on other cluster nodes | ||
echo "Running slave setup script on other cluster nodes..." | ||
for node in $SLAVES $OTHER_MASTERS; do | ||
echo $node | ||
ssh -t -t $SSH_OPTS root@$node "spark-ec2/setup-slave.sh" & sleep 0.3 | ||
done | ||
wait | ||
echo "Running setup-slave on all cluster nodes to mount filesystems, etc..." | ||
pssh --inline \ | ||
Review comments: "With the"; "@shivaram Do you think echoing"; "AFAIK this was the old behavior too? If so, it should be fine." |
||
--host "$MASTERS $SLAVES" \ | ||
--user root \ | ||
--extra-args "-t -t $SSH_OPTS" \ | ||
"spark-ec2/setup-slave.sh" | ||
|
||
# echo "SSH-ing to all cluster nodes to re-approve keys..." | ||
# We do this again because setup-slave.sh clears out .ssh/known_hosts. | ||
# approve_ssh_keys | ||
|
||
# Always include 'scala' module if it's not defined as a work around | ||
# for older versions of the scripts. | ||
|
@@ -126,6 +104,6 @@ chmod u+x /root/spark/conf/spark-env.sh | |
for module in $MODULES; do | ||
echo "Setting up $module" | ||
source ./$module/setup.sh | ||
sleep 1 | ||
sleep 0.1 | ||
Review comment: "Not sure why we are sleeping here, but I reduced the sleep time. I was able to launch and do some basic operations on a cluster without issue after making this change." Reply: "I think this sleep call was just defensive. Sometimes services take time to come up, and this just provides some buffer between starting them up. It should be fine to reduce this." |
||
cd /root/spark-ec2 # guard against setup.sh changing the cwd | ||
done |
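The review comments on the `sleep` between module setups describe it as a defensive buffer for services that take time to come up. As a point of comparison only, a fixed pause could be replaced by polling a readiness check. The sketch below is not part of this pull request or of spark-ec2: `wait_until` and `ready_on_third_call` are hypothetical names, and the demo check merely simulates a service that becomes ready on its third probe.

```shell
#!/bin/bash
# Hypothetical helper, not part of spark-ec2: run a check command until it
# succeeds or the attempt budget is spent.
wait_until () {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the pause after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Demo check (assumption): succeeds on the third call, standing in for
# "the service is up".
CALLS=0
ready_on_third_call () {
  CALLS=$((CALLS + 1))
  [ "$CALLS" -ge 3 ]
}

if wait_until 5 0.1 ready_on_third_call; then
  echo "ready after $CALLS checks"
else
  echo "gave up after $CALLS checks"
fi
```

In a real setup the check would be whatever probe fits the service being started. The trade-off is that polling needs a meaningful readiness test for each module, which a plain `sleep` does not.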
I'm not sure what this line is for. Presumably, if we've SSHed to `$MASTERS` and `localhost`, we don't need `hostname`, no?
On EC2 there are two hostnames, one internal (of the form ip-127-1-1-1.ec2.internal) and one external (of the form ec2-54-227-51-123.compute-1.amazonaws.com). We typically pass in the latter in `$MASTERS`, and `hostname` usually returns the former. Even though we try to use only the external hostname in all our configs, it is better practice to approve keys for both hostnames.
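To illustrate the point about approving keys under every name a node is known by, here is a minimal sketch that merges `localhost`, the internal name, and the externally supplied names into one de-duplicated list. `unique_hosts` is a hypothetical helper, not part of spark-ec2, and the host values are the placeholder forms quoted in the comment above rather than real machines.

```shell
#!/bin/bash
# Hypothetical helper, not part of spark-ec2: print its arguments as a
# space-separated list with duplicates removed, keeping first-seen order.
unique_hosts () {
  local seen=" " out="" h
  for h in "$@"; do
    case "$seen" in
      *" $h "*) ;;   # already listed, skip it
      *) seen="$seen$h "; out="$out${out:+ }$h" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Placeholder values (assumptions) in the two forms described above.
MASTERS="ec2-54-227-51-123.compute-1.amazonaws.com"
SLAVES=""
INTERNAL_NAME="ip-127-1-1-1.ec2.internal"   # stand-in for $(hostname)

# $MASTERS and $SLAVES are left unquoted on purpose so they split on spaces.
HOSTS=$(unique_hosts localhost "$INTERNAL_NAME" $MASTERS $SLAVES)
echo "$HOSTS"
```

A list built this way could be passed to the `--host` option in the same way `approve_ssh_keys` does in the diff. De-duplicating is optional and only avoids contacting the same name twice.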
OK, I'll add it in to the `pssh` version of the call.
Done.