Added CORS Support and EC2 Support #290

Merged
merged 8 commits into from Nov 3, 2015
@@ -13,4 +13,13 @@ config/*.conf
config/*.sh
job-server/config/*.conf
job-server/config/*.sh
metastore_db/
metastore_db/
#ignore generated config
bin/ec2_example.sh
# ignore spark-ec2 script
ec2Cluster/
# don't ignore the ec2 config and sh files
!job-server/config/ec2.sh
@@ -62,6 +62,10 @@ For release notes, look in the `notes/` directory. They should also be up on [l
The easiest way to get started is to try the [Docker container](doc/docker.md) which prepackages a Spark distribution with the job server and lets you start and deploy it.
## EC2 Start
Follow the instructions in [EC2](doc/EC2.md) to spin up a Spark cluster with job server and an example application.
## Development mode
The example walk-through below shows you how to use the job server with an included example job, by running the job server in local development mode in SBT. This is not an example of usage in production.
@@ -0,0 +1,43 @@
#!/bin/bash
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
. "$bin"/../config/user-ec2-settings.sh
#get spark deployment scripts if they haven't been downloaded and extracted yet
SPARK_DIR=ec2Cluster
if [ ! -d "$bin"/../$SPARK_DIR ]; then
mkdir "$bin"/../$SPARK_DIR
mkdir "$bin"/../$SPARK_DIR/deploy.generic/root/spark-ec2
wget -P "$bin"/../$SPARK_DIR/deploy.generic/root/spark-ec2 https://raw.githubusercontent.com/apache/spark/master/ec2/deploy.generic/root/spark-ec2/ec2-variables.sh
wget -P "$bin"/../$SPARK_DIR https://raw.githubusercontent.com/apache/spark/master/ec2/spark_ec2.py
wget -P "$bin"/../$SPARK_DIR https://raw.githubusercontent.com/apache/spark/master/ec2/spark-ec2
chmod u+x "$bin"/../$SPARK_DIR/*
fi
#run spark-ec2 to start ec2 cluster
EC2DEPLOY="$bin"/../$SPARK_DIR/spark-ec2
"$EC2DEPLOY" --copy-aws-credentials --key-pair=$KEY_PAIR --hadoop-major-version=yarn --identity-file=$SSH_KEY --region=us-east-1 --zone=us-east-1a --spark-version=$SPARK_VERSION --instance-type=$INSTANCE_TYPE --slaves $NUM_SLAVES launch $CLUSTER_NAME
#There is only 1 deploy host. However, the variable is plural as that is how Spark Job Server named it.
#To minimize changes, I left the variable name alone.
export DEPLOY_HOSTS=$("$EC2DEPLOY" get-master $CLUSTER_NAME | tail -n1)
#This line is a hack to edit the ec2.conf file so that the master option is correct. Since we are allowing Amazon to
#dynamically allocate a url for the master node, we must update the configuration file in between cluster startup
#and Job Server deployment
cp "$bin"/../config/ec2.conf.template "$bin"/../config/ec2.conf
sed -i -E "s/master = .*/master = \"spark:\/\/$DEPLOY_HOSTS:7077\"/g" "$bin"/../config/ec2.conf
#also get ec2_example.sh right
cp "$bin"/ec2_example.sh.template "$bin"/ec2_example.sh
sed -i -E "s/DEPLOY_HOSTS=.*/DEPLOY_HOSTS=\"$DEPLOY_HOSTS:8090\"/g" "$bin"/ec2_example.sh
#open all ports on the master so that Spark Job Server works and you can see the results of your jobs
aws ec2 authorize-security-group-ingress --group-name $CLUSTER_NAME-master --protocol tcp --port 0-65535 --cidr 0.0.0.0/0
cd "$bin"/..
bin/server_deploy.sh ec2
ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" root@$DEPLOY_HOSTS "(echo 'export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID' >> spark/conf/spark-env.sh)"
ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" root@$DEPLOY_HOSTS "(echo 'export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY' >> spark/conf/spark-env.sh)"
ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" root@$DEPLOY_HOSTS "(cd job-server; nohup ./server_start.sh < /dev/null &> /dev/null &)"
echo "The Job Server is listening at $DEPLOY_HOSTS:8090"
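The configuration rewrite in this script (copy the template, then point `master` at the freshly launched host) can be tried in isolation. The following is a minimal sketch rather than the PR's own code: the host name, the temporary directory, and the stand-in template contents are all invented for illustration. It uses `|` as the sed delimiter so the slashes in `spark://` need no escaping, and it writes to a new file instead of using `sed -i`, whose syntax differs between GNU and BSD sed.

```shell
#!/bin/bash
# Hypothetical stand-ins for the real template and for the host
# that "spark-ec2 get-master" would return.
workdir=$(mktemp -d)
template="$workdir/ec2.conf.template"
conf="$workdir/ec2.conf"
master_host="ec2-203-0-113-10.compute-1.amazonaws.com"

# A small stand-in template; the real ec2.conf.template is not reproduced here.
printf '%s\n' 'spark {' '  master = "local[4]"' '}' > "$template"

# Replace everything from "master = " to the end of the line with the new URL.
sed -E "s|master = .*|master = \"spark://$master_host:7077\"|" "$template" > "$conf"

# Show the rewritten line, then clean up.
rewritten=$(grep 'master = ' "$conf")
echo "$rewritten"
rm -rf "$workdir"
```

Only the `master` line changes; every other line of the template is copied through untouched.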
@@ -0,0 +1,7 @@
#!/bin/bash
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
. "$bin"/../config/user-ec2-settings.sh
"$bin"/../ec2Cluster/spark-ec2 destroy $CLUSTER_NAME
@@ -0,0 +1,16 @@
DEPLOY_HOSTS=ENTER_DEPLOY_HOST_HERE
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
. "$bin"/../config/ec2.sh
ssh_key_to_use=""
if [ -n "$SSH_KEY" ] ; then
ssh_key_to_use="-i $SSH_KEY"
fi
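#read version.sbt relative to this script so the example also works when run from another directory
VERSION=$(sed -E 's/version in ThisBuild := "(.*)"/\1/' "$bin"/../version.sbt)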
wget -O- --post-file "$bin"/../job-server-extras/target/scala-2.10/job-server-extras_2.10-$VERSION.jar "$DEPLOY_HOSTS/jars/km"
scp -rp -o StrictHostKeyChecking=no $ssh_key_to_use "$bin"/../job-server-extras/src/main/KMeansExample/* ${APP_USER}@"${DEPLOY_HOSTS%:*}:/var/www/html/"
echo "The example is running at ${DEPLOY_HOSTS%:*}:5080"
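Two small shell idioms carry this script: extracting the version string from `version.sbt` with a sed capture group, and stripping the port from `DEPLOY_HOSTS` with the `${VAR%:*}` expansion. Below is a minimal sketch, assuming a one-line `version.sbt` of the form the sed pattern expects; the version number and host name are invented for illustration. Unlike the script, it uses `sed -n` with the `p` flag so that only the matching line is printed.

```shell
#!/bin/bash
# Hypothetical version.sbt in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'version in ThisBuild := "0.0.0-EXAMPLE"' > "$workdir/version.sbt"

# -n suppresses default output; the p flag prints only lines where the substitution matched.
VERSION=$(sed -n -E 's/version in ThisBuild := "(.*)"/\1/p' "$workdir/version.sbt")

# Hypothetical job server address in host:port form.
DEPLOY_HOSTS="master.example.com:8090"
# %:* removes the shortest suffix matching ":*", i.e. the port.
host_only="${DEPLOY_HOSTS%:*}"

echo "version: $VERSION"
echo "job server: $DEPLOY_HOSTS"
echo "example page: $host_only:5080"
rm -rf "$workdir"
```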
@@ -18,7 +18,7 @@ if [ ! -f "$configFile" ]; then
echo "Could not find $configFile"
exit 1
fi
. $configFile
. "$configFile"
majorRegex='([0-9]+\.[0-9]+)\.[0-9]+'
if [[ $SCALA_VERSION =~ $majorRegex ]]
@@ -42,7 +42,6 @@ FILES="job-server-extras/target/scala-$majorVersion/spark-job-server.jar
bin/server_start.sh
bin/server_stop.sh
bin/kill-process-tree.sh
$CONFIG_DIR/$ENV.conf
config/shiro.ini
config/log4j-server.properties"
@@ -53,7 +52,9 @@ fi
for host in $DEPLOY_HOSTS; do
# We assume that the deploy user is APP_USER and has permissions
ssh $ssh_key_to_use ${APP_USER}@$host mkdir -p $INSTALL_DIR
scp $ssh_key_to_use $FILES ${APP_USER}@$host:$INSTALL_DIR/
scp $ssh_key_to_use $configFile ${APP_USER}@$host:$INSTALL_DIR/settings.sh
ssh -o StrictHostKeyChecking=no $ssh_key_to_use ${APP_USER}@$host mkdir -p $INSTALL_DIR
scp -o StrictHostKeyChecking=no $ssh_key_to_use $FILES ${APP_USER}@$host:$INSTALL_DIR/
scp -o StrictHostKeyChecking=no $ssh_key_to_use "$CONFIG_DIR/$ENV.conf" ${APP_USER}@$host:$INSTALL_DIR/
scp -o StrictHostKeyChecking=no $ssh_key_to_use "$configFile" ${APP_USER}@$host:$INSTALL_DIR/settings.sh
done
@@ -0,0 +1,24 @@
## Setting Up The EC2 Cluster
1. Sign up for an Amazon AWS account.
2. Assign your access key ID and secret access key to the bash variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
* I recommend doing this by placing the following export statements in your .bashrc file.
* export AWS_ACCESS_KEY_ID=accesskeyId
* export AWS_SECRET_ACCESS_KEY=secretAccessKey
3. Copy job-server/config/user-ec2-settings.sh.template to job-server/config/user-ec2-settings.sh and configure it. In particular, set KEY_PAIR to the name of your EC2 key pair and SSH_KEY to the location of the pair's private key.
* I recommend using an ssh key that does not require entering a password on every use. Otherwise, you will need to enter the password many times.
4. Run bin/ec2_deploy.sh to start the EC2 cluster. Go to the URL printed at the end of the script to view the Spark Job Server frontend. Change the port from 8090 to 8080 to view the Spark Standalone Cluster frontend.
5. Run bin/ec2_example.sh to set up the example. Go to the URL printed at the end of the script to view the example.
6. Run bin/ec2_destroy.sh to shut down the EC2 cluster.
Note: To change the version of Spark on the cluster, set the SPARK_VERSION variable in both config/ec2.sh and config/user-ec2-settings.sh.template.
Note: The spark-ec2 script is unreliable. It may sometimes hang while it waits for every server in the cluster to come online. If you keep getting an error message like "Warning: SSH connection error. (This could be temporary.)" for 20-30 minutes, kill the script, run bin/ec2_destroy.sh to tear down your cluster, and restart the deployment with bin/ec2_deploy.sh.
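Because bin/ec2_deploy.sh reads its parameters from user-ec2-settings.sh, checking that they are all set before launching can save a failed run. The following is a minimal sketch: the variable names are the ones the deploy script passes to spark-ec2, but every value shown is a placeholder, the helper `check_ec2_settings` is hypothetical, and the real template may define further settings.

```shell
#!/bin/bash
# Placeholder values only; replace them with your own.
KEY_PAIR="my-key-pair"
SSH_KEY="$HOME/.ssh/my-key-pair.pem"
SPARK_VERSION="1.5.1"
INSTANCE_TYPE="m3.large"
NUM_SLAVES=2
CLUSTER_NAME="jobserver-example"

# Hypothetical helper: reports each unset or empty setting on stderr and
# returns non-zero if any is missing. The body runs in a subshell, so its
# working variables do not leak into the caller.
check_ec2_settings() (
  missing=0
  for name in KEY_PAIR SSH_KEY SPARK_VERSION INSTANCE_TYPE NUM_SLAVES CLUSTER_NAME; do
    # Look up the variable whose name is stored in $name (names are fixed literals).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing setting: $name" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
)

check_ec2_settings && echo "All EC2 settings are present"
```

The AWS credentials from step 2 are read from the environment rather than from the settings file, so this check deliberately leaves them out.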
## Using The Example
1. Start a Spark Context by pressing the "Start Context" button.
2. Load data by pressing the "Resample" button. The matrix of scatterplots and category selection dropdown will only appear after loading data from the server.
* The first press of "Resample" after starting a new context takes approximately 30-35 minutes. The cluster spends 20 minutes pulling data from an S3 bucket and the rest of the time running the k-means clustering algorithm.
* Subsequent presses will refresh the data in the scatterplots. These presses will take about 10 seconds as the data is reloaded from memory using a NamedRDD.
3. After performing the data analysis, shut down the context by pressing the "Stop Context" button.

@@ -0,0 +1,58 @@
svg {
font: 10px sans-serif;
padding: 10px;
}
.axis,
.frame {
shape-rendering: crispEdges;
}
.axis line {
stroke: #ddd;
}
.axis path {
display: none;
}
.frame {
fill: none;
stroke: #aaa;
}
circle {
fill-opacity: .4;
}
circle.hidden {
fill: #ccc !important;
fill-opacity: .2;
}
.extent {
fill: #000;
fill-opacity: .125;
stroke: #fff;
}
.palette {
/* cursor: pointer; */
display: inline-block;
vertical-align: top;
margin: 200px 0 4px 6px;
padding: 4px;
background: #fff;
/* border: solid 1px #aaa; */
}
.swatch {
cursor: pointer;
display: block;
vertical-align: middle;
width: 40px;
color: white;
text-align: center;
padding-top: 8px;
padding-bottom: 8px;
}
@@ -0,0 +1,54 @@
<!DOCTYPE html>
<meta charset="utf-8">
<body>
<div>
<div>
<input name="startButton"
type="button"
value="Start Context"
onclick="startContext()"
class = "btn btn-default enableWhileStopped"
disabled />
<input name="updateButton"
type="button"
value="Resample"
onclick="runSampling()"
class = "btn btn-default enableWhileRunning"
disabled />
<input name="stopButton"
type="button"
value="Stop Context"
onclick="stopContext()"
class = "btn btn-default enableWhileRunning"
disabled />
<input name="filterButton"
type="button"
value="Filter Categories"
onclick="drawData()"
class = "btn btn-default" />
<select multiple="multiple" id="multiSelect">
</select>
</div>
<div id="state">
Syncing with server.
</div>
<div id="filter_options">
</div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.6/d3.min.js"></script>
<script src="js/colorbrewer.v1.min.js"></script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/URI.js/1.16.0/URI.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.0.0-alpha1/jquery.min.js"></script>
<link rel="stylesheet" property='stylesheet' href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/css/bootstrap.min.css" type="text/css"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.min.js"></script>
<script type="text/javascript" src="js/bootstrap-multiselect.js"></script>
<link rel="stylesheet" property='stylesheet' href="css/bootstrap-multiselect.css" type="text/css"/>
<link rel="stylesheet" property='stylesheet' href="css/scatterplot.css" type="text/css"/>
<script type="text/javascript" src="js/graphics.js"></script>
<script type="text/javascript" src="js/jobserver.js"></script>
</body>