This repository has been archived by the owner on Jul 27, 2023. It is now read-only.

Examples/spark #1267

Merged (4 commits, Apr 6, 2016)
Changes from 2 commits
33 changes: 33 additions & 0 deletions examples/spark-pi/README.md
@@ -0,0 +1,33 @@
# Spark-pi example
This example shows how to run a Spark application in cluster mode on MANTL.

## Prerequisites
* Install Spark via the mantl-api:
Contributor:

Install Spark (via|with) mantl-api:

Contributor (Author):

ouch!

```bash
curl -k -i -L -X POST -d "{\"name\": \"spark\"}" https://admin:password@control-node/api/1/install
```
>**Note**: in the previous command, don't forget to replace *password* and *control-node* with the actual admin password
and the actual control node IP (or domain name) of your MANTL cluster.
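
For illustration, here is the same command with hypothetical values filled in (an admin password of `s3cret` and a control node at `10.0.0.10`; yours will differ):

```bash
# Hypothetical values; substitute your own admin password and control node.
curl -k -i -L -X POST -d "{\"name\": \"spark\"}" https://admin:s3cret@10.0.0.10/api/1/install
```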

* Install the GlusterFS addon: http://docs.mantl.io/en/latest/components/glusterfs.html

* SSH into one of your nodes, and create a configuration file `/mnt/container-volumes/spark-conf/spark-defaults.conf` that
Contributor:

Maybe this should be done on all nodes via Ansible?

Contributor (Author):

I don't think you could do that using the mantl-api. @ryane Am I right?
I can think of 2 approaches to make this simpler:

  1. Create the configuration directory on GlusterFS when the package is installed. This can be done via the mantl-api (I already tested it). Basically you mount the spark-conf directory in marathon.json and write the configuration there (see the sketch below).
  2. Use a custom mantl/spark Docker image that reads the secret from ZooKeeper (assuming it is stored there) and creates the configuration within the container. This would be the best option IMO. I can create this image when I have time; then, if you like, you could consider moving it into your repository.
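
A rough sketch of the volume stanza meant in option 1 (the image name and paths mirror the ones used elsewhere in this example; the actual marathon.json for the Spark package may differ):

```json
{
  "container": {
    "type": "DOCKER",
    "docker": { "image": "mesosphere/spark:1.6.0" },
    "volumes": [
      {
        "hostPath": "/mnt/container-volumes/spark-conf",
        "containerPath": "/opt/spark/dist/conf",
        "mode": "RO"
      }
    ]
  }
}
```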

Contributor:

@mcapuccini Is there a reason not to use Ansible?

Contributor (Author):

@siddharthist If you have multiple copies of the conf file and you want to change the configuration, you have to modify all of them. I find it easier to have only one copy on GlusterFS.

looks something like this:
```
spark.mesos.principal=mantl-api
spark.mesos.secret=your_mantl_cluster_mesos_secret
spark.mesos.executor.docker.volumes=/mnt/container-volumes/spark-conf:/opt/spark/dist/conf:ro
spark.mesos.executor.docker.image=mesosphere/spark:1.6.0
```
You can find the Mesos secret in the *security.yml* file that you created using the *security-setup* script:
```bash
grep mantl_api_secret security.yml
```
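
As a minimal sketch, assuming the GlusterFS volume is already mounted at `/mnt/container-volumes` on the node you SSHed into, you could create the file like this (fill in your own secret):

```bash
# Create the shared configuration directory on the GlusterFS mount
# and write spark-defaults.conf with the values shown above.
sudo mkdir -p /mnt/container-volumes/spark-conf
sudo tee /mnt/container-volumes/spark-conf/spark-defaults.conf > /dev/null <<'EOF'
spark.mesos.principal=mantl-api
spark.mesos.secret=your_mantl_cluster_mesos_secret
spark.mesos.executor.docker.volumes=/mnt/container-volumes/spark-conf:/opt/spark/dist/conf:ro
spark.mesos.executor.docker.image=mesosphere/spark:1.6.0
EOF
```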

## Schedule the job using Chronos
To schedule Spark-pi using Chronos, please run:
Contributor:

The Chronos role will soon be an addon, so we should include instructions similar to GlusterFS here when #1260 is merged.

Contributor (Author):

Sure!

Contributor:

@mcapuccini Once this is added, we can merge this.

Contributor (Author):

@siddharthist Sure, just did it!

```bash
./schedule.sh
```
>**Note**: in the *spark-pi.json* file, the spark-pi job is scheduled for noon on January 1, 2030, so you might
want to force a run from the Chronos API. Alternatively, you can change this date in *spark-pi.json* before running `./schedule.sh`.
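
As a sketch of forcing an immediate run: Chronos exposes `PUT /scheduler/job/<jobName>` for starting a job manually, so, substituting *password* and *control-node* as before:

```bash
# Trigger the spark-pi job now instead of waiting for its schedule.
curl -k -L -X PUT https://admin:password@control-node/chronos/scheduler/job/spark-pi
```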
11 changes: 11 additions & 0 deletions examples/spark-pi/schedule.sh
@@ -0,0 +1,11 @@
#!/bin/bash

echo "Please insert your MANTL control node IP address (or domain name)"
read -r MANTL_CONT
echo "Please instert your MANTL admin password:"
read -sr MANTL_PASS
CHRONOS="admin:$MANTL_PASS@$MANTL_CONT/chronos"

curl -k -i -L -X POST -H "Content-type: application/json" "https://$CHRONOS/scheduler/iso8601" -d@"spark-pi.json"
echo #prints a newline
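
A hypothetical session (the address and response line are illustrative only; Chronos normally answers a successful submission with `HTTP/1.1 204 No Content`):

```bash
$ ./schedule.sh
Please insert your MANTL control node IP address (or domain name)
10.0.0.10
Please insert your MANTL admin password:
HTTP/1.1 204 No Content
```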

20 changes: 20 additions & 0 deletions examples/spark-pi/spark-pi.json
@@ -0,0 +1,20 @@
{
"schedule" : "R0/2030-01-01T12:00:00Z/PT1H",
"cpus": "0.5",
"mem": "512",
"epsilon" : "PT30M",
"name" : "spark-pi",
"container": {
"type": "DOCKER",
"image": "mcapuccini/spark:1.6.0",
"volumes": [
{
"hostPath": "/mnt/container-volumes/spark-conf",
"containerPath": "/opt/spark/dist/conf",
"mode": "RO"
}
]
},
"command" : "MASTER_PORT=$(dig +short spark.service.consul SRV | awk '{print $3}' | sort | head -1) && bin/spark-submit --class org.apache.spark.examples.SparkPi --master mesos://spark.service.consul:$MASTER_PORT --deploy-mode cluster http://central.maven.org/maven2/org/apache/spark/spark-examples_2.10/1.1.1/spark-examples_2.10-1.1.1.jar 1000",
"owner" : "user@example.com"
}
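
For reference, the job's `command` first discovers the port of the `spark.service.consul` service with a Consul DNS SRV lookup. A hypothetical illustration of that pipeline (SRV answers have the form `priority weight port target`, so `awk '{print $3}'` extracts the port column):

```bash
# Hypothetical SRV answer from Consul DNS:
#   $ dig +short spark.service.consul SRV
#   1 1 31852 node-3.node.consul.
# awk keeps the third field (the port); sort | head -1 picks a single entry.
MASTER_PORT=$(dig +short spark.service.consul SRV | awk '{print $3}' | sort | head -1)
echo "Spark service port: $MASTER_PORT"
```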