From 07901fd85e56b9667d60c8681eff826a7ae50040 Mon Sep 17 00:00:00 2001
From: Gerard Maas
Date: Tue, 27 May 2014 18:31:24 +0200
Subject: [PATCH 1/3] [WIP] adds info on how to create a job and the named RDD feature

---
 README.md | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 30ed5a9c0..5e8e94fcc 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@ We deploy our job server off of this repo at Ooyala and it is tested against CDH
 - Asynchronous and synchronous job API. Synchronous API is great for low latency jobs!
 - Works with Standalone Spark as well as Mesos
 - Job and jar info is persisted via a pluggable DAO interface
+- Named RDDs allow jobs to store and retrieve RDDs by name, improving RDD sharing and reuse among jobs.
 
 ## Quick start / development mode
 
@@ -106,10 +107,43 @@ In your `build.sbt`, add this to use the job server jar:
 
     resolvers += "Ooyala Bintray" at "http://dl.bintray.com/ooyala/maven"
 
-    libraryDependencies += "ooyala.cnd" % "job-server" % "0.3.1" % "provided"
+    libraryDependencies += "ooyala.cnd" % "job-server" % "0.3.1" % "provided"
 
 For most use cases it's better to have the dependencies be "provided" because you don't want SBT assembly to include the whole job server jar.
 
+To create a job that can be submitted through the job server, the job must implement the `SparkJob` trait.
+Your job will look like:
+
+    object SampleJob extends SparkJob {
+      override def runJob(sc: SparkContext, jobConfig: Config): Any = ???
+      override def validate(sc: SparkContext, config: Config): SparkJobValidation = ???
+    }
+
+`runJob` contains the implementation of the job. The SparkContext is managed by the job server and will be provided to the job through this method.
+This relieves the developer of the boilerplate configuration management that comes with creating a Spark job and allows the job server to
+manage and reuse contexts.
+`validate` allows for an initial validation of the context and any provided configuration. If the context and configuration are OK to run the job, returning `spark.jobserver.SparkJobValid` will let the job execute, whereas returning `spark.jobserver.SparkJobInvalid(reason)` prevents the job from running and conveys the reason for the failure. This way you avoid running jobs that would eventually fail due to missing or wrong configuration, saving both time and resources.
+
+### Using Named RDDs
+Named RDDs are a way to easily share RDDs among jobs. Using this facility, computed RDDs can be cached with a given name and retrieved later on.
+To use this feature, the SparkJob needs to mix in `NamedRddSupport`:
+
+    object SampleNamedRDDJob extends SparkJob with NamedRddSupport {
+      override def runJob(sc: SparkContext, jobConfig: Config): Any = ???
+      override def validate(sc: SparkContext, config: Config): SparkJobValidation = ???
+    }
+
+Then in the implementation of the job, RDDs can be stored:
+
+    this.namedRdds.update("french_dictionary", frenchDictionaryRDD)
+
+Other jobs can retrieve this RDD later on:
+
+    val rdd = this.namedRdds.get[(String, String)]("french_dictionary").get
+
+(Note the explicit type parameter provided to `get`: it casts the retrieved RDD, which would otherwise be of type `RDD[_]`.)
+
+
 ## Deployment
 
 1. Copy `config/local.sh.template` to `.sh` and edit as appropriate.
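For illustration, a concrete job following the `SparkJob` contract described in the patch above might look like the sketch below. The word-count logic, the `input.string` configuration key, and the object name are illustrative assumptions, not part of the patch; the `Config` parameter is assumed to be the Typesafe `Config` the job server passes in.

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

    object WordCountJob extends SparkJob {
      // Count the words of a string passed in through the job configuration.
      override def runJob(sc: SparkContext, jobConfig: Config): Any = {
        val words = jobConfig.getString("input.string").split(" ").toSeq
        sc.parallelize(words).countByValue()
      }

      // This sample has no preconditions beyond a live context, so it always validates.
      override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid
    }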
From 349b9af8d7318cf852432b180d1548ae6d391808 Mon Sep 17 00:00:00 2001
From: Gerard Maas
Date: Tue, 27 May 2014 18:35:10 +0200
Subject: [PATCH 2/3] Markdown style changes

---
 README.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 5e8e94fcc..f51da7dd4 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ We deploy our job server off of this repo at Ooyala and it is tested against CDH
 - Asynchronous and synchronous job API. Synchronous API is great for low latency jobs!
 - Works with Standalone Spark as well as Mesos
 - Job and jar info is persisted via a pluggable DAO interface
-- Named RDDs allow jobs to store and retrieve RDDs by name, improving RDD sharing and reuse among jobs.
+- Named RDDs to cache and retrieve RDDs by name, improving RDD sharing and reuse among jobs.
 
 ## Quick start / development mode
 
@@ -114,15 +114,15 @@ For most use cases it's better to have the dependencies be "provided" because you don't want SBT assembly to include the whole job server jar.
 To create a job that can be submitted through the job server, the job must implement the `SparkJob` trait.
 Your job will look like:
 
-    object SampleJob extends SparkJob {
-      override def runJob(sc: SparkContext, jobConfig: Config): Any = ???
-      override def validate(sc: SparkContext, config: Config): SparkJobValidation = ???
-    }
+    object SampleJob extends SparkJob {
+      override def runJob(sc: SparkContext, jobConfig: Config): Any = ???
+      override def validate(sc: SparkContext, config: Config): SparkJobValidation = ???
+    }
 
-`runJob` contains the implementation of the job. The SparkContext is managed by the job server and will be provided to the job through this method.
-This relieves the developer of the boilerplate configuration management that comes with creating a Spark job and allows the job server to
+- `runJob` contains the implementation of the job. The SparkContext is managed by the job server and will be provided to the job through this method.
+  This relieves the developer of the boilerplate configuration management that comes with creating a Spark job and allows the job server to
 manage and reuse contexts.
-`validate` allows for an initial validation of the context and any provided configuration. If the context and configuration are OK to run the job, returning `spark.jobserver.SparkJobValid` will let the job execute, whereas returning `spark.jobserver.SparkJobInvalid(reason)` prevents the job from running and conveys the reason for the failure. This way you avoid running jobs that would eventually fail due to missing or wrong configuration, saving both time and resources.
+- `validate` allows for an initial validation of the context and any provided configuration. If the context and configuration are OK to run the job, returning `spark.jobserver.SparkJobValid` will let the job execute, whereas returning `spark.jobserver.SparkJobInvalid(reason)` prevents the job from running and conveys the reason for the failure. This way you avoid running jobs that would eventually fail due to missing or wrong configuration, saving both time and resources.
 
 ### Using Named RDDs
 Named RDDs are a way to easily share RDDs among jobs. Using this facility, computed RDDs can be cached with a given name and retrieved later on.
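As a complement to the `validate` bullet above, a validation that actually rejects a misconfigured job could look like the following sketch; the `input.string` key and the error message are assumptions for illustration only.

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

    import scala.util.Try

    object ValidatedSampleJob extends SparkJob {
      override def runJob(sc: SparkContext, jobConfig: Config): Any = ???

      // Reject the job up front when the expected configuration entry is missing,
      // so no cluster time is spent on a run that would fail anyway.
      override def validate(sc: SparkContext, config: Config): SparkJobValidation =
        Try(config.getString("input.string"))
          .map(_ => SparkJobValid)
          .getOrElse(SparkJobInvalid("missing mandatory config entry: input.string"))
    }

With such a check in place, a submission that lacks `input.string` is rejected immediately with that reason instead of failing later on the cluster.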
From b7bc3cc25d41f3a7a2ac9fd4e2ecfbc486524345 Mon Sep 17 00:00:00 2001
From: Gerard Maas
Date: Thu, 5 Jun 2014 23:00:48 +0200
Subject: [PATCH 3/3] review updated docs with NamedRDDs info

---
 README.md | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index f51da7dd4..7cc48cf74 100644
--- a/README.md
+++ b/README.md
@@ -114,10 +114,10 @@ For most use cases it's better to have the dependencies be "provided" because you don't want SBT assembly to include the whole job server jar.
 To create a job that can be submitted through the job server, the job must implement the `SparkJob` trait.
 Your job will look like:
 
-    object SampleJob extends SparkJob {
-      override def runJob(sc: SparkContext, jobConfig: Config): Any = ???
-      override def validate(sc: SparkContext, config: Config): SparkJobValidation = ???
-    }
+    object SampleJob extends SparkJob {
+      override def runJob(sc: SparkContext, jobConfig: Config): Any = ???
+      override def validate(sc: SparkContext, config: Config): SparkJobValidation = ???
+    }
 
 - `runJob` contains the implementation of the job. The SparkContext is managed by the job server and will be provided to the job through this method.
   This relieves the developer of the boilerplate configuration management that comes with creating a Spark job and allows the job server to
@@ -128,18 +128,18 @@ manage and reuse contexts.
 Named RDDs are a way to easily share RDDs among jobs. Using this facility, computed RDDs can be cached with a given name and retrieved later on.
 To use this feature, the SparkJob needs to mix in `NamedRddSupport`:
 
-    object SampleNamedRDDJob extends SparkJob with NamedRddSupport {
-      override def runJob(sc: SparkContext, jobConfig: Config): Any = ???
-      override def validate(sc: SparkContext, config: Config): SparkJobValidation = ???
-    }
+    object SampleNamedRDDJob extends SparkJob with NamedRddSupport {
+      override def runJob(sc: SparkContext, jobConfig: Config): Any = ???
+      override def validate(sc: SparkContext, config: Config): SparkJobValidation = ???
+    }
 
-Then in the implementation of the job, RDDs can be stored:
+Then in the implementation of the job, RDDs can be stored with a given name:
 
-    this.namedRdds.update("french_dictionary", frenchDictionaryRDD)
+    this.namedRdds.update("french_dictionary", frenchDictionaryRDD)
 
-Other jobs can retrieve this RDD later on:
+Other jobs running in the same context can then retrieve and use this RDD:
 
-    val rdd = this.namedRdds.get[(String, String)]("french_dictionary").get
+    val rdd = this.namedRdds.get[(String, String)]("french_dictionary").get
 
 (Note the explicit type parameter provided to `get`: it casts the retrieved RDD, which would otherwise be of type `RDD[_]`.)
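To tie the named-RDD snippets above together, here is a sketch of a producer job and a consumer job sharing a dictionary RDD within the same context; the object names, the sample dictionary entries, and the `word` configuration key are illustrative assumptions.

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

    // First job: computes the dictionary RDD and caches it under a well-known name.
    object DictionaryLoaderJob extends SparkJob with NamedRddSupport {
      override def runJob(sc: SparkContext, jobConfig: Config): Any = {
        val frenchDictionaryRDD = sc.parallelize(Seq("hello" -> "bonjour", "thanks" -> "merci"))
        this.namedRdds.update("french_dictionary", frenchDictionaryRDD)
        "french_dictionary cached"
      }
      override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid
    }

    // Second job: submitted later to the same context, it retrieves the dictionary by name.
    object TranslatorJob extends SparkJob with NamedRddSupport {
      override def runJob(sc: SparkContext, jobConfig: Config): Any = {
        val dictionary = this.namedRdds.get[(String, String)]("french_dictionary").get
        val word = jobConfig.getString("word")
        dictionary.filter { case (english, _) => english == word }.map(_._2).collect()
      }
      override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid
    }

Because both jobs mix in `NamedRddSupport` and run in the same long-lived context, the second job reuses the cached RDD by name instead of recomputing it.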