
Some doc improvements and build changes to the Java 8 patch.

1 parent 85a954e commit 95850e6e58b83b59e1f679c7b1cd8aaa7df854dc @pwendell pwendell committed with ScrapCodes Mar 3, 2014
@@ -17,36 +17,21 @@ which support the same methods as their Scala counterparts but take Java functio
Java data and collection types. The main differences have to do with passing functions to RDD
operations (e.g. map) and handling RDDs of different types, as discussed next.
-# Upgrading from pre-1.0 versions of Spark
-
-There are following API changes for codebases written in pre-1.0 versions of Spark.
-
-* All `org.apache.spark.api.java.function.*` abstract classes are now interfaces.
- So this means that concrete implementations of these `Function` abstract classes will
- have `implements` instead of extends.
-* APIs of map and flatMap in core and map, flatMap and transform in streaming
- are changed and are defined on the basis of the passed anonymous function's
- return type, for example mapToPair(...) or flatMapToPair returns
- [`JavaPairRDD`](api/core/index.html#org.apache.spark.api.java.JavaPairRDD),
- similarly mapToDouble and flatMapToDouble returns
- [`JavaDoubleRDD`](api/core/index.html#org.apache.spark.api.java.JavaDoubleRDD).
- Please check the API documentation for more details.
-
# Key Differences in the Java API
There are a few key differences between the Java and Scala APIs:
-* Java does not support anonymous or first-class functions, so functions must
- be implemented by extending the
+* Java does not support anonymous or first-class functions, so functions are passed
+ using anonymous classes that implement the
[`org.apache.spark.api.java.function.Function`](api/core/index.html#org.apache.spark.api.java.function.Function),
[`Function2`](api/core/index.html#org.apache.spark.api.java.function.Function2), etc.
- classes.
+ interfaces.
* To maintain type safety, the Java API defines specialized Function and RDD
classes for key-value pairs and doubles. For example,
[`JavaPairRDD`](api/core/index.html#org.apache.spark.api.java.JavaPairRDD)
stores key-value pairs.
-* To support java 8 lambda expression, methods are defined on the basis of
- the passed anonymous function's (a.k.a lambda expression) return type,
+* Some methods are defined on the basis of the passed anonymous function's
+ (a.k.a. lambda expression) return type, as shown in the sketch below;
for example mapToPair(...) or flatMapToPair returns
[`JavaPairRDD`](api/core/index.html#org.apache.spark.api.java.JavaPairRDD),
similarly mapToDouble and flatMapToDouble returns
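
As a minimal sketch of this naming convention (assuming a `JavaRDD<String>` named `lines`; the
variable names are hypothetical):

{% highlight java %}
// The method name, not the element type, selects the specialized return type.
// Assumes imports of scala.Tuple2 and org.apache.spark.api.java.function.PairFunction.
JavaPairRDD<String, Integer> pairs = lines.mapToPair(
  new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) {
      return new Tuple2<String, Integer>(s, s.length());
    }
  });
{% endhighlight %}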
@@ -74,10 +59,10 @@ each specialized RDD class, so filtering a `PairRDD` returns a new `PairRDD`,
etc. (this achieves the "same-result-type" principle used by the [Scala collections
framework](http://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html)).
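
A rough illustration of the same-result-type principle, reusing the `pairs` RDD from the sketch
above:

{% highlight java %}
// filter on a JavaPairRDD returns another JavaPairRDD, not a plain JavaRDD of tuples.
JavaPairRDD<String, Integer> nonEmpty = pairs.filter(
  new Function<Tuple2<String, Integer>, Boolean>() {
    public Boolean call(Tuple2<String, Integer> kv) {
      return kv._2() > 0;
    }
  });
{% endhighlight %}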
-## Function Classes
+## Function Interfaces
-The following table lists the function classes used by the Java API. Each
-class has a single abstract method, `call()`, that must be implemented.
+The following table lists the function interfaces used by the Java API. Each
+interface has a single abstract method, `call()`, that must be implemented.
<table class="table">
<tr><th>Class</th><th>Function Type</th></tr>
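
As a sketch of implementing `call()` for one of these interfaces (the class name `LineLength`
and the `lines` RDD are hypothetical):

{% highlight java %}
// DoubleFunction<T> declares a single abstract method: double call(T t).
class LineLength implements DoubleFunction<String> {
  public double call(String s) {
    return s.length();
  }
}

JavaDoubleRDD lengths = lines.mapToDouble(new LineLength());
{% endhighlight %}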
@@ -106,6 +91,21 @@ The Java API supports other Spark features, including
[broadcast variables](scala-programming-guide.html#broadcast-variables), and
[caching](scala-programming-guide.html#rdd-persistence).
+# Upgrading From Pre-1.0 Versions of Spark
+
+In version 1.0 of Spark the Java API was refactored to better support Java 8
+lambda expressions. Users upgrading from older versions of Spark should note
+the following changes:
+
+* All `org.apache.spark.api.java.function.*` types have been changed from abstract
+ classes to interfaces. This means that concrete implementations of these
+ `Function` interfaces will need to use `implements` rather than `extends`, as shown
+ in the sketch after this list.
+* Certain transformation functions now have multiple versions depending
+ on the return type. In Spark core, the map functions (map, flatMap,
+ mapPartitions) have type-specific versions, e.g.
+ [`mapToPair`](api/core/index.html#org.apache.spark.api.java.JavaRDD@mapToPair[K2,V2](f:org.apache.spark.api.java.function.PairFunction[T,K2,V2]):org.apache.spark.api.java.JavaPairRDD[K2,V2])
+ and [`mapToDouble`](api/core/index.html#org.apache.spark.api.java.JavaRDD@mapToDouble[R](f:org.apache.spark.api.java.function.DoubleFunction[T]):org.apache.spark.api.java.JavaDoubleRDD).
+ Spark Streaming also uses the same approach, e.g. [`transformToPair`](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaDStream@transformToPair[K2,V2](transformFunc:org.apache.spark.api.java.function.Function[R,org.apache.spark.api.java.JavaPairRDD[K2,V2]]):org.apache.spark.streaming.api.java.JavaPairDStream[K2,V2]).
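
A rough before/after sketch of the abstract-class-to-interface change, using a word-splitting
function similar to the one in the example section below:

{% highlight java %}
// Pre-1.0: FlatMapFunction was an abstract class.
//   class Split extends FlatMapFunction<String, String> { ... }
//
// 1.0+: FlatMapFunction is an interface, so use implements.
class Split implements FlatMapFunction<String, String> {
  public Iterable<String> call(String s) {
    return Arrays.asList(s.split(" "));  // assumes import java.util.Arrays
  }
}
{% endhighlight %}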
# Example
@@ -147,8 +147,8 @@ class Split extends FlatMapFunction<String, String> {
JavaRDD<String> words = lines.flatMap(new Split());
{% endhighlight %}
-Java 8+ users can also possibly write the above `FlatMapFunction` in a more concise way using
-lambda expression as follows:
+Java 8+ users can also write the above `FlatMapFunction` in a more concise way using
+a lambda expression:
{% highlight java %}
JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(s.split(" ")));
@@ -209,10 +209,11 @@ just a matter of style.
We currently provide documentation for the Java API as Scaladoc, in the
[`org.apache.spark.api.java` package](api/core/index.html#org.apache.spark.api.java.package), because
-some of the classes are implemented in Scala. The main downside is that the types and function
+some of the classes are implemented in Scala. It is important to note that the types and function
definitions show Scala syntax (for example, `def reduce(func: Function2[T, T]): T` instead of
-`T reduce(Function2<T, T> func)`).
-We hope to generate documentation with Java-style syntax in the future.
+`T reduce(Function2<T, T> func)`). In addition, the Scala `trait` modifier is shown for what
+are Java interfaces. We hope to generate documentation with Java-style syntax in the future to
+avoid these quirks.
# Where to Go from Here
@@ -0,0 +1 @@
+This directory contains build components not included by default in Spark's build.
@@ -1,15 +1,24 @@
-# Java 8 test suites.
+# Java 8 Test Suites
-These tests are bundled with spark and run if you have java 8 installed as system default or your `JAVA_HOME` points to a java 8(or higher) installation. `JAVA_HOME` is preferred to system default jdk installation. Since these tests require jdk 8 or higher, they defined to be optional to run in the build system.
+These tests require having Java 8 installed and are isolated from the main Spark build.
+If Java 8 is not your system's default Java version, you will need to point Spark's build
+to your Java location. The set-up depends a bit on the build system:
-* For sbt users, it automatically detects the presence of java 8 based on either `JAVA_HOME` environment variable or default installed jdk and run these tests. It takes highest precednce, if java home is passed as follows.
+* Sbt users can either set JAVA_HOME to the location of a Java 8 JDK or explicitly pass
+ `-java-home` to the sbt launch script. If a Java 8 JDK is detected, sbt will automatically
+ include the Java 8 test project.
- `$ sbt/sbt -java-home "/path/to/jdk1.8.0"`
+ `$ JAVA_HOME=/opt/jdk1.8.0/ sbt/sbt clean "test-only org.apache.spark.Java8APISuite"`
-* For maven users,
+* For Maven users,
- This automatic detection is not possible and thus user has to ensure that either `JAVA_HOME` environment variable or default installed jdk points to jdk 8.
+ Maven can also be pointed at a Java 8 JDK using JAVA_HOME. However, Maven will not
+ automatically detect the presence of a Java 8 JDK, so a special build profile `-Pjava8-tests`
+ must be used.
- `$ mvn install -Pjava8-tests`
+ `$ JAVA_HOME=/opt/jdk1.8.0/ mvn clean install -DskipTests`
+ `$ JAVA_HOME=/opt/jdk1.8.0/ mvn test -Pjava8-tests -DwildcardSuites=org.apache.spark.Java8APISuite`
- Above command can only be run from project root directory since this module depends on both core and test-jars of core and streaming. These jars are installed first time the above command is run as java8-tests profile enables local publishing of test-jar artifacts as well. Once these artifacts are published then these tests can be run from this module's directory as well.
+ Note that the above command can only be run from the project root directory, since this module
+ depends on core and the test-jars of core and streaming. This means an install step is
+ required to make the test dependencies visible to the Java 8 sub-project.
@@ -20,8 +20,8 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent</artifactId>
- <version>1.0.0-incubating-SNAPSHOT</version>
- <relativePath>../pom.xml</relativePath>
+ <version>1.0.0-SNAPSHOT</version>
+ <relativePath>../../pom.xml</relativePath>
</parent>
<groupId>org.apache.spark</groupId>
@@ -77,12 +77,18 @@
</execution>
</executions>
<configuration>
+ <systemPropertyVariables>
+ <!-- For some reason surefire isn't setting this log4j file on the
+ test classpath automatically. So we add it manually. -->
+ <log4j.configuration>
+ file:src/test/resources/log4j.properties
+ </log4j.configuration>
+ </systemPropertyVariables>
<skipTests>false</skipTests>
<includes>
<include>**/Suite*.java</include>
<include>**/*Suite.java</include>
</includes>
- <redirectTestOutputToFile>true</redirectTestOutputToFile>
</configuration>
</plugin>
<plugin>
@@ -0,0 +1,28 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set everything to be logged to the file target/unit-tests.log
+log4j.rootCategory=INFO, file
+log4j.appender.file=org.apache.log4j.FileAppender
+log4j.appender.file.append=false
+log4j.appender.file.file=target/unit-tests.log
+log4j.appender.file.layout=org.apache.log4j.PatternLayout
+log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %p %c{1}: %m%n
+
+# Ignore messages below warning level from Jetty, because it's a bit verbose
+log4j.logger.org.eclipse.jetty=WARN
+org.eclipse.jetty.LEVEL=WARN
@@ -18,7 +18,8 @@ declare -a scalac_args
declare -a sbt_commands
if test -x "$JAVA_HOME/bin/java"; then
- echo -e "using $JAVA_HOME for launching sbt."
+ echo -e "Using $JAVA_HOME as default JAVA_HOME."
+ echo "Note, this will be overridden by -java-home if it is set."
declare java_cmd="$JAVA_HOME/bin/java"
else
declare java_cmd=java
