Added Databricks cluster deployment to the user guide #86

Merged
merged 4 commits on Dec 2, 2020
1 change: 1 addition & 0 deletions .gitignore
@@ -7,6 +7,7 @@ project/.gnupg/local*
project/target
target/
.history
.bsp

# Java
*.class
33 changes: 13 additions & 20 deletions .travis.yml
@@ -9,29 +9,22 @@ addons:
packages:
- shellcheck

before_install:
- git fetch --tags
- docker pull exasol/docker-db:7.0.2
- docker pull exasol/docker-db:6.2.9-d1
- docker network create -d bridge --subnet 192.168.0.0/24 --gateway 192.168.0.1 dockernet

matrix:
include:
- jdk: openjdk8
scala: 2.11.12
env: SPARK_VERSION="2.4.5" EXASOL_DOCKER_VERSION="6.2.9-d1"
jdk:
- openjdk11

- jdk: openjdk8
scala: 2.11.12
env: SPARK_VERSION="2.4.5" EXASOL_DOCKER_VERSION="7.0.2"
scala:
- 2.12.12

- jdk: openjdk8
scala: 2.12.12
env: SPARK_VERSION="2.4.5" EXASOL_DOCKER_VERSION="6.2.9-d1"
env:
- SPARK_VERSION="2.4.5" EXASOL_DOCKER_VERSION="6.2.12-d1"
- SPARK_VERSION="2.4.5" EXASOL_DOCKER_VERSION="7.0.4"
- SPARK_VERSION="3.0.1" EXASOL_DOCKER_VERSION="6.2.12-d1"
- SPARK_VERSION="3.0.1" EXASOL_DOCKER_VERSION="7.0.4"

- jdk: openjdk8
scala: 2.12.12
env: SPARK_VERSION="2.4.5" EXASOL_DOCKER_VERSION="7.0.2"
before_install:
- git fetch --tags
- docker pull "exasol/docker-db:$EXASOL_DOCKER_VERSION"
- docker network create -d bridge --subnet 192.168.0.0/24 --gateway 192.168.0.1 dockernet

script:
- travis_wait 30 ./scripts/ci.sh
1 change: 1 addition & 0 deletions doc/changes/changelog.md
@@ -1,5 +1,6 @@
# Releases

* [1.0.0](changes_1.0.0.md)
* [0.3.2](changes_0.3.2.md)
* [0.3.1](changes_0.3.1.md)
* [0.3.0](changes_0.3.0.md)
33 changes: 33 additions & 0 deletions doc/changes/changes_1.0.0.md
@@ -0,0 +1,33 @@
# Spark Exasol Connector 1.0.0, released 2020-12-DD

## Features / Improvements


## Documentation

* #85: Updated documentation with configuration for the Databricks cluster (PR #86)

## Dependency Updates

### Runtime Dependency Updates

* Updated to `com.exasol:exasol-jdbc:7.0.4` (was `7.0.0`)
* Updated to `org.apache.spark:spark-core:3.0.1` (was `2.4.5`)
* Updated to `org.apache.spark:spark-sql:3.0.1` (was `2.4.5`)

### Test Dependency Updates

* Updated to `org.scalatest:scalatest:3.2.3` (was `3.2.2`)
* Updated to `org.testcontainers:jdbc:1.15.0` (was `1.14.3`)
* Updated to `com.holdenkarau:spark-testing-base:3.0.1_1.0.0` (was `2.4.5_0.14.0`)
* Updated to `org.mockito:mockito-core:3.6.28` (was `3.5.13`)
* Updated to `com.dimafeng:testcontainers-scala:0.38.7` (was `0.38.4`)

### Plugin Updates

* Updated to `sbt.version:1.4.4` (was `1.3.13`)
* Updated to `org.wartremover:sbt-wartremover:2.4.13` (was `2.4.10`)
* Updated to `org.wartremover:sbt-wartremover-contrib:1.3.11` (was `1.3.8`)
* Updated to `com.jsuereth:sbt-pgp:2.0.2` (was `2.0.1`)
* Updated to `org.xerial.sbt:sbt-sonatype:3.9.5` (was `3.9.4`)
* Removed `io.get-coursier:sbt-coursier`
Binary file added doc/images/databricks-deployment.png
23 changes: 23 additions & 0 deletions doc/user_guide/user_guide.md
@@ -86,6 +86,29 @@ libraryDependencies += "com.exasol" % "spark-connector" %% "<VERSION>"
Similar to the Java dependency, we add a resolver for the Exasol Artifactory so
that the Exasol JDBC driver can be found.
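
A minimal sketch of what the resolver entry might look like in `build.sbt`; the
repository URL is the same Exasol Artifactory used throughout this guide, and
`<VERSION>` is a placeholder for a released connector version:

```scala
// build.sbt -- illustrative sketch, not part of this change set.
// The resolver lets sbt find the Exasol JDBC driver and connector artifacts.
resolvers += "Exasol Releases" at "https://maven.exasol.com/artifactory/exasol-releases"

// Note the %% so sbt appends the Scala binary version (e.g. _2.12) to the artifact name.
libraryDependencies += "com.exasol" %% "spark-connector" % "<VERSION>"
```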

### Spark Exasol Connector as Databricks Cluster Dependency

Similar to using Maven, you should provide the Maven artifact coordinates to the
[Databricks Workspace Library][databricks-install].

[databricks-install]: https://docs.databricks.com/libraries/workspace-libraries.html#maven-libraries

<img alt="databricks-deployment" src="../images/databricks-deployment.png"/>

Go to your cluster, then to `Libraries`, and click `Install New`:

- Select Maven as the Library Source.
- In the Coordinate field, enter the artifact coordinates
  `com.exasol:spark-connector_2.12:<VERSION>`. Please note that we use
  Scala version 2.12; change it to 2.11 if your Databricks Runtime version
  requires it.
- In the Repository field, enter the Exasol Artifactory
`https://maven.exasol.com/artifactory/exasol-releases`.
- Click Install.

Please change `<VERSION>` to one of the latest Spark Exasol Connector
releases.
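
Once the library is installed and the cluster has been restarted, you can use
the connector from a notebook. Below is a short, illustrative sketch of reading
an Exasol table into a Spark DataFrame; the option names (`host`, `port`,
`username`, `password`, `query`) are assumed from the connector's usual
configuration, and the connection values and query are placeholders you need to
adapt to your own Exasol cluster:

```scala
// Illustrative Databricks notebook cell. `spark` is the SparkSession that
// Databricks provides in notebooks; host, credentials, and the query are
// placeholders for your Exasol deployment.
val exasolDF = spark.read
  .format("exasol")
  .option("host", "10.0.0.11")   // first Exasol data node
  .option("port", "8563")        // Exasol JDBC port
  .option("username", "sys")
  .option("password", "<PASSWORD>")
  .option("query", "SELECT * FROM RETAIL.SALES_POSITIONS")
  .load()

exasolDF.show(10)
```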

### Spark Exasol Connector With Spark Shell

You can also integrate the Spark Exasol Connector into the Spark Shell. Provide
14 changes: 7 additions & 7 deletions project/Dependencies.scala
@@ -6,19 +6,19 @@ import sbt._
object Dependencies {

// Versions
private val DefaultSparkVersion = "2.4.5"
private val ExasolJdbcVersion = "7.0.0"
private val DefaultSparkVersion = "3.0.1"
private val ExasolJdbcVersion = "7.0.4"

private val ScalaTestVersion = "3.2.2"
private val ScalaTestVersion = "3.2.3"
private val ScalaTestMockitoVersion = "1.0.0-M2"
private val MockitoVersion = "3.5.13"
private val ContainersJdbcVersion = "1.14.3"
private val ContainersScalaVersion = "0.38.4"
private val MockitoVersion = "3.6.28"
private val ContainersJdbcVersion = "1.15.0"
private val ContainersScalaVersion = "0.38.7"

private val sparkCurrentVersion =
sys.env.getOrElse("SPARK_VERSION", DefaultSparkVersion)

private val SparkTestingBaseVersion = s"${sparkCurrentVersion}_0.14.0"
private val SparkTestingBaseVersion = s"${sparkCurrentVersion}_1.0.0"

val Resolvers: Seq[Resolver] = Seq(
"Exasol Releases" at "https://maven.exasol.com/artifactory/exasol-releases"
26 changes: 26 additions & 0 deletions project/Settings.scala
@@ -19,6 +19,7 @@ object Settings {
buildSettings(scalaVersion) ++
miscSettings ++
assemblySettings ++
apiDocSettings ++
scalaStyleSettings ++
Publishing.publishSettings()

@@ -70,6 +71,31 @@
}
)

def apiDocSettings(): Seq[Setting[_]] = Seq(
autoAPIMappings := true,
apiMappings ++= scalaInstance.value.libraryJars.collect {
case file if file.getName.startsWith("scala-library") && file.getName.endsWith(".jar") =>
file -> url(s"http://www.scala-lang.org/api/${scalaVersion.value}/")
}.toMap ++
// Since Java 9+ introduced modules, API links changed, update these
// links based on used Java modules.
Map(
file("/modules/java.sql") -> url(
"https://docs.oracle.com/en/java/javase/11/docs/api/java.sql"
)
),
// Override doc task in 2.11.x versions since linking external Java
// 11+ classes does not work.
(Compile / doc) := Def.taskDyn {
val docTaskValue = (Compile / doc).taskValue
if (scalaBinaryVersion.value == "2.11") {
(Compile / doc / target).toTask
} else {
Def.task(docTaskValue.value)
}
}.value
)

def scalaStyleSettings(): Seq[Setting[_]] = {
// Creates a Scalastyle task that runs with tests
lazy val mainScalastyle = taskKey[Unit]("mainScalastyle")
2 changes: 1 addition & 1 deletion project/build.properties
@@ -1 +1 @@
sbt.version=1.3.13
sbt.version=1.4.4
12 changes: 4 additions & 8 deletions project/plugins.sbt
@@ -1,14 +1,10 @@
// Adds a pure Scala artifact fetcher `coursier`
// https://github.com/coursier/coursier
addSbtPlugin("io.get-coursier" % "sbt-coursier" % "1.0.3")

// Adds a `wartremover` a flexible Scala code linting tool
// http://github.com/puffnfresh/wartremover
addSbtPlugin("org.wartremover" % "sbt-wartremover" % "2.4.10")
addSbtPlugin("org.wartremover" % "sbt-wartremover" % "2.4.13")

// Adds Contrib Warts
// http://github.com/wartremover/wartremover-contrib/
addSbtPlugin("org.wartremover" % "sbt-wartremover-contrib" % "1.3.8")
addSbtPlugin("org.wartremover" % "sbt-wartremover-contrib" % "1.3.11")

// Adds a `assembly` task to create a fat JAR with all of its dependencies
// https://github.com/sbt/sbt-assembly
@@ -45,11 +41,11 @@ addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")

// Adds a `sonatype` release tasks
// https://github.com/xerial/sbt-sonatype
addSbtPlugin("org.xerial.sbt" % "sbt-sonatype" % "3.9.4")
addSbtPlugin("org.xerial.sbt" % "sbt-sonatype" % "3.9.5")

// Adds a `gnu-pgp` plugin
// https://github.com/sbt/sbt-pgp
addSbtPlugin("com.jsuereth" % "sbt-pgp" % "2.0.1")
addSbtPlugin("com.jsuereth" % "sbt-pgp" % "2.0.2")

// Adds a `git` plugin
// https://github.com/sbt/sbt-git
8 changes: 4 additions & 4 deletions sbtx
@@ -34,10 +34,10 @@

set -o pipefail

declare -r sbt_release_version="1.3.13"
declare -r sbt_unreleased_version="1.4.0-M1"
declare -r sbt_release_version="1.4.4"
declare -r sbt_unreleased_version="1.4.4"

declare -r latest_213="2.13.3"
declare -r latest_213="2.13.4"
declare -r latest_212="2.12.12"
declare -r latest_211="2.11.12"
declare -r latest_210="2.10.7"
@@ -471,7 +471,7 @@ process_args() {
-trace) require_arg integer "$1" "$2" && trace_level="$2" && shift 2 ;;
-debug-inc) addJava "-Dxsbt.inc.debug=true" && shift ;;

-no-colors) addJava "-Dsbt.log.noformat=true" && shift ;;
-no-colors) addJava "-Dsbt.log.noformat=true" && addJava "-Dsbt.color=false" && shift ;;
-sbt-create) sbt_create=true && shift ;;
-sbt-dir) require_arg path "$1" "$2" && sbt_dir="$2" && shift 2 ;;
-sbt-boot) require_arg path "$1" "$2" && addJava "-Dsbt.boot.directory=$2" && shift 2 ;;
4 changes: 2 additions & 2 deletions scripts/ci.sh
@@ -10,7 +10,7 @@ BASE_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )"/.. && pwd )"
cd "$BASE_DIR"

MAIN_SCALA_VERSION=2.12.12
MAIN_SPARK_VERSION=2.4.5
MAIN_SPARK_VERSION=3.0.1

if [[ -z "${TRAVIS_SCALA_VERSION:-}" ]]; then
echo "Environment variable TRAVIS_SCALA_VERSION is not set"
@@ -104,7 +104,7 @@ run_dependency_info () {
./sbtx \
-Dspark.currentVersion=$SPARK_VERSION \
++$TRAVIS_SCALA_VERSION \
dependencyUpdates pluginUpdates coursierDependencyTree
dependencyUpdates pluginUpdates dependencyTree
}

run_shell_check () {
@@ -10,7 +10,7 @@

public class ExasolDockerContainer<SELF extends ExasolDockerContainer<SELF>>
extends JdbcDatabaseContainer<SELF> {
private static final String DEFAULT_EXASOL_VERSION = "7.0.2";
private static final String DEFAULT_EXASOL_VERSION = "7.0.4";
// wait for 5 minutes to startup
private static final Integer EXASOL_STARTUP_TIME = 15 * 60;
private String username = "sys";
4 changes: 2 additions & 2 deletions src/main/scala/com/exasol/spark/rdd/ExasolRDD.scala
@@ -13,11 +13,11 @@ import org.apache.spark.rdd.RDD
import org.apache.spark.scheduler.SparkListener
import org.apache.spark.scheduler.SparkListenerApplicationEnd
import org.apache.spark.sql.Row
import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.types.StructType

import com.exasol.jdbc.EXAConnection
import com.exasol.jdbc.EXAResultSet
import com.exasol.spark.util.Converter
import com.exasol.spark.util.ExasolConnectionManager

/**
@@ -155,7 +155,7 @@ class ExasolRDD(
conn = manager.subConnection(partition.connectionUrl)
resultSet = conn.DescribeResult(subHandle)

Converter.resultSetToRows(resultSet, querySchema)
JdbcUtils.resultSetToRows(resultSet, querySchema)
}
// scalastyle:on null return

11 changes: 0 additions & 11 deletions src/main/scala/com/exasol/spark/util/Converter.scala
@@ -6,7 +6,6 @@ import java.sql.ResultSet
import org.apache.spark.internal.Logging
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.types._
@@ -21,16 +20,6 @@ import org.apache.spark.unsafe.types.UTF8String
*/
object Converter extends Logging {

/**
* Converts a [[java.sql.ResultSet]] into an iterator of
* [[org.apache.spark.sql.Row]]-s.
*/
def resultSetToRows(resultSet: ResultSet, schema: StructType): Iterator[Row] = {
val encoder = RowEncoder(schema).resolveAndBind()
val internalRows = resultSetToSparkInternalRows(resultSet, schema)
internalRows.map(encoder.fromRow)
}

@SuppressWarnings(Array("org.wartremover.warts.AsInstanceOf"))
def resultSetToSparkInternalRows(
resultSet: ResultSet,
8 changes: 4 additions & 4 deletions src/test/scala/com/exasol/spark/util/FiltersSuite.scala
@@ -41,7 +41,7 @@ class FiltersSuite extends AnyFunSuite with Matchers {
| str_col <= '123'
|AND int_col > 42
|AND (in_col NOT IN (1,2,3))
""".stripMargin.lines.mkString(" ").trim
""".stripMargin.replaceAll("\\s+", " ").trim()

assert(createWhereClause(testSchema, filters) === expected)
}
@@ -78,7 +78,7 @@
|AND (str_col LIKE '%inside%')
|AND (str_col LIKE 'prefix%')
|AND (in_col IN (1,2,3))
""".stripMargin.lines.mkString(" ").trim
""".stripMargin.replaceAll("\\s+", " ").trim()

assert(createWhereClause(testSchema, filters) === expected)
}
@@ -108,7 +108,7 @@
|AND double_col = 100.0
|AND date_col = date '2018-01-01'
|AND datetime_col = timestamp '2018-01-01 00:00:59.123'
""".stripMargin.lines.mkString(" ").trim
""".stripMargin.replaceAll("\\s+", " ").trim()

assert(createWhereClause(testSchema, filters) === expected)
}
@@ -126,7 +126,7 @@
|((str_col = 'abc') OR (int_col = 123))
|AND (((NOT ((int_col IS NULL)))) OR ((str_col IS NOT NULL)))
|AND ((str_col = 'xyz') OR (((float_col = 3.14) AND (int_col != 3))))
""".stripMargin.lines.mkString(" ").trim
""".stripMargin.replaceAll("\\s+", " ").trim()

assert(createWhereClause(testSchema, filters) === expected)
}