Added support for Spark 3.1, updated Layerfile to test other versions of S2 and Spark

Summary:
 - Added support for Spark 3.1 and dropped support for Spark 2.3 and 2.4
 - Removed testing of Spark 2.3 and Spark 2.4
 - Removed testing of SingleStore versions 6.7 and 6.8
 - Added testing of SingleStore version 7.5
 - Added testing of Spark version 3.1
 - Changed how sbt is installed in CI (Bintray no longer works, most likely because of the JFrog JCenter/Bintray shutdown: https://www.infoq.com/news/2021/02/jfrog-jcenter-bintray-closure/)
**Design doc/spec**:
**Docs impact**: none

Test Plan: https://webapp.io/memsql/commits?query=%22Adalbert+Makarovych%22+repo%3Asinglestore-spark-connector+id%3A16

Reviewers: iblinov-ua, vtkachuk-ua, carl

Reviewed By: iblinov-ua

Subscribers: engineering-list

JIRA Issues: PLAT-5757

Differential Revision: https://grizzly.internal.memcompute.com/D50927
AdalbertMemSQL committed Sep 6, 2021
1 parent 5c091bc commit b56a7c3
Showing 32 changed files with 501 additions and 467 deletions.
13 changes: 0 additions & 13 deletions .idea/runConfigurations/Test_Spark_2_4.xml

This file was deleted.

2 changes: 1 addition & 1 deletion .idea/runConfigurations/Test_Spark_3_0.xml


10 changes: 0 additions & 10 deletions .idea/runConfigurations/ensure_test_memsql_cluster_6_8.xml

This file was deleted.

12 changes: 5 additions & 7 deletions Layerfile
@@ -5,11 +5,9 @@ RUN sudo apt update && \
     sudo apt install -y curl python-pip mysql-client-core-5.7
 
 # install sbt
-RUN curl -L -o /tmp/sbt-1.3.5.deb https://dl.bintray.com/sbt/debian/sbt-1.3.5.deb && \
-    sudo dpkg -i /tmp/sbt-1.3.5.deb && \
-    sudo rm /tmp/sbt-1.3.5.deb && \
-    sudo apt-get update && \
-    sudo apt-get install -y sbt
+RUN wget https://github.com/sbt/sbt/releases/download/v1.3.5/sbt-1.3.5.tgz && \
+    sudo tar xzvf sbt-1.3.5.tgz -C /usr/share/ && \
+    sudo update-alternatives --install /usr/bin/sbt sbt /usr/share/sbt/bin/sbt 100
 
 # install the latest version of Docker
 RUN apt-get update && \
@@ -37,9 +35,9 @@ SECRET ENV LICENSE_KEY
 MEMORY 4G
 MEMORY 8G
 
-# split to 15 states
+# split to 8 states
 # each of them will run different version of the singlestore and spark
-SPLIT 15
+SPLIT 8
 
 # copy the entire git repository
 COPY . .
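For orientation, the new 8-way split maps onto version pairs as follows. This mapping is inferred from scripts/define-layerci-matrix.sh below, not stated in the commit itself:

```
# Inferred CI matrix for SPLIT 0..7:
#   SPLIT 0,1 -> SingleStore 7.0    SPLIT 2,3 -> SingleStore 7.1
#   SPLIT 4,5 -> SingleStore 7.3    SPLIT 6,7 -> SingleStore 7.5
#   even SPLIT -> Spark 3.0.0       odd SPLIT -> Spark 3.1.0
```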
10 changes: 5 additions & 5 deletions README.md
@@ -13,12 +13,12 @@ spark-packages.org. The group is `com.singlestore` and the artifact is
 
 You can add the connector to your Spark application using: spark-shell, pyspark, or spark-submit
 ```
-$SPARK_HOME/bin/spark-shell --packages com.singlestore:singlestore-spark-connector_2.12:3.1.2-spark-3.0.0
+$SPARK_HOME/bin/spark-shell --packages com.singlestore:singlestore-spark-connector_2.12:3.1.2-spark-3.1.0
 ```
 
-We release three versions of the `singlestore-spark-connector`, one per Spark version.
-An example version number is: `3.1.2-spark-3.0.0` which is the 3.1.2
-version of the connector, compiled and tested against Spark 3.0.0. Make sure
+We release two versions of the `singlestore-spark-connector`, one per Spark version.
+An example version number is: `3.1.2-spark-3.1.0` which is the 3.1.2
+version of the connector, compiled and tested against Spark 3.1.0. Make sure
 you are using the most recent version of the connector.
 
 In addition to adding the `singlestore-spark-connector`, you will also need to have the
@@ -514,7 +514,7 @@ The SingleStore Spark Connector 3.1.2 has a number of key features and enhancements:
 * Implemented as a native Spark SQL plugin
 * Supports both the DataSource and DataSourceV2 API for maximum support of current and future functionality
 * Contains deep integrations with the Catalyst query optimizer
-* Is compatible with Spark 2.3, 2.4 and 3.0
+* Is compatible with Spark 3.0 and 3.1
 * Leverages SingleStore LOAD DATA to accelerate ingest from Spark via compression, vectorized cpu instructions, and optimized segment sizes
 * Takes advantage of all the latest and greatest features in SingleStore 7.x
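To make the README change concrete, here is a minimal, hypothetical smoke test against Spark 3.1. The `spark.datasource.singlestore.*` option names follow the connector's documented configuration; the endpoint, credentials, and table name are placeholders:

```
import org.apache.spark.sql.SparkSession

object ConnectorSmokeTest {
  def main(args: Array[String]): Unit = {
    // Placeholder endpoint and credentials; assumes a local SingleStore cluster.
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("singlestore-spark-connector smoke test")
      .config("spark.datasource.singlestore.ddlEndpoint", "localhost:5506")
      .config("spark.datasource.singlestore.user", "root")
      .config("spark.datasource.singlestore.password", "password")
      .getOrCreate()

    // Read a table through the connector and force evaluation.
    val df = spark.read.format("singlestore").load("testdb.users")
    df.show()
  }
}
```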
12 changes: 4 additions & 8 deletions build.sbt
@@ -4,11 +4,8 @@ import xerial.sbt.Sonatype._
 To run tests or publish with a specific spark version use this java option:
 -Dspark.version=3.0.0
 */
-val sparkVersion = sys.props.get("spark.version").getOrElse("3.0.0")
-val scalaVersionStr = sparkVersion match {
-  case "2.3.4" | "2.4.4" => "2.11.11"
-  case _ => "2.12.12"
-}
+val sparkVersion = sys.props.get("spark.version").getOrElse("3.0.0")
+val scalaVersionStr = "2.12.12"
 val scalaVersionPrefix = scalaVersionStr.substring(0, 4)
 
 lazy val root = project
@@ -19,9 +16,8 @@ lazy val root = project
     organization := "com.singlestore",
     scalaVersion := scalaVersionStr,
     Compile / unmanagedSourceDirectories += (Compile / sourceDirectory).value / (sparkVersion match {
-      case "2.3.4" => "scala-sparkv2"
-      case "2.4.4" => "scala-sparkv2"
-      case _ => "scala-sparkv3"
+      case "3.0.0" => "scala-sparkv3.0"
+      case "3.1.0" => "scala-sparkv3.1"
     }),
     version := s"3.1.2-spark-${sparkVersion}",
     licenses += "Apache-2.0" -> url(
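Given the new version match above, cross-building is driven entirely by the `-Dspark.version` option already documented in the build.sbt comment; illustrative invocations:

```
sbt -Dspark.version=3.0.0 test      # runs tests against the scala-sparkv3.0 sources
sbt -Dspark.version=3.1.0 package   # builds 3.1.2-spark-3.1.0 against scala-sparkv3.1
```

Note that the new match covers only "3.0.0" and "3.1.0", so any other -Dspark.version value now fails the build with a MatchError rather than falling through to a default.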
40 changes: 14 additions & 26 deletions scripts/define-layerci-matrix.sh
@@ -3,41 +3,29 @@ set -eu
 
 TEST_NUM=${SPLIT:-"0"}
 
-if [ "$TEST_NUM" == '0' ] || [ "$TEST_NUM" == '1' ] || [ "$TEST_NUM" == '2' ]
+if [ "$TEST_NUM" == '0' ] || [ "$TEST_NUM" == '1' ]
 then
   echo 'export SINGLESTORE_IMAGE="memsql/cluster-in-a-box:centos-7.0.15-619d118712-1.9.5-1.5.0"'
-elif [ "$TEST_NUM" == '3' ] || [ "$TEST_NUM" == '4' ] || [ "$TEST_NUM" == '5' ]
-then
-  echo 'export SINGLESTORE_IMAGE="memsql/cluster-in-a-box:centos-6.8.15-029542cbf3-1.9.3-1.4.1"'
-elif [ "$TEST_NUM" == '6' ] || [ "$TEST_NUM" == '7' ] || [ "$TEST_NUM" == '8' ]
-then
-  echo 'export SINGLESTORE_IMAGE="memsql/cluster-in-a-box:6.7.18-db1caffe94-1.6.1-1.1.1"'
-elif [ "$TEST_NUM" == '9' ] || [ "$TEST_NUM" == '10' ] || [ "$TEST_NUM" == '11' ]
+elif [ "$TEST_NUM" == '2' ] || [ "$TEST_NUM" == '3' ]
 then
   echo 'export SINGLESTORE_IMAGE="memsql/cluster-in-a-box:centos-7.1.13-11ddea2a3a-3.0.0-1.9.3"'
-else
+  echo 'export SINGLESTORE_PASSWORD="password"'
+elif [ "$TEST_NUM" == '4' ] || [ "$TEST_NUM" == '5' ]
+then
   echo 'export SINGLESTORE_IMAGE="memsql/cluster-in-a-box:centos-7.3.2-a364d4b31f-3.0.0-1.9.3"'
+  echo 'export SINGLESTORE_PASSWORD="password"'
+else
+  echo 'export SINGLESTORE_IMAGE="memsql/cluster-in-a-box:centos-7.5.8-12c73130aa-3.2.11-1.11.11"'
+  echo 'export SINGLESTORE_PASSWORD="password"'
 fi
 
 
-if [ "$TEST_NUM" == '0' ] || [ "$TEST_NUM" == '3' ] || [ "$TEST_NUM" == '6' ] || [ "$TEST_NUM" == '9' ] || [ "$TEST_NUM" == '12' ]
+if [ "$TEST_NUM" == '0' ] || [ "$TEST_NUM" == '2' ] || [ "$TEST_NUM" == '4' ] || [ "$TEST_NUM" == '6' ]
 then
   echo 'export SPARK_VERSION="3.0.0"'
-  echo 'export SCALA_VERSION="2.12.12"'
-  echo 'export TEST_FILTER="test"'
-elif [ "$TEST_NUM" == '1' ] || [ "$TEST_NUM" == '4' ] || [ "$TEST_NUM" == '7' ] || [ "$TEST_NUM" == '10' ] || [ "$TEST_NUM" == '13' ]
-then
-  echo 'export SPARK_VERSION="2.4.4"'
-  echo 'export SCALA_VERSION="2.11.11"'
-  echo 'export TEST_FILTER="testOnly -- -l OnlySpark3"'
+  echo 'export TEST_FILTER="testOnly -- -l OnlySpark31"'
 else
-  echo 'export SPARK_VERSION="2.3.4"'
-  echo 'export SCALA_VERSION="2.11.11"'
-  echo 'export TEST_FILTER="testOnly -- -l OnlySpark3"'
+  echo 'export SPARK_VERSION="3.1.0"'
+  echo 'export TEST_FILTER="testOnly -- -l OnlySpark30"'
 fi
 
-if [ "$TEST_NUM" -gt 8 ]
-then
-  echo 'export SINGLESTORE_PASSWORD="password"'
-fi
-
+echo 'export SCALA_VERSION="2.12.12"'
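As a quick sanity check, the script can be exercised locally for a single split; the expected output below assumes the branches shown above:

```
$ SPLIT=7 bash scripts/define-layerci-matrix.sh
export SINGLESTORE_IMAGE="memsql/cluster-in-a-box:centos-7.5.8-12c73130aa-3.2.11-1.11.11"
export SINGLESTORE_PASSWORD="password"
export SPARK_VERSION="3.1.0"
export TEST_FILTER="testOnly -- -l OnlySpark30"
export SCALA_VERSION="2.12.12"
```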
4 changes: 0 additions & 4 deletions scripts/ensure-test-singlestore-cluster-68.sh

This file was deleted.


14 changes: 0 additions & 14 deletions src/main/scala-sparkv2/com/singlestore/spark/JoinExtractor.scala

This file was deleted.


@@ -0,0 +1,43 @@
+package com.singlestore.spark
+
+import com.singlestore.spark.SQLGen.{ExpressionExtractor, SQLGenContext, Statement}
+import com.singlestore.spark.ExpressionGen.aggregateWithFilter
+import org.apache.spark.sql.catalyst.expressions.Literal
+import org.apache.spark.sql.catalyst.expressions.aggregate.{
+  AggregateFunction,
+  First,
+  Last,
+  StddevPop,
+  StddevSamp,
+  VariancePop,
+  VarianceSamp
+}
+import org.apache.spark.sql.types.BooleanType
+
+case class VersionSpecificAggregateExpressionExtractor(expressionExtractor: ExpressionExtractor,
+                                                       context: SQLGenContext,
+                                                       filter: Option[SQLGen.Joinable]) {
+  def unapply(f: AggregateFunction): Option[Statement] = {
+    f match {
+      // CentralMomentAgg.scala
+      case StddevPop(expressionExtractor(child)) =>
+        Some(aggregateWithFilter("STDDEV_POP", child, filter))
+      case StddevSamp(expressionExtractor(child)) =>
+        Some(aggregateWithFilter("STDDEV_SAMP", child, filter))
+      case VariancePop(expressionExtractor(child)) =>
+        Some(aggregateWithFilter("VAR_POP", child, filter))
+      case VarianceSamp(expressionExtractor(child)) =>
+        Some(aggregateWithFilter("VAR_SAMP", child, filter))
+
+      // First.scala
+      case First(expressionExtractor(child), Literal(false, BooleanType)) =>
+        Some(aggregateWithFilter("ANY_VALUE", child, filter))
+
+      // Last.scala
+      case Last(expressionExtractor(child), Literal(false, BooleanType)) =>
+        Some(aggregateWithFilter("ANY_VALUE", child, filter))
+
+      case _ => None
+    }
+  }
+}
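The extractor is plugged into the connector's SQL generation pattern match; the call site below is a hypothetical sketch (the names `aggFunc` and `versionSpecific` are illustrative, not from this commit):

```
// Hypothetical call site inside aggregate-expression compilation:
val versionSpecific =
  VersionSpecificAggregateExpressionExtractor(expressionExtractor, context, filter)

aggFunc match {
  // e.g. StddevPop(col) compiles to SingleStore SQL STDDEV_POP(col),
  // with the optional FILTER clause threaded through aggregateWithFilter
  case versionSpecific(statement) => Some(statement)
  case _                          => None // no pushdown for this aggregate
}
```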