SparkSQL+

This is the repository for the demo paper SparkSQL+: Next-generation Query Planning over Spark.

Prerequisites

  • Java 1.8
  • Scala 2.12.10
  • Maven 3.8.6
  • Spark 3.0.1
  • HDFS 2.7.7 (optional)

Project Structure

SparkSQL+ is organized as a Maven multi-module project for build and dependency management. It consists of the following submodules:

  • sqlplus-core: contains the parser, planner, and code generator of SparkSQL+.
  • sqlplus-lib: contains the library functions used by the Scala programs generated by SparkSQL+.
  • sqlplus-web: contains the web-based interface.
  • sqlplus-cli: contains the command line interface.
  • sqlplus-example: contains the implementation of built-in example queries.

Setup

Clone

Run git clone git@github.com:hkustDB/SparkSQLPlus.git.

Configurations

SparkSQL+ modes

  • Local mode
    • Use Local mode if the Spark Standalone Cluster is deployed on the same machine as SparkSQL+.
    • HDFS is not needed in Local mode.
  • Remote mode
    • Use Remote mode if the Spark Standalone Cluster is deployed on a different machine.
    • The jars will be uploaded to HDFS before execution.
    • The input data must be in HDFS before execution.
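In Remote mode, the input data must be staged in HDFS before submitting any experiment. A minimal sketch using the HDFS CLI, assuming a base path matching the `data.path` setting shown later in this document (the file name `graph.csv` is only an example):

```shell
# Stage a local comma-separated data file in HDFS for Remote mode.
# Paths and the file name are illustrative; adjust to your cluster.
hdfs dfs -mkdir -p /Users/sqlplus/data
hdfs dfs -put examples/data/graph.csv /Users/sqlplus/data/
hdfs dfs -ls /Users/sqlplus/data
```

These commands require a running HDFS cluster reachable from the current machine.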

Local forwarding

  • Local port forwarding is necessary when the remote Spark and HDFS clusters are not directly accessible. In this case, set experiment.forwarding to true (see below).
  • Manually enable local port forwarding for the following ports (e.g., through ssh -L):
    • 6066
    • 7077
    • 8080
    • 8081
    • 9000
    • 50070
    • 50075
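All of the ports above can be forwarded in a single ssh invocation; a sketch, where `spark-master` is a placeholder for your remote host:

```shell
# Forward the Spark and HDFS ports listed above to localhost.
# "spark-master" is a placeholder; -N keeps the session open
# without running a remote command.
ssh -N \
  -L 6066:localhost:6066 \
  -L 7077:localhost:7077 \
  -L 8080:localhost:8080 \
  -L 8081:localhost:8081 \
  -L 9000:localhost:9000 \
  -L 50070:localhost:50070 \
  -L 50075:localhost:50075 \
  spark-master
```

This requires SSH access to the remote host.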

SparkSQL+ configurations

  • Run cp src/main/resources/application.yml.template src/main/resources/application.yml.
  • Edit src/main/resources/application.yml.
server:
  port: 8848

logging:
  config: classpath:log4j2-spring.xml

sqlplus:
  home: /Users/sqlplus/Projects/SparkSQLPlus    # the absolute path of SparkSQLPlus

experiment:
  mode: local                                   # local or remote
  forwarding: false                             # whether using local port forwarding
  spark:
    master:
      host: localhost
      port: 7077
      submission:
        port: 6066
      web-ui:
        port: 8080
    driver:
      memory: 4g
      cores: 1
    executor:
      memory: 4g
      cores: 1
    default:
      parallelism: 1
  timeout: 300                                  # timeout setting, in seconds 
  hdfs:
    host: localhost
    port: 50070
    path: /Users/sqlplus                        # base path for the uploaded jars
    user: sqlplus
  data:
    path: /Users/sqlplus/data                   # base path for the input data
  result:
    type: web-ui

Spark configurations

  • Edit $SPARK_HOME/conf/spark-defaults.conf and add spark.master.rest.enabled true to the end.
  • Edit $SPARK_HOME/conf/log4j.properties and add the following configuration.
log4j.logger.SparkSQLPlusExperiment=INFO, SparkSQLPlus
log4j.appender.SparkSQLPlus=org.apache.log4j.ConsoleAppender
log4j.appender.SparkSQLPlus.target=System.out
log4j.appender.SparkSQLPlus.layout=org.apache.log4j.PatternLayout
log4j.appender.SparkSQLPlus.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

Build

Run mvn clean package.

Download data

Run bash examples/data/download.sh to download a graph from SNAP. It is also possible to use other input data as long as the columns are separated by commas.
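Note that SNAP edge lists are typically tab-separated and begin with `#` comment lines, while SparkSQL+ expects comma-separated columns. The download script presumably handles this already; for other datasets, a conversion sketch (file names are illustrative):

```shell
# Convert a tab-separated SNAP-style edge list to the comma-separated
# format SparkSQL+ expects. A tiny sample input is created inline.
printf '# FromNodeId\tToNodeId\n0\t1\n0\t2\n1\t2\n' > edges.txt
grep -v '^#' edges.txt | tr '\t' ',' > edges.csv   # drop comments, tabs -> commas
cat edges.csv
```

This prints the three converted rows `0,1`, `0,2`, and `1,2`.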

Usage

The example queries are in examples/query.

Command line interface

sparksql-plus compiles the input SQL file into a Scala program that uses the SparkSQL+ library.

syntax: sparksql-plus [OPTIONS] <query>
  options:
     -d,--ddl <path>           Set the path to the ddl file.
     -h,--help                 Show the help message.
     -n,--name <object name>   Set the object name for the output object.
     -o,--output <path>        Set the path to the output file.
     -p,--pkg <package name>   Set the package name for the output object.

The following command generates SparkSQL+ code for examples/query/q1.

./bin/sparksql-plus -d examples/query/q1/graph.ddl -o examples/query/q1/q1.scala examples/query/q1/query.sql
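The -n and -p options control the name and package of the generated object; for example (the object name `Q1` and package `sqlplus.example` are illustrative, not defaults):

```shell
# Same compilation, but also set the generated object's name and package.
# "Q1" and "sqlplus.example" are example values.
./bin/sparksql-plus -d examples/query/q1/graph.ddl -n Q1 -p sqlplus.example \
    -o examples/query/q1/q1.scala examples/query/q1/query.sql
```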

Web-based interface

Start the application

Run java -jar sqlplus-web/target/sparksql-plus-web-jar-with-dependencies.jar.

Access the web

Visit http://localhost:8848/ in the browser.

Compile a query

  • Submit a query.
  • Select a candidate.
  • Persist the generated SparkSQL+ code.

Run experiments

Click the Experiment tab at the top and submit the experiments. The results will be fetched and displayed automatically.
