An overview organized around Spark SQL, with reference to mainstream open-source SQL engines

Introduction

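These notes are organized around Spark SQL 2.x and draw on open-source projects related to mainstream distributed SQL compute engines. The main projects referenced are listed below: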

Spark SQL

  • Spark Core (RDD APIs), Data Source Connectors
  • Catalyst Optimization, Tungsten Execution
  • SparkSession, Dataset/DataFrame APIs, SQL
  • Structured Streaming, MLlib, GraphFrames, TensorFrames

Reference

  • Spark SQL: Spark SQL is Apache Spark's module for working with structured data.
  • Hive: The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
  • Presto: Distributed SQL Query Engine for Big Data.