
Introduction

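These notes are organized around Spark SQL 2.x and draw on open-source projects related to mainstream distributed SQL compute engines. The main projects referenced are listed below: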

Spark SQL

  • Spark Core (RDD APIs), Data Source Connectors
  • Catalyst Optimization, Tungsten Execution
  • SparkSession, Dataset/DataFrame APIs, SQL
  • Structured Streaming, MLlib, GraphFrames, TensorFrames

Reference

  • Spark SQL: Spark SQL is Apache Spark's module for working with structured data.
  • Hive: The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
  • Presto: Distributed SQL Query Engine for Big Data.