[Design] Presto-on-Spark: A Tale of Two Computation Engines #13856
Comments
Excellent job! A unified entry for
TODOs (for tracking purposes, keep updating):
As per earlier discussion, we decided to go with this name explicitly to emphasize that this module is only needed for the classloader isolation, and not for anything fundamental. Once Spark supports classloader isolation internally (or once it is migrated to Java 9+, which supports Java modules), this artificial module should be removed.
I see. But we might also want to put some common classes into
What's the difference between doing this and SparkSQL?
@wubiaoi: From a user experience perspective, Presto-on-Spark will provide exactly the same language and semantics for interactive and batch. While both Presto and SparkSQL are ANSI-SQL compatible, note that there is no "ANSI SQL" as a language: ANSI SQL is a (somewhat loose) specification. Many SQL dialects claim to be ANSI SQL compatible (notably Oracle, SQL Server, and DB2), yet they are significantly incompatible with each other, as explained in more detail in this Quora answer:
Even if the language and semantics were exactly the same, Presto-on-Spark provides a unified SQL experience for interactive and batch use cases. A unified SQL experience means not only that the SQL language and semantics are the same, but that the overall experience is similar. This is because, while SQL was originally designed as a declarative language, in almost all practice users depend on engine-specific implementation details and use it as an imperative language in places to get the best performance. The SQL experience includes, but is not limited to:
I will explain the technical perspective in a separate comment :)
@wubiaoi: From a technical perspective, SparkSQL's execution model is row-oriented + whole-stage codegen [1], while Presto's execution model is columnar processing + vectorization. So architecture-wise, Presto-on-Spark will be more similar to the early research prototype Shark [2]. The design trade-offs between row-oriented + whole-stage codegen and columnar processing + vectorization deserve a very long discussion; I will let @oerling provide more insights :). However, with modern Big Data, where denormalization is omnipresent, we do see an ever-increasing value in columnar processing + vectorization [3]. [1] Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop: https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html
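To make the contrast above concrete, here is a minimal, purely illustrative sketch (not Presto or Spark code; all names are hypothetical) of the same aggregation written row-at-a-time, as a row-oriented engine would execute it, and over a column vector, as a vectorized columnar engine would:

```java
// Illustrative sketch of the two execution models discussed above.
// Neither method is real Presto or Spark code.
public final class ExecutionModels {
    // Row-oriented: each iteration handles one full row and picks out
    // the column of interest, as a tuple-at-a-time engine would.
    static long sumRowAtATime(long[][] rows, int col) {
        long sum = 0;
        for (long[] row : rows) {
            sum += row[col]; // one value per row-processing step
        }
        return sum;
    }

    // Columnar/vectorized: the column is stored contiguously and the
    // operator runs a tight loop over it, which the JIT can unroll
    // and auto-vectorize.
    static long sumVectorized(long[] columnVector) {
        long sum = 0;
        for (int i = 0; i < columnVector.length; i++) {
            sum += columnVector[i];
        }
        return sum;
    }
}
```

The point of the sketch is the data layout: the vectorized version touches one contiguous array per column, which is what makes SIMD-friendly, cache-efficient processing possible.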
@wenleix 👍 Thank you very much for the explanation.
@wubiaoi: While this is certainly possible, it complicates the execution a lot, as it requires coordination between two (heterogeneous) execution engines. Also, why not use Presto Unlimited in this case? :)
As I'm looking for an HTTP service instead of spark-submit, I can work on it. But now you see what I want to do, right? WDYT about it?
Classic Presto acts like a service, with an HTTP endpoint to fetch the results. Are you hitting a scalability wall with classic Presto?
I'm not sure if Spark even supports gradual fetching of the results; you can investigate it. But currently we are collecting results via the
As a middle ground, you can change your workload to a slightly different
Generally speaking, Presto on Spark is mostly designed to run insert queries, which is why we don't care much about returning the results.
Presto on Spark allows changing the catalog for each query by creating a Presto runner per query, correct? Classic Presto does not support loading/unloading catalogs: #12605. My main goal is to provide context (a Presto catalog) to classic Presto for each query. In fact, we need to support a very high scale, and I found this project, which seems to match my requirements.
Could you please describe your use case a little bit more? Maybe there's a better way to achieve this dynamic catalog behaviour?
Consider millions of catalogs of different types (mysql, psql, ...) and thousands of clients, so a lot of queries. A client comes to an HTTP service with its catalog and query. The service sends the catalog list and the query to Presto/Presto-on-Spark (let's call it the system) to run. The system should then run the query and, if possible, stream the results back to the client through HTTP chunks to limit RAM usage. This is the use case; it seems simple, but its implementation is not.
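The memory-bounding idea in this use case can be sketched as follows. This is a hypothetical helper, not part of Presto or Spark: results are pulled batch-by-batch and flushed to the response stream, so memory use is bounded by one batch rather than the full result set (an HTTP server would send each flush as one chunk).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of chunked result streaming: only one batch of rows
// is held in memory at a time; each batch is flushed to the client before
// the next one is pulled from the engine.
public final class ChunkedResultStreamer {
    static void stream(Iterator<List<String>> batches, OutputStream out) {
        try {
            while (batches.hasNext()) {
                for (String row : batches.next()) {
                    out.write((row + "\n").getBytes(StandardCharsets.UTF_8));
                }
                out.flush(); // one HTTP chunk per batch
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```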
@KannarFr: From an operation/service perspective, Presto-on-Spark is more like Spark. Thus, in my opinion, we should leverage what Spark provides for such a service (instead of thinking of it in the Presto coordinator way).
@arhimondr @wenleix Is it possible to run multiple SQL queries in the query file? |
@djiangc Unfortunately, no. But that should be an easy feature to add.
@arhimondr @wenleix Another question: it seems I can't use cluster deploy-mode on spark-submit for presto-spark-launcher; only client is supported. Is this true, or am I missing something?
@djiangc Yes, currently only the client mode is supported.
@arhimondr Thanks for your response. I have another question: can I do insert overwrite?
@djiangc Currently the launcher doesn't support setting session properties. You must enable the
Also, it should be pretty easy to add a parameter to the launcher that accepts session properties.
Many thanks for your help and the pointer. I got the partition overwrite working with spark-presto-launcher @arhimondr
@arhimondr I am not able to run an insert in overwrite mode by setting the above property. Is it not supported with S3?
Hi, @arhimondr I understand the philosophy behind this sentence
but the insert needs a predefined destination table with a schema, format, and location, right? As an AWS user, what I would find very useful is to write the result of a presto-on-spark
Maybe a CLI argument configuring the dataOutputLocation would do the trick?
@rguillome Hi! Thanks for reaching out. In our case we know the output schema in advance, thus we always end up running
Hi @arhimondr I was trying to
So basically I will try to push an MR with the current changes already made in trinodb. I wonder if the ultimate solution shouldn't be an option to write each final split to an HDFS or S3 location directly, to avoid the gathering at the driver level. We could imagine having all the benefits of the Hadoop FS organisation (partitioning, bucketing, sort, and splits). But I'm not yet comfortable with all the details it would involve to dig into this for now.
Does Presto-on-Spark's physical plan apply DynamicFilter (vs. DynamicPartitionPrune)?
SparkSQL 3.0+'s execution model is also columnar processing + vectorization
It is!
@wenleix Hello, I have a question. Although compatibility is increased, for queries over small amounts of data, isn't query speed slowed down by adding the materialized shuffle? At the same time, I would like to ask how the improved Presto and SparkSQL compare on large amounts of data.
The idea is to run small queries on classic Presto, and run large queries (that won't fit within the memory limit) / long-running queries (more likely to be affected by cluster stability issues) using Presto-on-Spark.
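The routing policy described above could be sketched roughly as follows. This is a hypothetical illustration, not Presto code; the estimates and thresholds are assumptions a deployment would have to supply (e.g. from history-based resource estimation):

```java
// Hypothetical sketch of the query-routing policy discussed above:
// small/short queries go to classic Presto, while queries that exceed
// the memory limit or run very long go to Presto-on-Spark.
// Thresholds and estimates are illustrative assumptions.
public final class EngineRouter {
    enum Engine { PRESTO, PRESTO_ON_SPARK }

    static Engine route(long estimatedPeakMemoryBytes, long estimatedRuntimeSeconds) {
        long memoryLimit = 100L * 1024 * 1024 * 1024; // assumed per-query memory limit (100 GB)
        long longRunningThreshold = 3600;             // assumed "long-running" cutoff (1 hour)
        if (estimatedPeakMemoryBytes > memoryLimit || estimatedRuntimeSeconds > longRunningThreshold) {
            return Engine.PRESTO_ON_SPARK;
        }
        return Engine.PRESTO;
    }
}
```

As the later comments note, the hard part in practice is producing the estimate: without one, a common fallback is to retry on Presto-on-Spark after a query fails on classic Presto.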
@rongrong |
@rongrong Does this mean that if the user's query fails on Presto and the SQL turns out to be a large query, they then submit it through Presto on Spark? Does the user have a process for switching the submission method?
As far as I know, these are two completely separate processes; you must develop the judgment logic to decide whether a query is large.
Using the same SQL, does Presto-on-Spark use less memory but take more time?
Links/Resources:
Abstract
The architecture tradeoff between MapReduce and parallel databases has been an open discussion since the dawn of MapReduce systems over a decade ago. At Facebook, we have spent the past several years scaling Presto to Facebook-scale batch workloads.
Presto Unlimited aims at solving such scalability challenges. After revisiting the key architecture changes (e.g., disaggregated shuffle) required to further scale Presto, we decided on Presto-on-Spark as the path to further scale Presto. See the rest of the design doc for details.
We believe this is only a first step towards more confluence between the Spark and Presto communities, and a major step towards enabling a unified SQL experience between interactive and batch use cases.
Introduction
Presto was originally designed for interactive queries but has evolved into a unified engine for both interactive and batch use cases. Scaling an MPP architecture database to batch data processing over Internet-scale datasets is known to be an extremely difficult problem [1].
Presto Unlimited aims at solving such scalability challenges. To truly scale Presto Unlimited to Internet-scale batch workloads we need the following (excluding coordinator scaling and spilling):
We realized this work lays down the foundation for a general-purpose parallel data processing system, such as Spark, FlumeJava, or Dryad. Note that such data processing systems have their own usage and well-defined programming abstractions, and require years to mature.
We found that Presto should leverage existing well-developed systems to scale to large batch workloads, instead of "embedding" such a system inside Presto. We also believe such collaboration would help the whole Big Data community better understand the abstraction between a SQL engine and a data processing system, as well as evolve and refine the execution primitives to provide near-optimal performance without sacrificing the abstractions.
We chose to leverage Spark as the parallel data processing system to further scale Presto Unlimited, as it's the most widely used open-source system in this category. However, the design and architecture here should apply to any other parallel data processing system as well.
Architecture
The Presto Planner needs to know it is generating a plan for Spark execution, and can thus remove unnecessary nodes (e.g. LocalExchange)
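A toy sketch of this planner specialization, under the assumption (hypothetical here) that the plan is a flat list of named nodes: when targeting Spark, nodes that only coordinate in-process data movement, such as LocalExchange, can be filtered out.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: a real Presto plan is a tree of PlanNode
// objects, not a list of strings, but the pruning idea is the same.
public final class SparkPlanPruner {
    static List<String> pruneForSpark(List<String> planNodes) {
        // Drop purely local nodes that Spark execution makes unnecessary.
        return planNodes.stream()
                .filter(node -> !node.equals("LocalExchange"))
                .collect(Collectors.toList());
    }
}
```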
On Spark worker, it includes:
Construct operator factory chain (a.k.a DriverFactory) through LocalExecutionPlanner
Instantiate the driver by binding the input split, and run the driver
Send the data to a SparkOutputBuffer, which will emit it to Spark.
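The worker-side steps above can be condensed into the following sketch. It mirrors the design's vocabulary (driver, operator chain, output buffer) but is not actual Presto code; the operator chain is stood in for by a simple function, and a list stands in for the SparkOutputBuffer:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Simplified sketch of Spark-worker execution: bind an input split to a
// driver built from an operator chain, run the driver to exhaustion, and
// buffer its output for handoff to Spark. All names are illustrative.
public final class SparkWorkerSketch {
    static List<String> runDriver(Iterator<String> inputSplit,
                                  Function<String, String> operatorChain) {
        List<String> outputBuffer = new ArrayList<>(); // stands in for SparkOutputBuffer
        while (inputSplit.hasNext()) {                 // the driver's processing loop
            outputBuffer.add(operatorChain.apply(inputSplit.next()));
        }
        return outputBuffer;                           // pages emitted to Spark
    }
}
```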