feat(flink): introduce Apache Flink backend #6408
Conversation
Force-pushed from 20720ea to 7aef924
Thanks for the PR! This is off to a good start - I've left some comments below.
    def _count_star(translator: ExprTranslator, op: ops.Node) -> str:
        return "count(*)"
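For context, a handler like this is typically wired into a compiler's operation registry, which maps operation classes to string-generating functions. The following is a minimal sketch of that pattern; the class and function names here are illustrative stand-ins, not the actual ibis internals:

```python
# Minimal sketch of a string-generating translation registry.
# Node, CountStar, registry, and translate are hypothetical stand-ins
# for ibis's real ops/ExprTranslator machinery.

class Node:
    """Base class for expression operation nodes."""

class CountStar(Node):
    """Represents a COUNT(*) aggregate over a table."""

def _count_star(translator, op: Node) -> str:
    # Emit the SQL text for a COUNT(*) aggregate.
    return "count(*)"

# The registry maps operation types to their handler functions.
registry = {CountStar: _count_star}

def translate(op: Node) -> str:
    # Dispatch on the operation's concrete type to find its handler.
    handler = registry[type(op)]
    return handler(None, op)

print(translate(CountStar()))  # prints: count(*)
```

Because a subclassed compiler inherits the base registry, defining a new handler only changes the output when it overrides (or adds to) an inherited entry, which is the point being made in the review comment above.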
Since you're inheriting the base registry, CountStar should already be handled - I don't believe you'll need to define a handler for this op here.
I think it compiles into something like SELECT t0.`i`, count(1) AS `count` instead of count(*).
Force-pushed from df8b339 to cd9d433
Overall this looks good! I pushed up a commit fixing the lockfile changes - what version of poetry are you using? Last thing - can you squash your commits? We generate our changelog from commit messages.
@jcrist Thanks for helping fix the lockfile changes. I'm using poetry 1.2.2. Also squashed the commits. Lmk if there's anything else you'd like me to address!
LGTM, thanks! 🚀
This PR adds initial support for Apache Flink (streaming) in Ibis.
Motivation
As data grows, developers want to do increasingly sophisticated data science and reduce the end-to-end time to value. There has been an explosion of tools, libraries, and frameworks to democratize the performance and functionality needed to build increasingly complex data systems that address these new data science demands. That said, this combinatorial explosion of tools has forced developers to spend an increasing amount of their time rewriting code or learning new APIs. Building on and extending existing standards would ameliorate much of this pain by allowing developers to write code once and use it across a myriad of tools in the data tool chain. Ibis already works with numerous backends for batch analytics, and we want to extend Ibis into real-time systems, starting with Apache Flink.
Current State
Ibis is currently a batch-oriented library. All of the currently supported backends derive from a batch paradigm (aside from Spark, which does offer support for stream processing, albeit using micro-batches under the hood).
Unlike batch systems, which operate on bounded data, streaming systems are designed with unbounded data in mind. In order to deal with an infinite data stream, streaming data systems operate with unique concepts such as “event time”, “processing time”, “watermark”, etc.
As streaming systems gain more use cases, there have been efforts to close the gap between batch and streaming. Flink SQL, for example, was born as part of such an effort and, by allowing users to write streaming queries in a SQL-like manner, has been vastly successful in that regard. The success of Flink SQL both validates the potential of stream and batch unification and inspires the community to push for better standards, a vision that Ibis is in a unique and valuable position to help build.
Adding support for a streaming engine like Flink, however, is non-trivial.
This PR
We would like to make the work to support Apache Flink and streaming in Ibis incremental. In the first stage of the implementation, we will focus on a "string-generating backend": a top-level function (similar to ibis.backends.clickhouse.compiler.core.translate) that handles just the expr -> SQL compilation. Later on, we will use this function as part of the backend, when we're ready to add support for it. This PR introduces the aforementioned function and sets up the testing infrastructure via pytest-snapshot, without introducing new APIs for the time being.
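To illustrate the testing approach described above, here is a runnable sketch of how snapshot testing for a compiler works: the compiled SQL is recorded once and future runs assert the output has not drifted. The translate function and the Snapshot class below are simplified stand-ins for the real ibis code and the pytest-snapshot fixture, not the PR's actual implementation:

```python
# Sketch of snapshot-style testing for an expr -> SQL compiler.
# `translate` is a toy stand-in for the real compilation function, and
# `Snapshot` mimics the pytest-snapshot `snapshot` fixture in miniature.

def translate(table_name: str) -> str:
    # Toy "compiler": pretend the expression compiles to this SQL string.
    return f"SELECT count(*) FROM {table_name}"

class Snapshot:
    """Minimal in-memory stand-in for a snapshot fixture."""

    def __init__(self) -> None:
        self.stored: dict[str, str] = {}

    def assert_match(self, value: str, name: str) -> None:
        # First call records the snapshot; later calls compare against it,
        # so any change in compiler output fails the test.
        if name not in self.stored:
            self.stored[name] = value
        assert self.stored[name] == value, f"snapshot {name} changed"

snapshot = Snapshot()
sql = translate("t0")
snapshot.assert_match(sql, "count_star.sql")  # records on first run
snapshot.assert_match(sql, "count_star.sql")  # passes while output is stable
```

The advantage of this style is that the expected SQL never has to be hand-written into the test body; regenerating snapshots after an intentional compiler change is a single command, while unintentional output drift is caught immediately.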