refactor(clickhouse): move translate_val to toposorted compiler loop #7209
op = op.replace(
    p.WindowFunction(p.Cumulative, ...)
    >> (lambda op, _: cumulative_to_window(op.func, op.frame))
)
The pattern system is pretty darn sweet.
You "should" be able to do the replacements in one pass by combining them into an AnyOf pattern:
params = {param.op(): value for param, value in params.items()}
scalar_rule = p.ScalarParameter >> (
    lambda op, _: ops.Literal(value=params[op], dtype=op.dtype)
)
window_rule = p.WindowFunction(p.Cumulative, ...) >> (
    lambda op, _: cumulative_to_window(op.func, op.frame)
)
op = op.replace(scalar_rule | window_rule)
We should provide a way to spell the replacement rule as a decorated function:
@replace(ops.ScalarParameter)
def substitute_param(op, ctx):
    return ops.Literal(value=params[op], dtype=op.dtype)
Possibly worth a low-priority ticket.
Let's avoid building that until we have some more use cases as input
Single-pass replacement worked 🎉
Note that this works only as long as the replacement rules have distinct matchers, because the first matching pattern is the one that gets applied.
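To illustrate the first-match caveat, here is a minimal toy sketch. It does not use the ibis pattern system; `Rule` and `first_match` are hypothetical names invented for this example, and matching is reduced to a plain `isinstance` check.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rewrite rule: a type to match plus a replacement function."""

    matcher: type
    replacer: Callable[[Any], Any]

    def apply(self, node: Any) -> Optional[Any]:
        # Return the rewritten node on a match, otherwise None.
        return self.replacer(node) if isinstance(node, self.matcher) else None


def first_match(rules: list, node: Any) -> Any:
    """Apply the first rule that matches; later rules are never consulted."""
    for rule in rules:
        result = rule.apply(node)
        if result is not None:
            return result
    return node  # no rule matched, leave the node unchanged


ints = Rule(int, lambda n: f"int:{n}")
strs = Rule(str, lambda s: f"str:{s}")
# bool is a subclass of int, so this matcher overlaps with `ints`.
bools = Rule(bool, lambda b: f"bool:{b}")

distinct = [ints, strs]
overlapping = [ints, bools]

print(first_match(distinct, 1))        # int:1
print(first_match(distinct, "a"))      # str:a
print(first_match(overlapping, True))  # int:True, the bool rule never fires
```

With distinct matchers the order of the rules does not matter; once two matchers overlap, the result depends on which rule comes first.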
    return new_expr.op()


def _translate_node(node, *args, **kwargs):
Can you please remove _translate_node, translate_rel and translate_val in favor of a single flat overloaded function, e.g. translate()?
I can do that in a follow up.
Actually I think this adds unnecessary implicit behavior:
- I now have to deal with import cycles
- I now have to ensure that both values and relations translate is imported so that all the rules are registered.
I don't really see what we gain by making these into a single rule.
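The registration concern can be sketched with `functools.singledispatch`. This is an illustration under assumptions, not the PR's code: `Value`, `Relation` and the two `register_*` functions are hypothetical stand-ins, with each function playing the role of importing one module.

```python
from functools import singledispatch


class Value: ...
class Relation: ...


@singledispatch
def translate(op, **kwargs):
    raise NotImplementedError(f"no translation rule for {type(op).__name__}")


def register_value_rules():
    # Stands in for `import values`: registration is a side effect.
    @translate.register(Value)
    def _(op, **kwargs):
        return "value"


def register_relation_rules():
    # Stands in for `import relations`.
    @translate.register(Relation)
    def _(op, **kwargs):
        return "relation"


register_value_rules()
print(translate(Value()))  # value

# The relation rules have not been "imported" yet, so dispatch falls through.
try:
    translate(Relation())
    missing_before_import = False
except NotImplementedError as exc:
    missing_before_import = True
    print(exc)  # no translation rule for Relation

register_relation_rules()
print(translate(Relation()))  # relation
```

A single shared `translate` only knows the rules whose defining modules have already run, which is the implicit behavior being objected to.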
Put all the translate implementations into a single file?
I like that they are separate now, it makes it easier to find things IMO.
)

# apply translate rules in topological order
results = op.map(fn, filter=(ops.TableNode, ops.Value))
Is restricting the traversal required, or is it just a performance optimization?
I haven't tried it without. I'd rather keep it as is, if only for readability. It's easier to see that the code should only work with these node types than if there were no filter, which would suggest it handles generic Node instances.
    *map(partial(translate_val, **kw), op.values), dialect="clickhouse"
)
@translate_rel.register
def _dummy(op: ops.DummyTable, *, values, **_):
Can you remove the keyword-only marker * from the functions? I usually find it more distracting than useful.
I kind of like it as an indicator of compiled inputs. I'd rather keep it if that's okay.
It also further enforces the op structure during compilation, you can't have any rogue arguments.
The error messages are also much more readable with this when an argument is missing. With positional-or-keyword you don't get the name of the argument that was missing when you fail to pass it in whereas with keyword-only you do.
I think it's better than having it positional-or-keyword, even if the * is a bit distracting.
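As a small illustration of how the keyword-only marker rejects stray positional arguments, consider this sketch. The `rule` function is hypothetical; only its signature shape mirrors the `op, *, values, **_` form shown in the diff.

```python
def rule(op, *, values, **_):
    """Hypothetical translation rule: `values` can only be passed by keyword."""
    return f"{op}({', '.join(values)})"


print(rule("dummy", values=["1", "2"]))  # dummy(1, 2)

# Extra keyword arguments are swallowed by **_ ...
print(rule("dummy", values=["1"], unused=None))  # dummy(1)

# ... but a stray positional argument is rejected outright.
try:
    rule("dummy", ["1", "2"])
    rejected = False
except TypeError:
    rejected = True

print(rejected)  # True
```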
ops.Multiply: "*",
ops.Divide: "/",
ops.Modulus: "%",
ops.Add: operator.add,
We repeat these mappings (along with their string infix representations) in multiple places; possibly we should set them directly on the operation classes. We should create a follow-up ticket for this.
def cumulative_to_window(
    func: ops.Cumulative, frame: ops.WindowFrame
) -> ops.WindowFunction:
    klass = getattr(ops, func.__class__.__name__.replace("Cumulative", ""))
It looks a bit fragile to me but fine for now.
We haven't changed the name of cumulative operations ever ... maybe?
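The fragility mentioned here comes from resolving a class purely by its name. A toy sketch, assuming a stand-in `ops` namespace and made-up classes (including a hypothetical `RunningMax` that does not follow the naming convention):

```python
from types import SimpleNamespace


class Sum: ...
class Mean: ...
class CumulativeSum: ...
class CumulativeMean: ...
class RunningMax: ...  # hypothetical name that breaks the convention


# Stand-in for the real operations module.
ops = SimpleNamespace(Sum=Sum, Mean=Mean)


def plain_class_for(func):
    """Find the non-cumulative class by stripping 'Cumulative' from the name."""
    return getattr(ops, type(func).__name__.replace("Cumulative", ""))


print(plain_class_for(CumulativeSum()) is Sum)  # True

# A class whose name does not follow the convention fails only at lookup time.
try:
    plain_class_for(RunningMax())
    lookup_failed = False
except AttributeError:
    lookup_failed = True

print(lookup_failed)  # True
```

The lookup keeps working for as long as every cumulative operation is named `Cumulative<Name>` and a matching `<Name>` class exists.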
@@ -952,9 +949,6 @@ def agg(df):


@pytest.mark.notimpl(["datafusion", "polars"], raises=com.OperationNotDefinedError)
@pytest.mark.broken(
    ["clickhouse"], reason="clickhouse returns incorrect results", raises=AssertionError
Nice!
alltypes[alltypes.int_col == 1]
    .limit(n)
    .int_col.collect()
    .map(lambda x: my_add(x, 1))
Pretty nice!
@pytest.fixture
def translate():
Nice to see this gone!
# rewrite cumulative functions to window functions, so that we don't have
# to think about handling them in the compiler, we need only compile window
# functions
replace_cumulative_ops = p.WindowFunction(
Like it!
op.value,
ops.TableArrayView(
    ops.Selection(
        table=an.find_first_base_table(op.options), selections=(op.options,)
We should switch to the Topmost pattern to locate the base table, but in a follow-up.
Something like BaseTableOf = Topmost(ops.Relation)
    c.ExistsSubquery(x)
)

op = op.replace(
Starting to have a ruleset here, great to see it in action!
Great improvement, thanks @cpcloud!
This PR moves translate_val to function the same way that translate_rel does: inputs are compiled and then passed into translate_val instead of translation rules having to compile their own inputs.
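The difference between the two styles can be sketched as follows. This is a toy under assumptions, not the PR's code: `Literal`, `Add`, `translate_recursive`, `translate_flat` and `compile_flat` are all hypothetical, and the driver is a simple stack-based post-order loop rather than the actual toposorted compiler loop.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Add:
    left: object
    right: object


# Style 1: each rule compiles its own inputs by recursing into the translator.
@singledispatch
def translate_recursive(op):
    raise NotImplementedError(type(op).__name__)


@translate_recursive.register(Literal)
def _(op):
    return str(op.value)


@translate_recursive.register(Add)
def _(op):
    return f"({translate_recursive(op.left)} + {translate_recursive(op.right)})"


# Style 2: rules receive already-compiled inputs as keyword-only arguments.
@singledispatch
def translate_flat(op, **kw):
    raise NotImplementedError(type(op).__name__)


@translate_flat.register(Literal)
def _(op, *, value, **_):
    return str(value)


@translate_flat.register(Add)
def _(op, *, left, right, **_):
    return f"({left} + {right})"


def compile_flat(root):
    """Compile inputs before the ops that use them, without recursion."""
    results = {}
    stack = [root]
    while stack:
        op = stack[-1]
        inputs = [arg for arg in vars(op).values() if isinstance(arg, (Literal, Add))]
        pending = [arg for arg in inputs if arg not in results]
        if pending:
            stack.extend(pending)  # compile the inputs first
            continue
        stack.pop()
        # Swap each input op for its compiled result; pass plain fields through.
        kwargs = {
            name: results[arg] if isinstance(arg, (Literal, Add)) else arg
            for name, arg in vars(op).items()
        }
        results[op] = translate_flat(op, **kwargs)
    return results[root]


expr = Add(Literal(1), Add(Literal(2), Literal(3)))
print(translate_recursive(expr))  # (1 + (2 + 3))
print(compile_flat(expr))         # (1 + (2 + 3))
```

In the second style the rules never call the translator themselves, so they stay flat and only describe how to combine compiled inputs.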
A number of interesting things come out of this:
- much easier to follow IMO.
- translate_val recursively, inputs are compiled and then passed to functions.
- for the op is simpler (window functions)
Additionally, I was able to remove most occurrences of dialect="clickhouse" and convert the clickhouse translate rules to use sqlglot functions or other constructs (SQLStringView and Joins need the dialect parameter).
Not counting the newly generated SQL, the net lines of code are reduced as well: