OVER, PARTITION BY and WINDOW #2618

Merged
merged 1 commit into from Sep 25, 2018

Contributor

Anber commented Jul 13, 2018

Hi

PR implements window functions for PostgreSQL.
It can be used instead of fragments as a clearer and more flexible solution to problems like coderplanets/coderplanets_server#16 or absinthe-graphql/absinthe_relay#100 (comment)

# SELECT row_number(s0."x") OVER () FROM "schema" AS s0
Schema |> select([r], row_number(r.x) |> over)

# SELECT count(s0."x") OVER w FROM "schema" AS s0 WINDOW w AS (PARTITION BY s0."x" ORDER BY s0."a", s0."b")
from s in Schema,
  window: (w as partition_by s.x, order_by: [s.a, s.b]),
  select: (count(s.x) |> over(w))

# SELECT count(s0."x") OVER (PARTITION BY s0."x" ORDER BY s0."y") FROM "schema" AS s0
partition = partition_by([r], r.x) |> order_by([r], r.y)
Schema |> select([r], count(r.x) |> over(^partition))

MySQL has also supported window functions since 8.0, but we don't know the version at compile time. So I can add the same implementation, which will raise runtime errors if a user is on MySQL <8, or I can add compile-time errors like "… is not supported by MySQL".

If you are ok with this feature, I will add documentation and some more unit tests.

  • traverse of :windows
  • inspect
  • MySQL
  • tests
  • documentation
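
For readers less familiar with window functions, here is a minimal plain-Elixir sketch of what `row_number() OVER (PARTITION BY x ORDER BY y)` computes. It does not use Ecto; `WindowSketch` and its function are hypothetical names introduced only for illustration.

```elixir
defmodule WindowSketch do
  # Hypothetical helper, not part of Ecto. Emulates
  #   row_number() OVER (PARTITION BY <partition_key> ORDER BY <order_key>)
  # on a list of maps, keeping the rows in their input order.
  def row_number(rows, partition_key, order_key) do
    numbers =
      rows
      |> Enum.with_index()
      |> Enum.group_by(fn {row, _idx} -> Map.fetch!(row, partition_key) end)
      |> Enum.flat_map(fn {_partition, members} ->
        members
        |> Enum.sort_by(fn {row, _idx} -> Map.fetch!(row, order_key) end)
        |> Enum.with_index(1)
        |> Enum.map(fn {{_row, idx}, n} -> {idx, n} end)
      end)
      |> Map.new()

    rows
    |> Enum.with_index()
    |> Enum.map(fn {row, idx} -> Map.put(row, :row_number, Map.fetch!(numbers, idx)) end)
  end
end

rows = [%{x: "a", y: 30}, %{x: "b", y: 10}, %{x: "a", y: 20}]

rows
|> WindowSketch.row_number(:x, :y)
|> Enum.map(& &1.row_number)
|> IO.inspect()
# prints [2, 1, 1]
```

Unlike GROUP BY, every input row is kept; each row only gains a value computed over its own partition.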

Member

josevalim commented Jul 13, 2018

This looks amazing! We are currently focused on getting Elixir v1.7 out but we plan to review this in a week or two. Thanks for this great work!

> MySQL has also supported window functions since 8.0, but we don't know the version at compile time. So I can add the same implementation, which will raise runtime errors if a user is on MySQL <8, or I can add compile-time errors like "… is not supported by MySQL".

For MySQL, we should emit the correct SQL and then it will fail if they are using an old version.

Contributor

zachdaniel commented Jul 13, 2018

I want to go over this a little more in depth, but this all looks pretty great! Very excited about this feature. (Also hi everyone! I still read all the issues, I'm just constantly busy)

Member

josevalim commented Jul 14, 2018

Hi @Anber! I did some review of the codebase and of the feature and it looks great! The over/2 feature is excellent.

I have only two comments for now. You added this example:

partition = partition_by([r], r.x) |> order_by([r], r.y)
Schema |> select([r], count(r.x) |> over(^partition))

This feature (interpolation) is actually really hard to implement. That's because if order_by has any interpolation inside itself, we need to surface those parameters to the outer select in a separate pass. This will require changes in the planner, inspect, and other places around Ecto.

However, we do have another feature that is very similar: the window function. Both the interpolation syntax that you propose and the Window function have the same purpose, which is to avoid duplication and allow some sort of composition:

> The purpose of a WINDOW clause is to specify the behavior of window functions appearing in the query's SELECT List or ORDER BY Clause. These functions can reference the WINDOW clause entries by name in their OVER clauses. A WINDOW clause entry does not have to be referenced anywhere, however; if it is not used in the query it is simply ignored. It is possible to use window functions without any WINDOW clause at all, since a window function call can specify its window definition directly in its OVER clause. However, the WINDOW clause saves typing when the same window definition is needed for more than one window function.

My suggestion is the following: remove partition_by as a function in Ecto.Query because making it work as an interpolation under all scenarios will be extremely complex and ultimately undesired.

Contributor

Anber commented Jul 14, 2018

@josevalim,
I hope I've fixed all the mentioned issues, but maybe I didn't catch your idea about partition_by. I've removed it from Ecto.Query, but this syntax is still available:

fields = [:y, :z]
Schema |> select([r], count(r.x) |> over(partition_by(r.x, order_by: ^fields)))

Member

josevalim commented Jul 14, 2018

Hi @Anber! I have done a more careful review now. They are mostly minor details but there is one big issue: you have not changed Ecto.Query.Planner to traverse :windows. The main consequence of not doing this is that any binding inside :windows won't be considered. So I think you should write some queries that have parameters inside the window, maybe something silly in the order_by field, such as order_by: [desc: ^param].

In order to make this work, there are a couple of things you need to keep in mind:

  1. You need to make sure that you are consistent in the order things are processed. So if you change the planner to traverse windows immediately after from, then the drivers need to do it exactly after from too. See traverse_exprs in the planner.

  2. I am afraid the fact you are using order_bys inside PartitionByExpr is going to be more trouble than help. That's because order_bys will have their own bindings and you need to make sure they are traversed and changed accordingly. It is probably better if PartitionByExpr simply calls OrderBy.escape to do the escape work but you keep them as regular ASTs inside PartitionByExpr and not as full-blown Ecto.Query.QueryExpr.

  3. Btw, regarding the point above, you will also need to change PartitionByExpr.build to receive params and return params, as you will no longer be able to keep params in the inner order_by expression. This will also require you to change Ecto.Query.Builder.Windows to have an Ecto.Query.QueryExpr at the root. I would recommend taking a look at Ecto.Query.Builder.OrderBy.

Although I wrote them as 1 -> 2 -> 3, probably the best order of tackling them is 3 -> 2 -> 1, as you need 3 to build 2 and so on.

I am really excited about this! If you have any questions, please let me know!
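
A minimal plain-Elixir sketch of the ordering concern in point 1. This is not Ecto's planner or adapter code; `ParamOrderSketch`, its clause format, and the `?` placeholder style are assumptions made for illustration. The idea is that parameters are bound by position, so the side that collects them and the side that emits the SQL must walk the clauses in the same order.

```elixir
defmodule ParamOrderSketch do
  # Hypothetical model: each clause is {sql_fragment, params}, and every
  # "?" in the emitted SQL consumes the next value of the flat param list.

  # Planner side: flatten the params by walking the clauses in `order`.
  def collect_params(clauses, order) do
    Enum.flat_map(order, fn name -> clauses |> Keyword.fetch!(name) |> elem(1) end)
  end

  # Adapter side: emit the SQL text by walking the clauses in `order`.
  def render(clauses, order) do
    Enum.map_join(order, " ", fn name -> clauses |> Keyword.fetch!(name) |> elem(0) end)
  end

  # Driver side: substitute the params positionally, left to right.
  def bind(sql, params) do
    [head | tail] = String.split(sql, "?")
    head <> Enum.map_join(Enum.zip(params, tail), fn {param, rest} -> inspect(param) <> rest end)
  end
end

clauses = [
  where: {"WHERE s0.x = ?", ["foo"]},
  windows: {"WINDOW w AS (ORDER BY s0.a + ?)", [1]}
]

sql = ParamOrderSketch.render(clauses, [:where, :windows])

# Same traversal order on both sides: every "?" receives its own value.
IO.puts(ParamOrderSketch.bind(sql, ParamOrderSketch.collect_params(clauses, [:where, :windows])))
# prints: WHERE s0.x = "foo" WINDOW w AS (ORDER BY s0.a + 1)

# Different traversal order: the values end up swapped.
IO.puts(ParamOrderSketch.bind(sql, ParamOrderSketch.collect_params(clauses, [:windows, :where])))
# prints: WHERE s0.x = 1 WINDOW w AS (ORDER BY s0.a + "foo")
```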

Member

josevalim commented Jul 14, 2018

I have updated the list above. :)

Contributor

Anber commented Jul 15, 2018

It turned out to be a bit more complicated than I thought :)
I have added the checklist and I will try to finish it in several days.

Contributor

Anber commented Jul 18, 2018

Hi @josevalim!

> They are mostly minor details but there is one big issue

I hope I have fixed it, but I don't like how I escape windows. Typically, macros are injected into the AST at the escape stage and expanded to %QueryExpr{} at later stages, but I don't have a partition_by macro, so I just return %QueryExpr{} from escape_window. It works, but I can't test it cleanly in builder_test.exs.

Member

josevalim commented Jul 18, 2018

@Anber looks great to me! Now we need to go ahead with MySQL support and docs. :)

@Anber Anber changed the title from OVER,PARTITION BY and WINDOW to OVER, PARTITION BY and WINDOW Jul 19, 2018

Contributor

Anber commented Jul 19, 2018

Done!

Contributor

Anber commented Jul 20, 2018

Hi @josevalim
Now we have the ability to fix the preloader behavior mentioned in the note here, but it would be a breaking change. May I fix it?

Member

josevalim commented Jul 20, 2018

@Anber shall we discuss it in another issue? I also think this behaviour may have been fixed on master.

Btw, we should merge this PR next week, thanks! :)

Contributor

Anber commented Jul 20, 2018

@josevalim It definitely will be another issue and another PR :)

Contributor

Anber commented Jul 22, 2018

Hi @josevalim
When you said that we should merge this PR, did you mean that I forgot to do something? :)

Member

josevalim commented Jul 22, 2018

@Anber it is blocking on us right now. No worries. :)

hauleth commented Jul 23, 2018

@Anber damn you. This is an awesome PR, and it obsoletes all the work I have done on supporting window functions via fragments. Now I need to focus on adding grouping sets to Ecto.

Contributor

Anber commented Jul 23, 2018

@hauleth, we also tried to use fragments, but then we faced this issue.

Contributor

Anber commented Jul 23, 2018

@hauleth btw, I haven't implemented OVER (ORDER BY …) and OVER (RANGE …) yet. This PR became too big, so I decided to do it in another one.

Contributor

Anber commented Aug 16, 2018

Hi @josevalim!
Do you have any news? :)

Member

josevalim commented Sep 25, 2018

I have started working on this on a branch and it should be merged later today. :)

{expr_cache, {params, cacheable?}} =
  Enum.map_reduce exprs, {params, true}, fn {_, expr}, {params, cacheable?} ->
    {params, current_cacheable?} = cast_and_merge_params(:windows, query, expr, params, adapter)
    {expr_to_cache(expr), {params, cacheable? and current_cacheable?}}

josevalim Sep 25, 2018

Member

Key must be part of the cache.
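
A minimal plain-Elixir sketch of why the window name matters for the cache entry. `WindowCacheSketch` and its tuple-based expressions are hypothetical stand-ins, not the planner's real data structures: two window lists with the same definitions under swapped names look identical if the key is dropped, even though `OVER w1` would mean something different in each query.

```elixir
defmodule WindowCacheSketch do
  # Hypothetical stand-in for the planner's expr_to_cache/1.
  defp expr_to_cache(expr), do: expr

  # Drops the window name, as the `{_, expr}` pattern under review does.
  def cache_without_key(windows) do
    Enum.map(windows, fn {_name, expr} -> expr_to_cache(expr) end)
  end

  # Keeps the window name next to the cached expression.
  def cache_with_key(windows) do
    Enum.map(windows, fn {name, expr} -> {name, expr_to_cache(expr)} end)
  end
end

a = [w1: {:partition_by, :x}, w2: {:partition_by, :y}]
b = [w2: {:partition_by, :x}, w1: {:partition_by, :y}]

IO.inspect(WindowCacheSketch.cache_without_key(a) == WindowCacheSketch.cache_without_key(b))
# prints true: the two different queries would share a cache entry

IO.inspect(WindowCacheSketch.cache_with_key(a) == WindowCacheSketch.cache_with_key(b))
# prints false: the names keep the entries apart
```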

@@ -1023,6 +1049,29 @@ defmodule Ecto.Query.Planner do
{type, [expr | fields], from}
end
# OVER ()
defp collect_fields({:over, _, [call, nil]} = expr, fields, from, query, take) do

josevalim Sep 25, 2018

Member

We should probably make this arity 1.

escape(quote(do: lag(:a, 1)), [], __ENV__)
end
assert_raise Ecto.Query.CompileError, ~r"window function lag/0 is undefined.", fn ->

josevalim Sep 25, 2018

Member

Remove trailing dot.

assert all(query) == ~s{SELECT count(s0."x") OVER "w2" FROM "schema" AS s0 WINDOW "w1" AS (PARTITION BY s0."a"), "w2" AS (PARTITION BY s0."x" ORDER BY s0."a", s0."b" DESC)}
end
test "count over unknown window" do

josevalim Sep 25, 2018

Member

This should be moved to the planner suite.

assert all(query) == ~s{SELECT count(s0.`x`) OVER `w2` FROM `schema` AS s0 WINDOW `w1` AS (PARTITION BY s0.`a`), `w2` AS (PARTITION BY s0.`x` ORDER BY s0.`a`, s0.`b` DESC)}
end
test "count over unknown window" do

josevalim Sep 25, 2018

Member

This should be moved to the planner suite.

Member

josevalim commented Sep 25, 2018

Thank you @Anber! I have added some notes but I can tackle them locally as they are minor. Amazing work overall!

@josevalim josevalim merged commit 00dbf3e into elixir-ecto:master Sep 25, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

Member

josevalim commented Sep 25, 2018

❤️ 💚 💙 💛 💜

Member

josevalim commented Sep 25, 2018

Hi @Anber! I have noticed that SQL databases allow you to specify order_by without a partition_by, so for consistency, I have converted all of them into keyword lists. So now you specify:

windows: [w: [partition_by: table.x, order_by: table.y]]

This looks very nice with over:

over(avg(table.salary), partition_by: table.x, order_by: table.y)

So if you want to go ahead and add support for ranges, the infrastructure is there and we can start this discussion. The only thing left from this PR is to allow dynamic fields in partition_by and improve the coverage in the integration tests.
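
A minimal plain-Elixir sketch of why a keyword list fits here: each part of the window definition becomes optional, so ORDER BY can appear without PARTITION BY. `WindowSqlSketch` is a hypothetical renderer written for illustration, not Ecto's adapter code, and it takes bare atoms as column names.

```elixir
defmodule WindowSqlSketch do
  # Hypothetical renderer: turns a keyword-list window definition into
  # the text that would sit inside OVER (...) or WINDOW w AS (...).
  def render(opts) do
    [
      clause("PARTITION BY", Keyword.get(opts, :partition_by, [])),
      clause("ORDER BY", Keyword.get(opts, :order_by, []))
    ]
    |> Enum.reject(&is_nil/1)
    |> Enum.join(" ")
  end

  # A missing or empty option contributes nothing to the output.
  defp clause(_keyword, []), do: nil

  # Accepts a single column or a list of columns.
  defp clause(keyword, columns) do
    keyword <> " " <> Enum.map_join(List.wrap(columns), ", ", &to_string/1)
  end
end

IO.puts(WindowSqlSketch.render(partition_by: :x, order_by: [:a, :b]))
# prints: PARTITION BY x ORDER BY a, b

IO.puts(WindowSqlSketch.render(order_by: :y))
# prints: ORDER BY y
```

With a positional partition_by argument, the second call would have needed a separate syntax; with keywords, both cases share one shape.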

Contributor

Anber commented Sep 26, 2018

Hi @josevalim!

It looks like you've rewritten almost my whole commit :)
For bare ORDER BY I planned to use syntax like over(order_by(…)) but keyword list looks more consistent.

So, now I need some time to move our application from the fork to master. It will take a few hours, and if everything is OK, I will start implementing ranges.

@Anber Anber deleted the Anber:window_functions branch Oct 16, 2018
