Common Table Expression support added "out-of-the-box" #37944
@kaspth not sure how I could miss that. I have also pushed a fix for update_all/delete_all in my branch, and it is missing recursive for now (to keep the PR simple). @vlado would you mind if I port some of your changes into my PR, mark you as an author and continue there? You have really nice documentation and I can also port some tests.
Hi @simi and @kaspth, thanks for the feedback. What a coincidence that we worked on the same thing at the same time :) I forgot to mention in the PR description that I already added support for recursive expressions. I agree that we co-author this, but I think it is easier to finish it here. From what I see the only thing missing is @kaspth What do you think?
@vlado the recursive API looks a little weird to me. I don't think there's any other AR relation method which uses an "argument as a modifier of behaviour". Therefore I decided in my PR not to introduce it for now, so as not to block the basic implementation, and to open another PR later once the first one gets merged. Comparing our code changes I have spotted these:
Finally if you can prepare failing specs for |
Hi @simi, thanks for the constructive feedback. Replies are below.
Hm, you could be right here. I also can not think of any AR relation method at this moment. I initially started with
Looks like a good idea 👍
Arrays are still supported but only if all items are Regarding Hash changes, I think there was an issue with I never used https://github.com/DavyJonesLocker/postgres_ext or https://github.com/kmurph73/ctes_in_my_pg in any of my apps (I was using my own patch) but I used them as inspiration when starting this PR.
Sure, I'll add failing tests. |
Post.with(
  posts_with_comments: Post.where("comments_count > ?", 0),
  posts_with_tags: Post.where("tags_count > ?", 0)
)
I'm not too keen on this API. Seems verbose, yet still fairly low level. I haven't used CTEs before but just realized a case where I could, what's the use cases you've needed them for? And how do you act on the values? I'm guessing What I'm saying is since Rails is an extracted, not built, framework, I'd prefer to see full production samples (with renames or some business logic stripped if needed). It seems like
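For readers who have not used CTEs, here is a rough sketch of the SQL shape that a hash-style call like the one above is meant to produce. The `with_clause` helper is hypothetical and written in plain Ruby; it is not the PR's implementation and it skips identifier quoting for brevity.

```ruby
# Hypothetical helper, NOT part of Active Record: turns a hash of
# CTE name => SQL string into a WITH clause.
# Names are interpolated unquoted here only to keep the sketch short.
def with_clause(ctes)
  parts = ctes.map { |name, sql| "#{name} AS (#{sql})" }
  "WITH #{parts.join(', ')}"
end

sql = with_clause(
  posts_with_comments: "SELECT * FROM posts WHERE comments_count > 0",
  posts_with_tags: "SELECT * FROM posts WHERE tags_count > 0"
)
puts sql
```

Each hash key becomes the name of a temporary result set that the main query can then reference in `FROM` or `JOIN`.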
@kaspth CTEs are super useful since they work as a "temporary table". Our codebase is full of CTEs and this will make it much nicer, since we usually need to fall back to executing an SQL string written in a heredoc. Queries using CTEs are also often more readable. This will allow us to compose those queries from AR scopes instead of messing with SQL strings (we usually interpolate the SQL string with relation.to_sql). Yes, I agree all examples in this PR (and in mine) are really simple and not a real reason to use a CTE. I think there are articles around describing CTEs and how/when to use them. On the other hand, the Rails Active Record tests are full of examples that do not really make sense but do a great job of testing the feature. I understand this is not a well-known feature of SQL (mainly for beginners). I learned about it just a few months ago. Introducing this feature with a nice and friendly guide explaining the purpose of this Active Record API would be a great service to help people understand CTEs.
Just one quick example I remember: we need to group and filter data and then join them with a belongs_to relation. I'll try to prepare some examples with the current code and how it will work with the current pull request, to show you some real examples extracted from a real application. ℹ️ This is also super useful in combination with the recently introduced bulk upserts and RETURNING (which has not made it into the Rails codebase yet).
Great questions @kaspth and you are guessing right :) I'll try to explain with one example.
Example
At BetterDoc we have a 60+ lines SQL query that we use to dynamically calculate clinic rank based on some params (medical problem, geo location and radius, ...).
Domain
We have 2 models in our domain.
class Clinic
  has_many :case_numbers
end
class CaseNumber
  belongs_to :clinic
end
SQL Query
Here is a full SQL that is explained below.
WITH relevant_clinics AS (
SELECT clinics.id, clinics.name, SUM(case_numbers.count) AS cases_count
FROM clinics INNER JOIN case_numbers ON case_numbers.clinic_id = clinics.id
WHERE case_numbers.code ILIKE 'F4%'
GROUP BY clinics.id
),
stats AS (
SELECT
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY cases_count DESC) AS quartile_2,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY cases_count DESC) AS quartile_1
FROM relevant_clinics
)
SELECT name, cases_count,
CASE
WHEN cases_count > (SELECT quartile_2 FROM stats) THEN 'High'
WHEN cases_count > (SELECT quartile_1 FROM stats) THEN 'Average'
ELSE 'Low'
END AS rank
FROM relevant_clinics
ORDER BY cases_count DESC
First step is to get clinics that match provided medical topic and store them as temporary result set (
How to organise this in Rails
Since ActiveRecord does not have support for CTE we started thinking how to organise this in our app. We started by putting SQL into heredoc and executing it with Next try was with In the end I wrote a monkey patch that adds
class Clinic < ApplicationRecord
  has_many :case_numbers
  scope :matches_medical_topic, -> (q) { joins(:case_numbers).where("case_numbers.code ILIKE ?", "#{q}%") }
  def self.search(q)
    relevant_clinics = matches_medical_topic(q)
      .select("clinics.id, clinics.name, sum(case_numbers.count) AS cases_count")
      .group("clinics.id")
    stats = select(
      "PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY cases_count DESC) AS quartile_2",
      "PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY cases_count DESC) AS quartile_1"
    )
      .from("relevant_clinics")
    with(relevant_clinics: relevant_clinics, stats: stats)
      .from("relevant_clinics AS clinics")
      .order("cases_count DESC")
  end
  def self.with_rank
    select(:id, :name, :cases_count)
      .select("CASE WHEN cases_count > (SELECT quartile_2 FROM stats) THEN 'High' WHEN cases_count > (SELECT quartile_1 FROM stats) THEN 'Average' ELSE 'Low' END as rank")
  end
end
and use it like:
Clinic.search("f4").with_rank # => ActiveRecord::Relation
Clinic.search("f4").count # => Integer
Since there are gems that do the same thing I assumed that other people would like to see this in Rails, so I started this PR.
Conclusion
You can see from my example that I used I absolutely agree that this is a bit too verbose and low level but I haven't been able to figure out a better API yet. I'm eager to hear your ideas and to make it more "geared" :) Maybe the way to go would be to have something similar to associations and/or scopes that once defined will abstract everything for the user. Similar to the way
class Post
  has_many :case_numbers
  common_table :relevant_clinics, -> (q) { joins(:case_numbers).where("code = ?", q) }
  common_table :stats, -> { select(...).from(:relevant_clinics) }
end
Post.with(:relevant_clinics, :stats)
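For readers less familiar with `PERCENTILE_CONT`, the ranking logic of the SQL query earlier in this comment can be approximated in plain Ruby. This is only a sketch: `percentile_cont_desc` and `rank_for` are hypothetical helpers, the sample counts are made up, and the interpolation follows the commonly documented definition (row position `fraction * (n - 1)` over the descending-sorted values) rather than any code from this PR. It assumes a non-empty list.

```ruby
# Sketch of PERCENTILE_CONT(fraction) WITHIN GROUP (ORDER BY x DESC):
# sort descending, find the fractional row position, and linearly
# interpolate between the two neighbouring values.
def percentile_cont_desc(values, fraction)
  sorted = values.sort.reverse
  position = fraction * (sorted.size - 1) # zero-based, may be fractional
  lower = sorted[position.floor]
  upper = sorted[position.ceil]
  lower + (upper - lower) * (position - position.floor)
end

# Mirrors the CASE expression of the final SELECT.
def rank_for(cases_count, quartile_2, quartile_1)
  if cases_count > quartile_2
    "High"
  elsif cases_count > quartile_1
    "Average"
  else
    "Low"
  end
end

counts = [20, 100, 60, 80, 40] # made-up cases_count values
quartile_2 = percentile_cont_desc(counts, 0.25)
quartile_1 = percentile_cont_desc(counts, 0.50)
ranks = counts.sort.reverse.map { |c| [c, rank_for(c, quartile_2, quartile_1)] }
ranks.each { |count, rank| puts "#{count}: #{rank}" }
```

In the SQL version the `stats` CTE computes the two thresholds once, and the outer query reads them through scalar subqueries, which is what makes the two-step structure a natural fit for CTEs.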
I see now that I forgot about UNION example. You can find one in tests here https://github.com/rails/rails/pull/37944/files#diff-25e9a621b650cd62a407c4ce7a117b20R77-R86 Not a real life example but it is getting late here in Croatia :) |
@vlado thanks for very detailed info. Our usecase and story is really similar, but in the end I have used https://github.com/kmurph73/ctes_in_my_pg as a @kaspth my idea was to introduce this I think demand for Few examples including which strategy is used to integrate CTE to ActiveRecord (just selected from first search page):
PS: I was thinking about gemifying this as well in some general way (not supporting only PostgreSQL), but monkeypatching of private methods from ActiveRecord and Arel would be needed. That was the main motivation to send my pull request to integrate this into core. |
I found out about
Totally agree. We have been using CTEs with Rails for more than 4 years now and it has been a pain from the beginning. I have also given talks about this topic (organising complex SQL queries in Rails) a few times, and from the feedback regarding CTEs I was actually surprised that this was not implemented yet, so I decided to do it properly this time :)
That's pretty much the opposite of what I'm curious to explore here or how I'd proceed. I'd rather we have as many real world cases to base this off of and extract the commonalities. Then if that looks like something that's worth it, we can consider adding that and some lower level constructs. As it is I don't think there's enough meat here to add For instance the
Sure, unfortunately that doesn't necessarily mean that we want to maintain something in Active Record directly. Then we'd be looking at APIs for gems instead, of which the current proposed @vlado I'm curious about your usage of Another thing that's tough for me, although I like the |
@kaspth just try to take a look at this feature from different perspective.
https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL#Common_table_expression My motivation to support Sometimes you can avoid
Yes, but I think this is something you need to handle carefully already if you use more
Supporting this in gem is really hard since
CTEs are definitely not useful only for one-off queries. We use them for regular queries for the various reasons already mentioned before. If you would prefer not to continue with this review, I would like to thank you anyway for the time you have already spent on it. I can try to bring some attention here to get someone else to review, if this feature is not actually finally rejected by that decision. PS: Originally I forgot to mention that it is probably possible (I'm not sure if that will be possible for all adapters) to fix some problems using
@kaspth I'm aliasing relevant_clinics as clinics here to be able to easily chain and call other query methods later. For example to do something like:
Clinic.search("f4").with_rank.where(id: [1, 2, 3])
# WITH relevant_clinics AS (...) SELECT ... FROM relevant_clinics clinics WHERE clinics.id IN (1,2,3)
As mentioned you could also use join
Agreed, I also did not like it but it was good enough :) I really appreciate your participation in this PR and would not want you to quit. You asked great questions and I like your thinking. I would say that the biggest challenge (at least it was for us) is that you have expression(s) that build your temporary table(s) and then you have multiple ways you want to use them (select without rank, select with rank, aggregate, call some additional where, ...), which makes it hard to put all these things together in one abstraction. That is why I proposed the That being said I must also agree with @simi that just adding PS. I'm sorry for the late reply. I was more focused on family stuff for the holidays ☃️
hash.map do |name, value|
  expression =
    case value
    when Arel::Nodes::SqlLiteral then Arel.sql("(#{value})")
Use a Grouping node to add parens, don't re-create the SqlLiteral by modifying its string form.
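The difference between the two approaches can be sketched with stand-in node classes. `Literal`, `Grouping`, and `to_sql` below are simplified illustrations written for this note; they are not Arel's classes or its visitor, only a model of the idea that parentheses belong to a wrapping node rather than to a rewritten string.

```ruby
# Stand-in node classes for illustration only; these are NOT Arel's classes.
Literal  = Struct.new(:sql)
Grouping = Struct.new(:expr)

# Minimal "visitor": parentheses are emitted when a Grouping is rendered,
# so the wrapped literal is never modified.
def to_sql(node)
  case node
  when Literal  then node.sql
  when Grouping then "(#{to_sql(node.expr)})"
  else raise ArgumentError, "Unsupported node: #{node.class}"
  end
end

literal = Literal.new("SELECT 1")
wrapped = Grouping.new(literal)

puts to_sql(wrapped) # (SELECT 1)
puts literal.sql     # SELECT 1
```

Keeping the original node intact means later code can still inspect or reuse it, which is lost once the parentheses are baked into a new string.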
    else
      raise ArgumentError, "Unsupported argument type: `#{value}` #{value.class}"
    end
  Arel::Nodes::TableAlias.new(expression, Arel.sql(name.to_s))
I don't know the specific node this should be off the top of my head, but it definitely should not inject the hash key as raw SQL; `name` should end up table/column quoted.
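To illustrate the concern, here is a plain-Ruby sketch of identifier quoting. `quote_identifier` and `cte_definition` are hypothetical helpers, and the double-quote style is an assumption that matches PostgreSQL and standard SQL (MySQL, for example, uses backticks by default); a real implementation would delegate to the database adapter's own quoting.

```ruby
# Hypothetical helper: wrap an identifier in double quotes and double any
# embedded double quote, so the name cannot terminate the identifier early.
def quote_identifier(name)
  '"' + name.to_s.gsub('"', '""') + '"'
end

# Builds one CTE definition with a quoted name instead of raw interpolation.
def cte_definition(name, sql)
  "#{quote_identifier(name)} AS (#{sql})"
end

puts cte_definition(:posts_with_tags, "SELECT 1")
puts cte_definition('evil" AS (SELECT 2) --', "SELECT 1")
```

In the second call the hostile key stays inside a single quoted identifier instead of being interpreted as SQL.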
with_statements = with_values.map do |with_value|
  case with_value
  when Arel::Nodes::As, Arel::Nodes::TableAlias then with_value
I don't mind keeping this if e.g. your planned use case really works a lot better if it's allowed, but I don't think we should retain undocumented extra argument options just because it might be internally useful later: if it is, we can add it then; adding it early makes life harder later because if it exists, someone will call it.
Agreed, will remove it.
Removed in 464d5ac
Ah forgot to mention, you'll have to squash all these 24 commits into a single one.
Sure. Just addressed feedback from @matthewd and will squash if all is ok.
@casperisfine Squashed and tests are green.
Also the CI failures look legit now. |
Construct common table expressions with ease and get `ActiveRecord::Relation` back.
Thank you @vlado. Feel free to open a second PR for recursive support. |
Upstream PR: rails/rails#37944
Just an unsolicited comment on this thread for posterity on the syntax and need for recursive CTEs.
…iveRecordAllMethod` Follow up rails/rails#44446, rails/rails#37944, rails/rails#46503, and rails/rails#47010. This PR supports some Rails 7.1's new querying methods for `Rails/RedundantActiveRecordAllMethod`.
Support for the `.with` query method was added in rails#37944
The hash based syntax added in that PR supports Relations, Arel SelectManagers, and SqlLiterals as expressions, but only allows simple names for the CTEs. In some circumstances, it's useful to be able to name the columns of the CTE as well as the table. For example, if the expression is a ValuesList:
WITH magic(a, b, c) AS (VALUES (2, 6, 7), (9, 5, 1), (4, 3, 8)) SELECT a + b + c FROM magic;
... we need to be able to name the columns in order to reference them later.
One use case for having this supported in ActiveRecord's with query method is joining the results from a previous query to a subsequent one. For example, if you were walking through an org chart, but you couldn't use recursive CTEs, you could use this approach to keep information about the management path:
big_bosses = Employee.where(manager: nil).pluck(:id)
first_level = Employee.with(Arel::Nodes::Cte.new(
    Arel.sql("bosses(id)"),
    Arel::Nodes::Grouping.new(Arel::Nodes::ValuesList.new(big_bosses))
  ))
  .joins("INNER JOIN employees ON employees.manager = bosses.id")
  .pluck("employees.id", "bosses.id")
second_level = Employee.with(Arel::Nodes::Cte.new(
    Arel.sql("bosses(id, grand_boss_id)"),
    Arel::Nodes::Grouping.new(Arel::Nodes::ValuesList.new(first_level))
  ))
  .joins("INNER JOIN employees ON employees.manager = bosses.id")
  .pluck("employees.id", "bosses.id", "bosses.grand_boss_id")
... and so on ...
(not tested)
I'm not sure about this use case particularly, but I imagine there are a few other situations where it's useful to have more control over the name of the CTE than we currently have.
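The shape of a CTE with named columns over a VALUES list can be sketched in plain Ruby. `values_cte` is a hypothetical string-building helper written for illustration; it assumes trusted names and integer-only rows, performs no quoting or escaping, and is not how Arel renders these nodes.

```ruby
# Hypothetical helper: builds "WITH name(col, ...) AS (VALUES (...), ...)".
# Assumes trusted names and integer values only; no quoting is performed.
def values_cte(name, columns, rows)
  tuples = rows.map { |row| "(#{row.join(', ')})" }.join(", ")
  "WITH #{name}(#{columns.join(', ')}) AS (VALUES #{tuples})"
end

cte = values_cte("magic", %w[a b c], [[2, 6, 7], [9, 5, 1], [4, 3, 8]])
sql = "#{cte} SELECT a + b + c FROM magic;"
puts sql
```

Without the `(a, b, c)` column list the outer query would have no portable way to refer to the individual values, which is the gap the commit message describes.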
Summary
This PR adds `.with` query method which makes it super easy to build complex queries with Common Table Expressions. It basically just "wraps" the passed arguments in the way `Arel::SelectManager.with` expects it. The biggest advantages this brings are:
`Arel::Nodes::As` nodes
`ActiveRecord::Relation` so you don't lose any flexibility
See example below: