Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support real life SQL #21

Closed
niner opened this issue Aug 21, 2018 · 6 comments
Closed

Support real life SQL #21

niner opened this issue Aug 21, 2018 · 6 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@niner
Copy link

niner commented Aug 21, 2018

A lesson taught by all existing attempts at creating ORMs (in any language) but one that has yet to be learned is that while most SQL statements are rather simple, there will be use cases that require some real SQL mastery and if the ORM does not allow for the latter, it's value is greatly diminished.

It seems like every ORM starts out with the simple stuff. How can we make fetching one row from a table simple? How do we allow for boolean expressions in the where clause? Let's make joining simple tables by id key trivial.

These (and a couple more) are the things that ORMs make really simple but then real users come along and ask for more and more complicated things which then get bolted on top of the simple mechanisms. And suddenly you reach the point, where writing the SQL query is the easy part, but getting your ORM to generate that query starts getting really hard or downright impossible. Then you end up with an application that's a crude mixture of ORM code and hand crafted SQL statements combining the disadvantages of both.

The conclusion is that for the API design one should start with the absolute worst and most complicated SQL statement imaginable and find an API that allows one to express this. This statement should at least contain:

  • Common Table Expressions (preferably recursive)
  • Window Functions
  • Correlated subqueries in the select clause
  • Subqueries in the from clause
  • (Nested) calls to SQL functions with calling conventions that do not like C (i.e. extract(hour from some_timestamp_column)
  • Subqueries as arguments to function calls
  • Unions
  • Type coercion ("foo::int" or "interval '1 day'")
  • Grouping and ordering
  • Special operators like #> (Get JSON object at specified path in PostgreSQL)
  • Conditionals (CASE foo WHEN bar THEN baz ELSE qux END)

I'm sure there are a couple more interesting aspects of SQL, but this should get you started. They are all features that are in use in our code base. The trick will be to make these possible while keeping the trivial cases simple.

To finish, here's an example of a real life query from our production code that DBIx::Class is completely unequipped to handle:

with product_active(product_id, active, changetime, active_end) as (
    select product_id,
        active,
        changetime,
        lead(product_active_history.changetime, 1, now()::timestamp without time zone)
            OVER (
                PARTITION BY product_active_history.product_id
                ORDER BY product_active_history.changetime
            )
            as active_end
    from product_active_history
        join products on product_active_history.product_id = products.id
        join customers on products.customer_id = customers.id
    where products.article_id = 62
        and country_id in (1, 2, 3, 4)
        and products.customer_id not in (885, 840, 246, 362, 3233, 6378)
    order by product_active_history.product_id, product_active_history.changetime
),
valuation_date as (
    select (generate_series('2000-01-01'::date, now(), interval '1 year'))::date as day
)
select *,
    100.0 * (cancelled - lag(cancelled) over (order by day)) / online as cancelled_percent,
    100.0 * (redesign - lag(redesign) over (order by day)) / online as redesign_percent
from (
    select
        day,
        count(nullif(product_active.active_end > valuation_date.day, false)) as finished,
        count(nullif(product_active.active_end > valuation_date.day and product_active.active, false)) as online,
        count(nullif(product_active.active = false and product_active.active_end > valuation_date.day, false)) as offline,
        count(nullif(product_active.active = false and not exists (
            select * from products successor where successor.predecessor_id = product_active.product_id
        ) and product_active.active_end > valuation_date.day, false)) as cancelled,
        count(nullif(
            product_active.active = false
                and exists (select * from products successor where successor.predecessor_id = product_active.product_id)
                and product_active.active_end > valuation_date.day,
            false)) as redesign
    from
        valuation_date
        join product_active on product_active.changetime <= valuation_date.day
    group by day
) as data
order by day
@FCO
Copy link
Owner

FCO commented Aug 21, 2018

I think if we find a generic and composable way to represent each SQL feature it will be possible to create large and complex query.

So, I’ll put it here some thoughts about some of the “special” features you used on your example. Not all at once. And please comment every one you’d like to!

@FCO
Copy link
Owner

FCO commented Aug 21, 2018

I don’t know if it is possible but the way I think to do something like this

WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;

Is with something like this

(Red.select({ 1 }), * + 1 ... * < 100).sum

@FCO FCO added help wanted Extra attention is needed question Further information is requested labels Aug 21, 2018
@FCO
Copy link
Owner

FCO commented Aug 21, 2018

WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
    SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
  UNION ALL
    SELECT p.sub_part, p.part, p.quantity
    FROM included_parts pr, parts p
    WHERE p.part = pr.sub_part
  )
SELECT sub_part, SUM(quantity) as total_quantity
FROM included_parts
GROUP BY sub_part

I know, it’s not complete... and I don’t know if it is possible (yet). But the way I’m thinking of taking is something like:

(IncludedParts.where({ .part eq our_product }), { .^where: .part == IncludedParts.sub-part } ... *).classify: *.sub-part, :reduce-field{ .quantity.sum }

@FCO
Copy link
Owner

FCO commented Aug 24, 2018

I was thinking about that and Im starting to disagree...
The way Im thinking now is that the Red user should be able to do with Red any thing it could with local data, what SQL would it generate isn't a problem. What do you guys think?
I mean: we should map language syntax to sql, and not sql tho language syntax...

@MattOates
Copy link
Contributor

MattOates commented Sep 3, 2018

Having recently spent a week fighting hand and fist against SQLAlchemy to get it to just produce a union query with selects from subqueries with aggregate and window functions, that took me ten minutes to actually write by hand. You want to be able to go from SQL into language syntax thats like SQL, as well as language syntax thats idiomatic to whatever crazy rendered SQL that can get the job done. Personally I am a fan of the escape hatch abstraction. Something that's very thinly abstracting the SQL so you avoid vendor lock in, but close enough I can translate almost directly. Might not have to be related to the ORM really, just work relatively nicely with SQL generation in concert with the ORM. So perhaps a subquery is from an ORM expression, the other is some vendor specific directly optimised by hand query close to raw SQL, and the two are nicely combined to be rendered together with a parent query like object referencing either as if it was more ORM like.

@FCO
Copy link
Owner

FCO commented Dec 19, 2018

I think Red is tracking a way to transform perl6 code to SQL, something like:https://github.com/FCO/Red/wiki/Better-map-grep and https://gist.github.com/FCO/87831dd190e1bb12adb138ce866acb73

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants