Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashing query: Incorrectly capturing variables #19

Closed
tomjaguarpaw opened this issue May 16, 2015 · 21 comments
Closed

Crashing query: Incorrectly capturing variables #19

tomjaguarpaw opened this issue May 16, 2015 · 21 comments

Comments

@tomjaguarpaw
Copy link

SQL does not allow the right-hand clause of an INNER JOIN to reference columns defined in the left-
hand clause. However, relational-record does not enforce this restriction in the type system. It is trivial to create a well-typed yet crashing query:

import Data.Int (Int32)
import Database.Relational.Query

zero :: Relation () (Int32)
zero = relation $ return (value 0)

captureVariable :: Projection Flat Int32 -> Relation () Int32
captureVariable i = relation $ do
  z <- query zero
  wheres (i .=. z)
  return z

foo :: Relation () (Int32, Int32)
foo = relation $ do
  i <- query zero
  j <- query (captureVariable i)

  return ((,) |$| i ! id'
              |*| j ! id')

main :: IO ()
main = putStrLn $ show foo ++ ";"

The generated SQL is

SELECT ALL T0.f0 AS f0, T2.f0 AS f1 FROM (SELECT ALL 0 AS f0) T0
INNER JOIN (SELECT ALL T1.f0 AS f0 FROM (SELECT ALL 0 AS f0) T1
WHERE (T0.f0 = T1.f0)) T2 ON (0=0);

and Postgres reports

ERROR:  invalid reference to FROM-clause entry for table "t0"
LINE 1: ...L T1.f0 AS f0 FROM (SELECT ALL 0 AS f0) T1 WHERE (T0.f0 = T1...
                                                             ^
HINT:  There is an entry for table "t0", but it cannot
be referenced from this part of the query.

NB this is why Opaleye uses a query arrow arrow instead of a query monad.

@ocharles
Copy link

I think a LATERAL join would fix this.

@tomjaguarpaw
Copy link
Author

@ocharles I agree.

@ocharles
Copy link

I can confirm that this query works:

SELECT ALL T0.f0 AS f0, T2.f0 AS f1 FROM (SELECT ALL 0 AS f0) T0
INNER JOIN LATERAL (SELECT ALL T1.f0 AS f0 FROM (SELECT ALL 0 AS f0) T1
WHERE (T0.f0 = T1.f0)) T2 ON (0=0);

@tomjaguarpaw
Copy link
Author

NB Not all flavours of SQL support something equivalent to LATERAL.

@mboes
Copy link

mboes commented May 17, 2015

OK. Does this mean a monadic interface can be given sensible semantics for the SQL+LATERAL dialect? Or are there other issues (to be filed as separate tickets)?

@tomjaguarpaw
Copy link
Author

As I understand it LATERAL allows a monadic interface with sensible semantics. Whether there are additional semantics bugs with relational-record I do not know. I doubt there would be additional crash bugs.

@tomjaguarpaw tomjaguarpaw changed the title Crashing query: Aggregation can incorrectly capture variables Crashing query: Incorrectly capturing variables May 17, 2015
@tomjaguarpaw
Copy link
Author

I just realised this has nothing to do with aggregation! I changed the title accordingly.

@kazu-yamamoto
Copy link
Collaborator

@tomjaguarpaw Would you tell me that how the arrow interface of Opaleye prevent this issue? So far, I understand that the arrow interface of Opaleye is equivalent to a state monad. I must misunderstand something.

@tomjaguarpaw
Copy link
Author

The arrow interface of Opaleye is not equivalent to a state monad. The implementation of Opaleye's QueryArr is a "state arrow" (analogous to a state monad) but only a small subset of the state arrow's functionality is exposed. The interface to Opaleye is much weaker than a state arrow, let alone a state monad.

The problem with relational-record in this particular case seems to be that you can apply relation to a query block that captures columns from outside. Its type is

relation :: QuerySimple (Projection Flat r) -> Relation () r

in the code

captureVariable i = relation $ do
    ...

relation has no way to prevent the i argument coming from a previous clause. The alternative is to make QuerySimple an arrow, so you statically know it has no input. That means that the type of relation would become

relation :: QuerySimple () (Projection Flat r) -> Relation () r

In order to achieve the desired effect you would have to limit yourself to only arrow operations on QuerySimple and probably Relation too. If you're lucky and arrange things suitably relation might generalise to

relation :: QuerySimple a (Projection Flat r) -> Relation a r

In Opaleye I believe we have the invariant that a value q :: Column a -> QueryArr (Column b) (Column c) can never be combined into a query whereby the input Column a can depend on the result of any previous clause (essentially I think it means its value is fixed at compile time). On the other hand, Column b can depend on the result of previous clauses. For example Column a might be something like 1 + 2 * 3 whereas Column b might be table1.foo + table2.foo * 3. In this way we can ensure that we use references to previous columns correctly.

I'm afraid I can't say much more about relational-record because I don't really understand the setup of Relation, QuerySimple etc., but I hope that gives you a bit of a clue. Additionally, I have never found a convincing proof that arrows really are required but all my investigations so far convince me that they do address this issue of non-LATERAL joins very precisely.

@ocharles
Copy link

Not to derail the conversation too much but I still think HOAS or something
can prevent that problem, but I'm not sure how it fits in with having a
monad at the same time.
On 17 May 2015 2:33 pm, "tomjaguarpaw" notifications@github.com wrote:

The arrow interface of Opaleye is not equivalent to a state monad. The
implementation of Opaleye's QueryArr is a "state arrow" (analogous to a
state monad) but only a small subset of the state arrow's functionality is
exposed. The interface to Opaleye is much weaker than a state arrow, let
alone a state monad.

The problem with relational-record in this particular case seems to be
that you can apply relation to a query block that captures columns from
outside. Its type is

relation :: QuerySimple (Projection Flat r) -> Relation () r

in the code

captureVariable i = relation $ do
...

relation has no way to prevent the i argument coming from a previous
clause. The alternative is to make QuerySimple an arrow, so you
statically know it has no input. That means that the type of relation
would become

relation :: QuerySimple () (Projection Flat r) -> Relation () r

In order to achieve the desired effect you would have to limit yourself to
only arrow operations on QuerySimple and probably Relation too. If you're
lucky and arrange things suitably relation might generalise to

relation :: QuerySimple a (Projection Flat r) -> Relation a r

In Opaleye I believe we have the invariant that a value q :: Column a ->
QueryArr (Column b) (Column c) can never be combined into a query whereby
the input Column a can depend on the result of any previous clause
(essentially I think it means its value is fixed at compile time). On the
other hand, Column b can depend on the result of previous clauses. For
example Column a might be something like 1 + 2 * 3 whereas Column b might
be table1.foo + table2.foo * 3. In this way we can ensure that we use
references to previous columns correctly.

I'm afraid I can't say much more about relational-record because I don't
really understand the setup of Relation, QuerySimple etc., but I hope
that gives you a bit of a clue. Additionally, I have never found a
convincing proof that arrows really are required but all my investigations
so far convince me that they do address this issue of non-LATERAL joins
very precisely.


Reply to this email directly or view it on GitHub
#19 (comment)
.

@tomjaguarpaw
Copy link
Author

tomjaguarpaw commented May 17, 2015 via email

@mboes
Copy link

mboes commented May 17, 2015 via email

@tomjaguarpaw
Copy link
Author

tomjaguarpaw commented May 17, 2015 via email

@khibino
Copy link
Owner

khibino commented May 18, 2015

When passing projections into a query-building monad do statement from
external scope, HRR generates a broken SELECT statement that references
the left-side sub-query. We have known about this problem for two years.
(2f60660#diff-032f2e99a15be37134b7b04df8eb79e2R85)

This example suggests that this problem occurs not only when projections
are passed as arguments, but also when nested do statements are used.

It is easy to implement an Arrow wrapper for HRR.
(https://gist.github.com/khibino/57405584b168d98fd1e8)

When using this wrapper, the Arrow expression scope (between <- and -<)
is separate, making previous arrow results out-of-scope. In this case,
the issue is avoided, like Opaleye, but the issue still occurs when
projections are passed to the domain-side of the Arrow.

In Opaleye, passing columns (which correspond to HRR projections) to the
domain-side of the Arrow does not cause this problem. The reason,
however, is not that Opaleye uses Arrows, but that the WHERE clauses of
sub-queries in join-products are combined into an outer WHERE clause.

HRR can express each ON clause and UNION like(UNION, EXCEPT, INTERSECT) operations,
so the WHERE clause of sub-queries are
not combined automatically. (The purpose of this design is to make the
resulting SQL statements easier to understand.)

I do not yet have a good idea for how to solve this issue.

@kazu-yamamoto
Copy link
Collaborator

To be fair, we would like to include the ON vs WHERE issue in our HRR paper. That is, SQL generated by HRR is simpler but its type-safety is weaker than that of Opaleye.

@khibino
Copy link
Owner

khibino commented May 19, 2015

It may be possible to lift up and to combine WHERE clauses of sub-query join-products into an outer WHERE clause. I think that implementing query transformation with keeping its semantics is interesting in future works.

@chrisdone
Copy link

Having used HaskellDB for a couple years I can add my 2¢ that I don't care about looking at the generated SQL as long as it's correct, so prioritizing readability over correctness is not an aim I'd go for.

@kazu-yamamoto
Copy link
Collaborator

But what about performance? I heard that complex SQL statements generated by HaskellDB are slow with MySQL.

http://pseudofish.com/haskelldb-performance.html

@chrisdone
Copy link

With MySQL, yes. I admit I don't care at all about anything other than PostgreSQL.

@kazu-yamamoto
Copy link
Collaborator

:-)

@khibino
Copy link
Owner

khibino commented Jun 25, 2015

@tomjaguarpaw Thanks for reporing this issue.

I added an arrow-combinator module 'Database.Relational.Query.Arrow' to import the idea of Opaleye!

Building queries using combinators imported from 'Database.Relational.Query.Arrow'
instead of 'Database.Relational.Query' controls injection of previous local projections.

Combinator implementations are involved in
https://github.com/khibino/haskell-relational-record/blob/72de038ef6c645feb92c9d889d7a6ece896a93d1/relational-query/src/Database/Relational/Query/Arrow.hs .
Examples are involved in test code
https://github.com/khibino/haskell-relational-record/blob/72de038ef6c645feb92c9d889d7a6ece896a93d1/relational-query/test/sqlsEqArrow.hs

I think using this new module makes relational-record safer on the same level with Opaleye.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants