New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue when composing ecto queries #1284

Closed
iwarshak opened this Issue Feb 28, 2016 · 30 comments

Comments

Projects
None yet
@iwarshak
Copy link
Contributor

iwarshak commented Feb 28, 2016

Environment

  • Elixir version (elixir -v): 1.1.1
  • Database and version (PostgreSQL 9.4, MongoDB 3.2, etc.): Postgresql
  • Ecto version (mix deps): 1.1.4
  • Database adapter and version (mix deps): postgrex 0.11.1
  • Operating system: mac

I am trying to write a set of composable queries. I am running into an issue when composing 2 queries where each query joins a separate table, and the second query has a where condition.

# comment.ex

  schema "comments" do
    field :title, :string
    field :body, :string
    field :likes, :integer
    belongs_to :author, HelloPhoenix.Author
    belongs_to :post, HelloPhoenix.Post

    timestamps
  end

  def with_popular_post(query) do
    query
    |> join(:inner, [comment], p in assoc(comment, :post))
    |> where([comment, p],  p.likes == 4)
  end

  def for_author(query, author) do
    query
    |> join(:inner, [comment], a in assoc(comment, :author))
    |> where([comment, a],  a.id == ^author.id)
  end


Comment |> Comment.for_author(author) |> Comment.with_popular_post() |> Repo.all

     ** (Ecto.QueryError) web/models/comment.ex:21: field `HelloPhoenix.Author.likes` in `where` does not exist in the model source in query:

     from c in HelloPhoenix.Comment,
       join: a in HelloPhoenix.Author,
       on: a.id == c.author_id,
       join: p in HelloPhoenix.Post,
       on: p.id == c.post_id,
       where: a.id == ^28,
       where: a.likes == 4

It seems to me that the root of the issue is that there is another binding available after the first query runs, and the second query having no idea about it does the where on the incorrect binding.

This would seem to make it difficult to compose queries, since they would need to know of all the previously joined tables.

Any ideas?

Full source with failing test. https://github.com/iwarshak/ecto_compose_error

@parkerl

This comment has been minimized.

Copy link
Contributor

parkerl commented Feb 28, 2016

@iwarshak sorry you are having issues. We'd be happy to help you with this on the ecto slack channel or on irc.

@iwarshak

This comment has been minimized.

Copy link
Contributor

iwarshak commented Mar 2, 2016

Thanks @parkerl - @josevalim asked me to make this into an issue after discussing it in IRC.

@parkerl

This comment has been minimized.

Copy link
Contributor

parkerl commented Mar 3, 2016

@iwarshak my bad. This is definitely a tough problem. Did you come up with any ideas in your conversation with @josevalim ?

@josevalim

This comment has been minimized.

Copy link
Member

josevalim commented Mar 3, 2016

Thank you @iwarshak. To recap what we have discussed on IRC, the issue happens because when with_popular_post runs after for author, the associated post is now the third binding. For example, we would need to rewrite the query as:

  def with_popular_post(query) do
    query
    |> join(:inner, [comment], p in assoc(comment, :post))
    |> where([comment, _author, p],  p.likes == 4)
  end

However that obviously does not compose well if we always need to track the previous bindings. The only solution I have in mind so far is to allow ... to be specified in the bindings as a way to say "everything between". So we would rewrite the query to:

  def with_popular_post(query) do
    query
    |> join(:inner, [comment], p in assoc(comment, :post))
    |> where([comment, ..., p],  p.likes == 4)
  end

And p would always bind automatically regardless if author was joined before or not. This is still a bit surprising because folks won't know about such possibilities upfront but it would solve the problem cleanly.

Finally, it is worth noting though this issue does not happen when using keyword queries because we can track bindings more efficiently there.

@michalmuskala

This comment has been minimized.

Copy link
Member

michalmuskala commented Mar 3, 2016

I'm not really sure the ... solution is clearer than keyword query.
I know we love the pipe, but it looks rather wired here and causes problems where a solution exists even today.

@josevalim

This comment has been minimized.

Copy link
Member

josevalim commented Mar 3, 2016

@michalmuskala right, that would have been my answer as well except people really, really, really prefer using the pipeline so to me "using keyword syntax" is not a good enough answer.

Another solution to this problem is to allow the association syntax to work without an explicit join:

  def with_popular_post(query) do
    where(query, [c], assoc(c, :post).likes == 4)
  end

When such is used we would automatically join on post. If you don't need to use bindings, the problem disappears altogether. But then we run into issues like: what happens if you use c.post and also explicitly join on it? Should it be added twice?

@parkerl

This comment has been minimized.

Copy link
Contributor

parkerl commented Mar 5, 2016

Is there always the implied assumption that if you are composing queries that the from binding must always be referenced in each part of the composition?

If I have Authors with Posts, and Posts with Comments and Tags, then I could/should not attempt the following?

     Author
     |> popular_posts
     |> with_tags(tags)

     def popular_posts(query) do
       query
       |> join(:inner, [author], post in assoc(author, :post))
       |> join(:inner, [author, post], comment in assoc(post, :comments))
       |> where([author, post, comment],  comment.likes == 4)
     end

     def posts_with_tags(query, tags) do
       query
       |> join(:inner, [post], tag in assoc(post, :tags))
       |> where([post, tag],  tag.tag_name in(^tags))
     end

but have...

    def posts_with_tags(query, tags) do
       query
       |> join(:inner, [author], post in assoc(author, :post))
       |> join(:inner, [author, post], tag in assoc(post, :tags))
       |> where([author, post, tag],  tag.tag_name in(^tags))
     end

which would become?

    def posts_with_tags(query, tags) do
       query
       |> join(:inner, [author], post in assoc(author, :post))
       |> join(:inner, [author,..., post], tag in assoc(post, :tags))
       |> where([author,..., tag],  tag.tag_name in(^tags))
     end
@iwarshak

This comment has been minimized.

Copy link
Contributor

iwarshak commented Jul 12, 2016

@josevalim can you explain this a bit more, please?

Finally, it is worth noting though this issue does not happen when using keyword queries because we can track bindings more efficiently there.

How does this get around the problem of a chained query not being aware of the previous bindings?

@josevalim

This comment has been minimized.

Copy link
Member

josevalim commented Jul 12, 2016

If you write:

from c in comment,
  join: p in assoc(c, :post)
  where: p.likes == 4

We know exactly where p belongs. Does it explain it?

@karlseguin

This comment has been minimized.

Copy link

karlseguin commented Aug 22, 2016

Running into the same issue. Your last comment isn't clear to me. It seems like a more appropriate example of composition would be:

from c in comment,
  join: p in assoc(c, :post)
  where: Post.is_popular()

which won't work. Knowing nothing about how Ecto works, my "solution" would involve allowing binding names to be meaningful and thus emulating SQL alias. Then you could pass the alias to the composition and somehow use it there.

@teamon

This comment has been minimized.

Copy link
Contributor

teamon commented Sep 5, 2016

My two cents (this has been causing me lots of trouble recently)

Have you considered some variation of the following syntax:

Post
|> join(:inner, [p], c in assoc(p, :comments))
|> join(:inner, [p], a in assoc(p, :authors))
|> where(%{comments: c}, c.id > 5) # bind by table name instead of index

? That way we could always explicitly say what table we are interested in.

@michalmuskala

This comment has been minimized.

Copy link
Member

michalmuskala commented Sep 5, 2016

The problem with such solution is - what if we join twice on the same table. It's a completely reasonable thing to do. The table name is completely ambiguous in that case. Another issue is that we loose backwards compatibility.

@teamon

This comment has been minimized.

Copy link
Contributor

teamon commented Sep 5, 2016

Regarding compatibility - afaik you can't use a map as binding right now, so all previous code should just work fine
Regarding multiple joins - right.. :/

@smoku

This comment has been minimized.

Copy link

smoku commented Sep 5, 2016

@iwarshak thanks for creating this issue. Right now we are in the process of migrating Ruby/Rails application to Elixir/Phoenix (with Ecto of course). We are also having a similar problem and it's very serious in our use case. In every resource we have a querable named visible_by that is responsible for access control and we use it everywhere we are getting data from Repo to avoid any data leakage . Pseudo code below:

def visible_by(query, current_admin) do
   case current_admin.role do
      "role_1" ->
        query
        |> join(:inner, [o], p in assoc(o, :placements))
        |> where([_, p], p.admin_id == ^current_admin.id)
      "role_2" ->
        query
        |> join(:inner, [o], p in assoc(o, :placements))
        |> join(:inner, [_, p], a in assoc(p, :assignments))
        |> where([_, _, a], a.admin_id == ^current_admin.id)
      _ ->
        query
    end
end

Currently this querable is useless since it's impossible to compose it because we don't know how many joins there are:

Offer 
|> Offer.visible_by(current_admin) 
|> join(:inner, [o], p in assoc(o, :placements))
|> where([_, p], p.followed == true)

We can go back to doing access control in every query manually but using visible_by is very elegant and avoids data leakage in case someone forgets to add a join.

I know we can do it like that but we have double joins if "role_1" or "role_2" and I'm not a big fan of "from" when we use a lot of composable queries .

query = Offer |> Offer.visible_by(current_admin) 

query =
   from o in query,
   join: p in assoc(o, :placements),
   where: p.followed == true

I like the idea proposed by @teamon, ActiveRecord works like that and is useful in most of the cases:

Post
|> join(:inner, [p], c in assoc(p, :comments))
|> join(:inner, [_, c], a in assoc(c, :authors))
|> where(%{comments: c}, c.id > 5)

or

|> where([comments: c], c.id > 5)

@michalmuskala I wouldn't be worried about backwards compatibility since this can be only an alternative syntax. If you need multiple joins on the same table you can use normal bindings (it's not that common to use multiple joins on the same table).

What do you think @josevalim @michalmuskala @parkerl? If you think it makes sense we can dedicate some resources, try to write it and make pull request.

Another possible solution would be to set visible_by as subquery if it is using any join(). That way it would be composable and we could join on it without any issues. It won't be as performant but maybe DB planner will optimize it.

UPDATE:

We went with subqueries approach. This seems to be the best approach in this use case without complicating the code.

@josevalim

This comment has been minimized.

Copy link
Member

josevalim commented Sep 7, 2016

I think using the names is a sensible idea although we will need to raise if you attempt to use duplicates.

@michalmuskala

This comment has been minimized.

Copy link
Member

michalmuskala commented Sep 7, 2016

I was thinking about this as well, and I see how we could use the names. Instead of table names, I would rather use the names given in bindings, e.g. x when you do join: x in Foo. Storing those in a map instead of a list would allow us to easily pattern match the bindings. If we additionally record the names in a list we would be able to convert list-style bindings into map style ones.

The only problem is what you already mentioned, @josevalim, duplicate bindings (as in binding name, not table/column) would need to raise. This is not backwards-compatible, I think it would be fine to break this. Potentially by introducing a warning in a pre-release and checking how many people would complain.

@josevalim

This comment has been minimized.

Copy link
Member

josevalim commented Sep 7, 2016

@michalmuskala we can raise only if the named binding is used. :)

@josevalim josevalim added this to the v2.1 milestone Sep 7, 2016

@teamon

This comment has been minimized.

Copy link
Contributor

teamon commented Sep 7, 2016

@michalmuskala I'm not sure if I got it correctly but:

def with_date(query, date), do: query |> where(%{c: c}, c.date == date)

query
|> joins(:inner, [p], x in assoc(p, :comments) # note the x
|> with_date(today)

this would not match (x vs c) right? With that in mind I think table names work better.

@parkerl

This comment has been minimized.

Copy link
Contributor

parkerl commented Sep 7, 2016

I am missing something...with this solution instead of remembering how many bindings there are between composing functions, I have to remember the name I gave the binding between functions? Could someone translate the original example above into what this new syntax might look like? Sorry, I'm a bit slow.

If you need multiple joins on the same table you can use normal bindings (it's not that common to use multiple joins on the same table)

IMHO, this is a fairly common use case. A solution that did not allow for this would seem a bit incomplete.

@josevalim

This comment has been minimized.

Copy link
Member

josevalim commented Sep 7, 2016

  def with_popular_post(query) do
    query
    |> join(:inner, %{comment: comment}, post in assoc(comment, :post))
    |> where(%{post: post},  post.likes == 4)
  end

IMHO, this is a fairly common use case. A solution that did not allow for this would seem a bit incomplete.

You would give different names to your bindings.

@josevalim

This comment has been minimized.

Copy link
Member

josevalim commented Sep 14, 2016

I have pushed a series of commits that add last binding match with .... So now you can write a query like this:

query
|> join(:inner, [p], a in assoc(p, :authors))
|> where([p, ..., a], p.category_id == a.category_id)

In the example above, you know that post is the first binding and that the author is the last one, so you can use ... to ignore what is in the middle (which may be nothing or many). I have updated the docs and everything else.

I would like to try this approach for now and, if it is not enough, we can add named bindings. In fact, the changes we did to support ... should make it easy to support named bindings in the future.

@josevalim josevalim closed this Sep 14, 2016

@teamon

This comment has been minimized.

Copy link
Contributor

teamon commented Sep 14, 2016

@josevalim Thanks! We will definitely try it out in our codebase :)

@zachdaniel

This comment has been minimized.

Copy link
Contributor

zachdaniel commented Jan 13, 2017

We write some pretty large queries, and we're going to have to end up writing a module that allows us to track what things have been joined onto a query, and allows us to join if we need to, or just find the binding and perform further filtering if not. This has probably been our only pain point with ecto query syntax. If we had named bindings, we could take functions that take a query the name of the binding for what you want to filter, and the parameters and do some really great stuff. The sort of interface I want is something like


defmodule MyQuery do
  def inserted_at_before(query, binding_name, join_function, datetime) do
    query
    |> join_unless_already_joined(binding_name, :inner, join_function)
    |> where(bindings, bindings[^binding_name].inserted_at < datetime)
  end
end

defmodule Post do
  #schema...

  def join_user(query, post_binding) do
     query |> join(bindings, user in User, bindings[^post_binding].user_id == user.id)
  end
  
  def join_comments(query, post_binding) do
    query |> join(bindings, comment in Comment, bindings[^post_binding].id == comment.post_id)
  end
end

Post
|> bind_as(:post)
|> inserted_at_before(:comments, &Post.join_comments(&1, :post), Timex.now())
|> inserted_at_before(:user, &Post.join_user(&1, :post), Timex.now())

So you only join if an existing binding exists with that kind of join, and then you can filter. These would be infinitely composable.

@alex88

This comment has been minimized.

Copy link
Contributor

alex88 commented Jan 22, 2017

Named bindings would be very appreciated, the ... syntax fixes the join/where problem, but what about selects?

We compose a query depending on user input, which may include one or more, at the end of the query it's impossible to actually select a join unless it's first because we don't know where it would be in the bindings array

@alex88

This comment has been minimized.

Copy link
Contributor

alex88 commented Jan 25, 2017

Another example

message_configs = case current_user.group_id do
      nil -> MessageConfig |> where(organization_id: ^current_user.organization_id)
      group_id -> MessageConfig |> where(groups: ^current_user.group_id)
    end
    |> join(:left, [m], c in MessageConfigUserDisabledNotification, c.user_id == ^current_user.id and c.message_config_id == m.id)
    |> select([m, c], {m, c})
    |> Repo.all

this query doesn't work at the moment because groups is a many to many association, the problem is that if I join the MessageConfig with group, the select will have wrong bindings in case the query involves the group join

@mononym

This comment has been minimized.

Copy link

mononym commented Jul 7, 2017

I'd really love to see named bindings introduced. The new dynamic macro is a dream come true for building dynamic where statements. However I ran into a real snag trying to build a semi-complex sql query.

My use case is translating a nested and/or query described in an application specific DSL into a sql query. The kicker is that there is a central object table which six other tables belong to, each one in a many-to-one relationship. Each query I build might have all or even none of these additional tables in it. It all depends on what is passed in.

I had hoped I could create a dynamic fragment like this:
dynamic([component], component.component == ^component_string)

The intent was to have a number of single line functions building up this where, while other functions were responsible for doing the proper nesting in parentheses. Only problem is the bindings are positional, and inside of these tiny functions I just don't know what has been bound already or not, and I don't know what order they were necessarily bound.

If multiple tables are joined to the object table, and my where query references one of them in the middle, it seems the ... syntax won't help me there. The only thing I could think of besides just writing fragments was to do this:

defp build_ecto_query(object_query) do
  dynamic = build_where(object_query)

  from object in ObjectModel,
    inner_join: callback in assoc(object, :callbacks),
    inner_join: command_set in assoc(object, :command_sets),
    inner_join: component in assoc(object, :components),
    inner_join: attribute in assoc(component, :attributes),
    inner_join: relationship in assoc(object, :relationships),
    inner_join: tag in assoc(object, :tags),
    select: object.id,
    where: ^dynamic
  end

  ...

  defp build_equality_check_dynamic({:tag, {category, tag}}) do
    dynamic([object, callback, command_set, component, attribute, relationship, tag],
            (tag.category == ^category and tag.tag == ^tag))
  end

Now I presume Postgres is smart enough to not join tables not asked for in the where statement...but it doesn't feel nearly as clean as I'd like as I do a join for everything even if I only need to access the key on the object itself.

@jeremyjh

This comment has been minimized.

Copy link
Contributor

jeremyjh commented Mar 12, 2018

To workaround this lack of named bindings in Phoenix Datatables, I actually use a macro to create 25 different functions (actually 50, we have 25 for order_by, 25 for where) that accept an ordinal to which position they will bind an update to a Dynamic. That ordinal we accept as either a library user-provided value, or by counting the join order in the query for the relation in question (this is defeated by subqueries but otherwise works ok).

I've run into a need for this in other circumstances though, and am considering spinning off a library to help do this sort thing more generically. This is madness. Please give us named bindings :)

@josevalim

This comment has been minimized.

Copy link
Member

josevalim commented Mar 12, 2018

Named bindings is already in master... :)

@Overbryd

This comment has been minimized.

Copy link

Overbryd commented Mar 15, 2018

I am currently on Ecto master, to see if named bindings solve my problem at hand.

My where clauses are built using dynamic/2.
I had to rewrite them to use a binding syntax like dynamic([..., t], t.column == ^value).

Now is this a by-product that just-works™ or intended behaviour/syntax?

In consequence dynamic/2 can only be used on the first binding (e.g. [t]) last binding (e.g. [..., t]) or a fully specified binding (e.g. [a, b, t, c]).

Being on the bleeding edge: What is the outlook/intention for dynamic in conjunction with named bindings?

@josevalim

This comment has been minimized.

Copy link
Member

josevalim commented Mar 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment