Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support joins with scala collections using temporary tables #799

Open
hvesalai opened this issue May 8, 2014 · 15 comments
Open

Support joins with scala collections using temporary tables #799

hvesalai opened this issue May 8, 2014 · 15 comments

Comments

@hvesalai
Copy link
Member

hvesalai commented May 8, 2014

Traditionally the way ORMs have supported joins over external data is the use of the SQL IN keyword, i.e.

select * from "foo" where "bar" IN ( /* list of values */ );

This is equivalent to creating a temporary table and doing an inner join on that table.

create temporary table "t1" ( "bar" text ) on commit drop;
-- insert the list of values to t1
select * from "foo" natural join "t1";

While it is true, the latter has a more complex SQL syntax, it has the following benefits:

  • it has potentially better performance
  • the select statement can be compiled and prepared
  • it is much more versatile

As to the last point, here is something you cannot do with the IN keyword.

create temporary table "t2" ( "bar" text, "zot" number ) on commit drop;
-- insert the list of tuples to t2
select * from "foo" natural join "t2";

I.e. selecting all the rows from foo where bar = t2.bar and zot = t2.zot.

What I'm requesting is that slick support a nice DSL for doing a join with a scala collection in which the user of the API wouldn't have to handle (or even know about) the createion of the temporary table and the insertion of the values to that table.

E.g. with the current inner join syntax:

val names = List("Heikki" -> "Vesalainen", "Stefan" -> "Zeiger")
val q = persons join names on { (p, n) => p.firstName === n._1 && p.lastName === n._2 }

would result in

create temporary table "t1" ( "x1" text, "x2" text ) on commit drop;
-- insert names to t1
select * from "persons" as "p" inner join "t1" as "n" on "p"."firstName" = "n"."x1" and "p"."lastName" = "n"."x2";
@cvogt
Copy link
Member

cvogt commented May 13, 2014

An alternative to this would be fetching all persons and doing the join in-memory. Which one is more efficient depends on which collections is larger, the in-memory one or the (permanent) database table. We will probably need an api for exposing this decision to the user, not only for temporary tables, but also for distributed queries joining tables from multiple dbs. An alternative could be an automatic decision based on profiling or heuristics or something like that.

@szeiger szeiger added this to the Future milestone May 20, 2014
@hvesalai
Copy link
Member Author

hvesalai commented Mar 9, 2015

I would have needed this again today.

@KLBonn
Copy link

KLBonn commented Jul 10, 2015

Me too.
For now it seems I have to do it in-memory.

Side note: I know of close to zero cases where the in-memory (application input) collection would exceed the respective database table size. By far the most cases of collection comparisons (should) have been located on db side.

@mboogerd
Copy link

mboogerd commented Aug 4, 2015

@cvogt, you're of course correct in that efficiency depends on the size of the scala set vs. the table size, however I agree with @KLBonn that the majority of use cases the former will be smaller. In our current project we do quite a bit of micro-batching, which would be helped tremendously if this was a slick feature!

@Pyppe
Copy link

Pyppe commented Aug 14, 2015

👍

@thirdy
Copy link

thirdy commented Apr 22, 2016

Would have helped me too. Any updates on this?

@francescopellegrini
Copy link

👍

@cvogt cvogt added the UPVOTED label May 18, 2016
@cvogt
Copy link
Member

cvogt commented May 18, 2016

maybe something @radsaggi can look at later this summer during GSOC

@alexander-myltsev
Copy link

👍

@virusdave
Copy link
Contributor

👍
This is something i run into all the time as well.

@kemmar
Copy link

kemmar commented Oct 9, 2019

I also would love this feature

@tobiatesan
Copy link

Just pointing out that in absence of this feature, people are more likely to store chunks of results inside a Scala variable, which carries a performance penalty, and potentially try to use it in an IN SET clause.

Which might then involve running into #1739

@WayneWang12
Copy link

it seems a quite good feature to me.

@hvesalai
Copy link
Member Author

damn I've had some great ideas in the past. 10 years ago.

@nafg
Copy link
Member

nafg commented Mar 27, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests