New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Joins become nested selects, slow on MySQL #623

Closed
fernandezpablo85 opened this Issue Jan 23, 2014 · 26 comments

Comments

Projects
None yet
@fernandezpablo85

fernandezpablo85 commented Jan 23, 2014

Here's the full thread:

https://groups.google.com/forum/#!topic/scalaquery/EXihfXbHmSI

This is making some queries go from 11 seconds to 5 minutes (see "ijuma" comment on that thread).

@szeiger

This comment has been minimized.

Member

szeiger commented Feb 5, 2014

Related to #489

@szeiger szeiger added the improvement label Feb 10, 2014

@cvogt

This comment has been minimized.

Member

cvogt commented May 30, 2014

And to #714

@szeiger szeiger added this to the 2.2.0 milestone Jun 23, 2014

@papauschek

This comment has been minimized.

papauschek commented Jul 6, 2014

This has become a critical issue for us.

The simplest left joins will result in full table scans even on the newest MySQL 5.6 version, because it won't use indices in the subselects created by Slick.

Is there any chance this can be moved into 2.1?

Most likely we will have to migrate to PostgreSQL if this doesn't get solved, and I'm sure many other users will have to do this as well.

@cvogt

This comment has been minimized.

Member

cvogt commented Jul 6, 2014

2.1 is coming in the next few days. Impossible to improve this until then. Please tell me if emulating the left join using an inner join and a union works for you as a workaround. Example:

Instead of

A.leftJoin(B).on(_.id === _.aId)
   .map{ case (a,b) =>
    (a,
      (b.foo.?,b.bar.?)
    )
  }

use (after fixing left-over type errors ;))

A.join(B).on(_.id === _.aId)
  .map{ case (a,b) =>
    (a,
      (b.foo.?,b.bar.?)
    )
  }
  .union(
    A.filter(a => B.filter(_.aId === a.id).exists)
     .map(a =>
       (a,
         (LiteralColumn[Option[String]](None), LiteralColumn[Option[Int]](None))
       )
     )
  )
)

Assuming b.foo is String and b.bar is Int

@fernandezpablo85

This comment has been minimized.

fernandezpablo85 commented Jul 6, 2014

IMHO this is being incorrectly tagged as a "performance" improvement. It's
not. Not being able to perform a join it's not only slower but incorrect.

On Sun, Jul 6, 2014 at 1:03 PM, Jan Christopher Vogt <
notifications@github.com> wrote:

2.1 is coming in the next few days. Impossible to improve this until then.
Please tell me if emulating the left join using an inner join and a union
works for you as a workaround. Example:

Instead of

A.leftJoin(B).on(_.id === _.aId)
.map{ case (a,b) =>
(a,
(b.foo.?,b.bar.?)
)
}

use (after fixing left-over type errors ;))

A.join(B).on(_.id === .aId)
.map{ case (a,b) =>
(a,
(b.foo.?,b.bar.?)
)
}
.union(
A.filter(a => B.filter(
.aId === a.id).exists)
.map(a =>
(a,
(LiteralColumnOption[String], LiteralColumnOption[Int])
)
)
)
)

Assuming b.foo is String and b.bar is Int


Reply to this email directly or view it on GitHub
#623 (comment).

@cvogt

This comment has been minimized.

Member

cvogt commented Jul 6, 2014

As far as I understood, the generated query is semantically correct and produces the correct result. Only that it has slow runtime on MySQL, because it's optimizer doesn't know how to come up with an efficient query plan. As a consequence we should generate a semantically equivalent query, which MySQL is able to optimize better. Sounds like a performance improvement to me. Correct me if I am wrong.

@cvogt cvogt added the UPVOTED label Jul 6, 2014

@cvogt

This comment has been minimized.

Member

cvogt commented Jul 6, 2014

(I flagged this as UPVOTED because its a frequently requested feature by our quite significant MySQL audience)

@papauschek

This comment has been minimized.

papauschek commented Jul 11, 2014

@cvogt I will give the workaround a try next time we run into performance issues. Thanks.

@papauschek

This comment has been minimized.

papauschek commented Jul 14, 2014

@cvogt I tested the performance of your workaround using UNIONs with MySQL 5.6

Unfortunately, MySQL 5.6 will only use indices for the query if you filter INSIDE the unions. If you filter outside the union, slick will again generate a nested select, and MySQL won't try to find usable indices.

Filtering inside the unions drastically limits my ability to keep the queries DRY.

Maybe MariaDB offers a better query optimizer and a better performance than MySQL.

@cvogt

This comment has been minimized.

Member

cvogt commented Jul 14, 2014

@papauschek Thank you for the analysis! Good to know.

@naasir

This comment has been minimized.

naasir commented Oct 16, 2014

Any update on this issue? As stated above, this is a real show-stopper for using Slick with MySQL. Nonetheless, thanks for the awesome project and all the hard work.

@PanAeon

This comment has been minimized.

PanAeon commented Nov 16, 2014

Hi, situation a bit improved on MySQL 5.6 but still is very slow, a query with 5 inner joins executes 10 - 20 seconds on MySQL 5.5, 0.5 - 1 seconds on 5.6 and under a millisecond on postgres or rewritten without nested selects. I think the problem might be that mysql will create a temporary table for a nested select.

@wuservices

This comment has been minimized.

wuservices commented Nov 28, 2014

My findings are similar to @PanAeon, except that on 5.5, I take the whole server down instead of just being slower.

I can confirm that for me MySQL 5.6 seems to improve the situation, but on 5.5, I can't use it at all. My production database server runs a Percona MySQL based on 5.5 and letting Slick run its inefficient query basically takes down our entire server. For example, the server is always at a few % load, but running this one query for the following led to 100% CPU usage for 15 minutes after which I just gave up and had to kill the connection.

Here's the Scala I'm using

for {
  (i, p) <- Items leftJoin Properties on ((i, p) => (i.itemid === p.itemid && p.propname === "Property Name")) if i.available === 1.toByte
} yield (i.itemid, p.propvalue)

But here's what I'd actually want to happen:

SELECT i.itemid, p.propvalue
FROM items i
LEFT JOIN properties p ON i.itemid = p.itemid AND p.propname = "Property Name"
WHERE i.available

Even on 5.6, my manual query takes a few hundred ms vs 2-3 seconds for the Slick generated query. While not ideal, this works, but the fact that it takes down the whole server on 5.5 for a similar query for me makes this worse than not really working, it makes it dangerous.

@guillaumebadin

This comment has been minimized.

guillaumebadin commented Dec 17, 2014

+1

1 similar comment
@mohittt8

This comment has been minimized.

mohittt8 commented Dec 19, 2014

+1

@papauschek

This comment has been minimized.

papauschek commented Dec 19, 2014

@cvogt I don't think that "performance improvement" is a good classification for this issue. I think it's a deal breaker for MySQL users. Because you might as well use flat files to store your data and get similar performance (full table scans).

If I would have known about this issue from the start, I would have used PostgreSQL instead, and never had any issue.

Slick communicates that it supports MySQL, and that seems purely theoretical, and not very practical at this point.

@wuservices

This comment has been minimized.

wuservices commented Dec 20, 2014

@papauschek I fully agree and it's more of a bug in my eyes since it takes down our database server. I've worked around it with plain SQL, but for the simple query I was doing (above), a flat file would have been many times better since it wouldn't crash our database. Slow is one thing, not working at all and causing other issues is worse.

@szeiger szeiger modified the milestones: 3.1.0, 3.0.0 Jan 29, 2015

@szeiger

This comment has been minimized.

Member

szeiger commented Jan 29, 2015

Rescheduling to 3.1. Rewriting the last few phases of the query compiler to fix these issues should become one of the big topics for 3.1.

@papauschek

This comment has been minimized.

papauschek commented Jan 29, 2015

Glad to hear that this issue is being picked up :)

@kjim

This comment has been minimized.

kjim commented Apr 20, 2015

+1

1 similar comment
@nicky-isaacs

This comment has been minimized.

nicky-isaacs commented May 28, 2015

+1

@trobison

This comment has been minimized.

trobison commented May 28, 2015

Just reading this makes me fear adopting slick. MySQL has its flaws and I would love to have some beers and rant about them, but every other way I can talk to MySQL doesn't come bundled with this limitation.

+1

@szeiger

This comment has been minimized.

Member

szeiger commented May 28, 2015

@pathikrit This is the expected behavior. You take the first 50 elements from an unordered view, and then sort those elements. That's different from first sorting and then taking the first 50 from the sorted view. It's the same for Scala collections.

@cvogt

This comment has been minimized.

Member

cvogt commented May 28, 2015

@szeiger is this in the scope of your 3.1 compiler enhancement?

@szeiger

This comment has been minimized.

Member

szeiger commented May 29, 2015

@cvogt What exactly, eliminating the nesting? Sure, that's the whole whole point of it. Pretty much as requested in #623 (comment). Or the correct propagation of ORDER BY clauses through subqueries? Possibly, time permitting. Currently there are still some situations where it doesn't happen. Eliminating most of the subqueries should further marginalize these cases, but we could eliminate them entirely with some more work.

@szeiger

This comment has been minimized.

Member

szeiger commented Jun 24, 2015

Duplicate of #489

@szeiger szeiger closed this Jun 24, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment