Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WhereHas query is too slow #18415

Closed
AbiriAmir opened this issue Mar 19, 2017 · 30 comments
Closed

WhereHas query is too slow #18415

AbiriAmir opened this issue Mar 19, 2017 · 30 comments

Comments

@AbiriAmir
Copy link
Contributor

  • Laravel Version: 5.4
  • PHP Version: 7.0
  • Database Driver & Version: Mysql 5.6.33

Description:
As i tested, i founded out that whereHas query is much slower than the same query using join.
Imagine this domain model:
User - Like - Post - Tag
User likes posts
Post belongs to many tags
Tag belongs to many posts
Each post has several tags and each user may like several posts. We call a specified user interested in tag T if he/she had liked at least 10 posts which has tag T. Now, for a specified user U, we are going to find those tags which he/she is interested in.

With whereHas we have the following code:

App\Tag::whereHas('posts', function($q) {
    $q->whereHas('likes', function($q) {  // likes is a relation between posts and users (post_user)
        $q->where('users.id', 1);
    });
}, '>=', 10)->toSql()

Note that the relation between like and post is polymorphic.
It produces the following SQL statement:

select * from `tags` where 
    (select count(*) from `posts` 
        inner join 
            `post_tag` on `posts`.`id` = `post_tag`.`post_id` 
        where `tags`.`id` = `post_tag`.`tag_id` and 
            exists 
                (select * from `users`
                    inner join `likes` on `users`.`id` = `likes`.`user_id` 
                    where `posts`.`id` = `likes`.`likeable_id` and `likes`.`likeable_type` = 'App\\Post' and `users`.`id` = 1
                )
    ) >= 10

But the following equivalent SQL statement performs 10 times faster with joins:

select `tags`.*, COUNT(tags.id) AS interest from `tags` 
    inner join `post_tag` on `tags`.`id` = `post_tag`.`tag_id` 
    inner join `posts` on `post_tag`.`post_id` = `posts`.`id` 
    inner join `likes` on `likes`.`likeable_id` = `posts`.`id` and `likes`.`likeable_type` = 'App\\Post' 
    inner join `users` on `likes`.`user_id` = `users`.`id` where `users`.`id` = 1 
    group by `users`.`id`, `tags`.`id` having `interest` >= 10 order by `interest` desc

I think RDMSs do lots of work to perform joins faster and ORM should take advantage of this.

@Dylan-DPC-zz
Copy link

It could be because you are running a whereHas inside another whereHas

@mrliptontea
Copy link

mrliptontea commented Mar 20, 2017

@AbiriAmir joins add up to what query is selecting, while exists doesn't. In other words, when you join, and in your PHP have something like $post->where('id', '>', 42) it will cause an error because id is ambiguous, and that is not desired behaviour. You would have to prefix every statement with the table name. For that reason, Eloquent ORM is not using joins, and never will.

If you need improved performance for a specific use case, just create a custom scope or a repository class, where you can use the query builder to add joins.
Btw you can use dot syntax to access nested relationships, like this: whereHas('posts.likes', '>=', 10).

Also, in most cases, exists is actually faster, because it doesn't order RDBMS to fetch any data, just check whether relevant rows exist, which often can be done by a simple index lookup.

@Dylan-DPC-zz
Copy link

For that reason, Eloquent ORM is not using joins, and never will.

I disagree with that @mrliptontea or maybe it is not clear enough

@mrliptontea
Copy link

@Dylan-DPC I mean by itself, Eloquent cannot use joins because that breaks the API. Every statement would have to be prefixed with the table name in order to avoid errors like 'id' in where clause is ambiguous. But as a developer, you are free to modify queries however you please. Also, like I pointed above, in most cases exists, which is what Eloquent is using for whereHas, is faster.

@AbiriAmir
Copy link
Contributor Author

AbiriAmir commented Mar 20, 2017

@mrliptontea thanks for your answer but this block is not equivalent to my query.

whereHas('posts.likes', '>=', 10)

Also exists is only in the case that we are checking >=1 and in other cases count query will be executed which is 10 times slower in my test case.

Also, I think that there must be a solution for excluding the added fields in join statement

@mrliptontea
Copy link

mrliptontea commented Mar 20, 2017

this block is not equivalent to my query.

Right, I didn't read your query through 😉

there must be a solution for excluding the added fields in join statement

I didn't mean that join is adding fields, you still you need to specify them in select, but it is creating ambiguity when two tables have a column of the same name, while subqueries don't do that. This simple fact makes subqueries usable for the ORM that will work in all cases. Need something more fine-tuned? Write you own query. Simple as that.

@KristofMorva
Copy link
Contributor

You would have to prefix every statement with the table name.

Actually, I don't see it as an impossible task. Eloquent could automatize it.

Currently a developer has to replace all with, whereHas and other calls with database queries - but then why do these even exist? When is it irrelevant to run 5 queries instead of 1? Maybe on a website for your family photos, or for your child's kindergarten. But in production it can have big network / processing impact, and (at least according to my memory) nor the documentation notes this threat, you only realize it when it's slow as hell, or if you are logging queries. And then you have to go back, replace all your calls, and change all the relevant code (since the query won't return models anymore).

Maybe I just do something the wrong way, but it's exactly what I had to do a time ago, because I naively thought Eloquent will generate an efficient query.

@mrliptontea
Copy link

mrliptontea commented Apr 5, 2017

Actually, I don't see it as an impossible task. Eloquent could automatize it.

True, it could be done. However, it would require either you to provide a list of all the columns in the model or Laravel would have to run additional describe table query. Otherwise, it won't know which columns to prefix with what.

Currently a developer has to replace all with, whereHas and other calls with database queries

That is not true. As I stated above, in the majority of cases a subquery with exists, which is what Laravel does, is optimal for this use.

since the query won't return models anymore

When you use Eloquent query builder, it does. Obviously, to populate relationships correctly, it has to go through relationships but you can mix and match, i.e. you can have a join in the query, to fetch only desired rows, and still use eager loading.

There is no definitive list of rules on how you should write your queries. It all depends on your database structure, its size, and what you're trying to achieve. Whenever you have performance issues with your SQL queries, EXPLAIN is you only friend. Any ORM can take it only this far.
Don't get me wrong, I wish Laravel could do all that automagically but it's complex.

@jhdxr
Copy link
Contributor

jhdxr commented Apr 5, 2017

@KristofMorva it's very hard to generate an efficient query, especially you have to support diffierent rdbms. for example, there are lots of sql queries which are considered to slow down the performance for mysql, like sub-query, exists. but for oracle, they are just piece of cake.
IMHO, I'll keep as simple as possilbe when I use ORM. and if I need do something complicated with sql, just use the raw query (with the help of functions like selectRaw, whereRaw, orderByRaw or something similar).

@Dylan-DPC-zz
Copy link

maybe the slowness is because you are using 2 whereHas functions?. Better to judge it by making a separate test-case with a single whereHas and evaluating.

@KristofMorva
Copy link
Contributor

it's very hard to generate an efficient query, especially you have to support diffierent rdbms

Yes, I'm sure it'd be complex to solve.

IMHO, I'll keep as simple as possilbe when I use ORM. and if I need do something complicated with sql, just use the raw query.

From now on, same here; but it'd great if there was a warning about it on the Eloquent doc page :)

@Dylan-DPC-zz
Copy link

no you don't need to use "fully raw queries". You can achieve things in QB / Eloquent by using selectRaw, whereRaw etc

@KristofMorva
Copy link
Contributor

no you don't need to use "fully raw queries". You can achieve things in QB / Eloquent by using selectRaw, whereRaw etc

I think that's what @jhdxr meant. At least I'm combining Eloquent with QueryBuilder, so it's somewhere between efficiency and code readability.

@Dylan-DPC-zz
Copy link

@KristofMorva nope. I have seen a lot of people end up with using "raw queries" instead of raw-supported-functions

@jhdxr
Copy link
Contributor

jhdxr commented Apr 6, 2017

@Dylan-DPC fine. I'll edit my comments to prove @KristofMorva is right orz

@themsaid themsaid removed the database label Jun 7, 2017
@themsaid
Copy link
Member

Eloquent has some edge cases, when you hit these just switch to using the query builder.

Closing since there's no activity on the issue and it doesn't really indicate a bug.

@joshuaGlass808
Copy link

whereHas producing an AND EXISTS statement is causing tons of slowness (12+seconds) to my queries.. if I run a raw query using left joins it works in under a second.

@mrliptontea
Copy link

@joshuaGlass808 as explained above - it depends. For some queries EXISTS is faster for others JOIN is. Just use join if you find it better in your use case.

Also if you want to query only rows that have some relationship you probably want a full join rather than left join.

@vogtdominikgit
Copy link

vogtdominikgit commented Mar 8, 2018

For those who are interested in a specific case and to see how much improvement a simple change can make. I do not want to start a discussion about how this is bad programming by the developer. I want to bring focus on the bigger issue.

On one side a framework exists to help developers ship more features more frequently, but on the other side a framework can be a pain in terms of performance. This is due to every abstraction layer adding overhead and reducing control over the underlying code-design.

Don't get me wrong, I love Laravel. But I think it is important to address this issue in the laravel docs and add a disclaimer, that these kind of queries might cause a lot of performance-loss in the long term.

Here is my result by changing 5 lines of code:

DB (with indexes)
profiles table with 285.714 entries
devices table with 350.428 entries

with 2,610 calls per minute from various places

Code
code change

QueryTime
querytime drop

CPU
cpu drop

But due to my lack of understanding, can somebody please explain to me why this is such a big improvement?

@TabithaES
Copy link

@vogtdominikgit

Laravel is creating a temporary column on literally every row in the database, in which it fills with true or false. If any indexes can be used, they are probably used after this. Any query that uses whereHas is probably going to be a full table scan.

Here is a similar query, except it does not induce full table scans.

Profile::whereIn('profiles.id', function($query) use ($mac_address) {
    $query->from('devices')
        ->select('devices.profile_id')
        ->where('devices.mac_address', $mac_address);
})

Further reading:

#18415
#3543
https://stackoverflow.com/questions/46785552/poor-wherehas-performance-in-laravel

@alexjose
Copy link

alexjose commented Sep 2, 2019

I am also wondering why Laravel uses exists clause for whereHas() while the same could have been done with in clause with any overhead.

I had to avoid using whereHas and use raw whereIn to increase the performance from 20 s to 100 ms

Would like to know the reason why to stick with exists clause.

@mpyw
Copy link
Contributor

mpyw commented Jun 28, 2020

mpyw/eloquent-has-by-non-dependent-subquery: Convert has() and whereHas() constraints to non-dependent subqueries.

@richardfeliciano
Copy link

richardfeliciano commented Oct 2, 2020

After spend hours trying to solve this, i decided to make the query like the old way, it is ugly i know! But is blazing fast, the time to run the query drop from 28s to 300ms.
No need cache anymore.

@howtomakeaturn
Copy link

I met a similar issue today...

Collection has a many to many relationship with Image.

This is very slow:

$collections = Collection::orderBy('created_at', 'desc')
    ->whereHas('images')
    ->paginate(36);

So I fix the issue with this:

$collections = Collection::orderBy('created_at', 'desc')
    ->whereIn('collections.id', function($query) {
        $query->select('image_collection.collection_id')
            ->from('image_collection')
            ->join('images', 'images.id', 'image_collection.image_id')
            ->whereNull('images.deleted_at');
    })
    ->paginate(36);

Hope this helps.

@mpyw
Copy link
Contributor

mpyw commented Nov 29, 2020

@howtomakeaturn

With #18415 (comment), also you may be able to solve like this:

$collections = Collection::orderBy('created_at', 'desc')
    ->hasByNondependentSubquery('images')
    ->paginate(36);

@Bedivierre
Copy link

Bedivierre commented Dec 18, 2020

https://stackoverflow.com/questions/46785552/poor-wherehas-performance-in-laravel/65358549#65358549 may be of some help. whereHas() indeed is slow as hell

@biiiiiigmonster
Copy link
Contributor

biiiiiigmonster commented Feb 27, 2021

The reason for laravel has(whereHas) sometimes slowly is that implemented with where exists syntax.

For example:

// User hasMany Post
Users::has('posts')->get();
// Sql: select * from `users` where exists (select * from `posts` where `users`.`id`=`posts`.`user_id`)

The 'exists' syntax is a loop to the external table, and then queries the internal table (subQuery) every time.

However, there will be performance problems when the users table has a large amount of data, because above sql select * from 'users' where exists... unable to use index.

It can use where in instead of where exists here without damaging the structure.

// select * from `users` where exists (select * from `posts` where `users`.`id`=`posts`.`user_id`)
// =>
// select * from `users` where `id` in (select `posts`.`user_id` from `posts`)

This will greatly improve performance!

You can take a look at this package hasin, hasin just only use where in syntax instead of where exists compared with the framework has, but everywhere else is the same, such as parameters and call mode even the code implementation, and can be used safely.

@Bedivierre
Copy link

Bedivierre commented Feb 27, 2021

The reason for laravel has(whereHas) sometimes slowly is that implemented with where exists syntax.

For example:

// User hasMany Post
Users::has('posts')->get();
// Sql: select * from `users` where exists (select * from `posts` where `users`.`id`=`posts`.`user_id`)

The 'exists' syntax is a loop to the external table, and then queries the internal table (subQuery) every time.

However, there will be performance problems when the users table has a large amount of data, because above sql select * from 'users' where exists... unable to use index.

It can use where in instead of where exists here without damaging the structure.

// select * from `users` where exists (select * from `posts` where `users`.`id`=`posts`.`user_id`)
// =>
// select * from `users` where `id` in (select `posts`.`user_id` from `posts`)

This will greatly improve performance!

You can take a look at this package hasin, hasin just only use where in syntax instead of where exists compared with the framework has, but everywhere else is the same, such as parameters and call mode even the code implementation, and can be used safely.

And here we go. It still use subquery and if, for example, posts (or any other DB entity) counted in millions there will be problem. This solution have potential to be faster in some situations but on even moderately loaded database it could become an issue.

@biiiiiigmonster
Copy link
Contributor

biiiiiigmonster commented Feb 28, 2021

@Bedivierre Thanks for your reply!

Maybe you're misunderstand why where exists slowly, when the both table is big, the main reason is external table(users), not internal table(posts), this is determined by the mysql mechanism, you can see this comment3543#41595672, that has been explained and the solution is given.

And i saw your comment at stackoverflow, join can really improve performance, but it is difficult to apply in complex scenes. For example, users::has('posts.comments')->get(), and even 10 kinds of relations in the framework, it's hard to rewrite with join.

Finally, welcome to use hasin, give start!:smile:

@arixwap
Copy link

arixwap commented Sep 29, 2021

@vogtdominikgit

Laravel is creating a temporary column on literally every row in the database, in which it fills with true or false. If any indexes can be used, they are probably used after this. Any query that uses whereHas is probably going to be a full table scan.

Here is a similar query, except it does not induce full table scans.

Profile::whereIn('profiles.id', function($query) use ($mac_address) {
    $query->from('devices')
        ->select('devices.profile_id')
        ->where('devices.mac_address', $mac_address);
})

Further reading:

#18415 #3543 https://stackoverflow.com/questions/46785552/poor-wherehas-performance-in-laravel

Cool it work! Really smooth..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests