Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use WHERE ... IN for simple Wherehas() and Has() constraints #8720

Closed
kaymes opened this issue May 13, 2015 · 9 comments
Closed

Use WHERE ... IN for simple Wherehas() and Has() constraints #8720

kaymes opened this issue May 13, 2015 · 9 comments

Comments

@kaymes
Copy link

kaymes commented May 13, 2015

I already posted this as part of #3543 and #5328, but I probably should have known better than adding it to a closed issue. Thus, I'm opening a new one.

The issue is that the has() and whereHas() constraints in the eloquent query builder creates code that can be very slow and results in nested table scans for seemingly harmless queries. I wrote some code that rectifies the issue for the most common use case and it would be nice to see it or something similar included in Laravel.

What is the problem?

Note: I'm only referring to Mysql not sure how other databases optimise things.

Look at the task of "Find all users that commented on any post belonging to a certain thread". (Assuming Post, Comment and User models with the obvious relationships).

User::whereHas('comments', function($q) use ($threadId) {
  $q->whereHas('post', function($q) use ($threadId) {
    $q->where('thread_id', $threadId);
  });
})->get();

The way laravel crafts the has(...) queries using correlated subqueries often results in nested full table scans. That is, a full table scan of users where every row triggers a full table scan of comments where again every row triggers a full table scan of posts. In short, once your database grows, you are dead.

However, the task itself would be much faster if we first get the set of posts, then get the set of comments and then get the set of users. Doing it this way the execution time only depends on the number of records actually involved but not the overall size of the database.

Currently there are only two ways to achieve this:

  1. rewrite the whole thing using join statements and a final distinct.
  2. do it manually in PHP by first pulling out the IDs of the sub-queries manually and passing them on to another query using the whereIn(..., array) method.

As far as Mysql 5.5 is concerned that is actually where the story ends.
Mysql 5.5 turns a where ... in (select ...) statement into a where exists ... correlated subquery which again results in nested table scans.

However, Mysql starting from 5.6.6 has the "Semi Join" and "Materialization" optimisation strategies for where ... in (select ...) type queries. The former tries to resolve the query as a join type query. If that's not possible, the latter evaluates the sub-queries into temporary tables and takes it from there.

Thus, using where ... in (select ...) and Mysql 5.6 is the solution.

The patch

I wrote simple code that allows the has(...) method (which gets ultimately called by whereHas, orHas and all the other flavours) to take advantage of relationships that can craft a where ... in (select ...) constraint if the count value is >= 1.
I also included the constraint generating code for the HasOneOrMany, BelongsTo and BelongsToMany relationshipts. I didn't add the morphing ones because I'm not using them and thus cannot test them easily. But adding them should be straight forward.
The code is opportunistic in that it falls back to the old behaviour whenever the new where in facility isn't applicable.

I wrote and tested the code for Laravel 4.2, but as far as I can see there haven't been changes to this section of code in 5.0, so it should apply to both.

I apologise for not presenting a finished pull request. My own implementation in my project is by deriving from the relationship classes so I can leave the vendor folder unchanged. Thus, I'm simply pasting the relevant methods here.

\Illuminate\Database\Eloquent\Builder::has()

I added the big if block, the rest is unchanged.

    public function has($relation, $operator = '>=', $count = 1, $boolean = 'and', Closure $callback = null)
    {
        if (strpos($relation, '.') !== false)
        {
            return $this->hasNested($relation, $operator, $count, $boolean, $callback);
        }

        $relation = $this->getHasRelationQuery($relation);

        if ( (($operator == '>=') && ($count == 1)) || (($operator == '>') && ($count == 0)) ) {
            if (method_exists($relation, 'getWhereHasOneConstraints')) {
                $whereHasOneConstraints = $relation->getWhereHasOneConstraints($callback, $this);
                if (is_array($whereHasOneConstraints) && !empty($whereHasOneConstraints)) {
                    return $this->whereRaw($whereHasOneConstraints['sql'], $whereHasOneConstraints['bindings'], $boolean);
                }
            }
        }

        $query = $relation->getRelationCountQuery($relation->getRelated()->newQuery(), $this);

        if ($callback) call_user_func($callback, $query);

        return $this->addHasWhere($query, $relation, $operator, $count, $boolean);
    }

Illuminate\Database\Eloquent\Relations\BelongsTo::getWhereHasOneConstraints()

    public function getWhereHasOneConstraints(Closure $callback, $parent) {
        $parentKey = $this->wrap($this->getQualifiedForeignKey());
        $selectKey = $this->wrap($this->query->getModel()->getTable().'.'.$this->otherKey);

        if ($callback) call_user_func($callback, $this->query);
        $this->query->select(new Expression($selectKey));

        return array(
            'sql' => new Expression($parentKey .' in (' . $this->query->toSql() . ')'),
            'bindings' => $this->query->getBindings(),
        );      
    }

Illuminate\Database\Eloquent\Relations\HasOneOrMany::getWhereHasOneConstraints()

    public function getWhereHasOneConstraints(Closure $callback, $parent) {


        $parentKey = $this->wrap($this->getQualifiedParentKeyName());
        $selectKey = $this->wrap($this->getHasCompareKey());

        if ($callback) call_user_func($callback, $this->query);
        $this->query->select(new Expression($selectKey));

        return array(
            'sql' => new Expression($parentKey .' in (' . $this->query->toSql() . ')'),
            'bindings' => $this->query->getBindings(),
        );  
    }   

Illuminate\Database\Eloquent\Relations\BelongsToMany::getWhereHasOneConstraints()

    public function getWhereHasOneConstraints(Closure $callback, $parent) {

        if ($parent->getQuery()->from == $this->getRelated()->newQuery()->getQuery()->from) {
            // Table aliasing isn't implemented here. Return null to tell the caller to fall back
            // to the count query method.
            return null;
        }

        $parentKey = $this->wrap($this->getQualifiedParentKeyName());
        $selectKey = $this->wrap($this->getHasCompareKey());

        if ($callback) call_user_func($callback, $this->query);
        $this->query->select(new Expression($selectKey));

        return array(
                'sql' => new Expression($parentKey .' in (' . $this->query->toSql() . ')'),
                'bindings' => $this->query->getBindings(),
        );
    }   
@GrahamCampbell
Copy link
Member

Wow, thanks, though could you please provide this as an actual PR please. Posing code in here is no use to us I'm afraid.

kaymes added a commit to kaymes/framework that referenced this issue May 14, 2015
…ationship queries (has(), whereHas(), ...)

See laravel#8720 for details.

The facility is implemented for belongsTo, BelongsToMany, HasOne and HasMany. Other relationship types (i.e. Morph*) are not implemented yet.
The code falls back to the old behaviour whenever the new query strategy is not applicable or implemented.
@kaymes
Copy link
Author

kaymes commented May 14, 2015

Ok, here you go. I pasted the code into a PR.

@GrahamCampbell
Copy link
Member

@kaymes, thank you, but you seem to have sent this to the wrong branch. Laravel 4.2 is closed for all non-critical fixes. Sending a PR to 5.0 would be preferred if you have the time. :)

kaymes added a commit to kaymes/framework that referenced this issue May 14, 2015
…ationship queries (has(), whereHas(), ...) to avoid full table scans.

See laravel#8720 for details.

The facility is implemented for belongsTo, BelongsToMany, HasOne and HasMany. Other relationship types (i.e. Morph*) are not implemented yet.
The code falls back to the old behaviour whenever the new query strategy is not applicable or implemented.
@kaymes
Copy link
Author

kaymes commented May 14, 2015

@GrahamCampbell, here you go.

@GrahamCampbell
Copy link
Member

Taylor doesn't seem to like this change. I assume he thinks it's too hacky?

@Charkhan
Copy link

It's a real shame, in large projects you can't rely on whereHas method, you have to make a upfront query to get the IDs you want (with massive join if you have to filter through other tables) PLUS on your returned results request -> you have to put some eager load filters (to match the join filter on your first query)

That's very hacky, and a simple whereHas on a 350K rows table take about 1.5sec to run on Mysql :/

@crissi
Copy link
Contributor

crissi commented May 20, 2015

+1

@GrahamCampbell
Copy link
Member

Closing as this was rejected by Taylor.

@Argo232
Copy link

Argo232 commented Jun 15, 2015

This decision is very frustrating

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants