Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rule for detecting expensive calls on a Collection #538

Merged
merged 13 commits into from Apr 24, 2020

Conversation

Daanra
Copy link
Contributor

@Daanra Daanra commented Apr 18, 2020

Description

This pull request serves as a proof of concept for a new rule I'd like to have added to Larastan in light of #326.

This rule would detect method calls such as count() on a collection that could have been called immediately on a query builder instance. Using the query builder to determine the result uses less memory and can be much faster.

Example 1

To determine the number of users, one could query all users into a Collection and then call count() on that Collection:

$num_users = User::all()->count();

But it is much more efficient to just user the query builder to determine the count:

$num_users = User::count();

Example 2

To determine if a certain $user has a role with a particular name, we could query all names of the roles that belong to that user and then loop over the collection to find if it contains that particular name:

$user->roles()->pluck('name')->contains('a role name');

But again, we can also do this in a single query:

$user->roles()->where('name', 'a role name')->exists();

More examples

There are plenty more scenario's where only a query could be used to more efficiently determine something. Below are some more examples. On the left side is the incorrect / inefficient form, on the right side is the more efficient form (only using a query).

User::all()->first(); // User::first()
User::all()->find([1, 2]); // User::whereIn('id', [1, 2])->get()
User::all()->pluck('id'); // User::pluck('id')
User::whereActive()->get()->only([1, 2]); // User::whereActive()->whereIn('id' [1, 2])->get()
$user->roles()->whereSomething()->get()->isEmpty(); // $user->roles()->whereSomething()->doesntExist()
User::query()->get()->modelKeys(); // User::query()->pluck('id');
User::all()->max('age'); // User::max('age');
User::all()->take(3); // User::take(3)->get();
Post::pluck('num_likes')->sum(); // Post::sum('num_likes');
User::get()->last(); // <Flip the applied ordering> User::orderBy('id', 'desc')->first()
User::all()->slice(4, 2); // User::take(2)->skip(4)->get();
User::all()->where('name', 'Taylor'); // User::where('name', 'Taylor')->get()
User::all()->contains('id', '>', 5); // User::where('id', '>', 5)->exists()
DB::table('users')->where('age', '>', 90)->get()->isNotEmpty(); // DB::table('users')->where('age', '>', 90)->exists()

Implementation

The implementation I provide checks all method calls on any object that is a subtype of Illuminate\Support\Collection. It then checks whether that collection was obtained by calling some method on a query builder instance. Depending on what method is called on the collection and what arguments are supplied to that method, the rule might produce an error. For example, if you call sum with a closure:

User::all()->sum(function (User $user): int { return $user->age; });

the rule does not produce an error. But if you use the shorthand notation using a string:

User::all()->sum('age');

It will throw an error. The rule actually checks whether the argument exists as a property on your model. The following, will not report an error:

User::with(['roles'])->get()->pluck('roles');

Considerations

Overriding methods on Custom Collections

Since this implementation supports custom Collection classes, it is possible that a user has overridden some of the native collections methods with a nonintuitive implementation. The max() method might do something that is not possible with a regular query and Larastan will report a false positive. We could disable the rule for methods that are called on a custom collection and are also overridden, but I do not think it's worth it as this should be, as far as I know, an extremely rare scenario.

Proper testing

As far as I could see, there isn't a proper way to test rules yet in this repository, is there? The current tests are not that useful as they can only check whether no error message is reported, but we should really be checking the inverse.

Here some tests that should produce an error message:

<?php

declare(strict_types=1);

namespace Tests\Features\Rules;

use App\User;
use Illuminate\Support\Collection;

class NoUnnecessaryCollectionCallRuleTest
{
    public function testCallCountWrong(): int
    {
        return User::where('id', '>', 5)->get()->count();
    }

    public function testCallGetPluckWrong(): Collection
    {
        return User::query()->get()->pluck('id');
    }

    public function testCallCountWrongly(): int
    {
        return User::all()->count();
    }

    public function testCallFirstWrongly(): ?User
    {
        return User::all()->first();
    }

    public function testCallRelationTakeWrongly(): \Illuminate\Database\Eloquent\Collection
    {
        return User::firstOrFail()->accounts()->get()->take(2);
    }

    public function testDbQueryBuilder(): int
    {
        return DB::table('users')->get()->count();
    }

    public function testVarWrong(): bool
    {
        $query = User::query()->limit(3)->where('status', 'active');

        return $query->get()->isEmpty();
    }

    public function testVarWrongFirst(): ?User
    {
        return User::where('id', 1)->get()->first();
    }

    public function testContainsWrong(): bool
    {
        return User::all()->contains(User::firstOrFail());
    }

    public function testPluckCountWrong(): int
    {
        return User::query()->pluck('id')->count();
    }

    public function testCallWhereWrong(): Collection
    {
        return User::all()->where('id', '<', 4);
    }

    public function testCallDiffWrong(): Collection
    {
        return User::all()->diff([1, 2, 3]);
    }

    /**
     * @return int[]
     */
    public function testCallModelKeysWrong(): array
    {
        return User::all()->modelKeys();
    }

    public function testContainsStrictWrong(): bool
    {
        return User::query()->get()->containsStrict('id', 1);
    }

    public function testMaxWrong(): int
    {
        return User::query()->get()->max('id');
    }
}

Disclaimer

I have no previous experience with Larastan, PHPStan or PHP-parser. My implementation might be really shitty. This pull request serves more as a proof of concept. You are free to throw this implementation away. As long as something like this gets added to Larastan, I'd be really happy. If you have any suggests for improvements, I'd be happy to implement them.

@mfn
Copy link
Contributor

mfn commented Apr 18, 2020

I just ran this on a private project, only found one entry:

 ------ ------------------------------------------------------------------
  243    Called 'first' on collection, but could have been retrieved as a
         query.
 ------ ------------------------------------------------------------------

Code:

        return User
            ::select('users.*')
            ->join('…')
            ->where(…)
            ->orderBy('users.id', 'asc')
            ->get()
            ->first();

Kazinga! Indeed this is a technical bug!

My single-hit-verdict: this has potential!

@canvural
Copy link
Collaborator

Hi,

Thanks for the idea! This sounds really interesting! I'm on board with this.

Right now, I don't know much time to look at the implementation, but I can say some stuff about testing this.

Yes, right now we just have the high-level integration tests. But rules should not be tested in the same way. Check out the PHPStan\Testing\RuleTestCase and how are the rules tested in PHPStan. This way we can also have some feature/integration tests for levels, and also can check the failures.

@szepeviktor
Copy link
Collaborator

Very little remaining :) https://github.styleci.io/analyses/ADe979

@mfn
Copy link
Contributor

mfn commented Apr 21, 2020

Always remember: don't fiddle manually around with styleci, just download the patch from "the button"❗️
image

This is such a time safer and keeps the annoyance out :)

@szepeviktor
Copy link
Collaborator

Rain of arrows!

🌧️

Copy link
Collaborator

@canvural canvural left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this!! Looks great. I just have small nitpicks.

I'm also fine with enabling this by default 👍

src/Rules/NoUnnecessaryCollectionCallRule.php Outdated Show resolved Hide resolved
src/Rules/NoUnnecessaryCollectionCallRule.php Outdated Show resolved Hide resolved
tests/Rules/NoUnnecessaryCollectionCallRuleTest.php Outdated Show resolved Hide resolved
src/Rules/NoUnnecessaryCollectionCallRule.php Outdated Show resolved Hide resolved
src/Rules/NoUnnecessaryCollectionCallRule.php Outdated Show resolved Hide resolved
src/Rules/NoUnnecessaryCollectionCallRule.php Outdated Show resolved Hide resolved
src/Rules/NoUnnecessaryCollectionCallRule.php Show resolved Hide resolved
@Daanra Daanra requested a review from canvural April 22, 2020 18:33
Copy link
Collaborator

@canvural canvural left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good!

Just I'm not sure about having [NoUnnecessaryCollectionCallRule] in the error output. All the other PHPStan rules don't have anything like this. So I think we should follow the same style.

What everybody else thinks?

@BertvanHoekelen
Copy link
Contributor

❤️ this PR.

I think having [NoUnnecessaryCollectionCallRule] in the error output can come in handy as this rule's behaviour can be customised in the settings.

@Daanra
Copy link
Contributor Author

Daanra commented Apr 23, 2020

The reason why I prepended [NoUnnecessaryCollectionCallRule] to the error output is that it allows a developer to easily google and find the correct documentation concerning this rule.

When a developer only sees Called 'first' on collection, but could have been retrieved as a query., it might not be apparent what the actual issue is. They might have to dig through issues on GitHub to find the meaning behind this rule and how to configure it.

I think it might be a good idea to display the name of the rule in the error output for all of the rules that we add. I think that if we were to add more rules, it is quite likely that these rules will be a bit opinionated and for that reason, it is also likely that they will be configurable. Prepending the rule name just makes the life of a developer a bit easier (and we will probably have fewer developers opening issues here).

I do agree with you though that having the inconsistency with PHPStan's way of formatting rule errors is not that great.

@mfn
Copy link
Contributor

mfn commented Apr 23, 2020

PHPStan's way of formatting rule errors

Does an error formatter have a access to this information (phpstans options --error-format)? I'm just wondering if a custom formatter could solve this on a general level.

But OTOH, the vast majority of violations I had to deal with, the name of the rule didn't matter so this one, being configurable, truly stands out I guess.

@canvural
Copy link
Collaborator

@mfn I don't think PHPStan error formatters provide that information.

I also think displaying the rule name is a good idea. But there are a lot of PHPStan rules out there and as far as I can see none of them have it in the error message like that. Maybe it's PHPStan's job to display it with the errors.

What about changing error message to Called 'first' on Laravel collection, but could have been retrieved as a query or Called 'first' on Eloquent collection, but could have been retrieved as a query ? It'll be clear this error is related to Laravel, thus Larastan.

@Daanra
Copy link
Contributor Author

Daanra commented Apr 24, 2020

@canvural I removed the prefix from the error and changed the error message to your first suggestion. I'm guessing that should be clear enough.

Copy link
Collaborator

@canvural canvural left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed one last thing. If you can fix that it's good to go!

src/Rules/NoUnnecessaryCollectionCallRule.php Show resolved Hide resolved
@canvural
Copy link
Collaborator

@canvural I removed the prefix from the error and changed the error message to your first suggestion. I'm guessing that should be clear enough.

It was just a suggestion. But if no one objected, I guess we can go with that 😄 Thanks!

@canvural canvural changed the title [POC] Add rule for detecting expensive calls on a Collection Add rule for detecting expensive calls on a Collection Apr 24, 2020
@canvural canvural merged commit e0e141a into larastan:master Apr 24, 2020
@canvural
Copy link
Collaborator

Thank you again for this!

@Daanra
Copy link
Contributor Author

Daanra commented Apr 24, 2020

Thanks for taking the time and merging it! It's much appreciated. 😄

@deleugpn
Copy link
Contributor

I just got this on my CI complaining about a case where there's a pluck after a get. Truly amazing addition to Larastan. Thanks Daanra and the whole team of maintainers.
Larastan is coming up really nice even though Laravel isn't optimized for static analyses.

@nunomaduro
Copy link
Collaborator

@Daanra Do you have twitter? I would like to make a tweet about this, but can't find your twitter handle. :/

@Daanra
Copy link
Contributor Author

Daanra commented Apr 28, 2020

@nunomaduro Unfortunately, I do not have Twitter :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants