Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[6.0] Add the ability select a subquery using addSelect() #29567

Merged
merged 2 commits into from
Aug 16, 2019
Merged

[6.0] Add the ability select a subquery using addSelect() #29567

merged 2 commits into from
Aug 16, 2019

Conversation

reinink
Copy link
Contributor

@reinink reinink commented Aug 14, 2019

This PR adds the ability to select a sub query directly from the addSelect() method. This is currently only possible to do via the selectSub($query, $column) method, which has a few issues:

1. selectSub() has a less than ideal argument order

Since sub queries are written as closures, it's much nicer to have the column alias ($as) first. This is particularly important when you have longer, more complex sub queries.

With this change, the subquery is the second parameter: addSelect($as, $query)

2. selectSub() requires you to specify columns

When you add a sub query using selectSub() you must also now specify your columns (*), otherwise you'll only get the sub query column back, not all the columns on the table. This behaviour is somewhat confusing, and also adds extra work each time you use a sub select.

With this change, if you pass a subquery to addSelect() as the second argument, it will automatically add "table".* to the query if you haven't already explicitly specified any columns. This is inline with the existing withCount() behaviour:

if (is_null($this->query->columns)) {
$this->query->select([$this->query->from.'.*']);
}

3. selectSub() isn't consistent with the framework

As a general rule, the query builder already automatically treats closures and query builder instances as sub queries. For example, if you pass a closure to the second argument in a where() clause, it will be treated as a sub query:

User::where('id', function ($query) {
    $query->select(...);
});

However, currently, if you'd like to add a subquery select to your query, you have to use selectSub(). I believe making this possible via addSelect() is more natural, and more consistent with the framework in general.

Motivation

I think sub queries are an under-utilized feature in Laravel. I think we can improve both the API and also add more documentation around sub queries. For example, I have been using sub queries to create dynamic relationships. Consider the following example:

class User extends Model
{
    public function lastLogin()
    {
        return $this->belongsTo(Login::class);
    }

    public function scopeWithLastLogin($query)
    {
        $query->addSelect('last_login_id', Login::select('id')
            ->whereColumn('user_id', 'users.id')
            ->latest()
            ->limit(1)
        )->with('lastLogin');
    }
}

$users = User::withLastLogin()->get();
$user->lastLogin->created_at;

/*
[
  [
    "id" => 1
    "first_name" => "Clark"
    "last_name" => "Kent"
    "email" => "clark@krypton.io"
    "last_login_id" => 666666666666
  ]
]
*/

Future considerations

This PR adds this functionality in the simplest possible way. If the addSelect() method receives two arguments, and the first is a string, and the second is a valid subquery, then it treats it as a subquery.

However, the addSelect() method currently supports both arrays and multiple arguments as well. I could see this method being extended in the future to also allow that type of behaviour. For example:

$users = User::select(['last_name', 'last_login_at' => Login::select('created_at')
    ->whereColumn('user_id', 'users.id')
    ->latest()
    ->limit(1)
]);

However, to keep things simple, I think it's better to just start here.

@staudenmeir
Copy link
Contributor

staudenmeir commented Aug 14, 2019

I think it makes more sense to adjust selectSub() (or selectRaw()) and add if (...) { $this->select($this->from.'.*'); } there. Then withCount() wouldn't have to select the default columns itself.

If we adjust addSelect(), we now have two ways of selecting subqueries that behave slightly differently and expect the parameters in a different order.

I also don't think it's a good idea to have one method with the parameter order $as, $query, while other methods like fromSub() and joinSub() use $query, $as.

@reinink
Copy link
Contributor Author

reinink commented Aug 14, 2019

I think it makes more sense to adjust selectSub() (or selectRaw()) and add if (...) { $this->select($this->from.'.*'); } there. Then withCount() wouldn't have to select the default columns itself.

Yeah not opposed, but wouldn't that be a breaking change? Anyone using selectSub() directly right now would experience a change in behaviour (ie. getting all columns instead of just the sub query column).

If we adjust addSelect(), we now have two ways of selecting subqueries that behave slightly differently and expect the parameters in a different order.

I agree, having different subquery methods with different behaviour isn't ideal. But neither is the current implementation of selectSub(). So, how do we make improvements to the API, without creating breaking changes? This felt like the most pragmatic approach.

I also don't think it's a good idea to have one method with the parameter order $as, $query, while other methods like fromSub() and joinSub() use $query, $as.

Good point. We could update fromSub() and joinSub() to support both argument orders. I feel like $as comes second because that's how you do it in SQL, but practically speaking in PHP it doesn't work out nearly as nice. Thoughts?

@staudenmeir
Copy link
Contributor

Yeah not opposed, but wouldn't that be a breaking change? Anyone using selectSub() directly right now would experience a change in behaviour (ie. getting all columns instead of just the sub query column).

It would be a breaking change, but it should only affect very few people and I think the new behavior makes much more sense.

Are there use cases where you would want to select only a subquery without any other columns?

@reinink
Copy link
Contributor Author

reinink commented Aug 14, 2019

It would be a breaking change, but it should only affect very few people and I think the new behavior makes much more sense.

Right, so maybe this is an appropriate change for 6.x?

Are there use cases where you would want to select only a subquery without any other columns?

Yes, I think it's possible. My thinking what that you could still use selectSub() to accomplish this. But obviously that won't work if we change things. I think we'd need to add support for subqueries to select() as well then.

@GrahamCampbell GrahamCampbell changed the title Add the ability add a sub query select [6.0] Add the ability add a sub query select Aug 14, 2019
@reinink reinink changed the title [6.0] Add the ability add a sub query select [6.0] Add the ability select a subquery using addSelect() Aug 14, 2019
@reinink
Copy link
Contributor Author

reinink commented Aug 15, 2019

@staudenmeir Hey! What if we tried to fix the more core issue here, which is that addSelect() actually does select() behaviour if no columns were previous assigned.

I've started hacking on a PR for this and would love to get your thoughts:

#29588

Obviously some breaking changes here, but I think they won't be that bad, and might be a worthwhile change for 6.0.

Thoughts?

@reinink
Copy link
Contributor Author

reinink commented Aug 15, 2019

Okay folks, I've reworked this PR and I think it's now a lot better. 🎉

First off, I have not changed subSelect() at all. At first I thought about adding the table.* logic here, but in the end it didn't actually make sense there. The reason is because when we make a select() query with a subquery, we don't want the table.* added. However, it's a very safe assumption that when we use addSelect(), we do want table.* added. I almost see subSelect() as a private method that's (now) used by select() and addSelect().

Second, I removed the two argument option for addSelect(). In the end I agreed with @staudenmeir that it felt odd for one method to have $as first, with all the other subquery methods having this argument come second. Instead, I added keyed array syntax for this. This is similar to how with() works in Laravel. For example:

$users = User::select(['id', 'name', 'last_login_at' => function ($query) {
    $query->select('created_at')
        ->from('logins')
        ->whereColumn('user_id', 'users.id')
        ->latest()
        ->limit(1);
}])->get();

You can also use this syntax with addSelect(). However, with addSelect(), it will automatically add table.* should no other columns be specified yet.

$users = User::addSelect(['last_login_at' => function ($query) {
    $query->select('created_at')
        ->from('logins')
        ->whereColumn('user_id', 'users.id')
        ->latest()
        ->limit(1);
}])->get();

What's also cool about this approach over my first implementation is that you can technically add multiple subqueries at a time, or a mix of regular columns and subqueries.

It's important to note here that there are zero breaking changes with this PR (at least from what I can tell).

$this->columns = array_merge((array) $this->columns, $column);
$this->selectSub($column, $as);
} else {
$this->columns = array_merge((array) $this->columns, [$column]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$this->columns = array_merge((array) $this->columns, [$column]);
$this->columns[] = $column;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm yup, so obvious. I did that in select(). Just missed it here. Thanks!

$columns = is_array($columns) ? $columns : func_get_args();

foreach ($columns as $as => $column) {
if (is_string($as) && (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's a case where the key is a string but the column is not a subquery?

Someone using select() with an associative array (for whatever reason)? $query->select(['foo' => 'bar'])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question.

Biggest reason I am doing the checks is for safety. My concern was that without the additional checks, someone may accidentally send user provided data as the value ($column) with a named key, and then it would hit the subquery logic. So, for now, to be extra safe, I check to make sure it's either a closure or query builder instance.

That said, in the future we could potentially use this as a way of doing aliasing for all columns, not only subqueries. However, that was outside of the scope of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aliasing "normal" columns this way is a great idea. I think we should consider it for this PR since we are already working in the code.

@dillingham
Copy link
Contributor

dillingham commented Aug 16, 2019

woohoo! to array handling

Having both the two parameter and array syntax approach might be nice

$lastLoginAt = Login::select('created_at')->latest()->limit(1);

User::query()
    ->select($lastLoginAt, 'last_login_at')
    ->orderBy($lastLoginAt)
    ->get();

Also, like withCount, maybe an alias can be assumed.. to avoid a 2nd parameter

$user = User::addSelect($lastLoginAt)->orderBy($lastLoginAt)->first();

$user->logins_created_at 
$query->from . '_' . implode('_', $query->columns);

@staudenmeir
Copy link
Contributor

@reinink I think that's a great approach!

@reinink
Copy link
Contributor Author

reinink commented Aug 16, 2019

@staudenmeir Awesome! Thanks for your input. 💪

@dillingham Yeah I considered that, but I think that could be added as a subsequent PR. To be honest, I actually prefer the array format—it feel like it's more inline with the "Laravel way". And the auto-aliasing is neat, but feels unnecessary.

@dillingham
Copy link
Contributor

dillingham commented Aug 16, 2019

I see your point. If from($query, ‘as’) happens it might be worth revisiting for parallelism.

Edit: Not sure if orderBy requires the array, but if not then that’s another opportunity to synchronize

@reinink
Copy link
Contributor Author

reinink commented Aug 16, 2019

I see your point. If from($query, ‘as’) happens it might be worth revisiting for parallelism.

Agreed.

@reinink
Copy link
Contributor Author

reinink commented Aug 16, 2019

@driesvints Can you restart the Travis build? I think that fail was a fluke. 🙏

@driesvints
Copy link
Member

@reinink done. I really need to have a look at that at some point 🤔

@taylorotwell taylorotwell merged commit c23bb6d into laravel:master Aug 16, 2019
@iraklisg
Copy link

iraklisg commented Nov 5, 2019

Is the ability to select a subquery using select() with keyed array syntax described somewhere in the documentation, because I cannot find something relevant.
If not, maybe we could update it 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants