[8.x] Add one-of-many relationship (inner join) #37362

Merged
merged 43 commits into from May 17, 2021

Contributor

@cbl cbl commented May 13, 2021

Background

This PR provides a way to define one-to-one relations that are a subset of a one-to-many relation.

These are: hasOne ∈ hasMany, morphOne ∈ morphMany

(Especially useful for event sourcing approaches)

This PR solves the same problem as #37252, but uses an inner join instead of a subselect.

Stack Overflow has a greatest-n-per-group tag that covers this kind of query.

Problem

To explain the problem I use the latest_login example. A user has many logins but only one latest_login (latest_login ∈ logins). One might think that this can be built as follows:

public function latest_login()
{
    return $this->hasOne(Login::class)->orderByDesc('id');
}

However this does not work for several reasons.

1. This creates an n+1 problem when eager loading the relationship.

If we take a look at the query that is executed when eager loading the relationship we see that all logins for the given users are loaded where we only need one for each user:

select * from "logins" where "logins"."user_id" in (?) order by "id" desc

Adding ->limit(1) does not solve this: it loads a single login in total, whereas we need the latest login of each of the given users:

select * from "logins" where "logins"."user_id" in (?) order by "id" desc limit 1
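
The two eager-loading queries above can be tried against a small in-memory table. This is a minimal sketch using Python's sqlite3 module rather than Laravel; the table layout and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)")
# Assumed sample data: users 1 and 2 each have three logins.
conn.executemany(
    "INSERT INTO logins (id, user_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 2), (5, 1), (6, 2)],
)

# Eager-load query without a limit: every login of both users is fetched.
all_rows = conn.execute(
    "SELECT id, user_id FROM logins WHERE user_id IN (1, 2) ORDER BY id DESC"
).fetchall()

# Same query with LIMIT 1: one row for the whole batch of users.
limited_rows = conn.execute(
    "SELECT id, user_id FROM logins WHERE user_id IN (1, 2) ORDER BY id DESC LIMIT 1"
).fetchall()

print(len(all_rows))  # 6 rows are loaded although only 2 are needed
print(limited_rows)   # [(6, 2)], so user 1 ends up without a latest login
```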

2. Querying does not work as expected.

  • e.g.: latest_login()->get() gets all logins (Same problem with count, ...)
  • e.g.: latest_login()->is($firstLogin) returns true even if there are newer logins.
  • ...

The Solution

Filter the intersection of the relation joined with itself:

SELECT *
FROM `logins`
INNER JOIN (
    SELECT MAX(id) AS id
    FROM logins
    GROUP BY logins.user_id
) AS latest_login 
ON latest_login.id = logins.id
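
To see the effect of this join, here is a minimal sketch that runs the same query shape with Python's sqlite3 module. The schema and rows are assumed sample data, and the explicit column list and ORDER BY are added only to make the output easy to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO logins (id, user_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 2), (5, 1), (6, 2)],
)

# The subselect yields one MAX(id) per user; the join keeps only those rows.
latest = conn.execute(
    """
    SELECT logins.id, logins.user_id
    FROM logins
    INNER JOIN (
        SELECT MAX(id) AS id
        FROM logins
        GROUP BY logins.user_id
    ) AS latest_login
    ON latest_login.id = logins.id
    ORDER BY logins.user_id
    """
).fetchall()

print(latest)  # [(5, 1), (6, 2)]: exactly one row per user
```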

This way only the needed models are eager loaded, and all relation-related Eloquent methods return the expected result:

$user->latest_login()->is($login);
$user->latest_login()->isNot($login);
$user->latest_login()->where('foo', 'bar')->first();

User::with('latest_login')->get();
User::whereHas('latest_login')->get();

// Only users whose latest login was on an iPhone.
User::whereHas('latest_login', fn($q) => $q->whereDevice("iphone"))->get();

Examples

The repository cbl/laravel-one-of-many contains all the examples described below, so they can be tried out locally.

The following describes various use cases and how they are built:

latest_login/first_login

public function latest_login()
{
    return $this->hasOne(Login::class)->ofMany('id', 'max'); // id and max are default
}
public function first_login()
{
    return $this->hasOne(Login::class)->ofMany('id', 'min');
}

payment_state

This example is an event sourced state. All state changes are stored in the states table and the payment_state relation of the model is the latest created state with the type payment_state.

public function payment_state()
{
    return $this->hasOne(State::class)->ofMany(
        ['id' => 'max'],
        function ($q) {
            $q->where('type', 'payment_state');
        }
    );
}

The constraint for `type` needs to be added to the inner join subselect; otherwise the subselect could pick a higher id that belongs to another type:

SELECT *
FROM `states`
INNER JOIN (
    SELECT MAX(id) AS id
    FROM states
    WHERE type = 'payment_state'
    GROUP BY states.model_id
) AS state 
ON state.id = states.id
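
The difference between constraining inside the subselect and constraining only the outer query can be checked with a small sketch. It uses Python's sqlite3 module; the states rows and the shipping_state type are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE states (id INTEGER PRIMARY KEY, model_id INTEGER NOT NULL, type TEXT NOT NULL)"
)
# Assumed data: model 1's newest row has another type, model 2 has two payment states.
conn.executemany(
    "INSERT INTO states (id, model_id, type) VALUES (?, ?, ?)",
    [(1, 1, "payment_state"), (2, 1, "shipping_state"),
     (3, 2, "payment_state"), (4, 2, "payment_state")],
)

# Constraint inside the subselect: MAX(id) is taken over payment states only.
constrained_inside = conn.execute(
    """
    SELECT states.id, states.model_id
    FROM states
    INNER JOIN (
        SELECT MAX(id) AS id FROM states
        WHERE type = 'payment_state'
        GROUP BY states.model_id
    ) AS state ON state.id = states.id
    ORDER BY states.id
    """
).fetchall()

# Constraint only on the outer query: model 1's MAX(id) is the shipping row,
# which the outer WHERE then discards, so model 1 gets no payment state.
constrained_outside = conn.execute(
    """
    SELECT states.id, states.model_id
    FROM states
    INNER JOIN (
        SELECT MAX(id) AS id FROM states
        GROUP BY states.model_id
    ) AS state ON state.id = states.id
    WHERE states.type = 'payment_state'
    ORDER BY states.id
    """
).fetchall()

print(constrained_inside)   # [(1, 1), (4, 2)]
print(constrained_outside)  # [(4, 2)]
```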

As a morphOne:

public function payment_state()
{
    return $this->morphOne(State::class, 'stateful')->ofMany(
        ['id' => 'max'],
        function ($q) {
            $q->where('type', 'payment_state');
        }
    );
}

The resulting SQL query:

SELECT *
FROM `states`
INNER JOIN (
    SELECT MAX(id) AS id
    FROM states
    WHERE type = 'payment_state'
    GROUP BY states.stateful_id, states.stateful_type
) AS state 
ON state.id = states.id
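
Why the morph variant groups by both stateful_id and stateful_type can be illustrated with a short sketch. It uses Python's sqlite3 module; the order and invoice parent types, the sample rows, and the latest_ids helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE states (id INTEGER PRIMARY KEY, "
    "stateful_id INTEGER NOT NULL, stateful_type TEXT NOT NULL)"
)
# Assumed data: an order and an invoice that happen to share the id 1.
conn.executemany(
    "INSERT INTO states (id, stateful_id, stateful_type) VALUES (?, ?, ?)",
    [(1, 1, "order"), (2, 1, "invoice"), (3, 1, "order")],
)

def latest_ids(group_by):
    """Return the ids kept by the inner join for a given GROUP BY list."""
    sql = f"""
        SELECT states.id
        FROM states
        INNER JOIN (
            SELECT MAX(id) AS id FROM states GROUP BY {group_by}
        ) AS state ON state.id = states.id
        ORDER BY states.id
    """
    return [row[0] for row in conn.execute(sql)]

by_id_and_type = latest_ids("states.stateful_id, states.stateful_type")
by_id_only = latest_ids("states.stateful_id")

print(by_id_and_type)  # [2, 3]: one latest state per parent model
print(by_id_only)      # [3]: the invoice's latest state is lost
```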

price

This example contains an event sourced price for a product. The price for the product is the latest published price with the maximum id, so it depends on multiple columns:

public function price()
{
    return $this->hasOne(Price::class)->ofMany([
        'published_at' => 'max',
        'id' => 'max'
    ], function($q) {
        $q->where('published_at', '<', now());
    });
}

The difficulty with this use case is that the first column, published_at, is not unique. When several rows share the latest published_at, we want the one with the maximum id. To solve this, nested inner join clauses are added for each column and its associated aggregate.

First inner join:

INNER JOIN (
    SELECT MAX(published_at) AS published_at, prices.product_id
    FROM prices
    WHERE published_at < '2021-05-13 17:00:32'
    GROUP BY prices.product_id
) AS price
ON price.published_at = prices.published_at AND price.product_id = prices.product_id

Second inner join:

INNER JOIN (
    SELECT MAX(id) AS id, prices.product_id
    FROM prices
    GROUP BY prices.published_at
) AS price
ON price.id = prices.id AND price.product_id = prices.product_id

The following query combines both inner joins and loads the row with the latest published_at and, among ties, the maximum id:

SELECT *
FROM `prices`
INNER JOIN (
    SELECT
    MAX(id) AS id, prices.product_id
    FROM prices
    INNER JOIN (
        SELECT MAX(published_at) AS published_at, prices.product_id
        FROM prices
        WHERE published_at < '2021-05-13 17:00:32'
        GROUP BY prices.product_id
    ) AS price
    ON price.published_at = prices.published_at AND price.product_id = prices.product_id
    GROUP BY prices.published_at
) AS price
ON price.id = prices.id AND price.product_id = prices.product_id
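
The two-level tie-break can be tried out with the following sketch, which uses Python's sqlite3 module and assumed sample rows. Note one deliberate deviation: both levels group by product_id here (as in the nested query logged near the end of this thread), not by published_at as in the query above, so the sketch shows the technique rather than the exact SQL this PR generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL, "
    "published_at TEXT NOT NULL, amount INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO prices (id, product_id, published_at, amount) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2021-05-01 00:00:00", 100),
        (2, 1, "2021-05-10 00:00:00", 110),
        (3, 1, "2021-05-10 00:00:00", 120),  # ties with id 2 on published_at
        (4, 1, "2021-06-01 00:00:00", 130),  # not yet published
        (5, 2, "2021-05-02 00:00:00", 200),
        (6, 2, "2021-04-01 00:00:00", 190),  # higher id, older publication
    ],
)

now = "2021-05-13 17:00:32"
current_prices = conn.execute(
    """
    SELECT prices.id, prices.product_id, prices.amount
    FROM prices
    INNER JOIN (
        SELECT MAX(prices.id) AS id, prices.product_id
        FROM prices
        INNER JOIN (
            SELECT MAX(published_at) AS published_at, prices.product_id
            FROM prices
            WHERE published_at < ?
            GROUP BY prices.product_id
        ) AS latest
        ON latest.published_at = prices.published_at
           AND latest.product_id = prices.product_id
        GROUP BY prices.product_id
    ) AS price
    ON price.id = prices.id AND price.product_id = prices.product_id
    ORDER BY prices.product_id
    """,
    (now,),
).fetchall()

print(current_prices)  # [(3, 1, 120), (5, 2, 200)]
```

Product 1 resolves the published_at tie in favour of the higher id, while product 2 keeps id 5 even though id 6 is higher, because id 6 has an older published_at.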

How It Works

This section explains how it works under the hood.

Building Inner Joins

This is where the nested joins are built:

foreach ($columns as $column => $aggregate) {
    $groupBy = isset($previous) ? $previous['column'] : $this->foreignKey;
    $sub = $this->newSubQuery($groupBy, $column, $aggregate);
    if (isset($previous)) {
        $this->addJoinSub($sub, $previous['sub'], $previous['column']);
    } elseif (isset($closure)) {
        $closure($sub);
    }
    if (array_key_last($columns) == $column) {
        $this->addJoinSub($this->query, $sub, $column);
    }
    $previous = [
        'sub' => $sub,
        'column' => $column,
    ];
}
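
The nesting idea behind this loop can be sketched outside the framework. The following Python snippet is an illustration only: build_one_of_many_sql is a hypothetical helper, identifiers are interpolated without escaping, the closure step is left out, and every level groups by the foreign key (an assumption of this sketch, unlike the loop above, which groups later levels by the previous column).

```python
import sqlite3

def build_one_of_many_sql(table, foreign_key, columns, alias="one_of_many"):
    """Nest one aggregate subselect per column, innermost first (hypothetical helper)."""
    previous_sql = None
    previous_column = None
    for column, aggregate in columns.items():
        # Aggregate the current column per foreign key.
        sub = (f"SELECT {aggregate}({table}.{column}) AS {column}, "
               f"{table}.{foreign_key} FROM {table}")
        if previous_sql is not None:
            # Restrict this level to the rows that won the previous level.
            sub += (f" INNER JOIN ({previous_sql}) AS {alias}"
                    f" ON {alias}.{previous_column} = {table}.{previous_column}"
                    f" AND {alias}.{foreign_key} = {table}.{foreign_key}")
        sub += f" GROUP BY {table}.{foreign_key}"
        previous_sql, previous_column = sub, column
    # Join the outermost subselect onto the relation query itself.
    return (f"SELECT {table}.* FROM {table}"
            f" INNER JOIN ({previous_sql}) AS {alias}"
            f" ON {alias}.{previous_column} = {table}.{previous_column}"
            f" AND {alias}.{foreign_key} = {table}.{foreign_key}")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL, "
    "published_at TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO prices (id, product_id, published_at) VALUES (?, ?, ?)",
    [(1, 1, "2021-05-01"), (2, 1, "2021-05-10"), (3, 1, "2021-05-10"),
     (4, 2, "2021-05-02"), (5, 2, "2021-04-01")],
)

sql = build_one_of_many_sql("prices", "product_id", {"published_at": "MAX", "id": "MAX"})
selected_ids = sorted(row[0] for row in conn.execute(sql))
print(selected_ids)  # [3, 4]
```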

The first inner join is grouped by the foreign key, the following inner joins are grouped by the previous aggregate:

foreach ($columns as $column => $aggregate) {
    $groupBy = isset($previous) ? $previous['column'] : $this->foreignKey;
    $sub = $this->newSubQuery($groupBy, $column, $aggregate);

The closure always receives the subselect for the first aggregate from the array:

    if (isset($previous)) {
        $this->addJoinSub($sub, $previous['sub'], $previous['column']);
    } elseif (isset($closure)) {
        $closure($sub);
    }

Existence Queries

For existence queries to work, the join must be added in getRelationExistenceQuery:

public function getRelationExistenceQuery(Builder $query, Builder $parentQuery, $columns = ['*'])
{
    if (! $this->isOneOfMany()) {
        return parent::getRelationExistenceQuery($query, $parentQuery, $columns);
    }
    $query->getQuery()->joins = $this->query->getQuery()->joins;

CompareRelatedModels

Since these relations are a subset of a has-many relation, checking that the keys match is not enough. An existence query must confirm that the given model is actually the one the relation selects:

public function is($model)
{
    return ! is_null($model) &&
        $this->compareKeys($this->getParentKey(), $this->getRelatedKeyFrom($model)) &&
        $this->related->getTable() === $model->getTable() &&
        $this->related->getConnectionName() === $model->getConnectionName() &&
        $this->compareOneOfMany($model);
}

protected function compareOneOfMany($model)
{
    if (! $this instanceof PartialRelation) {
        return true;
    }
    if (! $this->isOneOfMany()) {
        return true;
    }
    return $this->query
        ->whereKey($model->getKey())
        ->exists();
}
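
The reason a key comparison alone is insufficient can be shown with a small sketch. It uses Python's sqlite3 module; is_latest_login is a hypothetical stand-in for the existence check, and the rows are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO logins (id, user_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 2), (5, 1), (6, 2)],
)

def is_latest_login(conn, user_id, login_id):
    """Existence check: is this login the row the one-of-many join selects?"""
    row = conn.execute(
        """
        SELECT EXISTS (
            SELECT 1
            FROM logins
            INNER JOIN (
                SELECT MAX(id) AS id FROM logins GROUP BY logins.user_id
            ) AS latest_login ON latest_login.id = logins.id
            WHERE logins.user_id = ? AND logins.id = ?
        )
        """,
        (user_id, login_id),
    ).fetchone()
    return bool(row[0])

# Login 1 belongs to user 1 (the foreign key matches) but is not the latest.
print(is_latest_login(conn, 1, 1))  # False
print(is_latest_login(conn, 1, 5))  # True
print(is_latest_login(conn, 1, 6))  # False: login 6 belongs to user 2
```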

Tests

The integration test checks that the n+1 problem is solved for eager loading and that only the required models are loaded:

public function testItOnlyEagerLoadsRequiredModels()
{
    $this->retrievedLogins = 0;
    User::getEventDispatcher()->listen('eloquent.retrieved:*', function ($event, $models) {
        foreach ($models as $model) {
            if (get_class($model) == Login::class) {
                $this->retrievedLogins++;
            }
        }
    });
    $user = User::create();
    $user->latest_login()->create();
    $user->latest_login()->create();
    $user = User::create();
    $user->latest_login()->create();
    $user->latest_login()->create();
    User::with('latest_login')->get();
    $this->assertSame(2, $this->retrievedLogins);
}

Furthermore, the behavior of all use cases described above is tested here:

https://github.com/laravel/framework/blob/571db720abbb41c3631577037039c89d85b54be0/tests/Database/DatabaseEloquentHasOneOfManyTest.php

Member

@taylorotwell taylorotwell commented May 13, 2021

So... which PR am I supposed to review?

Contributor Author

@cbl cbl commented May 13, 2021

@taylorotwell Both PRs solve the same problem in different ways, so it is up to you which one you prefer. However, this solution is much faster and less complex, which makes it better for users and maintainers.

I actually think that the other solution offers few advantages, so I think you should review this pr.

Member

@taylorotwell taylorotwell commented May 14, 2021

How would this be ported to support MorphOne as well?

Member

@taylorotwell taylorotwell commented May 14, 2021

I've pushed up a handful of formatting fixes - removed methods that were never called at all (tests still pass when I remove them, so I assume not needed).

Contributor Author

@cbl cbl commented May 14, 2021

How would this be ported to support MorphOne as well?

The query would look like this:

SELECT *
FROM `states`
INNER JOIN (
    SELECT MAX(id) AS id
    FROM states
    GROUP BY states.stateful_id, states.stateful_type
) AS state 
ON state.id = states.id

For examples that do not use the key as the aggregate, the morph id and type need to be added to the ON constraints:

...
INNER JOIN (
    SELECT MAX(foo) AS foo
    ...
    GROUP BY states.stateful_id, states.stateful_type
) AS state 
ON state.foo = states.foo AND state.stateful_id = states.stateful_id AND state.stateful_type = states.stateful_type

An abstract function that builds the GROUP BY for subselects, and one that builds the ON constraints for inner joins, can be added to the CanBeOneOfManyTrait, so the different implementations can be built in the respective relation classes.

Contributor

@sblawrie sblawrie commented May 20, 2021

@cbl I changed your fiddle so that one of the drafts has a different page count, and it doesn't work. See here: http://sqlfiddle.com/#!9/cef514/1/0.

By the way, my suggested fix will also fix @dianfishekqi's issue above.


@Jefemy Jefemy commented May 21, 2021

I'm trying to implement this and running into an issue.

Using latestOfMany with eager loading, I get a query like this:

SELECT *
FROM   `forum_posts`
       INNER JOIN (SELECT Max(id) AS id,
                          `forum_posts`.`thread_id`
                   FROM   `forum_posts`
                   GROUP  BY `forum_posts`.`thread_id`) AS `latestPost`
               ON `latestPost`.`id` = `forum_posts`.`id`
                  AND `latestPost`.`thread_id` = `forum_posts`.`thread_id`
WHERE  `forum_posts`.`thread_id` IN ( 4794, 4797, 4811, 4816 )

It looks fine and works, but the inner join appears to compute the max over every row, which gets very inefficient for large tables.

I believe it should generate something similar to:

SELECT *
FROM   `forum_posts`
       INNER JOIN (SELECT Max(id) AS id,
                          `forum_posts`.`thread_id`
                   FROM   `forum_posts`
                   WHERE  `forum_posts`.`thread_id` IN
                          ( 4794, 4797, 4811, 4816 )
                   GROUP  BY `forum_posts`.`thread_id`) AS `latestPost`
               ON `latestPost`.`id` = `forum_posts`.`id`
                  AND `latestPost`.`thread_id` = `forum_posts`.`thread_id`
WHERE  `forum_posts`.`thread_id` IN ( 4794, 4797, 4811, 4816 ) 

This query runs much faster on my local database and returns the same data
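
The claim that both queries return the same data can be checked with a minimal sketch. It uses Python's sqlite3 module with assumed sample rows (thread 9000 is made up), and it says nothing about performance, which depends on the database engine, indexes and table size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, thread_id INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO forum_posts (id, thread_id) VALUES (?, ?)",
    [(1, 4794), (2, 4797), (3, 4794), (4, 9000), (5, 4797), (6, 9000)],
)

TEMPLATE = """
    SELECT forum_posts.id, forum_posts.thread_id
    FROM forum_posts
    INNER JOIN (
        SELECT MAX(id) AS id, forum_posts.thread_id
        FROM forum_posts
        {inner_where}
        GROUP BY forum_posts.thread_id
    ) AS latestPost
    ON latestPost.id = forum_posts.id
       AND latestPost.thread_id = forum_posts.thread_id
    WHERE forum_posts.thread_id IN (4794, 4797)
    ORDER BY forum_posts.id
"""

# Subselect aggregates over every thread in the table.
unconstrained = conn.execute(TEMPLATE.format(inner_where="")).fetchall()

# Subselect aggregates only over the threads being eager loaded.
constrained = conn.execute(
    TEMPLATE.format(inner_where="WHERE forum_posts.thread_id IN (4794, 4797)")
).fetchall()

print(unconstrained)  # [(3, 4794), (5, 4797)]
print(constrained)    # [(3, 4794), (5, 4797)]
```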

Contributor Author

@cbl cbl commented May 21, 2021

@Jefemy This is WIP; see #37431 (comment)


@Sergiobop Sergiobop commented May 21, 2021

Hey, first of all, awesome job!

However, when I try the feature on pgsql with UUIDs as primary keys, I get this error:

Illuminate\Database\QueryException: SQLSTATE[42883]: Undefined function: 7 ERROR: function max(uuid) does not exist

Contributor Author

@cbl cbl commented May 21, 2021

@Sergiobop seems like max is not available for your PostgreSQL version.


@dianfishekqi dianfishekqi commented May 21, 2021

Hey, first of all, awesome job!

However, when I try the feature on pgsql with UUIDs as primary keys, I get this error:

Illuminate\Database\QueryException: SQLSTATE[42883]: Undefined function: 7 ERROR: function max(uuid) does not exist

@Sergiobop The max function has no implementation for uuid, so MAX(uuid) fails. You can use it on a numeric or datetime column.

Contributor Author

@cbl cbl commented May 21, 2021

@Sergiobop @dianfishekqi

Regarding UUIDs, they are sortable by using (string) Str::orderedUuid() as your model UUIDs instead of the typical (string) Str::uuid().
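
The idea behind an ordered identifier can be sketched as follows. This Python snippet does not reproduce Laravel's Str::orderedUuid() format; ordered_id is a hypothetical helper that only shows why a fixed-width time prefix makes string comparison, and therefore a MAX over text, follow creation order. It does not change how PostgreSQL treats a column of type uuid.

```python
import uuid

def ordered_id(timestamp_ms: int) -> str:
    """Hypothetical helper: fixed-width hex timestamp first, random suffix second."""
    return f"{timestamp_ms:012x}-{uuid.uuid4().hex[:20]}"

# Three ids created at increasing (assumed) millisecond timestamps.
ids = [ordered_id(ts) for ts in (1621600000000, 1621600000001, 1621600005000)]

# Because the prefix has a fixed width, lexicographic order equals time order.
is_sorted = ids == sorted(ids)
newest = max(ids)

print(is_sorted)          # True
print(newest == ids[-1])  # True
```

With fully random identifiers such as uuid.uuid4(), the maximum value has no relation to which row was created last, which is why an aggregate over them cannot stand in for "latest".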


@Sergiobop Sergiobop commented May 21, 2021

@dianfishekqi Nope, I'm doing this in my Model.php (I'm not even using the id, I'm using another column of type 'datetime'):

public function current_other_model(): HasOne
{
    return $this->hasOne(OtherModel::class)->ofMany(['date' => 'max']);
}

And I get the same error. (I tried ->ofMany('date', 'max') too.)

I had to write my own aggregate to get past the error (maybe some docs should be added; novice pgsql users who use UUIDs as primary keys will have to deal with this).

But now I have this:

Grouping error: 7 ERROR: column "other_models.model_id" must appear in the GROUP BY clause or be used in an aggregate function

Maybe this error? #37362 (comment)


@dianfishekqi dianfishekqi commented May 21, 2021

@Sergiobop #37436 should fix the error.


if (! array_key_exists($keyName, $columns)) {
    $columns[$keyName] = 'MAX';
}


ibrasho May 23, 2021
Contributor

Is this needed for proper functionality? It doesn't allow this feature to be used with UUID columns.

Contributor Author

@cbl cbl commented May 23, 2021

@ibrasho A unique column must be added, otherwise duplicate values would lead to unwanted behaviour.

Contributor

@ibrasho ibrasho commented May 24, 2021

The only way to get this to work with UUID in Postgres specifically is to cast the column type to text. I can't do this without overriding CanBeOneOfMany::ofMany.

The line that force-adds the key column of the model makes this impossible.

I think a better approach than forcing the ID to be used could help solve this problem.

Contributor Author

@cbl cbl commented May 24, 2021

@ibrasho Can you please open an issue, giving a detailed description including a sql-fiddle and the code that fixes the issue for you? Then I can provide a fix with a test.


@gthedev gthedev commented May 24, 2021

When I try to use this with a date column, $this->hasOne(Test::class)->latestOfMany('taken_at'), I get an error:

SQLSTATE[42000]: Syntax error or access violation: 1055 Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'dbname.tests.user_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

Is this expected behaviour when sql_mode=only_full_group_by?

Contributor Author

@cbl cbl commented May 24, 2021

@gthedev Yes, this is expected; you need to disable only_full_group_by in that case.

Contributor

@ibrasho ibrasho commented May 25, 2021

Here is the currently failing query. To fix it, some values need to be cast:

select * from "product_sale_prices" 
inner join (
-  select MAX(id) as id , "product_sale_prices"."product_id"
+  select MAX(id::text) as id , "product_sale_prices"."product_id"
  from "product_sale_prices"
  group by "product_sale_prices"."product_id"
) as "salePrice"
-  on "salePrice"."id" = "product_sale_prices"."id"
+  on "salePrice"."id" = "product_sale_prices"."id"::text
    and "salePrice"."product_id" = "product_sale_prices"."product_id"
where "product_sale_prices"."product_id" = '8d90f3a6-1bb8-4576-b553-c146465974c0'
  and "product_sale_prices"."product_id" is not null
limit 1
Contributor

@ibrasho ibrasho commented May 25, 2021

I found a dirty solution that might lead to some better ideas.

The following changes in Illuminate\Database\Eloquent\Relations\Concerns\CanBeOneOfMany make this work with non-int key types:

    /**
     * Get a new query for the related model, grouping the query by the given column, often the foreign key of the relationship.
     *
     * @param  string|array  $groupBy
     * @param  string|null  $column
     * @param  string|null  $aggregate
     * @return \Illuminate\Database\Eloquent\Builder
     */
    protected function newOneOfManySubQuery($groupBy, $column = null, $aggregate = null)
    {
        $subQuery = $this->query->getModel()
            ->newQuery();

        foreach (Arr::wrap($groupBy) as $group) {
            $subQuery->groupBy($this->qualifyRelatedColumn($group));
        }

        if (! is_null($column)) {
+            $alias = $column;
+            if ($this->getParent()->hasCast($column, ['string'])) {
+                $column = "$column::text";
+            }
+
+            $subQuery->selectRaw($aggregate.'('.$column.') as '.$alias);
-            $subQuery->selectRaw($aggregate.'('.$column.') as '.$column);
        }

        $this->addOneOfManySubQueryConstraints($subQuery, $groupBy, $column, $aggregate);

        return $subQuery;
    }

    /**
     * Add the join subquery to the given query on the given column and the relationship's foreign key.
     *
     * @param  \Illuminate\Database\Eloquent\Builder  $parent
     * @param  \Illuminate\Database\Eloquent\Builder  $subQuery
     * @param  string  $on
     * @return void
     */
    protected function addOneOfManyJoinSubQuery(Builder $parent, Builder $subQuery, $on)
    {
        $parent->joinSub($subQuery, $this->relationName, function ($join) use ($on) {
+            $subselectColumn = $this->qualifySubSelectColumn($on);
+            $relatedColumn = $this->qualifyRelatedColumn($on);
+
+            if ($this->getParent()->hasCast($on, ['string'])) {
+                $relatedColumn = new Expression($this->newQuery()->getGrammar()->wrap($relatedColumn).'::text');
+            }
+
+            $join->on($subselectColumn, '=', $relatedColumn);
-            $join->on($this->qualifySubSelectColumn($on), '=', $this->qualifyRelatedColumn($on));

            $this->addOneOfManyJoinSubQueryConstraints($join, $on);
        });
    }
Contributor Author

@cbl cbl commented May 26, 2021

@ibrasho this probably only works for certain versions of PostgreSQL 🤔


@tomwelch tomwelch commented May 26, 2021

Hi @cbl

Thanks for this feature, I've already been busy putting it to use!

Using your price example, I was wondering how I would retrieve all products with their price on a given date using with rather than whereHas so that any products without a price on that day return null for the relationship.

Thanks 😎

Contributor

@ibrasho ibrasho commented May 28, 2021

@ibrasho this probably only works for certain versions of PostgreSQL 🤔

AFAIK this syntax has been supported since Postgres 9 (released in 2010). But again I'm trying to think of a way to make it DB-independent.

chu121su12 added a commit to chu121su12/framework that referenced this pull request May 29, 2021
Contributor

@ahmedsayedabdelsalam ahmedsayedabdelsalam commented May 31, 2021

Really awesome work and discussion 🔥

Can we support the many-to-many and morph-many relationships too?

I have the following tables:

  • workflows
    • id
  • workflow_objects
    • id
    • workflow_id
    • flowable_type // Device::class | User::class
    • flowable_id
    • meta // json column
  • devices
    • id
  • users
    • id

Every workflow has only one device and one user, so I would like to convert the following relationship into a singular device relation that returns only the first one:

public function devices()
{
    return $this->morphedByMany(Device::class, 'flowable', 'workflow_objects');
}

@Sergiobop Sergiobop commented Jun 1, 2021

Hey @cbl, I'm trying again after the patch #37436.

I'm getting this now. What am I doing wrong?

Illuminate\Database\QueryException: SQLSTATE[08P01]: <>: 7 ERROR: bind message supplies 0 parameters, but prepared statement "pdo_stmt_00000008" requires 1

Model 1: (Model)

public function current_other_model(): HasOne
{
    return $this->hasOne(OtherModel::class)->ofMany(['date' => 'max']);
}

Model 2: (OtherModel)

public function model(): BelongsTo
{
    return $this->belongsTo(Model::class)->withTrashed();
}

Code causing the error:
$model = Model::with(['current_other_model'])->first();

Contributor Author

@cbl cbl commented Jun 2, 2021

Hey @Sergiobop this seems to be an error with your sql connection. What connection are you using? And what is the result if you run the following:

Model::with(['current_other_model'])->dd();

@Sergiobop Sergiobop commented Jun 2, 2021

@cbl doing that i get this:

"select * from "models" where "id" = ? and "models"."deleted_at" is null"
array:1 [
0 => "c8c52dda-7d39-4359-871f-292a87a6b28b"
]

I'm using pgsql


@Sergiobop Sergiobop commented Jun 4, 2021

Hi again @cbl, I followed up on the problem and managed to log the queries being executed. The problem is the last one (its bindings are empty):

SQL: "select * from "models" where "id" = ? and "models"."deleted_at" is null limit 1"
Bindings: ["c8c52dda-7d39-4359-871f-292a87a6b28b"]

SQL:

select 
  * 
from 
  "other_models" 
  inner join (
    select 
      max(id) as id, 
      "other_models"."model_id" 
    from 
      "other_models" 
      inner join (
        select 
          max(date) as date, 
          "other_models"."model_id" 
        from 
          "other_models" 
        where 
          "other_models"."model_id" in (?) 
          and "other_models"."deleted_at" is null 
        group by 
          "other_models"."model_id"
      ) as "current_other_model" on "current_other_model"."date" = "other_models"."date" 
      and "current_other_model"."model_id" = "other_models"."model_id" 
    where 
      "other_models"."deleted_at" is null 
    group by 
      "other_models"."model_id"
  ) as "current_other_model" on "current_other_model"."id" = "other_models"."id" 
  and "current_other_model"."model_id" = "other_models"."model_id" 
where 
  "other_models"."deleted_at" is null

Bindings: []
