New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NFR] Eager loading support of Phalcon\Mvc\Model #1117

Closed
hilyjiang opened this Issue Aug 19, 2013 · 79 comments

Comments

Projects
None yet
@hilyjiang

hilyjiang commented Aug 19, 2013

Once we use model, we always define relationships between models.

In most cases, we need to implement a list view, displaying objects with its foreign key field's value.

This will cause a N+1 selects problem (http://use-the-index-luke.com/sql/join/nested-loops-join-n1-problem), slowing down the performance.

So I think it is necessary to implement the eager loading feature of Phalcon\Mvc\Model.

@yewjs

This comment has been minimized.

yewjs commented Aug 28, 2013

+1

2 similar comments
@Raistlfiren

This comment has been minimized.

Raistlfiren commented Oct 11, 2013

+1

@ghost

This comment has been minimized.

ghost commented Dec 4, 2013

+1

@ovr

This comment has been minimized.

Contributor

ovr commented Dec 4, 2013

Your can use leftJoint at another ORM but in phalcon leftJoin relations isn`t selected to object only run in sql #1111?

@niden niden added Unassigned and removed Unassigned labels Apr 3, 2014

@nazieb

This comment has been minimized.

nazieb commented Apr 15, 2014

+1

1 similar comment
@kamote

This comment has been minimized.

kamote commented Apr 20, 2014

+1

@olegrom32

This comment has been minimized.

olegrom32 commented Jun 16, 2014

What's the status of this one? Or do I have to give up phalcon?..

@dcwq

This comment has been minimized.

dcwq commented Sep 18, 2014

+1

2 similar comments
@mrthebob

This comment has been minimized.

mrthebob commented Sep 18, 2014

+1

@calinrada

This comment has been minimized.

calinrada commented Sep 19, 2014

+1

@dexx086

This comment has been minimized.

dexx086 commented Sep 22, 2014

+1

Can we expect anything to happen with it? It's quite a basic feature I think, it should be handled with high priority if I may categorize its severity.

But at least a tiny feedback would be good about it to know if we can expect it to be implented or should learn a different ORM. Because without this feature this ORM handles only separate entities, not entities with relations. Lazy loading is only a tiny help but doesn't care about data integrity and mostly not about performance. But on the other hand, Phalcon is so good, should make its ORM to be usable as well, big +1 vote for this feature.

@boedy

This comment has been minimized.

boedy commented Sep 22, 2014

+1

1 similar comment
@AndreBaumeier

This comment has been minimized.

AndreBaumeier commented Sep 24, 2014

+1

@boedy

This comment has been minimized.

boedy commented Sep 24, 2014

Is there an alternative to solving this N+1 problem?

@AndreBaumeier

This comment has been minimized.

AndreBaumeier commented Sep 24, 2014

I could think of creating stubs and populating explicitly joined data in those stubs, for other unavailable attributes I'd fall back to lazy loading/ load them once requested.

@tanagorns

This comment has been minimized.

tanagorns commented Oct 4, 2014

+1

5 similar comments
@giftymalik

This comment has been minimized.

giftymalik commented Oct 7, 2014

+1

@scarbo87

This comment has been minimized.

scarbo87 commented Oct 7, 2014

+1

@titus-toia

This comment has been minimized.

titus-toia commented Oct 24, 2014

+1

@fatihkurt

This comment has been minimized.

fatihkurt commented Oct 27, 2014

+1

@mapirelli

This comment has been minimized.

mapirelli commented Jan 15, 2015

+1

@fsieduc

This comment has been minimized.

fsieduc commented Jan 19, 2015

+10

@noelmrnd

This comment has been minimized.

noelmrnd commented Feb 5, 2015

+1

This is the only workaround I could find:

$rows = $this->modelsManager->createBuilder()
    ->from(array(
        'Persona'=>'Personas', // model: Personas, alias: Persona
        'Grupo'=>'Grupos' // model: Grupos, alias: Grupo
    ))
    ->where('Persona.grupo_id=Grupo.id')
    ->getQuery()
    ->execute();

foreach($rows as $row) {
    $persona = $row->Persona;
    $grupo = $row->Grupo;
}
@fsieduc

This comment has been minimized.

fsieduc commented Feb 6, 2015

Many Thanx

@erickskrauch

This comment has been minimized.

erickskrauch commented Apr 5, 2015

Is there at least some news regarding this functional? It really must have functional.

@Surt

This comment has been minimized.

Contributor

Surt commented Apr 8, 2015

+1
phalcon really needs it

@erickskrauch

This comment has been minimized.

erickskrauch commented Apr 18, 2015

2.0 is released. Please add this functionality, it is the only thing that spoils the impression from working with the framework. This problem for 2 years (in fact).

@erickskrauch

This comment has been minimized.

erickskrauch commented Aug 10, 2015

@andresgutierrez, yes, it can also be done, but that would be correct if I do something:

$comments = Comments::query()->with(['author', 'author.skin'])->...;
foreach($comments as $comment) {
    echo $comment->text;
    echo $comment->author->nickname;
    echo $comment->author->skin->faceUrl;
}

At least in Yii2 it is applied such method than to access relations through StdClass.

@sszdh

This comment has been minimized.

sszdh commented Aug 10, 2015

I think the age of using DEEP RELATIONs an also eager loadig is done!

We should think different. Redesign UXs an data manipulations...

Eager loading has so much disadvantages nowadays!

Sorry but that is my POV.

@andresgutierrez

This comment has been minimized.

Member

andresgutierrez commented Aug 10, 2015

@erickskrauch But, if you only need $comment->text why do you have to fetch the full Comments instance, the full Author instance? Are we writing high performance applications or are we wasting memory fetching and hydrating data that we aren't using at all?

@erickskrauch

This comment has been minimized.

erickskrauch commented Aug 10, 2015

@andresgutierrez, I just show you example of real using eager loading in real project.

Many people are puzzled, using PhalconPHP, how to get one select for model and related data. Perhaps in PhalconPHP a different approach to it (as shown by you through PHQL), but no such examples in the documentation.

On the other hand Yii (Yii2), Laralev, etc. provide a method with() and the developers would like to have this in PhalconPHP. Otherwise would not have been much of an issue here on github.

@ghost

This comment has been minimized.

ghost commented Aug 10, 2015

Eager loading in laravel and yii isn't real eager loading if you look at the database logs. I always see eloquent make two queries. One for the base model, then another for the related models via an IN(). What would really be nice in Phalcon is to have the same syntax, but that syntax be a join on a relationship. We would, of course need to specify columns, but I feel like this grants the best mix of utility and performance. We hydrate the result set as instances of the base model with the related columns attached via properties. I.e. I join do Robot->with("parts",["name"]) I would be able to reference that as a result by $robot->parts->name. That's a bad example because it's a many to one, and that's really the thought behind a one to one.

@Surt

This comment has been minimized.

Contributor

Surt commented Aug 11, 2015

@andresgutierrez usually the eager loading is done through multiple queries, with the use of IN

Eager loading

$results = User::with('avatar', 'posts')->find_many();
//will use 3 querys to fetch the users and the relationships:

  SELECT * FROM user
  SELECT * FROM avatar WHERE user_id IN (....)
  SELECT * FROM posts WHERE user_id IN (....)

// so you already have all the data in place, ready to be used

  foreach($results as $result){
      echo $result->avatar->img;
      foreach($result->posts as $post){
         echo $post->title;
      }
  }

Working over the example: If your abstraction sets the indexes of the users resultset as the primary key of the row you just need to use array_keys to generate the second query. Voila, you have the first depth for each relationship with a simple array_keys

Of course, you can calculate the number of queries if I have 100 users with 100 avatars and 100 post each one. With eager loading there is only 3 queries. And there is no memory wasted, I need that data, I will show it on a Json or a web page.

The advantages are what I expect of programming: to go through already implemented abstractions so I don't need to rewrite it every time. In this case I prefer to define wich relationships I want to fetch and I want to fetch them easily, with the less queries to database and having them in memory to use them when I want.

Imagine if I want to get users with posts and from each post the tags related.... In phalcon I would need to write 3 foreachs to fetch the relationships. Not to mention deeper relations.
Lazy loading does not give me any advantages if for example I want to return a JSON from the last example. I want that data without the need of writing the foreachs, I want a clean interface to specify what I need and of course, I need it when I ask for it, I don't want to ask for it on each iteration.

@erickskrauch But, if you only need $comment->text why do you have to fetch the full Comments instance, the full Author instance? Are we writing high performance applications or are we wasting memory fetching and hydrating data that we aren't using at all?

Of course eager loading is to use it as another tool, it does not eliminate the need to think. In the case you expose maybe you don't need to use it. (but imagine that the comments entity does have any kind of domain logic or behaviour needed to get the text... then the use of eager will be an advantage.

On Web each time I need data I know exactly wich data I need, let me post a quote from here http://blog.bemycto.com/software-architecture/2015-05-17/doctrine-orm-not-suited-php/

Lazy Loading is a nonsense
In a stateless environment, Lazy Loading data is a bad practice. Nothing to say more. There may be some cases where it could be slightly more performant to use it, but there are edge cases.

Eloquent does it in a great way: http://laravel.com/docs/5.1/eloquent-relationships#eager-loading and if you read the documentation you can understand the need of any ORM to have it. Nested eager loading is where the abstraction really shines.

Of course, there is advantages on lazy loading for other purposes as background process with high memory consumption.

@andresgutierrez There is a great implementation here https://github.com/stibiumz/phalcon.eager-loading althought it still needs more developing and would be great to sync with phalcon to solve some drawbacks.

@sszdh disadvantages? I don't understand what are the disadvantages of making less database queries through an intelligent management of the database. Let me say that when I work with objects I hate to work with lower abstractions like arrays or database queries. (usually that's my problem with legazy code). The same paradigm I use when I need to work with sql or PHQL.... I don't want data, I want entities with behaviours and domain logic not arrays that I need to fix on each iteration.

P.D. sorry for the rush :p I'm really used to eager loading and I would like to see it on phalcon, because when you use it and understand it you know when to make use of it and you need it. It is just another tool and would be great to have it.

@erickskrauch

This comment has been minimized.

erickskrauch commented Aug 11, 2015

@Surt, +1

@alrik11es

This comment has been minimized.

alrik11es commented Aug 11, 2015

@Surt, +1 (Clap, clap)

@valVk

This comment has been minimized.

valVk commented Aug 11, 2015

CakePHP has params called "recursive" for this.

Correct me if I'm wrong.

does this code produces selects in db?


foreach ($this->Robots as $robot) {
    foreach ($robot as $parts) { // SELECT * FROM parts WHERE robot_id = $robot->id
         foreach ($parts as $part) {
               echo $part->name;
         }
    }
}
@andresgutierrez

This comment has been minimized.

Member

andresgutierrez commented Aug 11, 2015

IN queries

@Surt That implementation is only suitable for small/medium resultsets (maybe large depending on the values). If you fetch the parent records (in this case users), then you have to traverse this resultset to obtain the "unique" user IDs (found across few, dozens, hundred, thousands or records?). Then these IDs must to be bound to the IN operator to obtain the list. This means keep in memory the list of IDs, then create and send a large SQL statement to the database (ie. using 50 values):

SELECT `robots_parts`.`id`, `robots_parts`.`robots_id`, `robots_parts`.`parts_id` 
FROM `robots_parts` WHERE `robots_parts`.`robots_id` IN (:APR0, :APR1, :APR2, :APR3, :APR4, :APR5, :APR6, :APR7, :APR8, :APR9, :APR10, :APR11, :APR12, :APR13, :APR14, :APR15, :APR16, :APR17, :APR18, :APR19, :APR20, :APR21, :APR22, :APR23, :APR24, :APR25, :APR26, :APR27, :APR28, :APR29, :APR30, :APR31, :APR32, :APR33, :APR34, :APR35, :APR36, :APR37, :APR38, :APR39, :APR40, :APR41, :APR42, :APR43, :APR44, :APR45, :APR46, :APR47, :APR48, :APR49)

While I'm pretty sure this should be faster than individual SELECTs, some database systems start having problems with thousands of values passed in an IN operator:

A database like PostgreSQL always create a temporary table to store the values in the IN list to make the comparison using an inner join (which seems to be a good idea). A database like Oracle transform the values in OR comparisons and also it doesn't allow more than 1000 items in an IN clause (to fix it requires split the values into ORs of 1000 items each one).

This variation in behaviors in which large list of values in an IN operator is handled lead me to think
that the behavior on large resultsets could be unpredictable in terms of performance or functionality.

Caching

Regarding caching, eager loading always retrieves the values from the database as I see. There's no way to cache related records (unless we think up a way to do that) and retrieve data from the database engine instead of a cache backend (Redis, Memcached, APC, etc) can impact performance (or at least it loses the opportunity to improve it). For instance a customer has a location (id, city, country, etc) this is stored in a second table:

$robots = Customers::query()
                      ->with("location") // always hit the database
                      ->execute();
foreach ($customers as $customer) {
   // ...
}

Locations never change and they could be stored in a fastest cache backend reducing unnecessary access to relational databases. Only the parent record can be cached, if a caching policy is attached to the query it does not necessary apply to the related records.

Hydration

In eager loading, related records are fully fetched in memory, this means every request needs to allocate more memory:

$users = Users::query()
                      ->with("posts") 
                      ->execute();
foreach ($users as $user) { // fetch one user from the database + n posts
    foreach ($user->posts as $post) { 
       // ...       
    }
}

In the other hand using lazy loading we have just one user + one post at the same time in memory:

$users = Users::find();
foreach ($users as $user) { // fetch one user from the database
    foreach ($user->posts as $post) { // fetch one post from the database
       // ...
    }
}

Of course, this depends on the number of related records fetched in associated relationships. Just added this part to know if you are aware of this.

My conclusion

I can go ahead and implement this feature using IN queries however it will not be perfect and a solution to everyone's problem. As I mentioned before, IMHO, the best option is always fetch exactly what you need using PHQL or the QueryBuilder (currently available in Phalcon).

@Surt

This comment has been minimized.

Contributor

Surt commented Aug 12, 2015

@valVk if I don't remember it wrong CakePHP fetch relationships with eager loading by default, unless you specify lazy loading, so, in your example it does not produce a query for each iteration. In the other hand it uses JOINS instead of IN when it's possible, but I think that it's really difficult to accomplish in terms of complexity.

@andresgutierrez

IN queries

Usually on web applications or APIs there is a need for pagination of results. If you make a query with thousands or records you probably are doing something wrong.
By the way, the transverse across the resultset to obtain the ID is the SAME transverse you do with lazy loading when you want to access to a relationship for the model. So no diference in time here. (except that you will end with just 1 query for each relationship instead of thousands.) All we are doing is to hide the foreachs as a framework implementation. Of course, at the cost of memory.

Hydration

In the other hand using lazy loading we have just one user + one post at the same time in memory

Most of the time, on web pages applications you will take those user + post and stash into a variable to use on the template, or for example, to write a JSON. So in the end your memory consumption is the same => all users with posts on a variable.
Of course you can transverse the models on the template but then probably you will end with domain logic inside templates.

On the other hand I'm really afraid when I look at the Database queries log and I see thousand of queries. I'm afraid of responsiveness, concurrency...

Actually, the https://github.com/stibiumz/phalcon.eager-loading (added to the phalcon incubator) is doing a great job, but it can be merged with phalcon to fully develop its potential. For example, if I don't remember it wrong, the new relationship with conditions on the definition does not work with the eager loading functionality)

@williamokano

This comment has been minimized.

williamokano commented Aug 12, 2015

In my case I was writing a API using phalcon and using the "concept" of resources, which is a model on Phalcon.

On my approach, I had the following syntax to fetch a resource:

http://ENDPOINT/resource_name?parameters

As the resources have the same superclass (Phalcon\Model) I can write this to fetch the data dynamically using a generic code, something like that

public function handleResource($resourceName)
{
$resource = new $resourceName($someAdditionalParam);
$resource-> ...... (get the query from the parameters in url)
echo json_encode($resource->toList());
}

This is just an example and doesn't reflect actual phalcon code!

Now, my problem is: lets suppose I need to fetch the Books resource, but I only need the books that was written by George R. R. Martin, so my URL should be something like that

http://ENDPOINT/books?q=author.name=George%20R.%20.%20Martin

I know I could do something like Author::FindFirst('name=blablalba')->books, but I will have to change a lot of things on my resource handler and, in my humble opinion, looks semantically wrong to fetch Author to find the authors to only then fetch the book, since the straight forward idea is find the books from this author.

@andresgutierrez

This comment has been minimized.

Member

andresgutierrez commented Aug 12, 2015

@williamokano You can always use a join to fetch all the data in a single query

@nsossonko

This comment has been minimized.

Contributor

nsossonko commented Nov 13, 2015

+1 for this. My need for this is only for climbing the parent chain of a record. Eg:

$grandChild->$parent->$grandParent results in 3 SQL queries now:

(assuming the ids of each is 1)

  1. for the grandChild (SELECT * FROM GrandChild WHERE id = 1)
  2. for the parent (SELECT * FROM Parent WHERE id = 1)
  3. for the grandParent (SELECT * FROM GrandParent WHERE id = 1)

I have a need to check the entire chain sometimes, not just the grandParent or grandChild, but the parent as well. I would much rather do this once as follows:

SELECT grandChild.*, parent.*, grandParent.*
FROM GrandChild as grandChild
JOIN Parent as parent
  on parent.id = grandChild.parent_id
JOIN GrandParent as grandParent
  on grandParent.id = parent.grandParent_id
WHERE grandChild.id = 1

Obviously all an abstraction, but that's the general idea. If I run that now with PHQL it will return a Row instead of a ResultSet or Model with the relations set.

@mzf

This comment has been minimized.

mzf commented Feb 4, 2016

+1 for this feature from box

@borisdelev

This comment has been minimized.

Contributor

borisdelev commented Feb 19, 2016

+1

1 similar comment
@tmihalik

This comment has been minimized.

Contributor

tmihalik commented Feb 19, 2016

+1

@sergeyklay

This comment has been minimized.

Member

sergeyklay commented Feb 19, 2016

👎
Eager Loading is a really bad idea popularized by frameworks that do not care about performance.

@sergeyklay sergeyklay closed this Feb 19, 2016

@Jurigag

This comment has been minimized.

Member

Jurigag commented Feb 19, 2016

I agree.

@Surt

This comment has been minimized.

Contributor

Surt commented Feb 21, 2016

Well another eager loading issue closed. Anyway the https://github.com/stibiumz/phalcon.eager-loading works well and we can work with zephir to a lower level implementation.
Actually, on web programming, persistence read-write are the most used functionality and it's the most looked tool (routers, request, responses,caché, etc are nowadays practically standardars (even more thanks to phpfig)
About performance.... It just means memory vs processing time... Making thousand of calls to database to prevent bad use of memory .. Well, is just one side of a coin.

@ghost

This comment has been minimized.

ghost commented Feb 21, 2016

You can't just dismiss this out of hand Sergey. I don't disagree that most implementations of eager loading in frameworks are memory hogging abominations because, like stibiumz's implementation, they just pull in the entire model. But I assure you that with careful consideration to performance, an imementation could be done in Phalcon that is no more memory intensive or detrimental to performance that a query builder with joins.
Eager loading isn't a bad idea, it's an idea that is implemented poorly in most if not all of its current incarnations. Phalcon already redefined the performance that a framework can deliver, so why are we so unwilling to redefine the performance of the ORM?

@josefguenther

This comment has been minimized.

Contributor

josefguenther commented Feb 25, 2016

You could say relational databases in general are a bad idea. However, many of us are using SQL variants with success. We use relationships. We do not get the best performance, but we are not programming in ASM either. For us (probably 90% of phalcon users), eager loading makes sense.

Just a quick note. The current implementation of ResultSet makes it VERY difficult to do eager loading manually. Because of the way it does not store all results in memory, it is impossible to add in the eager loaded data to the result set array.

I myself use the stibiumz project linked to above but it is buggy and no longer in active development...

@Jurigag

This comment has been minimized.

Member

Jurigag commented Feb 25, 2016

Just whats the problem with joins ?

@josefguenther

This comment has been minimized.

Contributor

josefguenther commented Feb 25, 2016

So far I have only been able to use joins as conditionals in queries, I have not been able to retrieve data from the joined table. Am I missing something?

@Jurigag

This comment has been minimized.

Member

Jurigag commented Feb 25, 2016

I guess you are missing modelsManager ? In my project i normally retrieve data from some table and joined table in one query(instruction).

@nsossonko

This comment has been minimized.

Contributor

nsossonko commented Feb 25, 2016

Make sure you select the columns of the joined tables as well, by default they are not selected.

@josefguenther

This comment has been minimized.

Contributor

josefguenther commented Feb 26, 2016

Aha, I see. This returns the Complex resultset... This is possible but could be a lot of work.

I do not understand why eager loading is a bad idea. Hibernate (java) allows you to eager load with a separate SELECT or with a JOIN. I think that would be an excellent model to work off of. Just because it may negatively impact performance does not mean it would not be helpful.

@Green-Cat

This comment has been minimized.

Contributor

Green-Cat commented Feb 26, 2016

Hibernate (java) allows you to eager load with a separate SELECT

You can already do that: $model->getRelated('othermodel');

@phalcon phalcon locked and limited conversation to collaborators Feb 26, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.