Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What about relationship? #13

Closed
fantix opened this issue Jul 22, 2017 · 32 comments
Closed

What about relationship? #13

fantix opened this issue Jul 22, 2017 · 32 comments
Labels
help wanted question A community question, closed when inactive.

Comments

@fantix
Copy link
Member

fantix commented Jul 22, 2017

Honestly I have no idea yet.

Update:

GINO now has a loader mechanism to load a result matrix into objects on need, please see examples below. It allows relationship be implemented in a primitive way. Next, we'll try to introduce some high level relationship encapsulation.

@fantix fantix added the question A community question, closed when inactive. label Jul 22, 2017
@jonahfang
Copy link

我建议不要实现relationship, 应该优先实现标量子查询(scalar subqueries):

 一个标量子查询是一种圆括号内的普通SELECT查询,它刚好返回一行一列.该子查询可以从周围的查询 中引用变量,这些变量在该子查询的任何一次计算中都将作为常量
SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name) FROM states;

我的所有生产项目中都使用这一方法从关联表中获取数据,从来不使用外键关联。

@jonahfang
Copy link

Django中有能很优雅地使用标量子查询,例如:

    f1 = "SELECT COUNT(*) FROM t_MessageReply WHERE master_id = t_MessageLog.id"
    objs = MessageLog.objects.filter(checked=False).extra(select={'reply_count':f1})

@fantix
Copy link
Member Author

fantix commented Aug 27, 2017

因为 GINO 本质上是一个 SQLAlchemy core 的 asyncpg 方言,所以理论上 SQLAlchemy core 可以做到的查询,GINO 都是可以的,这里就包含了 scalar subquery,比如这种用法

其实这里更重要的问题是,要不要把数据库关系——不管是有外键约束还是无外键约束——通过专门的代码来表示和操作,而非通过通用的 SQLAlchemy 查询语句,因为通用的 SQLAlchemy 语句 GINO 已经支持了。以及如果要做的话,怎么做,要不要强制外键约束,是否支持多对多,等等。

@fantix
Copy link
Member Author

fantix commented Aug 27, 2017

Generally we are talking about using scalar subquery and how to deal with object-level relationship.

@jonahfang
Copy link

理论上 SQLAlchemy core 可以做到的查询,GINO 都是可以的.

能否在文档中指明差异在哪里?我正在补课学习SQLAlchemy core.

@fantix
Copy link
Member Author

fantix commented Aug 30, 2017

嗯,可以的,我先在这里尝试找一下。SQLAlchemy 的设计是,用尽量通用的 Clause 代码来写查询语句,然后用不同的 dialect 去翻译及适配不同的数据库,也包括同一种数据库的不同连接程序,比如 psycopg2,或者 asyncpg。因此,通过 GINO 使用 SQLAlchemy 时的差异就在于,dialect 翻译和适配之后的结果不同,比如 psycopg2 使用 %(xx)s 作为 SQL 中的变量占位符,而 asyncpg 则使用 $1。而这些差别对于使用通用 Clause 编写查询语句的人员来讲应该是尽量透明的,他们不需要知道具体的 SQL 规则或者连接程序的要求,也可以写查询并且执行。然而,当需要用到数据库的独特特性,或者特定连接程序的独特特性时,调用者就会体验到明显的差异。举个最明显的例子,asyncpg 底层是异步的,用 GINO 的时候,就需要各种 await,这与传统 SQLAlchemy 是有明显差异的。涉及到具体语句用法的差异,比如说,psycopg2 对于 JSON/JSONB 类型的数据是可以自动做 JSON 解析的,而 asyncpg 则直接返回原始
JSON 字符串。然而,SQLAlchemy 有一个 result process 机制,即先对连接程序返回的结果做处理再返回给调用者,GINO 也是沿用了这一机制,所以到头来,GINO 的用户也是能够拿到解析后的 JSON 对象的,在使用上并没有差异。最后,GINO 暂还不支持 DDL、execute many 等 SQLAlchemy 的复杂功能,后续可能会逐渐补上。

@fantix fantix added this to the v1.1.x milestone Sep 3, 2017
@abondar
Copy link

abondar commented Feb 10, 2018

What is the current status for this?
Regarding user interface for it, how about implementing methods similar to .prefetch_related() and .select_realted() which will contain making subqueries/additional queries to get data from related tables.

@fantix
Copy link
Member Author

fantix commented Feb 10, 2018

Thanks for the suggestion! As the refactor of #59 and some personal distractions, relationships may have to get a lower priority for me, apologizes for that. However I do appreciate if someone would like to push it forward by PRs PoCs or even suggestions :) I’d be more than willing to review them, just need some time.

@martell
Copy link

martell commented Mar 15, 2018

Hey, @fantix nice work, glad to see v0.6 (aka 1.0) being tagged.
Wanted to bump this and ask what do you think is needed to implement relationship?
I can put in some time to get this working if I have direction on what and how it should be built or implemented or is this on your radar now that 0.6 has been reached, this does seem like the only major blocker left for people to switch from sqalchamy

@fantix
Copy link
Member Author

fantix commented Mar 16, 2018

Your help is much appreciated! I've been also thinking about how this can be done in any possible way. Looking into sqlalchemy.orm.relationship, there are so many features about it and they are not quite likely feasible to be fully ported to async. Peewee is another reference worth checking.

I think we can divide this big question into smaller ones:

  1. How to objectively define relationships?

First of all, I think we should not define the column (usually with foreign key constraint) for the user - that is just too implicit. Second I think user should define relationship on both models, not using the backref magic. In order to reference each other without getting into the circular reference trap, the reference should either be a string parsed after all models are defined, or be put in a method wrapped into attribute and invoked on use. I prefer the second. An example would be something like this (just a proposal):

class Parent(db.Model):
    __tablename__ = 'parents'

    id = db.Column(db.BigInteger(), primary_key=True)

    @relationship
    async def children(self):
        # details about how to load children? For a very conceptual example:
        return await Child.query.where(Child.parent_id == self.id).gino.all()


class Child(db.Model):
    __tablename__ = 'children'

    id = db.Column(db.BigInteger(), primary_key=True)
    parent_id = db.Column(db.ForeignKey('parents.id'))

    @relationship
    async def parent(self):
        # details about how to load parent? For a very conceptual example:
        return await Parent.get(self.parent_id)

And yes I prefer not to depend on information from a foreign key constraint, and let the user decide how to get connected. For example, some users just prefer not to have any foreign key constraints at all.

  1. How to load related data?

Let me try to summarize the ways loading relationshipts:

  1. Use N+1 queries
  2. Use joined queries
  3. Use scalar subqueries
  4. Anything else?

Then the question becomes, should we support only one way, or several optional ways?

  1. How to use relationship as query conditions?
  2. How to make changes to relationships?

To get started, I think we can experiment on making this:

  • Start with the most ordinary many-to-one / one-to-many relationship with only one foreign key on the "many" end
  • WIP

By the way, GINO 0.6 is 1.0b2, 0.7 will be 1.0b3, I'll hope 0.8 could be 1.0rc, then 1.0 final.

@fantix
Copy link
Member Author

fantix commented Mar 28, 2018

I've made a few attempts in different directions, it looks like "unpacking rows from database into expected objective structure" might be an essential feature for CRUD. For example to achieve the same as:

await User.query.gino.all()

We are actually wishing to have a list of User objects:

await User.query.gino.get([User])

Then first() would be:

await User.query.gino.get(User)

And scalar() could be:

await User.query.gino.get(User.id)

Or to have a list of user IDs:

await User.query.gino.get([User.id])

Or tuples:

await User.query.gino.get([(User.id, User.name)])

Load only a part of User:

await User.query.gino.get([User.load('id', 'name')])

Load a many-to-one relationship:

await User.outerjoin(Department).select().gino.get([User.load(dept=Department)])

Two levels of many-to-one relationships:

await User.outerjoin(Department).outerjoin(Company).select().gino.get(
    [User.load(dept=Department.load(company=Company))])

Adjacency list:

Manager = User.alias()
await User.outerjoin(Manager, User.manager_id == Manager.id).select().gino.get(
    [User.load(manager=Manager)])

Alternatively, use map():

Manager = User.alias()

def unpacker(result):
    rv = result[User]
    rv.manager = result[Manager]
    return rv

await User.outerjoin(Manager, User.manager_id == Manager.id).select().gino.map(unpacker)

Might also have reduce() for one-to-many collections.

(Proposal Only)

@wwwjfy
Copy link
Member

wwwjfy commented Mar 28, 2018

I fall behind a lot, but as far as I see, there are no radical changes in the relationship. So here are the two cents, hope it's not too late.

I expect GINO to

  1. Provide basic operations to get the related models, pretty much the ones shown above. And I hope it has to be explicit. For example, to get the manager property of a user, you need to specify N+1 queries or join. The idea is to get the results, please tell me you want it and how you want it, so no surprises.
    What I don't like SQLAlechemy's relationship is usually it does everything for you and not as flexible.

  2. Provide the ability to execute raw SQL and convert the result into models if applicable, and for existing models, we can do basic type checking optionally.
    I've met some cases that, for some complex queries, it was not quite difficult for me to use raw SQL to get the result, but I had to fight with SQLAlchemy to get the correct expressions, and usually I failed. :(
    Even when I succeeded, the readability compromised a lot.

The proposal LGTM, a few comments though:

  1. await User.query.gino.get([User]), await User.query.gino.get(User). I get the wonderfulness of adopting unified entry point and using the type of the argument to tell what the caller wants, but two User seems duplicate, and what if the argument is a different type, say Company? Yes, we can throw exceptions, but using different methods looks more clear to me.

  2. Would the related object have the reference to the current object (or a list)? (In the example, user.manager.user or user.manager.memebers)
    I'd prefer no, based on the same no surprises principle.

Looking forward to the implementation. I just started a little project that let me try GINO in real life.

@fantix
Copy link
Member Author

fantix commented Mar 28, 2018

托尼睡的够晚的啊哈哈,我大概看明白了,意见基本一致,明早详细回复

@fantix
Copy link
Member Author

fantix commented Mar 29, 2018

There's no concrete implementation yet, so it is a good time. I cannot agree more on your two expectations, I'll try to make up an example about unpacking rows into objects generated by raw textual SQL. For the two comments:

  1. Oh yes you are right about it, and I think too we should keep the existing all(), first() and scalar() methods and keep recommending them over the proposed way.

  2. I prefer no too. However it might be handy to be able to optionally have the reverse reference set automatically by enabling a back_populate switch or something like that.

fantix added a commit that referenced this issue Mar 29, 2018
fantix added a commit that referenced this issue Mar 29, 2018
@fantix
Copy link
Member Author

fantix commented Mar 29, 2018

A bit surprised by that SQLAlchemy core is actually quite ORM-ready, so this PR isn't that tough after all. Based on this PR, it should be much easier to implement relationships.

@fantix
Copy link
Member Author

fantix commented May 23, 2018

Just added a feature for loader to populate distinct values, works for subloaders too. That means one-to-many and many-to-many relationships can be achieved in a manual way.

Example of one-to-many:

class Parent(db.Model):
    __tablename__ = 'parents'

    id = db.Column(db.BigInteger(), primary_key=True)

    def __init__(self, **kw):
        super().__init__(**kw)
        self._children = set()

    @property
    def children(self):
        return self._children

    @children.setter
    def add_child(self, child):
        self._children.add(child)
        # optionally set backref here manually
        child.parent = self


class Child(db.Model):
    __tablename__ = 'children'

    id = db.Column(db.BigInteger(), primary_key=True)
    parent_id = db.Column(db.ForeignKey('parents.id'))


async def main():
    query = Child.outerjoin(Parent).select()
    loader = Parent.distinct(Parent.id).load(add_child=Child)
    parents = await query.gino.load(loader).all()
    for parent in parents:
        print(f'{parent} has {len(parent.children)} children')

Here the setter property in parent is unusually named to let loader work without confusion (parent.add_child = child is actually parent.children.add(child)).

Example of many-to-many:

class Parent(db.Model):
    __tablename__ = 'parents'

    id = db.Column(db.BigInteger(), primary_key=True)

    def __init__(self, **kw):
        super().__init__(**kw)
        self._children = set()

    @property
    def children(self):
        return self._children

    @children.setter
    def add_child(self, child):
        self._children.add(child)
        # optionally set backref here manually
        child.parents.add(self)


class Child(db.Model):
    __tablename__ = 'children'

    id = db.Column(db.BigInteger(), primary_key=True)

    def __init__(self, **kw):
        super().__init__(**kw)
        self._parents = set()

    @property
    def parents(self):
        return self._parents

    @parents.setter
    def add_parent(self, parent):
        self._parents.add(parent)
        # optionally set backref here manually
        parent.children.add(self)


class Association(db.Model):
    __tablename__ = 'associations'

    parent_id = db.Column(db.ForeignKey(Parent.id))
    child_id = db.Column(db.ForeignKey(Child.id))
    

async def main():
    query = Parent.outerjoin(Association).outerjoin(Child).select()
    parents_loader = Parent.distinct(Parent.id).load(add_child=Child.distinct(Child.id))
    parents = await query.gino.load(parents_loader).all()
    for parent in parents:
        print(f'Parent<{parent.id}> has {len(parent.children)} children: {[c.id for c in parent.children]}')

    query = Child.outerjoin(Association).outerjoin(Parent).select()
    children_loader = Child.distinct(Child.id).load(add_parent=Parent.distinct(Parent.id))
    children = await query.gino.load(children_loader).all()
    for child in children:
        print(f'Child<{child.id}> has {len(child.parents)} parents: {[p.id for p in child.parents]}')

By far I think the infrastructure is ready, what is left for relationship is all kinds of reasonable shortcuts for convenience. I'll leave this issue open until 0.8 is out.

fantix added a commit that referenced this issue May 24, 2018
@jekel
Copy link
Contributor

jekel commented Aug 16, 2018

Is there anyway to use this loaders inside execution_options? for complex query with several joins

@fantix
Copy link
Member Author

fantix commented Aug 16, 2018

Is what you want included in https://python-gino.readthedocs.io/en/latest/relationship.html?

@jekel
Copy link
Contributor

jekel commented Aug 16, 2018

I think that it is about ... load(add_child=Child)
But in my case, i cant understand how can implement it.
Look, i have query like:

db.select([ModelOne, agg_function_on_aliased_query]).select_from(
ModelOne.join(aliased_query, join_options...) \
.join(ModelTwo, join_options...) \
.join(ModelThree, join_options)
) \
.execution_options(loader=(ModelOne, ColumnLoader(agg_function_on_aliased_query)))

Relations are:
ModelOne one-to-many ModelTwo
ModelTwo many-to-one ModelThree

i'd like to have inside ModelOne some property with all its children and in ModelTwo to have parent property filled with ModelThree
how can i achieve this?

@fantix
Copy link
Member Author

fantix commented Aug 22, 2018

Relations are:
ModelOne one-to-many ModelTwo
ModelTwo many-to-one ModelThree

i'd like to have inside ModelOne some property with all its children and in ModelTwo to have parent property filled with ModelThree

ModelTwo and ModelThree sounds like a regular many-to-one relationship, so:

model_two_loader = ModelTwo.load(parent=ModelThree)

Then the one-to-many relationship for ModelOne and ModelTwo:

model_one_loader = ModelOne.distinct(ModelOne.id).load(add_child=model_two_loader)

Finally you just need to write the query right:

ModelOne.outerjoin(ModelTwo).outerjoin(ModelThree).execution_options(loader=model_one_loader)

Please find the definition of add_child() in above docs link.

@jekel
Copy link
Contributor

jekel commented Aug 31, 2018

I have found a bug when you trying to select data from subquery and add outerjoin to it(like eagerjoin strategy in sqlalchemy), Loader can't handle it properly, because part of model names can overlap, like id, name etc...
And i've made workaround with corresponding_column selectable method, but i don't like very much how it looks like
please take a look on my commit 626e534
where i have modified ModelLoader to accept not only column names as strings
And also modified do_load in TupleLoader method to handle distinct from all chained Loaders correctly
Any comments are welcome ;)

@wwwjfy
Copy link
Member

wwwjfy commented Sep 2, 2018

@jekel If not too much trouble, could you show an example? The usage, the expected result and the actual result. I want to understand better in which situation you met this bug.

About the distinct part, it's not related to this relationship issue, right? If so, could you create another pull request, to avoid confusion? Thanks :)

@jekel
Copy link
Contributor

jekel commented Sep 3, 2018

@wwwjfy for now you can look in the tests part where it is covered - result from one table overlaps another - id column actually
it is working in this way, because modelloader is iterating over columns by their names, not Column objects
i will make two different pull requests for this bug above, and distinct loader today later

@wwwjfy
Copy link
Member

wwwjfy commented Sep 3, 2018

@jekel I've checked the test, only that it's using alias subquery while in the loader it uses the model User. If we replace the subquery with User, it can give us the result correctly.
So I wonder about your use case.

@jekel
Copy link
Contributor

jekel commented Sep 3, 2018

@wwwjfy my use case is more complex, where i have one base query like: aliased (Model, Aggregate function), + some joins, filter and one global outerjoin, and as result i have aliased query with column names like: anon_1.id,... aggregated_query_column_name, outer_joined_model.id

@jekel
Copy link
Contributor

jekel commented Sep 3, 2018

@wwwjfy i have made PR for Columns object loader support
and i'd like to have some help about searching columns in queries chain to correctly handle them
i dont like how it looks like User.load(*(map(subquery.corresponding_column, User) may be there is another way how to do it, how it does on joinedload sqlalchemy orm...

@wwwjfy
Copy link
Member

wwwjfy commented Sep 4, 2018

Thanks for those PRs. Allow me some time to investigate while I have a few doubts in mind.

wwwjfy added a commit to wwwjfy/gino that referenced this issue Sep 4, 2018
wwwjfy added a commit to wwwjfy/gino that referenced this issue Sep 4, 2018
fix the issue that Alias methods like outjoin invoked Model, which
caused the loader couldn't find corresponding columns.
wwwjfy added a commit that referenced this issue Sep 12, 2018
@mojimi
Copy link

mojimi commented Jul 19, 2019

Is the current state of relationships satisfactory for the maintainers?

When I reached the docs parts on how to deal with relationships that was a major deal breaker, it's way too complex and honestly I could barely visualize how it works, it seems more complicated than just executing a raw SQL with a join.

In my opinion it should be automatic on access or upon passing a parameter that the relationship should be returned.

@fantix
Copy link
Member Author

fantix commented Jul 19, 2019

First of all, I believe there is room to improve the overall experience of utilizing relationships, but it shall be a v1.1.x enhancement, unlikely in v0.8.x and v1.0.x.

I think it is the nature that handling relationship is complex between RDBMS and application code. Making the API simpler would have to necessarily hide some details, thus against the rule of being explicit. Actually, I do believe having raw SQL occasionally is not a bad thing in the context of using RDBMS directly - as far as you can easily load its result.

Pls lmk if that explains your concerns or you have any ideas.

@jodusan
Copy link

jodusan commented Aug 28, 2019

@fantix In the One-to-many scenario is it possible to get a parent even if no children are pointing at him?

@fantix
Copy link
Member Author

fantix commented Aug 28, 2019

@Dulex123 If you are talking about this particular one-to-many scenario, the answer is no because of the child OUTER JOIN parent. But maybe it is possible to reverse the relationship as many-to-one, so that it'll be possible to load parents without children through parent OUTER JOIN child.

@fantix
Copy link
Member Author

fantix commented Apr 20, 2020

I'm closing this issue now, as the loader could provide manual relationship support. We're still open to any proposals or new designs to support relationships at a higher abstraction level.

@fantix fantix closed this as completed Apr 20, 2020
@fantix fantix removed this from the v1.1.x milestone Apr 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted question A community question, closed when inactive.
Projects
None yet
Development

No branches or pull requests

8 participants