Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Associations & Joins #319

Merged
merged 28 commits into from May 1, 2018
Merged

Associations & Joins #319

merged 28 commits into from May 1, 2018

Conversation

groue
Copy link
Owner

@groue groue commented Mar 11, 2018

This pull request enhances the query interface so that it can build and consume some joined requests. There is much room for future enhancements, but we have a shippable API already.

Documentation: GRDB Associations

@hartbit, @smhk, @Sroka, @cfilipov, @hdlj, @mtissington, @pakko972, @palpatim, @pierlo, @schveiguy, and @swiftlyfalling, you have shown your interest in GRDB in general and sometimes on join requests in particular: I'd be glad if you would have a look at the proposed documentation. I'll be happy answering your questions, addressing your concerns, and generally improve this PR with your help.

@groue groue added this to the GRDB 3.0 milestone Mar 11, 2018
@hartbit
Copy link
Contributor

hartbit commented Mar 12, 2018

This looks amazing! 🤩 Is there any chance this API won’t make it in GRDB 3? What is the big-picture ETA for GRDB 3? The best way for me to start playing with this now is to integrate it into the application we’re working on. But I can only do so if I won’t have to completely rip the join code out later on and if GRDB 3 is released production-read-ish before we release.

@groue
Copy link
Owner Author

groue commented Mar 12, 2018

This looks amazing! 🤩 Is there any chance this API won’t make it in GRDB 3?

Thanks David! Yes, it will ship in GRDB3. I'm now confident enough that the design is not a dead-end. As the benevolent dictator of this lib, and after two failed attempts at addressing joins, I now that this is the way to go.

I have four concerns on the proposed API:

  • that it has not been bike-shedded and polished by several eyes.
  • the convention that database table names should be singular.
  • that it is too open, and allows writing unsound code that we'll never be able to support and will bite users
  • that the only "warning" you'll find in the doc hides a huge conceptual bug.

What is the big-picture ETA for GRDB 3?

Weeks.

There are unchecked items in TODO.md in the GRDB3 section. Not all of them will ship, because I may change my mind. But none of them is as hard as associations. Yet they all take a little of my free time, and there is also a lot of documentation to update, including the Medium articles. And let's not forget the precious RxGRDB.

The best way for me to start playing with this now is to integrate it into the application we’re working on. But I can only do so if I won’t have to completely rip the join code out later on and if GRDB 3 is released production-read-ish before we release.

I understand. It's difficult to commit to beta software.

A welcome contribution would be a witty QA mind.

@hartbit
Copy link
Contributor

hartbit commented Mar 12, 2018

Thanks David! Yes, it will ship in GRDB3.

Great!

Weeks.

That sounds promising. As we are not planning on releasing before June, I feel confident going forward with integrating those new APIs and beta-testing them at the same time. Worst case scenario, GRDB 3 won't be completely out yet, but I'm confident enough with the level of quality of GRDB to go ahead with that.

I'll try to document my comments here as I use those new APIs.

@groue
Copy link
Owner Author

groue commented Mar 12, 2018

@freak4pc, since this PR may also matches some of your interests, it may a be a good idea to have your opinion, too?

@groue groue mentioned this pull request Mar 12, 2018
29 tasks
@freak4pc
Copy link
Contributor

Hey @groue , Awesome work!

I started reading but had to stop to get some work done, here is some of the feedback / questions I had time to go through:

let books = try author.fetchAll(db, Author.books)

How would you feel abbout

let books = try author.books.fetchAll(db) ?

given the hasMany type having that method?


Also, hasMany is a method of one of the protocols or a free function? If the latter that might not be the best idea I think ?

Edit: Noticed its a static method on the record, seems fine to me. I'm trying to think if there's a "prettier" syntax but I think this actually looks great.


For this API:
Book.including(optional: Book.author)

What is the expected output here? I find the name a bit non-descriptive. Would it get a book including its optional author if exists? Would it get a Book that itself includes some author?


Not sure If I missed this but is there a way to fetch an object along with its relation?
e.g. for the book example with the hasMany() on top being able to do let authors = try Author.fetchAll(db) that would also fetch the underlying books connected by relation? Or would that have to still be an imperative/manual operation?

I saw the "BookInfo" structure that seems to try and solve that, but introducing a third data type isn't necessarily what I'd imagine doing (if it's at all possible).


Docs-wise I would bold terms specific to the GRDB codebase such as "The BelongsTo association between a book...", personal taste :)


I had a problem with the fact tables must be singular, until I got to the last row there and noticed you can actually define a custom name. I'm not sure I'm fully clear on the documentation there, it notes an example regarding "plural" but the written code is singular. What am I missing?


The syntax for manually defining a foreign key is a bit verbose/strange IMO :

enum ForeignKeys {
    static let author = ForeignKey(["authorId"]))
    static let translator = ForeignKey(["translatorId"]))
}

Why use an enum if using static lets? Could I use a struct if I'd want to? Why does that "ForeignKey" initializer accept an array?

@hartbit
Copy link
Contributor

hartbit commented Mar 12, 2018

I saw the "BookInfo" structure that seems to try and solve that, but introducing a third data type isn't necessarily what I'd imagine doing (if it's at all possible).

I find this solution fairly elegant. I guess we could have a solution where a tuple is returned. But I actually quite like being able to define my own structured types with the types I expected to get back.

I had a problem with the fact tables must be singular, until I got to the last row there and noticed you can actually define a custom name. I'm not sure I'm fully clear on the documentation there, it notes an example regarding "plural" but the written code is singular. What am I missing?

I've also had a personal convention of plural table names, but I'm okay changing my convention if it means easier support for joins. Concerning the example in the documentation, I think its just a typo.

The syntax for manually defining a foreign key is a bit verbose/strange IMO [...] Why use an enum if using static lets? Could I use a struct if I'd want to? Why does that "ForeignKey" initializer accept an array?

I'm fairly sure you're free to use what you want: the APIs just expect a ForeignKey. The way you define it is up to you. Using an enum with static let is a standard "hack" used in Swift for namespacing identifiers. I think ForeignKey accepts an array to support foreign keys to tables with multi-column primary keys.

@freak4pc
Copy link
Contributor

@hartbit For the first piece, I would still imagine, if we finally have a Join, that it would actually be a Join. Right now it seems you will still need to sort of do two separate fetches (one for the parent and one for the association), unless I'm missing something.

@hartbit
Copy link
Contributor

hartbit commented Mar 12, 2018

The APIs are not doing two separate fetches. Where did you get that impression from? The documentation shows the raw sql generated by each API and they all use joins:

https://github.com/groue/GRDB.swift/blob/GRDB3-Associations/Documentation/AssociationsBasics.md#joining-methods

@freak4pc
Copy link
Contributor

I'll try to clarify my point :)

This example:
let books = try author.fetchAll(db, Author.books)

Would fetch all books of a specific author.

The question is what happens if I want to fetch all authors and their respective books - that is the join piece exactly. Based on the examples I saw in the docs I couldn't find a straightforward example of this.

@hartbit
Copy link
Contributor

hartbit commented Mar 12, 2018

That example is just an example of a query that uses the foreign key information to get all books of a specific author, which would have required Book.filter(Book.authorId = author.id) beforehand. As for doing actual joins, the first section shows a nice example:

struct BookInfo: FetchableRecord, Codable {
    let book: Book
    let author: Author?
}

let request = Book.including(optional: Book.author)
let bookInfos = BookInfo.fetchAll(db, request)

For more details, see the section titled Fetching Values from Associations:

https://github.com/groue/GRDB.swift/blob/GRDB3-Associations/Documentation/AssociationsBasics.md#fetching-values-from-associations

@freak4pc
Copy link
Contributor

Yes but this is exactly the structure I had a problem with -
Why do I need an intermediary object at all in this case? This isn't a many-to-many relationship, there shouldn't be a "central" object IMO. Unless this is a limitation of GRDB, but a true SQL JOIN can fetch an author with its connected books in its respective objects without any "intermediary" table (or object, in this case)

@hartbit
Copy link
Contributor

hartbit commented Mar 12, 2018

For me, this fits my mental model of SQL very well: there is an intermediary object because it represents every row of the request result. Its a solution that evolves well whatever the kind of JOIN result you have, even to multiple joins and many-to-many relationships. A solution which would store books in a books property on Author would look "nice" for one-to-many relationships but would fall short for any more general cases.

@freak4pc
Copy link
Contributor

So I guess we are in a disagree about this - even if it fits your mental model of this, this isn't how SQL Joins look like usually.

A SQL Join would usually provide an object with its foreign-keyed objects, without any intermediary object.

Using an intermediary object is very much like a having SQL View (e.g. a virtual table). So it highly depends on what kind of solution you're trying to architect here, but this isn't exactly a join per-se (structure-wise, even though the underlying is a query)

@hartbit
Copy link
Contributor

hartbit commented Mar 12, 2018

So I guess we are in a disagree about this - even if it fits your mental model of this, this isn't how SQL Joins look like usually.

Imagine a more real-world database where books can be authored by multiple authors (many-to-many relationship). In that case, an SQL JOIN like this:

SELECT authors.*, books.*
FROM authors
INNER JOIN authorBooks ON authorBooks.authorId = authors.id
INNER JOIN books ON books.id = authorBooks.bookId

couldn't be represented without an intermediate object.

For an "expert" API like GRDB joins, I think we need to embrace that joins are complicated beasts and provide an API that allows maximum flexibility.

@freak4pc
Copy link
Contributor

The question is- are we trying to solve the 10% or the 90% ? Would you agree a double inner-join isn't that common of a use case?

What I'm asking about is:

SELECT authors.*, books.*
FROM authors
INNER JOIN books ON books.authorId = authors.id

E.g. an author and his book

@hartbit
Copy link
Contributor

hartbit commented Mar 12, 2018

I think its very personal, but my joins are never that simple.

@freak4pc
Copy link
Contributor

freak4pc commented Mar 12, 2018

And that's absolutely fine - but I can honestly tell you the majority of people would want an object, and its directly-relationed objects with it, and not a more complex join necessarily, so I think its definitely possible to model a solution that would allow for the 90% to have a more concise syntax, while the more complex issues can be solved with an intermediary object.

But using an intermediary object for this basic scenario is quite the bummer TBH.

@groue
Copy link
Owner Author

groue commented Mar 12, 2018

Hey @groue , Awesome work!

I started reading but had to stop to get some work done, here is some of the feedback / questions I had time to go through:

Hello Shai, thanks for your feedback (thanks David, too)! Let's answer your questions!!

let books = try author.fetchAll(db, Author.books)

How would you feel abbout

let books = try author.books.fetchAll(db) ?

given the hasMany type having that method?

You have seen that Author.books is a static propery of Author (it contains an association). There is no books property on Author instance that is automatically defined.

I have been wondering if requesting all books from an author had to be done from an author (give me all books given this association), or from the association (give me all books given this author).

Since other ORMs usually load associated records by the mean of methods on the source record, it's been my choice too:

// Editor's pick: Fetch books from author
let books = author.fetchAll(db, Author.books)

// Discarded: Fetch books from the association:
let books = Author.books.fetchAll(db, from: author)

Now fetching methods are not enough. We also need requests, which can feed database observation tools. That's why we also have:

let request = author.request(Author.books)
request.rx.fetchAll(in: dbQueue).subscribe { books in
    print("new books: \(books)")
}

So: author.fetchAll(db, Author.books) is a subjective choice to put the origin record on the right of the expression, like other ORMs (but with stranger syntax because there is no automatic generation of properties).

And the documentation on this topic has to be improved, because it mixes requests (advanced feature) and fetching methods (that everybody needs) on the same level.


Also, hasMany is a method of one of the protocols or a free function? If the latter that might not be the best idea I think ?

Edit: Noticed its a static method on the record, seems fine to me. I'm trying to think if there's a "prettier" syntax but I think this actually looks great.

Well spotted, and thanks. Yes, "hasOne", "hasMany", "belongsTo" have the nice historical log of abstractions that are well understood by developers.


For this API:
Book.including(optional: Book.author)

What is the expected output here? I find the name a bit non-descriptive. Would it get a book including its optional author if exists? Would it get a Book that itself includes some author?

It is described here:

"This request fetches all books, including their author's information. Books that don't have any author (a null authorId column) are fetched as well."

I'm not sure I understand your question, which may mean that I totally failed at explaining the four joining methods, and that you built a mental model of them that is at odds with their actual behavior. No offense, this happens all the time when APIs are not clear enough.

I don't know if would still ask this same question now. If you do, would you be able to ask it again, but with different words so that I can figure out what's in your mind, and give a proper answer?


Not sure If I missed this but is there a way to fetch an object along with its relation?
e.g. for the book example with the hasMany() on top being able to do let authors = try Author.fetchAll(db) that would also fetch the underlying books connected by relation? Or would that have to still be an imperative/manual operation?

Excellent question. This deserves a new entry in the Future Directions paragraph.

No. Today, there is no API to load all authors will all their books, and you'd have to do it manually. I had the gut feeling that it was not a feature required for an initial release.

I saw the "BookInfo" structure that seems to try and solve that, but introducing a third data type isn't necessarily what I'd imagine doing (if it's at all possible).

1000 points question again. Not only are those third data types the very reason why we can ship this joining API. But they also are a very recommendable way to consume database values.

The previous attempt at providing an API that generates joins would not use those "third data types". It would return tuples instead. It did address, for example, the "all authors will all their books" request by returning [(Author, [Book])].

But without third data types, we have big issues:

  1. Swift has no variadic tuples or generics. This means that we're stuck with hard-coded limits (join 2 types, join 3 types, end.). This just doesn't scale.
  2. Tuples are handy. But "third data types" are even more handy. Types have an identity. Types can be extended with convenience properties and methods. They can be fed from the database, but also remote JSON data, whatever. They provide excellent support for view models.

And "Third-party types" have another benefit (because I have a hidden agenda): by fostering the decoding of joined requests into third-party types, I grant GRDB users with data consistency for free. And the topic of lazy loading of associated records vanishes. What's better than this "BookInfo" type that contains exactly what I need? Why feed it manually with several db requests performed at various times in my app when I can get it for free?

As David said:

I find this solution fairly elegant. I guess we could have a solution where a tuple is returned. But I actually quite like being able to define my own structured types with the types I expected to get back.

That's exactly my expectation.

You later asked:

[Shai]: The question is what happens if I want to fetch all authors and their respective books - that is the join piece exactly. Based on the examples I saw in the docs I couldn't find a straightforward example of this.

Unless I'm mistaken, such a requests needs two SQL queries, not a single one (please correct me if I'm wrong):

SELECT * FROM authors WHERE ...
SELECT * FROM books WHERE authorId IN (...)

Such requests can currently not be expressed with the FetchRequest protocol (and not be observed by observation tools, btw). FetchRequest has to build a single SQLite statement in its prepare() method.

I was expecting that future additions would add another protocol on top of FetchRequest, in order to support more general requests. Such general requests could not be iterated with cursors, but they would still be able to feed observation tools.

This makes me think that I may have to rename FetchRequest into StatementRequest, so that FetchRequest can later be used for the more general "fetching" protocol. (edit: this subject is more detailed at RxSwiftCommunity/RxGRDB#27 (comment))

You again:

a true SQL JOIN can fetch an author with its connected books in its respective objects without any "intermediary" table (or object, in this case)

I know that you like model objects to have properties to their associated objects. You call that "modelling", and it's a fair habit that is widely supported by many years of software design, ORM APIs, accross many years and thousands of lines of code:

class Book {
    var title: String
    var author: Author
}

I want to break this habit 🤘

It's OK when your models are managed, uniqued, and lazy-loaded.

GRDB records are much more low-level: a fetched record is an in-memory cache of database values. It can be a plain Swift struct.

That's why we need cache invalidation, and why I put so much energy on providing easy-to-use database observation techniques :-)

This means that the above Book class with its author property, when it exists, belongs to the application. The application is sole responsible of its management (which is not easy).

By ditching uniquing, lazy loading, and not managing records, GRDB was able to simplify multi-threaded code. The cost is that records have authorId properties instead of author. And we have those "third-party types".

I wholeheartedly understand your point. And I jump with passion in the opposite direction, with many many types focused on precise tasks.

My personal experience (which guides me a lot) shows that those types are often private types inside view controllers (or view models or whatever). Add I like to focus on a local and isolated type, instead of bloating a unique model type with convenience methods used by a single screen of an app.


Docs-wise I would bold terms specific to the GRDB codebase such as "The BelongsTo association between a book...", personal taste :)

Very good idea. There are so many new concepts in this doc!


I had a problem with the fact tables must be singular, until I got to the last row there and noticed you can actually define a custom name. I'm not sure I'm fully clear on the documentation there, it notes an example regarding "plural" but the written code is singular. What am I missing?

As David has noticed, it's surely a typo. I have tried to be consistent (with the only exception of the "demographics" table which is for me the needle that reminds me that something is not "extra clear" here).

I deeply regret this "singular table" convention. I really want to remove it.

There it lies:

let author = Book.belongsTo(Author.self)
let request = Book.including(required: author)
let row = try Row.fetchOne(db, request)!
print(row.debugDescription)
// ▿ [id:1, authorId:2, title:"Moby-Dick"]
//   unadapted: [id:1, authorId:2, title:"Moby-Dick", id:2, name:"Herman Melville"]
//   - author: [id:2, name:"Herman Melville"]

You see the author: [id:2, name:"Herman Melville"] line? We need this "author" key to come from somewhere. And it's better be singular, because it then fits the author property of "third-party types" like a glove.

This key comes from the association Book.belongsTo(Author.self).

That association has not much anything to chew on: the Book type (irrelevant), and the Author type, which comes with its databaseTableName static property, which contains "author" when the table has a singular name. And that's why singular names are preferred. There is no other reason.

I've tried to build "author" from the Author type itself, by playing with String(describing: Author.self). But it often returns an awful string that requires careful parsing.

To remove the "singular table" convention, we just need to be able to generate "author" from the Author type.


The syntax for manually defining a foreign key is a bit verbose/strange IMO :

enum ForeignKeys {
    static let author = ForeignKey(["authorId"]))
    static let translator = ForeignKey(["translatorId"]))
}

Why use an enum if using static lets?

Right. This "namespace" enum has no purpose, and pollutes the doc.

Could I use a struct if I'd want to? Why does that "ForeignKey" initializer accept an array?

As David said above, because foreign keys can span several columns.

About the "foreign key" name. It's hard-core, very low-level. I kept it because the two types at both ends of an association use the same foreign keys so disambiguiate their own associations:

struct Book: TableRecord {
    // Define foreign keys
    static let authorForeignKey = ForeignKey(["authorId"]))
    static let translatorForeignKey = ForeignKey(["translatorId"]))
    
    // Use foreign keys to define associations:
    static let author = belongsTo(Person.self, using: authorForeignKey)
    static let translator = belongsTo(Person.self, using: translatorForeignKey)
}

struct Person: TableRecord {
    static let writtenBooks = hasMany(Book.self, using: Book.authorForeignKey)
    static let translatedBooks = hasMany(Book.self, using: Book.translatorForeignKey)
}

I agree that this is clearly not the most elegant part of the API.


Thanks for your feedback so far! I already a nice list of improvements to do, and I hope I could answer your questions and concerns, even if it may look like we disagree sometimes.

@groue
Copy link
Owner Author

groue commented Mar 12, 2018

But using an intermediary object for this basic scenario is quite the bummer TBH.

I've given my point of view above, but yours is still interesting. Do would have an idea of what it would look like (the definition of two book/author types, and the ideal API that loads what you need)?

@freak4pc
Copy link
Contributor

I've given my point of view above, but yours is still interesting. Do would have an idea of what it would look like (the definition of two book/author types, and the ideal API that loads what you need)?

Given your explanation on top I don't think you'd really want to do this :)

I would imagine for a behavior that causes the "secondary" object to be fetched along with the first without an intermediary object, but you've fully explained your stance on it and I'm fully OK with it. I don't think I have any serious beef with it at this point, it's just a API design choice.

@groue
Copy link
Owner Author

groue commented Mar 12, 2018

@freak4pc I'm happy we understand each other. Now if I write bold words and use the 🤘 emoji, this does not mean that I don't want to listen (although heavy rock music can damage ears).

Maybe I can smooth your concerns by telling that supporting classic modeling is just an init(row:) away:

class Book: FetchableRecord {
    static let author = belongsTo(Author.self)
    
    var title: String
    lazy var author: Author { /* fetch the author if not loaded yet */ }
    private var authorId: Int64 // support for the author lazy property
    
    init(row: Row) {
        self.title = row["title"]
        self.authorId = row["authorId"]
        if let authorRow = row.scoped(on: "author") {
            self.author = Author(row: authorRow)
        }
    }
}

let books = try Book.including(required: Book.author).fetchAll(db)
for book in books {
    print("\(book.title) by \(book.author.name)")
}

I just can't really foster such pattern in the documentation, because lazy loading of associated records breaks the data consistency guarantee (and I morally can't give dangerous advice). It's a little too smart, you see?

@freak4pc
Copy link
Contributor

Yeah I'm aware of the lazy method but I can get this done without this PR, so I didn't even mention it :)

No worries, I totally what your architectural standpoint is and it's 100% good! 💯

@groue
Copy link
Owner Author

groue commented Mar 13, 2018

For information, some tests fail because the various tested SQLite versions don't all agree on the DatabaseRegion fetched by joined requests. Database region define regions such as "columns a and b from table t". They provide support for transaction observers that want to spot changes in the results of a request (FetchedRecordsController and RxGRDB).

As far as I can see, tests of database regions have to be adapted, but they don't reveal any observer breakage (although I'll have to add a few integration tests that lift remaining doubts).

@groue
Copy link
Owner Author

groue commented Mar 13, 2018

I did start using @freak4pc's feedback:

How would you feel about

let books = try author.books.fetchAll(db) ?

After a good second read, I find your idea stellar.

The Requesting Associated Records chapter has been rewritten accordingly. author.fetchAll(db, Author.books) and similar methods that "fetch the contents of a relation from a source record" have been removed.

For this API: Book.including(optional: Book.author)

What is the expected output here? I find the name a bit non-descriptive.

The Joining Methods chapter has been fully rewritten, and I hope it's much more clear now.

@groue
Copy link
Owner Author

groue commented Mar 13, 2018

Docs-wise I would bold terms specific to the GRDB codebase such as "The BelongsTo association between a book...", personal taste :)

Good idea, done.

The syntax for manually defining a foreign key is a bit verbose/strange IMO :

enum ForeignKeys {...}

Yes. I have removed the surprising enum in the Foreign Keys chapter.

@freak4pc
Copy link
Contributor

After a good second read, I find your idea stellar.

The Requesting Associated Records chapter has been rewritten accordingly. author.fetchAll(db, Author.books) and similar methods that "fetch the contents of a relation from a source record" have been removed.

Wooo! 🎉 Love this new API ! Looking forward :) Also documentation updates look awesome.

@groue groue merged commit 7a30eb2 into GRDB3 May 1, 2018
@groue groue deleted the GRDB3-Associations branch May 1, 2018 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants