Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scoped stores and the type property #106

Closed
gr2m opened this issue Dec 26, 2016 · 21 comments
Closed

scoped stores and the type property #106

gr2m opened this issue Dec 26, 2016 · 21 comments

Comments

@gr2m
Copy link
Member

gr2m commented Dec 26, 2016

The current Hoodie Client provides a convenience method to get a store API which is scope to documents with the .type property set to the passed scope

var scope = 'todo'
var todoStore = hoodie.store(scope)
todoStore.add({text: 'Remember the milk!'})
// creates document with {id: 'uuid567', type: 'todo', text: 'Remember the milk!'}

The problem with the .type property is that people might use it for other things than scoped stores. We decided to work around this with a lot of effort, among others we had to implement virtual add / remove events when a user changes the .type property

var todoStore = hoodie.store('todo')
var noteStore = hoodie.store('note')
todoStore.on('change', function (event) {console.log('todoStore event:', event})
noteStore.on('change', function (event) {console.log('noteStore event:', event})
hoodie.store.on('change', function (event) {console.log('store event:', event})
hoodie.store.update('uuid567', {type: 'note'})
// logs:
// todoStore event: remove
// noteStore event: add
// store event: update

We made all this in the spirit of Dreamcode, we discussed what the nicest possible API would be and then tried to make it work backwards. And we did.

Shortcomings

Implementing the above APIs adds a lot of complexity and has lead to many very tricky to find errors in the past. For example hoodiehq/hoodie#612 and hoodiehq/hoodie-store-client#95

Another major problem is that this behavior is confusing and not predictable. Today I think we tried to be too smart. People don’t really understand what the hoodie.store(scope) API does, how it behaves and how it goes together with the rest of the hoodie.store.* APIs.

But maybe the biggest problem a.k.a. learning I had in the past based on many discussions with the Hoodie Community and people from other projects: a simple API is not helpful if it does not scale. It’s not enough to create Dreamcode for people to get going quickly. It must remain clear and usable, even as the application grows in complexity and usage. And I think in that regard we failed. Not only because of confusing, complexity and bugs, but because we did not embrace the Nr. 1 tip when it comes to storing data using PouchDB / CouchDB: use the _id property as much as possible for querying, to avoid secondary indices as much as possible.

I’m working with CouchDB since over 5 years now, and I have not worked on a single project that does not use the _id properties to big extend in order to make the apps and analytics as performant as possible. All projects where we use PouchDB / CouchDB, we would definitely decide to store the above document with _id prefixes, like "_id": "todo/uuid567". Yes that means that the "scope" cannot be changed, but that is by design. Using the .todo property was a bad design decision.

I’d like to hear all your thoughts on this, but I think we should come up with something better now, because a change will result in requiring to change every single document stored by Hoodie apps, and we better do this before there are more apps in production.

Embracing the primary index

I think we better embrace the best practise of using the _id property from the beginning in Hoodie, with good documentation, API design and mental models, instead of trying to hide away. For the most simple applications people won’t probably need to use it, but for more complex one there should be a clear API.

I thought a lot about this and my recommendation is we remove the hoodie.store(scope) API entirely in favor of hoodie.store.withIdPrefix(prefix). Example:

var todoStore = hoodie.store.withIdPrefix('todo/')
todoStore.add({text: 'Remember the milk!'})
// creates document with {id: 'todo/uuid567', text: 'Remember the milk!'}

In most ways it will work very similar to hoodie.store(scope), without the need of a .type property. We don’t need to explain that the id property is special and that it cannot be change, only trough removal and re-creation with a new one. And I think that the meaning prefix is clear to most and can be quickly looked up by others. .withIdPrefix is fairly self-explanatory, it tells me that all APIs that it returns imply the given idPrefix.

Here are a few things that I don’t find ideal and maybe together we can come up with a smart way to workaround it, or maybe a good documentation has to do

  1. I am not sure if the it would be better if we would find a way to remove the slash / from todo/ and make it an internal implementation detail. But on the other side, I think it’s okay to document and explain that using / as separator is the recommended way by Hoodie apps
  2. the _id prefix is often times used for one-to-many relations. Say my todo app has both lists and todos that belong to the lists. Best practise would be to have id’s like "_id": "list/uuid567" for the lists and "_id": "list/uuid567/todo/uuid8910" for the todos that belong to that list (unless moving items is very common, but let’s leave that out of scope for this example). Now I no longer can find all lists only with hoodie.store.withIdPrefix('todo/').findAll(), the result would also include all todos. I have no simple solution to this. But maybe this is such advanced usage that Hoodie can clearly explain both the best pratises and the .withIdPrefix API, and it’s up to the developers to filter out the minutes themselves. It is a feature after all to get a list and all its todos with something like hoodie.store.withIdPrefix('list/uuid567').findAll(). Or maybe we want to add an extra property which would allow us to query the list documents only. See also the suggestion in Store: Hoodie properties added to documents #105 which would give us a namespace so there no longer would be a conflict with document properties used by the app.

This is how far I got, I’d love to hear your thoughts on this :)

@gr2m
Copy link
Member Author

gr2m commented Dec 26, 2016

ping @hoodiehq/maintainers I would love to hear your thoughts on this ^^ It’s a discussion on a significant Hoodie Client API change, something we have to get right now, so we avoid changes in the future, especially ones that would imply changes in how documents are stored / retrieved

@rogeriochaves
Copy link
Member

rogeriochaves commented Dec 28, 2016

use the _id property as much as possible for querying, to avoid secondary indices as much as possible.

I don't know couchdb very much, could you explain why is that? CouchDB doesn't play well with secondary indices? Performace?


Well, until the middle I was thinking "we don't really need to change the API for that do we? We can make hoodie use the _id rather than type behind the curtains, right?". But then the list/uuid567/todo/uuid8910 example struck me.

I believe nested stores like that are a very important use-case, which I've needed myself in offline-first apps in the past, and the withIdPrefix name definitively make it obvious. And the cool thing is that devs could get creative on how they organize that, for instance calling it
in-list/uuid567/todo/uuid8910, so it would be possible to get all lists without its todos, and all todos in a specific list separately, although that would be 2 requests.

So... yeah, this works for me 👍

@gr2m
Copy link
Member Author

gr2m commented Dec 28, 2016

I don't know couchdb very much, could you explain why is that? CouchDB doesn't play well with secondary indices? Performace?

Performance and harddrive usage. I’m sure there are great resources on that topic out there, but don’t know out of my head. Maybe @janl can point us to some?

@pmbanugo
Copy link
Member

pmbanugo commented Jan 1, 2017

People don’t really understand what the hoodie.store(scope) API does, how it behaves and how it goes together with the rest of the hoodie.store.* APIs.

I still don't understand yet, how it goes with the rest of the API and so can't offer any suggestion on this.

Yes that means that the "scope" cannot be changed, but that is by design. Using the .todo property was a bad design decision.

the scope shouldn't be easy to change. Maybe a move API which drops and recreates it with another id in the new scope.

I am not sure if the it would be better if we would find a way to remove the slash / from todo/ and make it an internal implementation detail. But on the other side, I think it’s okay to document and explain that using / as separator is the recommended way by Hoodie apps

I think the slash (/) should be internal implementation details communicated how it's stored/used under the curtain.

@inator
Copy link

inator commented Jan 3, 2017

Well I'm inclined to trust the pros on this one, as I don't personally have a lot of experience with the performance impacts attributed to secondary indexes.

I'm also not opposed to the use of the slashes, but think we need to keep the sausage factory doors closed as much as possible - as these implementation details can be confusing for the typical targeted developer.

In the end, the same goal is achieved and if scaling is improved, that's a plus for everyone. Clearly its a matter of coming up with an understandable way of documenting it, or abstracting that complexity away. Unfortunately, I'm straining my brain to come up with another option. :/

@minrwhite
Copy link
Member

My thoughts (as a relative CouchDB newbie):

My preferred terminology as opposed to 'Prefix' would be 'Namespace' e.g. .withIdNamespace('contact').

I'd like this API to be extendible with extra arguments in the same way as, for example, Node's fs.join. This would allow you to do things like .withIdNamespace('contact', uniqueContactId, 'phoneNumber'), resolving internally to seek CouchDB _ids beginning with contact/abc...567/phoneNumber/. This could also potentially allow the use of alternate DB backends that are better optimised for querying via multiple indexes, by breaking this namespacing into multiple fields, while presenting the same facade to the user. This API usage would turn the slash (or alternate separator) into an implementation detail, but I'm with @inator on keeping this kind of detail internal as much as possible. Finally calling like this would easily allow what I think @rogeriochaves is suggesting, in the application of some kind of transform to the higher-level namespace names on subdocuments' _ids so that it is obvious when it is todos that are sought, and when it is lists (as per the example by @gr2m )

@gr2m
Copy link
Member Author

gr2m commented Jan 5, 2017

I like .withIdNamespace and the idea of passing multiple arguments into it!

What I like about the word "prefix" is that it makes clear that it is still a part of the id property, but I guess namespace is the same?

If you are interested in playing around with it, I made a quick-and-dirty implementation of the .withIdPrefix(prefix) API at hoodiehq/pouchdb-hoodie-api#122. It feels pretty cool in terms of functionality, I’m open regarding the name

@pmbanugo
Copy link
Member

pmbanugo commented Jan 5, 2017

I like the idea with the multiple arguments too.

When you say namespace are you referring to the properties on the other issue about where to put hoodie generated properties?

@gr2m
Copy link
Member Author

gr2m commented Jan 5, 2017

When you say namespace are you referring to the properties on the other issue about where to put hoodie generated properties?

no this is just about the id property. store.withIdNamespace('list', 'uuid567', 'item') would match all documents that have id properties starting with list/uuid567/item/

@pmbanugo
Copy link
Member

pmbanugo commented Jan 5, 2017

.withIdPrefix for me would be easy to understand compared to withIdNamespace since it'll be a prefix of the id

@inator
Copy link

inator commented Jan 5, 2017

I also prefer .withIdPrefix and really like the idea of multiple arguments!

@minrwhite
Copy link
Member

I'm glad the multiple arguments idea has been well received!

Apologies, @pmbanugo about the confusion with the 'namespace' terminology, I missed that conflict between the hoodie property namespace of a store object and what I would call the namespace of the object, or perhaps the address of the object.

Of course we don't want to hide the fact that, as part of the implementation for CouchDB, the id is prefixed with it's address to make retrieval easier; however my advocacy for the terminology 'Namespace' (or something else) over 'Prefix' is based around leaving some abstraction of the implementation from the API. If at some point in the future hoodie starts using an alternative DB engine, or CouchDB is changed in some way, so that retrieving store objects with a particular object address (a.k.a prefix, a.k.a namespace) is possible using a more performant way than including it in the _id field, then to take advantage of this, the API name would either require changing or run the risk of becoming counter-intuitive.

@gr2m
Copy link
Member Author

gr2m commented Jan 6, 2017

I gave this some more thought. No breakthrough :) but maybe it helps to share:

I really like the idea of passing in multiple arguments (or an array of strings) as segments of the id prefix, so that we can leave out the "/" separator as an implementation detail.

hoodie.store.withIdPrefix('list') // matches all documents with IDs starting with "list/"
hoodie.store.withIdPrefix('list', 'uuid567','item')  // matches all documents with IDs starting with "list/uuid567/item/"

The downside is that I can think of cases where I want to listen to changes in both the list document as well as all its items. But hoodie.store.withIdPrefix('list', 'uuid567') would not match the list, because of the trailing / ("list/uuid567/" while the list id is "list/uuid567").

I don’t mind finding a better name that .withIdPrefix(). Instead of .withIdNamespace(), I would suggest to simply call it .namespace(). I find the definition from Wikipedia very fitting for our case

A namespace in computer science [..], is an abstract container or environment created to hold a logical grouping of unique identifiers or symbols [..]. An identifier defined in a namespace is associated only with that namespace.

With hoodie.store.namespace() we could also allow to pass in a document with an .id property, like so

var list = {id: 'list/uuid123', title: 'shopping list'}
var listNamespace = hoodie.store.namespace(list)
listNamespace.on('change', handleListAndItemChange)

This API could workaround the problem I mentioned above with the trailing /. And if we want to support that, I prefer store.namespace(list) over store.withIdPrefix(list). Reading the latter is rather confusing me. This API would also allow to do store.namespace(list, 'item') which is pretty neat I think.

@gr2m
Copy link
Member Author

gr2m commented Jan 6, 2017

relevant discussion from PouchDB slack

screen shot 2017-01-06 at 10 05 43

@janl
Copy link
Member

janl commented Jan 7, 2017

To sum this up a little. Using the document _id in CouchDB and PouchDB gives you some performance advantages for some types of queries as opposed to other mechanisms to query.

Especially, these ”some types” are very common ones and they are solved by what is called internally as the “by-id” index. You know this as /db/_all_docs in CouchDB or db.allDocs() in PouchDB.

The biggest advantage here is that C/PouchDB need this index internally, so it is always there and implementations will try to make sure it is most efficient. Essentially, any queries we can handle using this index, we get ostensibly for free.

What’s in an index?

There are two main modes of operation for an index:

  1. index creation
  2. index querying

2. is the one reason why there are indexes at all. In other words, the index is there to be queried. And as a result, generally, and index is optimised in a way that querying it is very fast.

1. is the problem here. Especially in PouchDB, but to some extend in CouchDB as well. Maintaining secondary indexes require extra work (in terms of setup, e.g. creating an index definition), and creating the actual index by going through the main database and extracting whatever information is required for queries.

PouchDB-Next

Now from the quoted Slack discussion above. There is a slow moving project inside PouchDB that would make secondary index creation A LOT more efficient than it is today. Like a lot a lot, like even better than CouchDB does it.

The downside is: we can’t bet on it just yet. This is a work in progress, it won’t be as widely cross-platform/storage as the current PouchDB implementation and only supported IndexedDB. In fact, not supporting different storage backends like WebSQL etc. is what makes index creation performance A LOT better. Some cross-platform/storage compatibility could be added later of course. The main point here is though: this is going to take a while.

hoodie.store.query()

The real question here now is, what should a Hoodie query mechanism look like, behave like, and be implemented like?

I have no answer to this yet, but I feel once we have a bit of an idea here, we can tell how the withIdPrefix/namespace idea fits into it.

While I’ve argued elsewhere that we shouldn’t use this particular term, we could introduce the concept of “collections” (the name comes from MongoDB) for what we currently call scoped store. A collection would simply be a list of documents with an _id-prefix (just like we are discussing here) and it’d allow the usual grouping by whatever type. We can even do nested collections, like shown with the multiple arguments to withIdPrefix().

The point here though is to make this a thing separate from queries. You can later query individual collections, or the default collection of all the objects your store, or only a sub-collection, and Hoodie can abstract away all the _id prefix magic there*.

This would also mean that we’d hide this implementation details again, as opposed to the original intent here exposing the C/PouchDB implementation detail of using the _id for clever things, which I liked from motivational and educational points of view.

*One question though, I’m not sure I understand how hoodie.store.withIdPrefix() would avoid the intricate issues you referenced (hoodiehq/hoodie#612 and hoodiehq/hoodie-store-client#95) that hoodie.store(scope) had. It seems we just renamed the method here (and the implementation of course), but the complexity remains, does it not?

@janl
Copy link
Member

janl commented Jan 7, 2017

One more thing: on the flip-side, if we have a flexible and efficient hoodie.store.query(), maybe we just don’t need withIdPrefix()/et.al., because it can handle all/most queries anyone needs, including getting a list of docs by type.

@gr2m
Copy link
Member Author

gr2m commented Jan 8, 2017

Amazing comment 👏

*One question though, I’m not sure I understand how hoodie.store.withIdPrefix() would avoid the intricate issues you referenced (hoodiehq/hoodie#612 and hoodiehq/hoodie-store-client#95) that hoodie.store(scope) had. It seems we just renamed the method here (and the implementation of course), but the complexity remains, does it not?

Yes, we can ignore these. These problems are more related to the fact that hoodie.store() can be called as a function directly instead of having a method like hoodie.store.scope(), sorry for bringing these up.

The real question here now is, what should a Hoodie query mechanism look like, behave like, and be implemented like?

The main motivation for the current hoodie.store() or the suggested .withIdPrefix() / .namespace() APIs is simpler integration with frontend frameworks, that very often have a concept of models / collections.

I think the question about querying is out of scope here and should be addressed separately? The scoped APIs (or however we want to refer to them) are not only about finding data, but also about creating / updating and about listening to changes.

@janl
Copy link
Member

janl commented Jan 8, 2017

Yes, we can ignore these. These problems are more related to the fact that hoodie.store() can be called as a function directly instead of having a method like hoodie.store.scope(), sorry for bringing these up.

So solving this is an implementation detail (not to diminish the work to fix it, but conceptually, we know how to fix this)?

The main motivation for the current hoodie.store() or the suggested .withIdPrefix() / .namespace() APIs is simpler integration with frontend frameworks, that very often have a concept of models / collections.

That makes perfect sense. Maybe we should adopt “collection” for this?

I think the question about querying is out of scope here and should be addressed separately?

For now I’d like to at least understand how these two will interoperate. Should users be able to make a query on a collection (very nice feature, not easy to implement, but probably doable) just like they would query all their data (very easy with native PouchDB indexing), or would we have the latter only?

We can figure out all the nasty details later, I just want to understand your thinking here.

@gr2m
Copy link
Member Author

gr2m commented Jan 8, 2017

Just want to say: I very much enjoy this discussions about Hoodie APIs <3 I’m sure something great will come out of this.


Yes, we can ignore these. These problems are more related to the fact that hoodie.store() can be called as a function directly instead of having a method like hoodie.store.scope(), sorry for bringing these up.

So solving this is an implementation detail (not to diminish the work to fix it, but conceptually, we know how to fix this)?

Yes, but it will be a breaking change, we can’t solve it with the current API I think. Instead of hoodie.store() we need something like hoodie.store.<method name>(). I think what ever we agree on in this thread, it will also resolve hoodiehq/hoodie#612 and hoodiehq/hoodie-store-client#95

Maybe we should adopt “collection” for this?

Calling it hoodie.store.collection() was my initial impulse, I like it. I think it would cover the most common cases where we just pass in the name of the collection like hoodie.store.collection('project') or hoodie.store.collection('message') and it would hide away the internal implementation (be it using _id prefixes or not). It could still accept a 2nd argument to for more advanced usage like hoodie.store.collection('item', {idPrefix: 'measurement/2016-01-02/temperature/'}) (in case we decide to use id prefixes).

What I like about .withIdPrefix is that I think we need to educate the Hoodie users anyway about why ID prefixes are relevant once their apps become a bit more advanced. But having .collection(name, {idPrefix}) would work just as well, maybe even better, because it would only needs to be understood once nested collections become relevant.

For now I’d like to at least understand how these two will interoperate.

I would have a query API on the root API only (just like .sync() for example), while the store.collection() (or however we end up calling it) only returns the methods to add, find, update and remove documents as well as listen to changes.


I wonder if it would be helpful to come up with a few different use app examples, for which we would list the different type of documents we need to store and access. Then we could create dream code for the different API suggestions and see how they compare? It would certainly take some effort, but maybe it will make a good base of discussion that we can keep up-to-date as the discussion evolves, and helps more community members jump in at any point?

@gr2m
Copy link
Member Author

gr2m commented Feb 26, 2017

We had some discussion on this with @janl while we have been at the CouchDB Dev Summit. Our recommendation is to go ahead with .withIdPrefix(prefix) and see how it feels. It is something that is comparably simple to change if it should turn out to not be the ideal solution.

Our thinking is that it’s not the first API that one would use when familiarizing with Hoodie. A person new to Hoodie would use the base hoodie.store.add() etc APIs. Once someone needs different types of types of documents, it will be a good time to educate the developer about the internals of Hoodie and why we use prefixes to differentiate different types documents and what else they are good for. Once we explained that properly, the API itself becomes very clear, and we can simply recommend to use / as a delimiter, without enforcing it.

We have a PR ready at hoodiehq/pouchdb-hoodie-api#124, we’d love a review 👀

@gr2m
Copy link
Member Author

gr2m commented Mar 4, 2017

on it’s way via hoodiehq/pouchdb-hoodie-api#124

@gr2m gr2m closed this as completed Mar 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants