This repository has been archived by the owner on Oct 24, 2021. It is now read-only.

Data Loading and Management #33

Closed
tmeasday opened this issue Oct 19, 2015 · 53 comments

@tmeasday
Contributor

Article outline

https://github.com/meteor/guide/blob/master/outlines/data-loading.md

Major decision points

  1. Data loading is best done through publications
  2. All subscriptions should happen inside UI components. Even "global" subscriptions should be done in the app layout component. Data loaded from a subscription should be accessed in the same component, and passed down through arguments, rather than relying on global data to be available in Minimongo
  3. There's a strategy for pagination here, we should investigate what works well in production apps
  4. Client-only data should be in a Tracker-enabled store, for example a ReactiveDict wrapped in an API
  5. Relational data should be published using publish-composite
  6. External data should be pushed to the client through publications - for example, you can poll a REST endpoint through a pub
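
Point 4 could look roughly like the sketch below: a small store whose state lives in a dictionary with `get`/`set`, hidden behind named functions. In a Meteor app the dictionary would be a `ReactiveDict`; a `Map`-backed stand-in is used here so the snippet runs on its own. The store, key and function names are hypothetical.

```javascript
// Stand-in for ReactiveDict so the sketch runs outside Meteor.
// In an app you would pass `new ReactiveDict()` instead
// (assumption: the store only needs get and set).
function makePlainDict() {
  const values = new Map();
  return {
    get: (key) => values.get(key),
    set: (key, value) => { values.set(key, value); },
  };
}

// Hypothetical store: callers never touch the dictionary keys directly.
function makeTodoFilterStore(dict) {
  dict.set('showCompleted', false);
  return {
    showCompleted: () => dict.get('showCompleted'),
    toggleShowCompleted: () => dict.set('showCompleted', !dict.get('showCompleted')),
  };
}

const store = makeTodoFilterStore(makePlainDict());
store.toggleShowCompleted();
console.log(store.showCompleted()); // true
```

Keeping the keys behind functions means the storage can later be swapped for something else without touching the components that use the store.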

Old outline:

Proposed outline

  1. Loading and publishing data from Mongo on the server.
  2. Subscribing to data on the client
    • For now, just in the straightforward way, emphasize autorun re-sub behaviour and workarounds
  3. Client only data (Stores) vs persistent server data (Collections)
  4. Modifying data ("actions"? -- store mutators or methods)
  5. Complex publications:
    • Relational data - use publish-composite to publish relational data.
    • limiting data to what you need
    • reusing publications vs limiting them.
    • pagination patterns
  6. Publishing data from 3rd party sources
    • Poll-publish pattern
  7. Publications as RESTful endpoints
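
As a rough illustration of the poll-publish pattern in item 6, the helper below does one polling step: it compares freshly fetched items with what the publication has already sent and calls `added`, `changed` and `removed` accordingly. Those three methods are Meteor's low-level publish API; the helper name, the `sentIds` set and the wiring in the trailing comment are assumptions.

```javascript
// One polling step for a publication. `sub` is the publish handler's `this`.
function syncPolledItems(sub, collectionName, sentIds, items) {
  const seen = new Set();
  for (const { _id, ...fields } of items) {
    seen.add(_id);
    if (sentIds.has(_id)) {
      sub.changed(collectionName, _id, fields);
    } else {
      sub.added(collectionName, _id, fields);
      sentIds.add(_id);
    }
  }
  // Anything sent earlier but missing from this poll has gone away.
  for (const id of [...sentIds]) {
    if (!seen.has(id)) {
      sub.removed(collectionName, id);
      sentIds.delete(id);
    }
  }
}

// Possible wiring inside Meteor (not run here; fetchItems is hypothetical):
// Meteor.publish('polledItems', function () {
//   const sentIds = new Set();
//   const poll = () => syncPolledItems(this, 'items', sentIds, fetchItems());
//   poll();
//   this.ready();
//   const timer = Meteor.setInterval(poll, 10000);
//   this.onStop(() => Meteor.clearInterval(timer));
// });
```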

Open Questions

  • Should webhooks be part of the methods article? I think so
  • Do we encourage people to pass queries / options into subscriptions? I think no.
@tmeasday
Contributor Author

@justinsb This would possibly be a chapter you might want to weigh in on. I'll ping you again when it's more fleshed out.

@tmeasday
Contributor Author

See also #11

@mitar
Contributor

mitar commented Oct 26, 2015

So for me the most important thing I tell new people who start with Meteor is a cycle of data propagation you have to keep in mind:

  • data is in the database
  • you define a publish endpoint to publish it
  • you subscribe to it from the client, data is pushed to the client
  • you push that data to the template
  • you declare how the data should be rendered in the template
  • you have an event handler
  • which calls some method on the server which modifies the data
  • and data change is pushed around and everything (because it is declaratively defined) updates automatically

So I think it is really important that people understand that they should not be changing data or templates directly on the client, but should go through the server and leave it to the loop to make everything happen.
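
The loop described above can be modelled in a few lines of plain JavaScript. This is a toy simulation, not Meteor code: `server`, `subscribe`, `callMethod` and `render` are hypothetical stand-ins for the database plus publication, `Meteor.subscribe`, `Meteor.call` and a template.

```javascript
// Toy server: owns the data and pushes a snapshot to every subscriber on change.
const server = {
  todos: [{ _id: '1', text: 'write guide', done: false }],
  subscribers: [],
  publish() {
    const snapshot = this.todos.map((todo) => ({ ...todo }));
    this.subscribers.forEach((push) => push(snapshot));
  },
  methods: {
    markDone(id) {
      const todo = server.todos.find((t) => t._id === id);
      if (todo) todo.done = true;
      server.publish(); // the change flows back out to every client
    },
  },
};

// Toy client: a local cache filled only by the server, rendered declaratively.
let clientCache = [];
let rendered = '';
const render = () => {
  rendered = clientCache.map((t) => `${t.done ? '[x]' : '[ ]'} ${t.text}`).join('\n');
};
const subscribe = () => {
  server.subscribers.push((docs) => { clientCache = docs; render(); });
  server.publish();
};
const callMethod = (name, ...args) => server.methods[name](...args);

subscribe();
// Event handler: never edits clientCache or the rendered output directly.
callMethod('markDone', '1');
console.log(rendered); // [x] write guide
```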

What about where publish functions should go? This is still unclear to me. Should they be separate from views? Or together with views (so in the same directory, where a view for me is close to a feature)? Because some publish functions are shared between views and some are not. Same for methods: some you need for a particular view and some are generic.

@stubailo
Contributor

Subscribing to data on the client

I think we should take a look at @arunoda's subs-manager and see if there is some low-hanging fruit we could suggest there.

Should webhooks be part of the methods article? I think so

Technically, webhooks will be calling a method, but conceptually they are about data loading. I guess the pattern is, use the webhook to insert into Mongo?

Do we encourage people to pass queries / options into subscriptions? I think no.

So is a subscription for one document where you pass the single document not a good idea?

@tmeasday
Contributor Author

Subscribing to data on the client

I think we should take a look at @arunoda's subs-manager and see if there is some low-hanging fruit we could suggest there.

Subs manager is good for what it is, but I think we decided we aren't comfortable recommending that technique because of the scope for bugs given Meteor's current globalness. I could reconsider that.

Should webhooks be part of the methods article? I think so

Technically, webhooks will be calling a method, but conceptually they are about data loading. I guess the pattern is, use the webhook to insert into Mongo?

I thought webhooks are about data modification? So the forms chapter would make sense. The only issue is that it's about "forms" rather than "methods" right now. But I think that's still OK

Do we encourage people to pass queries / options into subscriptions? I think no.

So is a subscription for one document where you pass the single document not a good idea?

I think an _id is fine, just not an arbitrary selector. You wrote this in the security chapter anyway
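
To illustrate the distinction: a publication that accepts only an `_id` can validate it as a plain string and build the selector itself, so a client cannot pass in an arbitrary Mongo selector. The function name and the commented wiring are hypothetical; in an app you might use Meteor's `check(listId, String)` for the same test.

```javascript
// Build the selector on the server from a validated id.
function selectorForId(id) {
  if (typeof id !== 'string' || id.length === 0) {
    throw new TypeError('Expected a non-empty string _id');
  }
  return { _id: id };
}

// Possible use inside a publication (not run here; Lists is hypothetical):
// Meteor.publish('list', function (listId) {
//   return Lists.find(selectorForId(listId));
// });

console.log(selectorForId('abc123')); // { _id: 'abc123' }
// An object such as { $ne: null } would match every document, so it is rejected:
// selectorForId({ $ne: null }) throws a TypeError
```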

@tmeasday
Contributor Author

I think what @mitar is saying about a sort of "flow diagram" of how data moves around is hugely useful. My only question is which article does this "fluxy" diagram go in? This one or the methods one?

@arunoda

arunoda commented Oct 28, 2015

@tmeasday could you tell more about the issue with subsManager?
I don't get the reason?
It's a cache where you can control how it behaves.

It gives significant performance and UX improvements.

@tmeasday
Contributor Author

@arunoda the issue that always concerns me is bugs which are hard to replicate.

The subs-manager pattern introduces a second layer of state in the app, which is "where I was a little while ago". All of a sudden the data that's in your local cache is no longer determined by just where you are now, but also by where you were for the lifetime of the subs manager cache.

If people were always super careful in their find calls to just select the documents and fields that they subscribed to, it wouldn't be a problem, but they aren't (notwithstanding heroic attempts by people like @SachaG to promote patterns to ensure it).

It's true that the above is the real problem, and things like page->page transitions suffer the same issue (two sets of subscriptions open at once and rendered to the screen separately). But the difference there is that it's much more obvious what the issue is when something goes wrong. In the subs manager world, it's easy to imagine scenarios where people have bugs reported that they can't replicate (because the true replication is "first go to page A, then go to B and do a bunch of stuff").
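
The concern can be made concrete with a toy cache. This is not the subs-manager API, only a hypothetical sketch of the caching idea: subscriptions stay open after you leave a page, so what ends up in the local collection depends on the pages visited earlier.

```javascript
// Toy model: each "subscription" adds documents to one shared local collection.
const localPosts = new Map();
const publications = {
  pageA: [{ _id: 'a1', title: 'From A' }],
  pageB: [{ _id: 'b1', title: 'From B' }],
};

function subscribe(name) {
  publications[name].forEach((doc) => localPosts.set(doc._id, doc));
  return { stop: () => publications[name].forEach((doc) => localPosts.delete(doc._id)) };
}

// Hypothetical cache: keeps the last `limit` subscriptions open instead of stopping them.
function makeSubsCache(limit) {
  const open = [];
  return {
    subscribe(name) {
      if (open.some((entry) => entry.name === name)) return; // already cached
      open.push({ name, handle: subscribe(name) });
      while (open.length > limit) open.shift().handle.stop();
    },
  };
}

const cache = makeSubsCache(2);
cache.subscribe('pageA'); // visit page A
cache.subscribe('pageB'); // navigate to page B; A's subscription is still open
// An unscoped find on page B also sees page A's documents:
console.log([...localPosts.keys()]); // [ 'a1', 'b1' ]
```

A user who opens page B directly would see only `b1`, which is the kind of difference that makes such bugs hard to reproduce.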

If you could do something like .subscribe(..).getDataset() (which @stubailo and I discussed at length but decided was too much of a departure for this version of the guide), then I'd be comfortable (getDataset could be promise-y and return straight away or not depending on caching).

Am I being overly pedantic here? Maybe! But I'm worried about recommending patterns that I personally avoid.

Oh, and btw, I'm not sure it's fair to say it gives significant performance improvements. I can imagine both cases where it would help performance (not repeatedly re-opening the same pub) and hinder performance (leaving unnecessary and expensive publications open for extended periods of time).

@arunoda

arunoda commented Oct 28, 2015

Okay, I get it. I'm pretty okay with it not being in here. Just my idea. Maybe we need to define different areas in the Meteor guide: which tools suit which place, and so on. Anyway, eventually users will find out about SubsManager.

Performance Gains

It gives a huge performance boost in a lot of practical scenarios. SubsManager gives you performance gains in two ways:

  1. Low Latency - with the use of the cache
  2. Low CPU Usage - I'll talk more about this below.

This reduces the subRate of the app a lot. It's safe to assume users browse the same pages (areas) many times in a single session. So it cuts out all the re-subscribing, and the CPU cost that goes to network activity (and transport-related code in Meteor).

Our tests show that most apps have subscriptions with a very short lifetime, and changes in those subscriptions are very small (compared with the time they are open). And Meteor reuses observers: our tests show many apps have an observer reuse ratio of over 50%.

So keeping the subscription open is not an issue


And we don't ask people to add SubsManager for every subscription. It's up to users to decide which subscriptions are powered by SubsManager. We mentioned this in BulletProofMeteor and in the Kadira docs.

@tmeasday
Contributor Author

Ok, it's fair to say that for a subscription that is often/usually shared it does give significant performance gains.

I think if we don't include it, this is a clear case of a package that should be mentioned in a "further reading" section of this article. I'll wait for @stubailo to weigh in again.

@arunoda

arunoda commented Oct 28, 2015

@tmeasday That sounds great.

Side note: I assume this is discussed somewhere else, but it's a good idea to have different sections for people with different levels of understanding. Or we can narrow the first release to some generic guidelines.

@mitar
Contributor

mitar commented Oct 28, 2015

If you could do something like .subscribe(..).getDataset() (which @stubailo and I discussed at length but decided was too much of a departure for this version of the guide), then I'd be comfortable (getDataset could be promise-y and return straight away or not depending on caching).

You mean this ticket? #2247

So maybe instead of getDataset (What an ugly name, BTW. Java background leaking again? Dataset? Why not simply documents? Or even better .subscribe(..).find() so you can make a query against it.) we should just be able to query based on subscriptions?

@tmeasday
Contributor Author

You mean this ticket? #2247

More or less, yeah.

Java background leaking again?

Nope..


.documents() seems wrong because it implies a single collection. What we are talking about here is a subset of the data in each collection that the subscription publishes to. (.find() certainly is incorrect for this reason, unless it takes a collection name as first argument).

"X" is to database what cursor is to collection. Agree that "dataset" isn't a great word but it does seem to work.

@tmeasday
Contributor Author

Or did you mean Java background because I put get in front? If so you are paying way too close attention to my random code snippets.

@mitar
Contributor

mitar commented Oct 28, 2015

Or did you mean Java background because I put get in front? If so you are paying way too close attention to my random code snippets.

:-)

@mitar
Contributor

mitar commented Oct 28, 2015

"X" is to database what cursor is to collection. Agree that "dataset" isn't a great word but it does seem to work.

The question is what operations can you do on X? So first probably select a collection, then query?

I think a better API would be that you could do:

Posts.find({}, {subscription: subscription})

where subscription is the handle returned from subscribe. Now that subscriptions have an id, you could just somehow query based on that. This is clean, simple to make backwards compatible, and simple to add to existing queries.

@tmeasday
Contributor Author

I'm not against that, but I have other plans around slicing up the datasets and using them as "contexts" for templates/components. You might call it Relay or something like that. (That makes me think, what does Relay/GraphQL call this concept..)

@tmeasday
Contributor Author

Anyway, it's all pretty academic because a client-side merge box is a highly non-trivial change so I don't expect we'll see it any time soon.

@mitar
Contributor

mitar commented Oct 28, 2015

I'm not against that, but I have other plans around slicing up the datasets and using them as "contexts" for templates/components.

I don't know about you, but with my proposed API this is as easy as:

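Template.foo.onCreated(function () {
  // Keep the subscription handle so queries can be scoped to it.
  this.context = this.subscribe("foo");
});

Template.foo.helpers({
  foo: function () {
    // `subscription` is the proposed find option, not an existing one.
    return Foo.find({}, {subscription: Template.instance().context});
  }
});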

Of course, scoping queries to the template instance's context could even be done automatically, so that a query takes all of the template's subscriptions as its context.

Anyway, it's all pretty academic because a client-side merge box is a highly non-trivial change so I don't expect we'll see it any time soon.

You would like to limit fields based on the subscription? Then yeah, it is tricky. But just getting which IDs are from which subscription is probably already available somewhere internally.

@tmeasday
Contributor Author

You would like to limit fields based on the subscription? Then yeah, it is tricky. But just getting which IDs are from which subscription is probably already available somewhere internally.

Incorrect. You can use https://atmospherejs.com/percolate/find-from-publication to fake it, but it's a total kludge.

@mitar
Contributor

mitar commented Oct 28, 2015

BTW, what you subscribe is internally called record set.

@tmeasday
Contributor Author

If we are talking proposed APIs, mine would look something like

<template name="fooController">
  {{> foo instance.dataset}}
</template>

Template.fooController.onCreated(function () {
  // A regular function, so `this` is the template instance.
  this.dataset = this.subscribe('foo').dataset();
});

Template.foo.helpers({
  posts: function() {
    return this.dataset.posts.find();
  }
});

Then it is trivially easy to test foo against an arbitrary dataset.

@arunoda

arunoda commented Oct 28, 2015

I like it. To do this, we need to remove the mergebox from the server.
Otherwise, we need to define the query alongside the publication.


@mitar
Contributor

mitar commented Oct 28, 2015

OK, this API is really the same as mine, only it is a prefix instead of a suffix. It also has big problems because of reactivity: what if I first publish one collection and then after some time (after ready) I publish another? You would at least have to have this.dataset.posts().find().

@tmeasday
Contributor Author

@arunoda 👍. This is the direction that @stubailo and I were talking in, something like:

Posts.all = new Subscription({
  query: () => { ... }
});

const handle = Posts.all.subscribe('foo');

Which could then totally do the dataset pattern via re-running the query client side. But then the question is how to make the publication work properly with queries over multiple collections -- do you map it to publish-composite syntax or something?

Doesn't sound completely impossible but a big chunk of concepts that we'll leave for the next iteration of the guide if we still like it. (Thus my original comment)

@steph643

steph643 commented Nov 4, 2015

image

I would rephrase heading c like this:

c. If it relates to individual items from an existing collection (per-item checkboxes, for instance) or if you need to query it, use a local collection

And I would add:

d. Other solutions

where there could be pointers to more advanced solutions, such as reactive-state.

@mitar
Contributor

mitar commented Nov 4, 2015

Huh, opaque strings for keys in the state. This is not very IDE friendly. ;-)

@steph643

steph643 commented Nov 4, 2015

@tmeasday Yeah! This is something beyond the guide. I think we should stop here and jump to some other place. Otherwise, we'll make this thread a mess :)

I asked for a public discussion on this more than half a year ago (see here and here).

@tmeasday
Contributor Author

tmeasday commented Nov 5, 2015

@steph643 thanks for the link. I guess this reactive state idea is sort of angular-like, no? A tree of "state" (aka scope) that you use within a template. I guess what I don't like about it is that if it's going to be tree-like, it makes sense to scope it to the relevant branch of the template hierarchy rather than letting the template be in charge of grabbing something global itself.

@mitar
Contributor

mitar commented Nov 5, 2015

I really think such solutions should be third-party packages. Blaze should provide template instances, and then people can attach React-like props, Blaze Components-like fields, or Angular-like state to them. We will hardly decide which one is the best. :-)

@stubailo
Contributor

stubailo commented Nov 5, 2015

We will hardly decide which one is the best. :-)

Thankfully the ReactiveDict approach isn't a package - it's just one suggested pattern. If people decide that ReactiveDict doesn't fulfill their needs, it will be easy to switch to something else and have basically the same patterns.

@mitar
Contributor

mitar commented Nov 5, 2015

Thankfully the ReactiveDict approach isn't a package

Yes, I am talking about some old ideas of making instance.state be by default present, automatically.

it's just one suggested pattern

Should we suggest all of them? Using ReactiveDict, ReactiveField, and reactive state? :-) That could be a very short section: three headings, three short examples, and then it can continue in whatever direction you want the rest of the guide to use.

@mitar
Contributor

mitar commented Dec 23, 2015

I made this package which allows one to scope queries to the subscription. I decided to do a different API to the one above.

@tmeasday
Contributor Author

Interesting. A few notes in comparison to FFP:

  1. What are the syncing issues (that FFP has) that you refer to?
  2. I don't know if it's true that FFP really sends much more data when you consider gzip, I suspect it wouldn't make a significant difference unless your source documents are tiny.
  3. FFP also does sorting -- I think you could support it also by just setting the scopeFieldName value to an order. This would actually IMO then be the biggest advantage of your library (because sorting via a second collection is basically impossible to do properly in Mongo).
  4. Personally I wouldn't say that monkey patching Meteor's pub and sub code is less complicated than wrapping things, but each to his own I guess ;)

@arunoda

arunoda commented Dec 28, 2015

@tmeasday what's FFP?

@tmeasday
Contributor Author

Oh, find-from-publication (the package that subscription scope is replicating with a different approach and API)

@arunoda

arunoda commented Dec 28, 2015

Okay got it :)

@mitar
Contributor

mitar commented Dec 28, 2015

What are the syncing issues (that FFP has) that you refer to?

That you are using two collections/subscriptions. So both have to be up to date on the client to be able to query based on the subscription, no? So if I call subscription.ready() and then want to fetch documents only from that subscription, it is not necessarily true that I can, because the other subscription (the one recording which documents belong to which subscription) may not yet be ready (or updated).

I don't know if it's true that FFP really sends much more data when you consider gzip, I suspect it wouldn't make a significant difference unless your source documents are tiny.

I have not measured things over the wire, true. So maybe this is premature optimization on my part. But, it does increase memory on the server-side because merge box stores all those documents.

FFP also does sorting -- I think you could support it also by just setting the scopeFieldName value to an order. This would actually IMO then be the biggest advantage of your library (because sorting via a second collection is basically impossible to do properly in Mongo).

Yes, I could do sorting, but I didn't want to because the server side then becomes really complicated (you have to use observe with addedBefore and so on) to keep the sorting values up to date. So instead of just adding a field whenever the user calls added, I would have to intercept how they are calling added and how the sorting on the server end is changing. Also, I do not really care about the order on the server side. I think it is an anti-pattern to care about the order in which you send the documents. Maybe that is because I am using reactivity on the server side as well, and things like publish middleware, which all interfere with the order of documents being sent over the wire. So if you want to do some sorting, in my view, this should be done on the client side. My package thus just provides the information about which documents are from the subscription; order is not something it provides.

What use cases do you have in mind where the order of calling added is important to know on the client? I could see that one would want to preserve the order of the cursor with sort applied, but then the user would have to use observe with ordering, which is much more complicated than just the order of added calls. Maybe I could expose an API for the user to put a custom value in the scopeFieldName. Then if they want a sorted publish, they would call observe themselves and compute the scopeFieldName value based on addedBefore and movedBefore themselves.

Personally I wouldn't say that monkey patching Meteor's pub and sub code is less complicated than wrapping things, but each to his own I guess ;)

I am not saying that it is nice, but it is simpler, less lines of code, less data to go over, simpler concept.

Meteor should provide APIs to do that properly. @rclai is now working on at least a common API for something like this: https://github.com/rclai/meteor-collection-extensions

But I think that Meteor's usual way is to not provide APIs until the community develops a package showing the need for them. But yes, please do merge this pull request: meteor/meteor#5845

BTW, you might be interested in this package as well: https://github.com/peerlibrary/meteor-subscription-data

@tmeasday
Contributor Author

What use cases do you have in mind where the order of calling added is important to know on the client?

I'm thinking about any time the sort order is not knowable on the client. For instance if you query a fulltext search endpoint (think ElasticSearch) to get a ranked set of documents for the publication.

@tmeasday
Contributor Author

BTW, you might be interested in this package as well: https://github.com/peerlibrary/meteor-subscription-data

Interesting stuff, thanks for showing me @mitar

@mitar
Contributor

mitar commented Dec 29, 2015

I'm thinking about any time the sort order is not knowable on the client. For instance if you query a fulltext search endpoint (think ElasticSearch) to get a ranked set of documents for the publication.

You can just pass that extra score to the client in that case. See the example here: https://github.com/peerlibrary/meteor-subscription-scope

@tmeasday
Contributor Author

Well sure if you are ok about adding extra fields to the document. What I was thinking about was something similar to what you've done where the extra field is stripped somehow before coming back out to the query-er
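
A sketch of what that could look like on the client: sort by a server-supplied ranking field, then strip it before handing the documents to the caller. The field name `_rank` and the helper are hypothetical, and this is not how either package does it.

```javascript
// Sort documents by a ranking field sent by the server, then hide that field.
function rankedResults(docs, rankField = '_rank') {
  return [...docs]
    .sort((a, b) => b[rankField] - a[rankField]) // highest rank first
    .map(({ [rankField]: _ignored, ...rest }) => rest);
}

const docs = [
  { _id: 'x', title: 'Low', _rank: 0.2 },
  { _id: 'y', title: 'High', _rank: 0.9 },
];
console.log(rankedResults(docs).map((doc) => doc.title)); // [ 'High', 'Low' ]
```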

@mitar
Contributor

mitar commented Dec 30, 2015

Yes, but that score field is something you might even want to display to the user. Anyway, those are details. I am explaining my rationale. :-) As you noticed, it is something easy to change. Also, on the client side I deliberately check only for the existence of the field, so that in theory we can add any extra payload.


6 participants