Data Loading and Management #33

tmeasday · 2015-10-19T04:55:22Z

Article outline

https://github.com/meteor/guide/blob/master/outlines/data-loading.md

Major decision points

Data loading is best done through publications
All subscriptions should happen inside UI components. Even "global" subscriptions should be done in the app layout component. Data loaded from a subscription should be accessed in the same component, and passed down through arguments, rather than relying on global data to be available in Minimongo
There's a strategy for pagination here; we should investigate what works well in production apps
Client-only data should be in a Tracker-enabled store, for example a ReactiveDict wrapped in an API
Relational data should be published using publish-composite
External data should be pushed to the client through publications - for example, you can poll a REST endpoint through a pub
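
To make the poll-through-a-publication idea concrete, here is a minimal sketch of one polling cycle, written as plain JavaScript so it runs without Meteor. It assumes a `pub` object exposing `added`, `changed` and `removed`, as the `this` of a Meteor publish function does. The names `syncPoll`, `makeRecordingPub` and the `weather` collection are hypothetical, and the REST fetch itself is left out.

```javascript
// One polling cycle: diff freshly fetched items against what was already
// sent, and forward only the differences through the publication.
// `sent` maps item id -> serialized fields from the previous cycle.
function syncPoll(pub, collectionName, items, sent) {
  const seen = new Set();
  for (const { id, ...fields } of items) {
    seen.add(id);
    const serialized = JSON.stringify(fields);
    if (!sent.has(id)) {
      pub.added(collectionName, id, fields);
    } else if (sent.get(id) !== serialized) {
      pub.changed(collectionName, id, fields);
    }
    sent.set(id, serialized);
  }
  // Anything sent earlier but missing from this poll has disappeared upstream.
  for (const id of [...sent.keys()]) {
    if (!seen.has(id)) {
      pub.removed(collectionName, id);
      sent.delete(id);
    }
  }
}

// Stand-in publication that records calls (hypothetical, for this sketch only).
function makeRecordingPub() {
  const calls = [];
  return {
    calls,
    added: (collection, id, fields) => calls.push(['added', collection, id, fields]),
    changed: (collection, id, fields) => calls.push(['changed', collection, id, fields]),
    removed: (collection, id) => calls.push(['removed', collection, id]),
  };
}

const pub = makeRecordingPub();
const sent = new Map();
syncPoll(pub, 'weather', [{ id: 'a', temp: 20 }, { id: 'b', temp: 25 }], sent);
syncPoll(pub, 'weather', [{ id: 'a', temp: 21 }], sent);
console.log(pub.calls.map(([op, , id]) => `${op}:${id}`).join(' '));
// added:a added:b changed:a removed:b
```

In a real publication the cycle would presumably run on a timer, with `this.ready()` called after the first cycle and the timer cleared in `this.onStop()`. The sketch also ignores fields that vanish between polls.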

Old outline:

Proposed outline

Loading and publishing data from Mongo on the server.
Subscribing to data on the client
- For now, just in the straightforward way, emphasize autorun re-sub behaviour and workarounds
Client only data (Stores) vs persistent server data (Collections)
Modifying data ("actions"? -- store mutators or methods)
Complex publications:
- Relational data - use publish-composite to publish relational data.
- limiting data to what you need
- reusing publications vs limiting them.
- pagination patterns
Publishing data from 3rd party sources
- Poll-publish pattern
Publications as RESTful endpoints

Open Questions

Should webhooks be part of the methods article? I think so
Do we encourage people to pass queries / options into subscriptions? I think no.

tmeasday · 2015-10-24T03:04:53Z

@justinsb This would possibly be a chapter you might want to weigh in on. I'll ping you again when it's more fleshed out.

tmeasday · 2015-10-26T03:45:38Z

See also #11

mitar · 2015-10-26T05:14:40Z

So for me the most important thing I tell new people who start with Meteor is a cycle of data propagation you have to keep in mind:

data is in the database
you define publish endpoint to publish it
you subscribe to it from the client, data is pushed to the client
you push that data to the template
you declare how the data should be rendered in the template
you have an event handler
which calls some method on the server which modifies the data
and data change is pushed around and everything (because it is declaratively defined) updates automatically

So I think it is really important that people understand that they should not be changing data or templates directly on the client but should go through the server and leave it to the loop to make everything happen.

What about where publish functions should go? This is still unclear to me. Should they be separate from views, or together with views (in the same directory, where a view for me is close to a feature)? Some publish functions are shared between views and some are not. Same for methods: some you need for a particular view and some are generic.

stubailo · 2015-10-26T17:26:24Z

Subscribing to data on the client

I think we should take a look at @arunoda's subs-manager and see if there is some low-hanging fruit we could suggest there.

Should webhooks be part of the methods article? I think so

Technically, webhooks will be calling a method, but conceptually they are about data loading. I guess the pattern is, use the webhook to insert into Mongo?

Do we encourage people to pass queries / options into subscriptions? I think no.

So is a subscription for one document where you pass the single document not a good idea?

tmeasday · 2015-10-28T03:15:02Z

Subscribing to data on the client

I think we should take a look at @arunoda's subs-manager and see if there is some low-hanging fruit we could suggest there.

Subs manager is good for what it is, but I think we decided we aren't comfortable recommending that technique because of the scope for bugs given Meteor's current globalness. I could reconsider that.

Should webhooks be part of the methods article? I think so

Technically, webhooks will be calling a method, but conceptually they are about data loading. I guess the pattern is, use the webhook to insert into Mongo?

I thought webhooks are about data modification? So the forms chapter would make sense. The only issue is that it's about "forms" rather than "methods" right now. But I think that's still OK

Do we encourage people to pass queries / options into subscriptions? I think no.

So is a subscription for one document where you pass the single document not a good idea?

I think an _id is fine, just not an arbitrary selector. You wrote this in the security chapter anyway
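
A minimal sketch of that distinction, in plain JavaScript so it runs on its own: the client may say which document it wants, but the server builds the selector. The helper names `assertDocumentId` and `singleDocumentSelector` are hypothetical. In a Meteor app the same guard would normally be written with `check(id, String)` from the check package.

```javascript
// Accept only a plain, non-empty string as a document id.
function assertDocumentId(value) {
  if (typeof value !== 'string' || value.length === 0) {
    throw new TypeError('Expected a document _id string');
  }
  return value;
}

// Hypothetical helper: the server builds the selector itself, so the client
// chooses which document it wants but not the shape of the query.
function singleDocumentSelector(clientSuppliedId) {
  return { _id: assertDocumentId(clientSuppliedId) };
}

console.log(singleDocumentSelector('abc123')); // { _id: 'abc123' }

try {
  // A client passing a selector object instead of an id is rejected.
  singleDocumentSelector({ $ne: null });
} catch (error) {
  console.log(error.message); // Expected a document _id string
}
```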

tmeasday · 2015-10-28T03:16:23Z

I think what @mitar is saying about a sort of "flow diagram" of how data moves around is hugely useful. My only question is which article does this "fluxy" diagram go in? This one or the methods one?

arunoda · 2015-10-28T03:46:03Z

@tmeasday could you tell me more about the issue with subsManager?
I don't get the reason.
It's a cache where you can control how it behaves.

It gives significant performance and UX improvements.

tmeasday · 2015-10-28T04:09:38Z

@arunoda the issue that always concerns me is bugs which are hard to replicate.

The subs-manager pattern introduces a second layer of state in the app which is "where I was a little while ago". All of a sudden the data that's in your local cache is no longer determined by just where you are now, but also by where you were for the lifetime of the subs manager cache.

If people were always super careful in their find calls to just select the documents and fields that they subscribed to, it wouldn't be a problem, but they aren't (notwithstanding heroic attempts by people like @SachaG to promote patterns to ensure it).

It's true that the above is the real problem, and things like page->page transitions suffer the same issue (two sets of subscriptions open at once and rendered to the screen separately). But the difference there is that it's much more obvious what the issue is when something goes wrong. In the subs manager world, it's easy to imagine scenarios where people have bugs reported that they can't replicate (because the true replication is "first go to page A, then go to B and do a bunch of stuff").
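
To illustrate the kind of bug I mean, here is a toy model in plain JavaScript (not Meteor code; the `receive` function and the `listId` field are made up for the example). The local cache is one shared array, and the result of a careless query depends on which pages were visited earlier.

```javascript
// Toy stand-in for the client-side cache: one shared array of documents.
const localPosts = [];

// Hypothetical "subscription data arrives": copy documents into the cache.
function receive(docs) {
  for (const doc of docs) {
    if (!localPosts.some((post) => post._id === doc._id)) localPosts.push(doc);
  }
}

// Page A was visited earlier and its subscription is still cached.
receive([{ _id: 'p1', listId: 'A', title: 'From page A' }]);
// Page B is where the user is now.
receive([{ _id: 'p2', listId: 'B', title: 'From page B' }]);

// Careless query on page B: the result depends on browsing history.
const unscoped = localPosts.map((post) => post._id);
// Careful query on page B: selects only what page B subscribed to.
const scoped = localPosts
  .filter((post) => post.listId === 'B')
  .map((post) => post._id);

console.log(unscoped); // [ 'p1', 'p2' ]
console.log(scoped); // [ 'p2' ]
```

A user who opens page B directly never sees `p1`, which is why a report based on the first result can be hard to replicate.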

If you could do something like .subscribe(..).getDataset() (which @stubailo and I discussed at length but decided was too much of a departure for this version of the guide), then I'd be comfortable (getDataset could be promise-y and return straight away or not depending on caching).

Am I being overly pedantic here? Maybe! But I'm worried about recommending patterns that I personally avoid.

Oh, and btw, I'm not sure it's fair to say it gives significant performance improvements. I can imagine both cases where it would help performance (not repeatedly re-opening the same pub) and hinder performance (leaving unnecessary and expensive publications open for extended periods of time).

arunoda · 2015-10-28T04:22:33Z

Okay, I get it. I'm pretty okay with it not being in here. Just my idea. Maybe we need to define different areas in the Meteor guide: which tools suit which place, and so on. Anyway, users will eventually find out about SubsManager.

Performance Gains

It gives a huge performance boost in a lot of practical scenarios. subsManager gives you performance gains in two ways:

Low Latency - with the use of the cache
Low CPU Usage - I'll talk more about this below.

This reduces the subRate of the app a lot. It's safe to assume users browse the same pages (areas) many times in a single session. So it avoids all that re-subscribing, and the CPU cost that goes to network activity (and transport-related code in Meteor).

Our tests show that most apps have subscriptions with very short lifetimes, and changes in those subscriptions are very few (compared with the time they are open). Also, Meteor reuses observers: our tests show that many apps have an observer reuse ratio of over 50%.
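
As a rough illustration of the caching idea (a toy model in plain JavaScript, not the SubsManager implementation), the sketch below reuses an open subscription for the same name and arguments and stops it once it has gone unused for longer than an expiry window. The class name, the `open` callback and the injected clock are all assumptions made for the example.

```javascript
class ToySubscriptionCache {
  constructor({ expireInMs, now = () => Date.now() }) {
    this.expireInMs = expireInMs;
    this.now = now;
    this.entries = new Map(); // key -> { handle, lastUsed }
  }

  // `open` is a caller-supplied function that really opens the subscription
  // and returns a handle with a stop() method.
  subscribe(name, args, open) {
    this.evictExpired();
    const key = JSON.stringify([name, args]);
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { handle: open(name, args), lastUsed: this.now() };
      this.entries.set(key, entry);
    }
    entry.lastUsed = this.now();
    return entry.handle;
  }

  // Stop and forget subscriptions that have not been used recently.
  evictExpired() {
    for (const [key, entry] of this.entries) {
      if (this.now() - entry.lastUsed > this.expireInMs) {
        entry.handle.stop();
        this.entries.delete(key);
      }
    }
  }
}

// Usage with a fake clock and a fake opener that counts real opens.
let clock = 0;
let opened = 0;
const open = () => {
  opened += 1;
  return { stop() {} };
};
const cache = new ToySubscriptionCache({ expireInMs: 1000, now: () => clock });

cache.subscribe('posts', ['A'], open); // opens
clock = 500;
cache.subscribe('posts', ['A'], open); // reused, no re-subscribe
clock = 2000;
cache.subscribe('posts', ['A'], open); // expired in the meantime, reopened
console.log(opened); // 2
```

The expiry window is the knob that trades fewer re-subscriptions against publications staying open longer than needed.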

So keeping the subscription open is not an issue

And we don't ask people to use subsManager for every subscription. It's up to users to decide which subscriptions are powered by SubsManager. We mentioned this in BulletProofMeteor and in the Kadira docs.

tmeasday · 2015-10-28T04:27:46Z

Ok, it's fair to say that for a subscription that is often/usually shared it does give significant performance gains.

I think if we don't include it, this is a clear case of a package that should be mentioned in a "further reading" section of this article. I'll wait for @stubailo to weigh in again.

arunoda · 2015-10-28T04:30:13Z

@tmeasday That sounds great.

Side note: I assume this is discussed somewhere else, but it's a good idea to have different sections for people with different levels of understanding. Or we can narrow the first release to some generic guidelines.

mitar · 2015-10-28T04:53:31Z

If you could do something like .subscribe(..).getDataset() (which @stubailo and I discussed at length but decided was too much of a departure for this version of the guide), then I'd be comfortable (getDataset could be promise-y and return straight away or not depending on caching).

You mean this ticket? #2247

So maybe instead of getDataset (What an ugly name, BTW. Java background leaking again? dataset? Why not simply documents? Or even better .subscribe(..).find() so you can make a query against it.) we should just be able to query based on subscriptions?

tmeasday · 2015-10-28T05:08:10Z

You mean this ticket? #2247

More or less, yeah.

Java background leaking again?

Nope..

.documents() seems wrong because it implies a single collection. What we are talking about here is a subset of the data in each collection that the subscription publishes to. (.find() certainly is incorrect for this reason, unless it takes a collection name as first argument).

"X" is to database what cursor is to collection. Agree that "dataset" isn't a great word but it does seem to work.

tmeasday · 2015-10-28T05:08:50Z

Or did you mean Java background because I put get in front? If so you are paying way too close attention to my random code snippets.

mitar · 2015-10-28T05:11:47Z

Or did you mean Java background because I put get in front? If so you are paying way too close attention to my random code snippets.

:-)

mitar · 2015-10-28T05:14:33Z

"X" is to database what cursor is to collection. Agree that "dataset" isn't a great word but it does seem to work.

The question is what are operations you can do on X? So first probably select a collection, then query?

I think a better API would be that you could do:

Posts.find({}, {subscription: subscription})

where subscription is the handle returned from subscribe. Now that subscriptions have an id, you could just somehow query based on that. This is clean, simple to make backwards compatible, and simple to add to existing queries.
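
A toy model of what I mean, in plain JavaScript. Minimongo has no such `subscription` option today, so `ToyCollection` and its methods are hypothetical and only model the proposal. The collection remembers which subscription id delivered which document ids, and `find` can filter on that.

```javascript
class ToyCollection {
  constructor() {
    this.docs = new Map(); // _id -> document
    this.idsBySubscription = new Map(); // subscription id -> Set of _id
  }

  // Record a document as delivered by the given subscription handle.
  added(subscription, doc) {
    this.docs.set(doc._id, doc);
    if (!this.idsBySubscription.has(subscription.id)) {
      this.idsBySubscription.set(subscription.id, new Set());
    }
    this.idsBySubscription.get(subscription.id).add(doc._id);
  }

  // `predicate` stands in for a Mongo selector; `options.subscription`
  // limits the result to documents delivered by that subscription.
  find(predicate = () => true, options = {}) {
    let docs = [...this.docs.values()];
    if (options.subscription) {
      const ids = this.idsBySubscription.get(options.subscription.id) || new Set();
      docs = docs.filter((doc) => ids.has(doc._id));
    }
    return docs.filter(predicate);
  }
}

const Posts = new ToyCollection();
const subA = { id: 'sub-1' };
const subB = { id: 'sub-2' };
Posts.added(subA, { _id: 'p1', title: 'One' });
Posts.added(subB, { _id: 'p2', title: 'Two' });
Posts.added(subB, { _id: 'p1', title: 'One' }); // same document via both

console.log(Posts.find().length); // 2
console.log(Posts.find(undefined, { subscription: subA }).map((doc) => doc._id)); // [ 'p1' ]
console.log(Posts.find(undefined, { subscription: subB }).map((doc) => doc._id)); // [ 'p1', 'p2' ]
```

This only tracks ids. Limiting fields per subscription is not modelled.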

tmeasday · 2015-10-28T05:16:28Z

I'm not against that, but I have other plans around slicing up the datasets and using them as "contexts" for templates/components. You might call it Relay or something like that. (That makes me think, what does Relay/GraphQL call this concept..)

tmeasday · 2015-10-28T05:17:08Z

Anyway, it's all pretty academic because a client-side merge box is a highly non-trivial change so I don't expect we'll see it any time soon.

mitar · 2015-10-28T05:29:34Z

I'm not against that, but I have other plans around slicing up the datasets and using them as "contexts" for templates/components.

I don't know about you, but with my proposed API this is as easy as:

Template.foo.onCreated(function () {
  // Keep the subscription handle on the template instance.
  this.context = this.subscribe("foo");
});

Template.foo.helpers({
  foo: function () {
    // Proposed (not existing) option: limit the query to this subscription's documents.
    return Foo.find({}, {subscription: Template.instance().context});
  }
});

Of course, the behavior of using queries inside a template instance's context could even be automatic, taking all of the template's subscriptions as the context.

Anyway, it's all pretty academic because a client-side merge box is a highly non-trivial change so I don't expect we'll see it any time soon.

You would like to limit fields based on the subscription? Then yeah, it is tricky. But just getting which IDs are from which subscription is probably already available somewhere internally.

tmeasday · 2015-10-28T05:31:30Z

You would like to limit fields based on the subscription? Then yeah, it is tricky. But just getting which IDs are from which subscription is probably already available somewhere internally.

Incorrect. You can use https://atmospherejs.com/percolate/find-from-publication to fake it, but it's a total kludge.

mitar · 2015-10-28T05:34:27Z

BTW, what you subscribe to is internally called a record set.

tmeasday · 2015-10-28T05:35:28Z

If we are talking proposed APIs, mine would look something like

<template name="fooController">
  {{> foo instance.dataset}}
</template>

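Template.fooController.onCreated(function () {
  // A regular function (not an arrow), so that `this` is the template instance.
  // `.dataset()` is the proposed API, not something that exists today.
  this.dataset = this.subscribe('foo').dataset();
});

Template.foo.helpers({
  posts: function() {
    return this.dataset.posts.find();
  }
});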

Then it is trivially easy to test foo against an arbitrary dataset.

arunoda · 2015-10-28T05:37:11Z

I like it. To do this, we need to remove the mergebox from the server.
Otherwise, we need to define the query alongside the publication.

mitar · 2015-10-28T05:41:28Z

OK, this API is really the same as mine, only that it is prefix instead of suffix. And that it has big problems because of the reactivity. What if I publish first one collection and then after some time (after ready) I publish another. You would at least have to have this.dataset.posts().find().

tmeasday · 2015-10-28T05:43:28Z

@arunoda 👍. This is the direction that @stubailo and I were talking in, something like:

Posts.all = new Subscription({
  query: () => { ... }
});

const handle = Posts.all.subscribe('foo');

Which could then totally do the dataset pattern via re-running the query client side. But then the question is how to make the publication work properly with queries over multiple collections -- do you map it to publish-composite syntax or something?

Doesn't sound completely impossible but a big chunk of concepts that we'll leave for the next iteration of the guide if we still like it. (Thus my original comment)

steph643 · 2015-11-04T09:58:11Z

I would rephrase heading c like this:

c. If it relates to individual items from an existing collection (per item checkboxes, for instance) or if you need to query it, use a local collection

And I would add:

d. Other solutions

where there could be pointers to more advanced solutions, such as reactive-state.

mitar · 2015-11-04T10:01:13Z

Huh, opaque strings for keys in the state. This is not very IDE friendly. ;-)

steph643 · 2015-11-04T10:06:05Z

@tmeasday Yeah! This is something beyond the guide. I think we should stop here and jump to some other place. Otherwise, we'll make this thread a mess :)

I asked for a public discussion on this more than half a year ago (see here and here).

tmeasday · 2015-11-05T01:51:04Z

@steph643 thanks for the link. I guess this reactive state idea is sort of angular-like, no? -- A tree of "state" (aka scope) that you use within a template. I guess what I don't like about it is that if it's going to be tree-like, it makes sense to scope it to the relevant branch of the template hierarchy rather than letting the template be in charge of grabbing something global itself.

mitar · 2015-11-05T01:52:59Z

I really think such solutions should be third party packages. Blaze should provide template instances, and then people can attach react-like props, Blaze Components like fields, angular like state to it. We will hardly decide which one is the best. :-)

stubailo · 2015-11-05T02:00:40Z

We will hardly decide which one is the best. :-)

Thankfully the ReactiveDict approach isn't a package - it's just one suggested pattern. If people decide that ReactiveDict doesn't fulfill their needs, it will be easy to switch to something else and have basically the same patterns.
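
To show the shape of that pattern, here is a minimal sketch in plain JavaScript. In a Meteor app a ReactiveDict and Tracker would supply the reactivity; here a `Map` with change listeners stands in so the example runs by itself. `createStore` and `MenuState` are hypothetical names.

```javascript
// Generic key/value store with change notification (stand-in for ReactiveDict).
function createStore(initialState = {}) {
  const state = new Map(Object.entries(initialState));
  const listeners = new Set();
  return {
    get: (key) => state.get(key),
    set(key, value) {
      if (state.get(key) === value) return; // no change, no notification
      state.set(key, value);
      listeners.forEach((listener) => listener(key, value));
    },
    // Returns a function that unregisters the listener.
    onChange(listener) {
      listeners.add(listener);
      return () => listeners.delete(listener);
    },
  };
}

// The store is wrapped in a small domain API, so components never touch
// raw string keys and the backing store can be swapped later.
const store = createStore({ menuOpen: false });
const MenuState = {
  isOpen: () => store.get('menuOpen'),
  toggle: () => store.set('menuOpen', !store.get('menuOpen')),
};

const seen = [];
const stop = store.onChange((key, value) => seen.push(`${key}=${value}`));
MenuState.toggle();
MenuState.toggle();
stop();
MenuState.toggle();

console.log(seen); // [ 'menuOpen=true', 'menuOpen=false' ]
console.log(MenuState.isOpen()); // true
```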

mitar · 2015-11-05T02:04:07Z

Thankfully the ReactiveDict approach isn't a package

Yes, I am talking about some old ideas of making instance.state be by default present, automatically.

it's just one suggested pattern

Should we suggest all of them? Using ReactiveDict, ReactiveField, and reactive state? :-) That could be a few very short sections: three headings, three short examples, and then it can continue with whatever direction you want the rest of the guide to use.

mitar · 2015-12-23T13:19:44Z

I made this package which allows one to scope queries to the subscription. I decided to do a different API to the one above.

tmeasday · 2015-12-28T03:45:41Z

Interesting. A few notes in comparison to FFP:

What are the syncing issues (that FFP has) that you refer to?
I don't know if it's true that FFP really sends much more data when you consider gzip, I suspect it wouldn't make a significant difference unless your source documents are tiny.
FFP also does sorting -- I think you could support it also by just setting the scopeFieldName value to an order. This would actually IMO then be the biggest advantage of your library (because sorting via a second collection is basically impossible to do properly in Mongo).
Personally I wouldn't say that monkey patching Meteor's pub and sub code is less complicated than wrapping things, but each to his own I guess ;)

arunoda · 2015-12-28T04:01:39Z

@tmeasday what's FFP?

tmeasday · 2015-12-28T04:03:47Z

Oh, find-from-publication (the package that subscription scope is replicating with a different approach and API)

arunoda · 2015-12-28T04:06:32Z

Okay got it :)

mitar · 2015-12-28T09:24:15Z

What are the syncing issues (that FFP has) that you refer to?

That you are using two collections/subscriptions. So both have to be up to date on the client to be able to query based on the subscription, no? So if I check subscription.ready() and then want to fetch documents only from that subscription, it is not necessarily true that I can, because the other subscription, which records which documents are in which subscription, may not yet be ready (or updated).

I don't know if it's true that FFP really sends much more data when you consider gzip, I suspect it wouldn't make a significant difference unless your source documents are tiny.

I have not measured things over the wire, true. So maybe this is premature optimization on my part. But, it does increase memory on the server-side because merge box stores all those documents.

FFP also does sorting -- I think you could support it also by just setting the scopeFieldName value to an order. This would actually IMO then be the biggest advantage of your library (because sorting via a second collection is basically impossible to do properly in Mongo).

Yes, I could do sorting, but I didn't want to because the server side then becomes really complicated (you have to use observe with addedBefore and so on to keep the sorting values up to date). So instead of just adding a field whenever the user calls added, I would have to intercept how they are calling added and how the sorting on the server end is changing. Also, I do not really care about the order on the server side. I think it is an anti-pattern to care in which order you send the documents. Maybe that is because I am using reactivity on the server side as well, and things like publish middleware, which all interfere with the order of documents being sent over the wire. So if you want to do some sorting, in my view this should be done on the client side. My package thus just provides the information about which documents are from the subscription; order is not something it provides.

What use cases do you have, for example, where the order of calling added is important to know on the client? I could see that one would want to preserve the order of a cursor with sort applied, but then the user would have to use observe with ordering, which is much more complicated than just the order of added calls. Maybe I could expose an API for the user to put a custom value in the scopeFieldName. Then if they want a sorted publish, they would call observe themselves and compute the scopeFieldName value based on addedBefore and movedBefore.

Personally I wouldn't say that monkey patching Meteor's pub and sub code is less complicated than wrapping things, but each to his own I guess ;)

I am not saying that it is nice, but it is simpler, less lines of code, less data to go over, simpler concept.

Meteor should provide APIs to do that properly. @rclai is now working on at least common API for something like this: https://github.com/rclai/meteor-collection-extensions

But I think that Meteor's common way is to not provide APIs until community develops package showing the need for that. But yes, please do merge this pull request in: meteor/meteor#5845

BTW, you might be interested in this package as well: https://github.com/peerlibrary/meteor-subscription-data

tmeasday · 2015-12-28T23:19:33Z

What use cases do you have, for example, where the order of calling added is important to know on the client?

I'm thinking about any time the sort order is not knowable on the client. For instance if you query a fulltext search endpoint (think ElasticSearch) to get a ranked set of documents for the publication.

tmeasday · 2015-12-28T23:30:38Z

BTW, you might be interested in this package as well: https://github.com/peerlibrary/meteor-subscription-data

Interesting stuff, thanks for showing me @mitar

mitar · 2015-12-29T07:34:11Z

I'm thinking about any time the sort order is not knowable on the client. For instance if you query a fulltext search endpoint (think ElasticSearch) to get a ranked set of documents for the publication.

You can just pass that extra score to the client in that case. See the example here: https://github.com/peerlibrary/meteor-subscription-scope

tmeasday · 2015-12-30T01:49:09Z

Well sure if you are ok about adding extra fields to the document. What I was thinking about was something similar to what you've done where the extra field is stripped somehow before coming back out to the query-er
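
Roughly what I have in mind, as a plain JavaScript sketch (not the API of either package; `RANK_FIELD`, `tagWithRank` and `readInRankOrder` are made-up names): the publishing side records each document's position in the search result in a bookkeeping field, and the reading side sorts by that field and strips it again.

```javascript
const RANK_FIELD = '_rank'; // hypothetical bookkeeping field name

// Publishing side: remember each document's position in the ranked result.
function tagWithRank(rankedDocs) {
  return rankedDocs.map((doc, index) => ({ ...doc, [RANK_FIELD]: index }));
}

// Reading side: restore the ranked order, then hide the bookkeeping field.
function readInRankOrder(cachedDocs) {
  return cachedDocs
    .filter((doc) => RANK_FIELD in doc)
    .sort((a, b) => a[RANK_FIELD] - b[RANK_FIELD])
    .map(({ [RANK_FIELD]: _rank, ...rest }) => rest);
}

// Result order as returned by a (pretend) full-text search endpoint.
const published = tagWithRank([
  { _id: 'x', title: 'best match' },
  { _id: 'y', title: 'second best' },
]);

// Documents may land in the client cache in any order.
const arrived = [published[1], published[0]];
const result = readInRankOrder(arrived);

console.log(result.map((doc) => doc._id)); // [ 'x', 'y' ]
console.log(RANK_FIELD in result[0]); // false
```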

mitar · 2015-12-30T06:37:12Z

Yes, but that score field is something you might even want to display to the user. Anyway, those are details. I am explaining my rationale. :-) As you noticed, it is something easy to change. Also, on the client side I deliberately check only for the existence of the field, so that in theory we can add any extra payload.

tmeasday added the article label Oct 19, 2015

tmeasday self-assigned this Oct 19, 2015

tmeasday added the status: ideation label Oct 24, 2015

tmeasday mentioned this issue Oct 26, 2015

Data loading and management: ideas #11

Closed

stubailo mentioned this issue Oct 27, 2015

Blaze - Scroll Infinite rendering so slow on large DOM meteor/meteor#5548

Closed

tmeasday added status: example app and removed status: outlined labels Nov 11, 2015

mitar mentioned this issue Nov 18, 2015

Great slowdown because of the interaction between a Minimongo query and subscription meteor/meteor#5633

Closed

tmeasday added status: draft in progress and removed status: example app labels Dec 1, 2015

stubailo added status: first draft and removed status: draft in progress labels Dec 4, 2015

tmeasday mentioned this issue Dec 15, 2015

Soft launch meta issue #160

Closed

32 tasks

mitar mentioned this issue Dec 23, 2015

Provide way to obtain scoped collections peerlibrary/meteor-subscription-scope#1

Open

tmeasday closed this as completed Apr 6, 2016

tmeasday removed the status: first draft label Apr 6, 2016

crapthings mentioned this issue Dec 29, 2017

componentWillMount vs componentDidMount meteor/react-packages#242

Closed
