
Batch messages sent over DDP from server to client (DDP V2?) #9862

Closed

@KoenLav commented May 5, 2018

Messages sent from the client to the server were already buffered, but JSON.parse was still called once for every message. Also, while buffering was implemented for messages from the client to the server, it was not implemented for messages from the server to the client.

This implementation:

  • sends the buffered messages from client to server as an array;
  • implements the same buffering strategy on the server side;
  • enables both the client and the server to interpret batched DDP messages.

Requesting feedback!

ps this PR supersedes: #9855
pps I'll leave some considerations as comments in the code.
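A minimal sketch of the buffering strategy the PR describes (all names here are illustrative, not the actual PR code): outgoing DDP messages accumulate in a buffer and are flushed as a single array, so JSON.stringify and the socket write happen once per flush instead of once per message.

```javascript
// Hypothetical sketch: buffer outgoing DDP messages and flush them as one array.
class MessageBuffer {
  constructor(send) {
    this.send = send;   // e.g. (data) => socket.send(data)
    this.buffer = [];
  }
  push(msg) {
    this.buffer.push(msg);
  }
  flush() {
    if (this.buffer.length === 0) return;
    // One stringify call and one send for the whole batch.
    this.send(JSON.stringify(this.buffer));
    this.buffer = [];
  }
}
```

The receiving side then only needs to know that a payload may be an array of messages rather than a single one, which is the backwards-compatibility question discussed below.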

@@ -82,6 +82,7 @@ export class Connection {
if (typeof url === 'object') {
self._stream = url;
} else {
// TODO: send allowBatching = true to the server

@KoenLav (Author) May 5, 2018

I did not know how to properly tell the server that this client understands batched DDP messages.

Or maybe, now that I'm thinking about it, we should use the version for this?

@@ -1561,9 +1565,28 @@ export class Connection {
return;
}

var messages = [];

@KoenLav (Author) May 5, 2018

Looping over the outstanding methods to add every message to an array seems strange, but doing so will cause JSON.stringify to be called only once for all outstanding methods in this block, rather than once for every outstanding method.
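The trade-off described here can be illustrated with hypothetical message shapes (not the actual PR code): collecting the outstanding method messages into an array first means JSON.stringify runs once for the whole block instead of once per method.

```javascript
// Two outstanding DDP method messages (illustrative shapes).
const outstanding = [
  { msg: 'method', method: 'a', params: [], id: '1' },
  { msg: 'method', method: 'b', params: [], id: '2' },
];

// Per-message serialization: one JSON.stringify call per method.
const perMessage = outstanding.map(m => JSON.stringify(m));

// Batched serialization: a single JSON.stringify call for all of them.
const batched = JSON.stringify(outstanding);
```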

return;
}
// Convert all DDP messages to Array form
messages = messages.constructor === Array ? messages : [messages];

@KoenLav (Author) May 5, 2018

Not sure if this is the best way to do this?

@sebakerckhof (Contributor) May 7, 2018

Array.isArray ?
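The normalization step above, rewritten with Array.isArray as suggested (a sketch, not the PR's final code), so a payload that is either a single DDP message or a batch is always handled as an array:

```javascript
// Normalize a payload that may be one message or an array of messages.
function toMessageArray(messages) {
  return Array.isArray(messages) ? messages : [messages];
}
```

Array.isArray is also safer than checking `messages.constructor === Array`, since it works for arrays created in other realms and for objects without a usable `constructor` property.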

keys(messages).forEach(i => {
const msg = messages[i];
const copy = EJSON.clone(msg);

@KoenLav (Author) May 6, 2018

It is probably better to clone the entire array before the loop, for performance.

@sebakerckhof (Contributor) May 8, 2018

As @copleykj also pointed out, there's no need for keys. Just do messages.forEach(msg => ...

@copleykj (Contributor) May 9, 2018

Also, given that the point of this PR is performance, I would recommend a regular for loop here, as it's ~90% faster.
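The suggestion amounts to replacing the forEach-based iteration with a plain indexed for loop (handler and message shapes below are illustrative, not the PR's code):

```javascript
// Process each batched message with a plain for loop, avoiding the
// per-element callback overhead of forEach on hot paths.
function processMessages(messages, handle) {
  for (let i = 0; i < messages.length; i++) {
    handle(messages[i]);
  }
}
```

The exact speedup depends on the engine and array size, so the ~90% figure should be treated as a benchmark-specific observation rather than a guarantee.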

@oswaldoacauan commented May 7, 2018

I would gladly help to get this PR merged; I need this feature ASAP :)

@KoenLav (Author) commented May 7, 2018

@oswaldoacauan developing for Meteor is rather simple: just fork the repository and follow the instructions in DEVELOPMENT.md.

The Meteor team indicated they will focus on releasing 1.7 first and then work towards resolving server performance issues (such as the ones this PR is meant to address).

Are there any insights you could share about the specific problem you are experiencing? Or have you already shared them?

@mitar (Collaborator) commented May 8, 2018

Hm, can you show how this improves performance? I do not really understand why there would be much of a performance change. Can you make an example app that demonstrates it?

@KoenLav (Author) commented May 8, 2018

@mitar in a scenario where a server wants to send 500 added/changed/removed messages within 500ms, currently these messages are:

  • passed to JSON.stringify individually;
  • sent to the client individually.

Both of these operations have overhead associated with them, which is now only incurred once (at the cost of stringifying and sending a larger amount of data).

I expect this change to have a large impact on the following issues:

Reproduction: https://github.com/KoenLav/meteor-fiber-repro

Would love to see a test showing the performance improvement as well, but no time to create one currently.

@crapthings commented May 8, 2018

@KoenLav will this add a new DDP message type or use an existing one?

Does this change the DDP specification?

@KoenLav (Author) commented May 8, 2018

@crapthings

No, it simply batches the messages.

Yes, it does change the specification, as clients will need to know how to accept arrays of messages rather than single messages. However, the intention is to make this fully backwards compatible for 'older' clients and/or an optional feature.

@mitar (Collaborator) commented May 8, 2018

I expect this change to have a large impact on the following issues:

But you should probably confirm this. It might be an unnecessary change if the underlying fiber pool issue is fixed. I think it might be a better approach to get the Node version used by Meteor to include a fix for the fiber pool issue. In that case both this issue and any other code path that creates a spike of fibers would perform better. This is why we need a test to measure the performance improvement, so that we can compare: existing code vs. existing code with the fiber patch vs. existing code with message batching vs. existing code with both the fiber patch and message batching. Then we can know which approach to take.

Both of these operations have overhead associated with them, which is now only incurred once (at the cost of stringifying and sending a larger amount of data).

I am still puzzled that serializing 100 messages individually would be so much slower than serializing one 100x larger message once. Can you also show that there is a difference here?

@crapthings commented May 9, 2018

serializing 20000 items individually vs serializing once

=> Meteor server restarted
I20180509-09:27:48.820(8)? foreach: 94.341ms
=> Meteor server restarted
I20180509-09:28:30.809(8)? a: 104.490ms
I20180509-09:28:30.903(8)? b: 92.962ms
=> Meteor server restarted
I20180509-09:28:43.452(8)? a: 91.389ms
I20180509-09:28:43.566(8)? b: 113.280ms
=> Meteor server restarted
I20180509-09:28:48.563(8)? a: 102.033ms
I20180509-09:28:48.663(8)? b: 99.924ms
=> Meteor server restarted
I20180509-09:28:54.795(8)? a: 101.871ms
I20180509-09:28:54.902(8)? b: 106.864ms
=> Meteor server restarted
I20180509-09:29:04.691(8)? a: 103.247ms
I20180509-09:29:04.793(8)? b: 101.476ms
=> Meteor server restarted
import _ from 'lodash'
import faker from 'faker'

const list = _.times(20000, n => ({
  title: faker.lorem.sentence(),
  text: faker.lorem.paragraphs(),
  category: faker.lorem.word(),
  tags: (faker.lorem.words()).split(' '),
}))

console.time('a')
const a = _.map(list, item => JSON.stringify(item))
console.timeEnd('a')

console.time('b')
const b = JSON.stringify(list)
console.timeEnd('b')

METEOR@1.6.1.1

publish this.added

20k items one by one: [screenshot]

20k items in one message: [screenshot]

20k items in 4 messages: [screenshot]

20k items in 50 messages: [screenshot]

const list = _.times(20000, n => ({
  title: faker.lorem.sentence(),
  text: faker.lorem.paragraphs(),
  category: faker.lorem.word(),
  tags: (faker.lorem.words()).split(' '),
}))

Test.remove({})
Test.batchInsert(list)

Meteor.publish('testa', function () {
  return Test.find()
})

Meteor.publish('testb', function () {
  const test = Test.find().fetch()
  this.added('test', 1, test)
  return this.ready()
})

Meteor.publish('testc', function () {
  // const test = Test.find().fetch()
  const test = _.chunk(Test.find().fetch(), 20000 / 50)
  _.each(test, (item, idx) => this.added('test', idx, item))
  return this.ready()
})

So do we need batch size control?

I think 'this.removed' is similar to 'this.added'; removed just sends the id.

Without unblocking the publication, do you have to wait for the last subscription to stop?

But the CPU spike is in Fibers.

@mitar (Collaborator) commented May 9, 2018

serializing 20000 items individually vs serializing once

Your results literally show that it does not matter: sometimes one is faster, sometimes the other. You should probably repeat this 10 times and average. But from the look of it, there is not much to gain from serializing many messages at once vs. each separately, except new complexity.

but cpu spike is in fiber

Yes, I think we should fix fiber spiking with a patch in Node. But I am not yet convinced about batching at the API level. How did you measure things on the client side? Where is the code for those outputs in the client console you took screenshots of? Processing each message separately on the client is known to be slow. The client has many things to optimize.

@copleykj (Contributor) commented May 9, 2018

There does seem to be at least some merit in this effort as far as data being sent in small batches.

@crapthings commented May 9, 2018

@mitar

Meteor.startup(() => {
  console.time('testa')
  Tracker.autorun(() => {
    const ready = Meteor.subscribe('testc').ready()
    if (!ready) return
    console.log(Test.find().fetch())
    console.timeEnd('testa')
  })
})

And when you stop the subscription, the data in the cursor is removed one by one; if it were batched it would be fast.

@mitar (Collaborator) commented May 9, 2018

That looks OK for testing, but it would still be good to know where exactly it is slow because of the lack of batching. It might be that only client-side batching is necessary.

And when you stop the subscription, the data in the cursor is removed one by one; if it were batched it would be fast.

Yeah, I can imagine that those operations on the client are slow when removing one by one. But this could be fixed on the client side only.

@KoenLav (Author) commented May 9, 2018

@mitar the problem is that some of these functions on the server side are blocking, and when a lot of these blocking events occur at the same time the server remains stuck at 100% CPU.

The patch for V8 which limits the impact of this issue is already in the works; however, I believe we can also make some smart changes to Meteor which help actually resolve the cause of the issue, rather than just minimizing its effect.

In my current PR the messages are batched within a particular Session instance (so per connected user). Suppose there are 50 connected users and 2,000 changes within half a second; this means 100,000 stacks get executed at once.

By batching the messages in the Session, the code leading from the Collection observer up to the stringify step is still called 100,000 times, but the code afterwards (stringifying and sending the messages) only 50 times.

I'm thinking about the possibility of batching messages, per Collection, as Meteor receives them from MongoDB. This would significantly reduce the load on the server, as all of the code leading up to the sending of DDP messages would only be executed 50 times, rather than 100,000 times, and the code which batches the messages would only be executed 2,000 times.

This would produce 2,000 short stacks batching the messages, eventually leading to 50 long stacks (with a bigger payload) instead of 100,000 long stacks with small payloads.
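The per-Session batching described above can be sketched as follows (a hedged illustration, with all names invented for the example): added/changed/removed messages accumulate in a per-session buffer, and a short timer flushes them as one payload per client, so the stringify-and-send tail of the pipeline runs once per flush rather than once per change.

```javascript
// Illustrative per-session batching: queue messages, flush on a short timer.
class BatchingSession {
  constructor(send, flushMs = 50) {
    this.send = send;       // e.g. (data) => socket.send(data)
    this.flushMs = flushMs; // batching window, an assumed tunable
    this.pending = [];
    this.timer = null;
  }
  queue(msg) {
    this.pending.push(msg);
    // Arm the flush timer on the first queued message of a batch.
    if (!this.timer) {
      this.timer = setTimeout(() => this.flush(), this.flushMs);
    }
  }
  flush() {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.length > 0) {
      this.send(JSON.stringify(this.pending)); // one stringify + one send
      this.pending = [];
    }
  }
}
```

With 50 sessions and 2,000 changes in a window, each session would call `send` once per flush instead of 2,000 times, which is the 100,000-to-50 reduction the comment describes for the tail of the pipeline.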

@tdjac0bs commented May 9, 2018

We were going to implement the same improvement to Meteor, because we figured out that performance is affected by the chatty nature of the DDP protocol (especially when there is a high RTT). Batching the messages server-side solves this chattiness problem.

@oswaldoacauan, can you create a demo app that makes 10,000 updates per second for 120s and compare the performance?

I suggest the following tests:

  • Run the demo app locally (latency 0) without this PR.
  • Run the demo app locally (latency 0) with this PR.
  • Run the demo app locally (latency 150ms) without this PR.
  • Run the demo app locally (latency 150ms) with this PR.

The measured variable could be the time the client takes to receive all the updates.

@KoenLav (Author) commented May 9, 2018

I've just started working on batching the messages in the ObserveMultiplexer; the first place where the changes in a subscription are handled.

This will require refactoring every class involved (Session, Subscription, Observer, etc) but will completely solve the issues.

Will post WIP soon.

@mitar (Collaborator) commented May 9, 2018

the problem is that some of these functions on the server side are blocking and that when a lot of these blocking events occur at the same time the server remains stuck at 100% CPU.

Which functions are blocking? And, I mean, if many things occur at the same time, code has to process it, so it takes some time at 100% to go through all that. The question is whether it takes longer or shorter if you batch. Generally, it would be better not to batch, so that you do not monopolize one big chunk of CPU, but instead let your tasks be interleaved with other tasks. So batching in an async environment like Node might not necessarily be productive, and you should really make a good case here.

I'm thinking about the possibility of batching messages, per Collection, as Meteor receives them from MongoDB. This would significantly reduce the load on the server as all of the code leading up to the sending of DDP messages would only be executed 50 times, rather than 100.000 times, and the code which batches the messages would only be executed 2000 times.

But if it simply takes 5 seconds to get through 100,000 changes, you can batch it however you want; the CPU has to go through all of it. So the question is whether small changes carry high overhead. I see the benefit of small changes (better cooperative multitasking), but we should see whether there is an improvement from batching.

I have in the past made mistakes where I thought I was optimizing something, but it turned out I wasn't. This is why I am saying that we should really know and measure the impact of those optimizations. I completely agree that such improvements are great, but let's just make sure we are improving things in the right place.

@sebakerckhof (Contributor) commented May 9, 2018

This will require refactoring every class involved (Session, Subscription, Observer, etc) but will completely solve the issues.

Before refactoring such core classes, it's really recommended to open a feature request and have a discussion there.

@KoenLav (Author) commented May 10, 2018

@mitar

ObserveMultiplexer._applyCallback creates a SynchronousQueue entry for every change to a Collection which we are Subscribed to OR Observing (Subscriptions make use of Observe, which makes use of ObserveChanges, internally).

We would not want to handle these changes asynchronously, so this seems fine, but the way Meteor is structured the entire callstack

@sebakerckhof

In my opinion the easiest way to get some discussion started is by having a suggested implementation. I do agree that refactoring these classes requires significant discussion and improvements over the current situation.

Then again, if we implement the changes in a backwards-compatible (opt-in) way and are able to show significant performance improvements (and/or the resolution of several long-standing issues), I think we should fix first, then discuss and improve later.

@KoenLav (Author) commented May 10, 2018

Added some commits; the implementation is now fully backward compatible. In addition to that I ran some tests on the Fiber spikes repository (issue: #7747 , repository: https://github.com/KoenLav/meteor-fiber-repro) and can confirm that this PR improves the performance considerably. I would welcome some more tests/use cases of this specific batching implementation!

Even though performance increased considerably and time Meteor maxes out CPU is significantly reduced, the Fiber spike remains.

In an effort to find the source of the spike I started refactoring the aforementioned classes (Multiplexer, Observer, etc.), but I have not been successful in locating the source yet. In this PR, however, batching is applied at two levels, please take a look: #9876

@KoenLav (Author) commented May 12, 2018

@mitar (and others) I tested the change in throughput caused by these changes on my development machine; loading the 20,000 documents (Widgets) in the Meteor Fibers Repro repository takes 20-40 seconds without these changes and 1-2 seconds with them.

It's fairly easy to test and see for yourself!

I no longer consider this PR to be a WIP, from my side, so would like to get some more comments :)

@KoenLav KoenLav changed the title [WIP] Batch messages sent over DDP from server to client Batch messages sent over DDP from server to client (DDP V2?) May 12, 2018

@mitar (Collaborator) commented May 13, 2018

Can you also test this with a Node version that includes Kenton's patch, which fixes the underlying fiber issue?

@KoenLav (Author) commented May 13, 2018

@mitar this patch improves performance with the aforementioned numbers even before the Fibers issue is triggered (when simply first loading a page which subscribes to 20.000 documents). So it should be considered an improvement regardless.

It also improves performance when the issue does arise, but only because Meteor is now simply equipped to handle more concurrent outgoing DDP messages (not because the Fibers issue itself is somehow alleviated).

My latest PR, however, does fix the Fibers issue (which was caused by OplogTailing falling back to fetching, which uses ONE FIBER PER DOCUMENT that needs to be fetched): #9885

Again, for this latest PR, it does not really matter whether the patch to V8 by @kentonv also alleviates the issue somewhat; the Fiber spike is an issue regardless (as demonstrated in my PR).

It is however very nice that all of these changes seem to increase performance at various levels in the application (from V8, to OplogTailing, to DDP).

@mitar (Collaborator) commented May 13, 2018

@mitar this patch improves performance with the aforementioned numbers even before the Fibers issue is triggered (when simply first loading a page which subscribes to 20.000 documents). So it should be considered an improvement regardless.

Why? What if that other patch fixes both this and any other fiber spiking issue? I think this is great work, but we should really understand whether it is needed on its own or is just a workaround for a fiber spiking issue.

@KoenLav (Author) commented May 13, 2018

The Fibers which get created are all created (and exited) within the OplogObserveDriver; they actually don't have anything to do with the changes in this PR, other than that they are also involved in moving oplog notifications from MongoDB through the server all the way to the client.

Maybe I'm not providing enough detail, but I now do in fact understand the various problems, what their fixes are, and how they are interrelated.

If I can help clarify any specific questions I'd be happy to do so!

benjamn and others added some commits Jan 5, 2019

Stop excluding test modules when meteor.testModule found in package.json. (#10402)

New Meteor apps have the following meteor.testModule in their package.json
files by default

  "meteor": {
    "testModule": "tests/main.js"
  }

When meteor.testModule is defined, it determines the test entry point when
running the `meteor test` command, ignoring legacy file naming conventions
like *.tests.js or *.app-tests.js.

The package-source.js code changed by this commit was incorrect because it
ignored those specially-named test files even when running tests, which
was a problem if the meteor.testModule tried to import them explicitly,
because they would not be properly compiled.

If you're using meteor.testModule, the distinction between `meteor test`
and `meteor test --full-app` matters a bit less, since the test entry
point will be the same for both modes, though you can still check
Meteor.isTest and Meteor.isAppTest at runtime to control test behavior.
Bump meteor-promise version to 0.8.7.
Should help with #10359, as this version includes @VeselyT's commit
ExentriqLtd/promise@bbe4f0d
Move meteor-{babel,promise} updates into v1.8.0.2 section of History.md.
While these updates were technically available to Meteor 1.8.0.1 apps, the
Meteor release version did not enforce the updates, and the old versions
were still included in the Meteor 1.8.0.1 dev bundle. In other words,
Meteor 1.8.0.2 is the release where these updates were fully enforced.
Refactor accounts-ui-unstyled/accounts_ui.js to fix bugs.
Besides helping with readability, this refactor fixes a number of bugs,
most notably the assumption that options.passwordSignupFields is an array,
though previously this package accepted a string; and the accidental use
of options.forceApprovalPrompt in code blocks that were supposed to be
handling the other options.

As a side note, I have yet to see a use of Array.prototype.reduce that
actually improved readability or performance, relative to any simpler
alternatives. Don't drink the functional programming kool-aid, y'all.
Do not treat client and server directories specially in packages. (#10414)

Fixes #10393.

Bumping compiler.BUILT_BY and LINKER_CACHE_SALT because
PR #10414 changes the behavior of the build system in a subtle way that
does not automatically trigger recompilation.
Modernize `ddp-client` package (#10413)
Use `const` and `let` instead of `var`, Object.create(null) instead of {}, and native functions instead of `lodash` utilities.
Attempt to fix tests by reverting puppeteer from 1.12.1 to 1.6.2.
Tests have started failing for reasons that may be related to puppeteer's
Meteor process management: https://circleci.com/gh/meteor/meteor/31035

Since I can't identify any other possible causes, using the same version
of puppeteer that other tests use (e.g. modules, dynamic-import) seems
like a reasonable first step.

Also updated puppeteer in tests/apps/app-config/package-lock.json to
version 1.6.2 (was 1.3.0), in an attempt to fix some unhandled promise
rejection warnings: https://circleci.com/gh/meteor/meteor/31063

@KoenLav KoenLav changed the base branch from release-1.7.0.3 to abernix/ddp-server-rework Mar 4, 2019
