
Parallelize bulkDocs() #2553

Closed
nolanlawson opened this issue Aug 2, 2014 · 12 comments
@nolanlawson
Member

OK, so.

I've been trying to think of more ways to speed up secondary indexes. One thought that occurred to me is that the current bulkDocs may be a little inefficient, because it works like this:

bulkDocs([A, B, C]) ->
  1. Fetch A, wait for callback
  2. Fetch B, wait for callback
  3. Fetch C, wait for callback
  4. Write A, wait for callback
  5. Write B, wait for callback
  6. Write C, wait for callback

Presumably that's a lot of overhead of switching between the JavaScript VM and the C code, when really you could do it all at once (since it's all wrapped in a transaction):

bulkDocs([A, B, C]) ->
  1. Fetch A, B, and C in parallel, wait for callbacks
  2. Write A, B, and C in parallel, wait for callbacks
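A minimal sketch of that two-phase flow (the helper names `fetchDoc`/`writeDoc` are hypothetical, not the actual adapter internals), using a plain counter rather than promises so everything stays inside one callback chain:

```javascript
// Phase 1: issue all fetches at once; phase 2: issue all writes at once.
// A counter tracks completion instead of Promise.all.
function parallelBulkDocs(docs, fetchDoc, writeDoc, done) {
  var fetched = new Array(docs.length);
  var results = new Array(docs.length);
  var pending = docs.length;
  if (pending === 0) { return done(null, results); }

  // Phase 1: fetch existing revisions for A, B, and C in parallel
  docs.forEach(function (doc, i) {
    fetchDoc(doc, function (err, existing) {
      if (err) { return done(err); }
      fetched[i] = existing;
      if (--pending === 0) { writeAll(); }
    });
  });

  // Phase 2: write A, B, and C in parallel
  function writeAll() {
    pending = docs.length;
    docs.forEach(function (doc, i) {
      writeDoc(doc, fetched[i], function (err, res) {
        if (err) { return done(err); }
        results[i] = res;
        if (--pending === 0) { done(null, results); }
      });
    });
  }
}
```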

I started writing the code yesterday for IDB and WebSQL. Preliminary results are encouraging (temp-views test with 1 iteration):

| Browser       | Before  | After   | Change  |
| ------------- | ------- | ------- | ------- |
| Safari        | 7664ms  | 6837ms  | -10.79% |
| Firefox       | 35358ms | 28685ms | -18.87% |
| Chrome        | 19493ms | 13121ms | -23.68% |
| Chrome WebSQL | 14347ms | 13640ms | -4.92%  |

The biggest problem I have with the code right now is that it's hard to understand; I need to refactor it first.

@nolanlawson
Member Author

IE 10: 31975ms -> 30803ms (3.66% improvement).

@calvinmetcalf
Copy link
Member

The issue isn't context switching; it's that most of the time is spent waiting, so you get a very nice speedup by waiting in parallel.

This is overall going to be good, but you're going to want to put an upper limit on the number of parallel things being done at a time.

@nolanlawson
Member Author

@calvinmetcalf That makes sense. Whatever the reason, it's a surprisingly big speedup.

I'm not sure we need to set a limit on the number of parallel executions. Users can control that themselves when they call bulkDocs, and internally the only thing that determines the number of simultaneous docs is replication (which is user-configurable, default of 100) and map/reduce, which is currently hard-coded to 50.

To be fair, this number could grow a lot in the case of map/reduce if the user is emitting multiple key/values per doc, but that seems like an edge case. In any case, I say we cross that bridge when we get there.

@nolanlawson
Member Author

Would this make sense for leveldb.js too?

@calvinmetcalf
Member

I would double-check that 100-, 500-, or 1000-doc bulkDocs calls are still a speedup. This might make sense for level as well; I can look into that.


@nolanlawson
Member Author

I refactored the code, so you can take a look at it if you want. I'm just worried that limiting the number of simultaneous calls would make the code complexity worse than it already is, whereas we could also solve that problem by making the batch size in mapreduce user-configurable, so users have an escape hatch if it really becomes a problem.

@calvinmetcalf
Member

I'd recommend refactoring the code so that the main thing reads like `Promise.all(ids.map(getThenPut)).then(done);`. Currently there are multiple forEach loops populating arrays, with a less-than-clear path through it.

I can do the rewrite for level, but in async code we need to avoid Three Stooges syndrome: everybody tries to get through the door at once and nobody actually gets through, e.g. we open up 100 requests and they all starve each other of memory.
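One way to avoid that (a hypothetical sketch, not what's in the PR) is to cap how many operations are in flight at once and start the next one whenever a slot frees up:

```javascript
// Run callback-style tasks with at most `limit` in flight at a time.
// Each task is a function (cb) that calls cb(err, result) when done.
function runWithLimit(tasks, limit, done) {
  var results = new Array(tasks.length);
  var inFlight = 0;
  var nextIndex = 0;
  var finished = 0;

  function launch() {
    // Fill every free slot with the next unstarted task
    while (inFlight < limit && nextIndex < tasks.length) {
      startTask(nextIndex++);
    }
  }

  function startTask(i) {
    inFlight++;
    tasks[i](function (err, res) {
      if (err) { return done(err); }
      results[i] = res;
      inFlight--;
      if (++finished === tasks.length) { return done(null, results); }
      launch(); // a slot just opened up
    });
  }

  launch();
}
```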

@nolanlawson
Member Author

I'd love to use promises, but then both IDB and WebSQL end the transaction early. :(

I still don't think you need to solve the three stooges problem at the leveldb.js level. If we just make the batch size for mapreduce configurable, then the user has control over everything, and the defaults are already pretty sensible anyway (50 for mapreduce, 100 for changes/replication).
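Making the batch size configurable would boil down to something like this (names hypothetical): chunk the docs before they ever hit bulkDocs, so the user picks the chunk size.

```javascript
// Split an array of docs into chunks of at most `batchSize`,
// so each chunk can be handed to bulkDocs separately.
function toBatches(docs, batchSize) {
  var batches = [];
  for (var i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}
```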

@calvinmetcalf
Member

bulkDocs is available outside of mapreduce. If somebody calls db.bulkDocs with an array of 500 or 1000 docs, I don't want it to freeze now when it didn't before.

@nolanlawson
Member Author

good point

@nolanlawson
Member Author

Done. I put it in adapter.js so I didn't have to rewrite it 3 times.

calvinmetcalf added a commit that referenced this issue Aug 12, 2014
fix local storage error wtf?
calvinmetcalf added a commit that referenced this issue Aug 16, 2014
fix local storage error wtf?
@nolanlawson
Member Author

Fixed in all 3. Thanks @calvinmetcalf .
