
Parallelize bulkDocs() #2553

Closed
nolanlawson opened this issue Aug 2, 2014 · 12 comments
@nolanlawson
Member

OK, so.

I've been trying to think of more ways to speed up secondary indexes. One thought that occurred to me is that the current bulkDocs may be a little inefficient, because it works like this:

bulkDocs([A, B, C]) ->
  1. Fetch A, wait for callback
  2. Fetch B, wait for callback
  3. Fetch C, wait for callback
  4. Write A, wait for callback
  5. Write B, wait for callback
  6. Write C, wait for callback

Presumably that's a lot of overhead of switching between the JavaScript VM and the C code, when really you could do it all at once (since it's all wrapped in a transaction):

bulkDocs([A, B, C]) ->
  1. Fetch A, B, and C in parallel, wait for callbacks
  2. Write A, B, and C in parallel, wait for callbacks
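A minimal sketch of that two-phase flow (the helper names `fetchDoc`/`writeDoc` are hypothetical, not the actual adapter internals), using a plain counter rather than promises so everything stays inside one callback chain:

```javascript
// Phase 1: issue all fetches at once; phase 2: issue all writes at once.
// A counter tracks completion instead of Promise.all.
function parallelBulkDocs(docs, fetchDoc, writeDoc, done) {
  var fetched = new Array(docs.length);
  var results = new Array(docs.length);
  var pending = docs.length;
  if (pending === 0) { return done(null, results); }

  // Phase 1: fetch existing revisions for A, B, and C in parallel
  docs.forEach(function (doc, i) {
    fetchDoc(doc, function (err, existing) {
      if (err) { return done(err); }
      fetched[i] = existing;
      if (--pending === 0) { writeAll(); }
    });
  });

  // Phase 2: write A, B, and C in parallel
  function writeAll() {
    pending = docs.length;
    docs.forEach(function (doc, i) {
      writeDoc(doc, fetched[i], function (err, res) {
        if (err) { return done(err); }
        results[i] = res;
        if (--pending === 0) { done(null, results); }
      });
    });
  }
}
```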

I started writing the code yesterday for IDB and WebSQL. Preliminary results are encouraging (temp-views test with 1 iteration):

| Browser       | Before  | After   | Change  |
| ------------- | ------- | ------- | ------- |
| Safari        | 7664ms  | 6837ms  | -10.79% |
| Firefox       | 35358ms | 28685ms | -18.87% |
| Chrome        | 19493ms | 13121ms | -23.68% |
| Chrome WebSQL | 14347ms | 13640ms | -4.92%  |

The biggest problem I have with the code right now is that it's hard to understand; I need to refactor it first.

@nolanlawson
Member Author

IE 10: 31975ms -> 30803ms (3.66% improvement).

@calvinmetcalf
Copy link
Member

The issue isn't context switching; it's that most of the time is spent waiting, so you get a very nice speedup by waiting in parallel.

This is overall going to be good, but you're going to want to put an upper limit on the number of parallel things being done at a time.

@nolanlawson
Member Author

@calvinmetcalf That makes sense. Whatever the reason, it's a surprisingly big speedup.

I'm not sure we need to set a limit on the number of parallel executions. Users can control that themselves when they call bulkDocs, and internally the only thing that determines the number of simultaneous docs is replication (which is user-configurable, default of 100) and map/reduce, which is currently hard-coded to 50.

To be fair, this number could grow a lot in the case of map/reduce if the user is emitting multiple key/values per doc, but that seems like an edge case. In any case, I say we cross that bridge when we get there.

@nolanlawson
Member Author

Would this make sense for leveldb.js too?

@calvinmetcalf
Member

I would double-check that 100-, 500-, or 1000-doc bulkDocs calls are still a speedup. This might make sense for level as well; I can look into that.


@nolanlawson
Member Author

I refactored the code, so you can take a look at it if you want. I'm just worried that limiting the number of simultaneous calls would make the code complexity worse than it already is, whereas we could also solve that problem by making the batch size in mapreduce user-configurable, so users have an escape hatch if it really becomes a problem.

@calvinmetcalf
Member

I'd recommend refactoring the code so that the main thing reads like `Promise.all(ids.map(getThenPut)).then(done);`. Currently there are multiple forEach loops populating arrays, with a less-than-clear path through it.

I can do the rewrite for level, but in async code we need to avoid Three Stooges syndrome: everybody tries to get through the door at once and nobody actually gets through, e.g. we open up 100 requests and they all starve each other of memory.
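One way to avoid that (a hypothetical sketch, not what's in the PR) is to cap how many operations are in flight at once and start the next one whenever a slot frees up:

```javascript
// Run callback-style tasks with at most `limit` in flight at a time.
// Each task is a function (cb) that calls cb(err, result) when done.
function runWithLimit(tasks, limit, done) {
  var results = new Array(tasks.length);
  var inFlight = 0;
  var nextIndex = 0;
  var finished = 0;

  function launch() {
    // Fill every free slot with the next unstarted task
    while (inFlight < limit && nextIndex < tasks.length) {
      startTask(nextIndex++);
    }
  }

  function startTask(i) {
    inFlight++;
    tasks[i](function (err, res) {
      if (err) { return done(err); }
      results[i] = res;
      inFlight--;
      if (++finished === tasks.length) { return done(null, results); }
      launch(); // a slot just opened up
    });
  }

  launch();
}
```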

@nolanlawson
Member Author

I'd love to use promises, but then both IDB and WebSQL end the transaction early. :(

I still don't think you need to solve the three stooges problem at the leveldb.js level. If we just make the batch size for mapreduce configurable, then the user has control over everything, and the defaults are already pretty sensible anyway (50 for mapreduce, 100 for changes/replication).
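Making the batch size configurable would boil down to something like this (names hypothetical): chunk the docs before they ever hit bulkDocs, so the user picks the chunk size.

```javascript
// Split an array of docs into chunks of at most `batchSize`,
// so each chunk can be handed to bulkDocs separately.
function toBatches(docs, batchSize) {
  var batches = [];
  for (var i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}
```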

@calvinmetcalf
Member

bulkDocs is available outside of mapreduce. If somebody calls db.bulkDocs with an array of 500 or 1000 docs, I don't want it to freeze now when it didn't before.

@nolanlawson
Member Author

good point

@nolanlawson
Member Author

Done. I put it in adapter.js so I didn't have to rewrite it 3 times.

calvinmetcalf added a commit that referenced this issue Aug 12, 2014
fix local storage error wtf?
calvinmetcalf added a commit that referenced this issue Aug 16, 2014
fix local storage error wtf?
@nolanlawson
Member Author

Fixed in all 3. Thanks @calvinmetcalf .
