Parallel copy by rclark · Pull Request #87 · mapbox/tilelive

rclark · 2014-10-02T00:04:53Z

Allows you to split a read operation into an arbitrary number of jobs. Pass a job parameter to options when using tilelive.createReadStream or tilelive.deserialize:

var readable = tilelive.createReadStream(src, { type: 'scanline', job: { total: 4, num: 1 } });

This instructs tilelive to read only the tiles that fall into job 1 of 4. A complete read takes four calls, each with a different num (1 through 4).
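As a rough illustration of what splitting a read into jobs means, the sketch below partitions a list of tiles with the x % total === num - 1 test that the to-do item in this description mentions. belongsToJob and the sample tile list are hypothetical names made up for the example; they are not tilelive internals.

```javascript
// Hypothetical helper: does this tile fall into job `num` of `total`?
// Uses the x % total === num - 1 test quoted in this PR description.
function belongsToJob(tile, job) {
  return tile.x % job.total === job.num - 1;
}

// Sample tiles: every tile at zoom 2 (a 4x4 grid).
var tiles = [];
for (var x = 0; x < 4; x++) {
  for (var y = 0; y < 4; y++) tiles.push({ z: 2, x: x, y: y });
}

// A complete read is four passes, one per job number.
var perJob = [1, 2, 3, 4].map(function (num) {
  return tiles.filter(function (tile) {
    return belongsToJob(tile, { total: 4, num: num });
  });
});
```

Each tile lands in exactly one of the four jobs, so the four passes together cover the whole set.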

Still to-do:

deserialize shouldn't utilize the same x % total === num - 1 approach that other streams do. That means deserializing every row before throwing out the ones that aren't part of the current job. It should skip based on row number instead. Done.
other ideas for tests?

coveralls · 2014-10-02T00:07:29Z

Coverage increased (+0.15%) when pulling ef301a6 on parallel-copy into 580d44f on master.

rclark · 2014-10-02T00:46:26Z

In the failing tests, as the ratio of jobs to total tiles increases, the distribution of tile reads across jobs gets very bad very quickly.

Because of the approach here (dividing jobs based on tile.x values) getting an even distribution is going to be very tileset-dependent. I think the next step is to look at this distribution in some more true-to-life situations to determine if it is a reasonable approach.

yhahn · 2014-10-02T02:43:49Z

@rclark I'm not too concerned about even distribution across jobs. Any approach (bbox, etc) is going to be datasource dependent unless the approach involves knowing the geographic shape/density of the data beforehand.

I'll keep pondering the pyramid question tonight.

rclark · 2014-10-02T19:24:31Z

The horrible distribution I was seeing is because the modulus approach will never put tiles into jobs where job num > max tile.x + 1 (job numbers start at 1, tile.x starts at 0). Weird implication: the further west your copy operation is, the less you can benefit from parallelization.
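A small worked example of that effect, assuming the x % total === num - 1 test from the PR description and a hypothetical western tileset whose tiles only span x = 0 through 2 (tilesPerJob is an illustrative name, not part of tilelive):

```javascript
// Count how many tiles each job would read under the modulus test.
function tilesPerJob(tiles, total) {
  var counts = [];
  for (var i = 0; i < total; i++) counts.push(0);
  tiles.forEach(function (tile) {
    counts[tile.x % total] += 1; // index 0 corresponds to job 1
  });
  return counts;
}

// Hypothetical tileset: 3 columns (x = 0..2), 4 rows, split into 8 jobs.
var westernTiles = [];
for (var x = 0; x <= 2; x++) {
  for (var y = 0; y < 4; y++) westernTiles.push({ x: x, y: y });
}
var counts = tilesPerJob(westernTiles, 8);
// Jobs 1 to 3 get 4 tiles each; jobs 4 to 8 get none.
```

Adding more jobs past the number of distinct x values does nothing here: the extra jobs simply sit idle.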

rclark · 2014-10-02T21:48:51Z

@yhahn I managed to write stream-pyramid.js such that:

Tiles in zoom levels where num tiles < num jobs are all fed to job 1.
Modulus-splitting occurs on tiles at the zoom level where num tiles >= num jobs. Each job gets a set of these tiles and renders out its children pyramid-style.
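One way to read those two rules is sketched below. This is not the code in stream-pyramid.js: it assumes a full-world pyramid (4^z tiles at zoom z) and mod-splits on a row-major tile index rather than on x alone, and splitZoom and jobForTile are hypothetical names.

```javascript
// Lowest zoom whose tile count (4^z, full-world assumption) reaches the job count.
function splitZoom(total) {
  var z = 0;
  while (Math.pow(4, z) < total) z++;
  return z;
}

// Returns the 1-based job number that would render a given tile.
function jobForTile(tile, total) {
  var sz = splitZoom(total);
  if (tile.z < sz) return 1; // too few tiles to split: everything goes to job 1

  // Walk up to the tile's ancestor at the split zoom.
  var scale = Math.pow(2, tile.z - sz);
  var ancestorX = Math.floor(tile.x / scale);
  var ancestorY = Math.floor(tile.y / scale);

  // Mod-split on the ancestor's row-major index (an assumption of this sketch).
  var index = ancestorY * Math.pow(2, sz) + ancestorX;
  return (index % total) + 1;
}
```

Because a tile inherits its job from its ancestor at the split zoom, each job ends up rendering whole sub-pyramids rather than scattered tiles.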

The logic is a mess and I am not proud of this.

rclark · 2014-10-07T18:48:24Z

Okay @yhahn I fell back on straight bbox-splitting with low-zoom and along-the-boundaries duplication of rendered tiles. The logic is certainly cleaner.
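For illustration only, bbox-splitting could be sketched as cutting the source bounds into one strip per job. The comment does not say whether the split runs along longitude, latitude, or a grid, so the longitude strips below are an assumption, and jobBounds is a hypothetical name.

```javascript
// Hypothetical: slice a [west, south, east, north] bbox into `total`
// equal-width longitude strips and return the strip for job `num`.
function jobBounds(bbox, job) {
  var west = bbox[0], south = bbox[1], east = bbox[2], north = bbox[3];
  var width = (east - west) / job.total;
  return [
    west + width * (job.num - 1),
    south,
    west + width * job.num,
    north
  ];
}

var firstStrip = jobBounds([-180, -85, 180, 85], { total: 4, num: 1 });
var lastStrip = jobBounds([-180, -85, 180, 85], { total: 4, num: 4 });
```

Under this scheme a tile that straddles a strip boundary intersects two strips, and a low-zoom tile can intersect all of them, so more than one job renders it. That matches the duplication the comment describes.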

yhahn · 2014-10-07T18:51:15Z

@rclark 👍

rclark · 2014-10-07T20:25:23Z

Now the deserialization stream is mod-split not by stream order but by the X value, which is pulled out of the serialized data via regex. JSON.parse (without decoding the buffer) is noticeably costly when running hundreds of jobs.
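A minimal sketch of the regex idea, assuming each serialized row is a single JSON line that contains an "x" key followed by an integer (the exact row format is an assumption here, and getSerializedX and rowBelongsToJob are hypothetical names):

```javascript
// Hypothetical: read the x value from a serialized row without JSON.parse.
function getSerializedX(line) {
  var match = /"x":(\d+)/.exec(line);
  return match ? parseInt(match[1], 10) : null;
}

// Keep a row only if its x value falls into the current job.
function rowBelongsToJob(line, job) {
  var x = getSerializedX(line);
  return x !== null && x % job.total === job.num - 1;
}

// Example row, built with JSON.stringify only to produce test input.
var row = JSON.stringify({ z: 3, x: 5, y: 2, buffer: 'AAAA' });
var keep = rowBelongsToJob(row, { total: 4, num: 2 });
```

Keeping the extraction in one small function means a different serialization format would only need a different getSerializedX.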

I tried to set it up in an abstracted-enough way that it should be clear what would have to be done if you wanted to change serialization formats.

yhahn · 2014-10-08T15:53:55Z

@rclark 👍 want to merge + roll 5.3.0 or so?

Parallel copy

readable streams can be split into jobs

ef301a6

rclark added 2 commits October 1, 2014 17:30

skip based on line number, not tile.x for deserialize streams

1a95e99

adds failing tests for distribution of tiles across jobs

cfb73d3

sketch bbox approach

bc645dc

rclark force-pushed the parallel-copy branch 2 times, most recently from 1600f63 to 503c036 Compare October 7, 2014 18:43

test that all tiles are rendered, duplicative bbox-splitting for pyramid

712889d

rclark force-pushed the parallel-copy branch from 503c036 to 712889d Compare October 7, 2014 18:44

rclark changed the title ~~[wip] Parallel copy~~ Parallel copy Oct 7, 2014

get serialized X value without JSON.parse

428ac61

rclark pushed a commit that referenced this pull request Oct 8, 2014

Merge pull request #87 from mapbox/parallel-copy

79b4dc7

Parallel copy

rclark merged commit 79b4dc7 into master Oct 8, 2014

rclark deleted the parallel-copy branch October 8, 2014 16:00

yhahn mentioned this pull request Oct 23, 2014

State/resume system for tilelive-copy #86

Closed

rclark merged 6 commits into master from parallel-copy
