New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Merge Queue To Handle Larger Inputs #2400

Merged
merged 9 commits into from Oct 4, 2017

Conversation

Projects
None yet
5 participants
@jamesmcclain
Member

jamesmcclain commented Sep 30, 2017

These changes simplify MergeQueue and improve its efficiency. Thank you to @mteldridge for noticing this issue.

Connects #2358

jamesmcclain and others added some commits Sep 30, 2017

Added test for MergeQueue problem.
Signed-off-by: Matthew Eldridge <meldridge@astraea.io>
replaced randomSquare with randomPoly
Signed-off-by: Matthew Eldridge <meldridge@astraea.io>
bug fix in MergeQueueSpec
Signed-off-by: Matthew Eldridge <meldridge@astraea.io>

@jamesmcclain jamesmcclain changed the title from [WiP] Allow Merge Queue To Handle Larger Inputs to Allow Merge Queue To Handle Larger Inputs Oct 1, 2017

@metasim

This comment has been minimized.

Member

metasim commented Oct 2, 2017

It would be interesting to see how this method now performs.

def insertRange(): Seq[(Long, Long)] = MergeQueue(ranges)

Old runs are in bench/archive, but in CSV format.

@jamesmcclain

This comment has been minimized.

Member

jamesmcclain commented Oct 2, 2017

Before

[info] Benchmark                             (shuffle)  (size)  (skip)  (span)  Mode  Cnt     Score     Error  Units
[info] MergeQueueBench.insertAlternateRange      false  100000      10      10  avgt    5     3.284 ±   0.819  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000      10       1  avgt    5     2.916 ±   0.180  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000       1      10  avgt    5     4.079 ±   2.376  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000       1       1  avgt    5     3.750 ±   1.378  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000      10      10  avgt    5     0.349 ±   0.193  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000      10       1  avgt    5     0.626 ±   0.348  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000       1      10  avgt    5     0.428 ±   0.084  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000       1       1  avgt    5     0.350 ±   0.208  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000      10      10  avgt    5    37.203 ±   9.385  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000      10       1  avgt    5    36.526 ±   8.689  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000       1      10  avgt    5    39.964 ±  14.404  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000       1       1  avgt    5    45.577 ±  14.403  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000      10      10  avgt    5     2.725 ±   1.145  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000      10       1  avgt    5     3.686 ±   1.597  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000       1      10  avgt    5     3.280 ±   0.707  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000       1       1  avgt    5     3.038 ±   1.790  ms/op
[info] MergeQueueBench.insertRange               false  100000      10      10  avgt    5     3.392 ±   1.649  ms/op
[info] MergeQueueBench.insertRange               false  100000      10       1  avgt    5    18.822 ±   5.279  ms/op
[info] MergeQueueBench.insertRange               false  100000       1      10  avgt    5     3.681 ±   1.292  ms/op
[info] MergeQueueBench.insertRange               false  100000       1       1  avgt    5     3.434 ±   2.161  ms/op
[info] MergeQueueBench.insertRange               false   10000      10      10  avgt    5     0.334 ±   0.205  ms/op
[info] MergeQueueBench.insertRange               false   10000      10       1  avgt    5     1.512 ±   0.401  ms/op
[info] MergeQueueBench.insertRange               false   10000       1      10  avgt    5     0.332 ±   0.119  ms/op
[info] MergeQueueBench.insertRange               false   10000       1       1  avgt    5     0.295 ±   0.225  ms/op
[info] MergeQueueBench.insertRange                true  100000      10      10  avgt    5  1147.474 ± 285.513  ms/op
[info] MergeQueueBench.insertRange                true  100000      10       1  avgt    5  5183.929 ± 635.850  ms/op
[info] MergeQueueBench.insertRange                true  100000       1      10  avgt    5    34.518 ±  11.628  ms/op
[info] MergeQueueBench.insertRange                true  100000       1       1  avgt    5   430.916 ± 248.288  ms/op
[info] MergeQueueBench.insertRange                true   10000      10      10  avgt    5    15.346 ±   6.375  ms/op
[info] MergeQueueBench.insertRange                true   10000      10       1  avgt    5    54.815 ±  16.561  ms/op
[info] MergeQueueBench.insertRange                true   10000       1      10  avgt    5     1.113 ±   0.371  ms/op
[info] MergeQueueBench.insertRange                true   10000       1       1  avgt    5     7.010 ±   2.650  ms/op
[info] 
[info] Benchmark result is saved to /home/jmcclain/local/src/geotrellis/bench/target/jmh-results-20171002090202.json
[success] Total time: 407 s, completed Oct 2, 2017 9:08:49 AM

After

[info] Benchmark                             (shuffle)  (size)  (skip)  (span)  Mode  Cnt   Score    Error  Units
[info] MergeQueueBench.insertAlternateRange      false  100000      10      10  avgt    5   3.611 ±  1.488  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000      10       1  avgt    5   4.150 ±  1.796  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000       1      10  avgt    5   3.320 ±  1.157  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000       1       1  avgt    5   3.756 ±  1.274  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000      10      10  avgt    5   0.344 ±  0.159  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000      10       1  avgt    5   0.549 ±  0.282  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000       1      10  avgt    5   0.386 ±  0.069  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000       1       1  avgt    5   0.416 ±  0.178  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000      10      10  avgt    5  45.176 ± 20.956  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000      10       1  avgt    5  38.426 ± 28.030  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000       1      10  avgt    5  40.446 ± 16.111  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000       1       1  avgt    5  45.548 ± 23.045  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000      10      10  avgt    5   3.199 ±  1.373  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000      10       1  avgt    5   3.352 ±  1.083  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000       1      10  avgt    5   3.331 ±  1.946  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000       1       1  avgt    5   3.181 ±  0.824  ms/op
[info] MergeQueueBench.insertRange               false  100000      10      10  avgt    5  23.150 ± 15.494  ms/op
[info] MergeQueueBench.insertRange               false  100000      10       1  avgt    5  21.879 ±  9.104  ms/op
[info] MergeQueueBench.insertRange               false  100000       1      10  avgt    5  23.806 ± 14.646  ms/op
[info] MergeQueueBench.insertRange               false  100000       1       1  avgt    5  26.575 ± 11.492  ms/op
[info] MergeQueueBench.insertRange               false   10000      10      10  avgt    5   1.633 ±  0.932  ms/op
[info] MergeQueueBench.insertRange               false   10000      10       1  avgt    5   1.450 ±  0.891  ms/op
[info] MergeQueueBench.insertRange               false   10000       1      10  avgt    5   1.623 ±  0.970  ms/op
[info] MergeQueueBench.insertRange               false   10000       1       1  avgt    5   1.478 ±  1.241  ms/op
[info] MergeQueueBench.insertRange                true  100000      10      10  avgt    5  39.229 ± 17.330  ms/op
[info] MergeQueueBench.insertRange                true  100000      10       1  avgt    5  33.831 ±  9.979  ms/op
[info] MergeQueueBench.insertRange                true  100000       1      10  avgt    5  36.971 ± 24.518  ms/op
[info] MergeQueueBench.insertRange                true  100000       1       1  avgt    5  38.254 ± 23.609  ms/op
[info] MergeQueueBench.insertRange                true   10000      10      10  avgt    5   2.641 ±  1.996  ms/op
[info] MergeQueueBench.insertRange                true   10000      10       1  avgt    5   2.612 ±  1.850  ms/op
[info] MergeQueueBench.insertRange                true   10000       1      10  avgt    5   2.491 ±  0.724  ms/op
[info] MergeQueueBench.insertRange                true   10000       1       1  avgt    5   2.609 ±  1.600  ms/op
[info] 
[info] Benchmark result is saved to /home/jmcclain/local/src/geotrellis/bench/target/jmh-results-20171002090950.json
[success] Total time: 358 s, completed Oct 2, 2017 9:15:48 AM
class MergeQueue(initialSize: Int = 1) {
private val treeSet = new java.util.TreeSet(new RangeComparator()) // Sorted data structure
private val fudge = 1 // Set this to one to preserve original behavior

This comment has been minimized.

@lossyrob

lossyrob Oct 2, 2017

Member

What does this mean? What is the original behavior vs the non-original behavior, and why did it change?

Consider a more clear name for the variable.

@jamesmcclain

This comment has been minimized.

Member

jamesmcclain commented Oct 3, 2017

Ah yes, I realized that I used the phrase "preservation of previous behavior" in two unrelated ways. The first refers to the fact that in the original code, not just overlapping intervals but intervals that have a gap between them of up to one unit are merged. The second usage refers to the fact that one can request the merged intervals more than once, so a the data structure needs to be preserved.

I will make an update in a few moments.

@echeipesh echeipesh merged commit 00fa464 into locationtech:master Oct 4, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
ip-validation
Details

@jamesmcclain jamesmcclain deleted the jamesmcclain:fix/merge-queue branch Oct 4, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment