Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Merge Queue To Handle Larger Inputs #2400

Merged
merged 9 commits into from
Oct 4, 2017

Conversation

jamesmcclain
Copy link
Member

These changes simplify MergeQueue and improve its efficiency. Thank you to @mteldridge for noticing this issue.

Connects #2358

James McClain and others added 7 commits September 30, 2017 16:41
@jamesmcclain jamesmcclain changed the title [WiP] Allow Merge Queue To Handle Larger Inputs Allow Merge Queue To Handle Larger Inputs Oct 1, 2017
@metasim
Copy link
Member

metasim commented Oct 2, 2017

It would be interesting to see how this method now performs.

def insertRange(): Seq[(Long, Long)] = MergeQueue(ranges)

Old runs are in bench/archive, but in CSV format.

@jamesmcclain
Copy link
Member Author

jamesmcclain commented Oct 2, 2017

Before

[info] Benchmark                             (shuffle)  (size)  (skip)  (span)  Mode  Cnt     Score     Error  Units
[info] MergeQueueBench.insertAlternateRange      false  100000      10      10  avgt    5     3.284 ±   0.819  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000      10       1  avgt    5     2.916 ±   0.180  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000       1      10  avgt    5     4.079 ±   2.376  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000       1       1  avgt    5     3.750 ±   1.378  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000      10      10  avgt    5     0.349 ±   0.193  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000      10       1  avgt    5     0.626 ±   0.348  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000       1      10  avgt    5     0.428 ±   0.084  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000       1       1  avgt    5     0.350 ±   0.208  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000      10      10  avgt    5    37.203 ±   9.385  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000      10       1  avgt    5    36.526 ±   8.689  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000       1      10  avgt    5    39.964 ±  14.404  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000       1       1  avgt    5    45.577 ±  14.403  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000      10      10  avgt    5     2.725 ±   1.145  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000      10       1  avgt    5     3.686 ±   1.597  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000       1      10  avgt    5     3.280 ±   0.707  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000       1       1  avgt    5     3.038 ±   1.790  ms/op
[info] MergeQueueBench.insertRange               false  100000      10      10  avgt    5     3.392 ±   1.649  ms/op
[info] MergeQueueBench.insertRange               false  100000      10       1  avgt    5    18.822 ±   5.279  ms/op
[info] MergeQueueBench.insertRange               false  100000       1      10  avgt    5     3.681 ±   1.292  ms/op
[info] MergeQueueBench.insertRange               false  100000       1       1  avgt    5     3.434 ±   2.161  ms/op
[info] MergeQueueBench.insertRange               false   10000      10      10  avgt    5     0.334 ±   0.205  ms/op
[info] MergeQueueBench.insertRange               false   10000      10       1  avgt    5     1.512 ±   0.401  ms/op
[info] MergeQueueBench.insertRange               false   10000       1      10  avgt    5     0.332 ±   0.119  ms/op
[info] MergeQueueBench.insertRange               false   10000       1       1  avgt    5     0.295 ±   0.225  ms/op
[info] MergeQueueBench.insertRange                true  100000      10      10  avgt    5  1147.474 ± 285.513  ms/op
[info] MergeQueueBench.insertRange                true  100000      10       1  avgt    5  5183.929 ± 635.850  ms/op
[info] MergeQueueBench.insertRange                true  100000       1      10  avgt    5    34.518 ±  11.628  ms/op
[info] MergeQueueBench.insertRange                true  100000       1       1  avgt    5   430.916 ± 248.288  ms/op
[info] MergeQueueBench.insertRange                true   10000      10      10  avgt    5    15.346 ±   6.375  ms/op
[info] MergeQueueBench.insertRange                true   10000      10       1  avgt    5    54.815 ±  16.561  ms/op
[info] MergeQueueBench.insertRange                true   10000       1      10  avgt    5     1.113 ±   0.371  ms/op
[info] MergeQueueBench.insertRange                true   10000       1       1  avgt    5     7.010 ±   2.650  ms/op
[info] 
[info] Benchmark result is saved to /home/jmcclain/local/src/geotrellis/bench/target/jmh-results-20171002090202.json
[success] Total time: 407 s, completed Oct 2, 2017 9:08:49 AM

After

[info] Benchmark                             (shuffle)  (size)  (skip)  (span)  Mode  Cnt   Score    Error  Units
[info] MergeQueueBench.insertAlternateRange      false  100000      10      10  avgt    5   3.611 ±  1.488  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000      10       1  avgt    5   4.150 ±  1.796  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000       1      10  avgt    5   3.320 ±  1.157  ms/op
[info] MergeQueueBench.insertAlternateRange      false  100000       1       1  avgt    5   3.756 ±  1.274  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000      10      10  avgt    5   0.344 ±  0.159  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000      10       1  avgt    5   0.549 ±  0.282  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000       1      10  avgt    5   0.386 ±  0.069  ms/op
[info] MergeQueueBench.insertAlternateRange      false   10000       1       1  avgt    5   0.416 ±  0.178  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000      10      10  avgt    5  45.176 ± 20.956  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000      10       1  avgt    5  38.426 ± 28.030  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000       1      10  avgt    5  40.446 ± 16.111  ms/op
[info] MergeQueueBench.insertAlternateRange       true  100000       1       1  avgt    5  45.548 ± 23.045  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000      10      10  avgt    5   3.199 ±  1.373  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000      10       1  avgt    5   3.352 ±  1.083  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000       1      10  avgt    5   3.331 ±  1.946  ms/op
[info] MergeQueueBench.insertAlternateRange       true   10000       1       1  avgt    5   3.181 ±  0.824  ms/op
[info] MergeQueueBench.insertRange               false  100000      10      10  avgt    5  23.150 ± 15.494  ms/op
[info] MergeQueueBench.insertRange               false  100000      10       1  avgt    5  21.879 ±  9.104  ms/op
[info] MergeQueueBench.insertRange               false  100000       1      10  avgt    5  23.806 ± 14.646  ms/op
[info] MergeQueueBench.insertRange               false  100000       1       1  avgt    5  26.575 ± 11.492  ms/op
[info] MergeQueueBench.insertRange               false   10000      10      10  avgt    5   1.633 ±  0.932  ms/op
[info] MergeQueueBench.insertRange               false   10000      10       1  avgt    5   1.450 ±  0.891  ms/op
[info] MergeQueueBench.insertRange               false   10000       1      10  avgt    5   1.623 ±  0.970  ms/op
[info] MergeQueueBench.insertRange               false   10000       1       1  avgt    5   1.478 ±  1.241  ms/op
[info] MergeQueueBench.insertRange                true  100000      10      10  avgt    5  39.229 ± 17.330  ms/op
[info] MergeQueueBench.insertRange                true  100000      10       1  avgt    5  33.831 ±  9.979  ms/op
[info] MergeQueueBench.insertRange                true  100000       1      10  avgt    5  36.971 ± 24.518  ms/op
[info] MergeQueueBench.insertRange                true  100000       1       1  avgt    5  38.254 ± 23.609  ms/op
[info] MergeQueueBench.insertRange                true   10000      10      10  avgt    5   2.641 ±  1.996  ms/op
[info] MergeQueueBench.insertRange                true   10000      10       1  avgt    5   2.612 ±  1.850  ms/op
[info] MergeQueueBench.insertRange                true   10000       1      10  avgt    5   2.491 ±  0.724  ms/op
[info] MergeQueueBench.insertRange                true   10000       1       1  avgt    5   2.609 ±  1.600  ms/op
[info] 
[info] Benchmark result is saved to /home/jmcclain/local/src/geotrellis/bench/target/jmh-results-20171002090950.json
[success] Total time: 358 s, completed Oct 2, 2017 9:15:48 AM


class MergeQueue(initialSize: Int = 1) {
private val treeSet = new java.util.TreeSet(new RangeComparator()) // Sorted data structure
private val fudge = 1 // Set this to one to preserve original behavior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? What is the original behavior vs the non-original behavior, and why did it change?

Consider a more clear name for the variable.

@jamesmcclain
Copy link
Member Author

Ah yes, I realized that I used the phrase "preservation of previous behavior" in two unrelated ways. The first refers to the fact that in the original code, not just overlapping intervals but intervals that have a gap between them of up to one unit are merged. The second usage refers to the fact that one can request the merged intervals more than once, so a the data structure needs to be preserved.

I will make an update in a few moments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants