Concurrent sweep #1681

Merged
schani merged 50 commits into mono:master on Apr 9, 2015

Contributor

@schani schani commented Apr 2, 2015

The purpose of this is to reduce pause times of major collections by making sweep completely concurrent.

The first phase of sweeping iterates through the block list, frees blocks without live objects, and designates the others for lazy sweeping. Previously this phase ran while the world was stopped; this branch makes it concurrent. The changes also introduce a very simple thread pool abstraction (currently supporting only a single thread) that unifies concurrent marking, the jobs run when scanning roots, and concurrent sweeping.
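As a rough illustration of that first phase, here is a minimal C sketch. It is not SGen's actual code: the `block` struct, the `block_state` enum, and `sweep_first_phase` are hypothetical names made up for this example, and the real collector tracks liveness and block state differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block states; these are not SGen's identifiers. */
typedef enum {
    BLOCK_IN_USE,
    BLOCK_FREE,
    BLOCK_NEEDS_LAZY_SWEEP
} block_state;

/* Hypothetical block descriptor used only for this sketch. */
typedef struct {
    size_t live_objects;  /* objects found live by the last mark phase */
    block_state state;
} block;

/* First sweep phase: free blocks that hold no live objects and
   designate every other block for lazy sweeping.
   Returns the number of blocks that were freed. */
size_t sweep_first_phase(block *blocks, size_t count)
{
    size_t freed = 0;
    assert(blocks != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (blocks[i].live_objects == 0) {
            blocks[i].state = BLOCK_FREE;
            freed++;
        } else {
            blocks[i].state = BLOCK_NEEDS_LAZY_SWEEP;
        }
    }
    return freed;
}
```

The sketch leaves out all synchronization. Per the description above, the point of the branch is that this pass runs on a worker thread rather than inside the stop-the-world pause.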

These are benchmarking results on Linux/AMD64:

(image: concurrent-sweep)

"default-sgen" is master. "sgen-concurrent-sweep" is this branch with concurrent sweep enabled, "sgen-no-concurrent-sweep" is this branch with concurrent sweep disabled.

I don't know why binarytree is slower on this branch with concurrent sweep disabled. I ran it on my OSX machine and got these results:

master

       25.07 real        23.57 user         1.39 sys
       25.01 real        23.55 user         1.40 sys
       25.17 real        23.72 user         1.39 sys
       25.24 real        23.77 user         1.39 sys
       25.23 real        23.68 user         1.43 sys

       avg: 25.144

no-concurrent-sweep

       24.63 real        22.99 user         1.49 sys
       24.46 real        22.91 user         1.49 sys
       24.43 real        22.94 user         1.43 sys
       24.60 real        23.02 user         1.50 sys
       24.34 real        22.83 user         1.44 sys

       avg: 24.492

concurrent-sweep

       24.16 real        23.25 user         1.46 sys
       24.26 real        23.28 user         1.51 sys
       24.13 real        23.19 user         1.49 sys
       24.06 real        23.09 user         1.53 sys
       24.16 real        23.28 user         1.44 sys

       avg: 24.154
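For anyone who wants to recompute the averages from the "real" column, here is a small C sketch. The function names `mean_seconds` and `relative_reduction` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of a set of wall-clock samples, in seconds. */
double mean_seconds(const double *samples, size_t count)
{
    double sum = 0.0;
    assert(count > 0);
    for (size_t i = 0; i < count; i++)
        sum += samples[i];
    return sum / (double)count;
}

/* Fraction by which `candidate` is lower than `baseline`. */
double relative_reduction(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return (baseline - candidate) / baseline;
}
```

Fed the five "real" values for master and for concurrent-sweep, this gives means of 25.144 and 24.154 seconds, a reduction of roughly 3.9% in wall-clock time.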

Here are some pause time graphs. This branch on the left (with concurrent sweep enabled), master on the right.

graph4:

(image: pausetimes-conc-sweep-graph4)

health:

(image: pausetimes-conc-sweep-health)

binarytree:

(image: pausetimes-conc-sweep-binarytrees)

schani added 30 commits April 2, 2015 16:41
Both the concurrent sweep thread as well as the nursery collector will
need access to the block array.  Until we've made that lock-free,
we're simply using a lock.

The nursery collector requires that sweeping has finished, and instead
of waiting it will cooperate with the sweep thread to finish more
quickly.  The sweep thread will traverse the block array from high
indexes to low ones while the nursery collector will go from low to
high.  They will contend only very briefly when they meet somewhere in
the middle.
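The two-ended traversal described in that commit message can be sketched in C roughly as follows. This is an illustration under assumptions, not the branch's implementation: `claimed`, `try_claim`, `sweep_downward`, and `sweep_upward` are hypothetical names, and a per-block atomic flag is only one possible way to make the two workers contend solely where they meet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BLOCKS 8

/* One flag per block, set once a worker has claimed it for sweeping.
   Static storage means every flag starts out false. */
static atomic_bool claimed[NUM_BLOCKS];

/* Try to claim a block; for a given index only one caller succeeds. */
bool try_claim(size_t index)
{
    bool expected = false;
    assert(index < NUM_BLOCKS);
    return atomic_compare_exchange_strong(&claimed[index], &expected, true);
}

/* Sweep thread: claim up to `budget` blocks, walking from high indexes
   to low ones. `*cursor` starts at NUM_BLOCKS. */
size_t sweep_downward(size_t *cursor, size_t budget)
{
    size_t swept = 0;
    while (swept < budget && *cursor > 0) {
        size_t index = --(*cursor);
        if (!try_claim(index)) {
            /* Met the other worker: all lower blocks are already claimed. */
            *cursor = 0;
            break;
        }
        swept++;  /* the actual sweeping of block `index` would go here */
    }
    return swept;
}

/* Nursery collector: claim up to `budget` blocks, walking from low
   indexes to high ones. `*cursor` starts at 0. */
size_t sweep_upward(size_t *cursor, size_t budget)
{
    size_t swept = 0;
    while (swept < budget && *cursor < NUM_BLOCKS) {
        size_t index = (*cursor)++;
        if (!try_claim(index)) {
            /* Met the other worker: all higher blocks are already claimed. */
            *cursor = NUM_BLOCKS;
            break;
        }
        swept++;  /* the actual sweeping of block `index` would go here */
    }
    return swept;
}
```

Because each worker claims blocks in strictly monotonic index order, the first failed claim means the two ranges have met and every block has an owner. The only contended flag is the one where they meet.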

New blocks need to be swept if they're allocated during a non-concurrent major
collection or while a concurrent major collection is running.

It's updated from nursery collections and from the sweep thread concurrently.

Don't use the difference to the last collection, but just calculate the maximum heap
size and trigger a collection when it's reached.

And since we always wait for the sweep now we can do iterations over
the blocks without taking the lock.
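
One of the commit messages above switches the major collection trigger from a difference since the last collection to an absolute maximum heap size. A minimal C sketch of that kind of trigger follows. The growth policy is an assumption for illustration only (the commit message does not say how the maximum is calculated), and `compute_max_heap_size` and `should_trigger_major` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy: let the heap grow to `growth_percent` percent of
   the size that was live after the previous major collection.
   This is an assumption for the sketch, not SGen's actual tuning. */
size_t compute_max_heap_size(size_t live_after_last_major,
                             unsigned growth_percent)
{
    assert(growth_percent >= 100);
    /* Split the multiplication to reduce the risk of overflow. */
    return live_after_last_major / 100 * growth_percent
         + live_after_last_major % 100 * growth_percent / 100;
}

/* Trigger a major collection once the heap reaches the precomputed
   maximum, with no bookkeeping of the growth since the last collection. */
bool should_trigger_major(size_t current_heap_size, size_t max_heap_size)
{
    return current_heap_size >= max_heap_size;
}
```

The maximum is computed once and the check itself is a single comparison against the current heap size.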
@akoeplinger
Member

@schani Please note that there's a crash during the System.dll test suite on Jenkins: "Can't iterate blocks while the world is running or sweep is in progress."

@schani
Contributor Author

schani commented Apr 3, 2015

@akoeplinger Fixed.

@evincarofautumn
Contributor

I didn’t observe an appreciable difference in performance between master and this branch on binarytree, on either Linux or OS X. Here is a cursory run of perf on Linux anyway:

master:

 Performance counter stats for './master/bin/mono-sgen /home/jon/benchmarker/tests/shootout/binarytree.exe 19':

      29506.661239 task-clock (msec)         #    0.999 CPUs utilized
             8,450 context-switches          #    0.286 K/sec
                11 cpu-migrations            #    0.000 K/sec
           459,294 page-faults               #    0.016 M/sec
   115,067,872,588 cycles                    #    3.900 GHz                     [100.00%]
   213,770,451,534 instructions              #    1.86  insns per cycle         [100.00%]
    27,582,284,387 branches                  #  934.782 M/sec                   [100.00%]
       116,143,327 branch-misses             #    0.42% of all branches         [100.00%]

      29.523146559 seconds time elapsed

concurrent-sweep:

 Performance counter stats for './concurrent-sweep/bin/mono-sgen /home/jon/benchmarker/tests/shootout/binarytree.exe 19':

      30812.231183 task-clock (msec)         #    1.039 CPUs utilized
             9,469 context-switches          #    0.307 K/sec
                26 cpu-migrations            #    0.001 K/sec
           282,797 page-faults               #    0.009 M/sec
   119,664,947,463 cycles                    #    3.884 GHz                     [100.00%]
   214,789,375,480 instructions              #    1.79  insns per cycle         [100.00%]
    28,030,875,077 branches                  #  909.732 M/sec                   [100.00%]
       232,322,906 branch-misses             #    0.83% of all branches         [100.00%]

      29.665200704 seconds time elapsed

concurrent-sweep has more context-switches and cpu-migrations (naturally), as well as more branch misses. I was somewhat surprised to see it also had fewer page-faults.

@akoeplinger
Member

@schani Looks like it still fails with the same assert during System.GC.GetTotalMemory (at least on i386; the amd64 build failed due to an already started xvfb...).

@kumpera
Contributor

kumpera commented Apr 4, 2015

@evincarofautumn A more useful metric for this would be total pause time rather than wall-clock time.

@schani
Contributor Author

schani commented Apr 7, 2015

@akoeplinger It seems that wasn't the latest commit. It works now.

@kumpera On pause times see my charts above.

@akoeplinger
Member

@schani I pulled and built this PR to make sure it's not a Jenkins error, and I see crashes locally during the System test suite too (Ubuntu 14.04/amd64).

@schani
Contributor Author

schani commented Apr 7, 2015

@akoeplinger Could you post a log of that? It works for me on Ubuntu 14.10/amd64.

@akoeplinger
Member

@schani did a few test runs: http://pastebin.com/Dtyh790k

@schani
Contributor Author

schani commented Apr 8, 2015

@akoeplinger I can't reproduce this on either Linux or OSX. Can you confirm that this does not occur for you with master? Are you using any specific options? Could you post a full log?

@akoeplinger
Member

@schani you're right, I just saw it happening on master on Jenkins as well and on my copy. Looks like I somehow didn't properly clean up my working dir when I tested on master, sorry!

@schani schani merged commit 1a43231 into mono:master Apr 9, 2015