
runtime: >200ms GC latency in test programs (calling runtime.GC?) #15847

Closed
genez opened this issue May 26, 2016 · 42 comments

Comments

Projects
None yet
9 participants
@genez
Copy link

commented May 26, 2016

  1. What version of Go are you using (go version)?
    go version go1.6.2 windows/amd64
  2. What operating system and processor architecture are you using (go env)?
    set GOARCH=amd64
    set GOBIN=
    set GOEXE=.exe
    set GOHOSTARCH=amd64
    set GOHOSTOS=windows
    set GOOS=windows
    set GOPATH=C:\dev\Go
    set GORACE=
    set GOROOT=C:\Go
    set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
    set GO15VENDOREXPERIMENT=1
    set CC=gcc
    set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
    set CXX=g++
    set CGO_ENABLED=1
  3. What did you do?
    Hello, I am writing a POC application that should replace a currently running C application.
    The server part loads a SQLite .db3 file into memory (using maps) and then waits for text commands on a UDP socket.

The client reads a text file with commands, sends them to the server, and waits for a response.
The client measures the server's reply time.

My tests were run on the same machine (localhost) to avoid network delays.

While running the client, response time jumps from 3 milliseconds to more than 200 milliseconds,
and at the same moment the server prints this line (I am running it with SET GODEBUG=gctrace=1):

gc 14 @18.400s 2%: 0.50+380+1.0 ms clock, 2.0+294/380/266+4.0 ms cpu, 344->344->182 MB, 359 MB goal, 4 P

I think that 200ms STW is a bit too much, and I would like to know how to fix this behavior.
I have a Dropbox folder with the server and client sources, plus the corresponding SQLite DB and command text file:
(https://www.dropbox.com/sh/e2timsyjifvp61i/AAAmDoKSP7NANjDlsYcGcBvGa?dl=0)

The client must be executed with two parameters:

  • serverip:port (4420)
  • sendername (L1)

My concern is not throughput but latency:
our application drives an automated machine that must have the response back within 10ms.

@genez (Author) commented May 26, 2016

Maybe related to this issue: #9477

@genez (Author) commented May 26, 2016

Just changing the key of the maps from string to int drops the GC time to 5 milliseconds.

@RLH (Contributor) commented May 26, 2016

First, kudos for taking end-to-end latencies instead of just focusing on GC latencies. Did the end-to-end latency also drop when you changed the map keys? Could you send the complete before and after gctraces? How many goroutines does the application have, and what is the nature of their work?

Reading the gctrace line you sent: the 2% is the overall time spent in GC; the 0.50 and the 1.0 ms are the STW times; the 380 ms is the time spent doing the concurrent mark. During the concurrent mark we have seen the GC starve the mutators (your application) in certain situations, and we fixed several things in this area for 1.7. You might want to try tip to see whether this will be a problem going forward.

@genez (Author) commented May 26, 2016

Sorry, but the test with int keys was not valid:
the conversion to int was failing, so every item was being added at key 0.

I tried with [20]byte keys, but I have the same issue.

C:\dev\ETSGO\etsgo>etsgo
gc 1 @0.046s 2%: 1.1+1.8+0.75 ms clock, 2.3+0/1.8/0.80+1.5 ms cpu, 4->4->2 MB, 5 MB goal, 4 P
gc 2 @0.066s 4%: 0+3.0+1.2 ms clock, 0+1.0/1.0/1.0+4.8 ms cpu, 4->4->2 MB, 5 MB goal, 4 P
gc 3 @0.085s 5%: 0+13+1.0 ms clock, 0+3.0/2.0/1.0+4.0 ms cpu, 5->7->6 MB, 6 MB goal, 4 P
gc 4 @0.131s 4%: 0+3.0+0 ms clock, 0+0/0/3.0+0 ms cpu, 7->7->4 MB, 9 MB goal, 4 P
gc 5 @0.171s 3%: 0+7.0+0 ms clock, 0+4.0/3.0/9.0+0 ms cpu, 10->10->9 MB, 11 MB goal, 4 P
gc 6 @0.179s 5%: 0+7.0+1.0 ms clock, 0+3.0/7.0/0+4.3 ms cpu, 13->13->13 MB, 18 MB goal, 4 P
gc 7 @0.304s 5%: 0+19+0 ms clock, 0+19/9.0/10+0 ms cpu, 27->27->18 MB, 28 MB goal, 4 P
gc 8 @0.326s 6%: 0+11+0.50 ms clock, 0+7.5/11/9.0+2.0 ms cpu, 26->26->26 MB, 37 MB goal, 4 P
gc 9 @0.586s 5%: 0+28+0 ms clock, 0+28/15/14+0 ms cpu, 54->54->37 MB, 55 MB goal, 4 P
gc 10 @0.698s 6%: 0+66+1.0 ms clock, 0+0/57/9.0+4.0 ms cpu, 55->56->51 MB, 75 MB goal, 4 P
gc 11 @1.191s 5%: 0+65+1.0 ms clock, 0+65/34/31+4.0 ms cpu, 104->104->75 MB, 105 MB goal, 4 P
gc 12 @1.576s 4%: 0+111+1.0 ms clock, 0+0/4.5/109+4.0 ms cpu, 117->118->72 MB, 150 MB goal, 4 P
gc 13 @2.396s 4%: 0+124+0.50 ms clock, 0+124/57/66+2.0 ms cpu, 173->173->150 MB, 174 MB goal, 4 P
gc 14 @2.532s 6%: 0+178+0.54 ms clock, 0+62/178/0+2.1 ms cpu, 212->214->213 MB, 300 MB goal, 4 P
gc 15 @4.635s 6%: 0.49+316+1.0 ms clock, 1.9+316/172/143+4.0 ms cpu, 437->437->300 MB, 438 MB goal, 4 P
gc 16 @4.978s 8%: 0+553+0.47 ms clock, 0+115/553/0+1.9 ms cpu, 424->430->426 MB, 601 MB goal, 4 P
Ready. Loaded 1002662 serials
gc 17 @6.215s 14%: 0+0+303 ms clock, 0+115/553/0+1215 ms cpu, 452->452->283 MB, 452 MB goal, 4 P (forced)
GC took 304.8374ms
GC forced
gc 18 @126.525s 0%: 0+525+0.53 ms clock, 0+0/525/3.0+2.1 ms cpu, 283->283->283 MB, 566 MB goal, 4 P
scvg0: inuse: 335, idle: 227, sys: 563, released: 0, consumed: 563 (MB)

@genez (Author) commented May 26, 2016

I am going to try with Go tip tomorrow and let you know the results

@genez (Author) commented May 26, 2016

tested with Go tip

go version devel +2168f2a Thu May 26 16:27:24 2016 +0000 windows/amd64

same results (client side):
min: 0.451400 | avg: 0.873292 | max: 218.594800

I am attaching gctrace file

thanks
gctrace.txt

@ianlancetaylor ianlancetaylor added this to the Go1.7Maybe milestone May 26, 2016

@ianlancetaylor (Contributor) commented May 26, 2016

@RLH (Contributor) commented May 26, 2016

By client side I'm assuming we are talking about end to end, which is what matters. I'm also assuming the trace is from the server and the numbers are in ms.

From the gctrace I notice this line

gc 17 @5.964s 10%: 0+0+235 ms clock, 0+275/166/108+942 ms cpu, 456->456->283 MB, 456 MB goal, 4 P (forced)

indicating a 235ms STW pause. The "(forced)" annotation means that GC 17 is the result of a call to runtime.GC(), which does a STW collection instead of a concurrent GC. Removing the runtime.GC() call from the server code will eliminate that STW pause, which may be the problem. If the client pauses can be matched with the server GC pauses, that would help confirm it is a GC issue. It says "Ready" just before the call to runtime.GC(); what does that mean?

From the trace it also looks like the code spends 5 seconds populating the in-(go)heap database. This can be deduced by looking at the heap sizes, which reach 456MB at GC 17; the server then goes into a steady state where no GC activity happens between 5.96 seconds and 34.93 seconds.

Let us know if removing the runtime.GC() call helps and whether the client pauses are correlated with the server GC times. Finally, if it is easy to calculate 95, 99, and 99.9 percentiles on the client, that would also be interesting.

@rsc rsc changed the title High garbage collection latency (+200ms) with test client/server programs runtime: >200ms GC latency in test programs (calling runtime.GC?) May 27, 2016

@genez (Author) commented May 27, 2016

Let me clarify a bit more what's happening at the client side:

  • testclient is run with GOGC=off, just to make sure that the delay is not related to client-side GC
  • the server is started with GOGC=100 and GODEBUG=gctrace=1
  • when the server starts up, it reads 1 million "objects" from a SQLite db into 5 distinct maps
    (map[[20]byte]FullCode)
    One map is "global" with all codes; the other maps are subsets (distinct by Type)

The objects are structured exactly like this:

type FullCode struct {
	Type     int
	Code     string
	Status   int
	Children map[[20]byte]FullCode
}
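For context on why the key type matters in this thread: a map whose key and value types contain no pointers can be marked very cheaply, while string keys, or a value struct like FullCode that embeds a string and a nested map, force the GC to scan every bucket. A hedged sketch of the distinction (the type names and data are illustrative):

```go
package main

import "fmt"

// pointerFree can be marked quickly: neither the [20]byte key nor the
// int64 value contains a pointer, so the GC need not scan its buckets.
var pointerFree = map[[20]byte]int64{}

// FullCode mirrors the struct in this issue: the string and the nested
// map both contain pointers, so any map holding it must be scanned.
type FullCode struct {
	Type     int
	Code     string
	Status   int
	Children map[[20]byte]FullCode
}

var mustScan = map[string]FullCode{}

func main() {
	var key [20]byte
	copy(key[:], "serial-0001")
	pointerFree[key] = 1
	mustScan["serial-0001"] = FullCode{Type: 1, Code: "serial-0001"}
	fmt.Println(len(pointerFree), len(mustScan))
}
```

This is why switching the keys from string to [20]byte alone did not help here: the FullCode values still contain pointers, so the million-entry maps still have to be scanned.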

After startup, my server performs a manual GC:

func timeGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

At this stage, the testclient program starts and begins sending UDP requests to the server.
After a few seconds I can see that the client detects RTT > 200ms, and exactly at the same time the server prints out a GC trace line.

The trace line I am talking about is this

gc 18 @126.525s 0%: 0+525+0.53 ms clock, 0+0/525/3.0+2.1 ms cpu, 283->283->283 MB, 566 MB goal, 4 P

I can record a video and upload it to YouTube to better explain the situation, if that's OK with you.

@aclements (Member) commented May 27, 2016

This is almost certainly #10345 (which is a generalization of #9477), given the map with 1 million objects. @genez, would you be able to test a small patch to the runtime? There's an easy way to confirm or deny the connection with #10345 if we can tweak the runtime.

@dlsniper (Contributor) commented May 27, 2016

Hi @RLH and @aclements, I tested this out for a bit yesterday as it came up on Slack.
The issue doesn't happen with 200k or 300k items in the maps. As soon as the count gets bigger, there will be a pause at some point during the test, when the GC runs automatically.
With a lower item count, the pause varies from 1-2 ms to 25 ms, which is within acceptable limits.
To me it looks like #9477.

Since the maps in this case are allocated once and the elements are never deleted, could we get a way to track whether a map changed between GC runs, so the GC doesn't need to do anything for it? Something like: the GC computes a hash of the map during the concurrent phase and compares it with the hash from the previous run; if they are the same, it skips that map.

@genez (Author) commented May 27, 2016

@aclements yes I can test the patch, just let me know how to apply it

Thanks

@aclements (Member) commented May 27, 2016

@dlsniper, just to be clear on the terminology, this issue isn't about the STW phase, it's about mutator availability. According to the gctrace, the STW phases are all sub-millisecond. If this is #9477, the problem is that the process of scanning a single object, even during concurrent marking, is currently non-interruptible on the particular thread it's running on. Other threads will continue to make progress, though they might also get caught up scanning large single objects. Of course, this is still a GC-caused latency problem; it's just not a problem in the STW phase. :)

It's not safe to exempt the map from GC. You would also have to exempt everything reachable from that map.

@genez, great. I'll cook up a patch.

@RLH (Contributor) commented May 27, 2016

The gctrace line

gc 18 @126.525s 0%: 0+525+0.53 ms clock, 0+0/525/3.0+2.1 ms cpu, 283->283->283 MB, 566 MB goal, 4 P

says that the concurrent mark phase takes 525 ms wall clock plus a 0.53 ms STW mark termination, and uses 0/525/3.0 + 2.1 ms of CPU time. Since we have 4 Ps (CPUs), it looks like the concurrent mark phase is using exactly 1/4 of the CPUs, and the STW mark termination phase is using all of the CPUs, but for only 0.53 ms wall clock, or 0.53*4 (≈2.1) ms of CPU time. This would seem to leave the expected 75% of the total CPU for the mutator during the concurrent mark phase. All this matches what we expect to see.
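That arithmetic can be checked directly from the gctrace fields. A small sketch with the assist/background/idle CPU numbers copied from the gc 18 line above (the constant names are mine, not runtime terminology):

```go
package main

import "fmt"

func main() {
	const (
		wallMarkMS = 525.0 // concurrent mark wall-clock time (gc 18)
		assistMS   = 0.0   // mutator-assist CPU time
		backgrMS   = 525.0 // dedicated/fractional background worker CPU time
		idleMS     = 3.0   // idle-priority worker CPU time
		numP       = 4.0   // GOMAXPROCS, the "4 P" in the trace
	)
	gcCPU := assistMS + backgrMS + idleMS
	totalCPU := wallMarkMS * numP // CPU time available during the mark phase
	share := gcCPU / totalCPU
	fmt.Printf("GC used %.1f%% of CPU during mark; mutator kept %.1f%%\n",
		share*100, (1-share)*100)
}
```

It prints a GC share of about 25%, i.e. one of the four Ps, which is exactly the dedicated-worker budget and the basis for the "75% left for the mutator" expectation.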

How many requests are in flight when the GC is running, and are all of them delayed? How many are in flight when the GC is not running? Are requests handled FIFO in a way that might produce a convoy of requests? Even if a request gets backed up behind a long mark involving a map, other goroutines should make progress, since there is 75% CPU availability. If other goroutines/requests are making progress, then perhaps this isn't a global mutator-availability problem but rather the GC, the scheduler, and probability conspiring to starve one of the goroutines.

@aclements (Member) commented May 27, 2016

@genez, the test patch is https://go-review.googlesource.com/c/23512/. To pull it in to your local Go clone (which I assume you have, since you reported the issue still happens on tip), run git fetch https://go.googlesource.com/go refs/changes/12/23512/1 && git checkout FETCH_HEAD. Run it with GODEBUG=gctrace=1 and it should print "long scanobject" messages with some useful details when things go wrong.

@genez (Author) commented May 27, 2016

I applied the patch and recompiled Go tip

Response time measured from the client application (after 1 million requests) has changed dramatically:
min: 0.918100 | avg: 1.007302 | max: 73.195100

The average and minimum are now a bit higher than with the previous runtime, but the max is down to roughly a third of what it was:
a big improvement!

please check attached execution log with gctrace enabled
gctrace_with_patch.txt

Regarding the questions from @RLH:

The logic is strictly sequential, request-response; there is no parallelism at the moment.
The server application does handle each incoming request in a goroutine, but there is only one active request at a time if you start only one client.

I hope this answers all of your questions.

Please remember that in my Dropbox shared folder you can find the server and client source code plus the resource files needed to run the applications.
Maybe they can be of some value to you.

@aclements (Member) commented May 28, 2016

@genez, I have another patch for you to try if you don't mind: https://go-review.googlesource.com/23540 (It's probably easier for you to try, since I don't have your code set up; if this doesn't help I'll look into locally reproducing this with the code you posted.) Like before, use git fetch https://go.googlesource.com/go refs/changes/40/23540/2 && git checkout FETCH_HEAD to pull it into your local Go clone.

@genez (Author) commented May 28, 2016

Just tried the latest patch: "HEAD is now at 6632631... runtime: bound scanobject to ~100 µs"

Client-side max jumped after a few seconds from 2ms to 52ms
(two subsequent lines of the client log):
min: 0.464200 | avg: 0.867400 | max: 2.005700
min: 0.464200 | avg: 0.870495 | max: 52.144300

server printed this line:
gc 21 @28.678s 1%: 1.0+108+0 ms clock, 4.0+0/53/54+0 ms cpu, 344->344->187 MB, 374 MB goal, 4 P

You can find attached gctrace file
gctrace_with_patch_2.txt

Regarding my source: I was only suggesting that you look at the code. I don't mind testing your patches, as this is a very important task for me.

Thank you

@genez (Author) commented May 28, 2016

Sorry guys,
I'm afraid I was testing against a modified server source with a simplified FullCode struct:
it didn't contain one member, the Code string, which had been commented out for one of my tests.

I have just re-run the test with patch 2 against the correct server source (almost identical to the original, just with the manual GC collection and a loaded-items count).

The client still shows >200ms latency:
min: 0.463700 | avg: 0.879956 | max: 225.606500

server shows
gc 18 @38.937s 1%: 1.0+447+2.0 ms clock, 4.0+0/232/221+8.0 ms cpu, 494->496->284 MB, 566 MB goal, 4 P

I am terribly sorry for the mistake.
I am going to make a copy of the sources just to be sure not to make the same error again.

This shows a big difference in timing depending on whether my struct includes a string member or not.
Maybe this can help with your investigation.

please check updated gctrace log
gctrace_with_patch_2_correct_source.txt

@genez (Author) commented May 28, 2016

I added a short video demonstrating the issue to my Dropbox shared folder that contains the sources and resource files:

https://www.dropbox.com/sh/e2timsyjifvp61i/AAAmDoKSP7NANjDlsYcGcBvGa?dl=0

(the server is slowed down by the screen-capture software)
Maybe it can be of some help.

@aclements (Member) commented May 31, 2016

@genez, thanks for trying the second patch. Could you try another run with GOMAXPROCS=3? You might also need patch 2.

It's possible this is a scheduler issue where the one P that's running your one goroutine becomes a dedicated mark worker when GC starts and your goroutine winds up stuck on the run queue for the whole GC cycle. GOMAXPROCS=3 will eliminate that dedicated mark worker.

@genez (Author) commented Jun 1, 2016

Thank you @aclements
Just tested with GOMAXPROCS=3

Client side: no effect
I hope the gctrace file will show you something useful
gctrace_GOMAXPROC_3.txt

@aclements (Member) commented Jun 1, 2016

Interesting. So, I think I've reproduced the problem in a small(ish) test case. The trace of that test case does confirm my hypothesis that the garbage collector is taking over the one P that's running user code and for some reason the user goroutine isn't getting moved or stolen. I thought GOMAXPROCS=3 would mitigate this, but it turns out it just changes its nature: the fractional worker only gets 75% of a P, but that's still enough to starve the user goroutine on that P for long stretches. The trace also confirms another thing the gctraces are showing: we're often not spinning up idle mark workers on the idle Ps.

@genez (Author) commented Jun 1, 2016

OK, please let me know if I can help you in any way.

@gianpaolog commented Jun 3, 2016

@aclements I am very interested in the solution of this issue; do you think it could be solved in 1.7?

@aclements (Member) commented Jun 7, 2016

@genez, what happens if you run with GOMAXPROCS=1?

@gianpaolog, unfortunately the issue involved is quite subtle and despite understanding it (I think), I'm not actually sure how to fix it, which suggests any fix would be too risky for this point in the 1.7 development cycle. However, from what I understand, it affects applications with very few goroutines the most (e.g., one active goroutine at a time) and I believe running with GOMAXPROCS=1 should work around the issue without significantly affecting performance (assuming the application doesn't go through phases where there are many active goroutines).

@genez (Author) commented Jun 7, 2016

@aclements: I have just fetched gotip sources from scratch and re-applied your two patches as per your instructions. Running the server with GOMAXPROCS=1 dramatically improves max latency, which now tops out at 43ms after 3 million requests (with a mean value of 0.63ms). Unfortunately that's still not enough for my requirement of 10ms max latency.

Regarding your assumptions, you're right: I will eventually end up with more or less 10 active clients in a real-world scenario, so we are talking about a very limited number of goroutines. Also, throughput is not my main concern (but latency is).

@gianpaolog commented Jun 8, 2016

Waiting for a 1.7.1 could work for me, but I would like to understand whether you think it is feasible to fix the bug and provide a latency of a few ms (the expectation from the talks is actually around 3-5ms, in a scenario with less than 2GB of memory managed by the GC).

@aclements (Member) commented Jun 8, 2016

@genez, thanks for trying that out. In general, it's more useful to know high percentile latency rather than absolute max latency. Unless you're using a system that's hard real time from top to bottom, including the OS kernel, there are a lot of things that can cause occasional high latency. Do you know what your, say, 99%ile or 99.9%ile latency is?

@gianpaolog, based on @genez's report, it sounds like setting GOMAXPROCS=1 is a viable workaround for this issue. We generally only do point releases for crash bugs or serious performance issues that can't be worked around, so this wouldn't qualify.

This particular issue is at an unfortunate crossroads: our sub-10ms design goal has been for global STW time; so far we've focused much less on the latency impact on individual goroutines where other goroutines can continue to execute in parallel. However, if an application only has one goroutine, then impacting a single goroutine's latency might as well be a global stop, which is what we're seeing in this issue. With 1.7, we've focused on rooting out the last sources of STW latency, so now that that should no longer be a primary source of latency, we're going to turn our attention to other sources of latency for the 1.8 cycle, including scheduler-induced latency like in this issue.

@aclements aclements modified the milestones: Go1.8Early, Go1.7Maybe Jun 8, 2016

@genez (Author) commented Jun 14, 2016

@aclements this is only a test; the real software is going to run in a soft-realtime environment (including a PREEMPT_RT-patched kernel, no paging, no disk access, and other well-known best practices for soft-realtime systems).

Anyway, I would not focus on this, as I implemented the very same server in other languages (including C/C++, Python, and Java) and then tested them under the same conditions (with the same Go client).
For example, Java scored a max latency of 37ms after 3 million requests (but with a considerably higher mean value of 1.50ms).
Python does a better job, with a 17ms max latency.
From the beginning I was expecting Go to outperform the other implementations and score something near the C/C++ implementation (5ms max latency).

I am now going to improve the test client to plot percentiles and will let you know the results

@gianpaolog commented Jun 21, 2016

@aclements in my tests, I am experiencing latency peaks (about 200ms), on average in 1 sample out of 10,000. The peaks totally disappear if I use GOGC=off.
Since this service works on an industrial production line, the line stops every few minutes due to this latency. Any chance of having this issue fixed in 1.8?

@dr2chase (Contributor) commented Jun 21, 2016

FYI, Austin is herding cats this week at a conference, so he may not get back to you immediately.

Do you understand the distinction Austin is making between GC stop-the-world latency, and the possibility of a concurrent GC "accidentally" stealing too much of a worker goroutine's forward progress and thus causing whole-system latency?

@gianpaolog commented Jun 21, 2016

Yes: it's not an STW issue, but a concurrency issue that, as Austin said, could be solved anyhow.

@gianpaolog commented Jun 21, 2016

At the end of the day, users are experiencing high latency peaks due to GC behaviour, and this is a big limitation for a number of applications. In my case, I would not be able to replace the existing C service with Go.

@RLH (Contributor) commented Jun 21, 2016

This is certainly on the list for 1.8. To get us to focus on it early and often, all we need is a simple example that we can run locally.

@gianpaolog commented Jun 21, 2016

@RLH here you are:
https://www.dropbox.com/sh/693jdhz2uef061y/AAAwV3hTVFvYfkk1Im5pvnb_a?dl=0

Instructions:
Once you have unzipped L1.7z (it contains a text file with sample data):

  1. run ets3 (the server listening on UDP port 4420)
  2. run "./client 127.0.0.1:4420 L1"

The L1 file contains about 1 million fake messages to be sent to the server. The average answer takes around 0.3ms, but every 8-10k samples you'll see a major latency spike.

@RLH (Contributor) commented Jun 22, 2016

Thanks, I was able to reproduce the issue locally.

@gianpaolog commented Jul 5, 2016

Hi @RLH,
just to let you know that we repeated the test on the same machine with .NET Core.
We had exactly 2 peaks of around 12ms at the beginning of the test (loading something?), while the rest of the million samples were stable around 0.3ms.
Please don't tell me I must use the old C#... ;-)

@genez (Author) commented Sep 15, 2016

Just an update: I am following issue #16293, and I was thinking that the change made to the GC could help this issue as well.
I am now re-running my tests with up-to-date master Go sources, and the absolute maximum latency has dropped from 200ms to 21ms.
In 1 million request/reply cycles I only had 5 occurrences over 20ms and 21 occurrences between 10ms and 20ms.
This is a great improvement!

@aclements (Member) commented Sep 15, 2016

@genez, that's good to hear! Note that the fix for issue #16293 is the same as the patch you tried earlier (CL 23540) and reported didn't fix the issue. However, it's possible that it helped in combination with some other fixes on master.

I'm still planning to fix the problem with idle mark workers running when there is actual work to do, which was one of the root causes we worked out in this issue.

@quentinmit (Contributor) commented Sep 28, 2016

@aclements: Should this issue be retitled to be specifically about idle mark workers and moved out of Go1.8Early?

@aclements (Member) commented Sep 29, 2016

Closing, since @genez's original application seems to be doing much better. We'll track the remaining issue with idle mark workers running when there's real work to do at #16528.

@aclements aclements closed this Sep 29, 2016

@golang golang locked and limited conversation to collaborators Sep 29, 2017
