
opt: reduce GC pause frequency for Conn.AsyncWrite #218

Merged (11 commits) on Jul 7, 2021
Conversation

panjf2000 (Owner)

Fixes #214



1. Are you opening this pull request for bug fixes, optimizations, or a new feature?

Optimizations

2. Please describe how these code changes achieve your intention.

Eliminate function literals and reuse structs with the help of sync.Pool
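
The pattern is roughly the one sketched below; asyncTask, taskPool, and the helper functions are illustrative names rather than the actual gnet identifiers, but they show how reusing a pooled task struct replaces a fresh closure allocation per AsyncWrite call.

```go
// A minimal sketch of the idea behind this PR, not the exact gnet code:
// instead of allocating a new function literal per AsyncWrite call, reuse a
// small task struct through a sync.Pool so the GC has fewer short-lived
// objects to track. Names such as asyncTask and taskPool are illustrative.
package main

import (
	"fmt"
	"sync"
)

// asyncTask replaces the per-call closure; it carries the data and a handle
// to the target connection (simplified to a file descriptor here).
type asyncTask struct {
	connFD int
	data   []byte
}

var taskPool = sync.Pool{
	New: func() interface{} { return new(asyncTask) },
}

func getTask(fd int, data []byte) *asyncTask {
	t := taskPool.Get().(*asyncTask)
	t.connFD, t.data = fd, data
	return t
}

func putTask(t *asyncTask) {
	t.connFD, t.data = 0, nil // clear references so the pool doesn't pin memory
	taskPool.Put(t)
}

func main() {
	t := getTask(7, []byte("hello"))
	fmt.Printf("run task for fd %d, %d bytes\n", t.connFD, len(t.data))
	putTask(t) // return the struct for reuse instead of leaving it to the GC
}
```

Clearing the fields before putting the struct back matters: otherwise the pool would keep the written data alive and undercut the goal of reducing GC pressure.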

3. Please link to the relevant issues (if any).

#214

4. Which documentation changes (if any) need to be made/updated because of this PR?

None

5. Checklist

  • I have squashed all insignificant commits.
  • I have commented my code to explain package types, values, functions, and non-obvious lines.
  • I have written unit tests and verified that all tests pass (if needed).
  • I have documented the feature in the README (only when this PR adds a new feature).
  • (optional) I am willing to help maintain this change if there are issues with it later.

codecov bot commented Jul 5, 2021

Codecov Report

Merging #218 (de15d3a) into master (8aeb278) will decrease coverage by 0.07%.
The diff coverage is 75.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #218      +/-   ##
==========================================
- Coverage   85.41%   85.34%   -0.08%     
==========================================
  Files          18       18              
  Lines        1193     1187       -6     
==========================================
- Hits         1019     1013       -6     
  Misses        134      134              
  Partials       40       40              
Flag Coverage Δ
unittests 85.34% <75.00%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
connection_unix.go 73.04% <66.66%> (-0.24%) ⬇️
eventloop_unix.go 74.00% <66.66%> (+1.74%) ⬆️
acceptor_unix.go 40.90% <100.00%> (ø)
server_unix.go 91.66% <100.00%> (ø)
loop_linux.go 75.00% <0.00%> (-25.00%) ⬇️
reactor_linux.go 90.90% <0.00%> (-9.10%) ⬇️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b7ea839...de15d3a.

@lming left a comment

Thanks for making the change. I'll find time to run the benchmark, but I think it will resolve the GC issue I mentioned previously.

However, using two separate async task queues may need more consideration.

Review comments (resolved) on:
  • internal/netpoll/kqueue_events.go (outdated)
  • internal/netpoll/queue/queue.go
  • internal/netpoll/epoll.go
@lming commented Jul 6, 2021

Re: e6a7563

It's not ideal to handle the Close event specially. I think the core issue to be addressed is to bound the memory usage (or queue length) in some way. I'd suggest considering this solution (sketched in code below the list):

  • Expose an option allowing the client to set MaxPendingWriteTasksPerConn
  • When AsyncWrite is called, a per-connection counter of NumPendingWriteTasks is checked. If it has reached the limit, the write task is dropped and an error is returned to the caller.
  • When the write task is executed by the poller, the connection's NumPendingWriteTasks is decremented

The benefit:

  • The number of data write tasks per connection is bounded, thus the total number of pending async tasks is also bounded
  • We bypass the MaxPendingWriteTasksPerConn check for non-data-write tasks, so they won't be dropped
  • We still use one queue for all async tasks and thus the ordering is maintained.
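
A rough sketch of the counter-based cap proposed above, assuming a hypothetical maxPendingWriteTasks option and an atomic per-connection counter (neither exists in gnet as written):

```go
// Illustrative only: a per-connection cap on queued AsyncWrite tasks.
// The field and option names are hypothetical stand-ins for the proposal.
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var ErrTooManyPendingWrites = errors.New("too many pending async writes")

type conn struct {
	pendingWriteTasks    int32 // incremented on enqueue, decremented by the poller
	maxPendingWriteTasks int32 // the proposed MaxPendingWriteTasksPerConn option
}

// AsyncWrite drops the task and reports an error once the cap is reached.
func (c *conn) AsyncWrite(data []byte) error {
	if atomic.AddInt32(&c.pendingWriteTasks, 1) > c.maxPendingWriteTasks {
		atomic.AddInt32(&c.pendingWriteTasks, -1)
		return ErrTooManyPendingWrites
	}
	// ... enqueue the write task for the event loop here ...
	return nil
}

// executeWriteTask is what the poller would call; it releases one slot.
func (c *conn) executeWriteTask(data []byte) {
	// ... actually write data to the socket here ...
	atomic.AddInt32(&c.pendingWriteTasks, -1)
}

func main() {
	c := &conn{maxPendingWriteTasks: 2}
	for i := 0; i < 3; i++ {
		fmt.Println(c.AsyncWrite([]byte("x"))) // third call returns the error
	}
}
```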

Another robustness suggestion is to expose an option to limit the maximum write buffer size per connection, which prevents a slow connection from accumulating an unbounded send buffer.

@lming commented Jul 6, 2021

Further thought: I think it's important for gnet to cap per-connection resource usage as other systems do; for example, the Linux kernel has options for per-connection send/recv buffer sizes. In gnet's case, both the per-connection send buffer size and the number of pending async tasks should be capped.

My previous proposal has a flaw: AsyncWrite may succeed, but the buffer is later dropped because the send buffer is full. That can be addressed with the following improvement:

  • Expose per-connection options MaxPendingWriteTasksPerConn and MaxWriteBuffSizePerConn
  • When the client calls AsyncWrite:
    1. The conn's NumPendingTasks is checked; return an error if it has reached the limit
    2. The conn's write buffer size is checked (TotalWriteBuffSizeInPendingTasks + len(c.outboundBuffer) < MaxWriteBuffSizePerConn); return an error if the write buffer size has reached the limit

With this change, the caller is aware of the error immediately when either of the two limits is reached. I am not sure how difficult it would be to do the buffer size check within AsyncWrite in a lock-free fashion, though.
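
On the lock-free question, one possible shape is sketched below using atomics alone; pendingWriteBytes and maxWriteBufSize are hypothetical fields standing in for TotalWriteBuffSizeInPendingTasks and MaxWriteBuffSizePerConn, and the sketch ignores len(c.outboundBuffer) for simplicity.

```go
// Illustrative sketch of a lock-free size check inside AsyncWrite, per the
// refinement above. The fields shown here do not exist in gnet as-is.
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var ErrWriteBufFull = errors.New("per-connection write buffer limit reached")

type conn struct {
	pendingWriteBytes int64 // bytes queued in tasks but not yet flushed
	maxWriteBufSize   int64 // the proposed MaxWriteBuffSizePerConn option
}

func (c *conn) AsyncWrite(data []byte) error {
	n := int64(len(data))
	// Reserve the bytes optimistically; roll back if that exceeds the cap.
	if atomic.AddInt64(&c.pendingWriteBytes, n) > c.maxWriteBufSize {
		atomic.AddInt64(&c.pendingWriteBytes, -n)
		return ErrWriteBufFull
	}
	// ... enqueue the write task for the event loop here ...
	return nil
}

// flushed would be called by the event loop after n bytes leave the buffer.
func (c *conn) flushed(n int64) {
	atomic.AddInt64(&c.pendingWriteBytes, -n)
}

func main() {
	c := &conn{maxWriteBufSize: 8}
	fmt.Println(c.AsyncWrite(make([]byte, 6))) // <nil>
	fmt.Println(c.AsyncWrite(make([]byte, 6))) // limit error
}
```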

Apologies if this makes the PR even more complicated. I hope you'll see that I am trying to make constructive suggestions to make gnet more robust. Cheers.

@panjf2000 (Owner, Author)

Limiting the queue size and write buffer may also have merit here, but I still think it's reasonable to introduce a new task queue with higher priority: sometimes we want tasks inside event loops to run as soon as possible, without being delayed by AsyncWrite tasks, and there are already such use cases in gnet.

As for the special treatment of Conn.Close(): I don't think it will be a problem here. This API is public and may be used together with AsyncWrite, so all Conn.Close() tasks can go into the lower-priority task queue, just like AsyncWrite.
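
To make the intended ordering concrete, here is a minimal sketch of an event loop draining a high-priority queue before the lower-priority queue used by AsyncWrite and Close; the channel-based queues and names are illustrative only, not gnet's actual queue implementation.

```go
// Illustrative only: an event loop that empties its urgent task queue before
// touching the low-priority queue that AsyncWrite and Close would use.
package main

import "fmt"

type task func()

type eventLoop struct {
	urgentTasks chan task // internal/administrative tasks, run first
	asyncTasks  chan task // AsyncWrite / Close tasks, run afterwards
}

// runPendingTasks drains the urgent queue before running one async task per
// pass, so administrative work is never stuck behind a burst of AsyncWrite.
func (el *eventLoop) runPendingTasks() {
	for {
		select {
		case t := <-el.urgentTasks:
			t()
			continue
		default:
		}
		select {
		case t := <-el.asyncTasks:
			t()
		default:
			return
		}
	}
}

func main() {
	el := &eventLoop{
		urgentTasks: make(chan task, 8),
		asyncTasks:  make(chan task, 8),
	}
	el.asyncTasks <- func() { fmt.Println("async write") }
	el.urgentTasks <- func() { fmt.Println("urgent admin task") }
	el.runPendingTasks() // prints the urgent task first
}
```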

Apologies if this makes the PR even more complicated. I hope you'll see that I am trying to make constructive suggestions to make gnet more robust. Cheers.

No need to apologize for your thoughtful consideration; I actually appreciate it. I reckon that MaxPendingWriteTasksPerConn and MaxWriteBuffSizePerConn could be doable features for gnet, but I'm not sure we should do them in this PR. Let's take it one step at a time: stay focused on this GC pause frequency issue and run some tests to verify these code changes. I'm open to discussing more optimizations for gnet after this PR is merged, thanks~

@panjf2000 (Owner, Author)

Would you spare some time to run your tests on this PR?
@lming

@lming commented Jul 7, 2021

Agreed that a priority task queue is necessary for certain admin ops. Rather than naming them Trigger and TriggerLag, I would call them UrgentTrigger and Trigger.

It turns out that my test code relies on changes in my fork of gnet, so it will take some time to merge this branch into my fork to run the test. From the code review, I think the GC pause issue should be greatly alleviated. I'd suggest merging this PR when you feel comfortable, and I'll later merge it into my fork and run the test.

Successfully merging this pull request may close these issues:

  • Proposal: cap the memory usage and allocation caused by AsyncWrite