
opt: reduce GC pause frequency for Conn.AsyncWrite #218

Merged (11 commits) on Jul 7, 2021
Conversation

panjf2000 (Owner)

Fixes #214



1. Are you opening this pull request for bug fixes, optimizations, or a new feature?

Optimizations

2. Please describe how these code changes achieve your intention.

Eliminate function literals and reuse structs with the help of sync.Pool
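
The pattern is roughly the one sketched below; asyncTask, taskPool, and the helper functions are illustrative names rather than the actual gnet identifiers, but they show how reusing a pooled task struct replaces a fresh closure allocation per AsyncWrite call.

```go
// A minimal sketch of the idea behind this PR, not the exact gnet code:
// instead of allocating a new function literal per AsyncWrite call, reuse a
// small task struct through a sync.Pool so the GC has fewer short-lived
// objects to track. Names such as asyncTask and taskPool are illustrative.
package main

import (
	"fmt"
	"sync"
)

// asyncTask replaces the per-call closure; it carries the data and a handle
// to the target connection (simplified to a file descriptor here).
type asyncTask struct {
	connFD int
	data   []byte
}

var taskPool = sync.Pool{
	New: func() interface{} { return new(asyncTask) },
}

func getTask(fd int, data []byte) *asyncTask {
	t := taskPool.Get().(*asyncTask)
	t.connFD, t.data = fd, data
	return t
}

func putTask(t *asyncTask) {
	t.connFD, t.data = 0, nil // clear references so the pool doesn't pin memory
	taskPool.Put(t)
}

func main() {
	t := getTask(7, []byte("hello"))
	fmt.Printf("run task for fd %d, %d bytes\n", t.connFD, len(t.data))
	putTask(t) // return the struct for reuse instead of leaving it to the GC
}
```

Clearing the fields before putting the struct back matters: otherwise the pool would keep the written data alive and undercut the goal of reducing GC pressure.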

3. Please link to the relevant issues (if any).

#214

4. Which documentation changes (if any) need to be made/updated because of this PR?

None

5. Checklist

  • I have squashed all insignificant commits.
  • I have commented my code to explain package types, values, functions, and non-obvious lines.
  • I have written unit tests and verified that all tests pass (if needed).
  • I have documented the feature in the README (only when this PR adds a new feature).
  • (optional) I am willing to help maintain this change if there are issues with it later.

codecov bot commented Jul 5, 2021

Codecov Report

Merging #218 (de15d3a) into master (8aeb278) will decrease coverage by 0.07%.
The diff coverage is 75.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #218      +/-   ##
==========================================
- Coverage   85.41%   85.34%   -0.08%     
==========================================
  Files          18       18              
  Lines        1193     1187       -6     
==========================================
- Hits         1019     1013       -6     
  Misses        134      134              
  Partials       40       40              
Flag Coverage Δ
unittests 85.34% <75.00%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
connection_unix.go 73.04% <66.66%> (-0.24%) ⬇️
eventloop_unix.go 74.00% <66.66%> (+1.74%) ⬆️
acceptor_unix.go 40.90% <100.00%> (ø)
server_unix.go 91.66% <100.00%> (ø)
loop_linux.go 75.00% <0.00%> (-25.00%) ⬇️
reactor_linux.go 90.90% <0.00%> (-9.10%) ⬇️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b7ea839...de15d3a.

@lming left a comment

Thanks for making the change. I'll find time to run the benchmark, but I think it will resolve the GC issue I mentioned previously.

However, using two separate async task queues may need more consideration.

Review comments (resolved) on:
  • internal/netpoll/kqueue_events.go (outdated)
  • internal/netpoll/queue/queue.go
  • internal/netpoll/epoll.go
@lming commented Jul 6, 2021

Re: e6a7563

It's not ideal to handle the Close event specially. I think the core issue to be addressed is to bound the memory usage (or queue length) in some way. I'd suggest considering this solution (sketched in code below the list):

  • Expose an option allowing the client to set MaxPendingWriteTasksPerConn
  • When AsyncWrite is called, a per-connection counter of NumPendingWriteTasks is checked. If it has reached the limit, the write task is dropped and an error is returned to the caller.
  • When the write task is executed by the poller, the connection's NumPendingWriteTasks is decremented

The benefit:

  • The number of data write tasks per connection is bounded, thus the total number of pending async tasks is also bounded
  • We bypass the MaxPendingWriteTasksPerConn check for non-data-write tasks, so they won't be dropped
  • We still use one queue for all async tasks and thus the ordering is maintained.
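
A rough sketch of the counter-based cap proposed above, assuming a hypothetical maxPendingWriteTasks option and an atomic per-connection counter (neither exists in gnet as written):

```go
// Illustrative only: a per-connection cap on queued AsyncWrite tasks.
// The field and option names are hypothetical stand-ins for the proposal.
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var ErrTooManyPendingWrites = errors.New("too many pending async writes")

type conn struct {
	pendingWriteTasks    int32 // incremented on enqueue, decremented by the poller
	maxPendingWriteTasks int32 // the proposed MaxPendingWriteTasksPerConn option
}

// AsyncWrite drops the task and reports an error once the cap is reached.
func (c *conn) AsyncWrite(data []byte) error {
	if atomic.AddInt32(&c.pendingWriteTasks, 1) > c.maxPendingWriteTasks {
		atomic.AddInt32(&c.pendingWriteTasks, -1)
		return ErrTooManyPendingWrites
	}
	// ... enqueue the write task for the event loop here ...
	return nil
}

// executeWriteTask is what the poller would call; it releases one slot.
func (c *conn) executeWriteTask(data []byte) {
	// ... actually write data to the socket here ...
	atomic.AddInt32(&c.pendingWriteTasks, -1)
}

func main() {
	c := &conn{maxPendingWriteTasks: 2}
	for i := 0; i < 3; i++ {
		fmt.Println(c.AsyncWrite([]byte("x"))) // third call returns the error
	}
}
```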

Another robustness suggestion is to expose an option to limit the maximum write buffer size per connection, which prevents a slow connection from accumulating an unbounded send buffer.

@lming commented Jul 6, 2021

Further thought: I think it's important for gnet to cap per-connection resource usage as other systems do; for example, the Linux kernel has options for per-connection send/recv buffer sizes. In gnet's case, both the per-connection send buffer size and the number of pending async tasks should be capped.

My previous proposal has a flaw: AsyncWrite may succeed, but the buffer is later dropped because the send buffer is full. That can be addressed with the following improvement:

  • Expose per-connection options MaxPendingWriteTasksPerConn and MaxWriteBuffSizePerConn
  • When the client calls AsyncWrite:
    1. The conn's NumPendingTasks is checked; return an error if it has reached the limit
    2. The conn's write buffer size is checked (TotalWriteBuffSizeInPendingTasks + len(c.outboundBuffer) < MaxWriteBuffSizePerConn); return an error if the write buffer size has reached the limit

With this change, the caller is aware of the error immediately when either of the two limits is reached. I am not sure how difficult it would be to do the buffer size check within AsyncWrite in a lock-free fashion, though.
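
On the lock-free question, one possible shape is sketched below using atomics alone; pendingWriteBytes and maxWriteBufSize are hypothetical fields standing in for TotalWriteBuffSizeInPendingTasks and MaxWriteBuffSizePerConn, and the sketch ignores len(c.outboundBuffer) for simplicity.

```go
// Illustrative sketch of a lock-free size check inside AsyncWrite, per the
// refinement above. The fields shown here do not exist in gnet as-is.
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var ErrWriteBufFull = errors.New("per-connection write buffer limit reached")

type conn struct {
	pendingWriteBytes int64 // bytes queued in tasks but not yet flushed
	maxWriteBufSize   int64 // the proposed MaxWriteBuffSizePerConn option
}

func (c *conn) AsyncWrite(data []byte) error {
	n := int64(len(data))
	// Reserve the bytes optimistically; roll back if that exceeds the cap.
	if atomic.AddInt64(&c.pendingWriteBytes, n) > c.maxWriteBufSize {
		atomic.AddInt64(&c.pendingWriteBytes, -n)
		return ErrWriteBufFull
	}
	// ... enqueue the write task for the event loop here ...
	return nil
}

// flushed would be called by the event loop after n bytes leave the buffer.
func (c *conn) flushed(n int64) {
	atomic.AddInt64(&c.pendingWriteBytes, -n)
}

func main() {
	c := &conn{maxWriteBufSize: 8}
	fmt.Println(c.AsyncWrite(make([]byte, 6))) // <nil>
	fmt.Println(c.AsyncWrite(make([]byte, 6))) // limit error
}
```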

Apologies if this makes the PR even more complicated. I hope you'll see that I am trying to make constructive suggestions to make gnet more robust. Cheers.

@panjf2000 (Owner, Author)

Limiting the queue size and write buffer may also have merit here, but I still think it's reasonable to introduce a new task queue with higher priority: sometimes we want tasks inside event loops to run as soon as possible, without being delayed by AsyncWrite tasks, and there are already such use cases in gnet.

As for the special treatment of Conn.Close(): I don't think it will be a problem here. This API is public and may be used together with AsyncWrite, so all Conn.Close() tasks can go into the lower-priority task queue, just like AsyncWrite.
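
To make the intended ordering concrete, here is a minimal sketch of an event loop draining a high-priority queue before the lower-priority queue used by AsyncWrite and Close; the channel-based queues and names are illustrative only, not gnet's actual queue implementation.

```go
// Illustrative only: an event loop that empties its urgent task queue before
// touching the low-priority queue that AsyncWrite and Close would use.
package main

import "fmt"

type task func()

type eventLoop struct {
	urgentTasks chan task // internal/administrative tasks, run first
	asyncTasks  chan task // AsyncWrite / Close tasks, run afterwards
}

// runPendingTasks drains the urgent queue before running one async task per
// pass, so administrative work is never stuck behind a burst of AsyncWrite.
func (el *eventLoop) runPendingTasks() {
	for {
		select {
		case t := <-el.urgentTasks:
			t()
			continue
		default:
		}
		select {
		case t := <-el.asyncTasks:
			t()
		default:
			return
		}
	}
}

func main() {
	el := &eventLoop{
		urgentTasks: make(chan task, 8),
		asyncTasks:  make(chan task, 8),
	}
	el.asyncTasks <- func() { fmt.Println("async write") }
	el.urgentTasks <- func() { fmt.Println("urgent admin task") }
	el.runPendingTasks() // prints the urgent task first
}
```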

Apologies if this makes the PR even more complicated. I hope you'll see that I am trying to make constructive suggestions to make gnet more robust. Cheers.

No need to apologize for your thoughtful consideration; I actually appreciate it. I reckon that MaxPendingWriteTasksPerConn and MaxWriteBuffSizePerConn could be doable features for gnet, but I'm not sure we should do them in this PR. Let's take it one step at a time: stay focused on this GC pause frequency issue and run some tests to verify these code changes. I'm open to discussing more optimizations for gnet after this PR is merged, thanks~

@panjf2000 (Owner, Author)

Would you spare some time to run your tests on this PR?
@lming

@lming commented Jul 7, 2021

Agreed that a priority task queue is necessary for certain admin ops. Rather than naming them Trigger and TriggerLag, I would call them UrgentTrigger and Trigger.

It turns out that my test code relies on changes in my fork of gnet, so it will take some time to merge this branch into my fork to run the test. From the code review, I think the GC pause issue should be greatly alleviated. I'd suggest merging this PR when you feel comfortable, and I'll later merge it into my fork and run the test.

Successfully merging this pull request may close these issues:

  • Proposal: cap the memory usage and allocation caused by AsyncWrite