Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime: add block/wake function #46431

Open
GuhuangLS opened this issue May 28, 2021 · 12 comments
Open

proposal: runtime: add block/wake function #46431

GuhuangLS opened this issue May 28, 2021 · 12 comments
Labels
Projects
Milestone

Comments

@GuhuangLS
Copy link

@GuhuangLS GuhuangLS commented May 28, 2021

Background: channel is designed to be a data transfer pipe between
go routines. A scenario in gVisor is: block/wake a G (stands for a
task) by using channel, wherein there's no need for data transfer.
Channel is relatively too heavy for this case.
Channel manages Gs in a list, and has capacity. When the channel is
full, the sender will bock, and the channel is clear, the receiver
is blocked. The receiver pushed to chan list, then schedule to run
next. The sender uses goready to wake Gs in the chan list.

A summary on our idea: introduce a new set of APIs for goroutine
wake/block.
For example, Waitlist at gVisor(such as futex.waiter:
pkg/sentry/kernel/futex/futex.go) used to task IPC; Waiter uses
GetG() to get the address of G, then saved at waiter struct; then
call BlockG to schedule the M to run next G. The coming waker
uses the G(waiter) address to wake it by WakeG(G).

Details:

We propose three new APIs:

GetG() - used to get the address of G by itself, which can be saved
at waitlist.By address, the waker find the G easier, then to wake it at
go runtime.

WakeG() - can be used to wake one G, which can be in running/blocked
status.
If the waiter has been blocked aready, then waker wakes it by
goready().
If the waiter is in waiting process, the one got sched.wakeLock first
processes first:if waiter is later, then waker wakes it by goready(). If the
waiter got the lock after the waker,then no need to block, because
"gp.wakeStatus == gStatusWaked".
If the waker coming, waiter not blocked, then if the waiter is at list, then
modifies gp.wakeStatus = gStatusWaked. when waiter coming, no need
bo block. if no waiter at the list, no need to WakeG().

BlockG() - can be used to block goroutine by itself.
modify:
GuhuangLS@109c8f5

How we use this in gVisor:

GuhuangLS/gvisor@97e0e6c

In futex()/epoll_wait(), we can modify it to use the new mechanism
for block and wake. Between sentry and go runtime, we maintain the
status of task Gs. Let's use futex as an example, add running status
at goruntime, NoWake,Waked,Blocked.

At sentry, one task/G can use BlockG() to block, like <-chan. Other
tasks/Gs can use WakeG() to wake the task/G which is blocked by BlockG()
, like chan <-. Based on a basic prototype of Go and gVisor, we use the
program in google/gvisor#2033(comment) as the test program.

We can see 22% improvement by test case:
google/gvisor#2033.
cc @prattmic @ianlewis @QinChenggang @tanjianfeng @lubinszARM

@seankhliao seankhliao changed the title schedule: add block/wake function proposal: runtime: add block/wake function May 28, 2021
@gopherbot gopherbot added this to the Proposal milestone May 28, 2021
@seankhliao
Copy link
Contributor

@seankhliao seankhliao commented May 28, 2021

@ianlancetaylor

This comment has been hidden.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals May 28, 2021
@GuhuangLS

This comment has been hidden.

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Jun 1, 2021

Channel is relatively too heavy for this case.

If this is the case, can't we just improve the implementation of chan struct{} (or more generally, chan <any zero-sized type>) in the compiler+runtime? Have a special streamlined implementation for this use-case? That doesn't require adding a new API, and everyone benefits. It's not going to be as fast as the caller knowing exactly which G to wake up, but channels have lots of room for improvement in particular, I think.

Beyond that, does the sync package's primitives help here at all? And if not, could they be improved to help out here?

Also, is there anything we could in the runtime to improve the situation? In google/gvisor#2033 @prattmic noted performance issues in the scheduler such as #43997. I suspect there are still ways we can improve the scheduler without having to resort to new APIs that could help improve your (and gVisor's) situation.

Finally, I just want to say that in general, exposing the concept of a G to the user (beyond the concept of a goroutine I mean) brings with it potentially a lot of conceptual overhead. There are good arguments against something like goroutine-local-storage and the GetG function of this proposal seems to provide a direct mechanism to expose a unique identifier for a goroutine.

@prattmic
Copy link
Member

@prattmic prattmic commented Jun 1, 2021

For this case, I think there is potential for general runtime optimizations on channels.

One such proposed optimization is #32113, which I prototyped in #32113 (comment). That optimization applies well to the futex case, where wake+wait is common. IIRC, I applied this prototype google/gvisor#2033 and saw similar improvements to this prototype (~20%). Though I don't seem to have posted the benchmark results anywhere, so I may be misremembering and need to dig up the actual results.

@prattmic
Copy link
Member

@prattmic prattmic commented Jun 1, 2021

@GuhuangLS
Copy link
Author

@GuhuangLS GuhuangLS commented Jun 2, 2021

Channel is relatively too heavy for this case.

If this is the case, can't we just improve the implementation of chan struct{} (or more generally, chan <any zero-sized type>) in the compiler+runtime? Have a special streamlined implementation for this use-case? That doesn't require adding a new API, and everyone benefits. It's not going to be as fast as the caller knowing exactly which G to wake up, but channels have lots of room for improvement in particular, I think.

Beyond that, does the sync package's primitives help here at all? And if not, could they be improved to help out here?

Also, is there anything we could in the runtime to improve the situation? In google/gvisor#2033 @prattmic noted performance issues in the scheduler such as #43997. I suspect there are still ways we can improve the scheduler without having to resort to new APIs that could help improve your (and gVisor's) situation.

Finally, I just want to say that in general, exposing the concept of a G to the user (beyond the concept of a goroutine I mean) brings with it potentially a lot of conceptual overhead. There are good arguments against something like goroutine-local-storage and the GetG function of this proposal seems to provide a direct mechanism to expose a unique identifier for a goroutine.

For #43997, shows the case clearly.
Like gVisor, it mangages Gs by task struct; futex/lock realised at sentry, uses waitlist. Waiter and waker communicate each other by channel which saved at waiterlist. chan can be seen by gVisor and go runtime. if no chan, need other struct for communication.
The GetG likes chan, tells the waiter at runtime. Maybe change name of GetG to GetLightweightChan?
Except for task program/chan/scheduler, one more lightwait function could
enrich runtime.

@GuhuangLS
Copy link
Author

@GuhuangLS GuhuangLS commented Jun 2, 2021

For this case, I think there is potential for general runtime optimizations on channels.

One such proposed optimization is #32113, which I prototyped in #32113 (comment). That optimization applies well to the futex case, where wake+wait is common. IIRC, I applied this prototype google/gvisor#2033 and saw similar improvements to this prototype (~20%). Though I don't seem to have posted the benchmark results anywhere, so I may be misremembering and need to dig up the actual results.

#32113 shows wake/block by M waked and idle p got, it's good optimization. One G to goready other G, then to get idle p, to wake M, could the concurrence be improvement between goready and wake M?

#46431 optimise chan list managetime, it calls goready() at WakeG(), so #32113 optimsization and #46431 is not confliction. The two optimsization could work togerther.

@rsc
Copy link
Contributor

@rsc rsc commented Jun 9, 2021

It would definitely be much more preferable to focus on optimizations that apply to particular common uses of channels than to introduce new APIs.

@GuhuangLS
Copy link
Author

@GuhuangLS GuhuangLS commented Jun 10, 2021

It would definitely be much more preferable to focus on optimizations that apply to particular common uses of channels than to introduce new APIs.

if add the function to channel , is it feasible? Let channel owns the fast path.

@cocotyty
Copy link

@cocotyty cocotyty commented Jun 29, 2021

It would definitely be much more preferable to focus on optimizations that apply to particular common uses of channels than to introduce new APIs.

if add the function to channel , is it feasible? Let channel owns the fast path.

Great idea ! Maybe fast path on 'chan struct{}' ?

@GuhuangLS
Copy link
Author

@GuhuangLS GuhuangLS commented Jun 30, 2021

It would definitely be much more preferable to focus on optimizations that apply to particular common uses of channels than to introduce new APIs.

if add the function to channel , is it feasible? Let channel owns the fast path.

Great idea ! Maybe fast path on 'chan struct{}' ?

Hope it is; Maybe need any other modification to use 'chan struct{}' directly;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Proposals
Incoming
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants