One possibility for the "C call into Go" direction would be to split crosscall2 into two functions. The first would save the callee-saved registers, call a function (passed in via a stack slot or a register that is neither callee-saved nor used for arguments, such as r11) with all registers other than the stack pointer left as-is, and then restore the callee-saved registers. The second would copy the argument registers into stack slots, invoke a Go function, and then copy the result slots into the return registers.
The register-saving function does not depend on the particulars of the C call, so it could be reused for calls with other signatures (such as signal handlers). The arguments-to-slots function necessarily depends on the function signature (because the signature determines which registers need to be copied and what size slots they need to be copied into), so it would need specific variants for crosscall2, sigtramp, and any other function with a distinct signature.
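To make the proposed split concrete, here is a rough model of it in plain Go rather than assembly. Everything in it is an assumption for illustration: the `cpu` register file, the `saveRegs` wrapper, and the `adapt2`/`adapt3` adapters are hypothetical names, and the two signatures are made up rather than taken from crosscall2 or sigtramp. It only shows the shape: one signature-independent wrapper, plus one small adapter per signature.

```go
package main

import "fmt"

// cpu is a toy register file. The fields are illustrative stand-ins,
// not real runtime state.
type cpu struct {
	calleeSaved [2]uint64 // stands in for the callee-saved registers
	args        [3]uint64 // stands in for the argument registers
	ret         uint64    // stands in for the return register
}

// saveRegs models the signature-independent half: it preserves the
// callee-saved registers around fn and passes every other register
// through untouched.
func saveRegs(c *cpu, fn func(*cpu)) {
	saved := c.calleeSaved
	fn(c)
	c.calleeSaved = saved
}

// frame2 models the stack slots for a hypothetical (a0, a1) -> r0 call.
type frame2 struct {
	a0, a1 uint64
	r0     uint64
}

// adapt2 models one signature-specific half: spill two argument
// registers into slots, call the Go function, load the result slot
// into the return register.
func adapt2(goFn func(*frame2)) func(*cpu) {
	return func(c *cpu) {
		f := frame2{a0: c.args[0], a1: c.args[1]}
		goFn(&f)
		c.calleeSaved = [2]uint64{} // model Go code clobbering these registers
		c.ret = f.r0
	}
}

// frame3 models the slots for a hypothetical three-argument call with
// no result, roughly the shape of a signal handler.
type frame3 struct{ a0, a1, a2 uint64 }

// adapt3 is a second adapter; only the spilling differs from adapt2.
func adapt3(goFn func(*frame3)) func(*cpu) {
	return func(c *cpu) {
		f := frame3{a0: c.args[0], a1: c.args[1], a2: c.args[2]}
		goFn(&f)
		c.calleeSaved = [2]uint64{} // model Go code clobbering these registers
	}
}

func main() {
	c := &cpu{calleeSaved: [2]uint64{7, 9}, args: [3]uint64{2, 3, 0}}
	saveRegs(c, adapt2(func(f *frame2) { f.r0 = f.a0 + f.a1 }))
	fmt.Println(c.ret, c.calleeSaved)
}
```

In this model both adapters reuse `saveRegs` unchanged, which is the duplication the split is meant to remove. In real assembly the target would have to arrive in a stack slot or a scratch register instead of a Go closure.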
For the "Go call into C" direction (such as sigfwd; see #17641), we could probably factor out a function that aligns the stack pointer, invokes a function (again passed in a stack slot or non-argument register), and then restores the stack pointer. It would still be up to the individual callers to load the appropriate argument registers, but that would at least cut down on some of the duplication (e.g. between sigfwd and callCgoSigaction).
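The same idea for the "Go call into C" direction can be sketched as a toy model in Go. Again the names are hypothetical (`machine`, `callAligned`, `forwardSignal`, `callSigaction`), and the 16-byte alignment is an assumption matching the amd64 System V ABI; other platforms may differ. The shared helper owns only the stack-pointer handling, and each caller still loads its own argument registers.

```go
package main

import "fmt"

// machine is a toy model: a stack pointer plus argument registers.
type machine struct {
	sp   uintptr
	args [3]uintptr
}

// stackAlign is the assumed ABI stack alignment (amd64 System V).
const stackAlign = 16

// callAligned models the shared helper: align the stack pointer down,
// invoke fn, then restore the original stack pointer.
func callAligned(m *machine, fn func(*machine)) {
	saved := m.sp
	m.sp &^= stackAlign - 1 // clear the low bits to align downward
	fn(m)
	m.sp = saved
}

// forwardSignal stands in for a sigfwd-like caller: it loads its own
// argument registers and leaves the alignment to callAligned.
func forwardSignal(m *machine, handler func(*machine), sig, info, ctx uintptr) {
	m.args = [3]uintptr{sig, info, ctx}
	callAligned(m, handler)
}

// callSigaction stands in for a callCgoSigaction-like caller with a
// different argument layout but the same alignment needs.
func callSigaction(m *machine, fn func(*machine), sig, newAct, oldAct uintptr) {
	m.args = [3]uintptr{sig, newAct, oldAct}
	callAligned(m, fn)
}

func main() {
	m := &machine{sp: 0x7ffe1238}
	var seen uintptr
	forwardSignal(m, func(m *machine) { seen = m.sp }, 11, 0, 0)
	fmt.Printf("sp during call: %#x, sp after call: %#x\n", seen, m.sp)
}
```

The two callers differ only in how they fill `args`, which is the part that would remain per-caller in the real assembly.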