Rewrite cbor decoder to avoid use of fnptrs. #49
Merged
This results in substantial performance improvements.

Previously, function pointers were used in the statemachines. (This is an idea I originally garnered from a talk on "Lexical Scanning in Go" -- https://talks.golang.org/2011/lex.slide#19 -- but it seems that it's a fairly fragile concept where minor variation from the pattern can result in disastrous obstructions to optimizations!)

If these function pointers were *only* being treated as numbers, the performance of this usage would be perfectly fine. However, they are not: disassembling the compiler output for this design reveals that the compiler generates a type to hold closure information, **allocates it**, and then proceeds with *that* pointer. All this happens in a line of source which seems to be a simple "="!

So. Rewrite the decoder to use simple consts for statemachine state.

With this change, `step_acceptMapValue` went from 93 assembler instructions to 63, a closure disappeared, a generated type disappeared, **the implicit `runtime.newobject` disappeared**, and several occurrences of `runtime.gcWriteBarrier` disappeared. All of these costs are associated with the generation of that closure, and the switch to using consts for the statemachine state eliminates all of them at once.

The biggest impact is the removal of the unnecessary allocation. This substantially reduces the GC pressure generated, and thereby increases overall performance significantly.
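To make the difference concrete, here is a minimal sketch of the two styles. It is not the refmt decoder: the type names (`fnDecoder`, `constDecoder`), the two states, and the step bodies are all hypothetical and exist only for illustration. The first variant stores a bound method value in a field on every transition; the second stores a small const and dispatches with a `switch`.

```go
package main

import (
	"fmt"
	"testing"
)

// Variant A (hypothetical): the next step is held as a func value.
type fnDecoder struct {
	step   func()
	values int
}

// Assigning d.acceptValue creates a method value bound to d. Because it is
// stored in a field, the compiler may need to heap-allocate a closure for it.
func (d *fnDecoder) acceptKey()   { d.step = d.acceptValue }
func (d *fnDecoder) acceptValue() { d.values++; d.step = d.acceptKey }

func runFn(d *fnDecoder, steps int) int {
	d.values = 0
	d.step = d.acceptKey
	for i := 0; i < steps; i++ {
		d.step()
	}
	return d.values
}

// Variant B (hypothetical): the next step is held as a plain const.
type state uint8

const (
	stateKey state = iota
	stateValue
)

type constDecoder struct {
	state  state
	values int
}

// doStep dispatches on the const; a transition is a single byte store.
func (d *constDecoder) doStep() {
	switch d.state {
	case stateKey:
		d.state = stateValue
	case stateValue:
		d.values++
		d.state = stateKey
	}
}

func runConst(d *constDecoder, steps int) int {
	d.values = 0
	d.state = stateKey
	for i := 0; i < steps; i++ {
		d.doStep()
	}
	return d.values
}

// Package-level decoders, so both live on the heap as a real decoder would.
var (
	fnDec    = &fnDecoder{}
	constDec = &constDecoder{}
)

func main() {
	// Both variants walk the same key/value cycle and count the same values.
	fmt.Println(runFn(fnDec, 6), runConst(constDec, 6)) // prints "3 3"

	// Allocation counts depend on the compiler version, so they are only printed.
	fnAllocs := testing.AllocsPerRun(100, func() { runFn(fnDec, 6) })
	constAllocs := testing.AllocsPerRun(100, func() { runConst(constDec, 6) })
	fmt.Printf("func-value state: %.0f allocs/run\n", fnAllocs)
	fmt.Printf("const state:      %.0f allocs/run\n", constAllocs)
}
```

The const variant has nothing to allocate on a transition. For the func variant, the allocation count that `testing.AllocsPerRun` reports will depend on the Go version and on escape analysis, so treat the printed numbers as something to check locally, not as a guarantee.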
Benchcmp has the following to say:

```
benchmark                                          old ns/op     new ns/op     delta
Benchmark_StructAlpha_UnmarshalFromCborRefmt-8     6203          4792          -22.75%
Benchmark_MapAlpha_UnmarshalFromCborRefmt-8        12656         11130         -12.06%

benchmark                                          old allocs     new allocs     delta
Benchmark_StructAlpha_UnmarshalFromCborRefmt-8     54             10             -81.48%
Benchmark_MapAlpha_UnmarshalFromCborRefmt-8        157            113            -28.03%

benchmark                                          old bytes     new bytes     delta
Benchmark_StructAlpha_UnmarshalFromCborRefmt-8     1044          340           -67.43%
Benchmark_MapAlpha_UnmarshalFromCborRefmt-8        4656          3952          -15.12%
```

The exact numbers will vary per shape of the data in the workload, but suffice it to say: significantly faster (>20% should be common), and drastically less memory pressure (80% less? You saw it here).

The number of allocations for unmarshalling cbor into a struct with refmt is now *half* as many as the number of allocations for doing the same unmarshal using stdlib json. Nice.

For reference, this is the *new* assembly of `step_acceptMapValue`:

```
0x0000 00000 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) TEXT "".(*Decoder).step_acceptMapValue(SB), ABIInternal, $56-40
0x0000 00000 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) MOVQ (TLS), CX
0x0009 00009 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) CMPQ SP, 16(CX)
0x000d 00013 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) JLS 163
0x0013 00019 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) SUBQ $56, SP
0x0017 00023 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) MOVQ BP, 48(SP)
0x001c 00028 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) LEAQ 48(SP), BP
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) FUNCDATA $0, gclocals·56d33af5d84ec1114330c1119ad93f68(SB)
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) FUNCDATA $1, gclocals·f6bd6b3389b872033d462029172c8612(SB)
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) FUNCDATA $3, gclocals·fca29e89d033ef11d64e11a599ce9bf0(SB)
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) PCDATA $2, $1
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) PCDATA $0, $0
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) MOVQ "".d+64(SP), AX
0x0026 00038 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) MOVQ 8(AX), CX
0x002a 00042 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) PCDATA $2, $2
0x002a 00042 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) MOVQ 16(AX), DX
0x002e 00046 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) MOVQ 48(CX), CX
0x0032 00050 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) PCDATA $2, $0
0x0032 00050 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) MOVQ DX, (SP)
0x0036 00054 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) CALL CX
0x0038 00056 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) PCDATA $2, $1
0x0038 00056 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) MOVQ 24(SP), AX
0x003d 00061 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:198) MOVQ 16(SP), CX
0x0042 00066 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:199) TESTQ CX, CX
0x0045 00069 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:199) JEQ 96
0x0047 00071 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:200) PCDATA $0, $1
0x0047 00071 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:200) MOVB $1, "".done+80(SP)
0x004c 00076 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:200) PCDATA $0, $2
0x004c 00076 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:200) MOVQ CX, "".err+88(SP)
0x0051 00081 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:200) PCDATA $2, $0
0x0051 00081 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:200) MOVQ AX, "".err+96(SP)
0x0056 00086 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:200) MOVQ 48(SP), BP
0x005b 00091 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:200) ADDQ $56, SP
0x005f 00095 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:200) RET
0x0060 00096 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:202) PCDATA $2, $1
0x0060 00096 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:202) PCDATA $0, $3
0x0060 00096 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:202) MOVQ "".d+64(SP), AX
0x0065 00101 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:202) MOVB $5, 48(AX)
0x0069 00105 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:203) PCDATA $2, $3
0x0069 00105 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:203) PCDATA $0, $1
0x0069 00105 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:203) MOVQ "".tokenSlot+72(SP), CX
0x006e 00110 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:203) MOVB $0, 88(CX)
0x0072 00114 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:204) PCDATA $2, $4
0x0072 00114 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:204) MOVQ AX, (SP)
0x0076 00118 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:204) PCDATA $2, $0
0x0076 00118 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:204) MOVQ CX, 16(SP)
0x007b 00123 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:204) CALL "".(*Decoder).stepHelper_acceptValue(SB)
0x0080 00128 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:204) PCDATA $2, $1
0x0080 00128 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:204) MOVQ 40(SP), AX
0x0085 00133 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:204) MOVQ 32(SP), CX
0x008a 00138 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:205) MOVB $0, "".done+80(SP)
0x008f 00143 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:205) PCDATA $0, $2
0x008f 00143 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:205) MOVQ CX, "".err+88(SP)
0x0094 00148 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:205) PCDATA $2, $0
0x0094 00148 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:205) MOVQ AX, "".err+96(SP)
0x0099 00153 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:205) MOVQ 48(SP), BP
0x009e 00158 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:205) ADDQ $56, SP
0x00a2 00162 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:205) RET
0x00a3 00163 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:205) NOP
0x00a3 00163 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) PCDATA $0, $-1
0x00a3 00163 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) PCDATA $2, $-1
0x00a3 00163 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) CALL runtime.morestack_noctxt(SB)
0x00a8 00168 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:196) JMP 0
```

And contrast it with the larger, *old* assembly for `step_acceptMapValue`:

```
0x0000 00000 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) TEXT "".(*Decoder).step_acceptMapValue(SB), ABIInternal, $64-40
0x0000 00000 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) MOVQ (TLS), CX
0x0009 00009 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) CMPQ SP, 16(CX)
0x000d 00013 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) JLS 266
0x0013 00019 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) SUBQ $64, SP
0x0017 00023 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) MOVQ BP, 56(SP)
0x001c 00028 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) LEAQ 56(SP), BP
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) FUNCDATA $0, gclocals·5af671a95c0d19577a0fa6fa8a10967f(SB)
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) FUNCDATA $1, gclocals·7d2d5fca80364273fb07d5820a76fef4(SB)
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) FUNCDATA $3, gclocals·b3cd19c3ced5a6f764ea50d3b770f05d(SB)
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) PCDATA $2, $1
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) PCDATA $0, $0
0x0021 00033 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) MOVQ "".d+72(SP), AX
0x0026 00038 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) MOVQ 8(AX), CX
0x002a 00042 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) PCDATA $2, $2
0x002a 00042 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) MOVQ 16(AX), DX
0x002e 00046 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) MOVQ 48(CX), CX
0x0032 00050 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) PCDATA $2, $0
0x0032 00050 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) MOVQ DX, (SP)
0x0036 00054 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) CALL CX
0x0038 00056 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) MOVBLZX 8(SP), AX
0x003d 00061 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) PCDATA $2, $3
0x003d 00061 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) MOVQ 24(SP), CX
0x0042 00066 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) MOVQ 16(SP), DX
0x0047 00071 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:172) TESTQ DX, DX
0x004a 00074 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:172) JNE 241
0x0050 00080 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) PCDATA $2, $0
0x0050 00080 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:171) MOVB AL, "".majorByte+55(SP)
0x0054 00084 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) PCDATA $2, $1
0x0054 00084 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) LEAQ type.noalg.struct { F uintptr; R *"".Decoder }(SB), AX
0x005b 00091 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) PCDATA $2, $0
0x005b 00091 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) MOVQ AX, (SP)
0x005f 00095 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) CALL runtime.newobject(SB)
0x0064 00100 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) PCDATA $2, $1
0x0064 00100 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) MOVQ 8(SP), AX
0x0069 00105 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) LEAQ "".(*Decoder).step_acceptMapKey-fm(SB), CX
0x0070 00112 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) MOVQ CX, (AX)
0x0073 00115 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) PCDATA $2, $-2
0x0073 00115 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) PCDATA $0, $-2
0x0073 00115 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) CMPL runtime.writeBarrier(SB), $0
0x007a 00122 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) JNE 204
0x007c 00124 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) MOVQ "".d+72(SP), CX
0x0081 00129 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) MOVQ CX, 8(AX)
0x0085 00133 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) MOVQ AX, 48(CX)
0x0089 00137 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:176) PCDATA $2, $4
0x0089 00137 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:176) PCDATA $0, $1
0x0089 00137 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:176) MOVQ "".tokenSlot+80(SP), AX
0x008e 00142 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:176) MOVB $0, 88(AX)
0x0092 00146 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:177) PCDATA $2, $1
0x0092 00146 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:177) MOVQ CX, (SP)
0x0096 00150 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:177) MOVBLZX "".majorByte+55(SP), CX
0x009b 00155 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:177) MOVB CL, 8(SP)
0x009f 00159 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:177) PCDATA $2, $0
0x009f 00159 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:177) MOVQ AX, 16(SP)
0x00a4 00164 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:177) CALL "".(*Decoder).stepHelper_acceptValue(SB)
0x00a9 00169 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:177) MOVQ 32(SP), AX
0x00ae 00174 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:177) PCDATA $2, $3
0x00ae 00174 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:177) MOVQ 40(SP), CX
0x00b3 00179 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:178) MOVB $0, "".done+88(SP)
0x00b8 00184 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:178) PCDATA $0, $2
0x00b8 00184 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:178) MOVQ AX, "".err+96(SP)
0x00bd 00189 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:178) PCDATA $2, $0
0x00bd 00189 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:178) MOVQ CX, "".err+104(SP)
0x00c2 00194 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:178) MOVQ 56(SP), BP
0x00c7 00199 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:178) ADDQ $64, SP
0x00cb 00203 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:178) RET
0x00cc 00204 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) PCDATA $2, $-2
0x00cc 00204 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) PCDATA $0, $-2
0x00cc 00204 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) LEAQ 8(AX), DI
0x00d0 00208 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) MOVQ AX, CX
0x00d3 00211 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) MOVQ "".d+72(SP), AX
0x00d8 00216 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) CALL runtime.gcWriteBarrier(SB)
0x00dd 00221 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) LEAQ 48(AX), DI
0x00e1 00225 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) MOVQ AX, DX
0x00e4 00228 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) MOVQ CX, AX
0x00e7 00231 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) CALL runtime.gcWriteBarrier(SB)
0x00ec 00236 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:177) MOVQ DX, CX
0x00ef 00239 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:175) JMP 137
0x00f1 00241 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:173) PCDATA $2, $3
0x00f1 00241 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:173) PCDATA $0, $1
0x00f1 00241 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:173) MOVB $1, "".done+88(SP)
0x00f6 00246 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:173) PCDATA $0, $2
0x00f6 00246 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:173) MOVQ DX, "".err+96(SP)
0x00fb 00251 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:173) PCDATA $2, $0
0x00fb 00251 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:173) MOVQ CX, "".err+104(SP)
0x0100 00256 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:173) MOVQ 56(SP), BP
0x0105 00261 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:173) ADDQ $64, SP
0x0109 00265 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:173) RET
0x010a 00266 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:173) NOP
0x010a 00266 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) PCDATA $0, $-1
0x010a 00266 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) PCDATA $2, $-1
0x010a 00266 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) CALL runtime.morestack_noctxt(SB)
0x010f 00271 (/gopath/src/github.com/polydawn/refmt/cbor/cborDecoder.go:169) JMP 0
```

(Assembly is from go version go1.12.5 linux/amd64.)

Huge thanks to @gmasgras and the folks working on IPFS infra! They provided some pprof files that were a perfect kick in the shorts for starting to look into these issues, and were fantastic data to aim with.

In the future, similar optimizations are probably possible in several other parts of refmt: the obj package also uses function pointers in several places where we now might regard it as unwise. These are not all as simple to update, though, so that work may take place in future PRs.
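For anyone wanting a baseline when evaluating further rewrites of this kind, the stdlib json side of the allocation comparison can be measured with a small standalone program. This is only a sketch: the `structAlpha` type and the input document below are hypothetical stand-ins, not the fixtures used by the `Benchmark_StructAlpha_*` benchmarks, so the numbers it prints are not comparable to the table above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// structAlpha is a hypothetical target type; refmt's real benchmark
// fixture may have a different shape.
type structAlpha struct {
	B int    `json:"b"`
	C string `json:"c"`
}

// input is an assumed sample document matching structAlpha.
var input = []byte(`{"b":1,"c":"hello"}`)

// benchUnmarshal decodes the same input b.N times into a fresh struct.
func benchUnmarshal(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var v structAlpha
		if err := json.Unmarshal(input, &v); err != nil {
			b.Fatal(err)
		}
	}
}

// measure runs the benchmark outside of "go test" and returns the result.
func measure() testing.BenchmarkResult {
	return testing.Benchmark(benchUnmarshal)
}

func main() {
	res := measure()
	fmt.Printf("%d ns/op, %d allocs/op, %d B/op\n",
		res.NsPerOp(), res.AllocsPerOp(), res.AllocedBytesPerOp())
}
```

The three printed figures correspond to the ns/op, allocs, and bytes columns that benchcmp compares between an old and a new run.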