runtime: improve AddCleanup performance #75188

@podtserkovskiy

Overview

Meta uses CGo widely, which gives Gophers access to a vast array of C++ libraries.
We previously advocated the defer Close() pattern for objects bound to C++ resources; however, the defer call was frequently forgotten because these objects look like regular Go objects, and the result was memory leaks.
Consequently, we adopted automated memory management through runtime.AddCleanup.
Unfortunately, registering these cleanups incurs a performance penalty, specifically in the runtime.AddCleanup call itself.
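
For context, the two patterns can be sketched roughly as follows. This is a minimal illustration that assumes Go 1.24 or later; `nativeHandle`, `Wrapper`, `NewManual`, `NewAuto`, and `freeNative` are hypothetical stand-ins for a C++-backed object and its CGo deallocator, not code from our codebase.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// freed counts released handles; it stands in for a real C++ deallocator.
var freed atomic.Int64

// nativeHandle is a hypothetical stand-in for a pointer returned through CGo.
type nativeHandle uintptr

// freeNative pretends to release the native resource behind h.
func freeNative(h nativeHandle) { freed.Add(1) }

// Wrapper looks like a regular Go object but owns a native resource.
type Wrapper struct {
	h nativeHandle
}

// NewManual returns a wrapper that the caller must Close explicitly.
func NewManual(h nativeHandle) *Wrapper { return &Wrapper{h: h} }

// Close releases the native resource.
func (w *Wrapper) Close() { freeNative(w.h) }

// NewAuto returns a wrapper whose native resource is released by a cleanup.
func NewAuto(h nativeHandle) *Wrapper {
	w := &Wrapper{h: h}
	// Pass the handle, not w itself, as the cleanup argument: if the
	// argument kept w reachable, the cleanup would never run.
	runtime.AddCleanup(w, freeNative, h)
	return w
}

func main() {
	// Manual pattern: correct only if every caller remembers the defer.
	func() {
		w := NewManual(1)
		defer w.Close()
	}()
	fmt.Println("freed after manual Close:", freed.Load())

	// Automated pattern: the cleanup runs at some point after the wrapper
	// becomes unreachable; the exact timing is not guaranteed.
	_ = NewAuto(2)
	runtime.GC()
}
```

The automated variant removes the forgotten-defer failure mode, at the price of one runtime.AddCleanup call per object.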

Benchmark

Before implementing this change, we aimed to understand its performance implications.
Several online blog posts compare manual deallocation with runtime.SetFinalizer, highlighting the poor performance of finalizers. We ran a similar benchmark that also covers the newer runtime.AddCleanup, introduced in Go 1.24.
While runtime.AddCleanup is roughly 2x faster than runtime.SetFinalizer, it still lags well behind manual deallocation.

The following benchmark results, taken on Go 1.25.0 on a 14" MacBook Pro (M1), show that runtime.AddCleanup is approximately 5x slower than manual deallocation (benchmarking on a Linux server yielded similar results):

BenchmarkAllocateFree-8                 15169078                73.42 ns/op
BenchmarkAllocateDefer-8                15890224                75.04 ns/op
BenchmarkAllocateAddCleanup-8            3516811               359.7 ns/op
BenchmarkAllocateWithFinalizer-8         1739244               676.9 ns/op

For the benchmark code and additional results, please refer to podtserkovskiy/slow-addcleanup-repro.
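
The shape of such a comparison can be sketched as below. This is not the code from the linked repository: `object`, `allocate`, and `free` are hypothetical stand-ins written in plain Go rather than CGo, so absolute numbers will differ from the results listed here. It assumes Go 1.24 or later.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"testing"
)

// released counts freed handles so that free has an observable effect.
var released atomic.Uint64

// object is a hypothetical wrapper around a native handle.
type object struct {
	handle uintptr
}

// allocate returns a heap-allocated object; noinline keeps the compiler
// from turning it into a stack allocation in the manual benchmark.
//
//go:noinline
func allocate(i int) *object { return &object{handle: uintptr(i)} }

// free stands in for the native deallocator.
func free(handle uintptr) { released.Add(1) }

// benchManualFree releases every object explicitly.
func benchManualFree(b *testing.B) {
	for i := 0; i < b.N; i++ {
		o := allocate(i)
		free(o.handle)
	}
}

// benchAddCleanup registers a cleanup instead of freeing explicitly.
func benchAddCleanup(b *testing.B) {
	for i := 0; i < b.N; i++ {
		o := allocate(i)
		runtime.AddCleanup(o, free, o.handle)
	}
}

// benchSetFinalizer registers a finalizer instead of freeing explicitly.
func benchSetFinalizer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		o := allocate(i)
		runtime.SetFinalizer(o, func(o *object) { free(o.handle) })
	}
}

func main() {
	testing.Init() // registers testing flags when running outside "go test"
	cases := []struct {
		name string
		fn   func(*testing.B)
	}{
		{"ManualFree", benchManualFree},
		{"AddCleanup", benchAddCleanup},
		{"SetFinalizer", benchSetFinalizer},
	}
	for _, c := range cases {
		fmt.Printf("%-12s %s\n", c.name, testing.Benchmark(c.fn))
	}
}
```

Because the objects here are tiny and `free` does almost no work, the sketch mostly measures registration cost, which is the part this issue is about.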

Further investigation

Delving deeper into this issue, we believe the root cause lies in the linear search within the sorted linked list of special objects on mspan, specifically in the *mspan.specialFindSplicePoint function. A CPU profile generated on Go 1.25rc2 with GOGC=off (for a less noisy picture, as it doesn't significantly affect the results) illustrates this:

[Image: CPU profile screenshot]

Is there a potential to accelerate the *mspan.specialFindSplicePoint function (e.g., by using a skip-list instead of a plain linked-list)?

Labels: Implementation, NeedsInvestigation, Performance, compiler/runtime