runtime/pprof: CPU profiling causes severe Scheduler Delays with Deep Context Chains #75583

Description

@JasonSongHoho

Go version

go version go1.23.4 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY='git.garena.com'
GONOSUMDB='git.garena.com'
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.4'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build892259413=/tmp/go-build -gno-record-gcc-switches'

What did you do?

We discovered that CPU profiling causes severe scheduler delays when combined with deep context chains, making profiling unusable in production environments with real network operations.

Reproduction steps:

  1. Download the reproduction program (save the attached go_profiling_issue_repro.go file):

     go_profiling_issue_repro.go.zip

  2. Run with CPU profiling + ZooKeeper (demonstrates the severe issue):

     go run go_profiling_issue_repro.go -maxprocs=4 -enable-zk=true -enable-profiling=true -context-depth=10000000

  3. Run without CPU profiling (normal behavior):

     go run go_profiling_issue_repro.go -maxprocs=4 -enable-zk=true -enable-profiling=false -context-depth=10000000

Environment:

  • Go version: go1.21.6 darwin/arm64
  • OS: macOS (also reproduced on Linux)
  • Hardware: Apple M1/M2, x86_64
  • GOMAXPROCS: 4

Test Configuration:

  • Context depth: 10,000,000 layers
  • GOMAXPROCS: 4
  • Real ZooKeeper network operations
  • Periodic CPU profiling (10s every 1min)
  • Ticker-based time-sensitive operations (50ms intervals)

Reproduction program code:
https://go.dev/play/p/1sgZodXCKvW?v=goprev
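
For readers who cannot grab the attachment, the following is a minimal stand-alone sketch of the kind of setup described above (deep context.WithValue chain, 50ms ticker, 10s of CPU profiling per minute). It is not the attached program; the ZooKeeper operations are stood in for by context.WithTimeout calls derived from the deep chain, and the flag defaults mirror the commands above.

```go
// Minimal sketch of the reproduction setup (not the attached program).
// The real reproduction performs ZooKeeper network operations; they are
// stood in for here by context.WithTimeout calls on the deep chain.
package main

import (
	"context"
	"flag"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

type layerKey int

func main() {
	depth := flag.Int("context-depth", 10_000_000, "layers in the context chain")
	enableProf := flag.Bool("enable-profiling", true, "run CPU profiling periodically")
	maxprocs := flag.Int("maxprocs", 4, "GOMAXPROCS value")
	flag.Parse()
	runtime.GOMAXPROCS(*maxprocs)

	// Build the deep context chain iteratively.
	ctx := context.Background()
	for i := 0; i < *depth; i++ {
		ctx = context.WithValue(ctx, layerKey(i), i)
	}
	log.Printf("created %d-layer context chain", *depth)

	// Periodic CPU profiling: 10s of profiling out of every minute.
	if *enableProf {
		go func() {
			for {
				f, err := os.Create("repro_cpu.prof")
				if err != nil {
					log.Fatal(err)
				}
				if err := pprof.StartCPUProfile(f); err != nil {
					log.Fatal(err)
				}
				time.Sleep(10 * time.Second)
				pprof.StopCPUProfile()
				f.Close()
				time.Sleep(50 * time.Second)
			}
		}()
	}

	// Time-sensitive loop: a 50ms ticker plus a timeout-scoped context
	// derived from the deep chain, standing in for each ZooKeeper call.
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()
	last := time.Now()
	for range ticker.C {
		if d := time.Since(last); d > 60*time.Millisecond {
			log.Printf("ticker delayed: interval %v (expected ~50ms)", d)
		}
		last = time.Now()

		// The expensive step under test: deriving a cancellable context
		// from the deep chain.
		_, cancel := context.WithTimeout(ctx, 50*time.Millisecond)
		cancel()
	}
}
```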

What did you see happen?

Program startup and initialization:

=== Go Context Scheduler Issue Reproduction ===
Go Version: go1.21.6
GOMAXPROCS: 4 (was: 14)
Context Depth: 10000000, Ticker Interval: 50ms
ZooKeeper: 127.0.0.1:2181 (enabled: true)
CPU Profiling: true

🔧 Creating 10000000-layer context chain...
✅ Deep context chain created

🚀 Starting background operations...
📊 Starting periodic CPU profiling: 10s every 1min
🔗 Connecting to ZooKeeper at 127.0.0.1:2181...
🔄 Started 10s CPU profiling session: repro_cpu_1758700953.prof

Severe performance degradation observed:

🐌 ZK operation slow: 511.066667ms (scheduler delay: -32.968416ms, tick: 11, op: 22)
🐌 ZK operation slow: 528.069875ms (scheduler delay: 58.083375ms, tick: 10, op: 23)
⚠️  Ticker delayed by 83.60925ms (count: 20)
🐌 ZK operation slow: 639.356875ms (scheduler delay: 2.306149875s, tick: 28, op: 64)
🐌 ZK operation slow: 545.889459ms (scheduler delay: 61.617583ms, tick: 29, op: 65)
🐌 ZK operation slow: 766.755709ms (scheduler delay: -9.641959ms, tick: 37, op: 82)
📊 Baseline: 31 ticks in 5 seconds

Key observations:

  1. Extreme Scheduler Delays:

    • Normal operations: Expected ~50ms intervals
    • During CPU profiling: 2.306149875s scheduler delay (46x slower)
    • ZK operations taking 500-800ms instead of expected <50ms
  2. Network Operation Impact:

    • ZooKeeper operations severely affected during profiling
    • Connection timeouts and delays cascade through the system
    • Real network I/O becomes unreliable during profiling sessions
  3. Ticker Performance Collapse:

    • Expected: 50ms ticker intervals
    • Observed: 83ms+ delays (67% slower)
    • Baseline shows only 31 ticks in 5 seconds (should be ~100 ticks)
  4. Context Chain Amplification:

    • 10M-layer context chains amplify the profiling overhead
    • Each context.WithTimeout operation becomes extremely expensive
    • Stack expansion during profiling creates performance bottlenecks
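
To make the amplification point concrete: deriving a cancellable context (context.WithTimeout / context.WithCancel) has to locate the nearest cancellable ancestor by walking up the parent chain, so on a chain built purely from context.WithValue that walk is proportional to the depth even without profiling. A small stand-alone sketch (buildChain and layerKey are illustrative names, not from the reproduction program) that times a single WithTimeout call at different depths:

```go
// Stand-alone sketch: times one context.WithTimeout call against context
// chains of increasing depth. Names (buildChain, layerKey) are illustrative.
package main

import (
	"context"
	"fmt"
	"time"
)

type layerKey int

// buildChain wraps ctx in n layers of context.WithValue.
func buildChain(ctx context.Context, n int) context.Context {
	for i := 0; i < n; i++ {
		ctx = context.WithValue(ctx, layerKey(i), i)
	}
	return ctx
}

func main() {
	for _, depth := range []int{1_000, 100_000, 10_000_000} {
		ctx := buildChain(context.Background(), depth)

		start := time.Now()
		_, cancel := context.WithTimeout(ctx, time.Second)
		elapsed := time.Since(start)
		cancel()

		fmt.Printf("depth=%-10d one WithTimeout call took %v\n", depth, elapsed)
	}
}
```

Running this with and without an active CPU profile should separate the baseline cost of the deep parent walk from whatever extra overhead profiling adds, which is the comparison the logs above hint at.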

What did you expect to see?

Expected behavior:

  • CPU profiling should have minimal impact on Go runtime scheduler latency

Root cause hypothesis:
Deep context chains require significant stack space during operations like context.WithTimeout. CPU profiling appears to interfere with goroutine scheduling during stack expansion, particularly when combined with network I/O operations.
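
One way to put numbers on this hypothesis, independently of the ZooKeeper timings, is to read the runtime's own scheduler latency histogram (exposed by runtime/metrics since Go 1.17) before and after a profiling window. The helper below is only an illustration and is not part of the reproduction program:

```go
// Illustrative helper (not from the reproduction program): reports the
// upper bound of the highest non-empty bucket in the runtime's scheduler
// latency histogram. Sampling it before and after a pprof.StartCPUProfile
// window shows whether the worst-case latency grew during that window.
package main

import (
	"fmt"
	"math"
	"runtime/metrics"
)

// maxSchedLatency returns the upper bound, in seconds, of the highest
// non-empty bucket of "/sched/latencies:seconds".
func maxSchedLatency() float64 {
	samples := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
	metrics.Read(samples)
	hist := samples[0].Value.Float64Histogram()

	max := 0.0
	for i, count := range hist.Counts {
		upper := hist.Buckets[i+1] // bucket i covers [Buckets[i], Buckets[i+1])
		if count > 0 && !math.IsInf(upper, +1) && upper > max {
			max = upper
		}
	}
	return max
}

func main() {
	fmt.Printf("max observed scheduler latency: %.6fs\n", maxSchedLatency())
}
```

Because the histogram is cumulative over the process lifetime, any growth in the worst-case bucket between the two readings happened during the profiling window.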

Business impact:
If there is a deep context chain, users cannot use CPU profiling for performance debugging in production environments.

This significantly impacts the observability of Go applications in production environments where deep context chains are standard practice for request tracing, authentication, and resource management.

    Labels

    • BugReport: Issues describing a possible bug in the Go implementation.
    • NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
    • compiler/runtime: Issues related to the Go compiler and/or runtime.
    • help wanted
