Description
This proposal addresses the long-standing request for a race detector that works with CGO_ENABLED=0.
Related issues:
- #9918: runtime/race: eliminate dependency on libc (race runtime depends on libc; opened by @dvyukov, 2014, still open)
- #6508: runtime/race: eliminate dependency on cmd/cgo (runtime/race can't be used with `CGO_ENABLED=0`; 2013)
- #38955: race detector should not require cgo (2020, closed as duplicate)
Background
The current race detector requires ThreadSanitizer (C++), which creates several limitations:
- `CGO_ENABLED=1` is mandatory
- Cannot produce static binaries for `FROM scratch` Docker containers
- Cross-compilation requires the target platform's libc
- Platform-specific allocation failures (runtime: ThreadSanitizer failed to allocate / CHECK failed #37651; race: 'unsupported VMA range' on linux/arm64 #29948; runtime: linux/s390x `ThreadSanitizer failed to allocate 0x7e0000 (8257536) bytes at address 900000180000 (errno: 12)` in QEMU #67881)
Issue #9918 has been open for 11 years with the "Unplanned" milestone. The solution proposed there, "eliminate all libc dependencies from race runtime", is what this implementation provides by replacing ThreadSanitizer with a pure-Go runtime.
Implementation
I've implemented a pure-Go race detector based on peer-reviewed algorithms:
| Algorithm | Paper | Contribution |
|---|---|---|
| FastTrack | PLDI 2009 | Core happens-before detection |
| BigFoot | PLDI 2017 | Static coalescing (90% barrier reduction) |
| SmartTrack | PLDI 2020 | Ownership tracking (10-20% faster) |
| O(1) Sampling | PLDI 2024 | Probabilistic sampling for CI/CD |
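To make the first row of the table concrete, here is a simplified sketch of FastTrack's core idea. It was written for this discussion and is not code from the kolkov/racedetector repository. Each variable records its last write as a single epoch (clock@thread). Its read history stays a single epoch until concurrent reads force it to inflate to a full vector clock. The sketch makes several simplifying assumptions. It uses a fixed thread count and models only lock acquire/release as synchronization. It also omits FastTrack's same-epoch fast paths. All type and function names are hypothetical.

```go
package main

import "fmt"

const numThreads = 2 // assumption: a fixed thread count keeps the sketch short

type VC [numThreads]uint64 // vector clock: one logical clock per thread

// Epoch is FastTrack's compact "clock@thread" timestamp; before reports
// whether it happens-before vector time v.
type Epoch struct {
	TID   int
	Clock uint64
}

func (e Epoch) before(v *VC) bool { return e.Clock <= v[e.TID] }

type Thread struct {
	ID int
	C  VC
}

func (t *Thread) epoch() Epoch { return Epoch{t.ID, t.C[t.ID]} }

// Lock holds the vector time of its most recent release.
type Lock struct{ L VC }

// Acquire joins the lock's clock into the thread's clock (C_t := C_t ⊔ L_m).
func (t *Thread) Acquire(m *Lock) {
	for i := range t.C {
		if m.L[i] > t.C[i] {
			t.C[i] = m.L[i]
		}
	}
}

// Release publishes the thread's clock to the lock, then ticks the thread.
func (t *Thread) Release(m *Lock) {
	m.L = t.C
	t.C[t.ID]++
}

// VarState is per-variable shadow state: the last write epoch W, plus either
// one read epoch R or, once reads are concurrent, a read vector clock.
type VarState struct {
	W, R    Epoch
	RShared *VC
}

func (x *VarState) Read(t *Thread) error {
	if !x.W.before(&t.C) {
		return fmt.Errorf("write-read race on thread %d", t.ID)
	}
	e := t.epoch()
	switch {
	case x.RShared != nil:
		x.RShared[t.ID] = e.Clock
	case x.R.before(&t.C):
		x.R = e // reads still totally ordered: one epoch suffices
	default:
		vc := VC{}
		vc[x.R.TID], vc[t.ID] = x.R.Clock, e.Clock
		x.RShared = &vc // concurrent reads: inflate to a vector clock
	}
	return nil
}

func (x *VarState) Write(t *Thread) error {
	if !x.W.before(&t.C) {
		return fmt.Errorf("write-write race on thread %d", t.ID)
	}
	if x.RShared != nil {
		for tid, c := range x.RShared { // every earlier read must precede this write
			if c > t.C[tid] {
				return fmt.Errorf("read-write race on thread %d", t.ID)
			}
		}
		x.R, x.RShared = Epoch{}, nil // back to the cheap epoch representation
	} else if !x.R.before(&t.C) {
		return fmt.Errorf("read-write race on thread %d", t.ID)
	}
	x.W = t.epoch()
	return nil
}

func main() {
	t0 := &Thread{ID: 0, C: VC{1, 0}} // FastTrack starts each thread's own clock at 1
	t1 := &Thread{ID: 1, C: VC{0, 1}}
	var x, y VarState
	var mu Lock
	fmt.Println(x.Write(t0)) // <nil>
	fmt.Println(x.Write(t1)) // write-write race on thread 1 (no happens-before edge)
	fmt.Println(y.Write(t0)) // <nil>
	t0.Release(&mu)
	t1.Acquire(&mu)
	fmt.Println(y.Write(t1)) // <nil> (ordered by the release/acquire pair)
}
```

Most accesses only compare one epoch against one vector-clock entry. This O(1) check in the common case is what distinguishes FastTrack from detectors that compare full vector clocks on every access.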
Current status (v0.7.0):
- 359/359 tests from Go's official race test suite passing
- Works with `CGO_ENABLED=0`
- Standalone CLI tool: `racedetector build/run/test`
- All Go platforms supported (pure Go, no assembly required)
Performance optimizations:
- Lazy stack capture: 2.6x faster hot path (329ns → 127ns)
- Lock-free VarState: 10x faster concurrent access
- Sharded shadow memory: 12.5% faster under contention
- VectorClock pooling: 99.7% fewer allocations
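The sketch below illustrates two of the optimizations above: sharded shadow memory and VectorClock pooling. It is a hypothetical example and not the racedetector implementation. The shard count, the address-to-shard mapping, and the `cell` and `VectorClock` types are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// numShards is an assumed value; a power of two lets us mask instead of mod.
const numShards = 64

// cell is a hypothetical per-address shadow word (e.g. a packed last-write epoch).
type cell struct{ lastWrite uint64 }

// shard owns part of the address space behind its own lock, so goroutines
// touching different shards never contend with each other.
type shard struct {
	mu    sync.Mutex
	cells map[uintptr]*cell
}

type ShadowMemory struct{ shards [numShards]shard }

func NewShadowMemory() *ShadowMemory {
	s := &ShadowMemory{}
	for i := range s.shards {
		s.shards[i].cells = make(map[uintptr]*cell)
	}
	return s
}

// shardFor drops the low 3 bits so adjacent 8-byte words land in different shards.
func (s *ShadowMemory) shardFor(addr uintptr) *shard {
	return &s.shards[(addr>>3)&(numShards-1)]
}

// Update runs f on the cell for addr while holding only that shard's lock.
func (s *ShadowMemory) Update(addr uintptr, f func(*cell)) {
	sh := s.shardFor(addr)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	c, ok := sh.cells[addr]
	if !ok {
		c = &cell{}
		sh.cells[addr] = c
	}
	f(c)
}

// VectorClock is a hypothetical clock type; sync.Pool recycles clocks so that
// frequent clock copies (on release, fork, etc.) do not each allocate.
type VectorClock struct{ c []uint64 }

var vcPool = sync.Pool{New: func() any { return &VectorClock{c: make([]uint64, 0, 16)} }}

// getVC returns a zeroed clock of length n, reusing pooled storage when possible.
func getVC(n int) *VectorClock {
	vc := vcPool.Get().(*VectorClock)
	if cap(vc.c) < n {
		vc.c = make([]uint64, n)
	}
	vc.c = vc.c[:n]
	for i := range vc.c {
		vc.c[i] = 0
	}
	return vc
}

func putVC(vc *VectorClock) { vcPool.Put(vc) }

func main() {
	mem := NewShadowMemory()
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				mem.Update(uintptr(0x1000+8*i), func(c *cell) { c.lastWrite = uint64(g) })
			}
		}(g)
	}
	wg.Wait()

	total := 0
	for i := range mem.shards {
		total += len(mem.shards[i].cells)
	}
	fmt.Println(total) // 1000: each distinct address gets exactly one cell

	vc := getVC(4)
	vc.c[2] = 7
	putVC(vc)
	fmt.Println(getVC(4).c) // [0 0 0 0]: pooled clocks come back zeroed
}
```

The mask-based mapping assumes `numShards` is a power of two. Goroutines hammering a single hot address still meet on one shard, so any contention gain depends on how accesses spread across the address space.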
Real-world testing:
- Validated on gogpu/wgpu, a pure-Go WebGPU implementation that requires `CGO_ENABLED=0`
Repository: https://github.com/kolkov/racedetector
Questions for the community
Before pursuing a formal design document, I'd like to understand community needs:
1. Demand: Does the CGO dependency actually block you in production?
- 👍 Yes, this is a real blocker
- 👎 No, CGO_ENABLED=1 works fine for me
2. Integration preference:
- Standalone tool (current approach) is sufficient
- Would only use if built into `go build -race`
- Either option works
3. Performance expectations:
- The current ThreadSanitizer-based detector typically increases execution time by 2-20x and memory usage by 5-10x
- What overhead is acceptable for pure-Go implementation?
4. Missing features:
- What would prevent you from using this today?
Compatibility
- API: Uses the same `raceread`/`racewrite` semantics
- Behavior: Same detection capabilities as the current race detector
- Breaking changes: None; this is additive functionality
- New capability: `CGO_ENABLED=0` support
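Some context on what "the same `raceread`/`racewrite` semantics" refers to. When code is built with `go build -race`, the compiler inserts calls to runtime hooks such as `runtime.raceread` and `runtime.racewrite` around memory accesses, passing the address being accessed. The detector behind those hooks then decides whether the access races. The sketch below hand-instruments one function with hypothetical `raceRead`/`raceWrite` stand-ins to show where such calls go. The real compiler's placement of hooks and the exact runtime signatures may differ.

```go
package main

import (
	"fmt"
	"unsafe"
)

// access records one instrumented memory operation.
type access struct {
	addr  uintptr
	write bool
}

var trace []access

// raceRead and raceWrite are hypothetical stand-ins named after the runtime
// hooks that -race builds call (runtime.raceread, runtime.racewrite).
// A real detector would run its happens-before check here; this sketch only logs.
func raceRead(addr uintptr)  { trace = append(trace, access{addr, false}) }
func raceWrite(addr uintptr) { trace = append(trace, access{addr, true}) }

// copyInt is instrumented by hand the way a compiler pass would do it:
// a read hook before the load from src, a write hook before the store to dst.
func copyInt(dst, src *int) {
	raceRead(uintptr(unsafe.Pointer(src)))
	v := *src
	raceWrite(uintptr(unsafe.Pointer(dst)))
	*dst = v
}

func main() {
	a, b := 42, 0
	copyInt(&b, &a)
	fmt.Println(b, len(trace))                  // 42 2
	fmt.Println(trace[0].write, trace[1].write) // false true
}
```

Instrumentation happens at the same points whichever runtime sits behind these hooks. That is why a replacement runtime can, in principle, keep the compiler side unchanged.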
Next steps
Depending on community feedback:
- High demand: Prepare formal design document for Go team review
- Low demand: Continue as standalone tool
I'm happy to provide detailed benchmarks, demos, or additional technical information.
/cc @dvyukov (author of #9918, original ThreadSanitizer integrator)
/cc @aclements (runtime team lead)