Description
I don't really have specific recommendations here, but I wanted to report this troubleshooting session because it was the worst time I've ever had trying to troubleshoot Go program performance; I was at my wit's end trying to debug it. Go has a reputation for being easy to troubleshoot, and "I can find the problems quickly" is a big part of the reason I like the language, so some time spent thinking about and fixing these issues seems useful.
The details of the program are not that important, but I had a pretty simple program that checks for rows in a Postgres database, updates their status if it finds any, then makes an HTTP request with the contents of the database row. I tested it by writing a single database row once per second, with a single worker. Per the *DBStats reported by database/sql, I never had more than 4 database connections open at a time.
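For context, this is roughly how I was watching the connection pool; the DSN and logging interval here are placeholders, not my actual configuration:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq"
)

func main() {
	// Placeholder DSN for illustration only.
	db, err := sql.Open("postgres", "postgres://localhost/jobs_test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// ... start the worker(s) here ...

	// Log the connection pool stats once a minute; in my runs
	// OpenConnections never exceeded 4.
	for range time.Tick(time.Minute) {
		s := db.Stats()
		log.Printf("open=%d in_use=%d idle=%d", s.OpenConnections, s.InUse, s.Idle)
	}
}
```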
What I observed was high and growing RSS usage. In production this would result in the program being OOM killed after some length of time. Locally on my Mac, RAM use grew from 140MB to 850MB when I left it running overnight.
I tried all of the tricks I could find via blog posts (a sketch of the HTTP-related ones follows the list):
- disabling HTTP2 support
- toggling MADV_DONTNEED behavior (GODEBUG=madvdontneed=1)
- setting Connection: close on the HTTP Client
- manually inspecting the code for goroutine leaks (constant use of around 20 goroutines)
- manually inspecting the code for defers in for loops inside long-running functions, unbounded slice appends, etc. (none)
- ensuring HTTP bodies were fully consumed, DB queries closed after use, etc.
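For reference, here's a sketch of what the HTTP-side mitigations looked like, assuming a stock net/http client; the exact setup in my program differs, and the URL is a placeholder:

```go
package main

import (
	"crypto/tls"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"time"
)

// newClient builds a client with HTTP/2 disabled and keep-alives off,
// which puts "Connection: close" on every request.
func newClient() *http.Client {
	return &http.Client{
		Timeout: 30 * time.Second,
		Transport: &http.Transport{
			// A non-nil, empty TLSNextProto map disables HTTP/2 on this Transport.
			TLSNextProto:      map[string]func(string, *tls.Conn) http.RoundTripper{},
			DisableKeepAlives: true,
		},
	}
}

// drainAndClose fully consumes and closes a response body.
func drainAndClose(body io.ReadCloser) {
	io.Copy(ioutil.Discard, body)
	body.Close()
}

func main() {
	client := newClient()
	resp, err := client.Get("http://localhost:8080/health") // placeholder URL
	if err != nil {
		log.Fatal(err)
	}
	drainAndClose(resp.Body)
}
```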
None of these made a difference; the memory usage kept growing. Running pprof and looking at the runtime memory statistics, I was struck by the difference between the reported values and the actual observed memory usage. These numbers were taken after the program had been running all night, at which point Activity Monitor told me it was using 850MB of RAM.
# runtime.MemStats
# Alloc = 3410240
# TotalAlloc = 2834416608
# Sys = 213078288
# Lookups = 0
# Mallocs = 38955235
# Frees = 38943138
# HeapAlloc = 3410240
# HeapSys = 199819264
# HeapIdle = 194469888
# HeapInuse = 5349376
# HeapReleased = 192643072
# HeapObjects = 12097
# Stack = 1376256 / 1376256
# MSpan = 132600 / 196608
# MCache = 13888 / 16384
# BuckHashSys = 2045579
# GCSys = 8042776
# OtherSys = 1581421
# NextGC = 4194304
# LastGC = 1581710840991223000
# PauseNs = [...]
# PauseEnd = [...]
# NumGC = 1187
# NumForcedGC = 0
# GCCPUFraction = 4.524177278457185e-06
# DebugGC = false
Specifically, only about 200MB of "Sys" is allocated, and that number stayed pretty constant even though Activity Monitor reported the program was using about 850MB of RSS.
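(For anyone comparing their own numbers: the dump above came from the pprof heap endpoint, but a loop like the following, which only uses runtime.ReadMemStats, shows the same Sys/HeapReleased figures. This is a sketch, not my actual monitoring code; the RSS side still has to come from ps or Activity Monitor.)

```go
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	// Periodically log what the Go runtime thinks it holds, to compare
	// against the RSS the OS reports for the process.
	for range time.Tick(30 * time.Second) {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		log.Printf("Sys=%dMB HeapInuse=%dMB HeapReleased=%dMB NumGC=%d",
			m.Sys/1e6, m.HeapInuse/1e6, m.HeapReleased/1e6, m.NumGC)
	}
}
```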
Another odd thing was that the inuse_space heap profile only reported about 340kB of RAM usage. This sample was taken after reducing runtime.MemProfileRate to 512 and after leaving the program running all night.
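Lowering the sample rate is a one-line change, as long as it runs before the allocations you care about. Here's a sketch, with net/http/pprof serving the profile; the port is arbitrary and may not match my setup:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers
	"runtime"
)

func init() {
	// Sample one allocation per 512 bytes instead of the default 512KB,
	// so small, frequent allocations show up in the heap profile.
	runtime.MemProfileRate = 512
}

func main() {
	// Serve the pprof endpoints, then inspect with:
	//   go tool pprof -sample_index=inuse_space http://localhost:6060/debug/pprof/heap
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```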
(pprof) top20
Showing nodes accounting for 313.47kB, 91.83% of 341.36kB total
Dropped 100 nodes (cum <= 1.71kB)
Showing top 20 nodes out of 130
flat flat% sum% cum cum%
76.76kB 22.49% 22.49% 76.76kB 22.49% runtime.malg
68.02kB 19.93% 42.41% 68.02kB 19.93% bufio.NewReaderSize (inline)
54kB 15.82% 58.23% 54kB 15.82% github.com/rcrowley/go-metrics.newExpDecaySampleHeap (inline)
15.35kB 4.50% 62.73% 57.89kB 16.96% github.com/kevinburke/rickover/dequeuer.(*Dequeuer).Work
15.35kB 4.50% 67.23% 15.35kB 4.50% net/http.(*persistConn).roundTrip
14.80kB 4.34% 71.56% 15.90kB 4.66% database/sql.(*Rows).awaitDone
9.99kB 2.93% 74.49% 13.55kB 3.97% github.com/lib/pq.NewConnector
8.77kB 2.57% 77.06% 8.77kB 2.57% database/sql.(*Tx).awaitDone
8.28kB 2.43% 79.48% 69.90kB 20.48% github.com/lib/pq.(*Connector).open
8kB 2.34% 81.83% 8kB 2.34% bufio.NewWriterSize
8kB 2.34% 84.17% 8kB 2.34% hash/crc32.slicingMakeTable
5.24kB 1.53% 85.71% 10.23kB 3.00% github.com/lib/pq.(*conn).prepareTo
4kB 1.17% 86.88% 4kB 1.17% runtime.allgadd
I would have expected that number to be a lot higher.
Eventually I realized that I had been running this program with the race detector on - performance isn't super important, but correctness is, so I wanted to ensure I caught any unsafe reads or writes. Some searching around in the issue tracker revealed #26813, which seems to indicate that the race detector might not clean up correctly after defers and recovers.
Recompiling and running the program without the race detector seems to eliminate the leak. It would be great to have confirmation that #26813 is actually the problem, though. github.com/lib/pq uses recover and panic quite heavily to do fast stack unwinds - similar to the JSON package - but as far as I can tell from adding print statements to my test program, it never actually panics, so maybe just the heavy use of recover, plus my use of defer cancel() on each context, is enough to set it off.
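To be concrete about the pattern, each worker iteration is shaped roughly like the sketch below; the names, SQL, and DSN are illustrative rather than the actual rickover code:

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq"
)

// doWork is a simplified sketch of one worker iteration: a context with a
// deferred cancel, wrapping a call into lib/pq, which uses panic and
// recover internally even when nothing actually panics.
func doWork(db *sql.DB) error {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	var id string
	err := db.QueryRowContext(ctx,
		`UPDATE jobs SET status = 'in-progress'
		 WHERE id = (SELECT id FROM jobs WHERE status = 'queued' LIMIT 1)
		 RETURNING id`,
	).Scan(&id)
	if err != nil {
		return err // sql.ErrNoRows when there is no work to do
	}
	// ... make the HTTP request with the row contents here ...
	return nil
}

func main() {
	// Placeholder DSN; one worker polling once per second, matching the test setup.
	db, err := sql.Open("postgres", "postgres://localhost/jobs_test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	for range time.Tick(time.Second) {
		if err := doWork(db); err != nil && err != sql.ErrNoRows {
			log.Print(err)
		}
	}
}
```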
Look, it is fine if the race detector leaks memory, and I understand running programs in production for days on end with the race detector on is atypical. What bothered me more was how hard this was to track down, and how poorly the tooling did at identifying the actual problem. The only documentation on the runtime performance of the race detector states:
The cost of race detection varies by program, but for a typical program, memory usage may increase by 5-10x and execution time by 2-20x.
I read pretty much every blog post anyone had written on using pprof and none of them mentioned this issue, either.
It would be nice if runtime.MemStats had a section about memory allocated for the race detector, or if there was more awareness/documentation of this issue somewhere. I'm sure there are other possible solutions that I'm not aware of because I'm not super familiar with the runtime internals.