
runtime: long latency in mark assist #40225

Closed
choleraehyq opened this issue Jul 15, 2020 · 10 comments
Labels
- compiler/runtime: Issues related to the Go compiler and/or runtime.
- FrozenDueToAge
- GarbageCollector
- NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
- WaitingForInfo: Issue is not actionable because of missing required information, which needs to be provided.

Comments

@choleraehyq
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.15beta1 linux/amd64

Does this issue reproduce with the latest release?

Yes, 1.15beta1 is the latest release.

What operating system and processor architecture are you using (go env)?

Linux amd64

What did you do?

I upgraded one production service instance from Go 1.14.4 to Go 1.15beta1. It's a fairly complicated proxy-like service; at the moment I can't reproduce the problem with a simple test case.

What did you expect to see?

The service running as well as it did before.

What did you see instead?

The latency of this service instance increases sharply. I dumped a trace of the process and found that, after sweep termination, mark assist uses nearly all CPUs for about 20 ms:
[Screenshot 2020-07-15 15 01 12]
MMU (minimum mutator utilization):
[Screenshot 2020-07-15 15 16 15]
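An execution trace like the one above can be captured with the standard runtime/trace package and inspected with go tool trace; a minimal sketch, assuming a few lines can be added to the service:

package main

import (
	"log"
	"os"
	"runtime/trace"
	"time"
)

func main() {
	// Write an execution trace covering a few GC cycles to trace.out.
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	time.Sleep(5 * time.Second) // long enough to span several GC cycles here
	trace.Stop()
	// View with: go tool trace trace.out
}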
part of GC trace:

gc 1958 @1686.290s 2%: 0.73+32+0.12 ms clock, 17+422/185/0.46+2.9 ms cpu, 1538->1541->788 MB, 1577 MB goal, 24 P
gc 1959 @1687.261s 2%: 0.34+31+0.13 ms clock, 8.3+397/176/0+3.2 ms cpu, 1537->1541->795 MB, 1576 MB goal, 24 P
gc 1960 @1688.215s 2%: 0.60+31+0.12 ms clock, 14+408/176/0+2.8 ms cpu, 1551->1554->796 MB, 1590 MB goal, 24 P
gc 1961 @1689.211s 2%: 0.44+31+0.10 ms clock, 10+409/177/0.036+2.6 ms cpu, 1552->1556->806 MB, 1592 MB goal, 24 P
gc 1962 @1690.182s 2%: 0.78+32+0.16 ms clock, 18+440/185/0.006+3.8 ms cpu, 1572->1578->803 MB, 1613 MB goal, 24 P
gc 1963 @1691.177s 2%: 0.28+39+0.19 ms clock, 6.7+434/213/0.012+4.7 ms cpu, 1566->1569->802 MB, 1606 MB goal, 24 P
gc 1964 @1692.139s 2%: 2.0+37+0.14 ms clock, 48+447/211/0+3.4 ms cpu, 1564->1567->812 MB, 1604 MB goal, 24 P
gc 1965 @1693.148s 2%: 0.50+40+0.18 ms clock, 12+477/224/0+4.3 ms cpu, 1584->1587->821 MB, 1625 MB goal, 24 P
gc 1966 @1694.139s 2%: 0.98+41+0.15 ms clock, 23+405/234/0.20+3.7 ms cpu, 1602->1605->814 MB, 1642 MB goal, 24 P
gc 1967 @1695.135s 2%: 1.2+30+0.13 ms clock, 30+439/170/0+3.2 ms cpu, 1588->1592->799 MB, 1629 MB goal, 24 P
gc 1968 @1696.128s 2%: 2.5+36+0.14 ms clock, 61+463/203/0+3.3 ms cpu, 1559->1561->799 MB, 1598 MB goal, 24 P
gc 1969 @1697.089s 2%: 0.40+37+1.3 ms clock, 9.6+480/206/0+32 ms cpu, 1558->1561->795 MB, 1598 MB goal, 24 P
gc 1970 @1698.056s 2%: 9.5+30+1.1 ms clock, 230+509/222/0+26 ms cpu, 1550->1552->793 MB, 1590 MB goal, 24 P
gc 1971 @1699.022s 2%: 3.0+34+0.17 ms clock, 73+457/196/0+4.1 ms cpu, 1547->1550->793 MB, 1586 MB goal, 24 P
gc 1972 @1699.982s 2%: 0.62+34+0.11 ms clock, 15+446/195/0+2.8 ms cpu, 1546->1549->792 MB, 1586 MB goal, 24 P
gc 1973 @1700.970s 2%: 1.0+35+0.15 ms clock, 24+468/195/0+3.7 ms cpu, 1546->1548->790 MB, 1585 MB goal, 24 P
gc 1974 @1701.951s 2%: 0.64+32+0.098 ms clock, 15+431/175/0+2.3 ms cpu, 1542->1545->792 MB, 1581 MB goal, 24 P
gc 1975 @1702.916s 2%: 1.0+30+0.16 ms clock, 26+378/170/0+3.9 ms cpu, 1545->1548->794 MB, 1584 MB goal, 24 P
gc 1976 @1703.899s 2%: 0.86+31+0.10 ms clock, 20+401/164/0.009+2.4 ms cpu, 1548->1552->799 MB, 1588 MB goal, 24 P
gc 1977 @1704.875s 2%: 0.94+33+0.25 ms clock, 22+449/190/0+6.1 ms cpu, 1559->1562->791 MB, 1599 MB goal, 24 P
gc 1978 @1705.841s 2%: 0.25+34+0.19 ms clock, 6.1+439/193/0+4.7 ms cpu, 1544->1547->784 MB, 1583 MB goal, 24 P
gc 1979 @1706.801s 2%: 1.3+39+0.16 ms clock, 32+427/210/0+3.8 ms cpu, 1529->1533->785 MB, 1568 MB goal, 24 P
gc 1980 @1707.770s 2%: 1.0+31+0.19 ms clock, 24+355/179/0+4.7 ms cpu, 1532->1535->785 MB, 1571 MB goal, 24 P
gc 1981 @1708.732s 2%: 0.64+31+0.10 ms clock, 15+454/180/0+2.5 ms cpu, 1532->1535->786 MB, 1571 MB goal, 24 P
gc 1982 @1709.702s 2%: 0.30+27+0.097 ms clock, 7.2+357/151/0+2.3 ms cpu, 1534->1537->789 MB, 1573 MB goal, 24 P
gc 1983 @1710.669s 2%: 0.60+27+0.10 ms clock, 14+350/147/0+2.5 ms cpu, 1540->1543->789 MB, 1579 MB goal, 24 P
gc 1984 @1711.640s 2%: 0.66+28+0.27 ms clock, 16+405/161/0+6.5 ms cpu, 1538->1542->788 MB, 1578 MB goal, 24 P
gc 1985 @1712.588s 2%: 16+22+0.16 ms clock, 391+441/216/0.068+3.8 ms cpu, 1537->1540->789 MB, 1576 MB goal, 24 P
gc 1986 @1713.566s 2%: 0.35+29+0.13 ms clock, 8.5+419/170/0+3.2 ms cpu, 1539->1542->791 MB, 1578 MB goal, 24 P
gc 1987 @1714.545s 2%: 0.45+28+0.088 ms clock, 10+371/160/0.24+2.1 ms cpu, 1543->1545->790 MB, 1582 MB goal, 24 P
gc 1988 @1715.523s 2%: 0.37+29+0.12 ms clock, 8.9+388/166/0.51+3.1 ms cpu, 1540->1544->792 MB, 1580 MB goal, 24 P
gc 1989 @1716.481s 2%: 0.29+31+0.13 ms clock, 7.0+420/178/0.10+3.1 ms cpu, 1545->1548->791 MB, 1584 MB goal, 24 P
gc 1990 @1717.439s 2%: 0.36+33+0.14 ms clock, 8.7+425/191/0.064+3.4 ms cpu, 1543->1546->791 MB, 1582 MB goal, 24 P
gc 1991 @1718.417s 2%: 0.26+30+0.21 ms clock, 6.3+346/174/0+5.2 ms cpu, 1543->1546->790 MB, 1582 MB goal, 24 P
gc 1992 @1719.391s 2%: 1.6+34+0.18 ms clock, 40+415/178/0+4.3 ms cpu, 1542->1544->790 MB, 1581 MB goal, 24 P
gc 1993 @1720.356s 2%: 0.27+33+0.37 ms clock, 6.5+414/180/0.40+8.9 ms cpu, 1542->1545->791 MB, 1581 MB goal, 24 P
gc 1994 @1721.330s 2%: 0.29+33+0.16 ms clock, 7.1+470/194/0.067+3.9 ms cpu, 1544->1547->793 MB, 1583 MB goal, 24 P
gc 1995 @1722.308s 2%: 0.33+36+0.14 ms clock, 7.9+501/209/0+3.4 ms cpu, 1547->1551->790 MB, 1586 MB goal, 24 P
gc 1996 @1723.280s 2%: 1.8+36+0.14 ms clock, 45+432/206/0+3.4 ms cpu, 1542->1545->793 MB, 1581 MB goal, 24 P
gc 1997 @1724.249s 2%: 0.37+35+0.82 ms clock, 8.9+380/195/0+19 ms cpu, 1547->1550->795 MB, 1586 MB goal, 24 P
gc 1998 @1725.216s 2%: 0.29+34+0.10 ms clock, 7.1+402/190/0.22+2.4 ms cpu, 1551->1553->793 MB, 1591 MB goal, 24 P
gc 1999 @1726.207s 2%: 0.37+34+0.13 ms clock, 8.8+411/196/0+3.3 ms cpu, 1547->1550->788 MB, 1586 MB goal, 24 P
gc 2000 @1727.180s 2%: 0.29+28+0.10 ms clock, 6.9+391/155/0.070+2.5 ms cpu, 1538->1542->794 MB, 1577 MB goal, 24 P
gc 2001 @1728.144s 2%: 0.71+24+0.093 ms clock, 17+351/138/0+2.2 ms cpu, 1548->1551->805 MB, 1588 MB goal, 24 P
gc 2002 @1729.131s 2%: 0.33+40+0.19 ms clock, 8.1+501/226/0+4.5 ms cpu, 1572->1576->809 MB, 1611 MB goal, 24 P
gc 2003 @1730.113s 2%: 0.33+33+0.16 ms clock, 8.0+431/190/0+3.8 ms cpu, 1578->1581->795 MB, 1618 MB goal, 24 P
gc 2004 @1731.079s 2%: 1.3+36+0.18 ms clock, 31+478/203/0+4.4 ms cpu, 1551->1555->796 MB, 1591 MB goal, 24 P
gc 2005 @1732.043s 2%: 0.36+32+0.11 ms clock, 8.8+454/189/0+2.6 ms cpu, 1552->1555->795 MB, 1592 MB goal, 24 P
gc 2006 @1733.020s 2%: 0.29+33+0.31 ms clock, 7.1+386/187/0+7.4 ms cpu, 1552->1555->795 MB, 1591 MB goal, 24 P
gc 2007 @1733.966s 2%: 0.33+40+0.10 ms clock, 8.0+498/230/0.12+2.5 ms cpu, 1551->1555->796 MB, 1591 MB goal, 24 P
gc 2008 @1734.935s 2%: 3.3+32+0.13 ms clock, 81+458/180/0+3.1 ms cpu, 1553->1556->796 MB, 1592 MB goal, 24 P
gc 2009 @1735.898s 2%: 15+24+0.18 ms clock, 371+515/220/0+4.4 ms cpu, 1553->1556->794 MB, 1592 MB goal, 24 P
gc 2010 @1736.868s 2%: 0.50+32+0.15 ms clock, 12+448/181/0+3.8 ms cpu, 1549->1552->795 MB, 1588 MB goal, 24 P
gc 2011 @1737.833s 2%: 1.6+33+0.10 ms clock, 40+384/186/0.55+2.6 ms cpu, 1551->1553->791 MB, 1590 MB goal, 24 P
gc 2012 @1738.805s 2%: 0.73+41+0.18 ms clock, 17+550/233/0+4.3 ms cpu, 1543->1546->795 MB, 1582 MB goal, 24 P
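For reference, the gctrace format (documented in the runtime package) reads as follows. In a line such as

gc 1958 @1686.290s 2%: 0.73+32+0.12 ms clock, 17+422/185/0.46+2.9 ms cpu, 1538->1541->788 MB, 1577 MB goal, 24 P

the clock field is STW sweep termination + concurrent mark/scan + STW mark termination; the cpu field is sweep-termination CPU + assist/background/idle CPU during mark/scan + mark-termination CPU, so here 422 ms went to assists against 185 ms of background marking; 1538->1541->788 MB is the heap size at GC start, at GC end, and the live heap; 1577 MB is the heap goal; 24 P is the number of processors.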
@ALTree ALTree added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 15, 2020
@ALTree
Member

ALTree commented Jul 15, 2020

cc @aclements @mknyszek

@mknyszek
Contributor

mknyszek commented Jul 15, 2020

@choleraehyq thanks for the detailed report!

I'll take a look through changes that were made in the runtime that could be relevant, since I can't think of what could be causing this off the top of my head.

Judging by the gctrace, GOGC is 100. But, my main observation is that GCs frequently seem to start too early. For instance, look at GC#2000. At the end of that GC we had 794 MiB live, setting the goal for GC#2001 at 1588 MiB. However, if you look at when a GC started and ended in that cycle, it started at heap size 1548 MiB and ended at 1551 MiB. This pattern is consistent across nearly all pairs of GCs in the trace you posted. This pattern indicates to me that the pacer thinks there's a lot of work to do and has driven the GCs to start earlier, but they're ending earlier than expected. Can you easily produce a comparable gctrace for Go 1.14.4? It would be good to see how the behavior has changed (aside from just mark assist really taking over CPU cycles during GC). Also running with GODEBUG=gctrace=1,gcpacertrace=1 would give us more insight into where the pacer is at during all of this.
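
To make the goal arithmetic concrete, here is a minimal sketch of the numbers above, assuming GOGC=100 (an inference from the trace, not something confirmed by the reporter):

package main

import "fmt"

func main() {
	const gogc = 100.0
	live := 794.0 // MiB live after GC#2000, from the gctrace
	goal := live * (1 + gogc/100)
	fmt.Printf("heap goal for GC#2001: %.0f MiB\n", goal) // 1588 MiB
	// The gctrace shows GC#2001 triggering at 1548 MiB and finishing at
	// 1551 MiB, well short of the 1588 MiB goal: the cycle starts early
	// and ends early, exactly the pattern described above.
}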

My other observation is that we're spending a comparatively high amount of CPU time in mark assists vs. background marking (roughly 410 ms vs. 180 ms on average, as a very rough guess).

@choleraehyq
Contributor Author

@mknyszek Thanks for your response. I'll produce GODEBUG=gctrace=1,gcpacertrace=1 output for Go 1.14.4 and Go 1.15beta1 later.

@choleraehyq
Contributor Author

@mknyszek
1.15beta1:

gc 26471 @17546.913s 1%: 0.14+9.6+0.13 ms clock, 3.3+107/53/0.017+3.2 ms cpu, 789->792->405 MB, 809 MB goal, 24 P
pacer: sweep done at heap size 439MB; allocated 34MB during sweep; swept 101740 pages at +2.525911e-004 pages/byte
pacer: assist ratio=+5.194823e+000 (scan 210 MB in 790->810 MB) workers=6++0.000000e+000
pacer: H_m_prev=425088328 h_t=+9.500000e-001 H_T=828922239 h_a=+9.587276e-001 H_a=832632240 h_g=+1.000000e+000 H_g=850176656 u_a=+6.515289e-001 u_g=+3.000000e-001 W_a=33715288 goalΔ=+5.000000e-002 actualΔ=+8.727599e-003 u_a/u_g=+2.171763e+000
gc 26472 @17547.569s 1%: 0.11+13+0.13 ms clock, 2.7+135/67/0.72+3.1 ms cpu, 790->794->410 MB, 810 MB goal, 24 P
pacer: sweep done at heap size 447MB; allocated 37MB during sweep; swept 101956 pages at +2.501523e-004 pages/byte
pacer: assist ratio=+5.233032e+000 (scan 212 MB in 800->820 MB) workers=6++0.000000e+000
pacer: H_m_prev=430130792 h_t=+9.500000e-001 H_T=838755044 h_a=+9.562688e-001 H_a=841451432 h_g=+1.000000e+000 H_g=860261584 u_a=+6.476701e-001 u_g=+3.000000e-001 W_a=32354528 goalΔ=+5.000000e-002 actualΔ=+6.268762e-003 u_a/u_g=+2.158900e+000
gc 26473 @17548.270s 1%: 0.14+11+0.11 ms clock, 3.5+106/61/0.45+2.7 ms cpu, 800->802->404 MB, 820 MB goal, 24 P
pacer: sweep done at heap size 441MB; allocated 37MB during sweep; swept 103090 pages at +2.564391e-004 pages/byte
pacer: assist ratio=+5.187073e+000 (scan 208 MB in 789->809 MB) workers=6++0.000000e+000
pacer: H_m_prev=424267784 h_t=+9.500000e-001 H_T=827322178 h_a=+9.556830e-001 H_a=829733288 h_g=+1.000000e+000 H_g=848535568 u_a=+6.504157e-001 u_g=+3.000000e-001 W_a=32310320 goalΔ=+5.000000e-002 actualΔ=+5.682989e-003 u_a/u_g=+2.168052e+000
gc 26474 @17548.918s 1%: 0.14+11+0.10 ms clock, 3.4+114/64/0+2.4 ms cpu, 789->791->405 MB, 809 MB goal, 24 P
pacer: sweep done at heap size 443MB; allocated 38MB during sweep; swept 101644 pages at +2.525618e-004 pages/byte
pacer: assist ratio=+5.184743e+000 (scan 209 MB in 789->810 MB) workers=6++0.000000e+000
pacer: H_m_prev=424737464 h_t=+9.500000e-001 H_T=828238054 h_a=+9.580154e-001 H_a=831642504 h_g=+1.000000e+000 H_g=849474928 u_a=+6.816478e-001 u_g=+3.000000e-001 W_a=32395728 goalΔ=+5.000000e-002 actualΔ=+8.015420e-003 u_a/u_g=+2.272159e+000
gc 26475 @17549.568s 1%: 0.15+11+0.12 ms clock, 3.7+125/62/0.23+2.9 ms cpu, 789->793->405 MB, 810 MB goal, 24 P
pacer: sweep done at heap size 443MB; allocated 37MB during sweep; swept 101825 pages at +2.529064e-004 pages/byte
pacer: assist ratio=+5.206536e+000 (scan 210 MB in 790->810 MB) workers=6++0.000000e+000
pacer: H_m_prev=424913632 h_t=+9.500000e-001 H_T=828581582 h_a=+9.583943e-001 H_a=832148440 h_g=+1.000000e+000 H_g=849827264 u_a=+6.435652e-001 u_g=+3.000000e-001 W_a=32494416 goalΔ=+5.000000e-002 actualΔ=+8.394312e-003 u_a/u_g=+2.145217e+000
gc 26476 @17550.240s 1%: 0.17+10+0.17 ms clock, 4.2+102/57/0+4.1 ms cpu, 790->793->405 MB, 810 MB goal, 24 P
pacer: sweep done at heap size 440MB; allocated 34MB during sweep; swept 102081 pages at +2.532855e-004 pages/byte
pacer: assist ratio=+5.187335e+000 (scan 209 MB in 791->811 MB) workers=6++0.000000e+000
pacer: H_m_prev=425343136 h_t=+9.500000e-001 H_T=829419115 h_a=+9.592915e-001 H_a=833371192 h_g=+1.000000e+000 H_g=850686272 u_a=+5.246262e-001 u_g=+3.000000e-001 W_a=33006632 goalΔ=+5.000000e-002 actualΔ=+9.291502e-003 u_a/u_g=+1.748754e+000
gc 26477 @17550.907s 1%: 0.15+16+0.11 ms clock, 3.7+113/89/0.41+2.8 ms cpu, 791->794->407 MB, 811 MB goal, 24 P
pacer: sweep done at heap size 452MB; allocated 45MB during sweep; swept 102015 pages at +2.522713e-004 pages/byte
pacer: assist ratio=+5.203425e+000 (scan 210 MB in 793->814 MB) workers=6++0.000000e+000
pacer: H_m_prev=426773288 h_t=+9.500000e-001 H_T=832207911 h_a=+9.552392e-001 H_a=834443872 h_g=+1.000000e+000 H_g=853546576 u_a=+7.531415e-001 u_g=+3.000000e-001 W_a=32474776 goalΔ=+5.000000e-002 actualΔ=+5.239223e-003 u_a/u_g=+2.510472e+000
gc 26478 @17551.574s 1%: 0.17+10+0.10 ms clock, 4.1+128/58/0+2.4 ms cpu, 793->795->403 MB, 814 MB goal, 24 P
pacer: sweep done at heap size 440MB; allocated 36MB during sweep; swept 102118 pages at +2.544717e-004 pages/byte
pacer: assist ratio=+5.213050e+000 (scan 209 MB in 787->807 MB) workers=6++0.000000e+000
pacer: H_m_prev=423518616 h_t=+9.500000e-001 H_T=825861301 h_a=+9.573207e-001 H_a=828961760 h_g=+1.000000e+000 H_g=847037232 u_a=+6.936951e-001 u_g=+3.000000e-001 W_a=32706768 goalΔ=+5.000000e-002 actualΔ=+7.320714e-003 u_a/u_g=+2.312317e+000
gc 26479 @17552.243s 1%: 0.17+11+0.12 ms clock, 4.1+125/65/0+2.9 ms cpu, 787->790->406 MB, 807 MB goal, 24 P
pacer: sweep done at heap size 441MB; allocated 35MB during sweep; swept 101713 pages at +2.521361e-004 pages/byte
pacer: assist ratio=+5.184178e+000 (scan 209 MB in 791->812 MB) workers=6++0.000000e+000
pacer: H_m_prev=425740744 h_t=+9.500000e-001 H_T=830194450 h_a=+9.596009e-001 H_a=834281960 h_g=+1.000000e+000 H_g=851481488 u_a=+6.086897e-001 u_g=+3.000000e-001 W_a=32872696 goalΔ=+5.000000e-002 actualΔ=+9.600935e-003 u_a/u_g=+2.028966e+000
gc 26480 @17552.894s 1%: 0.21+13+0.099 ms clock, 5.2+117/75/0+2.3 ms cpu, 791->795->404 MB, 812 MB goal, 24 P
pacer: sweep done at heap size 446MB; allocated 41MB during sweep; swept 102090 pages at +2.537692e-004 pages/byte
pacer: assist ratio=+5.190716e+000 (scan 208 MB in 789->809 MB) workers=6++0.000000e+000
pacer: H_m_prev=424571784 h_t=+9.500000e-001 H_T=827914978 h_a=+9.571176e-001 H_a=830936904 h_g=+1.000000e+000 H_g=849143568 u_a=+5.606337e-001 u_g=+3.000000e-001 W_a=32750248 goalΔ=+5.000000e-002 actualΔ=+7.117584e-003 u_a/u_g=+1.868779e+000

1.14.4:

gc 27831 @17894.457s 1%: 0.069+9.3+0.13 ms clock, 1.6+75/52/43+3.1 ms cpu, 774->777->398 MB, 794 MB goal, 24 P
pacer: sweep done at heap size 433MB; allocated 34MB during sweep; swept 99747 pages at +2.519551e-004 pages/byte
pacer: assist ratio=+5.289955e+000 (scan 209 MB in 777->796 MB) workers=6++0.000000e+000
pacer: H_m_prev=417832176 h_t=+9.500000e-001 H_T=814772743 h_a=+9.569395e-001 H_a=817672280 h_g=+1.000000e+000 H_g=835664352 u_a=+6.598878e-001 u_g=+3.000000e-001 W_a=32201064 goalΔ=+5.000000e-002 actualΔ=+6.939477e-003 u_a/u_g=+2.199626e+000
gc 27832 @17895.117s 1%: 0.11+10+0.13 ms clock, 2.8+102/57/0+3.2 ms cpu, 777->779->399 MB, 796 MB goal, 24 P
pacer: sweep done at heap size 431MB; allocated 32MB during sweep; swept 100263 pages at +2.525574e-004 pages/byte
pacer: assist ratio=+5.261804e+000 (scan 209 MB in 779->799 MB) workers=6++0.000000e+000
pacer: H_m_prev=418989008 h_t=+9.500000e-001 H_T=817028565 h_a=+9.589195e-001 H_a=820765752 h_g=+1.000000e+000 H_g=837978016 u_a=+7.430334e-001 u_g=+3.000000e-001 W_a=32348800 goalΔ=+5.000000e-002 actualΔ=+8.919533e-003 u_a/u_g=+2.476778e+000
gc 27833 @17895.796s 1%: 0.081+10+0.086 ms clock, 1.9+126/59/0.25+2.0 ms cpu, 779->782->400 MB, 799 MB goal, 24 P
pacer: sweep done at heap size 433MB; allocated 32MB during sweep; swept 100573 pages at +2.525820e-004 pages/byte
pacer: assist ratio=+5.288461e+000 (scan 210 MB in 781->801 MB) workers=6++0.000000e+000
pacer: H_m_prev=420240192 h_t=+9.500000e-001 H_T=819468374 h_a=+9.560798e-001 H_a=822023352 h_g=+1.000000e+000 H_g=840480384 u_a=+7.193190e-001 u_g=+3.000000e-001 W_a=32212664 goalΔ=+5.000000e-002 actualΔ=+6.079803e-003 u_a/u_g=+2.397730e+000
gc 27834 @17896.491s 1%: 0.089+10+0.10 ms clock, 2.1+116/58/0+2.5 ms cpu, 781->783->400 MB, 801 MB goal, 24 P
pacer: sweep done at heap size 436MB; allocated 35MB during sweep; swept 100785 pages at +2.530934e-004 pages/byte
pacer: assist ratio=+5.264838e+000 (scan 209 MB in 781->801 MB) workers=6++0.000000e+000
pacer: H_m_prev=420274976 h_t=+9.500000e-001 H_T=819536203 h_a=+9.582013e-001 H_a=822983000 h_g=+1.000000e+000 H_g=840549952 u_a=+6.268437e-001 u_g=+3.000000e-001 W_a=32558864 goalΔ=+5.000000e-002 actualΔ=+8.201290e-003 u_a/u_g=+2.089479e+000
gc 27835 @17897.158s 1%: 0.12+11+0.10 ms clock, 3.0+101/63/0+2.4 ms cpu, 781->784->399 MB, 801 MB goal, 24 P
pacer: sweep done at heap size 438MB; allocated 39MB during sweep; swept 100706 pages at +2.538344e-004 pages/byte
pacer: assist ratio=+5.265186e+000 (scan 209 MB in 778->798 MB) workers=6++0.000000e+000
pacer: H_m_prev=418723736 h_t=+9.500000e-001 H_T=816511285 h_a=+9.558774e-001 H_a=818972296 h_g=+1.000000e+000 H_g=837447472 u_a=+7.413076e-001 u_g=+3.000000e-001 W_a=32383576 goalΔ=+5.000000e-002 actualΔ=+5.877409e-003 u_a/u_g=+2.471025e+000
gc 27836 @17897.841s 1%: 0.25+10+0.12 ms clock, 6.0+122/57/0+2.9 ms cpu, 778->781->399 MB, 798 MB goal, 24 P
pacer: sweep done at heap size 434MB; allocated 34MB during sweep; swept 100310 pages at +2.524687e-004 pages/byte
pacer: assist ratio=+5.262858e+000 (scan 209 MB in 779->799 MB) workers=6++0.000000e+000
pacer: H_m_prev=419331768 h_t=+9.500000e-001 H_T=817696947 h_a=+9.569871e-001 H_a=820626864 h_g=+1.000000e+000 H_g=838663536 u_a=+7.384108e-001 u_g=+3.000000e-001 W_a=32127416 goalΔ=+5.000000e-002 actualΔ=+6.987108e-003 u_a/u_g=+2.461369e+000
gc 27837 @17898.497s 1%: 0.099+9.4+0.066 ms clock, 2.3+112/53/3.6+1.6 ms cpu, 779->782->398 MB, 799 MB goal, 24 P
pacer: sweep done at heap size 430MB; allocated 32MB during sweep; swept 100490 pages at +2.536468e-004 pages/byte
pacer: assist ratio=+5.246158e+000 (scan 207 MB in 777->797 MB) workers=6++0.000000e+000
pacer: H_m_prev=418136288 h_t=+9.500000e-001 H_T=815365761 h_a=+9.565709e-001 H_a=818113304 h_g=+1.000000e+000 H_g=836272576 u_a=+6.429185e-001 u_g=+3.000000e-001 W_a=32098272 goalΔ=+5.000000e-002 actualΔ=+6.570926e-003 u_a/u_g=+2.143062e+000
gc 27838 @17899.159s 1%: 0.11+11+0.12 ms clock, 2.8+113/55/0.85+2.9 ms cpu, 777->780->397 MB, 797 MB goal, 24 P
pacer: sweep done at heap size 433MB; allocated 36MB during sweep; swept 100103 pages at +2.534873e-004 pages/byte
pacer: assist ratio=+5.293898e+000 (scan 208 MB in 775->794 MB) workers=6++0.000000e+000
pacer: H_m_prev=416791608 h_t=+9.500000e-001 H_T=812743635 h_a=+9.582346e-001 H_a=816175768 h_g=+1.000000e+000 H_g=833583216 u_a=+6.642465e-001 u_g=+3.000000e-001 W_a=32246360 goalΔ=+5.000000e-002 actualΔ=+8.234649e-003 u_a/u_g=+2.214155e+000
gc 27839 @17899.835s 1%: 0.10+10+0.087 ms clock, 2.5+109/61/0.30+2.0 ms cpu, 775->778->399 MB, 794 MB goal, 24 P
pacer: sweep done at heap size 432MB; allocated 33MB during sweep; swept 100084 pages at +2.523202e-004 pages/byte
pacer: assist ratio=+5.276543e+000 (scan 210 MB in 778->798 MB) workers=6++0.000000e+000
pacer: H_m_prev=418635096 h_t=+9.500000e-001 H_T=816338437 h_a=+9.571561e-001 H_a=819334224 h_g=+1.000000e+000 H_g=837270192 u_a=+7.323980e-001 u_g=+3.000000e-001 W_a=32012560 goalΔ=+5.000000e-002 actualΔ=+7.156081e-003 u_a/u_g=+2.441327e+000
gc 27840 @17900.532s 1%: 0.074+9.3+0.10 ms clock, 1.7+109/52/3.8+2.5 ms cpu, 778->781->398 MB, 798 MB goal, 24 P
pacer: sweep done at heap size 431MB; allocated 33MB during sweep; swept 100452 pages at +2.536920e-004 pages/byte
pacer: assist ratio=+5.290418e+000 (scan 209 MB in 777->797 MB) workers=6++0.000000e+000
pacer: H_m_prev=417904296 h_t=+9.500000e-001 H_T=814913377 h_a=+9.583837e-001 H_a=818416968 h_g=+1.000000e+000 H_g=835808592 u_a=+6.339392e-001 u_g=+3.000000e-001 W_a=32590352 goalΔ=+5.000000e-002 actualΔ=+8.383716e-003 u_a/u_g=+2.113131e+000
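
A quick key to the gcpacertrace fields above (these come from the pacer's debug print in runtime/mgc.go): H_m_prev is the heap marked at the end of the previous cycle; h_t and H_T are the trigger ratio and the heap size at which marking started; h_a and H_a are the actual ratio and heap size when marking finished; h_g and H_g are the goal ratio (GOGC/100) and the goal heap size; u_a and u_g are the actual and goal CPU utilization during marking; W_a is the scan work performed, in bytes.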

@mknyszek
Contributor

Thanks for the quick turnaround!

Ah, OK. I see what's going on re: early GCs. GCs appear to be relatively cheap for your application, so we're maxing out on how late we can start a GC. (This matches h_t sitting at +9.5e-001 in both traces, which is the pacer's trigger-ratio cap of 0.95 × GOGC/100.) This happens in both 1.14 and 1.15, so that on its own shouldn't be the problem. Otherwise, at first glance, these two traces look very similar. I'll take a closer look.

@mknyszek
Contributor

Sorry for the delayed reply.

Taking a closer look at both the traces, I don't see any significant differences that would lead to what you see in the execution trace. CPU time spent in assists and background GC are similar in both, though there's a lot more time spent in assists than in background GC. This suggests to me that your application is allocating fairly heavily, but there isn't much work for the GC to do (either there are just a few big allocations, there isn't much depth in the points-to graph, or most of the heap just doesn't survive). Maybe one difference is that u_a seems a little higher in Go 1.14, suggesting that perhaps assists are more effective in Go 1.14? But, it fluctuates a lot so it's hard to tell. It's notable though that u_a (the actual utilization) is 2x higher than the goal utilization (u_g) which is kind of surprising, and does align with all the time spent in assists. I feel like there may be a broader pacing bug here? The GC should be trying to bring down the utilization in both 1.14.4 and 1.15 by making the GCs longer (see "Goal 2." under "Optimization Goals" in https://docs.google.com/document/d/1wmjrocXIWTr1JxU-3EQBI6BK6KgtiFArkG47XK73xIQ/edit#).
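
For context, the utilization targets referenced here are fixed constants in the Go 1.14/1.15 runtime; the values below are copied from runtime/mgc.go, with comments paraphrased:

const (
	gcGoalUtilization       = 0.30 // u_g: goal CPU fraction for marking
	gcBackgroundUtilization = 0.25 // fixed budget for dedicated + fractional workers
)

The roughly 5% gap between the two is all the pacer expects assists to contribute, so an actual utilization u_a around 0.65 implies assists are burning about 0.40 of CPU where the pacer budgets about 0.05.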

@randall77 @aclements any thoughts on this?

Just to be totally clear, is that execution trace screencap you included reproducible in Go 1.14 or is this issue totally new? The data in each GC trace suggests that this should be reproducible.

@choleraehyq
Contributor Author

@mknyszek I took a look at the execution trace for Go 1.14; it's indeed quite similar to Go 1.15. But my application latency is genuinely higher with Go 1.15, with the same application code and workload. I'll dig into it later.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@mknyszek mknyszek moved this to Triage Backlog in Go Compiler / Runtime Jul 15, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 27, 2022
nvanbenschoten added a commit to nvanbenschoten/go that referenced this issue Sep 19, 2022
At a high level, the runtime garbage collector can impact user goroutine
latency in two ways. The first is that it pauses all goroutines during
its stop-the-world sweep termination and mark termination phases. The
second is that it backpressures memory allocations by instructing user
goroutines to assist with scanning and marking in response to a high
allocation rate.

There is plenty of observability into the first of these sources of
user-visible latency. There is significantly less observability into the
second. As a result, it is often more difficult to diagnose latency
problems due to over-assist (e.g. golang#14812, golang#27732, golang#40225). To this
point, the ways to determine that GC assist was a problem were to use
execution tracing or to use GODEBUG=gctrace=1 tracing, neither of which
is easy to access programmatically in a running system. CPU profiles
also give some insight, but are rarely as instructive as one might
expect because heavy GC assist time is scattered across a profile.
Notice even in https://tip.golang.org/doc/gc-guide, the guidance on
recognizing and remedying performance problems due to GC assist is
sparse.

This commit adds a counter to the MemStats and GCStats structs called
AssistTotalNs, which tracks the cumulative nanoseconds in GC assist
since the program started. This provides a new form of observability
into GC assist delays, and one that can be manipulated programmatically.

There's more work to be done in this area. This feels like a reasonable
first step.
@gopherbot
Contributor

Change https://go.dev/cl/431877 mentions this issue: runtime: export total GC Assist ns in MemStats and GCStats
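
Assist time did eventually become observable programmatically: since Go 1.20, runtime/metrics exposes cumulative assist CPU time as /cpu/classes/gc/mark/assist:cpu-seconds. A minimal sketch of reading it:

package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	const name = "/cpu/classes/gc/mark/assist:cpu-seconds"
	sample := []metrics.Sample{{Name: name}}
	metrics.Read(sample)
	if sample[0].Value.Kind() == metrics.KindFloat64 {
		fmt.Printf("cumulative GC assist CPU: %.3fs\n", sample[0].Value.Float64())
	}
}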

@mknyszek
Contributor

Mark assists should have gone down significantly in a variety of cases thanks to the pacer changes in Go 1.18 (specifically, the target mark assist percentage is now 0). If you have a chance @choleraehyq, please double-check! I'm inclined to mark this as resolved for now.

@mknyszek mknyszek added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 19, 2022
@choleraehyq
Contributor Author

@mknyszek I don't find any latency problem now, thanks!

Repository owner moved this from Triage Backlog to Done in Go Compiler / Runtime Sep 22, 2022
@golang golang locked and limited conversation to collaborators Sep 22, 2023