runtime: CL 501976 performance and jitter regression #65647
I need some more specific instructions to reproduce.
That last build produces some strange results:
Is there some …
The full project uses third-party C libraries. They aren't used when running the performance measurement but are needed for compilation (the libraries are used for the GUI, and we don't need a GUI when measuring). I'll prepare a minimal project tree that only does the performance measurement, to make it easier to compile and test.
The Cosmcark.bin file is in https://github.com/JetSetIlly/gopher2600_performance_profiles
@JetSetIlly since you already have everything set up, can you benchmark …
@randall77 If you clone this project now and run 'go build .' it should compile without requiring any external C libraries (i.e. no SDL or OpenGL). There's also no need to worry about the Cosmcark.bin file. https://github.com/JetSetIlly/gopher2600_performance_profiles The measurements are slightly different from the previous results, but the point is that the measurement for 1.22.0 is consistently and significantly lower than for 1.21.7. @Jorropo I tried that and the results are the same/similar to 1.21.7.
I am not seeing any significant performance differences between 1.21.6 and 1.22.0 (and tip, for that matter).
model name : 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
@randall77 Thanks for looking at this. If we take an average of the runs for 1.21.6 and 1.22.0 then there's a drop of 1.86% in performance. I would say that's significant; at least, I've never seen drops in performance like this between Go versions before. I would agree that it's hard to get consistent measurements, but the drop between the two versions is itself consistent. On my hardware (Intel(R) Core(TM) i3-3225 CPU @ 3.30GHz) the drop is a lot larger, as previously stated. The test runs for 5 minutes but it doesn't need to run that long. On line 22 of main.go there's a string specifying the duration; it's currently "5m". You can always lower that if it makes testing easier. If there's anything I can do to reduce the benchmark noise, let me know.
Sure, but the run-to-run variance can be as much as 10%. It's not clear to me that the 1.86% drop is real. We'd need a lot more runs to have any confidence that the 1.86% drop isn't just noise.
This is a possible cause. That chip is quite a bit older than the one I'm using. (~12 years vs ~4 years old.)
It would help a lot with investigations if we could get the benchmark variance down. Can you make any progress on that front? The benchmark appears single-threaded and doesn't allocate much, so I'm kind of confused as to where the variance might be coming from. (It is even possible the source of variance and the source of slowdown are related somehow.)
I understand. My view of the code is very different to yours, of course, so I'm perhaps more sensitive to the variation. I'll do some more runs to assure myself that there is a real effect here.
This CPU does have POPCNT
This is interesting. If I run with GOMAXPROCS=1 then the variation between the versions goes away. I'll do a lot more runs/tests here, but with a small sample I can see very clearly that with v1.22.0, GOMAXPROCS=1 performs the same as 1.21.7, while with GOMAXPROCS=n (i.e. not set) the performance is lower by an average of 6%. With 1.21.7, the performance is the same regardless of the GOMAXPROCS value.
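For anyone reproducing this, setting the environment variable (GOMAXPROCS=1 ./binary) is all it takes. The equivalent from inside a Go program, as a minimal sketch (not part of the actual test program), is:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Pin the runtime to a single P before the measurement loop starts;
	// equivalent to running the binary with GOMAXPROCS=1 set.
	prev := runtime.GOMAXPROCS(1) // returns the previous setting
	fmt.Printf("GOMAXPROCS lowered from %d to 1\n", prev)
	// ... run the emulation measurement loop here ...
}
```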
I've updated the test program to output a rating every second, running for a total of one minute. This gives a clearer picture of what's happening, I think. An example run of the four variations shows that 1.21.7 is consistent regardless of GOMAXPROCS, but that with 1.22.0 there is a significant difference.
Old test program, go1.22.0. My machine is an R7 7840HS with GOAMD64=v3.
@JetSetIlly I got this:
@qiulaidongfeng Nice! I didn't know about benchstat. These are my results:
So, at first blush it looks like the variation is more pronounced with a less powerful CPU. And when GOMAXPROCS is set to 1 (on the same hardware):
I want to say thanks for Go 1.22. I see some significant performance improvements, notably in memory usage. CPU seemed to be up a little, but memory-management-wise it seems a great improvement, and to me consistent memory is more important than CPU for now. This is super good and makes the software extremely stable. Thank you all for the hard work. By the way, please open up a discussion for people to praise Go, especially when it comes to optimisations as good as this; the open source community could really use a place to show our appreciation for the hard work you've all done. It's impressive, and software development has never been this nice. The trajectory is extremely optimistic for the future. I wish it could be used on all devices (e.g. mobile/IoT) and for AI, and that it becomes the go-to language for everyone. The founders who set the foundation for this language are truly amazing. I've never used a language more productive, manageable, scalable and maintainable than Go. I've been a user since 1.6 and it's great to see it becoming more settled with module development, management and so on.
@ouvaa You've made me very sad. In case it wasn't obvious, I'm not complaining about the development of Go. I hope I'm helping to improve it by identifying an area that has impacted my use case. If it's impacted me then it is likely impacting other people. edit: I also hope my reply there didn't come across as too confrontational. But I'm feeling a bit sensitive because the future of a project that I've spent a lot of time on is now uncertain. Love and Peace ❤️ (and needless to say I fully agree with your recognition of the Go team's effort)
I tried GOEXPERIMENT=noallocheaders; no change in performance.
@JetSetIlly Can you have your test program output the format that benchstat accepts, and then use benchstat and git bisect (good=go1.21.0, bad=go1.22.0)?
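For reference, benchstat consumes the standard Go benchmark output format, one line per measurement. A minimal sketch of what the test program could print every second (the names frames and msPerFrame are illustrative):

```go
// "Benchmark<Name> <iterations> <value> <unit>" is all benchstat needs;
// here the iteration count is frames generated per second and the value
// is milliseconds per frame.
fmt.Printf("BenchmarkFrame %d %.4f ms/frame\n", frames, msPerFrame)
```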
I can do that. Just to clarify on the number of tests: if I run each test for one minute and output data every second, that's 60 data points. Each of those data points represents millions of iterations of the emulation. Not every code path will be hit millions of times, but certainly hundreds of thousands of times. So each benchstat file will contain 60 data points. I usually use …
See https://pkg.go.dev/testing#hdr-Benchmarks
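That is, the usual route is to let the testing package do the timing and emit benchstat-compatible output itself. A minimal sketch, with a hypothetical StepFrame standing in for one frame of emulation:

```go
package emulation_test

import "testing"

// BenchmarkFrame produces output benchstat can read directly.
// Run with, e.g.: go test -bench=Frame -count=10
func BenchmarkFrame(b *testing.B) {
	for i := 0; i < b.N; i++ {
		StepFrame() // hypothetical: generate one frame of emulation
	}
}
```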
I understand. I've turned the data around such that the iteration column shows the number of frames generated in a second and the value column shows the number of milliseconds per frame. I've run the bisect and found ad94306 to be the first "bad" commit: benchstat_devel go1.22-ad943066f6 Thu Jul 20 21:39:57 2023 +0000.txt The last "good" commit is dd5db4d: benchstat_devel go1.22-dd5db4df56 Thu Jul 20 21:39:53 2023 +0000.txt To double-check, I compared 1.21.7 with the last "good" commit dd5db4d and it shows no variation. Comparing 1.21.7 with ad94306 (the first "bad" commit) does show a variation.
ad94306 was a simple enough change, so I've removed that addition from gotip, and performance is now close enough to go1.21.7 for the difference to be unnoticeable. For certain, the "jitter" has gone from the measurements, and it was the cause of the perceived drop in performance.
@JetSetIlly Are you seeing differences without PGO? I've run your reproducer on my machine (thanks for making it easy to use!), and this is what I see:
i.e., with PGO disabled I don't see a regression. Here is the same comparison against the 1.21.7-with-PGO baseline:
This seems in line with https://go.dev/doc/pgo#source-stability: when upgrading from 1.21 to 1.22, a lot of code in the standard library has changed, so some portions of the PGO profile likely no longer apply, and we see a slight degradation in performance that recovers once a fresh profile is collected. If you are seeing differences without PGO, then there may be more to investigate.
@prattmic Yes, I am seeing differences without PGO:
The +5.34% is important, but the ±3% figure is perhaps more important: it shows the large variation in the data points, which means that the time it takes to generate a frame is less predictable. This is the result of comparing the two versions with PGO enabled (new PGO file generated with v1.22.0):
A note on my comment about the time taken to generate a frame being "less predictable": this isn't a problem when I impose a realistic FPS cap (i.e. if I cap generation at 60fps then the benchmarks are equally good for 1.21.7 and 1.22.0). The differences only appear when the program is putting the CPU under intense load.
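For illustration, a frame cap of the kind described might look like the sketch below (not the project's actual limiter; generateFrame is a stand-in). With pacing like this, per-frame jitter is absorbed by the idle time between ticks, which is why the capped benchmarks look identical across versions:

```go
package main

import "time"

func main() {
	// Cap frame generation at 60fps with a ticker. Any jitter in the
	// frame time hides in the wait for the next tick, as long as each
	// frame completes within its ~16.7ms slot.
	ticker := time.NewTicker(time.Second / 60)
	defer ticker.Stop()
	for {
		generateFrame() // stand-in for one frame of emulation
		<-ticker.C      // block until the next frame slot
	}
}

// generateFrame is a placeholder for the real, CPU-intensive work.
func generateFrame() {}
```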
Interesting, that is quite different from what I am seeing. The regression and variance disappear when you revert https://go.dev/cl/501976? Could you share runtime execution traces from these runs, both with and without https://go.dev/cl/501976?
Without PGO:
Running with (newly created) PGO:
The combination of a new PGO file with the commit reversion gives excellent performance.
Ah, by execution traces, I meant traces from the runtime execution tracer (the runtime/trace package).
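Collecting one takes only a few lines; a minimal sketch, assuming a trace.out output path (view the result with go tool trace trace.out):

```go
package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Record scheduler events for the lifetime of the run.
	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... run the measurement loop here ...
}
```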
Sorry. My mistake.
Sometimes I think it's better to view data visually. This chart shows the difference very clearly, I think: 1.22.0 has lots of jitter. https://docs.google.com/spreadsheets/d/1DUTT7dHXPRFo5nf4-bhiNk2aHkd7MP4gtx41jWPtAUA/edit?usp=sharing
Thanks for the traces. Here is what 5s regions from each look like. Without https://go.dev/cl/501976: With https://go.dev/cl/501976: In the first case, we can see that work tended to stay on the same thread for almost exactly 1s at a time (I'm actually not sure why it is quite so consistent). With the CL, it is much less consistent, and the average time spent in one place is noticeably lower. These migrations are described in #65694. Gosched puts the G on the global run queue, so it makes sense that it is more likely to move to a different P/M after Gosched. An explicit wakep in Gosched will ensure that there is another M awake, which will be racing to run the G. This is particularly unfortunate in your case, since there are idle Ps and thus there is no need to preempt at all. Last year, @felixge proposed https://go.dev/cl/460541 to avoid unnecessary preemption when there are idle Ps. If you can apply that CL (it might not apply cleanly anymore), I'd be curious to hear the impact. I suspect it would eliminate your jitter and regression.
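To make the mechanism concrete, a hedged sketch of the kind of loop that hits this path, assuming the program yields voluntarily (names are illustrative, not gopher2600's actual code):

```go
package main

import "runtime"

func main() {
	for {
		stepEmulation() // one slice of CPU-bound emulation work
		// Gosched puts this goroutine on the global run queue. With
		// CL 501976, Gosched also wakes another M (wakep), which races
		// to run the goroutine, so it is more likely to migrate to a
		// different P/M even though idle Ps mean no preemption is needed.
		runtime.Gosched()
	}
}

// stepEmulation is a placeholder for the real emulation step.
func stepEmulation() {}
```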
@prattmic Thanks for the feedback. I'll take a closer look later this evening. In the meantime, I've updated the testing tool so it's easier to use (command-line arguments etc.). I've also run the three versions of interest for 5 minutes (as opposed to 1 minute). We're probably past this high-level analysis by now but I think it's interesting. Benchstat results:
The large average drop is visible, but the size of the jitter isn't obvious from these figures (at least to me), so I've prepared another graph on Google Sheets. Cheers
I forked from v1.22.0 and applied 460541. However, there is no change from v1.22.0.
I've run it twice and double-checked the patching/compilation process, but there is definitely no effect. This is the trace file. I also applied 460541 to a fork with the reversion previously applied; in that case there is no difference from the fork with the reversion only.
This change to 1.22.0 produces good results:
Huh, interesting that https://go.dev/cl/460541 doesn't resolve the regression. It does resolve the more frequent thread migrations, with the trace once again looking like the "good" case: That leaves me a bit confused about where the regression is coming from. wakep wakes another thread, but if that thread doesn't steal our work, it should have a pretty minimal impact. |
@prattmic 460541 definitely helps in some way. With that patch the drops in performance are no longer as severe as they are in v1.22.0. That said, there is still a general overall drop. |
@prattmic Any progress on this? I'm happy to provide more data or testing if required.
I've added a simple "smooth" program that compares two benchstat files (see the sketch after these comparisons). This might be more helpful than the graphical representation I provided in the Google Sheets. These are the pertinent comparisons: 1.21.7 vs 1.22.0
1.21.7 vs 1.22.0 with commit 460541
1.21.7 vs 1.22.0 with reversion identified by bisect
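The idea behind "smooth", as a rough sketch (a guess at the behaviour, not the program's actual source): average the per-second values from each benchstat file and report the relative difference.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

// meanValue averages the value column from benchmark lines of the
// form "BenchmarkFrame <iterations> <value> ms/frame".
func meanValue(path string) float64 {
	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var sum float64
	var n int
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 || !strings.HasPrefix(fields[0], "Benchmark") {
			continue
		}
		v, err := strconv.ParseFloat(fields[2], 64)
		if err != nil {
			continue
		}
		sum += v
		n++
	}
	if n == 0 {
		log.Fatalf("no benchmark lines in %s", path)
	}
	return sum / float64(n)
}

func main() {
	before, after := meanValue(os.Args[1]), meanValue(os.Args[2])
	fmt.Printf("before: %.2f ms/frame, after: %.2f ms/frame (%+.2f%%)\n",
		before, after, (after-before)/before*100)
}
```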
Curious, is there any update on this? |
Not from me, I'm afraid. I've now upgraded to Go 1.22.0 and accepted the problem for what it is. TBH, because my application is mostly capped at around 60fps, this issue isn't that serious for me. But there are instances when I want as much throughput as possible, and I maintain that there's a definite drop with 1.22.0. For other people, and for other applications where throughput is a primary goal, I think this is an issue that still needs addressing by the Go development team.
Go version
go version go1.22.0 linux/amd64
Output of go env in your module/workspace:
What did you do?
I am comparing the performance of my project when compiled with Go 1.22.0 and when compiled with Go 1.21.7. The full instructions for my testing procedure, the resulting CPU profiles, and the test program have been placed in a specially created GitHub repository.
What did you see happen?
The performance of my project has dropped significantly with v1.22.0. Using the metric of performance used in my project, the performance with Go v1.21.7 is 146.28fps and the performance with Go v1.22.0 is 133.48fps. This is a drop of 8.75%.
What did you expect to see?
I expected either no significant change or an increase in performance. This method of measurement has been used for many years and has shown the expected increases when new Go versions have been released. For example, the introduction of profile-guided optimisation showed the expected increase in performance.