Possible fix for CCCC slowdown #3288
No functional change. bench: 4109336
|
It would be good to have some performance numbers for this somehow, since we don't have testing access to that machine. It might be interesting to look at some hardware counters and see how much this influences read/write stats. |
|
So I instrumented the code, and it looks like during normal game play under STC conditions the check filters out 87% of the refreshes. While we don't have direct access to the machine, it is likely that CCCC will update SF from abrok before they get the memory upgrade, especially if the commit says it may fix the slowdown. So we have a short, unique window of opportunity to find out whether this matters on systems with low memory bandwidth relative to processor performance. If it doesn't, then we won't have to worry that it will bite us in the future, and we will revert, of course. If it does, then we have a fix. Both are good outcomes. |
|
Let me try a benchmark with many threads (but better memory bandwidth) and see what happens. |
|
No difference as far as I can see on this system: |
|
Bench has quite a different refresh pattern from game play, because the bench starting positions are completely unrelated to each other, unlike consecutive positions in a game. |
|
@CoffeeOne if you could do experiment 2 (depopulate some memory channels) that would be very interesting. I think the 'reasonable' setup would be to remove 1 module per CPU. |
|
I also think this is worth merging, because it is known not to be harmful and we only have a short window of opportunity for CCCC to get it from abrok before they upgrade their memory. After that we can easily revert, depending on the results. I'm not against any additional testing or investigation; it's just that I wouldn't wait and miss our chance at CCCC, where the problem is known. For testing, I think the most important thing is that we test on a machine configuration that experiences a slowdown due to memory bandwidth, something like the 2x that the CCCC machine shows. If removing some memory from your machine causes a slowdown that replicates this condition, that should be good. |
|
I don't like the idea of committing to master just to see whether it helps in some online tournament. If nobody can reproduce it locally, it is too much of a corner case to worry about. |
|
Hi, so I tried again with the same executables. I had to lower the size of the TT, because the machine now has only 8GB. The branch is still faster, but now only by ~0.5% (before it was 2%). |
|
@CoffeeOne Thanks for testing. Since the nps for master is not much lower than it was before you removed the memory, it still doesn't look like we are reproducing a bandwidth-limited setup. The CCCC machine is about 2x slower than expected. |
|
@vondele Do you have a way to remove memory from your machine? Since it's virtually the same as the CCCC machine, you may have the best shot at recreating the issue. |
|
No, I can't. |
|
I'll close this, as it is not obviously a fix. |


Don't unnecessarily refresh TT entries to lower memory bandwidth.
No functional change
bench: 4109336
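
The idea in the commit message can be illustrated with a minimal sketch. This is not the actual patch: `Entry`, `Table`, `RefreshStats` and all member names are hypothetical, and the entry layout is deliberately simpler than Stockfish's real transposition table. It assumes the refresh consists of writing the current generation into an entry on a probe hit, and shows how skipping the store when the value is already current avoids dirtying the cache line. The counters mirror the kind of instrumentation mentioned in the conversation for measuring how many refreshes the check filters out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified entry; not Stockfish's actual layout.
struct Entry {
    std::uint16_t key16 = 0;  // upper bits of the position key
    std::uint8_t  gen8  = 0;  // generation in which the entry was last used
};

// Hypothetical instrumentation counters.
struct RefreshStats {
    std::uint64_t hits    = 0;  // probes that found a matching key
    std::uint64_t skipped = 0;  // hits where no store was needed
};

class Table {
public:
    explicit Table(std::size_t n) : entries_(n) {}

    // Called once per search; entries from earlier searches become stale.
    void new_search() { ++generation_; }

    void store(std::uint64_t key) {
        Entry& e = slot(key);
        e.key16 = key16(key);
        e.gen8  = generation_;
    }

    // Returns true on a hit. The generation is written only when it
    // actually changes, so an already-current entry is read but not stored to.
    bool probe(std::uint64_t key) {
        Entry& e = slot(key);
        if (e.key16 != key16(key))
            return false;
        ++stats_.hits;
        if (e.gen8 != generation_)
            e.gen8 = generation_;   // real refresh: entry was stale
        else
            ++stats_.skipped;       // refresh filtered out by the check
        return true;
    }

    const RefreshStats& stats() const { return stats_; }

private:
    static std::uint16_t key16(std::uint64_t key) {
        return static_cast<std::uint16_t>(key >> 48);
    }
    Entry& slot(std::uint64_t key) { return entries_[key % entries_.size()]; }

    std::vector<Entry> entries_;
    std::uint8_t       generation_ = 0;
    RefreshStats       stats_;
};
```

In this sketch the ratio `skipped / hits` corresponds to the fraction of refreshes the check filters out. Whether avoiding those stores matters in practice depends on the machine's memory bandwidth, which is exactly what the discussion above could not settle.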