
runtime: improve timer performance #53953

Open
OmidHekayati opened this issue Jul 19, 2022 · 9 comments
Assignees
Labels
compiler/runtime NeedsInvestigation
Milestone

Comments

@OmidHekayati

OmidHekayati commented Jul 19, 2022

Changes needed to improve performance (at least in theory):

  • Change timers []*timer to timers []timerBucket
  • Add a new timerBucket struct:
type timerBucket struct {
	timer *timer
	// Two reasons to keep when here as well:
	// - hot cache: avoids dereferencing the timer to read its when field
	// - it can differ from the timer's when field while the timer is in a timerModifiedXX status
	when int64
}
  • Delete when int64 from the timer struct in favor of timerBucket's when field
  • Rename nextwhen int64 to when int64 in the timer struct
  • Make a few other changes in the related functions to respect the above

With these changes, we claim to improve the performance of siftupTimer and siftdownTimer: when comparing the when values of timers in the timing heap, we no longer need to fetch data from many different memory locations (dereferencing many timers), because the when values are stored sequentially in one array.

Additional suggestions to improve code readability:

  • create timer.go and timers.go in the runtime package
  • create a timers struct
  • move the timers field from the p struct in runtime2.go into the timers struct and embed that struct instead
  • move all the timer logic scattered across the runtime package into timers.go, e.g. almost everything in time.go, checkTimers in proc.go, ...

p.s: If the idea is acceptable, I can help make the changes by sending a PR.

@seankhliao
Member

seankhliao commented Jul 19, 2022

do you have benchmarks to prove that these will provide real world performance improvements?

@OmidHekayati
Author

OmidHekayati commented Jul 19, 2022

do you have benchmarks to prove that these will provide real world performance improvements?

No, but it's not too hard to write some code.

Anyway, as said, it is obvious (in theory) that when we have to dereference many timer structs at many locations (since each timer struct is allocated at a very different, effectively random memory location) just to read a single field (here the when int64 field), we lose the hot L1 CPU cache (or L2, for a huge timers slice).

@ianlancetaylor ianlancetaylor changed the title runtime/timer: Improve performance runtime: improve timer performance Jul 19, 2022
@gopherbot gopherbot added the compiler/runtime label Jul 19, 2022
@ianlancetaylor
Contributor

ianlancetaylor commented Jul 19, 2022

I don't quite see how modtimer works if we remove the when field from timer. The modtimer function would not have access to the timerBucket, so it would not know the current value of the when field, so it would not know whether to set timerModifiedEarlier or timerModifiedLater. But that knowledge seems fairly important for performance of typical programs.

@OmidHekayati
Author

OmidHekayati commented Jul 20, 2022

I don't quite see how modtimer works if we remove the when field from timer. The modtimer function would not have access to the timerBucket, so it would not know the current value of the when field, so it would not know whether to set timerModifiedEarlier or timerModifiedLater. But that knowledge seems fairly important for performance of typical programs.

I agree with you that in that scenario it would be better to check timerBucket.when against the new when in modtimer, but it is a very rare situation for a timer to be changed more than once before it fires.

Example: a timer is set for 60s later; after 10s the program reschedules it to 120s later; after another 10s it reschedules it back to 60s later. We set timer.status to timerModifiedEarlier, although based on the when stored in the timing heap it should be timerModifiedLater. Based on the timer's own logic the change is correct; we bend the timer rules only to avoid re-sorting the heap on every timer modification.
Anyway, we lose almost nothing in this situation: just one wrong call to updateTimerModifiedEarliest, wasting one atomic operation and one comparison. The other functions make no distinction between these two statuses, handling them together as case timerModifiedEarlier, timerModifiedLater:

@toothrot toothrot added the NeedsInvestigation label Jul 20, 2022
@toothrot toothrot added this to the Backlog milestone Jul 20, 2022
@mknyszek
Contributor

mknyszek commented Jul 20, 2022

When it comes down to it, what matters most is if it improves the performance of real-world programs. You can feel free to implement it and send a PR and try it out, but the decision on whether to merge it is going to come down to actual performance data (though of course if you have any nice cleanups, those would be welcomed on their own). Assigning to you @OmidHekayati for now.

@ianlancetaylor
Contributor

ianlancetaylor commented Jul 20, 2022

@OmidHekayati We got to our current approach through a series of real world programs and bug reports that showed bad performance with earlier approaches. I would be quite reluctant to say things like "it is very rare situation that a timer changed more than once before its timerout" or "almost we don't lose anything in this situation." We added the current features because they mattered to real programs, not because we wanted increased complexity.

@OmidHekayati
Author

OmidHekayati commented Jul 21, 2022

@ianlancetaylor You are right; I should choose better words to convey my thoughts (English is not my native language). I agree with you that nobody wants increased complexity for nothing.

@mknyszek Thanks. I will write some benchmarks first to prove the idea, and after your approval I will take the time to write the actual PR.

p.s: Anyway, I should mention that we hit a real problem in one of our projects, implementing TCP in userspace in Go with millions of timers! So it is clear that even a small performance improvement can change the game; this issue is not just a hobby for me.

@OmidHekayati
Author

OmidHekayati commented Jul 22, 2022

As you can see in our benchmark here, the results below show that when timers are allocated at very different memory locations, with many timers in the timing heap, it takes almost 10X more time to order the timers heap with the existing Go runtime library.

Some considerations:

  • I know we don't reorder the timers heap that often, e.g. on every timer add or modify, so this benchmark may mislead us about the real situation.
  • Maybe there is a better way to fill memory so that timers are not allocated in order, matching real software behavior, but I don't want to complicate the benchmarks, at least for now.

Any ideas to improve the benchmark to better cover real usage?

goos: windows
goarch: amd64
cpu: Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz

With fillsMemory:
Benchmark_newTiming-8   	  463105	      2778 ns/op	   27368 B/op	       2 allocs/op
Benchmark_oldTiming-8   	  248739	     24886 ns/op	   27376 B/op	       2 allocs/op

Without fillsMemory:
Benchmark_newTiming-8   	 7114608	       178.2 ns/op	      80 B/op	       1 allocs/op
Benchmark_oldTiming-8   	 5867421	       195.9 ns/op	      88 B/op	       1 allocs/op

OmidHekayati added a commit to OmidHekayati/go that referenced this issue Aug 5, 2022
The existing implementation has a poor structure and is a mess.
I just moved the existing code into separate files and structures
to improve code and logic readability.
No logic has been changed in this cleanup.

Updates golang#53953
@gopherbot

gopherbot commented Aug 5, 2022

Change https://go.dev/cl/421615 mentions this issue: runtime: cleanup timer codes
