-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String Bank Performance Gains #1367
Conversation
Sweet! |
Wow, this looks insane! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚡ ⚡ ⚡
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing documentation, benchmarks, and super helpful PR notes. The speedup is nice too 😁
// This is the old implementation of the string bank to use for comparison benchmarks | ||
func BankLegacy(s string) string { | ||
ss, _ := oldBank.LoadOrStore(s, s) | ||
return ss.(string) | ||
} | ||
|
||
func ClearBankLegacy() { | ||
oldBank = sync.Map{} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exported is fine in the test context, since _test
files aren't compiled outside of testing, but not necessary. Probably worth a change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that about compilation, always good to expand my knowledge of Go esoteria a bit.
b.StopTimer() | ||
randStrings := generateBenchData(totalStrings, totalUnique) | ||
|
||
for i := 0; i < b.N; i++ { | ||
b.StartTimer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a future PR, I think this is more idiomatic (unless I'm missing something!):
b.StopTimer() | |
randStrings := generateBenchData(totalStrings, totalUnique) | |
for i := 0; i < b.N; i++ { | |
b.StartTimer() | |
randStrings := generateBenchData(totalStrings, totalUnique) | |
b.ResetTimer() | |
for i := 0; i < b.N; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Reset() call Start() on an already Stopped() bench?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My assumption is that a new call to a BenchmarkFoo
func (if being handled by the bench runner for a higher-b.N
bench) should have a totally fresh timer. But... can't say for sure.
ClearBankLegacy() | ||
runtime.GC() | ||
debug.FreeOSMemory() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Does Go not guarantee a GC in between benchmarks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea, belt and suspenders :)
…ngen decoding Signed-off-by: Matt Bolt <mbolt35@gmail.com>
40fded4
to
5eca029
Compare
Summary
This change updates the
stringutil.Bank
operation to use amap[string]string
andsync.Mutex
instead of async.Map
. The choice ofsync.Map
is based on the golang documentation of the "proper" use-cases forsync.Map
:The results from the benchmarks are as follows:
stringutil.Bank
Legacy
stringutil.Bank
implementation usingsync.Map
New
stringutil.Bank
implementation usingmap[string]string
andsync.Mutex
For 90% collision rate (90% of the total 1,000,000 strings were duplicates), the new Bank used 2201692 less allocations and allocated 20% the total bytes compared to the legacy Bank. It ran 2.5x faster than the legacy Bank as well.
stringutil.BankFunc
Using this type of map also opened the door for some further optimization with
util.Buffer
in thebytesToString(b []byte) string
implementation. We can now use the existing[]byte
to check the cache versus allocating a new string just to throw it away.The
BankFunc
benchmarks are very comparable toBank
:Buffer.bytesToString
Using the
BankFunc
option with theBuffer
andbingen-file-loader
, these bench tests represent a 40-day sequential load of scale Allocation Data:Using Old
Bank()
which allocates the new string no matter what, throws away the string if dupe exists, and usessync.Map
for string cache:Using New
Bank()
which allocates the new string no matter what, throws away the string if dupe existsUsing
BankFunc()
for pinnedbyte -> string
Load And Store. Strings aren't allocated on the "key check", only when the allocation should be stored:Load times for this change were
7-8s
faster for the full 40 days.Running Concurrently Loaded Bingen went from 26s serially to
Total Time: 13078ms
. This dropped 4-5s on concurrently loaded 40d scale data.Signed-off-by: Matt Bolt mbolt35@gmail.com
What does this PR change?
How will this PR impact users?
How was this PR tested?