
Migrated to sharded latch & commit buffer #12

Merged
merged 48 commits into from Jul 14, 2021

Conversation

@kelindar kelindar commented Jul 14, 2021

This PR introduces two major changes to this library.

  1. The in-memory commit buffer is now essentially a []byte that could potentially be written to disk in the future for disaster recovery, see issue #5, "Persistency (at least for fast recovery - maybe some log of requests?)". The buffer stores the various operations together with their offsets and has been optimized; a sketch of the encoding follows further below.
  2. While keeping the isolation level at read committed, this PR introduces a sharded mutex, as discussed in issue #6, "Txn Commit can result in transaction guarantee violation". This increases the overall concurrency of a single large collection: the collection is divided into "chunks" of 16K elements each, and we now use 128 latches (a sharded RWMutex) to control concurrent access, as sketched right after this list.
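
Below is a minimal sketch of the sharded latch idea in Go, assuming the 16K chunk size and 128 shards described above; the names (smutex, chunkOf) are illustrative rather than the library's actual API.

```go
package column

import "sync"

const (
	chunkShift = 14  // chunks of 16K (2^14) elements
	numLatches = 128 // number of shards in the latch
)

// smutex shards a RWMutex: each chunk of the collection maps onto one
// of the latches, so writers to different chunks rarely block each other.
type smutex struct {
	latches [numLatches]sync.RWMutex
}

// chunkOf returns the chunk index for a given element offset.
func chunkOf(offset uint32) uint32 {
	return offset >> chunkShift
}

// Lock acquires an exclusive latch on the given chunk (point-writes).
func (s *smutex) Lock(chunk uint32) { s.latches[chunk%numLatches].Lock() }

// Unlock releases the exclusive latch on the given chunk.
func (s *smutex) Unlock(chunk uint32) { s.latches[chunk%numLatches].Unlock() }

// RLock acquires a shared latch on the given chunk (reads).
func (s *smutex) RLock(chunk uint32) { s.latches[chunk%numLatches].RLock() }

// RUnlock releases the shared latch on the given chunk.
func (s *smutex) RUnlock(chunk uint32) { s.latches[chunk%numLatches].RUnlock() }
```

Since read transactions only take shared latches, goroutines touching different chunks proceed in parallel; only exclusive latches on the same shard block each other.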

The performance hit of the sharded mutex is relatively small, and for updates the new commit buffer actually reduces heap allocations, since it no longer requires allocating an interface{} on the heap for every operation.
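
For illustration, here is a minimal sketch of such a byte-encoded commit buffer, assuming fixed-size records of the form [op][offset][uint64 value]; the actual buffer supports more operation and payload types, and the names below are hypothetical.

```go
package column

import "encoding/binary"

// OpType identifies the kind of mutation recorded in the buffer.
type OpType uint8

const (
	OpPut    OpType = iota // set the value at an offset
	OpDelete               // delete the row at an offset
)

// Buffer appends every operation into a flat []byte instead of boxing
// it into an interface{}, so recording a write does not allocate.
type Buffer struct {
	data []byte
}

// PutUint64 appends a record encoded as [op][offset][value].
func (b *Buffer) PutUint64(op OpType, offset uint32, value uint64) {
	var rec [13]byte
	rec[0] = byte(op)
	binary.LittleEndian.PutUint32(rec[1:5], offset)
	binary.LittleEndian.PutUint64(rec[5:13], value)
	b.data = append(b.data, rec[:]...)
}

// Range replays the recorded operations in order, e.g. at commit time.
func (b *Buffer) Range(fn func(op OpType, offset uint32, value uint64)) {
	for i := 0; i+13 <= len(b.data); i += 13 {
		fn(OpType(b.data[i]),
			binary.LittleEndian.Uint32(b.data[i+1:i+5]),
			binary.LittleEndian.Uint64(b.data[i+5:i+13]))
	}
}
```

Because the buffer is a flat byte slice, it could also be flushed to disk verbatim, which is what makes the disaster-recovery idea in #5 plausible.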

Benchmark go1.16

I ran a small benchmark with various workloads (90% read / 10% write, etc.) on a collection of 1 million elements, using goroutine pools of different sizes. In this benchmark we combine two types of transactions (a simplified sketch follows the list):

  • Read transactions that query a random index and iterate over a single column of the results.
  • Write transactions that update a random element (point-write).
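
For illustration, a minimal, self-contained sketch of the workload shape follows; it is not the actual benchmark harness, and it simplifies reads to point-reads under a shared latch and writes to point-writes under an exclusive latch, reusing the 128-shard latch layout from above.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"sync/atomic"
	"time"
)

const (
	collectionSize = 1_000_000
	numLatches     = 128
	chunkShift     = 14 // 16K-element chunks
)

func main() {
	var latches [numLatches]sync.RWMutex
	data := make([]int64, collectionSize)

	var reads, writes int64
	stop := make(chan struct{})

	// Goroutine pool: each worker flips a biased coin and either takes
	// a shared latch (read) or an exclusive latch (point-write) on the
	// shard that owns a random element's chunk.
	for g := 0; g < 128; g++ {
		go func(seed int64) {
			rnd := rand.New(rand.NewSource(seed))
			for {
				select {
				case <-stop:
					return
				default:
				}
				offset := rnd.Intn(collectionSize)
				latch := &latches[(offset>>chunkShift)%numLatches]
				if rnd.Intn(100) < 90 { // 90% reads
					latch.RLock()
					_ = data[offset]
					latch.RUnlock()
					atomic.AddInt64(&reads, 1)
				} else { // 10% point-writes
					latch.Lock()
					data[offset]++
					latch.Unlock()
					atomic.AddInt64(&writes, 1)
				}
			}
		}(int64(g))
	}

	time.Sleep(time.Second)
	close(stop)
	fmt.Printf("%d reads/s, %d writes/s\n",
		atomic.LoadInt64(&reads), atomic.LoadInt64(&writes))
}
```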

Note that the goal of this benchmark is to validate concurrency, not throughput. It also represents the current "best case" scenario, since the updates are random and therefore less likely to incur contention. Reads, however, would quite often hit the same chunks, as only the index being queried is randomized.

As expected, it scales quite well to a large number of goroutines, unless the workload is extremely write-heavy, in which case exclusive latches on the chunks lead to contention across the board and decrease performance.

90%-10%       1 procs      143,221,213 read/s         70 write/s
90%-10%       8 procs    1,081,511,102 read/s        483 write/s
90%-10%      16 procs    1,068,562,727 read/s        455 write/s
90%-10%      32 procs    1,042,382,561 read/s        442 write/s
90%-10%      64 procs    1,039,644,346 read/s        446 write/s
90%-10%     128 procs    1,049,228,432 read/s        442 write/s
90%-10%     256 procs    1,027,362,194 read/s        477 write/s
90%-10%     512 procs    1,023,097,576 read/s        457 write/s
90%-10%    1024 procs      996,585,722 read/s        436 write/s
90%-10%    2048 procs      948,455,719 read/s        494 write/s
90%-10%    4096 procs      930,094,338 read/s        540 write/s
50%-50%       1 procs      142,015,047 read/s        598 write/s
50%-50%       8 procs    1,066,028,881 read/s      4,300 write/s
50%-50%      16 procs    1,039,210,987 read/s      4,191 write/s
50%-50%      32 procs    1,042,789,993 read/s      4,123 write/s
50%-50%      64 procs    1,040,410,050 read/s      4,102 write/s
50%-50%     128 procs    1,006,464,963 read/s      4,008 write/s
50%-50%     256 procs    1,008,663,071 read/s      4,170 write/s
50%-50%     512 procs      989,864,228 read/s      4,146 write/s
50%-50%    1024 procs      998,826,089 read/s      4,258 write/s
50%-50%    2048 procs      939,110,917 read/s      4,515 write/s
50%-50%    4096 procs      866,137,428 read/s      5,291 write/s
10%-90%       1 procs      135,493,165 read/s      4,968 write/s
10%-90%       8 procs    1,017,928,553 read/s     37,130 write/s
10%-90%      16 procs    1,040,251,193 read/s     37,521 write/s
10%-90%      32 procs      982,115,784 read/s     35,689 write/s
10%-90%      64 procs      975,158,264 read/s     34,041 write/s
10%-90%     128 procs      940,466,888 read/s     34,827 write/s
10%-90%     256 procs      930,871,315 read/s     34,399 write/s
10%-90%     512 procs      892,502,438 read/s     33,955 write/s
10%-90%    1024 procs      834,594,229 read/s     32,953 write/s
10%-90%    2048 procs      785,583,770 read/s     32,882 write/s
10%-90%    4096 procs      688,402,474 read/s     34,646 write/s

Benchmark go1.17beta1

I also ran the exact same benchmark on Go 1.17 beta1 and the improvement is quite impressive. I suspect this is due to golang/go#40724, a major change (the register-based calling convention) by the amazing Go team.

90%-10%       1 procs      237,130,690 read/s        112 write/s
90%-10%       8 procs    1,651,884,038 read/s        717 write/s
90%-10%      16 procs    1,604,529,778 read/s        684 write/s
90%-10%      32 procs    1,568,422,932 read/s        705 write/s
90%-10%      64 procs    1,368,854,176 read/s        603 write/s
90%-10%     128 procs    1,376,234,760 read/s        601 write/s
90%-10%     256 procs    1,444,827,685 read/s        634 write/s
90%-10%     512 procs    1,382,944,862 read/s        630 write/s
90%-10%    1024 procs    1,385,708,505 read/s        641 write/s
90%-10%    2048 procs    1,400,975,478 read/s        678 write/s
90%-10%    4096 procs    1,272,429,528 read/s        645 write/s
50%-50%       1 procs      240,843,311 read/s        949 write/s
50%-50%       8 procs    1,658,665,375 read/s      6,591 write/s
50%-50%      16 procs    1,653,341,392 read/s      6,674 write/s
50%-50%      32 procs    1,558,058,949 read/s      6,176 write/s
50%-50%      64 procs    1,430,884,504 read/s      5,751 write/s
50%-50%     128 procs    1,451,153,699 read/s      5,661 write/s
50%-50%     256 procs    1,443,416,127 read/s      5,726 write/s
50%-50%     512 procs    1,355,457,178 read/s      5,645 write/s
50%-50%    1024 procs    1,249,493,888 read/s      5,221 write/s
50%-50%    2048 procs    1,162,011,258 read/s      5,484 write/s
50%-50%    4096 procs    1,102,741,629 read/s      5,286 write/s
10%-90%       1 procs      203,623,696 read/s      7,311 write/s
10%-90%       8 procs    1,062,318,113 read/s     38,339 write/s
10%-90%      16 procs    1,077,146,140 read/s     37,950 write/s
10%-90%      32 procs    1,068,210,272 read/s     38,919 write/s
10%-90%      64 procs    1,098,461,537 read/s     39,370 write/s
10%-90%     128 procs    1,035,202,595 read/s     37,986 write/s
10%-90%     256 procs    1,020,512,476 read/s     38,001 write/s
10%-90%     512 procs    1,147,669,716 read/s     41,260 write/s
10%-90%    1024 procs    1,103,880,028 read/s     42,898 write/s
10%-90%    2048 procs      973,620,731 read/s     41,779 write/s
10%-90%    4096 procs      887,719,199 read/s     41,167 write/s

@kelindar kelindar requested a review from Florimond July 14, 2021 21:03
@kelindar kelindar merged commit 77dc531 into main Jul 14, 2021
@kelindar kelindar deleted the smutex branch July 14, 2021 21:22
sthagen added a commit to sthagen/kelindar-column that referenced this pull request Jul 14, 2021
Migrated to sharded latch & commit buffer (kelindar#12)
objectref commented Aug 26, 2021

Hi! Thanks for the information; it seems that the register-based calling convention gives excellent performance benefits. Did you run the benchmark with the officially released Go 1.17 too?

@kelindar
Owner Author

@objectref I did over the weekend; the numbers looked roughly the same as the beta.

@objectref

Ok, thank you! Impressive difference, indeed.
