
Optimize Michael-Scott queue and remove snapshot mechanism #122

Open
wants to merge 3 commits into base: main

Conversation

@polytypic (Contributor) commented Jan 21, 2024

This PR optimizes the Michael-Scott queue implementation. The main optimizations are avoiding false sharing and optimizing the node representation, which effectively halves the length of the linked list of nodes by "inlining" the atomic into the node record and dramatically improves performance.

This PR also removes the snapshot mechanism. The snapshot mechanism is not used in any tests in this repository. Furthermore, the snapshot mechanism cannot be made to work efficiently with the traditional Michael-Scott queue representation, where the head points to a node whose value has already been taken from the queue. That is because, to avoid space leaks, the value in the head node must be overwritten, and that simply breaks the snapshot mechanism. The previous implementation avoided this problem by using an inefficient node representation that effectively made the linked list of nodes twice as long and operations on the queue twice as slow.

Here is a run of the benchmarks on my M3 Max before the optimizations:

➜  saturn git:(rewrite-bench) ✗ dune exec --release -- ./bench/main.exe -budget 1 'free Queue' | jq '[.results.[].metrics.[] | select(.name | test("over")) | {name, value}]'
[                                    
  {
    "name": "messages over time/one domain",
    "value": 25.832570841950652
  },
  {
    "name": "messages over time/1 nb adder, 1 nb taker",
    "value": 30.775546858386015
  },
  {
    "name": "messages over time/1 nb adder, 2 nb takers",
    "value": 28.61298006857959
  },
  {
    "name": "messages over time/2 nb adders, 1 nb taker",
    "value": 26.324158773037244
  },
  {
    "name": "messages over time/2 nb adders, 2 nb takers",
    "value": 26.72249173339718
  }
]

And here after the optimizations:

➜  saturn git:(optimize-michael-scott-queue) ✗ dune exec --release -- ./bench/main.exe -budget 1 'free Queue' | jq '[.results.[].metrics.[] | select(.name | test("over")) | {name, value}]'
[                                    
  {
    "name": "messages over time/one domain",
    "value": 49.09980418998089
  },
  {
    "name": "messages over time/1 nb adder, 1 nb taker",
    "value": 98.92424842899835
  },
  {
    "name": "messages over time/1 nb adder, 2 nb takers",
    "value": 73.48287261205158
  },
  {
    "name": "messages over time/2 nb adders, 1 nb taker",
    "value": 79.05834136653924
  },
  {
    "name": "messages over time/2 nb adders, 2 nb takers",
    "value": 70.05359476527885
  }
]

Here is a run of the benchmarks without the inlined node representation:

➜  saturn git:(optimize-michael-scott-queue) ✗ dune exec --release -- ./bench/main.exe -budget 1 'free Queue' | jq '[.results.[].metrics.[] | select(.name | test("over")) | {name, value}]'
[                                    
  {
    "name": "messages over time/one domain",
    "value": 27.799662289702503
  },
  {
    "name": "messages over time/1 nb adder, 1 nb taker",
    "value": 46.095344138917085
  },
  {
    "name": "messages over time/1 nb adder, 2 nb takers",
    "value": 42.19755884308922
  },
  {
    "name": "messages over time/2 nb adders, 1 nb taker",
    "value": 63.98353506270374
  },
  {
    "name": "messages over time/2 nb adders, 2 nb takers",
    "value": 58.21147004626938
  }
]

As you can see, on the one domain and 1 nb adder, 1 nb taker configurations the inlined node representation is roughly twice as fast! On the more contended configurations, the additional cores likely help to overcome some of the overhead of the non-inlined representation, while the additional contention brings down the performance of the inlined representation, so the difference is not quite as dramatic.

You can run cp test/michael_scott_queue/michael_scott_queue_node.ml src_lockfree or git reset --hard HEAD~1 to use the non-inlined node representation:

modified   src_lockfree/michael_scott_queue_node.ml
@@ -3,12 +3,10 @@ module Atomic = Transparent_atomic
 type ('a, _) t =
   | Nil : ('a, [> `Nil ]) t
   | Next : {
-      mutable next : ('a, [ `Nil | `Next ]) t;
+      next : ('a, [ `Nil | `Next ]) t Atomic.t;
       mutable value : 'a;
     }
       -> ('a, [> `Next ]) t
 
-let[@inline] make value = Next { next = Nil; value }
-
-external as_atomic : ('a, [ `Next ]) t -> ('a, [ `Nil | `Next ]) t Atomic.t
-  = "%identity"
+let[@inline] make value = Next { next = Atomic.make Nil; value }
+let[@inline] as_atomic (Next r : ('a, [ `Next ]) t) = r.next
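The diff above can be condensed into a runnable toy that shows why inlining halves the list. This is an illustrative sketch, not the PR's code: the constructor names are invented so both versions can sit side by side, and the toy replaces the PR's unsafe identity cast with an ordinary atomic field in the boxed version.

```ocaml
(* In the boxed version every element costs two heap blocks: the [Next]
   record plus a separate [Atomic.t] cell holding the link.  In the
   inlined version the mutable [next] field itself is the atomic location
   (the PR reaches it through an unsafe %identity cast), so each element
   is a single block and the linked list is effectively half as long. *)
type 'a boxed =
  | B_nil
  | B_next of { next : 'a boxed Atomic.t; mutable value : 'a }

type 'a inlined =
  | I_nil
  | I_next of { mutable next : 'a inlined; mutable value : 'a }
```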

@polytypic force-pushed the optimize-michael-scott-queue branch 9 times, most recently from 572bf7e to 4026c70 on January 22, 2024 10:23
@polytypic marked this pull request as ready for review January 22, 2024 10:57
@polytypic requested a review from a team January 22, 2024 11:25
@polytypic changed the title from "Optimize Michael-Scott queue" to "Optimize Michael-Scott queue and remove snapshot mechanism" Jan 22, 2024
@polytypic force-pushed the optimize-michael-scott-queue branch 8 times, most recently from 2e44242 to 311adcf on January 22, 2024 20:05
@polytypic mentioned this pull request Jan 23, 2024
@lyrm (Collaborator) left a comment

Thanks for the PR!
I made a few minor comments (including one related to the use of Obj.magic, which seems safer here than in the Spsc_queue optimization PR).

I really like the refactoring work to factorize the exception/option versions of a function!

Comment on lines +23 to +24
head : ('a, [ `Next ]) Node.t Atomic.t;
tail : ('a, [ `Next ]) Node.t Atomic.t;

I tried to put ('a option, [ `Next ]) Node.t Atomic.t as the type of both the head and the tail and, with minimal changes below, the code works:

  • without Obj.magic
  • with (roughly) a 10% decrease in performance (I did not test it much, so it may fall within the error margin).

Also, here Obj.magic is used for a value that should never be read, so this seems to be an acceptable use for it!
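The alternative the reviewer describes could look roughly like the following sketch. This is hypothetical, not code from the PR: the simplified node type and the take_value helper are invented for illustration, and the real queue clears the head slot during pop rather than through a standalone function.

```ocaml
(* Option-typed values: a consumed head slot is overwritten with [None],
   so no unsafe [Obj.magic] dummy value is ever needed. *)
type 'a node =
  | Nil
  | Next of { mutable next : 'a node; mutable value : 'a option }

let take_value (n : 'a node) : 'a option =
  match n with
  | Nil -> None
  | Next r ->
      let v = r.value in
      r.value <- None; (* free the value to avoid a space leak *)
      v
```

The cost is one extra Some box per element, which is consistent with the roughly 10% slowdown the reviewer measured.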

loop b
in
loop b
type ('a, _) poly = Option : ('a, 'a option) poly | Value : ('a, 'a) poly
Nice way of refactoring to avoid copy/paste code! I will remember it :)
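The trick being praised can be sketched in isolation. This toy is not the PR's code: the container is a plain list ref instead of the lock-free queue, and Empty, pop_opt and pop_exn are illustrative names. The point is that one pop_as implementation produces either an option or a raising result, selected at the call site by the GADT witness.

```ocaml
(* A GADT witness picks the result type [r] of the shared implementation:
   [Option] makes it return ['a option], [Value] makes it return ['a]
   and raise on empty.  This avoids duplicating the whole function. *)
type ('a, _) poly = Option : ('a, 'a option) poly | Value : ('a, 'a) poly

exception Empty

let pop_as : type a r. a list ref -> (a, r) poly -> r =
 fun stack poly ->
  match !stack with
  | [] -> ( match poly with Option -> None | Value -> raise Empty)
  | x :: xs -> (
      stack := xs;
      match poly with Option -> Some x | Value -> x)

let pop_opt stack = pop_as stack Option
let pop_exn stack = pop_as stack Value
```

Because the witness is a constructor rather than a closure, matching on it compiles to a cheap tag test and does not block inlining the way passing a wrapper function would.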


let peek_opt { head; _ } =
let rec pop_as :

Should this function be [@inline]? (As well as the other _as functions below.) I guess you used a GADT instead of a function to avoid the issues inlining has with functions.

let[@inline] as_atomic (Next r : ('a, [ `Next ]) t) = r.next
let[@inline] make value = Next { next = Nil; value }

external as_atomic : ('a, [ `Next ]) t -> ('a, [ `Nil | `Next ]) t Atomic.t

next_as_atomic seems like a better name for readability (and I think it is the name you usually use). Also, I would not be against a comment in the code about this line :)

@polytypic force-pushed the optimize-michael-scott-queue branch 4 times, most recently from 513ec24 to 8b3d673 on February 17, 2024 17:21
@polytypic force-pushed the optimize-michael-scott-queue branch 3 times, most recently from 61bc326 to ceb8e26 on March 1, 2024 13:13
@polytypic force-pushed the optimize-michael-scott-queue branch from ceb8e26 to d495cb1 on March 1, 2024 13:50
@polytypic force-pushed the optimize-michael-scott-queue branch from d495cb1 to f580b8e on March 13, 2024 07:46
This moves the node representation to a separate file so that it is easy
to change only the node representation.

This also changes the queue representation to be closer to the traditional
Michael-Scott queue where the head points to a node whose value is no longer
logically part of the queue.  The "snapshot" mechanism cannot work with this
representation, because the nodes must be mutated as they are logically removed
from the queue to free the values to avoid space leaks.

This commit also implements various micro-optimizations and tweaks to the
implementation that improve performance.  For example, backoffs are added after
failed CAS operations, which improves performance in case of contention.
This effectively halves the length of the linked list of nodes and roughly
doubles the throughput of the queue.
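The backoff-after-failed-CAS tweak mentioned above can be sketched as follows. This is a simplified stand-in, not Saturn's code: the real implementation uses a dedicated backoff module, and the function name and delay constants here are invented; Domain.cpu_relax requires OCaml 5.

```ocaml
(* On a failed [compare_and_set], spin for an exponentially growing
   number of [Domain.cpu_relax] calls before retrying.  Spreading out
   the retries reduces cache-line ping-pong under contention. *)
let rec update_with_backoff ?(delay = 1) (cell : int Atomic.t) (f : int -> int) =
  let seen = Atomic.get cell in
  if not (Atomic.compare_and_set cell seen (f seen)) then begin
    for _ = 1 to delay do
      Domain.cpu_relax ()
    done;
    (* Exponential backoff, capped so waits stay bounded. *)
    update_with_backoff ~delay:(min (delay * 2) 256) cell f
  end
```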
2 participants