Fix race condition in LanguageModelSession when streaming responses #126

mattt merged 4 commits into huggingface:main

Conversation
- Fix race condition when streaming responses.
Pull request overview
This PR attempts to fix race conditions in `LanguageModelSession` by replacing the mixed synchronization approach (`MainActor` dispatching and a `RespondingState` actor) with a unified `OSAllocatedUnfairLock<State>` pattern. The changes bundle mutable state (`transcript` and `isResponding`) into a `State` struct protected by a lock.
Changes:
- Replace stored properties `isResponding` and `transcript` with computed properties that read from locked state
- Remove the `RespondingState` actor and replace it with a `State` struct protected by `OSAllocatedUnfairLock`
- Convert all `MainActor.run` calls to `state.withLock` calls for synchronization
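The locked-state pattern described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code; the class name, the `[String]` transcript type, and the `append(_:)` method are placeholders.

```swift
import os

// Illustrative sketch of the pattern the review summary describes:
// all mutable state lives in one struct behind a single lock, and the
// public stored properties become computed reads of that locked state.
final class Session: @unchecked Sendable {
    private struct State {
        var transcript: [String] = []
        var isResponding: Bool = false
    }

    // OSAllocatedUnfairLock owns the state it protects, so there is
    // no way to touch `State` without holding the lock.
    private let state = OSAllocatedUnfairLock(initialState: State())

    // Formerly stored properties, now computed reads of locked state.
    var isResponding: Bool {
        state.withLock { $0.isResponding }
    }

    var transcript: [String] {
        state.withLock { $0.transcript }
    }

    func append(_ entry: String) {
        state.withLock { $0.transcript.append(entry) }
    }
}
```

Because `OSAllocatedUnfairLock(initialState:)` stores the guarded value inside the lock itself, `withLock` hands the closure exclusive `inout` access, which is what replaces both the `MainActor.run` hops and the `RespondingState` actor.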
OK, I replaced it. Obviously, I would prefer not to introduce any synchronization primitives to the codebase, and I'm not sure how you'll feel about the addition, @mattt, but I'm pretty happy with the fix overall. ...now I'll deal with the Observation stuff Copilot called out.
```diff
- public private(set) var isResponding: Bool = false
- public private(set) var transcript: Transcript
+ public var isResponding: Bool {
+     access(keyPath: \.isResponding)
```
According to Philippe Hausler on the Swift Forums, the `access()` and `withMutation()` calls need to happen outside of locks.
```swift
withMutation(keyPath: \.isResponding) {
    state.access { $0.beginResponding() }
```
According to Philippe Hausler on the Swift Forums, the `access()` and `withMutation()` calls need to happen outside of locks.
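Per that guidance, one way to order the calls can be sketched as follows. This is a hedged sketch under the assumption of an `@Observable` class guarding its state with `OSAllocatedUnfairLock`; the class name and `beginResponding()` method are illustrative, not the PR's final code.

```swift
import Observation
import os

// Sketch: the Observation registrar calls (access/withMutation) run
// outside the lock, and only the state read or mutation itself runs
// while the lock is held.
@Observable
final class Session: @unchecked Sendable {
    private struct State {
        var isResponding = false
    }

    @ObservationIgnored
    private let state = OSAllocatedUnfairLock(initialState: State())

    var isResponding: Bool {
        // access(keyPath:) happens outside the lock...
        access(keyPath: \.isResponding)
        // ...and only the read happens while the lock is held.
        return state.withLock { $0.isResponding }
    }

    func beginResponding() {
        // withMutation wraps the lock, not the other way around,
        // so the registrar never runs with the lock held.
        withMutation(keyPath: \.isResponding) {
            state.withLock { $0.isResponding = true }
        }
    }
}
```

Keeping the registrar calls outside `withLock` avoids invoking arbitrary observer machinery while holding an unfair lock.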
OK, this should be ready to go. I didn't mean for it to get so large when I started out. Sorry about that.
@atdrendel Not at all! Thank you so much for diving into the implementation details of Swift concurrency and observability. I think what's there now reflects some combination of wishful thinking and misunderstanding on my part. Taking a look now.
@atdrendel I feel like I've gone on the same journey a few times, trying to implement cross-platform synchronization. We're not making it easy for ourselves, with such early deployment targets (maybe we should relax those?), but I swear, it's "oh, right, we can't use that" over and over again 🫠 Where we diverge is that I give up and reach for … Do you think that approach works alright? Like, are you able to reproduce the race condition you observed in that branch?
Thanks for taking a look at it, @mattt. Yeah, I wasn't thrilled with the idea of adding a new synchronization primitive to your project, but I felt comfortable with it because the same pattern was used in swift-async-algorithms and swift-nio, but I can understand your hesitancy. The reason why I said we couldn't use … I checked out your branch. It looked OK and ran fine, but I'm not a huge fan of keeping locks and the thing they are locking stored separately. It would be way too easy, in the future, to forget to lock access to the …
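The "keep the lock and the thing it locks together" idea can be sketched like this, in the spirit of swift-async-algorithms' managed-state pattern. This is an assumption-laden illustration, not the PR's actual `Locked` type; it uses Foundation's `NSLock` purely for portability.

```swift
import Foundation

// Illustrative sketch: a wrapper that stores the lock and the guarded
// state together, so the state is unreachable without taking the lock.
struct Locked<State>: @unchecked Sendable {
    private final class Storage {
        let lock = NSLock()
        var state: State
        init(_ state: State) { self.state = state }
    }

    private let storage: Storage

    init(_ initialState: State) {
        storage = Storage(initialState)
    }

    // The only way to reach the state is through the lock, so
    // "forgetting to lock" is no longer expressible at a call site.
    func withLock<R>(_ body: (inout State) throws -> R) rethrows -> R {
        storage.lock.lock()
        defer { storage.lock.unlock() }
        return try body(&storage.state)
    }
}
```

Usage is then `let counter = Locked(0); counter.withLock { $0 += 1 }`, which is the API-shape argument against keeping a bare lock next to the properties it is supposed to guard.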
That's a fair point! There's a competing interest in minimizing API surface area (even in the implementation), because it's hard to keep track of what is and what isn't in Foundation Models. But I think correctness trumps parsimony here. I'll go ahead and close #130, pick this up, and get it merged. |
@mattt Thanks, Matt. Oh, and to answer your earlier question about deployment targets, I think keeping them as low as possible is the way to go. I've started to contribute to AnyLanguageModel because I see a world where I can replace a bunch of the custom code in our app with AnyLanguageModel, but increasing the deployment target to iOS 18 right now would be tough for us. We're actually just now increasing it to iOS 17 from iOS 16 because mlx-swift-lm now requires iOS 17. mlx-swift-examples only ever required iOS 16, which had been our minimum. |
This is now available in 0.7.1 |
`LanguageModelSession.transcript` was usually, but not always, protected from race conditions by limiting changes to the `MainActor`. Of course, the "but not always" (such as in `streamResponse(to:generating:includeSchemaInPrompt:options:)`) means that race conditions were possible and `LanguageModelSession`'s `@unchecked Sendable` wasn't really accurate. I tested this by adding `MainActor.assumeIsolated()` in the functions that didn't dispatch to `MainActor` and ran the tests. The assertions fired, showing that access to `LanguageModelSession.transcript` was not correctly protected against race conditions.

When I looked more carefully at the code of `LanguageModelSession`, I noticed that there were different synchronization methods being used. `LanguageModelSession.transcript` was, as I said, kind of being protected by sometimes dispatching to `MainActor`. `RespondingState` was, on the other hand, an actor, which introduced (to my mind) unnecessary thread hops to the codebase.

My solution was to bundle up the mutable state of `LanguageModelSession` into a `State` struct and guard it with ~~`OSAllocatedUnfairLock<State>`~~ `Locked<State>`.

Running the tests, everything seemed to work fine, and it reduced the number of `await` calls we need to make. Honestly, when compared with the time cost of streaming AI responses, neither thread hops nor acquiring locks matter much, but I would guess acquiring uncontested locks would be a bit faster than thread hops.
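The `MainActor.assumeIsolated()` diagnostic described above can be sketched as follows. This is not the PR's test code; the class, the `[String]` transcript, and `append(_:)` are hypothetical stand-ins.

```swift
// Hedged sketch of the diagnostic technique: MainActor.assumeIsolated
// traps at runtime if the caller is not actually on the main actor, so
// dropping it into a supposedly MainActor-protected code path turns a
// silent data race into a crash under test.
final class Session {
    private(set) var transcript: [String] = []

    func append(_ entry: String) {
        MainActor.assumeIsolated {
            // If append(_:) is ever reached off the main actor, the
            // runtime assertion fires here, exposing the race.
            transcript.append(entry)
        }
    }
}
```

Running the existing test suite with this in place is what made the unprotected code paths fire assertions, confirming the race rather than just suspecting it.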