Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom Executors Unexpected Performance Penalty #69053

Open
MahdiBM opened this issue Oct 8, 2023 · 5 comments
Open

Custom Executors Unexpected Performance Penalty #69053

MahdiBM opened this issue Oct 8, 2023 · 5 comments
Assignees
Labels
concurrency runtime Feature: umbrella label for concurrency runtime features

Comments

@MahdiBM
Copy link

MahdiBM commented Oct 8, 2023

Description
An actor using a SwiftNIO EventLoop turns out to be slightly slower than an actor not using any custom executors.

I came across this unexpected behavior when trying to optimize Vapor's OpenAPI integration to stream response bodies without constantly context-switching between a Vapor.BodyStreamWriter's EventLoop and a Swift-concurrency thread.

The full thread of how i ended up going down this route is available here on the Swift Open Source slack server. However this issue not-only contains sufficient info regarding the problem, but also contains additional info about how i exactly tested this.

I also think that this might not necessarily be a bug, but just a suboptimal behavior which is planned to be optimized when e.g. Tasks can accept custom executors as well.
The tests have been done in RELEASE mode to ensure the legitimacy of the results.

I will refer to the implementation with actor using a NIO EventLoop custom executor as actor 1 / first implementation.
And the implementation with non-custom-executor actor as actor 2 / second implementation.

Clone swift-openapi-generator/Examples/GreetingService.
Edit Package.swift to use this version of Vapor:

.package(url: "https://github.com/vapor/vapor", exact: "4.84.4"),

For implementation 1, edit Package.swift to use this edition of swift-openapi-vapor (mahdibm/mmbm-test-actor):

.package(url: "https://github.com/mahdibm/swift-openapi-vapor", branch: "mmbm-test-actor"),

For #2 implementation (mahdibm/mmbm-test-actor2):

.package(url: "https://github.com/mahdibm/swift-openapi-vapor", branch: "mmbm-test-actor2"),

The actor in the #1 implementation looks like this:

#if compiler(>=5.9)
    @available(macOS 14.0, iOS 17.0, watchOS 10.0, tvOS 17.0, *)
    actor Writer {
        let unownedExecutor: UnownedSerialExecutor
        let writer: any Vapor.BodyStreamWriter
        let body: OpenAPIRuntime.HTTPBody

        init(
            writer: any Vapor.BodyStreamWriter,
            body: OpenAPIRuntime.HTTPBody
        ) {
            self.unownedExecutor = writer.eventLoop.executor.asUnownedSerialExecutor()
            self.writer = writer
            self.body = body
        }

        func write() async {
            do {
                for try await chunk in body {
                    try await writer.write(.buffer(ByteBuffer(bytes: chunk))).get()
                }
                try await writer.write(.end).get()
            } catch {
                try? await writer.write(.error(error)).get()
            }
        }
    }
#endif // compiler(>=5.9)

The #2 implementation does not set a custom executor, but is otherwise identical:

-        let unownedExecutor: UnownedSerialExecutor
        let writer: any Vapor.BodyStreamWriter
        let body: OpenAPIRuntime.HTTPBody

        init(
            writer: any Vapor.BodyStreamWriter,
            body: OpenAPIRuntime.HTTPBody
        ) {
-            self.unownedExecutor = writer.eventLoop.executor.asUnownedSerialExecutor()
            self.writer = writer
            self.body = body
        }

I used a simple Vapor app to perform the tests:

import Vapor
import Dispatch
import Logging
import AsyncHTTPClient
import NIOCore

/// This extension is temporary and can be removed once Vapor gets this support.
private extension Vapor.Application {
    static let baseExecutionQueue = DispatchQueue(label: "vapor.codes.entrypoint")
    
    func runFromAsyncMainEntrypoint() async throws {
        try await withCheckedThrowingContinuation { continuation in
            Vapor.Application.baseExecutionQueue.async { [self] in
                do {
                    try self.run()
                    continuation.resume()
                } catch {
                    continuation.resume(throwing: error)
                }
            }
        }
    }
}

@main
enum Entrypoint {
    /// Didn't touch the default Vapor-template `main()` function much as i wanted to just perform some simple tests.
    static func main() async throws {
        var env = try Environment.detect()
        try LoggingSystem.bootstrap(from: &env)
        
        let app = Application(env)
        defer { app.shutdown() }

        let start = Date().timeIntervalSince1970
        print("start")
        let count = 100_000

        for idx in 0..<count {
            let percent = Double(idx) / Double(count / 100)
            if percent.truncatingRemainder(dividingBy: 5) == 0 {
                print(Int(percent), "%")
            }
            /// Call the running Vapor-OpenAPI app.
            let request = HTTPClientRequest(url: "http://localhost:8080/api/greet")
            let res = try await app.http.client.shared.execute(
                request,
                deadline: .now() + .seconds(1)
            )
            let collected = try await res.body.collect(upTo: .max)
            let string = String(buffer: collected)
        }

        let end = Date().timeIntervalSince1970
        print("took \(end - start)")
    }
}

This would take ~85s for the first implementation, and ~74s for the second.

The issue is clear here. The first implementation using a custom executor is ~15% slower than non-custom-executor implementation although Vapor.BodyStreamWriter performs its work on a NIO EventLoop which when working with Swift-concurrency system, triggers a context switch.

The tests were all done using Xcode 15 release, and on macOS 14.0.

@MahdiBM MahdiBM added bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. triage needed This issue needs more specific labels labels Oct 8, 2023
@ktoso
Copy link
Contributor

ktoso commented Oct 9, 2023

Thanks a lot for the write up! I suspect there's a lot of hops happening due to the body async stream hopping to global pool and back but I'll investigate to be sure.

Thanks for the real world test case!

Cc @hborla @rjmccall

@FranzBusch
Copy link
Member

Thanks for filing this. We are currently also benchmarking our NIO custom executors over here: apple/swift-nio#2536

Tl;dr: Right now our custom executor implementation allocates for every single job that is running. I have some patches to fix this up. However, there is an allocation in the runtime in swift_continuation_throwingResumeImpl more specifically the createQueue call there allocates.
Additionally, as long as we do not have task executors @ktoso comment is a big performance penalty as well due to the constant context switching.

@ktoso ktoso self-assigned this Oct 9, 2023
@ktoso ktoso added concurrency runtime Feature: umbrella label for concurrency runtime features and removed bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. triage needed This issue needs more specific labels labels Oct 9, 2023
@adam-fowler
Copy link

From what I can tell in this case they'll never be any advantage to using a custom executor that runs on the EventLoop as you have already left the EventLoop when you created the unstructured Task to run your async code.

@FranzBusch
Copy link
Member

Yeah the example above does have a bunch of caveats to it and I think we should ignore those for a second and assume that the underlying transport is fully structured as well. This is what I am showing in my PR and from my findings we can get very very close to the full NIO performance when using Concurrency as long as we get task executors.

Just opened a PR that should tremendously increase the performance of our SerialExecutor conformance for NIOPosix: apple/swift-nio#2538.

@MahdiBM
Copy link
Author

MahdiBM commented Oct 9, 2023

Yeah my final findings after a few hours of testing was basically what @ktoso said.
I think the missing piece here is just task executors.

@adam-fowler what I was trying to make happen was to skip the possibly-multiple context-switches that will happen in the 'write()' function... Basically buying one context switch so the 'write()' function can work without context switches after that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
concurrency runtime Feature: umbrella label for concurrency runtime features
Projects
None yet
Development

No branches or pull requests

4 participants