Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential deadlock only reproducible in Linux when reusing connections #90

Closed
0xpablo opened this issue Jun 25, 2020 · 4 comments
Closed

Comments

@0xpablo
Copy link

0xpablo commented Jun 25, 2020

Describe the bug
I'm using APNSwift to deliver push notifications.
I'm having a deadlock (or something that never fulfill the EventLoopFuture that is returned on the .send method of APNSwiftConnection).

For context, the service I’m trying to run polls messages from a SQS queue, and then sends them using the APNSwift library.
I kept the service up all night processing notifications on a couple macOS machines and this morning they were still working. I can’t get the Linux service to work for more than 40 minutes.

I have 2 main components:

  • The QueueProcessor (1 instance)
  • The PushNotificationSender (1 instance)

While I’m sending messages via APNSwift, I stop polling from SQS.
This is how my PushNotificationSender looks like (I omitted some functions for clarity):

class PushNotificationSender {
    let group = MultiThreadedEventLoopGroup(numberOfThreads: System.coreCount)
    
    lazy var productionAPNS = connectProduction()
    lazy var sandboxAPNS = connectSandbox()
    
    func sendPush(_ pushNotificationMessage: PushNotificationMessage) -> EventLoopFuture<Void> {
        let endpoint = pushNotificationMessage.production ?
            productionAPNS.flatMapError { _ in self.connectProduction() } :
            sandboxAPNS.flatMapError { _ in self.connectSandbox() }
        
        return endpoint
            .flatMapErrorThrowing { _ in
                throw PushNotificationSenderError.connectionError
            }
            .flatMap {
                $0.send(
                    self.apnsPayload(pushNotificationMessage: pushNotificationMessage),
                    to: pushNotificationMessage.token,
                    expiration: Date().addingTimeInterval(Constants.notificationExpiration))
                    .flatMapErrorThrowing(self.flatMapErrorThrowing(error:))
            }
    }private func connectProduction() -> EventLoopFuture<APNSwiftConnection> {
        APNSwiftConnection.connect(configuration: .configuration(environment: .production),
                                   on: group.next())
    }
    
    private func connectSandbox() -> EventLoopFuture<APNSwiftConnection> {
        APNSwiftConnection.connect(configuration: .configuration(environment: .sandbox),
                                   on: group.next())
    }
}

I’m not entirely sure if it’s a deadlock or not, but it happens on the send method.

❌ Looking at my server logs (as I passed a logger to the APNSwift lib), this is what I see:

2020-06-24T20:21:46+0000 debug: Send - starting up
2020-06-24T20:21:46+0000 info: Send - sending
2020-06-24T20:21:46+0000 debug: Request - building

This is the source of the APNSwift library where that is logged

✅ This is the sequence of events when the APNS connection is not reused and everything’s fine:

2020-06-25T09:59:30+0200 debug: Connection - starting
2020-06-25T09:59:30+0200 debug: Connection - bringing up
2020-06-25T09:59:30+0200 info: Connection - up
2020-06-25T09:59:30+0200 debug: Send - starting up
2020-06-25T09:59:30+0200 info: Send - sending
2020-06-25T09:59:30+0200 debug: Request - building
2020-06-25T09:59:30+0200 debug: Response - received
2020-06-25T09:59:30+0200 info: Response - successful

✅ This is the sequence of events when the APNS connection is reused and everything’s fine too:

2020-06-25T10:23:08+0200 debug: Send - starting up
2020-06-25T10:23:08+0200 info: Send - sending
2020-06-25T10:23:08+0200 debug: Request - building
2020-06-25T10:23:08+0200 debug: Response - received
2020-06-25T10:23:08+0200 info: Response - successful

I was able to add a workaround, which is to not reuse connections. After doing that, I can’t reproduce the issue anymore on the linux machines. The way I was trying to reuse connections is this:

productionAPNS.flatMapError { _ in self.connectProduction() }

(Assuming the connection would error when closed, which I think it does on macOS)
I also just saw a bug in my code, which is that I’m not keeping the last connection upon reconnection. My productionAPNS var was never updated. To my understanding this bug only should make connections not being reused, but it shouldn’t cause the behavior that I’m seeing, right?

Platform:

  • OS: Ubuntu
  • Version 18.04
  • APNSSwift Dependency: 2.0.0-rc1

Thanks!

@kylebrowning
Copy link
Member

Thank you for this detailed response! WE've done a connection pool implementation over here.

That requires AsyncKit from vapor. But theoretically it should be the same

@tanner0101
Copy link
Collaborator

tanner0101 commented Jun 25, 2020

From what I can tell this seems like a slight misunderstanding with how futures work.

    lazy var productionAPNS = connectProduction()
    lazy var sandboxAPNS = connectSandbox()

This code will create a new APNS connection when the class initializes. The connection may succeed or fail. The resulting future will always have that value.

        let endpoint = pushNotificationMessage.production ?
            productionAPNS.flatMapError { _ in self.connectProduction() } :
            sandboxAPNS.flatMapError { _ in self.connectSandbox()  }

This code will attempt to reconnect if the original connection failed. Note that this reconnect will happen every single time. flatMapError will always be called if the future resulted in an error. The future does not update automatically or anything.

What you want here is a connection pool with 1 connection in it. Connection pooling can be tough to get right, so I'd recommend using vapor/async-kit's pool: https://github.com/vapor/async-kit/tree/master/Sources/AsyncKit/ConnectionPool. The only dependency this package has is NIO. Vapor's APNS package shows how to create a connection pool source for APNSwift's connection type: https://github.com/vapor/apns/blob/master/Sources/APNS/APNSConnectionSource.swift

@0xpablo
Copy link
Author

0xpablo commented Jun 25, 2020

Hi there @kylebrowning and @tanner0101, yep, that code has a bug as you say (I point this in the last paragraph of my description but I wanted to leave the initial code). I should have updated the references when creating new connections.

The thing is that it looks like at some point the future does not fail but I think the connection is broken, and when a new notification comes in, this is what gets logged:

2020-06-24T20:21:46+0000 debug: Send - starting up
2020-06-24T20:21:46+0000 info: Send - sending
2020-06-24T20:21:46+0000 debug: Request - building

See that there's no Connection - starting logs there. My expectations with my buggy code were that once the connection was broken, a new connection would be created every time a notification was sent. (If I fixed the bug then it would be able to reuse the connection sometimes).

Thanks for the heads up, I'm happy to give AsyncKit a try 🙂, which is probably the right thing to do, but I would like to understand what is going on and if there is a bug in APNSSwift or NIO or if it's just my code. I was surprised that it works on macOS but not on Linux. Is it expected that with that code, the .send()'s EventLoopFuture is never fulfilled, once a (probably broken) connection is retried? (on macOS it does and it creates a new connection).

Thanks a lot for your time guys!

EDIT: this is what I intended to do originally (which I think would exhibit the same issue on Linux):

        let endpoint = pushNotificationMessage.production ?
            productionAPNS.flatMapError { _ in
                self.productionAPNS = self.connectProduction()
                return self.productionAPNS
            } : sandboxAPNS.flatMapError { _ in
                self.sandboxAPNS = self.connectSandbox()
                return self.sandboxAPNS
            }

@kylebrowning
Copy link
Member

Im going to close this ticket as it seems to be an implementation detail. If you'd like to come discuss connection pools, were in #vapor discord.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants