Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ApnsClient is unable to reconnect #435

Closed
notz opened this issue Mar 25, 2017 · 11 comments · Fixed by #492
Closed

ApnsClient is unable to reconnect #435

notz opened this issue Mar 25, 2017 · 11 comments · Fixed by #492
Labels
Milestone

Comments

@notz
Copy link

notz commented Mar 25, 2017

After some time we get a ClientNotConnectedException and the ApnsClient is unable to reconnect. I tried several ways to to reconnect. First i implemented the suggestion like on the readme.

if (e.getCause() instanceof ClientNotConnectedException) {
        System.out.println("Waiting for client to reconnect…");
        apnsClient.getReconnectionFuture().await();
        System.out.println("Reconnected.");
    }

After that was not working i tried to force a disconnect and connect again, but that's also not working:

apnsClient.disconnect().await(TIMEOUT);
logger.info("reconnect apns client after disconnection: " + apnsClient.isConnected());
apnsClient.connect(ApnsClient.PRODUCTION_APNS_HOST).await(TIMEOUT);

By the way the apnsClient.isConnected gives false, connect without exception but the next sending fails with ClientNotConenctedException.

Currently only help is to restart the application.

Used Pushy 0.9.2 with netty-tcnative 1.1.33.Fork25

@jchambers
Copy link
Owner

Strange. How often does this happen? Is it reproducible? Two things would be helpful from a debugging perspective:

  1. Logs at DEBUG or TRACE that explain what's happening when the client is trying to reconnect.
  2. tcpdump data showing us what's happening at a network level.

Thanks!

@notz
Copy link
Author

notz commented Apr 3, 2017

It happens nearly every week (on both servers, with low traffic).

At the time it is not working I have a established connection in netstat:

tcp        0      0 xxx.xxx.xxx.xxx:56782     17.188.155.33:https     ESTABLISHED

And if the application is trying to send a push, their is nothing send (tracked with tcpdump). Their are only the ping packets every minute.

11:32:12.774346 IP (tos 0x0, ttl 64, id 59373, offset 0, flags [DF], proto TCP (6), length 127)
    xxx.xxx.xxx.xxx.56782 > 17.188.155.33.https: Flags [P.], cksum 0x636d (incorrect -> 0x1f2a), seq 674663732:674663807, ack 4125721724, win 2888, options [nop,nop,TS val 1126488330 ecr 1994097256], length 75
11:32:12.950016 IP (tos 0x0, ttl 64, id 59374, offset 0, flags [DF], proto TCP (6), length 52)
    xxx.xxx.xxx.xxx.56782 > 17.188.155.33.https: Flags [.], cksum 0x6322 (incorrect -> 0xf7d9), seq 75, ack 76, win 2888, options [nop,nop,TS val 1126488373 ecr 1994157432], length 0
11:33:12.950857 IP (tos 0x0, ttl 64, id 59375, offset 0, flags [DF], proto TCP (6), length 127)
    xxx.xxx.xxx.xxx.56782 > 17.188.155.33.https: Flags [P.], cksum 0x636d (incorrect -> 0x2472), seq 75:150, ack 76, win 2888, options [nop,nop,TS val 1126503374 ecr 1994157432], length 75
11:33:13.126473 IP (tos 0x0, ttl 64, id 59376, offset 0, flags [DF], proto TCP (6), length 52)
    xxx.xxx.xxx.xxx.56782 > 17.188.155.33.https: Flags [.], cksum 0x6322 (incorrect -> 0xd16c), seq 150, ack 151, win 2888, options [nop,nop,TS val 1126503418 ecr 1994217609], length 0
11:34:13.127012 IP (tos 0x0, ttl 64, id 59377, offset 0, flags [DF], proto TCP (6), length 127)
    xxx.xxx.xxx.xxx.56782 > 17.188.155.33.https: Flags [P.], cksum 0x636d (incorrect -> 0xbc73), seq 150:225, ack 151, win 2888, options [nop,nop,TS val 1126518418 ecr 1994217609], length 75
11:34:13.302649 IP (tos 0x0, ttl 64, id 59378, offset 0, flags [DF], proto TCP (6), length 52)
    xxx.xxx.xxx.xxx.56782 > 17.188.155.33.https: Flags [.], cksum 0x6322 (incorrect -> 0xab01), seq 225, ack 226, win 2888, options [nop,nop,TS val 1126518462 ecr 1994277785], length 0
11:35:13.303398 IP (tos 0x0, ttl 64, id 59379, offset 0, flags [DF], proto TCP (6), length 127)
    xxx.xxx.xxx.xxx.56782 > 17.188.155.33.https: Flags [P.], cksum 0x636d (incorrect -> 0x1e60), seq 225:300, ack 226, win 2888, options [nop,nop,TS val 1126533462 ecr 1994277785], length 75
11:35:13.478966 IP (tos 0x0, ttl 64, id 59380, offset 0, flags [DF], proto TCP (6), length 52)
    xxx.xxx.xxx.xxx.56782 > 17.188.155.33.https: Flags [.], cksum 0x6322 (incorrect -> 0x8496), seq 300, ack 301, win 2888, options [nop,nop,TS val 1126533506 ecr 1994337961], length 0

I have currently only enabled debug log level on pushy package and get this on send:

11:34:42.765 com.relayrides.pushy.apns.ApnsClient Failed to send push notification because client is not connected: SimpleApnsPushNotification .....
11:34:42.766 com.relayrides.pushy.apns.ApnsClient Disconnecting. 

ApnsClient says also it is not connected and connecting again is not working. And tcpdump shows that their is a open connection. I'm using DI and queueing so their is only one instance of apns client in my application. Seems that he is losing the correct state internally.

@jchambers
Copy link
Owner

Seems that he is losing the correct state internally.

Based on the information you've provided, I agree. Thank you very much for digging into the issue! We'll see what we can figure out.

@jchambers jchambers added the bug label Apr 3, 2017
@jchambers
Copy link
Owner

One more question: do you see any lines in your logs like this?

Disconnected. Next automatic reconnection attempt in {} seconds.

Thanks!

@notz
Copy link
Author

notz commented Apr 3, 2017

No. Currently if have this code in place of a client not connected exception:

if (e.getCause() instanceof ClientNotConnectedException) {
    try {
        apnsClient.disconnect().await(TIMEOUT);

        logger.info("reconnect apns client after disconnection: " + apnsClient.isConnected());

        apnsClient.connect(sandbox ? ApnsClient.DEVELOPMENT_APNS_HOST : ApnsClient.PRODUCTION_APNS_HOST)
                .await(TIMEOUT);

        if (!retry) {
            return sendMessage(...);
        }
    } catch (Exception e1) {
        logger.info("failed to reconnect: " + e.getMessage());
    }
}

@jchambers
Copy link
Owner

To clarify, what happens if you don't manually disconnect? Does the client appear to be trying to reconnect itself?

As a separate question, is there anything in the logs that gives a hint about why the client disconnected in the first place? I'm wondering if this might be related to #433.

@notz
Copy link
Author

notz commented Apr 3, 2017

I found this one time in my log (i have only the last 7 days). But that was 2 days before it stopped working.

01:17:07.410 com.relayrides.pushy.apns.ApnsClientHandler Closing channel due to ping timeout.
01:17:07.491 com.relayrides.pushy.apns.ApnsClient Disconnected. Next automatic reconnection attempt in 1 seconds.
01:17:08.502 com.relayrides.pushy.apns.ApnsClient Attempting to reconnect.
01:17:09.151 com.relayrides.pushy.apns.ApnsClient Connection to api.push.apple.com/17.188.153.36:443 restored.

But their is nothing in the log before the first ClientNotConnectedException is thrown.

12:24:00.941 com.relayrides.pushy.apns.ApnsClient Received response from APNs gateway: SimplePushNotificationResponse 
01:03:18.264 com.relayrides.pushy.apns.ApnsClient Failed to send push notification because client is not connected: SimpleApnsPushNotification 

So their was ~40 minute no push to send.

@jchambers
Copy link
Owner

Thanks. That helps narrow down the field of possibilities quite a bit!

@notz
Copy link
Author

notz commented May 31, 2017

Maybe the issue is related to some dependency pushy has.
I have now now excluded netty-boring-ssl and i don't experience the failure anymore (at least in the last 10 days). Don't know if this is related or not. I already use openssl (netty-tcnative) for a websocket server in my application.

<dependency>
    <groupId>com.relayrides</groupId>
    <artifactId>pushy</artifactId>
    <version>0.9.3</version>
    <exclusions>
        <exclusion>
             <groupId>io.netty</groupId>
             <artifactId>netty-tcnative-boringssl-static</artifactId>
         </exclusion>
    </exclusions>
</dependency>

@jchambers
Copy link
Owner

Very strange. Thanks for the update!

@binauyang
Copy link

I meet the same problem: the apnsClient.isConnected gives false, connect without exception but the next sending fails with ClientNotConenctedException. And my pushy version is 0.10.1, I need your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants