New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
InternalServerError from APNs #408
Comments
Well, an
I think it's safe to describe these as "unexpected." When we get one of these errors, it means that Pushy succeeded in handing a message to the APNs server, but then the server replied and said "something has gone wrong on our side." We don't really know what went wrong, but it's generally safe to assume that it's a temporary thing that Apple will eventually try to fix.
I don't think so; APNs defines specific cases where the server will ask us to reconnect (by way of an HTTP/2 In all, I think Pushy is holding up its end of the bargain here, and the problem is on Apple's side. I don't think there are any technical changes we should make here, and so I'll close this issue for now. @bazi if you think that's incorrect, please leave a comment here and we'll happily revisit the issue. Cheers! |
I encountered the exact same problem (against Apple's sandbox environment). I had a retry mechanism causing it to retry sending the push. It continued to fail for 48 hours before I restarted the server software. After that, there were no problems anymore. Notably, the same internal error failure was encountered for both different topics used (for two different apps). Also using Pushy 0.9, but Oracle Java 1.8.0_60. |
Huh… I still think this is an upstream problem, but @markusrydh, can you be more specific about what you did to make this happen? Can you reproduce it? |
The (development) server was more or less idle (no push for a day), then the server sent a VoIP-push to the client. Immediately it got a InternalServerError and all pushes continued to get that until restarted. I googled a bit and it seems that other libraries using the HTTP/2 token-based approach is getting the same problem. It seems that a connection close/reopen seems to do the trick. Although it might be a problem on the Apple side, it would be nice if there was a workaround. I am currently thinking about having to setting up a new connection if I get repeated InternalServerErrors, but of course it would be nice if the library did this behind the scenes. Or maybe a "force close" option, to make the connection immediately close and then re-open the connection (as it would for a normal broken connection). It would be a more lightweight option than disconnect(). Just a reference to some of the issues I found in node-apn that seems to be related: |
And no, I have not been able to reproduce yet. |
Thanks for the links; I've shared some information there that I hope will be helpful to the
Seem right? If so, I'll file a bug report with Apple with the information we have so far. |
I've reported this to Apple as rdar://30168506. |
@jchambers Point 2 is questionable. I suspected before that idle connections caused issues but we got the issue in our prod environment where we usually do not have that long idle times. Anyways, your steps seem reasonable to me. |
Acknowledged; will update the ticket upstream. |
Another observation: To get rid of I have suspicions that this may be related to the issue. Not sure though. I have to mention that we use custom event loop group with 4 threads and all error logs were from the same thread, in stack trace below - nioEventLoopGroup-2-2
|
We recently upgraded our servers from Pushy 0.7.2 to 0.9.2 (and API tokens - hurrah!) and we just ran into this situation on our development environment over the weekend. I noticed in the node thread that someone put in a hack to reset every hour. Do folks have a workaround for the issue while we wait for Apple to fix their end? |
@bazi My apologies—I completely missed your comment from the end of January. Thanks for the additional data! I'm not sure what to make of it, but will look into it. @ksquaredkey Not sure. I saw your post on the mailing list, and I think that's the right place to ask. I'll see what I can learn through other channels. |
Yes, I do agree that this issue and #435 are related. We have implemented regular reconnection mechanism which is triggered every 30 minutes or on |
We've been seeing this a lot recently as well. Any chance the reconnecting mechanism could be included in Pushy itself? |
As mentioned earlier, I'd really rather not if at all possible. I acknowledge that this is a real problem for real users, but I still think the right thing to do—for now at least—is to do exactly what @bazi is doing:
I'm still trying as hard as I can to push for an upstream fix, which seems like The Right Thing To Do. If I get a clear sign that no fix is coming, we'll talk about adding a workaround directly to Pushy. |
By "upstream fix" I imagine you mean Apple?
Makes sense. I'm not holding my breath for Apple doing anything about it though :/ |
Correct. I do have reason to believe they're looking into the issue now (and maybe weren't before), so this isn't a "wait forever" situation. |
That's great news :) |
Funny, I'm back to this issue again, myself. We just tried implementing the disconnect/connect and that does not help the issue. The ApnsClient continues to send the expired token on this new connection and Apple continues to send us the InternalServerFailure response. The workarounds folks suggested to me on the mailing list (last month) of destroying and creating new ApnsClient instances does not work for us. Our server is Scala based and all of our infrastructure is immutable. It would be a major rewrite to support nuking the ApnsClients itself. Digging through the source, I figured out the crux of the issue (Apple's bug notwithstanding) is the invalidation of the authentication token "token" in the AuthenticationTokenSupplier. The only way it gets invalidate (as of 0.9.3) is on the receipt and processing of an "ExpiredProviderToken" response. Changing connections will not help. Near as I can tell, Jon's recommendation in the Radar of a GOAWAY will not help. The issue is there is only one way to clear that token - the ExpiredProviderToken. (well, ok, two ways, nuke the object containing it) So, it seems to me that we just need a mechanism to invalidate the token prior to trying an > 1 hour idle time send and let a new token be created. I did not want to get into all sorts of timeout management, etc inside pushy, and others might not care for the added complexity, so I propose: Adding a public method to ApnsClient .invalidateAuthToken() that will call .invalidateToken() on the AuthenticationTokenSupplier instance. Calling this will then case a new Auth Token to be created on the next Push Request send(). This would be just a minimal change to Pushy but still allow my code to be in control of working around Apple's bug. I will probably do a PoC of this as a Fork of Pushy but was hoping for some feedback on whether folks thought this was A Really Bad Idea. I'm stuck between Apple's bug and a ship date - I cannot wait for Apple to fix their end. Thoughts? |
This is a thoughtful and helpful question, and I feel bad for the brevity of the reply, but here it goes: I think this is fixed in #451. |
On reflection, that was way too brief. In #451, we moved token-handling into the |
Thanks Jon! I was looking on the 0.9.3 branch yesterday. Looking at master, I see what you are talking about - looks good. Given the level of changes that represents in master, am I correct in assuming that that code is still a while away from coming out in a new release? Just checking as I think I'm going to still have to make my local mod for now and throw a ticket on our board to remove it when the new version of Pushy comes out. |
I was actually planning on doing a release tomorrow-ish, but it may take a few days to move through the Sonatype tubes to Maven Central since we're moving to a new group ID ( |
oh very nice!!! You made my day (which happens to be my birthday - thanks for the present ;-) |
Folks, with apologies for the long delay, I've just received a reply from Apple requesting more information. They write:
I'd happily accept an answer to the first question ("are you still seeing this issue?") here. If you still experiencing the issue and if you're comfortable sharing bundle IDs, timestamps, IPs, and push notification details, please email me directly at jon@turo.com. Thanks! |
Well, it has been half a year that I opened this issue. Since then we had applied our custom fixes (like periodic reconnects), and there were couple of pushy releases as well. |
I had this problem, and found this thread. @bazi could you please share any graceful way to do periodic reconnects from pushy client? |
@StarWindMoonCloud This is actually very timely. Did you experience this problem recently? With Pushy? If so, could you please email me the information requested by Apple (please see #408 (comment)). Thank you! |
Sent @jchambers |
Thanks, @StarWindMoonCloud. I've passed the information along to Apple. We'll see what happens next. |
@jchambers Any update from Apple? It happened again at my side today. |
…and there's the understatement of the month ;) No reply yet, but it's still the weekend. In my experience, they usually stay focused on a problem once they've acknowledged it. Let's give them a few days, but after that, it's probably reasonable to use one of the workarounds mentioned in this thread (like closing/reopening a connection when you get an |
To follow up, I have received no additional information from Apple about this issue since last month. |
@StarWindMoonCloud sorry for late response. Here are steps to do a re-connection:
Make sure that during re-connection phase you pause all your notification sending operations. Release all your references to disconnected client(s) and use newly created client(s). |
Now that we've moved to a pooled connection model, callers don't have direct control over connections and we may need to take additional steps to address the issue within Pushy itself. Does anybody know if this is still A Thing in practice, though? |
I do still get InternalServerErrors from Apple using an admittedly old version of Pushy (0.9.2). Attaching a log. It is not an uncommon error, and I am using a manual disconnect/connecting with new connection to get it to work. Don't know if newer version of pushy would change this in any way? Will update to newer version, but maybe I can't use the latest version if I cannot restart connection? |
I'd say it's very unlikely. The problem has been reported across a wide variety of APNs libraries on a variety of platforms (a popular node.js APNs library had the same problem, for example), and so it appears to be beyond our control.
That's an oversight on my part, and we should have a new release in a couple days that works around the problem. I'd recommend holding off until then. |
This should be fixed in #529, which will ship soon. |
Fanstastic, will simplify our code quite a bit! Thanks Jon! |
We are using Pushy in our production environment and it works great! Besides we have a staging environment where tests are performed prior to launching new versions. Staging environment is not always used and most of the time it is in idle state.
Today when we started to use staging, we got
InternalServerError
errors (stack trace given below). I guess it is related to long idle state of pushy, not sure though. We didn't have any choice but to restart the app.In which cases APNs may return such an error? Should pushy be closing and reconnecting in such cases? Or is it a duty of our side to reconnect upon such an error?
Using pushy v0.9, java 1.8.0_102 (Oracle)
The text was updated successfully, but these errors were encountered: