add DynamicInetProvider to get IP of gateway dynamically #557
Conversation
dnsjava is only responsible for domain name resolution, so why would it bring this kind of improvement? In practice we've run into large numbers of Http2ChannelClosedException and Http2GoAwayException. I added a retry mechanism in pushy to handle pushes that were never sent, e.g. pushes that fail on channel acquisition or on write are retried. But for other exceptions, since it's unclear whether APNs actually received them, I don't dare retry blindly. My initial guess is that connection quality is poor. Have you run into this kind of problem on your side?
@qiankunli Sorry, I can't reply in Chinese, but here are my thoughts. Some users have reported that some APNs servers are dramatically faster than others for reasons that are not entirely clear. If I'm understanding this pull request correctly (and I have not had much time to read it, regrettably), it actually accumulates a list of all of the IP addresses it has ever seen when resolving the APNs hostname. Normally, DNS resolution only gives us ~8 results at a time, but with a short TTL. If this change means that we're remembering all of the IP addresses we've seen in the past, it would mean we're distributing connections across servers that Apple is generally not asking people to use. That may mean we wind up connecting to less-loaded servers, or to servers that are (again, for reasons unknown) faster.

It's also worth noting that right now, Pushy doesn't deliberately distribute connections even across the ~8 IPs that DNS already gives us. It may be that the performance gain @jason-joo is seeing comes simply from diversification: some of the servers are worse, some are better, and on average we see better results across the board. This seems likely given the relatively high number of connections Jason is using.

That said, I'm not sure this is either the strategy or the implementation we want to use. My knee-jerk reaction is that we should respect what's coming from DNS and not try to connect to servers that aren't in the rotation. I'm also a little hesitant to add a new dependency when Netty (to some extent) already does this kind of thing. Could we achieve similar results with Netty's own resolver?

If there are significant performance gains to be had, we should take this issue seriously. At the same time, I'd like to think a bit more about the specific approach. In the meantime, @jason-joo, thanks very much for the contribution!
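The "distribute connections across the ~8 IPs that DNS already gives us" idea can be sketched in plain JDK terms. This is a hypothetical illustration, not Pushy's actual code; the class name is made up:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: rotate across whatever addresses a single DNS
// lookup returns, instead of always connecting to the first one.
public class RoundRobinAddressPicker {
    private final AtomicInteger counter = new AtomicInteger();

    public InetAddress pick(String host) throws UnknownHostException {
        // getAllByName returns every A/AAAA record from the current lookup
        // (subject to the JVM's DNS cache), typically ~8 for APNs.
        InetAddress[] all = InetAddress.getAllByName(host);
        int index = Math.floorMod(counter.getAndIncrement(), all.length);
        return all[index];
    }
}
```

Each new connection would call `pick(...)` and dial the returned address, spreading the N concurrent connections evenly over whatever the current DNS answer contains.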
@jchambers Thanks for your response. I have read your post carefully and I agree with most of it. I'm glad to explain a bit further.
It's correct, as you said in your post. But we should note: DNS records can take a long time to propagate worldwide, up to 48 hours in the worst case, because of the many layers of DNS forwarders. So if you get a record with a 60-second TTL, it means you would do well to fetch it again from your upstream after 60 seconds: no sooner, for better cache performance, and no later, to keep the record valid. It doesn't mean your application must refresh at exactly that moment; that's purely how the protocol is designed. In practice, though, the connection sticks to the IP resolved at application startup and never changes, because of the JVM's DNS cache, and the connection won't be closed unless Apple's server decides to close it. By contrast, for the reasons above, we do obey the TTL rules of the DNS protocol (we use a 300-second expiration, configurable). Apple probably uses the short TTL mainly for load-balancing purposes.
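The JVM-side caching mentioned above can be tuned with the `networkaddress.cache.ttl` security property. A minimal demonstration (the class name is made up; the 300-second value matches the configurable expiration described above):

```java
import java.net.InetAddress;
import java.security.Security;

public class DnsCacheTtlDemo {
    public static void main(String[] args) throws Exception {
        // Cap the JVM's positive DNS cache at 300 seconds. This must be set
        // before the first lookup; the equivalent command-line flag is
        // -Dnetworkaddress.cache.ttl=300.
        Security.setProperty("networkaddress.cache.ttl", "300");

        // Both lookups inside the TTL window are served from the same
        // JVM-level cache entry, regardless of the record's real DNS TTL.
        InetAddress first = InetAddress.getByName("localhost");
        InetAddress second = InetAddress.getByName("localhost");
        System.out.println(first.getHostAddress().equals(second.getHostAddress()));
    }
}
```

Without this property (and without a SecurityManager), the default positive cache is short but still independent of the record's actual TTL, which is why the resolved IP can outlive the 60-second TTL Apple publishes.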
Pushy added support for concurrent connections in recent versions (good job on that) to improve performance. But there is a problem: the client creates all of its concurrent connections to the same server (IP address), because DNS resolution gives us only ONE IP and the JVM may cache it for quite a long time over the application's life cycle. So at startup we actually create N connections to the same server. In this scenario there are two main problems:
By distributing the connections among different IPs we RANDOMLY get a better running condition, even on a freshly booted instance. The cluster's performance can then be scaled out horizontally (still to be validated in future, because we already have enough performance: 4.5M tokens / 15 minutes / 5 nodes). @qiankunli See the reply above for the specific reasons; in summary, it's based on the following points:
At the moment we haven't added much retrying (this system mainly serves precisely targeted bulk marketing pushes); the failure rate we actually observe is below 0.1%. Retries could be added in the future.
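The accumulate-and-expire behaviour described earlier in this thread (keep every IP ever seen, drop it after a configurable expiration, 300 seconds in this discussion) can be sketched roughly like this. The class and method names are hypothetical, not the actual DynamicInetProvider code:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: re-resolve the host periodically, accumulate every
// address seen, and drop entries that have not reappeared within the
// configurable expiration window.
public class AccumulatingInetProvider {
    private final String host;
    private final long expirationMillis;
    private final ConcurrentHashMap<String, Long> lastSeen = new ConcurrentHashMap<>();

    public AccumulatingInetProvider(String host, long expirationMillis) {
        this.host = host;
        this.expirationMillis = expirationMillis;
    }

    // Intended to be called from a scheduled task, e.g. once per DNS TTL.
    public void refresh() throws UnknownHostException {
        long now = System.currentTimeMillis();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            lastSeen.put(address.getHostAddress(), now);
        }
        // Forget addresses that haven't shown up recently.
        lastSeen.values().removeIf(seen -> now - seen > expirationMillis);
    }

    public List<String> currentAddresses() {
        return new ArrayList<>(lastSeen.keySet());
    }
}
```

New connections would then be spread over `currentAddresses()`, which grows well beyond the ~8 addresses a single lookup returns as the rotation cycles.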
My thoughts from the sidelines ;) -- I agree that #2 sounds like a problem. Ideally we should keep the server's location as a String and defer DNS resolution until right before we attempt to open a TCP connection, so that DNS-based load balancing can work correctly. However, I'm not sure that depending on a non-default DNS resolver library is the right way to do it. I would prefer to keep using the JDK's built-in DNS implementation if possible, to maintain the principle of least surprise for Pushy users.
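The deferral suggested above can be sketched with nothing but the JDK: store the host as a plain string and let each connection attempt perform its own lookup. The class name is hypothetical:

```java
import java.io.IOException;
import java.net.Socket;

// Hypothetical sketch: keep the server's location as a host string and
// perform DNS resolution only at connect time, so short-TTL, DNS-based
// load balancing keeps working across reconnects.
public class DeferredResolutionConnector {
    private final String host; // never pre-resolved to an InetAddress
    private final int port;

    public DeferredResolutionConnector(String host, int port) {
        this.host = host;
        this.port = port;
    }

    public Socket connect() throws IOException {
        // The lookup happens here, on every attempt, so a reconnect after
        // the record's TTL expires can land on a different address.
        return new Socket(host, port);
    }
}
```

The key point is simply never caching an `InetAddress` in a field; each reconnect pays for one lookup but always sees the current DNS rotation (subject to the JVM cache TTL discussed above).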
@jmason Glad to hear from you. If you're referring to the default DNS settings of the user's server OS, I totally agree with you that the implementation should fetch the default DNS server settings.
It's true that the default JVM behaviour on DNS caching is not good -- I haven't run a service in production without the "-Dsun.net.inetaddr.ttl=10 -Dnetworkaddress.cache.ttl=10" flags in many years for this reason! I guess as long as the DynamicInetProvider behaviour is not the default, that would be OK, IMO.
@jmason You are right. And the new feature must be enabled manually, to preserve backward compatibility.
FWIW, this was fixed in #571, and just released as part of Pushy 0.12. |
I have checked it out and tested it, but I find it doesn't solve the problem, because it creates all the connections at once when the worker boots up under some load. ConcurrentConnections = 16. Maybe it would get better after a LONG time running (once some connections break)?
And admittedly there is still one problem in my branch: it won't work in Tomcat. I'm planning to sort that out after finishing the work at hand.
@jasonjoo2010 Thank you again for your contribution here! I think we can achieve similar results much more simply by using Netty's built-in DNS resolver. I'm going to close this pull request for now, and would love your feedback on #594 as your time allows. Thanks again!
provide two implementations by default:
exactly the same as the previous behavior
can load the IP list periodically from the given DNS servers; after several minutes of running there may be roughly 70~100 IPs in the list.
we improved throughput from 4~20k to 30~38k per node, and reduced the total run time from 40~55 minutes to under 14 minutes (4.18M tokens with five 2-core/4GB nodes).
our config:

ipush.threadnum = 6     // I/O threads in Netty
ipush.connections = 128 // concurrent connections
ipush.qps = 5000        // max messages still awaiting a "completion notification"; we stop handing more to Netty once this limit is reached
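The `ipush.qps` cap above is a form of backpressure on in-flight messages. A minimal sketch of that idea using a semaphore (the class and its names are hypothetical, not the actual implementation):

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of the backpressure behind the ipush.qps setting:
// at most `limit` messages may be awaiting a "completion notification";
// senders block until an earlier message completes and frees a permit.
public class InFlightLimiter {
    private final Semaphore permits;

    public InFlightLimiter(int limit) {
        this.permits = new Semaphore(limit);
    }

    public void beforeSend() throws InterruptedException {
        permits.acquire(); // blocks once `limit` messages are in flight
    }

    public void onCompletion() {
        permits.release(); // a completed (or failed) message frees a slot
    }

    public int available() {
        return permits.availablePermits();
    }
}
```

The sender calls `beforeSend()` before handing each message to Netty and `onCompletion()` from the push-notification future's callback, keeping at most 5000 messages outstanding.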
before the dynamic resolver:
after the dynamic resolver:
several days during optimization: