Reverse DNS lookup during SSL handshake causes Administrator slowdown #2987

Closed
rbeckman-nextgen opened this issue May 11, 2020 · 6 comments

@rbeckman-nextgen rbeckman-nextgen commented May 11, 2020

We've had various complaints about the Administrator being slow whenever it performs a request to the server.

http://www.mirthcorp.com/community/forums/showthread.php?t=8777
http://www.mirthcorp.com/community/forums/showthread.php?t=9494
http://www.mirthcorp.com/community/forums/showthread.php?t=9531

We've finally narrowed down the problem to a reverse DNS lookup that Java performs during an SSL handshake. On Windows, this lookup has to time out when no hostname is found, which causes the slowdown. Note that this problem should only occur if an IP address is used for the connection. It should exist in both 2.x and 3.x, but may seem worse in 3.x because 3.x allows multiple connections to the server to (ironically) make the Administrator feel more responsive. Each time a connection is created, this lag of a few seconds is incurred. If a connection is reused, there is no lag. Therefore, if one were to set a fast refresh time (e.g. 5 seconds) on the dashboard, the connections are reused after the first request and everything seems responsive. Also ironically, if one were to set a longer refresh time (e.g. 15 seconds), a new connection is created for each refresh and the Administrator becomes unresponsive during each refresh.

Our timeout settings are currently:

idleConnectionTimeoutThread.setTimeoutInterval(5000);
idleConnectionTimeoutThread.setConnectionTimeout(5000);

Here are some links that help explain the problem and show what others have tried.

"To fix the problem, cache your server address as an InetAddress object and reuse it in the Socket constructor whenever you are making a new connection to your server." (We don't call the socket constructor directly so this may not help us)
http://www.velocityreviews.com/forums/t147274-very-slow-ssl-connection-from-win-to-linux.html
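
A minimal sketch of the InetAddress-caching idea quoted above, assuming the server is addressed by an IP literal. The class and method names (CachedServerAddress, fromIpLiteral) are hypothetical and not part of the Administrator code. The sketch relies on InetAddress.getByAddress(String, byte[]), which stores the supplied host name without consulting any name service, so a later getHostName() call returns immediately instead of triggering a reverse lookup. Whether HttpClient can be made to use such an address is not established here.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, for illustration only.
final class CachedServerAddress {
    private CachedServerAddress() {}

    /**
     * Builds an InetAddress whose host name is pre-set to the IP literal itself.
     * Assumes the argument is an IP literal; a real host name would cause a
     * forward DNS lookup in getByName.
     */
    static InetAddress fromIpLiteral(String ipLiteral) {
        try {
            // For an IP literal, getByName only validates and parses the text.
            byte[] raw = InetAddress.getByName(ipLiteral).getAddress();
            // getByAddress(String, byte[]) does not check any name service.
            return InetAddress.getByAddress(ipLiteral, raw);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        InetAddress server = fromIpLiteral("192.168.1.100");
        // The host name is already known, so no reverse lookup is needed.
        System.out.println(server.getHostName());
    }
}
```

The resulting object could then be cached and passed to any API that accepts an InetAddress, such as the Socket constructor mentioned in the quote.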

Another possible solution is to extend SSLSocketFactory, but our stack trace never shows it being called, so we're not sure it will work.
http://lists.spline.inf.fu-berlin.de/pipermail/jacorb-developer/2013-June/000303.html

A lot of background information on the problem
https://forums.oracle.com/thread/1534033
http://www.velocityreviews.com/forums/t298104-ssl-how-to-suppress-reverse-dns-lookups.html

Temporary Workaround
If someone really wants to fix this problem for a specific client machine connecting to a specific server, they can make sure a DNS record for the server is accessible via one of the following methods: local cache lookup, WINS server query, broadcast, LMHOSTS lookup, HOSTS lookup, or DNS server query. The easiest method would probably be to edit the HOSTS file on the client machine.

If we really want to deal with this problem, though, we need to prevent the reverse DNS lookup from occurring at all. Based on what others have said in the links above, the lookup doesn't appear to be necessary: when it times out, the getHostFromNameService method simply returns the IP address anyway.
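
To check whether a given client machine is affected, the lookup can be timed in isolation. The following is a rough diagnostic sketch, not code from the Administrator; the class name ReverseLookupTimer and its Result type are hypothetical. It calls InetAddress.getHostName(), the same public method that appears in the stack trace for this issue, on an address created from an IP literal.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper, for illustration only.
final class ReverseLookupTimer {
    /** Holds the name returned by the lookup and how long the call took. */
    static final class Result {
        final String hostName;
        final long millis;

        Result(String hostName, long millis) {
            this.hostName = hostName;
            this.millis = millis;
        }
    }

    static Result time(String ipLiteral) {
        try {
            // Parsing an IP literal does not perform a lookup.
            InetAddress address = InetAddress.getByName(ipLiteral);
            long start = System.nanoTime();
            // This call performs the reverse lookup; if no name is found,
            // it returns the textual IP address instead.
            String name = address.getHostName();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            return new Result(name, elapsedMs);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        String ip = args.length > 0 ? args[0] : "127.0.0.1";
        Result r = time(ip);
        System.out.println(ip + " -> " + r.hostName + " in " + r.millis + " ms");
    }
}
```

If the printed name equals the IP address and the call takes several seconds, the machine is likely hitting the timeout described above.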

Here is the stack trace to the offending line. Although ServerConnection is a class on the server, this actually occurs on the Administrator.

InetAddress.getHostFromNameService(InetAddress, boolean) line: 559
Inet4Address(InetAddress).getHostName(boolean) line: 502
Inet4Address(InetAddress).getHostName() line: 474
SSLSocketImpl.getHost() line: 1956
ClientHandshaker(Handshaker).getHostSE() line: 257
ClientHandshaker.getKickstartMessage() line: 1023
ClientHandshaker(Handshaker).kickstart() line: 620
SSLSocketImpl.kickstartHandshake() line: 1290
SSLSocketImpl.performInitialHandshake() line: 1187
SSLSocketImpl.writeRecord(OutputRecord, boolean) line: 654
AppOutputStream.write(byte[], int, int) line: 100
BufferedOutputStream.flushBuffer() line: 65
BufferedOutputStream.flush() line: 123
PostMethod(EntityEnclosingMethod).writeRequestBody(HttpState, HttpConnection) line: 502
PostMethod(HttpMethodBase).writeRequest(HttpState, HttpConnection) line: 1973
PostMethod(HttpMethodBase).execute(HttpState, HttpConnection) line: 993
HttpMethodDirector.executeWithRetry(HttpMethod) line: 397
HttpMethodDirector.executeMethod(HttpMethod) line: 170
HttpClient.executeMethod(HostConfiguration, HttpMethod, HttpState) line: 396
HttpClient.executeMethod(HttpMethod) line: 324
ServerConnection.executePostMethodAsync(String, NameValuePair[]) line: 219
Client.getChannelStatusList() line: 963
Frame$18.doInBackground() line: 2428
Frame$18.doInBackground() line: 1
SwingWorker$1.call() line: 277
FutureTask$Sync.innerRun() line: 303
SwingWorker$2(FutureTask).run() line: 138
Frame$18(SwingWorker<T,V>).run() line: 316
ThreadPoolExecutor$Worker.runTask(Runnable) line: 895
ThreadPoolExecutor$Worker.run() line: 918
Thread.run() line: 662

Imported Issue. Original Details:
Jira Issue Key: MIRTH-3070
Reporter: wayneh
Created: 2013-11-25T10:18:38.000-0800

@rbeckman-nextgen rbeckman-nextgen added this to the 3.0.1 milestone May 11, 2020

@rbeckman-nextgen rbeckman-nextgen commented May 11, 2020

Here's some more information on the TEMPORARY workaround that can be applied by editing the HOSTS file. This is an advanced workaround and the user assumes all risk for applying it. It only affects Windows machines and is only necessary if the user is connecting via an IP address. If the server is accessible via a domain name, they should just use that instead.

  1. On the Windows machine that will be running the Administrator, find out where your HOSTS file is: http://www.rackspace.com/knowledge_center/article/how-do-i-modify-my-hosts-file

  2. Back up the HOSTS file (optional).

  3. Edit the HOSTS file (administrator access is required to save it). You can either run Notepad or another text editor as administrator, or copy the HOSTS file to a place where it can be edited (e.g. the desktop), then copy it back to the original folder and replace the old file.

  4. Add a line at the end that consists of the IP address, a space, and the same IP address again. For instance, if the IP address that your Administrator will be connecting to is 192.168.1.100, you would add a new line to the HOSTS file that consists of the following: 192.168.1.100 192.168.1.100
    There MUST be a space in between.

Once the file has been saved or overwritten, the issue should be alleviated. No restart should be necessary. Note that this only improves load times that were slow due to the reverse DNS problem. A user must still consider their latency to the server and the amount of data being transferred (e.g. the number of channels).
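
The edit from step 4 can be double-checked with a small parser. This is only a sketch under simple assumptions about the HOSTS format (whitespace-separated fields, with `#` starting a comment); the class name HostsFileCheck and the method mapsIpToItself are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, for illustration only.
final class HostsFileCheck {
    private HostsFileCheck() {}

    /** Returns true if some line maps the given IP address to itself as a name. */
    static boolean mapsIpToItself(List<String> hostsLines, String ip) {
        for (String line : hostsLines) {
            // Drop anything after a '#' comment marker.
            int hash = line.indexOf('#');
            String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (content.isEmpty()) {
                continue;
            }
            // First field is the address, remaining fields are names.
            String[] fields = content.split("\\s+");
            if (fields.length < 2 || !fields[0].equals(ip)) {
                continue;
            }
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equals(ip)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Usage: java HostsFileCheck <path-to-hosts-file> <ip-address>
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(mapsIpToItself(lines, args[1]));
    }
}
```

Note that a line with the two addresses run together and no space between them is treated as a single field, so the check reports false for it.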

Imported Comment. Original Details:
Author: wayneh
Created: 2013-11-26T13:35:14.000-0800

@rbeckman-nextgen rbeckman-nextgen commented May 11, 2020

First of all, it is not possible to remove the 4-5 second timeout during the SSL handshake if no reverse DNS record exists. Therefore, we are now allowing SSL connections from the Administrator to remain idle for 24 hours before closing them. Each time a new connection is created, there will still be an extra overhead of 4-5 seconds. However, a user should typically only notice this once or twice when opening a new Administrator. Since the same connections are now reused, the Administrator should feel a lot smoother afterwards.
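
The effect of the longer idle timeout can be illustrated with a deliberately simplified model. It assumes a single connection, evenly spaced requests that take no time, and a connection that is dropped only when it has been idle for longer than the timeout. It is not the Administrator's actual connection management; the class name IdleTimeoutModel is hypothetical, and the model ignores request duration and how often the idle checker runs.

```java
// Hypothetical model, for illustration only.
final class IdleTimeoutModel {
    private IdleTimeoutModel() {}

    /**
     * Counts how many new connections are opened for a series of evenly spaced
     * requests. Each new connection corresponds to one reverse DNS delay.
     */
    static int newConnections(int requests, long requestIntervalMs, long idleTimeoutMs) {
        int opened = 0;
        boolean haveConnection = false;
        for (int i = 0; i < requests; i++) {
            // Between requests the connection sits idle for requestIntervalMs.
            if (haveConnection && requestIntervalMs > idleTimeoutMs) {
                haveConnection = false; // closed as idle before the next request
            }
            if (!haveConnection) {
                opened++; // a new SSL handshake, including the reverse lookup
                haveConnection = true;
            }
        }
        return opened;
    }

    public static void main(String[] args) {
        long dayMs = 24L * 60 * 60 * 1000;
        System.out.println(newConnections(10, 15_000, 5_000)); // 15 s refresh, 5 s timeout
        System.out.println(newConnections(10, 5_000, 5_000));  // 5 s refresh, 5 s timeout
        System.out.println(newConnections(10, 15_000, dayMs)); // 15 s refresh, 24 h timeout
    }
}
```

Under these assumptions, a 15-second refresh with a 5-second timeout opens a new connection on every request, while the 24-hour timeout opens only the first one.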

Once again, this problem only existed if all of the following were true:

  1. The Administrator is running on Windows.
  2. It connects to the server via an IP address.
  3. No reverse DNS hostname can be found anywhere.

It's still possible to remove the overhead completely by, for example:

  1. Accessing the server via a domain name
  2. Adding the IP address to the HOSTS file
  3. Enabling an nmbd service on the server if the Administrator and server are on the same LAN

Imported Comment. Original Details:
Author: wayneh
Created: 2013-11-27T15:53:05.000-0800

@rbeckman-nextgen rbeckman-nextgen commented May 11, 2020

Verified that the problem has now been alleviated for the most part. Tested by editing a channel, waiting a while, and then attempting to go back to the Channels view. Before the change, the UI would appear to freeze for about 4-5 seconds before finally switching, but that no longer happens. I did not notice the slowdown even once after the change. Obviously it would still happen at most once or twice a day or so if a user leaves the Administrator open indefinitely, but that doesn't really matter.

Imported Comment. Original Details:
Author: narupley
Created: 2013-12-03T10:17:22.000-0800

@rbeckman-nextgen rbeckman-nextgen commented May 11, 2020

If the user has this issue, they'll probably still see a 4-5 second delay on "Logging in...", but shouldn't see too many noticeably delayed connections after that. It's still worth noting that if you're seeing a 5 second delay when logging in on Windows with an IP address, this Java/Windows issue is the reason why.

Imported Comment. Original Details:
Author: jacobb
Created: 2013-12-03T13:39:20.000-0800

@rbeckman-nextgen rbeckman-nextgen commented May 11, 2020

That may be why I didn't notice anything at all after the change, because it's just "incorporated" into the Logging In wait time.

Imported Comment. Original Details:
Author: narupley
Created: 2013-12-03T13:42:05.000-0800

@rbeckman-nextgen rbeckman-nextgen commented May 11, 2020

Logging in opens one connection, but it's extremely likely that a second connection will be opened soon after logging in. The opening of that second connection will be more noticeable.

FYI, after spending more time using 3.0.0 under the problem conditions, the problem was actually more severe than I first thought. Because our idle connection checks were so frequent, it was possible to get into a strange condition where every single request required a new connection (and the reverse DNS lookup), regardless of how frequently the requests were made. The longer a request would normally take (e.g. if there were a lot of channels, or lag), the more likely this was to occur. It's likely this happened to some people and made the Administrator pretty unbearable to use.

Regardless, the situation should be much improved in 3.0.1.

Imported Comment. Original Details:
Author: wayneh
Created: 2013-12-03T13:53:52.000-0800
