Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow configuring DefaultAddressResolverGroup by flag #2261

Merged
merged 6 commits into from
Nov 15, 2019

Conversation

ikhoon
Copy link
Contributor

@ikhoon ikhoon commented Nov 14, 2019

Motivation:
Domain name resolution failure was reported on Windows OS. #2243
Before solving the original problem, this workaround enables DefaultAddressResolverGroup by the flag. This flag is disabled by default.
The DefaultAddressResolverGroup uses JDK's built-in domain name lookup mechanism. See here for more information.

Modification:

  • Add useJdkDnsResolver flag to enable DefaultAddressResolverGroup.
  • Use DefaultAddressResolverGroup.INSTANCE if it is possible.

Result:
A user can now use DefaultAddressResolverGroup by enabling useJdkDnsResolver flag.

Motivation:
Domain name resolution failure was reported on Windows OS line#2243
Before solving the origin problem, this workaround enables [`DefaultAddressResolverGroup`](https://netty.io/4.1/api/io/netty/resolver/class-use/DefaultAddressResolverGroup.html)
by default for Windows OS.
The `DefaultAddressResolverGroup` uses JDK's built-in domain name lookup mechanism.
See [here](https://netty.io/4.1/api/io/netty/resolver/DefaultNameResolver.html) for more information.

Modification:
* Add `useDefaultDnsResolver` flag to enable `DefaultAddressResolverGroup` by default for Windows OS
* Register DefaultAddressResolverGroup.INSTANCE if it is possible.

Result:
A user who runs Armeria on Windows OS can now use `DefaultAddressResolverGroup` for domain name resolution by default.
@ikhoon ikhoon added the defect label Nov 14, 2019
@ikhoon ikhoon added this to the 0.96.0 milestone Nov 14, 2019
@codecov
Copy link

codecov bot commented Nov 14, 2019

Codecov Report

Merging #2261 into master will decrease coverage by 0.04%.
The diff coverage is 66.66%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #2261      +/-   ##
============================================
- Coverage      73.6%   73.56%   -0.05%     
+ Complexity     9807     9802       -5     
============================================
  Files           859      859              
  Lines         37721    37727       +6     
  Branches       4648     4649       +1     
============================================
- Hits          27766    27754      -12     
- Misses         7578     7599      +21     
+ Partials       2377     2374       -3
Impacted Files Coverage Δ Complexity Δ
.../linecorp/armeria/client/ClientFactoryBuilder.java 53.1% <0%> (-0.75%) 26 <0> (ø)
.../linecorp/armeria/client/DefaultClientFactory.java 77.04% <100%> (+0.38%) 20 <1> (+1) ⬆️
...c/main/java/com/linecorp/armeria/common/Flags.java 61.37% <100%> (+0.33%) 55 <1> (+1) ⬆️
...com/linecorp/armeria/client/HttpClientFactory.java 89.13% <100%> (+0.11%) 36 <1> (+1) ⬆️
.../linecorp/armeria/internal/Http2ObjectEncoder.java 71.42% <0%> (-8.58%) 12% <0%> (-1%)
...om/linecorp/armeria/client/HttpSessionHandler.java 61.01% <0%> (-8.48%) 29% <0%> (-3%)
...meria/common/stream/RegularFixedStreamMessage.java 81.63% <0%> (-6.13%) 13% <0%> (-1%)
...meria/internal/AbstractHttp2ConnectionHandler.java 93.1% <0%> (-3.45%) 13% <0%> (-1%)
...a/com/linecorp/armeria/common/util/Exceptions.java 40.35% <0%> (-2.64%) 26% <0%> (-1%)
.../com/linecorp/armeria/server/docs/ServiceInfo.java 78.94% <0%> (-1.76%) 20% <0%> (-1%)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b478d56...fff0639. Read the comment docs.

@anuraaga
Copy link
Collaborator

Looking at the linked doc this seems to do blocking I/O - so I think we'd have to do more refactoring to use blocking executor. Not sure if it's worth it without getting more reports. I guess with Windows it's easy to enable the firewall on your app which could block udp packets coming back

@ikhoon
Copy link
Contributor Author

ikhoon commented Nov 14, 2019

Looking at the linked doc this seems to do blocking I/O.

When I was implementing this logic(yesterday), I also read the documentation and got advice from @trustin.
He said we don't need to change the executor(EventLoop). It was late evening so I couldn't ask him anymore. 😅 I was also wondering about the reason. I think this is the time to ask him. 😎

@trustin Could you tell us why we don't need to change the given executor? 🙏

@ikhoon
Copy link
Contributor Author

ikhoon commented Nov 14, 2019

I think we'd have to do more refactoring to use blocking executor.

Had talked with @trustin. DefaultNameResolver does not run a domain name resolution on the given EventExecutor.
https://github.com/netty/netty/blob/b91dd0678d290a70817ddc44bb5ced2baed275fd/resolver/src/main/java/io/netty/resolver/DefaultNameResolver.java#L40-L47
To make DefaultNameResolver use blocking executor, I think we might need to override upstream behavior.

I'm still wondering if it is worth implementing? Because DNS is cacheable, we will fix the original issue on the next step.
I will leave some comments about this on code :-) WDYT?

@anuraaga
Copy link
Collaborator

The performance might be ok - we'd need to verify that it's not possible to deadlock. I'm not sure what would happen if a client and DNS request use the same event loop, but I have seen deadlock when blocking the event loop during a client request so I'm very scared of doing it.

I might rethink whether this is really needed (I'm currently assuming we've only had one report, the attached one, which might not indicate such a big hammer but if there are more you know about then it might be), but even if adding I'd recommend not making it default. Tests working on appveyor (and my laptop and GitHub's Azure VMs :) ) I think give us reasonable confidence in our current defaults

@ikhoon
Copy link
Contributor Author

ikhoon commented Nov 14, 2019

I have seen deadlock when blocking the event loop during a client request so I'm very scared of doing it.

Just curious, Could you explain to me how blocking causes deadlock? 😀

@anuraaga
Copy link
Collaborator

I've seen it here - https://github.com/openzipkin/zipkin/pull/2703/files

It happens when a client request is made in a blocking way from a server request - if the client and server request are being handled by the same event loop, the client will wait for I/O forever while the server has blocked the event loop waiting for the client request to finish. I'm not sure if the same is an issue in this case, but it makes me very nervous :)

@ikhoon
Copy link
Contributor Author

ikhoon commented Nov 14, 2019

Tests working on appveyor (and my laptop and GitHub's Azure VMs :)

Could you take a DNS test for http://echo.jsontest.com and http://json.org ? If you can test it on your CI and laptop, I want to get your result. 😀

but even if adding I'd recommend not making it default.

We have discussed Windows domain resolution issue and decided to divide this into 3 steps.

1st step

  • Offer DefaultAddressResolverGroup which resolves DNS using Jdk resolver.
  • Turn off default option for every OS.
  • Use Blocking operation.

If this is fine, we could merge this PR after turning off the flag. If you don't agree with the blocking operation even though it turns off the default option, I will fork Netty DefaultAddressResolverGroup and fetch it to run domain resolution with a given executor. 🛠

2nd step

  • After testing, if Netty async DNS resolver has a problem with basic Windows configuration, turning on default flag only for Window.

3th step

  • Investigating the original problem and fetch Netty or Armeria code.

WDYT? @anuraaga /cc @minwoox @trustin

@anuraaga
Copy link
Collaborator

I think that looks fine since we don't change the default without more investigation. Only thing I'd add is for 2nd step, if changing the default also be sure that the blocking won't cause deadlock. I guess it won't - the pattern is probably doing one more I/O after the block which wouldn't happen since DNS is the lowest level possible - but want to be confidence since it's very confusing to debug

@ikhoon
Copy link
Contributor Author

ikhoon commented Nov 14, 2019

Only thing I'd add is for 2nd step, if changing the default also be sure that the blocking won't cause deadlock.

That sounds good to me. When we need to change the default, I will change the default DNS resolver to run with blocking executor. 👍

@ikhoon ikhoon changed the title Enable DefaultAddressResolverGroup by default for Windows OS Allow configuring DefaultAddressResolverGroup by flag Nov 14, 2019
@ikhoon
Copy link
Contributor Author

ikhoon commented Nov 14, 2019

Updated:

  • This flag is disabled by default.
  • Update PR title, description, and code.

core/src/main/java/com/linecorp/armeria/common/Flags.java Outdated Show resolved Hide resolved
core/src/main/java/com/linecorp/armeria/common/Flags.java Outdated Show resolved Hide resolved
core/src/main/java/com/linecorp/armeria/common/Flags.java Outdated Show resolved Hide resolved
Comment on lines +24 to +25
import org.junit.jupiter.api.condition.DisabledIfSystemProperty;
import org.junit.jupiter.api.condition.EnabledIfSystemProperty;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for education :-)

Copy link
Member

@trustin trustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, @ikhoon!

@trustin trustin merged commit 3e4876b into line:master Nov 15, 2019
@trustin trustin removed the defect label Nov 23, 2019
@ikhoon ikhoon deleted the dns-for-window branch January 3, 2020 02:51
fmguerreiro pushed a commit to fmguerreiro/armeria that referenced this pull request Sep 19, 2020
Motivation:
Domain name resolution failure was reported on Windows OS. line#2243
Before solving the original problem, this workaround enables [`DefaultAddressResolverGroup`](https://netty.io/4.1/api/io/netty/resolver/class-use/DefaultAddressResolverGroup.html) by the flag. This flag is disabled by default.
The `DefaultAddressResolverGroup` uses JDK's built-in domain name lookup mechanism. See [here](https://netty.io/4.1/api/io/netty/resolver/DefaultNameResolver.html) for more information.

Modification:
* Add `useJdkDnsResolver` flag to enable `DefaultAddressResolverGroup`.
* Use `DefaultAddressResolverGroup.INSTANCE` if it is possible.

Result:
A user can now use  `DefaultAddressResolverGroup` by enabling `useJdkDnsResolver` flag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants