Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue211: support dns over https if local DNS is not working / available #476

Merged

Conversation

ClemensRobbenhaar
Copy link
Contributor

As per comment: #211 (comment) and in the hope that this PR is considered "decent":

Add support for DNS-over-HTTPS lookups:

  • use the "DohResolver" from the dnsjava library to make DoH lookups
  • to enable and configure it, add two new properties
    • "enableDnsOverHttpResolves" (boolean)
    • "dnsOverHttpServer" URL to the DoH Server
  • as one use case for DoH is being located behind a firewall, also support using a proxy to access the DoH server; the proxy from the FetchHTTP bean is reused in that case

Alternatively one could have introduced separate properties for the proxy to be used for DoH; if this is the preferred approach instead of making the FetchDNS bean factory aware to load the proxy settings from the FetchHTTP bean, please let me know so I can update the PR.

- use the "DohResolver" from the dnsjava library
  to make DoH lookups
- to enable and configure it, add two new
  properties
  * "enableDnsOverHttpResolves" (boolean)
  * "dnsOverHttpServer" URL to the DoH Server
- as one use case for DoH is being located
  behind a firewall, also support using a proxy
  to access the DoH server; the proxy from
  the FetchHTTP bean is reused in that case

Fixes internetarchive#211
@ClemensRobbenhaar ClemensRobbenhaar changed the title Issue211: support dns over https if local DNS is not working / avaibable Issue211: support dns over https if local DNS is not working / available Apr 8, 2022
@ato
Copy link
Collaborator

ato commented Apr 8, 2022

Nice work! Two suggestions:

  1. Since it is mandatory to use both options together I think we should just have a single dnsOverHttpServer option which can be set to a URL to enable DOH or cleared to disable it. This is also how the other settings like the proxy ones work; there's no separate enableHttpProxy boolean option.

  2. Copying the FetchHTTP proxy settings into the global system properties seems likely to cause surprises when running multiple jobs at the same time. Since we have no choice for dnsjava but to set it globally I think it would make more sense to do the reverse and have FetchHTTP default to using the values of the proxy system properties unless overridden with per-job config. We could perhaps even add a command-line option to make setting the system properties on startup more convenient:

    ./bin/heritrix --web-admin password --proxy http://localhost:8080/
    

    That would then enable using the proxy for all modules (FetchHTTP, FetchDNS, ...) across all jobs.

@ClemensRobbenhaar
Copy link
Contributor Author

Ah, yes, I was afraid that setting the global properties will cause trouble that I have not anticipated. For concurrent jobs this will be indeed a problem. I will try to see if I can put it the other way around as suggested.

Also ok for using a single dnsOverHttpServer option to enable DoH.

An update to the PR will come as soon as I finished testing the changes, which might need a little time.

- instead of "borrowing" the configured proxy from
  the fetchHttp bean, use proxy values defined via
  global options, to avoid interference with other
  jobs running in parallel (or at least make them
  explicit).
  The "fetchHttp" bean also uses these settings,
  if no bean specific settings are used.
- remove the "enableDnsOverHttpResolves", and rely
  on a non-empty "dnsOverHttpServer" value to signal
  that DoH should be used.
@ClemensRobbenhaar
Copy link
Contributor Author

Now with an update:

  • the enableDnsOverHttpResolves property has been scrapped in favor of using a non-empty dnsOverHttpServer to use DoH
  • there are now two new command line options: --proxy-host and --proxy-port to set global proxy options. I preferred two to avoid having to parse the option value; if no --proxy-port is defined, a default value of 8000 is assumed. Also I found no intuitive short options (single key) values, so I did not define any.
  • as this means that with the SOCKS5 proxy options there are now three different ways to set a proxy, I felt this warrants a separate section in the documentation. (Actually there is a fourth way by adding -Dhttp.proxyHost=... etc to the JAVA_OPTS, but I did not document these.)

- scrap setter for "enableDnsOverHttpResolve", too
- docs: lets see if we can set a link
  to another chapter
@ato ato merged commit 55078c0 into internetarchive:master Apr 9, 2022
@ato
Copy link
Collaborator

ato commented Apr 10, 2022

Looks good. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants