Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: dnscrypt-proxy service fails to start often #841

Closed
matbech opened this issue May 30, 2019 · 10 comments · Fixed by #843
Closed

Windows: dnscrypt-proxy service fails to start often #841

matbech opened this issue May 30, 2019 · 10 comments · Fixed by #843

Comments

@matbech
Copy link
Contributor

matbech commented May 30, 2019

The dnscrypt-proxy service fails to start often (50%) during system startup and the following 3 relevant events are logged in the event log:

Warning 30.05.2019 18:32:58 DNS Client Events 1014 (1014)
Name resolution for the name raw.githubusercontent.com timed out after none of the configured DNS servers responded.

Error 30.05.2019 18:32:58 Service Control Manager 7000 None
The dnscrypt-proxy service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.

Error 30.05.2019 18:32:58 Service Control Manager 7009 None
A timeout was reached (45000 milliseconds) while waiting for the dnscrypt-proxy service to connect.

This might be related to this issue reported before: #548

My understanding is that dnscrypt-proxy is trying to update the sources.'public-resolvers' during the startup. The first entry in the urls list is:
https://raw.githubusercontent.com/DNSCrypt/dnscrypt-resolvers/master/v2/public-resolvers.md
Now dnscrypt-proxy seems to use the system dns client to resolve the raw.githubusercontent.com host name. However, the system dns client is configured to use dnscrypt-proxy (127.0.0.1) which is starting up.

Using dnscrypt-proxy version 2.0.23
netprobe_timeout was set to 60 and to 0 and it didn't affect the issue.

network connectivity (Ethernet connection) should be available before the service is starting.

@jedisct1
Copy link
Member

If network connectivity takes a long time to be available, you should significantly increase netprobe_timeout.

Setting it to 0 does the opposite, and should only be used if the device has constant, reliable connectivity.

@matbech
Copy link
Contributor Author

matbech commented May 30, 2019

The device does have a constant and reliable connectivity. But changing the netprobe_timeout setting did not affect the issue, it was already present with the default setting (60).

I have now changed the ignore_system_dns config from the default (false) to true:
ignore_system_dns = true

@jedisct1
Copy link
Member

Name resolution for the name raw.githubusercontent.com timed out after none of the configured DNS servers responded.

What are the configured DNS resolvers? Is there a valid one besides 127.0.0.1?

@matbech
Copy link
Contributor Author

matbech commented May 30, 2019

Only 127.0.0.1

@matbech
Copy link
Contributor Author

matbech commented May 30, 2019

I have done some additional debugging. Here are the relevant config options:
fallback_resolver = '1.1.1.1:53'
ignore_system_dns = true
netprobe_timeout = 60

System configured DNS servers: 127.0.0.1

Now to reproduce the issue:

  • Delete public-resolvers.md

  • Restart system

  • dnscrypt-proxy service fails to start. This is the log:
    [2019-05-30 21:05:19] [DEBUG] Rebuilding transport
    [2019-05-30 21:05:19] [DEBUG] Cache file [public-resolvers.md] not present
    [2019-05-30 21:05:19] [INFO] Loading source information from URL [https://raw.githubusercontent.com/DNSCrypt/dnscrypt-resolvers/master/v2/public-resolvers.md]
    [2019-05-30 21:05:19] [DEBUG] Ignoring system DNS
    [2019-05-30 21:05:19] [DEBUG] Resolving [raw.githubusercontent.com] using fallback resolver [1.1.1.1:53]
    [2019-05-30 21:05:19] [DEBUG] Cache file [public-resolvers.md.minisig] is still fresh
    [2019-05-30 21:05:19] [DEBUG] Delay till next update: 21h28m52.8193667s
    [2019-05-30 21:05:19] [INFO] Loading from [https://raw.githubusercontent.com/DNSCrypt/dnscrypt-resolvers/master/v2/public-resolvers.md] failed
    [2019-05-30 21:05:19] [DEBUG] Cache file [public-resolvers.md] not present
    [2019-05-30 21:05:19] [INFO] Loading source information from URL [https://download.dnscrypt.info/resolvers-list/v2/public-resolvers.md]
    [2019-05-30 21:05:19] [DEBUG] Ignoring system DNS
    [2019-05-30 21:05:19] [DEBUG] Resolving [download.dnscrypt.info] using fallback resolver [1.1.1.1:53]
    [2019-05-30 21:05:19] [DEBUG] Cache file [public-resolvers.md.minisig] is still fresh
    [2019-05-30 21:05:19] [DEBUG] Delay till next update: 21h28m52.8064769s
    [2019-05-30 21:05:19] [INFO] Loading from [https://download.dnscrypt.info/resolvers-list/v2/public-resolvers.md] failed
    [2019-05-30 21:05:19] [CRITICAL] Unable to use source [public-resolvers]: [No IP found for [download.dnscrypt.info]]
    [2019-05-30 21:05:19] [FATAL] No servers configured

  • Now, I manually start the service
    [2019-05-30 21:06:01] [DEBUG] Rebuilding transport
    [2019-05-30 21:06:01] [DEBUG] Cache file [public-resolvers.md] not present
    [2019-05-30 21:06:01] [INFO] Loading source information from URL [https://raw.githubusercontent.com/DNSCrypt/dnscrypt-resolvers/master/v2/public-resolvers.md]
    [2019-05-30 21:06:01] [DEBUG] Ignoring system DNS
    [2019-05-30 21:06:01] [DEBUG] Resolving [raw.githubusercontent.com] using fallback resolver [1.1.1.1:53]
    [2019-05-30 21:06:01] [DEBUG] [raw.githubusercontent.com] IP address [151.101.120.133] added to the cache
    [2019-05-30 21:06:01] [DEBUG] Cache file [public-resolvers.md.minisig] is still fresh
    [2019-05-30 21:06:01] [DEBUG] Delay till next update: 21h28m10.9216139s
    [2019-05-30 21:06:01] [NOTICE] Source [public-resolvers.md] loaded
    ...

I have also added the "Dnscache" (DNS Client) to the service's dependencies (DependOnService), but the result is the same.

@jedisct1
Copy link
Member

Is 1.1.1.1 accessible? Can you try with another resolver, maybe your router IP address or 9.9.9.9?

@matbech
Copy link
Contributor Author

matbech commented May 30, 2019

It's the same problem with 9.9.9.9 etc.

Can you add some logging here:
https://github.com/jedisct1/dnscrypt-proxy/blob/master/dnscrypt-proxy/config.go#L424

I would like to know if netProbe actually succeeds. There are some cases where errors are not logged. For example in this case:
https://github.com/jedisct1/dnscrypt-proxy/blob/master/dnscrypt-proxy/config.go#L615

If possible, always log if network connectivity is detected. This is currently only the case if retried is true.
https://github.com/jedisct1/dnscrypt-proxy/blob/master/dnscrypt-proxy/config.go#L639

@jedisct1
Copy link
Member

Looks like you know Go, and have issues on an operating system I don't use. So your help to sort this out will be appreciated.

@matbech
Copy link
Contributor Author

matbech commented May 30, 2019

See: #842
Will debug it here now and propose a fix.

@matbech
Copy link
Contributor Author

matbech commented May 30, 2019

Fix and explanation is here: #843

jedisct1 pushed a commit that referenced this issue May 31, 2019
Write at least 1 byte. This ensures that sockets are ready to use for writing.
Windows specific: during the system startup, sockets can be created but the underlying buffers may not be setup yet. If this is the case Write fails with WSAENOBUFS: "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
This fixes: #841
@DNSCrypt DNSCrypt locked and limited conversation to collaborators Jun 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants