Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNS query fails when querying with DC and the DC name has suffix that conflicts with domain/alt_domain #17156

Closed
shamil opened this issue Apr 26, 2023 · 4 comments · Fixed by #17160
Labels
theme/dns Using Consul as a DNS provider, DNS related issues

Comments

@shamil
Copy link
Contributor

shamil commented Apr 26, 2023

Overview of the Issue

When domain/alt_domain name begins with a portion of the DC name, dns query may fail.
It will be easier to see the issue in the reproduction steps below...

Reproduction Steps

For the sake of reproduction, I will start consul in development mode, but the issue also occurs in production deployments.

Steps to reproduce this issue:

  1. Run consul with the following params:
    # see "datacenter" and "alt-domain", they both have the "test" string, this is what causes the bug
    # see below...
    consul agent -dev -datacenter dc-test -alt-domain test.consul
  2. Run the following DNS queries
    # works with alt-domain
    dig +short -p 8600 @localhost consul.service.test.consul 
    
    # works with main domain
    dig +short -p 8600 @localhost consul.service.consul 
    
    # works with DC and alt-domain
    dig +short -p 8600 @localhost consul.service.dc-test.test.consul 
    
    # doesn't work, with DC and main domain,
    # because consul thinks that DC is "dc-" and "test.consul" is alt-domain
    dig +short -p 8600 @localhost consul.service.dc-test.consul 

Log Fragments

In the logs, for the last example in the reproduction steps, we can clearly see that consul doesn't parse the DC properly:

2023-04-26T20:52:52.914+0300 [WARN]  agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc- method=Health.ServiceNodes

Additional information

  1. Currently, to mitigate this issue adding dot (.) to do alt-domain helps, eg. alt-domain=.test.consul, but I'm not sure this is the desired behaviour.
  2. The issue occurs even if I swap domain with alt-domain, e.g alt-domain=consul, domain=test.consul, so it's not specific to alt-domain.

Consul info for both Client and Server

Client/Server info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease = 
	revision = 5e08e229
	version = 1.15.2
	version_metadata = 
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = true
	leader_addr = 127.0.0.1:8300
	server = true
raft:
	applied_index = 39
	commit_index = 39
	fsm_pending = 0
	last_contact = 0
	last_log_index = 39
	last_log_term = 2
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:ae8ab07f-3bbc-83b3-1386-d7ca940201c3 Address:127.0.0.1:8300}]
	latest_configuration_index = 0
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 2
runtime:
	arch = amd64
	cpu_count = 12
	goroutines = 139
	max_procs = 12
	os = linux
	version = go1.20.1
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 1
	event_time = 2
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

@jkirschner-hashicorp jkirschner-hashicorp added the theme/dns Using Consul as a DNS provider, DNS related issues label Apr 26, 2023
@jkirschner-hashicorp
Copy link
Contributor

I suspect this might be because of the trimDomain function:

consul/agent/dns.go

Lines 1050 to 1062 in f93bb65

func (d *DNSServer) trimDomain(query string) string {
longer := d.domain
shorter := d.altDomain
if len(shorter) > len(longer) {
longer, shorter = shorter, longer
}
if strings.HasSuffix(query, longer) {
return strings.TrimSuffix(query, longer)
}
return strings.TrimSuffix(query, shorter)
}

If the query is consul.service.dc-test.consul , domain is consul, and altDomain is test.consul... then L1058 will cause test.consul to be trimmed from query, returning consul.service.dc- as the query with the domain removed.

@shamil
Copy link
Contributor Author

shamil commented Apr 26, 2023

Thanks @jkirschner-hashicorp for pointing out to the code, I guess this can be easily fixed, with something like:

longer := "." + strings.TrimLeft(d.domain, ".")
shorter := "." + strings.TrimLeft(d.altDomain, ".")

@jkirschner-hashicorp
Copy link
Contributor

@shamil : Would you be interested in contributing such a fix? If so, here is some guidance on making changes to Consul.

@shamil
Copy link
Contributor Author

shamil commented Apr 26, 2023

@jkirschner-hashicorp sure, submitted #17160

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/dns Using Consul as a DNS provider, DNS related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants