DNS destination resolution failure when too many SRV records #2651

Closed
jklingenmeyer opened this issue Feb 24, 2021 · 5 comments

Comments

@jklingenmeyer
Contributor

Description

The core DNS resolver fails to return a valid IP when the DNS reply contains too many SRV records.
It behaves as if no records were found, so the request is not relayed and a 478 reply is generated instead (for example, when the DNS name is in $ru or $du).

Troubleshooting

Reproduction

The issue is easy to reproduce with DNS failover and NAPTR enabled (see the parameters listed under Additional Information)
and with DNS records such as:

# dig +short NAPTR ko.sip.provider.com
50 30 "S" "SIP+D2U" "" _sip._udp.ko.sip.provider.com.

# dig +short SRV _sip._udp.ko.sip.provider.com.
10 10 5060 endpoint-01.k0.sip.provider.com.
10 10 5060 endpoint-02.k0.sip.provider.com.
10 10 5060 endpoint-03.k0.sip.provider.com.
10 10 5060 endpoint-04.k0.sip.provider.com.
10 10 5060 endpoint-05.k0.sip.provider.com.
10 10 5060 endpoint-06.k0.sip.provider.com.
10 10 5060 endpoint-07.k0.sip.provider.com.
10 10 5060 endpoint-08.k0.sip.provider.com.
10 10 5060 endpoint-09.k0.sip.provider.com.

# Each SRV result above has a corresponding
# 'A' record so that command below gives a correct IP:
# dig +short A endpoint-01.k0.sip.provider.com.

To reproduce, relay a request to that name, for example:
$du="sip:ko.sip.provider.com"

Debugging data

Interestingly, Kamailio behaves exactly like the sip-dig tool.
sip-dig, however, seems to be limited in the DNS reply size it can handle (see my note about the RFC under Possible Solutions).
Does Kamailio have the same kind of limitation for DNS resolution?

Log Messages

Failure example: with 9 SRV records
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (ko.sip.provider.com(26), 35), h=275
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff6a20f0000, 0x7ff6a27777d8), called from core: core/dns_cache.c: dns_destroy_entry(151)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff6a27777a0 alloc'ed from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 58) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 64) returns address 0x7ff72363d8f8 frag. 0x7ff72363d8c0 (size=64) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 92) called from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 96) returns address 0x7ff72363d9a0 frag. 0x7ff72363d968 (size=96) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) returns address 0x7ff6a27748a8 frag. 0x7ff6a2774870 (size=232) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff72363d9a0), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff72363d968 alloc'ed from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff72363d8f8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff72363d8c0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a27748a8 (ko.sip.provider.com, 35), 35, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding ko.sip.provider.com(26) 35 (flags=0) at 275
DEBUG: <core> [core/dns_cache.c:2614]: dns_naptr_sip_iterate(): found a valid sip NAPTR rr _sip._udp.ko.sip.provider.com, proto 1
DEBUG: <core> [core/resolve.c:1182]: naptr_choose(): o:-1 w:-1 p:0, o:50 w:30 p:1
DEBUG: <core> [core/resolve.c:1197]: naptr_choose(): changed
DEBUG: <core> [core/dns_cache.c:2625]: dns_naptr_sip_iterate(): choosed NAPTR rr _sip._udp.ko.sip.provider.com, proto 1 tried: 0x0
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._udp.ko.sip.provider.com(36), 33), h=989
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._udp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._udp.ko.sip.provider.com(36), 33), h=989
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._udp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._tcp.ko.sip.provider.com(36), 33), h=772
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._tcp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sips._tcp.ko.sip.provider.com(37), 33), h=786
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sips._tcp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (ko.sip.provider.com(26), 1), h=275
DEBUG: <core> [core/dns_cache.c:2803]: dns_a_resolve(): (ko.sip.provider.com, 0) returning -7
DEBUG: <core> [core/dns_cache.c:3167]: dns_srv_sip_resolve(): (ko.sip.provider.com, 0, 0), ip, ret=-7
ERROR: tm [ut.h:284]: uri2dst2(): failed to resolve "ko.sip.provider.com" :unresolvable A or AAAA request (-7)

Comparison with a working example (only 3 SRV records)
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (ok.sip.provider.com(26), 35), h=275
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 58) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 64) returns address 0x7ff723613ff8 frag. 0x7ff723613fc0 (size=64) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 92) called from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 96) returns address 0x7ff7236140a0 frag. 0x7ff723614068 (size=96) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) returns address 0x7ff6a27755b8 frag. 0x7ff6a2775580 (size=376) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff7236140a0), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723614068 alloc'ed from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff723613ff8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723613fc0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a27755b8 (ok.sip.provider.com, 35), 35, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding ok.sip.provider.com(26) 35 (flags=0) at 275
DEBUG: <core> [core/dns_cache.c:2614]: dns_naptr_sip_iterate(): found a valid sip NAPTR rr _sip._udp.ok.sip.provider.com, proto 1
DEBUG: <core> [core/resolve.c:1182]: naptr_choose(): o:-1 w:-1 p:0, o:50 w:30 p:1
DEBUG: <core> [core/resolve.c:1197]: naptr_choose(): changed
DEBUG: <core> [core/dns_cache.c:2625]: dns_naptr_sip_iterate(): choosed NAPTR rr _sip._udp.ok.sip.provider.com, proto 1 tried: 0x0
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._udp.ok.sip.provider.com(36), 33), h=989
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 68) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 72) returns address 0x7ff723613ff8 frag. 0x7ff723613fc0 (size=72) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 46) called from core: core/resolve.c: dns_srv_parser(318)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 48) returns address 0x7ff7236140a8 frag. 0x7ff723614070 (size=48) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300dbb4, end=0x558fb300dbb4)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300dbb4, end=0x558fb300dbb4)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 176) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 176) returns address 0x7ff6a2775900 frag. 0x7ff6a27758c8 (size=176) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff7236140a8), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723614070 alloc'ed from core: core/resolve.c: dns_srv_parser(318)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff723613ff8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723613fc0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a2775900 (_sip._udp.ok.sip.provider.com, 33), 33, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding _sip._udp.ok.sip.provider.com(36) 33 (flags=0) at 989
DEBUG: <core> [core/dns_cache.c:2222]: dns_srv_get_nxt_rr(): (0x7ff6a2775900, 0, 0, 1457300027): selected 0/1 in grp. 0 (rand_w=0, rr=0x7ff6a2775968 rd=0x7ff6a2775980 p=10 w=10 rsum=10)
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (endpoint.ok.sip.provider.com(38), 1), h=530
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 70) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 72) returns address 0x7ff723613ff8 frag. 0x7ff723613fc0 (size=72) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 4) called from core: core/resolve.c: dns_a_parser(474)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 8) returns address 0x7ff7236140a8 frag. 0x7ff723614070 (size=8) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300db8e, end=0x558fb300db8e)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300db8e, end=0x558fb300db8e)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 136) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 136) returns address 0x7ff6a2775a18 frag. 0x7ff6a27759e0 (size=136) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff7236140a8), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723614070 alloc'ed from core: core/resolve.c: dns_a_parser(474)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff723613ff8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723613fc0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a2775a18 (endpoint.ok.sip.provider.com, 1), 1, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding endpoint.ok.sip.provider.com(38) 1 (flags=0) at 530
DEBUG: <core> [core/dns_cache.c:2803]: dns_a_resolve(): (endpoint.ok.sip.provider.com, 0) returning 0
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._udp.ok.sip.provider.com", 0, 0), ret=0, ip=[RESOLVED_IP]
DEBUG: <core> [core/dns_cache.c:3241]: dns_naptr_sip_resolve(): (ok.sip.provider.com, 0, 0), srv0, ret=0
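
The two traces diverge at the SRV step: in the failing case dns_srv_resolve_ip() returns -5 for every SRV name, and the resolver then falls back to an A lookup on the domain itself, which returns -7. The Python sketch below is a simplified model of that lookup order, not Kamailio's code. The ZONE table, the function names, and the 192.0.2.10 address are invented for illustration.

```python
# Simplified model of the lookup order visible in the two traces:
# NAPTR -> SRV -> A, with fallbacks when a step yields nothing.
# ZONE and the 192.0.2.10 address are made up for this sketch.

FALLBACK_PREFIXES = ("_sip._udp.", "_sip._tcp.", "_sips._tcp.")

ZONE = {
    ("ok.sip.provider.com", "NAPTR"): ["_sip._udp.ok.sip.provider.com"],
    ("_sip._udp.ok.sip.provider.com", "SRV"): ["endpoint.ok.sip.provider.com"],
    ("endpoint.ok.sip.provider.com", "A"): ["192.0.2.10"],
    ("ko.sip.provider.com", "NAPTR"): ["_sip._udp.ko.sip.provider.com"],
    # The SRV answer for "ko" is deliberately absent: it models a reply
    # that the resolver could not use.
}


def resolve_sip(domain):
    """Return (ip_or_None, steps); steps lists every query attempted."""
    steps = []

    def query(name, rtype):
        steps.append((name, rtype))
        return ZONE.get((name, rtype), [])

    # SRV names from NAPTR first, then the default SRV names.
    srv_names = query(domain, "NAPTR") + [p + domain for p in FALLBACK_PREFIXES]
    for srv_name in srv_names:
        for target in query(srv_name, "SRV"):
            for ip in query(target, "A"):
                return ip, steps
    # Last resort: plain A lookup on the domain itself.
    for ip in query(domain, "A"):
        return ip, steps
    return None, steps


for name in ("ok.sip.provider.com", "ko.sip.provider.com"):
    ip, steps = resolve_sip(name)
    print(name, "->", ip)
    for step in steps:
        print("   tried", step)
```

In this model the "ko" domain ends with an unanswered A query on the domain name, which mirrors the final error line of the failing trace.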

Possible Solutions

I had a quick look at the code and did not find any limit on the maximum number of records.
There are some maximum values defined in dns_cache.c, but I did not find a relation between them and my issue.

Could there be a limit on the reply size? Here is what I found in the RFCs on that point:

  • Extract from RFC 2782 (DNS SRV RR), which RFC 3263 names as the RFC to follow for implementing DNS in SIP:

Currently there's a practical limit of 512 bytes for DNS replies.
Until all resolvers can handle larger responses, domain administrators are strongly advised to keep their SRV replies below 512 bytes.

There is also an RFC on how to deal with truncated messages:

If a truncated response comes back from an SRV query, the rules described in RFC 2181 (https://tools.ietf.org/html/rfc2181#page-11) shall apply.
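
To relate the 512-byte figure to the record set from the reproduction, here is a rough size estimate in Python. It is a sketch under stated assumptions: the owner name of each answer is a 2-byte compression pointer, the SRV target is not compressed, and there are no authority, additional, or EDNS OPT records. The function names are hypothetical, and the names are the anonymized ones from this report, so real replies will differ in size.

```python
from typing import Sequence


def wire_name_len(name: str) -> int:
    """Bytes used by an uncompressed DNS name: one length octet per label,
    the label bytes, and the terminating root octet."""
    labels = [label for label in name.split(".") if label]
    return sum(len(label) + 1 for label in labels) + 1


def srv_response_size(qname: str, targets: Sequence[str]) -> int:
    """Estimated size of a DNS reply carrying one SRV record per target."""
    header = 12
    question = wire_name_len(qname) + 4          # QTYPE + QCLASS
    # Per answer: 2-byte pointer to the owner name (assumption),
    # 10 bytes of TYPE/CLASS/TTL/RDLENGTH, 6 bytes of priority/weight/port.
    per_rr_fixed = 2 + 10 + 6
    answers = sum(per_rr_fixed + wire_name_len(t) for t in targets)
    return header + question + answers


QNAME = "_sip._udp.ko.sip.provider.com."
TARGETS = ["endpoint-%02d.k0.sip.provider.com." % i for i in range(1, 10)]

for count in (3, 9):
    size = srv_response_size(QNAME, TARGETS[:count])
    verdict = "over 512" if size > 512 else "within 512"
    print(count, "records:", size, "bytes,", verdict)
```

With the anonymized names, nine records already come close to the 512-byte limit under these assumptions. Longer real host names or extra records in the reply would push the total higher, so the figure is indicative only.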

Additional Information

  • Kamailio version: 5.3.8
  • DNS-related parameters:
dns_try_naptr=yes
dns_tcp_pref=1
dns_udp_pref=1
dns_tls_pref=1
dns_srv_lb=yes
use_dns_failover=yes
use_dns_cache=yes
dns_cache_max_ttl=30
  • Operating system: Debian 9.13 on Docker

Thanks

@miconda
Member

miconda commented Mar 1, 2021

I did not write the DNS code in Kamailio, but if you did not find any define setting such limits in our C code, and other external tools behave the same way, then the limit may come from the libc DNS resolver functions.

@oej
Member

oej commented Mar 1, 2021

Could it be a DNS over TCP issue?

I remember testing with crazy size SRV record sets on SIPit and don't remember any issues. Just make sure your firewall supports DNS/TCP too.

Things could have changed since then, so don't take for granted that it works today :-)
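
For background on the DNS/TCP point: over TCP, each DNS message is preceded by a two-byte length field (RFC 1035, section 4.2.2), so replies larger than 512 bytes can be carried whole. The Python sketch below shows that framing only. The function names are hypothetical, and it says nothing about how Kamailio or libc handle TCP.

```python
import struct


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 16-bit big-endian length."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too large for TCP framing")
    return struct.pack("!H", len(message)) + message


def unframe_from_tcp(stream: bytes):
    """Split one framed message off the front of a byte stream.

    Returns (message, rest), or (None, stream) if the stream does not
    yet hold a complete message.
    """
    if len(stream) < 2:
        return None, stream
    (length,) = struct.unpack("!H", stream[:2])
    if len(stream) < 2 + length:
        return None, stream
    return stream[2:2 + length], stream[2 + length:]


# A 600-byte placeholder payload stands in for a large SRV reply.
reply = bytes(600)
framed = frame_for_tcp(reply)
message, rest = unframe_from_tcp(framed)
print(len(framed), len(message), len(rest))
```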

@jklingenmeyer
Contributor Author

Thanks for your replies!

I do not think this is a firewall issue, because the dig command works without problems. I tried two different sets of options:

  • dig +bufsize=512: the reply is reported as "Truncated, retrying in TCP mode", then a correct reply is received over TCP
  • plain dig: the answer arrives directly over UDP

As expected, the lookup does not work when the TCP retry is disabled and the buffer size is set to 512 bytes
(dig +bufsize=512 +ignore).

The tests now clearly show a limit based on packet size (512 bytes), but I still do not know exactly where it comes from.
I will investigate further when I have some time.
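
The three dig outcomes above hinge on the TC (truncation) bit in the DNS header. As a minimal Python sketch, assuming a raw DNS message is available as bytes, the bit can be read from the flags word and used to decide the next step. The function names and the returned labels are hypothetical, and the tcp_retry_allowed switch only mimics the effect of dig's +ignore option.

```python
import struct

TC_FLAG = 0x0200  # truncation bit in the 16-bit DNS header flags word


def build_header(msg_id: int, flags: int, ancount: int = 0) -> bytes:
    """Build a 12-byte DNS header (helper for the demo below)."""
    return struct.pack("!HHHHHH", msg_id, flags, 1, ancount, 0, 0)


def is_truncated(message: bytes) -> bool:
    """True if the TC bit is set in the header of a DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)


def next_step(message: bytes, tcp_retry_allowed: bool = True) -> str:
    """Decide what a client would do with a UDP reply."""
    if not is_truncated(message):
        return "use-udp-answer"
    if tcp_retry_allowed:
        return "retry-over-tcp"
    return "truncated-answer-only"


complete = build_header(0x1234, 0x8180, ancount=9)   # QR, RD, RA set
truncated = build_header(0x1234, 0x8380, ancount=0)  # same, plus TC

print(next_step(complete))
print(next_step(truncated))
print(next_step(truncated, tcp_retry_allowed=False))
```

A resolver that stops at the third outcome ends up with no usable SRV records, which would match the "no records found" behaviour described in this issue. Whether that is what happens inside Kamailio or libc is not established here.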

@miconda
Member

miconda commented Apr 1, 2021

@jklingenmeyer - have you had any time to dig further? Is it a libc/OS limitation after all, or something inside the Kamailio code?

@miconda
Member

miconda commented May 3, 2021

There has been no activity for a long time, and this may be a limitation coming from library functions, as commented above.

Reopen when you have new troubleshooting details.

@miconda miconda closed this as completed May 3, 2021