Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update dnsmasq to v2.86 #1143

Closed
wants to merge 79 commits into from
Closed

Update dnsmasq to v2.86 #1143

wants to merge 79 commits into from

Conversation

DL6ER
Copy link
Member

@DL6ER DL6ER commented Jun 22, 2021

By submitting this pull request, I confirm the following:

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

How familiar are you with the codebase?:

10


New FTL features in this version:

  • Update to dnsmasq v2.86 (see dnsmasq changelog below for the highlights)
  • Support Cisco Umbrella/OpenDNS Device ID & Remote IP (see Cisco Umbrella Remote IP/Device ID support #1096)
  • Automatically generated queries (typically DNSSEC DNSKEY and DS) are now included in the Query Log and dashboard.
    They can be suppressed by setting SHOW_DNSSEC=false in /etc/pihole/pihole-FTL.conf

CHANGELOG:

Handle DHCPREBIND requests in the DHCPv6 server code.
Thanks to Aichun Li for spotting this ommision, and the initial patch.

Fix bug which caused dnsmasq to lose track of processes forked to handle TCP DNS connections under heavy load. The code checked that at least one free process table slot was available before listening on TCP sockets, but didn't take into account that more than one TCP connection could arrive, so that check was not sufficient to ensure that there would be slots for all new processes. It compounded this error by silently failing to store the process when it did run out of slots. Even when this bug is triggered, all the right things happen, and answers are still returned. Only under very exceptional circumstances, does the bug manifest itself: see https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q2/014976.html
Thanks to Tijs Van Buggenhout for finding the conditions under which the bug manifests itself, and then working out exactly what was going on.

Major rewrite of the DNS server and domain handling code. This should be largely transparent, but it drastically improves performance and reduces memory foot-print when configuring large numbers domains of the form local=/adserver.com/ or local=/adserver.com/# Lookup times now grow as log-to-base-2 of the number of domains, rather than greater than linearly, as before. The change makes multiple addresses associated with a domain work address=/example.com/1.2.3.4 address=/example.com/5.6.7.8 It also handles multiple upstream servers for a domain better; using the same try/retry alogrithms as non domain-specific servers. This also applies to DNSSEC-generated queries. Finally, some of the oldest and gnarliest code in dnsmasq has had a significant clean-up. It's far from perfect, but it is better.

Revise resource handling for number of concurrent DNS queries. This used to have a global limit, but that has a problem when using different servers for different upstream domains. Queries which are routed by domain to an upstream server which is not responding will build up and trigger the limit, which breaks DNS service for all other domains which could be handled by other servers. The change is to make the limit per server-group, where a server group is the set of servers configured for a particular domain. In the common case, where only default servers are declared, there is no effective change.

Improve efficiency of DNSSEC. The sharing point for DNSSEC RR data used to be when it entered the cache, having been validated. After that queries requiring the KEY or DS records would share the cached values. There is a common case in dual-stack hosts that queries for A and AAAA records for the same domain are made simultaneously. If required keys were not in the cache, this would result in two requests being sent upstream for the same key data (and all the subsequent chain-of-trust queries.) Now we combine these requests and elide the duplicates, resulting in fewer queries upstream and better performance. To keep a better handle on what's going on, the "extra" logging mode has been modified to associate queries and answers for DNSSEC queries in the same way as ordinary queries. The requesting address and port have been removed from DNSSEC logging lines, since this is no longer strictly defined.

Connection track mark based DNS query filtering. Thanks to Etan Kissling for implementing this It extends query filtering support beyond what is currently possible with the --ipset configuration option, by adding support for: 1) Specifying allowlists on a per-client basis, based on their associated Linux connection track mark. 2) Dynamic configuration of allowlists via Ubus. 3) Reporting when a DNS query resolves or is rejected via Ubus. 4) DNS name patterns containing wildcards. Disallowed queries are not forwarded; they are rejected with a REFUSED error code.

DL6ER and others added 30 commits April 14, 2021 08:14
Signed-off-by: DL6ER <dl6er@dl6er.de>
Fix bug which caused dnsmasq to lose track of processes forked
to handle TCP DNS connections under heavy load. The code
checked that at least one free process table slot was
available before listening on TCP sockets, but didn't take
into account that more than one TCP connection could
arrive, so that check was not sufficient to ensure that
there would be slots for all new processes. It compounded
this error by silently failing to store the process when
it did run out of slots. Even when this bug is triggered,
all the right things happen, and answers are still returned.
Only under very exceptional circumstances, does the bug
manifest itself: see
https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q2/014976.html

Thanks to Tijs Van Buggenhout for finding the conditions under
which the bug manifests itself, and then working out
exactly what was going on.

Signed-off-by: DL6ER <dl6er@dl6er.de>
Removed empty lines from end of src/*.[ch] files.
If the new last line became '#endif'
was the condition of the '#if' added.

Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
This is based on the information at
https://docs.umbrella.com/umbrella-api/docs/identifying-dns-traffic and
https://docs.umbrella.com/umbrella-api/docs/identifying-dns-traffic2 .
Using --umbrella by itself will enable Remote IP reporting. This can not
be used for any policy filtering in Cisco Umbrella/OpenDNS. Additional
information can be supplied using specific option specifications,
multiple can be separated by a comma:

--umbrella=orgid:1234,deviceid=0123456789abcdef

Specifies that you want to report organization 1234 using device
0123456789abcdef. For Cisco Umbrella Enterprise, see "Register (Create)
a device" (https://docs.umbrella.com/umbrella-api/docs/create-a-device)
for how to get a Device ID and "Organization ID endpoint"
(https://docs.umbrella.com/umbrella-api/docs/organization-endpoint) to
get organizations ID. For OpenDNS Home Users, there is no organization,
see Registration API endpoint
(https://docs.umbrella.com/umbrella-api/docs/registration-api-endpoint2)
for how to get a Device ID. Asset ID should be ignored unless
specifically instructed to use by support.

Signed-off-by: Brian Hartvigsen <brian.andrew@brianandjenny.com>
Signed-off-by: DL6ER <dl6er@dl6er.de>
dhcp_config_free and dhcp_opt_free already implement the same algorithm.
Reuse them. Adds forgotten hostname cleanup to config free.

Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
This should be largely transparent, but it drastically
improves performance and reduces memory foot-print when
configuring large numbers domains of the form
local=/adserver.com/
or
local=/adserver.com/#

Lookup times now grow as log-to-base-2 of the number of domains,
rather than greater than linearly, as before.
The change makes multiple addresses associated with a domain work
address=/example.com/1.2.3.4
address=/example.com/5.6.7.8
It also handles multiple upstream servers for a domain better; using
the same try/retry alogrithms as non domain-specific servers. This
also applies to DNSSEC-generated queries.

Finally, some of the oldest and gnarliest code in dnsmasq has had
a significant clean-up. It's far from perfect, but it _is_ better.

Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Patch by srk, based on submitted patch from liaichun@huawei.com
This used to have a global limit, but that has a problem when using
different servers for different upstream domains. Queries which are
routed by domain to an upstream server which is not responding will
build up and trigger the limit, which breaks DNS service for all other
domains which could be handled by other servers. The change is to make
the limit per server-group, where a server group is the set of servers
configured for a particular domain. In the common case, where only
default servers are declared, there is no effective change.

Signed-off-by: DL6ER <dl6er@dl6er.de>
The sharing point for DNSSEC RR data used to be when it entered the
cache, having been validated. After that queries requiring the KEY or
DS records would share the cached values. There is a common case in
dual-stack hosts that queries for A and AAAA records for the same
domain are made simultaneously.  If required keys were not in the
cache, this would result in two requests being sent upstream for the
same key data (and all the subsequent chain-of-trust queries.) Now we
combine these requests and elide the duplicates, resulting in fewer
queries upstream and better performance. To keep a better handle on
what's going on, the "extra" logging mode has been modified to
associate queries and answers for DNSSEC queries in the same way as
ordinary queries. The requesting address and port have been removed
from DNSSEC logging lines, since this is no longer strictly defined.

Signed-off-by: DL6ER <dl6er@dl6er.de>
SERV_USE_RESOLV set implies struct serv_local,
so don't can't set ->arrayposn

Thanks to Xingcong Li for the cod review which led to this.

Signed-off-by: DL6ER <dl6er@dl6er.de>
If we retry a DNSSEC query because our client retries on us, and
we have an answer but are waiting on a DNSSEC query to validate it,
log the name of the DNSSEC query, not the client's query.

Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
In the specific case of configuring an A record for a domain

address=/example.com/1.2.3.4

queries for *example.com for any other type will now return
NOERR, and not the previous erroneous NXDOMAIN. The same thing
applies for

address=/example.com/::1:2:3:4
address=/example.com/#

Signed-off-by: DL6ER <dl6er@dl6er.de>
…ing analyzed and shown (legacy behavior)

Signed-off-by: DL6ER <dl6er@dl6er.de>
DL6ER and others added 20 commits June 26, 2021 21:51
…s owner

Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Use add_update_server for everything.

Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Doing so makes the loading process quadratic, which is a problem
when there are a large number.

Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Consistently treat a non-NULL return from [ud]bus-init() as a fatal error:
either die() if still starting, or log an error and disable
the relevant module if dnsmasq has already started.

Also rationalise calls to set and check listeners depending on
configuration.

Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
…41af3f36

Signed-off-by: DL6ER <dl6er@dl6er.de>
Signed-off-by: DL6ER <dl6er@dl6er.de>
Buffer may need to be twice MAXDNAME is escaping is
enabled in extract_name. The name may include weird characters.

Signed-off-by: DL6ER <dl6er@dl6er.de>
3c93e8eb41952a9c91699386132d6fe83050e9be regularised ubus_init()
by avoiding logging calls (it can be called before logging is up)
but it instead returned any error from ubus_add_object() which
made such an error fatal. It turns out this is awkward, so this
patch returns NULL always, so that the event-loop will continue
attemping to connect to ubus forever.

This is not necessarily optimal either, and should be looked at
by a UBUS grown-up, but it does solve the immediate problem.

Signed-off-by: DL6ER <dl6er@dl6er.de>
…le."

This reverts commit 8a1ef367e27e570cac40d3b09920a4a60c5f7e0b.

Signed-off-by: DL6ER <dl6er@dl6er.de>
This fixes a problem with ipset processing that got recently introduced
when `extract_request` filtering was tightened. During the recent change
an incorrect assumption was made that `extract_request` was only called
for requests but with ipset it is also called when processing responses.

The fix ensures that the new filters only apply to requests (QR=0 @ hdr)

Signed-off-by: Etan Kissling <etan.kissling@gmail.com>
Signed-off-by: DL6ER <dl6er@dl6er.de>
…query in memory because we don't really need that.

Signed-off-by: DL6ER <dl6er@dl6er.de>
…he char pointer of the extended DNS errors because we can get this at any time.

Signed-off-by: DL6ER <dl6er@dl6er.de>
simonkelley and others added 6 commits July 3, 2021 08:41
Domain patterns in --address, --server and --local have, for many years,
matched complete labels only, so
--server=/google.com/1.2.3.4
will apply to google.com and www.google.com but NOT supergoogle.com

This commit introduces an optional '*' at the LHS of the domain string which
changes this behaviour so as to include substring matches _within_ labels. So,
--server=/*google.com/1.2.3.4
applies to google.com, www.google.com AND supergoogle.com.
The index computation went awry when servers are disabled
by the loop-detection system.

Thanks to Xingcong Li for spotting this.
If hostname is reset on existing lease, propagate such change to leases
file and script.
Signed-off-by: DL6ER <dl6er@dl6er.de>
@DL6ER
Copy link
Member Author

DL6ER commented Aug 30, 2021

This PR is superseded by the running beta branch release/v5.9 which contains all commits from here.

@DL6ER DL6ER closed this Aug 30, 2021
@DL6ER DL6ER deleted the update/dnsmasq-v2.86 branch September 2, 2021 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants