Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] dns: add support for SRV records in DNS lookup #6379

Closed
wants to merge 23 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
5dbe4cc
Add support for SRV records in DNS lookup
venilnoronha Mar 26, 2019
b8e7008
Add mock method for resolveSrv
venilnoronha Mar 26, 2019
0401016
Drop SRV priority and weight information
venilnoronha Apr 29, 2019
ca67957
Refactor callback logic to reduce duplication
venilnoronha Apr 29, 2019
369582c
Drop duplicate utility method
venilnoronha Apr 30, 2019
7b260d0
Wire SRV resolver with upstream implementation
venilnoronha May 1, 2019
2c6da1a
Wire SRV resolver with logical DNS cluster impl
venilnoronha May 1, 2019
a302103
Fix tests
venilnoronha May 10, 2019
9da69fd
Fix format
venilnoronha Jun 17, 2019
7c5dcdd
Update ResolveSrvCb definition
venilnoronha Jun 17, 2019
8f937ff
Fix compilation errors
venilnoronha Jun 18, 2019
5c42041
Implement a DNS SRV resolver
venilnoronha Jun 20, 2019
303ccca
Disallow double registration of SrvResolver
venilnoronha Jun 20, 2019
b0dc4da
Add a todo note in the Redis cluster impl
venilnoronha Jun 20, 2019
b4e9b37
Generate appropriate SRV URLs in DNS impls
venilnoronha Jun 20, 2019
dbbdfbd
Add unit tests for Utility::urlFromSocketAddress
venilnoronha Jun 20, 2019
601c1f6
Fix DNS cluster impl tests
venilnoronha Jun 20, 2019
406eb1e
Add unit tests for SrvInstance
venilnoronha Jun 20, 2019
b63a81c
Enhance utility test with specific error messages
venilnoronha Jun 20, 2019
a8e2903
Wrap SrvInstanceConstSharedPtr in DnsSrvResponse
venilnoronha Jul 17, 2019
dc152d2
Minimalize labmda captures
venilnoronha Jul 17, 2019
ea167bc
Drop specialized types and use existing ones
venilnoronha Jul 19, 2019
6faaddd
Enable safe thread and memory DNS operations
venilnoronha Jul 19, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 12 additions & 1 deletion include/envoy/network/dns.h
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ class DnsResolver {
/**
* Called when a resolution attempt is complete.
* @param response supplies the list of resolved IP addresses and TTLs. The list will be empty if
* the resolution failed.
* the resolution failed.
*/
using ResolveCb = std::function<void(std::list<DnsResponse>&& response)>;

Expand All @@ -62,6 +62,17 @@ class DnsResolver {
*/
virtual ActiveDnsQuery* resolve(const std::string& dns_name, DnsLookupFamily dns_lookup_family,
ResolveCb callback) PURE;

/**
* Initiate an async DNS resolution for an SRV record.
* @param dns_name supplies the DNS name to lookup.
* @param dns_lookup_family the DNS IP version lookup policy.
* @param callback supplies the callback to invoke when the resolution is complete.
* @return if non-null, a handle that can be used to cancel the resolution.
* This is only valid until the invocation of callback or ~DnsResolver().
*/
virtual ActiveDnsQuery* resolveSrv(const std::string& dns_name, DnsLookupFamily dns_lookup_family,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the existing API does it this way, but I'd be interested if we could make ActiveDnsQuery RAII.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be willing to explain what you mean here more? The ActiveDnsQuery abstract class only has one method and no data members. The PendingResolution struct is derived from ActiveDnsQuery and contains data, is that what you want to be RAII? Do you know why a struct was used instead of a class? Was it just to avoid another class definition?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was suggesting that the returned ActiveDnsQuery be a unique ptr. The idea is that if this returned object is then destructed, it gives you automagic cancellation of the request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch Are you saying it's better to return a std::unique_ptr<ActiveDnsQuery> instead of ActiveDnsQuery*? I'm not quite sure how it actually cancels the pending dns request. Mind elaborating a bit more? (Sorry if the answer is obvious as I'm pretty new to C++ :) )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, return std::unique_ptr<ActiveDnsQuery>. The destructor for ActiveDnsQuery can then perform the cancellation.

ResolveCb callback) PURE;
};

using DnsResolverSharedPtr = std::shared_ptr<DnsResolver>;
Expand Down
2 changes: 2 additions & 0 deletions source/common/config/well_known_names.h
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,8 @@ class AddressResolverNameValues {
public:
// Basic IP resolver
const std::string IP = "envoy.ip";
// DNS SRV resolver
const std::string SRV = "envoy.srv";
};

using AddressResolverNames = ConstSingleton<AddressResolverNameValues>;
Expand Down
4 changes: 4 additions & 0 deletions source/common/network/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -192,7 +192,10 @@ envoy_cc_library(
hdrs = ["resolver_impl.h"],
deps = [
":utility_lib",
"//include/envoy/event:dispatcher_interface",
"//include/envoy/event:timer_interface",
"//include/envoy/network:address_interface",
"//include/envoy/network:dns_interface",
"//include/envoy/network:resolver_interface",
"//include/envoy/registry",
"//source/common/config:well_known_names",
Expand Down Expand Up @@ -260,6 +263,7 @@ envoy_cc_library(
"//source/common/common:assert_lib",
"//source/common/common:cleanup_lib",
"//source/common/common:utility_lib",
"//source/common/config:well_known_names",
"//source/common/protobuf",
"@envoy_api//envoy/api/v2/core:address_cc",
"@envoy_api//envoy/api/v2/core:base_cc",
Expand Down
162 changes: 130 additions & 32 deletions source/common/network/dns_impl.cc
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
#include "common/network/dns_impl.h"

#include <arpa/nameser.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
Expand Down Expand Up @@ -67,6 +68,30 @@ void DnsResolverImpl::initializeChannel(ares_options* options, int optmask) {
ares_init_options(&channel_, options, optmask | ARES_OPT_SOCK_STATE_CB);
}

bool DnsResolverImpl::PendingResolutionBase::fireCallback(std::list<DnsResponse>&& response) {
if (completed_) {
if (!cancelled_) {
try {
callback_(std::move(response));
} catch (const EnvoyException& e) {
ENVOY_LOG(critical, "EnvoyException in c-ares callback");
dispatcher_.post([s = std::string(e.what())] { throw EnvoyException(s); });
} catch (const std::exception& e) {
ENVOY_LOG(critical, "std::exception in c-ares callback");
dispatcher_.post([s = std::string(e.what())] { throw EnvoyException(s); });
} catch (...) {
ENVOY_LOG(critical, "Unknown exception in c-ares callback");
dispatcher_.post([] { throw EnvoyException("unknown"); });
}
}
if (owned_) {
delete this;
return true;
}
}
return false;
}

void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, int timeouts,
ares_addrinfo* addrinfo) {
// We receive ARES_EDESTRUCTION when destructing with pending queries.
Expand Down Expand Up @@ -120,25 +145,8 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i
ENVOY_LOG(debug, "DNS request timed out {} times", timeouts);
}

if (completed_) {
if (!cancelled_) {
try {
callback_(std::move(address_list));
} catch (const EnvoyException& e) {
ENVOY_LOG(critical, "EnvoyException in c-ares callback");
dispatcher_.post([s = std::string(e.what())] { throw EnvoyException(s); });
} catch (const std::exception& e) {
ENVOY_LOG(critical, "std::exception in c-ares callback");
dispatcher_.post([s = std::string(e.what())] { throw EnvoyException(s); });
} catch (...) {
ENVOY_LOG(critical, "Unknown exception in c-ares callback");
dispatcher_.post([] { throw EnvoyException("unknown"); });
}
}
if (owned_) {
delete this;
return;
}
if (fireCallback(std::move(address_list))) {
return;
}

if (!completed_ && fallback_if_failed_) {
Expand Down Expand Up @@ -192,6 +200,23 @@ void DnsResolverImpl::onAresSocketStateChange(int fd, int read, int write) {
(write ? Event::FileReadyType::Write : 0));
}

ActiveDnsQuery* DnsResolverImpl::preparePendingResolution(
std::unique_ptr<PendingResolutionBase> pending_resolution) {
if (pending_resolution->completed_) {
// Resolution does not need asynchronous behavior or network events. For
// example, localhost lookup.
return nullptr;
} else {
// Enable timer to wake us up if the request times out.
updateAresTimer();

// The PendingResolutionBase will self-delete when the request completes
// (including if cancelled or if ~DnsResolverImpl() happens).
pending_resolution->owned_ = true;
return pending_resolution.release();
}
}

ActiveDnsQuery* DnsResolverImpl::resolve(const std::string& dns_name,
DnsLookupFamily dns_lookup_family, ResolveCb callback) {
// TODO(hennna): Add DNS caching which will allow testing the edge case of a
Expand All @@ -209,19 +234,7 @@ ActiveDnsQuery* DnsResolverImpl::resolve(const std::string& dns_name,
pending_resolution->getAddrInfo(AF_INET6);
}

if (pending_resolution->completed_) {
// Resolution does not need asynchronous behavior or network events. For
// example, localhost lookup.
return nullptr;
} else {
// Enable timer to wake us up if the request times out.
updateAresTimer();

// The PendingResolution will self-delete when the request completes
// (including if cancelled or if ~DnsResolverImpl() happens).
pending_resolution->owned_ = true;
return pending_resolution.release();
}
return preparePendingResolution(std::move(pending_resolution));
}

void DnsResolverImpl::PendingResolution::getAddrInfo(int family) {
Expand All @@ -242,5 +255,90 @@ void DnsResolverImpl::PendingResolution::getAddrInfo(int family) {
this);
}

ActiveDnsQuery* DnsResolverImpl::resolveSrv(const std::string& dns_name,
DnsLookupFamily dns_lookup_family, ResolveCb callback) {
std::unique_ptr<PendingSrvResolution> pending_srv_res(
new PendingSrvResolution(callback, dispatcher_, channel_, dns_name, dns_lookup_family, this));
pending_srv_res->getSrvByName();
return preparePendingResolution(std::move(pending_srv_res));
}

void DnsResolverImpl::PendingSrvResolution::onAresSrvStartCallback(int status, int timeouts,
unsigned char* buf, int len) {
// We receive ARES_EDESTRUCTION when destructing with pending queries.
if (status == ARES_EDESTRUCTION) {
ASSERT(owned_);
delete this;
return;
}

bool replies_parsed = false;
if (status == ARES_SUCCESS) {
struct ares_srv_reply* srv_reply;
status = ares_parse_srv_reply(buf, len, &srv_reply);

if (status == ARES_SUCCESS) {
size_t total = 0;
for (ares_srv_reply* current_reply = srv_reply; current_reply != NULL;
current_reply = current_reply->next) {
total++;
}

std::shared_ptr<std::atomic<ssize_t>> finished = std::make_shared<std::atomic<ssize_t>>(0);
std::shared_ptr<std::list<DnsResponse>> srv_records =
std::make_shared<std::list<DnsResponse>>();
std::shared_ptr<std::mutex> mutex = std::make_shared<std::mutex>();
for (ares_srv_reply* current_reply = srv_reply; current_reply != NULL;
current_reply = current_reply->next) {
resolver_->resolve(
current_reply->host, this->dns_lookup_family_,
[this, total, finished, srv_records, mutex,
current_reply](const std::list<DnsResponse>&& response) {
for (auto instance = response.begin(); instance != response.end(); ++instance) {
Address::InstanceConstSharedPtr inst_with_port(
Utility::getAddressWithPort(*instance->address_, current_reply->port));
mutex->lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we taking locks? Isn't everything thread local on this dispatcher?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are threads and dispatchers always coupled? I think so, but I'm not 100% sure. So any operations happening through a particular dispatcher will operate on the same thread?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purpose of DNS resolution, this should be true (that they are coupled).

srv_records->emplace_back(DnsResponse(inst_with_port, instance->ttl_));
mutex->unlock();
}
if (static_cast<unsigned>(++(*finished)) == total) {
onAresSrvFinishCallback(std::move(*srv_records));
}
});
}
replies_parsed = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an explicit replies_parsed variable here, or can the control logic be made more explicit?

}

ares_free_data(srv_reply);
}

if (timeouts > 0) {
ENVOY_LOG(debug, "DNS request timed out {} times while querying for SRV records", timeouts);
}

if (!replies_parsed) {
onAresSrvFinishCallback({});
}
}

void DnsResolverImpl::PendingSrvResolution::onAresSrvFinishCallback(
std::list<DnsResponse>&& srv_records) {
if (!srv_records.empty()) {
completed_ = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this true regardless of the length of the srv_records? I.e. the query is over, we're not waiting any longer?

}

fireCallback(std::move(srv_records));
}

void DnsResolverImpl::PendingSrvResolution::getSrvByName() {
ares_query(
channel_, dns_name_.c_str(), ns_c_in, ns_t_srv,
[](void* arg, int status, int timeouts, unsigned char* abuf, int alen) {
static_cast<PendingSrvResolution*>(arg)->onAresSrvStartCallback(status, timeouts, abuf,
alen);
},
this);
}

} // namespace Network
} // namespace Envoy
85 changes: 67 additions & 18 deletions source/common/network/dns_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -34,13 +34,15 @@ class DnsResolverImpl : public DnsResolver, protected Logger::Loggable<Logger::I
// Network::DnsResolver
ActiveDnsQuery* resolve(const std::string& dns_name, DnsLookupFamily dns_lookup_family,
ResolveCb callback) override;
ActiveDnsQuery* resolveSrv(const std::string& dns_name, DnsLookupFamily dns_lookup_family,
ResolveCb callback) override;

private:
friend class DnsResolverImplPeer;
struct PendingResolution : public ActiveDnsQuery {
struct PendingResolutionBase : public ActiveDnsQuery {
// Network::ActiveDnsQuery
PendingResolution(ResolveCb callback, Event::Dispatcher& dispatcher, ares_channel channel,
const std::string& dns_name)
PendingResolutionBase(ResolveCb callback, Event::Dispatcher& dispatcher, ares_channel channel,
const std::string& dns_name)
: callback_(callback), dispatcher_(dispatcher), channel_(channel), dns_name_(dns_name) {}

void cancel() override {
Expand All @@ -49,18 +51,7 @@ class DnsResolverImpl : public DnsResolver, protected Logger::Loggable<Logger::I
cancelled_ = true;
}

/**
* ares_getaddrinfo query callback.
* @param status return status of call to ares_getaddrinfo.
* @param timeouts the number of times the request timed out.
* @param addrinfo structure to store address info.
*/
void onAresGetAddrInfoCallback(int status, int timeouts, ares_addrinfo* addrinfo);
/**
* wrapper function of call to ares_getaddrinfo.
* @param family currently AF_INET and AF_INET6 are supported.
*/
void getAddrInfo(int family);
bool fireCallback(std::list<DnsResponse>&& response);

// Caller supplied callback to invoke on query completion or error.
const ResolveCb callback_;
Expand All @@ -73,13 +64,64 @@ class DnsResolverImpl : public DnsResolver, protected Logger::Loggable<Logger::I
bool completed_ = false;
// Was the query cancelled via cancel()?
bool cancelled_ = false;
// If dns_lookup_family is "fallback", fallback to v4 address if v6
// resolution failed.
bool fallback_if_failed_ = false;
const ares_channel channel_;
const std::string dns_name_;
};

struct PendingResolution : public PendingResolutionBase {
PendingResolution(ResolveCb callback, Event::Dispatcher& dispatcher, ares_channel channel,
const std::string& dns_name)
: PendingResolutionBase(callback, dispatcher, channel, dns_name) {}

/**
* ares_getaddrinfo query callback.
* @param status return status of call to ares_getaddrinfo.
* @param timeouts the number of times the request timed out.
* @param addrinfo structure to store address info.
*/
void onAresGetAddrInfoCallback(int status, int timeouts, ares_addrinfo* addrinfo);

/**
* wrapper function of call to ares_getaddrinfo.
* @param family currently AF_INET and AF_INET6 are supported.
*/
void getAddrInfo(int family);

// If dns_lookup_family is "fallback", fallback to v4 address if v6 resolution failed.
bool fallback_if_failed_ = false;
};

struct PendingSrvResolution : public PendingResolutionBase {
PendingSrvResolution(ResolveCb callback, Event::Dispatcher& dispatcher, ares_channel channel,
const std::string& dns_name, DnsLookupFamily dns_lookup_family,
DnsResolverImpl* resolver)
: PendingResolutionBase(callback, dispatcher, channel, dns_name),
dns_lookup_family_(dns_lookup_family), resolver_(resolver) {}

/**
* c-ares ares_query() query callback for initiation.
* @param status the status of the call to ares_query().
* @param timeouts the number of times the request timed out.
* @param buf the result buffer.
* @param len the result buffer length.
*/
void onAresSrvStartCallback(int status, int timeouts, unsigned char* buf, int len);

/**
* c-ares ares_query() query callback for completion.
* @param srv_records a list of SRV records.
*/
void onAresSrvFinishCallback(std::list<DnsResponse>&& srv_records);

// wrapper function of call to ares_query().
void getSrvByName();

// The DnsLookupFamily for the SRV record.
const DnsLookupFamily dns_lookup_family_;
// The resolver instance.
DnsResolverImpl* resolver_;
};

// Callback for events on sockets tracked in events_.
void onEventCallback(int fd, uint32_t events);
// c-ares callback when a socket state changes, indicating that libevent
Expand All @@ -89,6 +131,13 @@ class DnsResolverImpl : public DnsResolver, protected Logger::Loggable<Logger::I
void initializeChannel(ares_options* options, int optmask);
// Update timer for c-ares timeouts.
void updateAresTimer();
/**
* Prepares a PendingResolutionBase object for destruction if owned and when
* the resolution is completed.
* @param a pointer to the PendingResolutionBase instance.
*/
ActiveDnsQuery*
preparePendingResolution(std::unique_ptr<PendingResolutionBase> pending_resolution);

Event::Dispatcher& dispatcher_;
Event::TimerPtr timer_;
Expand Down
Loading