erikh/border

Border: A modern approach to DNS & Load Balancers

Border combines a DNS service and a load balancer. It uses this combination, internal health checking, and primitive consensus algorithms to heal your network automatically when there are issues. If you lose a border node, a replacement can be added to the cluster immediately without additional configuration; zones and balancing rules will automatically propagate to the new node.

All network resources are treated like DNS entries, and DNS propagation ("Zone Transfers") does not use the primitive and insecure AXFR protocol. Instead, it uses JOSE to distribute the server's full configuration over modern protocols, with modern encryption and modern amenities. All nodes subscribe to a publisher node, which in "legacy" terms is the "master" or "owner" of the network. The publisher is responsible for distributing configuration to the other nodes.
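The README does not spell out which JOSE algorithms or payload layout Border uses, so the following is only a minimal sketch of the idea, assuming a compact JWS signed with HS256 (HMAC-SHA256) and a secret shared by all nodes. The publisher signs the full configuration, and a subscriber accepts it only if the signature verifies. The names sign_config and verify_config, the shared secret, and the configuration shape are all hypothetical.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    # JOSE uses unpadded base64url (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Re-add the "=" padding that b64url_encode stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_config(config: dict, key: bytes) -> str:
    """Publisher side: wrap the full configuration in a compact JWS (HS256)."""
    header = json.dumps({"alg": "HS256"}, separators=(",", ":")).encode()
    payload = json.dumps(config, separators=(",", ":"), sort_keys=True).encode()
    signing_input = b64url_encode(header) + "." + b64url_encode(payload)
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def verify_config(token: str, key: bytes) -> dict:
    """Subscriber side: return the configuration only if the signature checks out."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected JWS algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("configuration signature does not verify")
    return json.loads(b64url_decode(payload_b64))


# Hypothetical secret and configuration, for illustration only.
shared_key = b"example-shared-secret"
config = {"peers": ["192.0.2.1", "192.0.2.2"], "zones": {"example.com.": {"www": ["192.0.2.10"]}}}
token = sign_config(config, shared_key)
assert verify_config(token, shared_key) == config
```

A shared HMAC secret keeps the sketch short, but it means any subscriber could also forge a configuration; an asymmetric JWS algorithm such as ES256, where only the publisher holds the private key, would likely be a better fit for a publisher/subscriber design.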

Border does not depend on a consensus database. It holds its own elections, and for this reason it is strongly recommended that you run an even number of nodes in production: when one node goes down, an odd number remain, so an election can be held and one candidate is guaranteed to win eventually.

Rationale

Why do this?

In typical scenarios, you have one of a few configurations that all have their own problems:

  • In many cases, you have fixed IPs provided via a very rigidly deployed DNS service. Virtual IPs, or technologies like BGP and Anycast, are employed to retain the value of these IP addresses. In this case DNS is a brittle linchpin serving static records to the network; modifying those records is a major event and tends to cause chaos.

  • Other cases leverage a round-robin distribution of A records with a very short TTL on the DNS server. This increases load on the DNS service, which is usually not an issue, but it also increases the severity of a DNS outage. This works well with e.g. haproxy, but if an haproxy instance is removed before the DNS change has propagated, clients will see forced network hiccups.

  • Other, simpler configurations either stack load balancers or proxy from haproxy to a secondary proxying web server such as nginx. No matter how this is sliced, the load balancer remains a single point of failure, and DNS usually becomes one too if the host running the load balancer is lost.

Additionally, third-party health monitoring must be employed in all of these scenarios. The beauty of Border is that it monitors itself by combining both protocols, DNS and load balancing, with consensus.

Border cannot eliminate all failures. TCP is still TCP and when the connection is gone, TCP must do something about that, which is usually to simply fail the connection. Border, however, tries to eliminate the administrative overhead of such an event.

Features

Border aims to pack in a lot of features rather than be a simple tool. We are dedicated to bringing a richer experience to administrators who front websites.

  • Provides TCP and HTTP Load Balancing
  • TLS Termination
  • Load Balancing of less typical DNS situations, such as SRV records (think tools like Samba or LDAP).
  • Health checks are a part of DNS, and when a health check is failed, DNS is automatically adjusted.
  • Zone Transfers do not use the unwieldy and frequently insecure AXFR protocol, instead opting for the protections provided by JOSE. Full configuration is synced, not just zones.
  • Built-in Let's Encrypt and ACME support
    • For TLS Termination
    • For DNSSEC (still need to look deeper into this one)
  • Self-distributing architecture means fire-and-forget deployments, and a fully STONITH ("Shoot The Other Node In The Head") architecture.
  • An ngrok-like agent to help border traverse NAT firewalls, as well as more entrenched network configurations such as corporate firewalls.
  • Split Horizon support baked into the service, on a per network and per zone basis.
  • Capacity management in the config, e.g., "this webserver can handle 10k connections at a time, and that one can handle 5k, so don't route more than that there".
    • Consensus-based connection tracking, so that load balancers can manage all servers in a criss-cross pattern, rather than each balancer owning a pool of servers that is lost when that balancer goes down.
  • We are debating adding a caching proxy à la Squid / Varnish, as it may also be a good fit in this service with a minimal footprint overall.
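Two of the features above, health-checked records and per-backend capacity limits, can be illustrated together. This is a minimal, single-node sketch assuming each backend carries a hypothetical capacity field and a health flag. It does not model the consensus-based connection tracking mentioned in the list, and choosing the healthy backend with the lowest utilization is one reasonable policy, not Border's documented one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    address: str
    capacity: int          # hypothetical: max concurrent connections this server accepts
    active: int = 0        # connections currently routed to it
    healthy: bool = True   # result of the most recent health check


def pick_backend(backends: List[Backend]) -> Optional[Backend]:
    """Choose the healthy backend with the lowest utilization that still has room."""
    candidates = [b for b in backends if b.healthy and b.active < b.capacity]
    if not candidates:
        return None  # everything has failed or is saturated
    return min(candidates, key=lambda b: b.active / b.capacity)


def route(backends: List[Backend]) -> Optional[str]:
    """Reserve a connection slot on the chosen backend and return its address."""
    chosen = pick_backend(backends)
    if chosen is None:
        return None
    chosen.active += 1
    return chosen.address


pool = [
    Backend("10.0.0.1:80", capacity=10_000, active=9_000),  # 90% utilized
    Backend("10.0.0.2:80", capacity=5_000, active=1_000),   # 20% utilized
    Backend("10.0.0.3:80", capacity=5_000, healthy=False),  # failed health check
]
print(route(pool))  # 10.0.0.2:80
```

Because a backend is only a candidate while active is below capacity, the "don't route more than that there" rule falls out of the filter, and a failed health check removes the server from rotation without any other bookkeeping.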

Some operational notes

Here is an example configuration. Documentation will come soon, but this displays most of the service's features at this time.

Elections are held when the publisher is no longer responsive. The point at which the publisher is considered unresponsive is meant to be a configurable parameter (eventually, at least). When an election is held, each member votes for the node it believes has the highest uptime. This should produce a clear winner: if the publisher has been working up to this point, the uptime information will have been communicated in the configuration.
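The vote described above can be sketched as follows. The README only states that each member votes for the node it believes has the highest uptime; the strict-majority threshold, the tie-break on node name, and the data shapes are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Optional


def cast_vote(known_uptimes: Dict[str, int]) -> str:
    """One member's ballot: the node it believes has the highest uptime.
    Equal uptimes are broken by node name so members with the same view agree."""
    return max(known_uptimes, key=lambda node: (known_uptimes[node], node))


def hold_election(views: Dict[str, Dict[str, int]]) -> Optional[str]:
    """views maps each surviving member to its view of peer uptimes (seconds).
    Returns the winner if it holds a strict majority, otherwise None (try again later)."""
    if not views:
        return None
    votes = Counter(cast_vote(view) for view in views.values())
    winner, count = votes.most_common(1)[0]
    return winner if count * 2 > len(views) else None


# A four-node cluster whose publisher "a" has stopped responding; b, c and d remain.
# Uptimes are assumed to come from the configuration the publisher last distributed.
views = {
    "b": {"b": 3600, "c": 7200, "d": 1800},
    "c": {"b": 3600, "c": 7200, "d": 1800},
    "d": {"b": 3600, "c": 7100, "d": 1800},
}
print(hold_election(views))  # c
```

Since every member reads uptimes from the same distributed configuration, their ballots normally agree; the majority check only matters if views have drifted apart, in which case the sketch returns None and the election would be re-run.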

The procedure for replacing a failing or terminated load balancer is as follows:

  • Termination event of original load balancer
  • Wait for health checks to fail and for any necessary elections to complete. This takes approximately a second.
  • Add the new peer to the configuration, then send the updated configuration to the publisher with border client updateconfig <myconfigfile>. You can use border client identifypublisher to determine which node is the publisher.
  • Start the new peer with the updated configuration.

Note that between the termination and the start of the new instance, health checking will automatically heal the load balancing and DNS records that point at the failed instance, unless you have lost all border nodes. This works today.
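A minimal sketch of that self-healing step, assuming a plain TCP connect as the health check and A records kept as a mapping from name to addresses. Border's real checks and record storage may look quite different; tcp_check and heal_records are hypothetical names.

```python
import socket
from typing import Callable, Dict, List


def tcp_check(address: str, port: int = 80, timeout: float = 1.0) -> bool:
    """Hypothetical health check: healthy if a TCP connection can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def heal_records(records: Dict[str, List[str]],
                 is_healthy: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Return a copy of the A records with every failing address removed."""
    return {name: [a for a in addrs if is_healthy(a)] for name, addrs in records.items()}


records = {"www.example.com.": ["192.0.2.10", "192.0.2.11"]}
# In production this might be heal_records(records, tcp_check); here a stub
# pretends that 192.0.2.11 was just terminated.
print(heal_records(records, lambda a: a != "192.0.2.11"))
# {'www.example.com.': ['192.0.2.10']}
```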

Load Balancers are configured like a DNS record: the A records for a website are maintained as records pointing to border. Contrast with ALIAS records on Amazon Web Services' Route 53 and Elastic Load Balancer. Each load balancer record will have configurable parameters similar to an SOA record's notion of TTL and cache.
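To make the idea concrete, here is a hypothetical shape for such a record: the name resolves to the border nodes themselves, while the backends stay behind them. The BalancedRecord type and its field names are assumptions and do not reflect Border's actual configuration format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BalancedRecord:
    name: str             # e.g. "www.example.com."
    ttl: int              # seconds resolvers may cache the answer
    listeners: List[str]  # border nodes' public IPs, published as A records
    backends: List[str] = field(default_factory=list)  # where border forwards traffic


def answer_a_query(record: BalancedRecord) -> List[Tuple[str, int, str, str, str]]:
    """Clients resolve the name to border itself, never directly to a backend."""
    return [(record.name, record.ttl, "IN", "A", ip) for ip in record.listeners]


lb = BalancedRecord(
    name="www.example.com.",
    ttl=300,
    listeners=["198.51.100.1", "198.51.100.2"],
    backends=["10.0.0.1:8080", "10.0.0.2:8080"],
)
for rr in answer_a_query(lb):
    print(*rr)
# www.example.com. 300 IN A 198.51.100.1
# www.example.com. 300 IN A 198.51.100.2
```

In this sketch the published answer only ever names border nodes, so backends can come and go without changing what resolvers cache; the TTL only governs how long the border addresses themselves are cached.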

Border agents (the tunneling proxy) will use JOSE keys instead of IP addresses to identify themselves to the network, and their DNS records will reflect that. Contrast with an SPF record's use of the TXT record type.
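One way an agent's key could become its DNS identity is a JWK thumbprint (RFC 7638): a SHA-256 hash over the key's required members, serialized with sorted keys and no whitespace. The sketch below assumes that approach. The base32 label encoding (chosen because DNS names are case-insensitive while base64 is not), the agents.example.com. zone, and the agent_name helper are hypothetical choices, not Border's documented scheme.

```python
import base64
import hashlib
import json

# Required members per key type: RFC 7638 (EC, RSA) and RFC 8037 (OKP).
REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "OKP": ("crv", "kty", "x"),
    "RSA": ("e", "kty", "n"),
}


def jwk_thumbprint_digest(jwk: dict) -> bytes:
    """SHA-256 over the required members, keys sorted, no whitespace (RFC 7638)."""
    members = {k: jwk[k] for k in REQUIRED_MEMBERS[jwk["kty"]]}
    canonical = json.dumps(members, separators=(",", ":"), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def agent_name(jwk: dict, zone: str = "agents.example.com.") -> str:
    """Hypothetical DNS name for an agent: lowercase, unpadded base32 of the thumbprint.
    A 32-byte digest encodes to 52 characters, within the 63-character label limit."""
    digest = jwk_thumbprint_digest(jwk)
    label = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return f"{label}.{zone}"


# Placeholder key material (not a real key); extra members such as "kid" are ignored.
agent_key = {"kty": "EC", "crv": "P-256", "x": "placeholder-x", "y": "placeholder-y", "kid": "agent-1"}
print(agent_name(agent_key))
```

Because the thumbprint depends only on the public key, an agent keeps the same name wherever it connects from, which is the property the paragraph above contrasts with IP-based identity.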

The Border agent uses a well-documented, well-specified protocol that can be implemented by web service frameworks as well as by web servers themselves, such as Caddy or Nginx.

With respect to whois distribution, Border expects you to use longer TTLs and let the DNS protocol do its job properly, including falling back to other nameservers. If one of 4 nameservers fails, the remaining three stay alive, which gives you time to adjust the whois records when a host is completely lost. We feel this is a safer DR strategy than investing a lot of infrastructure in retaining IP addresses at all costs, because it is much simpler to maintain.

Status

Border is in an early alpha stage. It is functional, but lacks many of the features you would expect from a product of its type. Encouragement, testing, and patches (!) are very welcome, but "betting the farm" on this product at this point would be an impressive display of your own lack of wisdom.

Author

Erik Hollensbe erik+github@hollensbe.org

About

A modern approach to handling the edge of your website's communication.
