
Tracking Issue: Support for pingora-load-balancing #39

Open
jamesmunns opened this issue Jun 3, 2024 · 10 comments
Labels
  • F-Configuration: Functionality relating to configuration
  • F-ServiceDiscovery: Functionality relating to Service Discovery of Upstream Servers
  • F-Upstream: Functionality relating to the Upstream/Connector interfaces

Comments

@jamesmunns (Collaborator)

The pingora-load-balancing crate actually provides three key features related to managing upstreams:

  • Load Balancing, i.e. sharing all incoming requests between multiple upstream servers
  • Health Checks, i.e. ensuring all upstream servers are healthy or live before proxying requests to them
  • Service Discovery, i.e. updating the list of upstream servers over time

river will be integrating pingora-load-balancing in order to add these features. This post discusses what does and does not exist today.

Load Balancing

Load balancing functionality is largely contained within the selection module.

There are currently four supported algorithms:

  • Consistent (Ketama) hashing
  • FNV hashing
  • Random
  • Round Robin

All of these algorithms support weighting of different backends. TODO: I'm unsure how weighting will interact with health checks and service discovery.
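To make the weighting idea concrete, here is a minimal, self-contained sketch (not pingora-load-balancing's actual implementation): a weighted round-robin that repeats each backend in the rotation once per unit of weight.

```rust
// Illustrative weighted round-robin: a backend with weight 2 appears twice
// in the rotation, so it receives twice the traffic of a weight-1 backend.
struct WeightedRoundRobin {
    slots: Vec<String>, // one entry per unit of weight
    next: usize,
}

impl WeightedRoundRobin {
    fn new(backends: &[(&str, usize)]) -> Self {
        let mut slots = Vec::new();
        for (addr, weight) in backends {
            for _ in 0..*weight {
                slots.push(addr.to_string());
            }
        }
        WeightedRoundRobin { slots, next: 0 }
    }

    fn select(&mut self) -> &str {
        let idx = self.next % self.slots.len();
        self.next += 1;
        &self.slots[idx]
    }
}
```

With backends `("10.0.0.1:80", 2)` and `("10.0.0.2:80", 1)`, successive calls to `select()` yield the first address twice for every one pick of the second.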

It is likely we will need/want more load balancing algorithms in the future. For example, NGINX supports quite a few.

To begin with, we will want to add support for all four currently supported algorithms, selected on a per-service basis. Configuration will need to include weighting cost for statically selected server lists.

Health Checks

Health check functionality is largely contained within the health_check module.

There are currently two supported methods of health check:

  • TCP health checks, which establish health by opening a TCP (or TLS) connection
  • HTTP health checks, which establish health by sending an HTTP(S) request

Both methods have some basic shared configuration, such as hysteresis for healthy/unhealthy determination, as well as the connection information for each peer.
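The hysteresis mentioned above can be sketched as follows (assumed behavior for illustration; pingora's actual logic may differ): a backend only flips between healthy and unhealthy after N consecutive observations in the other direction.

```rust
// Illustrative health-state hysteresis: `consecutive_successes` is the "on"
// threshold, `consecutive_failures` is the "off" threshold.
struct HealthState {
    healthy: bool,
    consecutive_successes: usize,
    consecutive_failures: usize,
    streak: usize, // consecutive results disagreeing with the current state
}

impl HealthState {
    fn new(consecutive_successes: usize, consecutive_failures: usize) -> Self {
        HealthState { healthy: true, consecutive_successes, consecutive_failures, streak: 0 }
    }

    // Record one check result, returning the (possibly updated) health state.
    fn observe(&mut self, success: bool) -> bool {
        if success == self.healthy {
            self.streak = 0; // result agrees with current state: reset
        } else {
            self.streak += 1;
            let threshold = if self.healthy {
                self.consecutive_failures
            } else {
                self.consecutive_successes
            };
            if self.streak >= threshold {
                self.healthy = !self.healthy;
                self.streak = 0;
            }
        }
        self.healthy
    }
}
```

A single failed check against a healthy backend does not mark it down; only a run of failures (or successes) long enough to cross the configured threshold flips the state.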

Additionally, HTTP health checks have quite a few more options, including what to request from the server, how to validate the response, and additional checks like whether a TCP/TLS connection can be re-used (this makes checking more efficient, but can mask firewall or routing issues).

We will want to support both options; some effort is expected in determining reasonable configuration APIs for the health checks.

Service Discovery

Service discovery functionality is largely contained within the discovery module.

Currently, the only provided algorithm is "Static", which uses a fixed array of backends (i.e. "no discovery").

There does exist a ServiceDiscovery trait that allows for implementing and providing service discovery functionality.

We have a couple of required service discovery abilities in river's requirements. We may want to implement DNS-SD or polling of SRV records, though that may not occur during the initial implementation.
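As a rough sketch of the shape such a trait takes, here is a heavily simplified, hypothetical version (the real pingora-load-balancing `ServiceDiscovery` trait is async and returns richer backend/health information; names here are illustrative only):

```rust
use std::collections::BTreeSet;

// Hypothetical simplification of the service-discovery idea: a discovery
// source returns the current set of backend addresses each time it is polled.
trait Discover {
    fn discover(&self) -> BTreeSet<String>;
}

// "Static" discovery: a fixed list, i.e. "no discovery". A DNS-SD or
// SRV-record implementation would refresh this set over time instead.
struct StaticDiscovery {
    backends: BTreeSet<String>,
}

impl Discover for StaticDiscovery {
    fn discover(&self) -> BTreeSet<String> {
        self.backends.clone()
    }
}
```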

Summary

This tracking issue will likely cover multiple PRs for implementing this functionality. We'll need to design quite a bit of configuration surface for these new features, and it will likely be a breaking change from the existing "one listener" configuration file.

After implementation, we should also cross-check against the requirements that pingora-load-balancing has sufficient API surface to allow for the things we've mentioned, such as DNS TTL tracking to invalidate hosts.

@jamesmunns jamesmunns added the F-Upstream, F-ServiceDiscovery, and F-Configuration labels Jun 3, 2024
@jamesmunns jamesmunns added this to the Kickstart Spike 2 milestone Jun 3, 2024
@jamesmunns jamesmunns pinned this issue Jun 3, 2024
@jamesmunns (Collaborator Author)

Notes on configuration file contents:

  • "Connectors" (general config, superseding the current "Connector" config)
    • proxy_addr (addr+port, maybe also allow hostname+port?)
    • tls_sni (optional, hostname)
    • weight (integer? relative weighting?)
  • Selection
    • "kind"/"flavor" (currently: Consistent/FNV/Random/RoundRobin)
    • These algorithms don't currently have individual configurables, but it's likely future
      items will.
    • We should probably define a default, maybe RoundRobin?
  • Health Checks
    • "kind"/"flavor" (currently None/TCP/HTTP)
    • General Settings
      • consecutive_successes (integer, "on" hysteresis)
      • consecutive_failures (integer, "off" hysteresis)
      • peer_options (a lot, see PeerOptions)
    • Flavor: TCP
      • none? address/sni are repeated, but probably covered by connector options
    • Flavor: HTTP
      • reuse_connection (fresh TCP/TLS on every check?)
      • req (request header, path and header k:v)
      • validator (Function for validating response).
        • TODO: We will need to define potential options for this!
      • port_override (integer - if we want to use another port for health check)
      • address/sni: not needed? covered by connector options?
      • proxy (TODO: I don't know what this is used for?)
      • client_cert_key (should we specify this as part of the Connector config? is this a reasonable override?)
      • scheme (HTTP/HTTPS - is this already covered by the TLS/SNI setting?)
  • Discovery
    • Right now the only option is "static", we will want to reserve nested configuration like we will for health checks.
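Sketched as a config file, the surface outlined above might look something like this (a hypothetical TOML fragment; key names and nesting are illustrative, not final):

```toml
# Hypothetical river configuration sketch -- names are not final.
[[connectors]]
proxy_addr = "192.0.2.10:8080"
tls_sni = "backend.example.com"
weight = 5

[selection]
kind = "RoundRobin"            # Consistent / FNV / Random / RoundRobin

[health-checks]
kind = "Http"                  # None / Tcp / Http
consecutive-successes = 2      # "on" hysteresis
consecutive-failures = 3       # "off" hysteresis
reuse-connection = true
path = "/healthz"

[discovery]
kind = "Static"                # nested config reserved for future kinds
```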

@moderation

Cool to see work on load balancing. This is a key feature and I was surprised initially when I saw connector as a struct and not an array - https://github.com/memorysafety/river/blob/main/docs/toml-configuration.md?plain=1#L22-L24 👍

@jamesmunns (Collaborator Author)

I've been working on configuration a bit, as I think there will be quite a bit of configuration surface for this. You can see the changes here: #42 (comment)

@jamesmunns (Collaborator Author)

I've been looking at this the past couple of days, and I hadn't realized that the various pingora-load-balancing traits are not object safe.

I also took a bit of a look to see how straightforward this would be to work around, but the ability for trait implementors to create new instances (generally for the purpose of updating via ArcSwap items) is pretty fundamental to the current API.

I have a chat scheduled today with the pingora engineers, but I can see the outcomes being one of the following:

Work with pingora team to make traits object safe

This would be a pretty significant API change for pingora-load-balancing, but would allow us to use dyn Trait to load the implementations at runtime, based on the content of the configuration file.

Write object safe workarounds for existing p-l-b traits

This would entail wrapping the existing trait implementors with a hand-rolled version of dynamic dispatch, rather than using dyn Trait directly. This could be an enum of the different options, or something more invasive or unsafe.

This would allow us to avoid changing p-l-b, but might incur additional performance costs, as well as maintenance overhead. This is likely not a blocker, but might be relevant as this dispatching would be required on a relatively "hot" path (ProxyHttp::upstream_peer()).
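The enum-of-options workaround could look roughly like this (a self-contained illustration of the dispatch pattern, not river or p-l-b code; the selection logic here is deliberately trivial):

```rust
// Hand-rolled dispatch over known selection kinds: one `match` per call
// instead of a `dyn Trait` vtable lookup.
enum Selector {
    RoundRobin { next: usize },
    KeyHash, // stand-in for a hashing strategy such as FNV
}

impl Selector {
    fn select<'a>(&mut self, key: &[u8], backends: &'a [&'a str]) -> &'a str {
        match self {
            Selector::RoundRobin { next } => {
                let picked = backends[*next % backends.len()];
                *next += 1;
                picked
            }
            Selector::KeyHash => {
                // Toy hash: sum of key bytes, modulo backend count.
                let hash: usize = key.iter().map(|b| *b as usize).sum();
                backends[hash % backends.len()]
            }
        }
    }
}
```

The `match` adds a branch on the hot path, which is the performance cost alluded to above, though branch prediction should make it cheap in practice.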

Don't use p-l-b

We could define our own traits that are similar to p-l-b, but are object safe, or work in a way that is suitable for River's needs.

This would make it more challenging to take newer algorithms and options from pingora, as well as making it challenging to upstream any we create for River.

Summary

I'll update after speaking with the pingora dev team.

@mcpherrinm

I see you've started work, so I'm interested to hear how the Pingora discussion went and if you've got an idea which direction you're going (no pressure if you're not ready to share yet!)

@jamesmunns (Collaborator Author)

Hey @mcpherrinm! Thanks for the reminder!

The outcome was noted here: cloudflare/pingora#275 (comment)

But the short answer is "I was too zoomed in": it actually wasn't necessary to make the P-L-B items object-safe, since we already have type erasure one level up, as pingora takes (type erased) dyn Service items.

@jamesmunns (Collaborator Author)

If anyone is interested in how I handle the "trickery" here, the biggest parts are:

First: Adding the generics to the RiverProxyService, here:

impl<BS> RiverProxyService<BS>
where
    BS: BackendSelection + Send + Sync + 'static,
    BS::Iter: BackendIter,
{

Then: A function that picks the right "type" of service based on the appropriate selection algorithm, here:

/// Create a proxy service, with the type parameters chosen based on the config file
pub fn river_proxy_service(
    conf: ProxyConfig,
    server: &Server,
) -> Box<dyn pingora::services::Service> {
    // Pick the correctly monomorphized function. This makes the functions all have the
    // same signature of `fn(...) -> Box<dyn Service>`.
    type ServiceMaker =
        fn(ProxyConfig, &Server, RequestSelector) -> Box<dyn pingora::services::Service>;
    let (service_maker, selector): (ServiceMaker, RequestSelector) =
        match conf.upstream_options.selection {
            SelectionKind::RoundRobin => (
                RiverProxyService::<RoundRobin>::from_basic_conf,
                null_selector,
            ),
            SelectionKind::Random => (RiverProxyService::<Random>::from_basic_conf, null_selector),
            SelectionKind::Fnv => (RiverProxyService::<FVNHash>::from_basic_conf, null_selector),
            SelectionKind::Ketama => (
                RiverProxyService::<KetamaHashing>::from_basic_conf,
                null_selector,
            ),
        };
    service_maker(conf, server, selector)
}
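For readers who want to try the pattern in isolation, here is a stripped-down, self-contained version (all names here are illustrative, not river's actual types): the generic parameter is erased by coercing each monomorphized constructor to a common `fn() -> Box<dyn Trait>` signature.

```rust
use std::marker::PhantomData;

// A minimal object-safe service trait standing in for pingora's `Service`.
trait Service {
    fn name(&self) -> &'static str;
}

// Marker types standing in for the selection algorithms.
trait Algo {
    const NAME: &'static str;
}
struct RoundRobin;
struct Random;
impl Algo for RoundRobin { const NAME: &'static str = "round-robin"; }
impl Algo for Random { const NAME: &'static str = "random"; }

// The generic service, parameterized by selection algorithm.
struct Svc<A: Algo>(PhantomData<A>);

impl<A: Algo> Service for Svc<A> {
    fn name(&self) -> &'static str { A::NAME }
}

impl<A: Algo + 'static> Svc<A> {
    // Each monomorphization shares the signature `fn() -> Box<dyn Service>`.
    fn make() -> Box<dyn Service> {
        Box::new(Svc::<A>(PhantomData))
    }
}

// Pick the correctly monomorphized constructor based on (config) input.
fn make_service(kind: &str) -> Box<dyn Service> {
    let maker: fn() -> Box<dyn Service> = match kind {
        "random" => Svc::<Random>::make,
        _ => Svc::<RoundRobin>::make,
    };
    maker()
}
```

The key move is that the `match` selects a plain function pointer, so the generic parameter never has to appear in the caller's signature.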

@mcpherrinm

Neat, thanks!

@jamesmunns (Collaborator Author)

Short progress update here:

@jamesmunns (Collaborator Author)

I've gone ahead and merged #46, which corrected the loose ends in #45.

There are still quite a few configurables that need to be "wired up" for health checking and load balancing. I may go ahead and focus on static page hosting to be able to set up a "fleet" of river instances for easier testing.
