Feature Request: Load balancing - Proxy - Cluster #775

Open
dtruffaut opened this Issue Feb 11, 2016 · 10 comments

@dtruffaut

dtruffaut commented Feb 11, 2016

h2o is an amazing tool!

Today it can challenge Varnish with the cache-aware server push feature (http2-casper), which moves cache logic and memory consumption to the client. That is a great improvement.

Tomorrow, with the ability to proxy requests to a cluster of backends, it could even challenge load balancers like HAProxy or Nginx.

Here is a proposal for a very basic and naive cluster declaration:

cluster: &back_cluster
  - http://127.0.0.1:8080
  - http://127.0.0.1:8081
  - http://127.0.0.1:8082

hosts:
  "example.com":
    listen:      
        ...
    paths:
      "/":
        proxy.reverse.url: *back_cluster

Notes:

  • Cluster IPs can be internal or external
  • Clusters can be defined at multiple levels (global, host, ...)
  • Multiple clusters can be defined (one for the backend, one for the back office, one for files, etc.)
  • Clusters can be used with the reproxy directive

Bonus: we can even listen on a cluster:

cluster: &in_cluster
  - http://92.250.218.59:443
  - http://92.250.218.60:443
  - http://92.250.218.61:443

hosts:
  "example.com":
    listen: *in_cluster     
    ssl:
      certificate-file: /etc/letsencrypt/live/example.com/fullchain.pem
      key-file: /etc/letsencrypt/live/example.com/privkey.pem
      ...

...but that requires separating listen: from ssl:.

Here is an alternative, more extensible but also more verbose syntax:

cluster: &back_cluster
  another_option: ...
  nodes:
    - node:
        scheme: "http"
        host: 127.0.0.1
        port: 8080
        weight: 1
    - node:
        scheme: "http"
        host: 127.0.0.1
        port: 8081
        weight: 2
    - node:
        scheme: "http"
        host: 127.0.0.1
        port: 8082
        weight: 1

hosts:
  "example.com":
    listen:      
        ...
    paths:
      "/":
        proxy.reverse.url: *back_cluster

Note: I wrote "node", but it could be "server", "entry", or whatever is more semantic.

Related issues:
#90
#764
#577

See also: Nginx upstream documentation:
http://nginx.org/en/docs/http/ngx_http_upstream_module.html
http://nginx.org/en/docs/http/load_balancing.html
https://www.nginx.com/resources/admin-guide/load-balancer/

@kazuho
Member

kazuho commented Feb 15, 2016

Thank you for the suggestion.

FWIW, you can configure your DNS server to return multiple IP addresses (in your case 92.250.218.59, 92.250.218.60, 92.250.218.61) in some fashion (e.g. round-robin) to balance the load between the upstream servers. H2O resolves the address of the upstream server every time it needs to proxy a request, as briefly described in https://h2o.examp1e.net/configure/proxy_directives.html#proxy.reverse.url.

But I agree that we can improve this a lot by directly supporting features for load balancing within H2O.
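
A minimal sketch of this DNS-based approach (not from the comment; "backend.internal" is a hypothetical hostname that your DNS server would resolve, round-robin, to the upstream addresses):

hosts:
  "example.com":
    listen:
      port: 443
      ssl:
        certificate-file: /etc/letsencrypt/live/example.com/fullchain.pem
        key-file: /etc/letsencrypt/live/example.com/privkey.pem
    paths:
      "/":
        # H2O re-resolves this hostname for each proxied request,
        # so round-robin DNS spreads the load across the upstreams.
        proxy.reverse.url: "http://backend.internal:8080/"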

@iceb0y
Contributor

iceb0y commented May 13, 2016

It would be nice to support load balancing over Unix sockets.

@kazuho
Member

kazuho commented May 13, 2016

@iceboy-sjtu Can you please explain a bit more why you need it?

Do you have the same application listening on different Unix sockets?

@iceb0y
Contributor

iceb0y commented May 16, 2016

We're building a website where frontend workers listen on Unix sockets.

We plan to use systemd to manage the workers and are looking for a reverse proxy to load balance between them.

@kazuho
Member

kazuho commented May 16, 2016

@iceboy-sjtu

We're building a website where frontend workers listen on Unix sockets.
We plan to use systemd to manage the workers and are looking for a reverse proxy to load balance between them.

Thank you for the clarification.

My understanding is that the general answer for load balancing a set of server processes on the same machine is to create a single daemon (governed by systemd etc.) that binds to a Unix socket file and then forks multiple worker processes listening on the same Unix socket. That way, idle worker processes can pick up incoming connections.

I'd suggest following that approach, since it would improve the responsiveness of your web site compared to the current one.
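
A minimal sketch of that pattern (not from the thread; the socket path and worker count are placeholders): a parent process binds the Unix socket, then forks workers that all accept() on it, so whichever worker is idle picks up the next connection.

import os
import socket

SOCKET_PATH = "/run/myapp/myapp.sock"  # placeholder path
NUM_WORKERS = 4                        # placeholder worker count

def serve(listener):
    # Each worker blocks in accept(); an idle worker picks up the next
    # incoming connection on the shared socket.
    while True:
        conn, _ = listener.accept()
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        conn.close()

def main():
    if os.path.exists(SOCKET_PATH):
        os.unlink(SOCKET_PATH)  # remove a stale socket file from a previous run
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(SOCKET_PATH)
    listener.listen(128)

    for _ in range(NUM_WORKERS):
        if os.fork() == 0:  # child: become a worker sharing the listening socket
            serve(listener)
            os._exit(0)

    # The parent (the single daemon that systemd would supervise) just waits.
    for _ in range(NUM_WORKERS):
        os.wait()

if __name__ == "__main__":
    main()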

@iceb0y
Contributor

iceb0y commented May 18, 2016

@kazuho Thank you very much. I've tried your approach. It works very well.

@gaoyichuan

gaoyichuan commented Feb 7, 2017

+1 for this.
We could also implement more load balancing methods than DNS round-robin.
This article is well worth a read.

@fmunteanu

fmunteanu commented May 10, 2017

Hi @kazuho,

This is a roadblock for me in using H2O in production instead of Nginx. I want to take advantage of the HTTP/2 server push feature H2O offers. Right now, I have this setup in Nginx that I need to reproduce in H2O:

http {
    ...
    upstream data {
        server          127.0.0.1:8000;
        server          127.0.0.1:8001;
        server          127.0.0.1:8002;
        server          127.0.0.1:8003;
        keepalive       100;
    }
    ...
    server {
        ...
        location / {
            try_files   $uri @data;
        }
        ...
        location @data {
            proxy_cache         web;
            proxy_cache_key     $request_method$format$request_uri;
            proxy_cache_valid   200 1h;
            proxy_cache_valid   any 1m;
            proxy_pass          http://data;
            internal;
        }
        ...
    }
}

What solution do you recommend?

@jackjacek

jackjacek commented Mar 27, 2018

I like the solution with

&in_cluster/*in_cluster

Now imagine if it could also check whether each proxy.reverse.url host is alive, and/or retry in round-robin fashion given a 5-second timeout, or, in my dreams, even send the same request to two distinct hosts and respond with whichever is quickest ;)
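
A hypothetical extension of the proposed cluster syntax illustrating this idea (none of these keys exist in H2O today; the names are made up for the sketch):

cluster: &back_cluster
  health-check:        # hypothetical: probe each node periodically
    interval: 5
    timeout: 5         # hypothetical: give up on a node after 5 seconds and retry the next one
  nodes:
    - node:
        scheme: "http"
        host: 127.0.0.1
        port: 8080
        weight: 1
    - node:
        scheme: "http"
        host: 127.0.0.1
        port: 8081
        weight: 1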

@zenny

zenny commented May 15, 2018

+1 for @dtruffaut's original topic, because h2o's performance appears to outpace other proxy applications, as is evident from the benchmarks here: https://www.techempower.com/benchmarks/#section=data-r15&hw=ph&test=fortune

@kazuho san: no matter how fast h2o is as a server, if one has to deploy a slower frontend load balancer, the entire purpose of h2o as a backend server gets defeated, right?! Thus, this feature is very desirable for production deployment.
