How to improve gateway performance? #301

maoyunfei · 2018-04-25T02:56:44Z

I have read the performance comparison from spring-cloud-gateway-bench.

Proxy	Avg Latency	Avg Req/Sec/Thread
gateway	6.61ms	3.24k
linkerd	7.62ms	2.82k
zuul	12.56ms	2.09k
none	2.09ms	11.77k

According to the comparison, although gateway performs best compared with linkerd and zuul, its throughput reaches only about 1/4 of the no-proxy baseline.
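The "about 1/4" estimate can be checked directly from the table. The following is a minimal Python sketch, not part of the benchmark project: the dictionary layout and the function name `relative_to_baseline` are illustrative assumptions, and only the numbers come from the table above.

```python
# Figures copied from the spring-cloud-gateway-bench table quoted above.
# Latency is in milliseconds; throughput is in thousands of requests/sec/thread.
BENCH = {
    "gateway": {"latency_ms": 6.61, "req_k": 3.24},
    "linkerd": {"latency_ms": 7.62, "req_k": 2.82},
    "zuul": {"latency_ms": 12.56, "req_k": 2.09},
    "none": {"latency_ms": 2.09, "req_k": 11.77},
}


def relative_to_baseline(bench, baseline="none"):
    """Return throughput ratio and added latency of each proxy versus the baseline.

    Hypothetical helper written for this illustration only.
    """
    base = bench[baseline]
    return {
        name: {
            "throughput_ratio": row["req_k"] / base["req_k"],
            "added_latency_ms": row["latency_ms"] - base["latency_ms"],
        }
        for name, row in bench.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, stats in relative_to_baseline(BENCH).items():
        print(
            f"{name}: {stats['throughput_ratio']:.1%} of baseline throughput, "
            f"+{stats['added_latency_ms']:.2f} ms average latency"
        )
```

With these figures the gateway comes out at a little over a quarter of the no-proxy throughput and adds roughly 4.5 ms of average latency, which matches the estimate above.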

I ran some tests with wrk under different conditions, such as different response times and response body sizes. It seems that gateway's performance is influenced only by response body size: once the body is larger than about 10 KB, performance drops rapidly.

So how can the gateway be optimized to do better?

re6exp · 2018-04-25T08:20:22Z

Use third-party solutions. For example, https://varnish-cache.org.

maoyunfei · 2018-04-27T08:35:39Z

@re6exp Thank you for your comment. But in my business case, I can't use a cache.

lhotari · 2018-05-01T16:45:14Z

@maoyunfei Do you use https in your use case?
For TLS/https connections, using netty-tcnative improves performance (lower latency & higher throughput). See reactor/reactor-netty#344 for details.

maoyunfei · 2018-05-02T01:56:07Z

@lhotari
I just used http.

blockmar · 2018-06-19T09:11:19Z

Documentation request.

Is it even possible to use netty-tcnative with the Gateway starter in some way? It is not documented anywhere in the docs and returns zero usable results on Google. I think this needs to be added to the documentation.

"Just use http" - is not a way forward for us and right now the performance of non-native TLS is holding us back from a production deploy.

Maybe it already works as described in reactor/reactor-netty#344, but I see no logs either confirming or denying that.

I ran basic benchmarks (using the very low-tech ab) and saw no difference from just including the netty-tcnative uber jar.

spencergibb · 2018-07-05T17:20:50Z

@maoyunfei any test run on a single machine will have contention problems. Can you provide a complete, minimal, verifiable sample that reproduces the slowdowns with increased response size? It should be available as a GitHub (or similar) project or attached to this issue as a zip file.

maoyunfei · 2018-07-10T09:22:24Z

@spencergibb I created a demo project on GitHub that reproduces the issue; please look at gateway-performance-test!

thirunar · 2018-09-24T12:37:45Z

Is this still valid? We are evaluating zuul and gateway. Our case is similar: the response size will be around 1-2 MB.

maoyunfei · 2018-09-25T03:29:04Z

@thirunar It's still valid. By the way, we turned to zuul2 finally.

dalegaspi · 2018-10-24T18:47:30Z

i would take this benchmark exercise with a huge grain of salt. the origin, test harness and proxy are all on the same box? that's nowhere close to how you're going to deploy it in prod. why would you load test in such a manner? the test harness should be on one box, the proxy(ies) on another, and the origin on yet another box.

i'm using spring boot 2.x with zuul with dynamic content and without any caching. the responses are 32k on average. the only difference is that i am using the undertow container with okhttp, and we have a custom pre filter that validates JWT from Redis, so there is overhead. even with that, i can get close to 70% of the TPS of going directly to origin, using wrk -t 10 -c 200 -d 30s similar to what is being done in the benchmark github project. i also ran tests with Apache Bench and JMeter, with comparable results.

i dunno guys. Netflix was using Zuul 1 for their services at one point and they're dealing with video. they even admit that Zuul 2 has a 25% net effect on throughput. not sure how one can claim that it's not good enough for whatever it's going to be used for.

spencergibb · 2018-10-24T20:06:42Z

I'm going to close this.

First of all, I agree about the problem with benchmarking on one machine.

Second, running the benchmark with the latest releases (Finchley and Greenwich) does not yield the large drops in performance mentioned in the sample (which was running an RC).

I also threw in zuul1 on port 8083.

# direct to app, 30000 char responses
$ wrk -t16 -c200 -d30s "http://localhost:8081/demo?delay=50&length=large"      
Running 30s test @ http://localhost:8081/demo?delay=50&length=large
  16 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    61.43ms   11.91ms 135.87ms   82.32%
    Req/Sec   195.54     36.09   274.00     61.71%
  93722 requests in 30.08s, 2.63GB read
Requests/sec:   3115.74
Transfer/sec:     89.49MB

# gateway, 30000 char responses
$ wrk -t16 -c200 -d30s "http://localhost:8082/proxy/demo?delay=50&length=large"
Running 30s test @ http://localhost:8082/proxy/demo?delay=50&length=large
  16 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    72.03ms   21.23ms 189.44ms   82.60%
    Req/Sec   167.31     42.80   242.00     60.98%
  80076 requests in 30.07s, 2.25GB read
Requests/sec:   2663.33
Transfer/sec:     76.50MB

# zuul, 30000 char responses
$ wrk -t16 -c200 -d30s "http://localhost:8083/proxy/demo?delay=50&length=large"
Running 30s test @ http://localhost:8083/proxy/demo?delay=50&length=large
  16 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   505.51ms  415.11ms   2.00s    70.26%
    Req/Sec    24.78     13.43    90.00     52.88%
  11310 requests in 30.09s, 325.30MB read
  Socket errors: connect 0, read 0, write 0, timeout 205
Requests/sec:    375.88
Transfer/sec:     10.81MB

# direct to app, 50000 char responses
$ wrk -t16 -c200 -d30s "http://localhost:8081/demo?delay=50&length=xlarge"     
Running 30s test @ http://localhost:8081/demo?delay=50&length=xlarge
  16 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    64.59ms   16.27ms 172.98ms   82.39%
    Req/Sec   185.65     42.94   265.00     53.17%
  89115 requests in 30.10s, 4.16GB read
Requests/sec:   2960.95
Transfer/sec:    141.52MB

# gateway, 50000 char responses
$ wrk -t16 -c200 -d30s "http://localhost:8082/proxy/demo?delay=50&length=xlarge"
Running 30s test @ http://localhost:8082/proxy/demo?delay=50&length=xlarge
  16 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    72.50ms   21.83ms 236.20ms   84.15%
    Req/Sec   166.38     43.79   242.00     62.42%
  79536 requests in 30.09s, 3.71GB read
Requests/sec:   2643.68
Transfer/sec:    126.36MB

# zuul, 50000 char responses
$ wrk -t16 -c200 -d30s "http://localhost:8083/proxy/demo?delay=50&length=xlarge"
Running 30s test @ http://localhost:8083/proxy/demo?delay=50&length=xlarge
  16 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   504.15ms  408.22ms   2.00s    70.64%
    Req/Sec    24.71     13.14    90.00     54.27%
  11434 requests in 30.10s, 547.20MB read
  Socket errors: connect 0, read 0, write 0, timeout 191
Requests/sec:    379.89
Transfer/sec:     18.18MB
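
To put the six runs side by side, each proxy's Requests/sec can be expressed as a fraction of the direct run for the same response size. This is a minimal Python sketch under stated assumptions: the helper names `requests_per_sec` and `relative_throughput` are hypothetical, the regular expression only targets the `Requests/sec:` summary line shown in the output above, and the numbers are copied from those runs.

```python
import re

# Requests/sec figures copied from the wrk runs above.
WRK_REQ_PER_SEC = {
    "large (30000 chars)": {"direct": 3115.74, "gateway": 2663.33, "zuul": 375.88},
    "xlarge (50000 chars)": {"direct": 2960.95, "gateway": 2643.68, "zuul": 379.89},
}


def requests_per_sec(wrk_output):
    """Extract the Requests/sec value from the text of a wrk summary."""
    match = re.search(r"Requests/sec:\s*([0-9.]+)", wrk_output)
    if match is None:
        raise ValueError("no Requests/sec line found")
    return float(match.group(1))


def relative_throughput(results):
    """Express each proxy's throughput as a fraction of the direct run."""
    return {
        scenario: {
            name: rps / runs["direct"]
            for name, rps in runs.items()
            if name != "direct"
        }
        for scenario, runs in results.items()
    }


if __name__ == "__main__":
    for scenario, ratios in relative_throughput(WRK_REQ_PER_SEC).items():
        for name, ratio in ratios.items():
            print(f"{scenario}: {name} reaches {ratio:.1%} of direct throughput")
```

On these single-machine numbers the gateway stays at roughly 85% to 89% of the direct throughput for both response sizes, while the default zuul app lands at roughly 12% to 13%.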

dalegaspi · 2018-10-24T20:14:38Z

testing localhost aside...

i can't comment on the cloud gateway but there is something seriously wrong with your zuul1 setup. 80% drop? ok...

spencergibb · 2018-10-24T20:19:24Z

There was no setup, a default zuul app from start.spring.io using tomcat with a single route.

dalegaspi · 2018-10-24T21:06:22Z

look, i'm not here to knock spring-cloud-gateway. i have no sound opinion of it because i've never used it. i'm sure it's great.

i find it suspicious, however, when someone tries to knock zuul 1 and i came to this thread to point out that this is not my experience at all. i'm compelled to write my findings here so people considering spring cloud 2.x + zuul wouldn't just rule it out completely upon reading this thread.

again, zuul 1 is from Netflix and they have used it in production; i highly doubt they would have even promoted it, let alone open sourced it, if there were an 80% drop in throughput when using it. there are a number of online blog posts and articles pitting zuul 1 against nginx and it holds its own, if not performs better in some scenarios. that's all i'm saying.

spencergibb · 2018-10-24T21:09:15Z

I don't disagree. We went with zuul 1 for that reason. The vanilla experience is NOT optimized.

VinodKandula · 2018-11-29T18:33:28Z

@spencergibb @dalegaspi @maoyunfei We had similar performance issues with Spring Cloud Gateway(Finchley.SR1), please find comparison metrics below.

Metric	Spring Boot 1.5.4 + Zuul Gateway	Spring Boot 2.0.4 + Spring Cloud Gateway (Finchley.SR1)
Throughput (Req/Sec)	460	152
Average Response Time (ms)	107	323

Test Server Configuration: M4.xLarge AWS Instance — 4 Core CPU, 16GB of Memory

It is unclear to us whether Spring Cloud Gateway can be used in prod; please share your comments.

spencergibb · 2018-11-29T18:45:49Z

@VinodKandula can you share more than just metrics? What do the individual apps look like, how were they configured, how did you test them?

VinodKandula · 2018-12-01T18:13:59Z

@spencergibb
The overall system looks like the following

Config Server
Discovery Server (Eureka)
Zuul/Spring Gateway Server
Spring Data (JPA) Rest Repositories(CRUD) Service

All REST endpoints are called through Zuul/Gateway, which uses the discovery server to get the list of available instances.
JMeter is used for the performance tests. The difference in performance results between Zuul and Spring Cloud Gateway is straightforward to see.

kimmking · 2018-12-26T13:13:56Z

@spencergibb Hi Spencer, can you rerun your wrk test command with the --latency option? I find that the 99th-percentile latency is 2-3 times that of direct access with -c200.

atverma91 · 2019-01-30T11:15:14Z

The performance of both spring cloud gateway and zuul 1 is very low...
How can we increase spring cloud gateway performance?

dalegaspi · 2019-01-30T14:08:28Z

we are using Spring Boot 2.x and Zuul 1.x and we found the performance really good. however, the default settings are just severely under-optimized. after considerable experimentation and research, i came up with these settings.

our use case has the following:

3 custom pre-filters: one retrieves the JWT session from Redis and validates it, one adds GET parameters and headers, and another asynchronously sends messages via kafka (the 3rd filter is only applied to about 50% of traffic)
we are using sleuth (zipkin) that's configured to record 10% of traffic
the app is running in docker containers (ECS) fronted by an ALB
our (response) payloads are about ~30K on average

with the optimized settings and with those 3 filters and sleuth enabled running in ECS with ALB, there is less than 10% drop in throughput on several load tests (only 1 instance of app running in ECS on load tests)...honestly, this shouldn't really come as a surprise since Zuul 1.x is battle-tested and performs really well if configured correctly.

atverma91 · 2019-02-01T04:27:17Z

Thanks, dalegaspi. As you mentioned, you load tested zuul 1.x with spring boot 2. Can you explain
up to what TPS you load tested, and out of how many requests the 10% drop was measured?

dalegaspi · 2019-02-01T20:38:47Z

@atverma91 actually the recent tests show there is no perceptible drop in throughput; if we turn on compression in Zuul we even perform better. These are the results of our latest test with compression enabled in Zuul.

type	ave latency	throughput	ave bytes	error
direct	610.11 ms	608.6/sec	28637.5	0.01%
zuul 1.x + spring boot 2.x	461.09 ms	746.2/sec	4225.5	0.03%

this is with JMeter, 50 threads and 15 minutes continuous run.

it's key when you perform load tests that the JVM is warmed up and that the client is not on the same box as the service. clients are not going to be running on the same box as your service, so it baffles me why some load tests insist on having the service and the benchmarking app on one box.
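
When reading the JMeter table above, note that the two rows do not move the same number of bytes per response, since compression is enabled on the zuul run. The following minimal Python sketch makes that explicit. The function name `compare` and the dictionary layout are hypothetical, and treating "ave bytes" as the average response size on the wire is an assumption; only the numbers come from the table.

```python
# Figures copied from the JMeter table above.
RUNS = {
    "direct": {"latency_ms": 610.11, "req_per_sec": 608.6, "avg_bytes": 28637.5},
    "zuul": {"latency_ms": 461.09, "req_per_sec": 746.2, "avg_bytes": 4225.5},
}


def compare(runs, baseline="direct", candidate="zuul"):
    """Compare request rate, payload size and byte rate of two runs.

    Hypothetical helper written for this illustration only.
    """
    base, cand = runs[baseline], runs[candidate]
    base_bytes_per_sec = base["req_per_sec"] * base["avg_bytes"]
    cand_bytes_per_sec = cand["req_per_sec"] * cand["avg_bytes"]
    return {
        "throughput_ratio": cand["req_per_sec"] / base["req_per_sec"],
        "payload_ratio": cand["avg_bytes"] / base["avg_bytes"],
        "bytes_per_sec_ratio": cand_bytes_per_sec / base_bytes_per_sec,
    }


if __name__ == "__main__":
    for metric, value in compare(RUNS).items():
        print(f"{metric}: {value:.3f}")
```

On these figures the zuul run serves about 23% more requests per second while each response is roughly 15% of the direct payload size, so part of the gain may come from the smaller payload rather than from the proxy itself.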

spencergibb · 2019-02-01T20:57:12Z

This isn't the place to discuss zuul performance without gateway. Please take the conversation offline.

maoyunfei closed this as completed Apr 25, 2018

maoyunfei reopened this Apr 27, 2018

blockmar mentioned this issue Jun 19, 2018

Document support (or lack of) for netty-tcnative #373

Open

spencergibb added the waiting for feedback label Jul 5, 2018

ryanjbaxter mentioned this issue Aug 2, 2018

Comparing API Gateway Performances: NGINX vs. ZUUL vs. Spring Cloud Gateway vs. Linkerd #466

Closed

spencergibb added feedback-provided and removed waiting for feedback labels Oct 1, 2018

spencergibb closed this as completed Oct 24, 2018

spencergibb removed the feedback-provided label Oct 24, 2018

yhilem mentioned this issue Nov 26, 2018

Verify large file upload #21

Closed

VinodKandula mentioned this issue Dec 3, 2018

Throughput problems when compared with Netflix Zuul and Nginx #124

Closed

kimmking mentioned this issue Feb 27, 2019

About performance of p99 latency in 200 concurrencies. #859

Closed

aCoder2013 mentioned this issue May 8, 2019

Technology selection for an asynchronous API gateway redesign aCoder2013/blog#34

Open

spring-cloud locked as off-topic and limited conversation to collaborators Aug 18, 2019
