Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of HashedWheelTimer in Netty 4.0.19 was way much better than 4.0.20 #2571

Closed
lokeshhctm opened this issue Jun 15, 2014 · 60 comments
Assignees
Labels
Milestone

Comments

@lokeshhctm
Copy link

We are not using SSL but use several HTTP calls for every incoming requests to out server. Now i see that after i switch to latest Netty 4.0.20, seeing 10% high latency than 4.0.19 which is deployed on other servers. and hence 4.0.20 is not able to support that much QPS. i end up getting 10% less QPS support in 4.0.20 in comparison to 4.0.19.

@lokeshhctm lokeshhctm changed the title Performanceof earlier build with Netty 4.0.19 was way much better than the client with 4.0.20 Performance of Netty 4.0.19 was way much better than 4.0.20 Jun 16, 2014
@normanmaurer
Copy link
Member

@lokeshhctm we will need more details to help here. In all our benchmarks 4.0.20.Final out-performs 4.0.19.Final.

@lokeshhctm
Copy link
Author

How can i get the 4.0.x JAR up-to different commits between 4.0.19 and 4.0.20? I am not able to checkout commit number mentioned in 4.0 branch.Please let me know the commands to checkout 4.0.x commits.
Once i get those, will deploy on one of the production boxes and see what commit is creating problem.
The problem is visible as soon as we deploy.

@normanmaurer
Copy link
Member

@lokeshhctm you will need to checkout the 4.0 branch and then reset to the specific commit.

@lokeshhctm
Copy link
Author

I tried that but not helping. I appreciate if you type commands please.

@slandelle
Copy link
Contributor

Check the commit history on the 4.0 branch: https://github.com/netty/netty/commits/4.0

Then git checkout + commit hash (grab commit hashes directly from Github interface: click on "Copy SHA" button over each commit) and build. Repeat to go back in time.

@lokeshhctm
Copy link
Author

I may be doing wrong something:

git clone -b 4.0 https://github.com/netty/netty.git
git checkout 1709113

Gettting an error "fatal: reference is not a tree: 1709113"

@normanmaurer
Copy link
Member

Use git reset —hard $sha
-- 
Norman Maurer

Am 17. Juni 2014 bei 17:41:52, lokeshhctm (notifications@github.com) schrieb:

I may be doing wrong somehthing:

git clone -b 4.0 https://github.com/netty/netty.git
git checkout 1709113

Gettting an error "fatal: reference is not a tree: 1709113"


Reply to this email directly or view it on GitHub.

@slandelle
Copy link
Contributor

Should be working though (it does for me)?! Which version of git do you use?

@trustin
Copy link
Member

trustin commented Jun 17, 2014

git bisect might be useful in this situation although in our case the range isn't very wide.

@lokeshhctm
Copy link
Author

I used git bisect and found the exact commit which has issue and we ran into performance issues:

Commit ID : 6248db5
Logs: "Minimize memory footprint of HashedWheelTimer and context-switching"

@normanmaurer
Copy link
Member

@lokeshhctm awesome! Could you give me some more details how the HashedWheelTimer is used here ? /cc @slandelle

@normanmaurer normanmaurer added this to the 4.0.21.Final milestone Jun 18, 2014
@lokeshhctm
Copy link
Author

To be very honest, i am not aware how it is being used. We are using Async HTTP Client to manage the connection pools and use the same to get back the response from Netty Provider (Netty NIO 4.0.x). What functionality internally uses this, don't know.

@normanmaurer
Copy link
Member

Ok I will figure it out with @slandelle then :) Stay tuned.

@slandelle
Copy link
Contributor

@lokeshhctm:

  • Did you override AHC default config?
  • How many AHC instances do you create?
  • How many requests/s do you want to send? You mentioned 12.000 rps, which is simply huge, is this from one single instance?

@normanmaurer Here's what we do (defaults):

  • create one HWT instance with default constructor
  • depending on config, for every request sent, we might create:
    • 1 timeout for request timeout (default is 60s) that gets cancelled if the response complete
    • 1 timeout for request idle timeout (disable by default) that gets cancelled every time we receive a chunk (then a new one is created) or the response completes or times out.

@lokeshhctm
Copy link
Author

  1. Yes, we override AHC default settings:
    AsyncHttpClientConfig config = new AsyncHttpClientConfig.Builder()
    .setIOThreadMultiplier(3)
    .setMaxConnectionsTotal(defaultMaxConnections)
    .setMaxConnectionsPerHost(totalConnections)
    .setAllowPoolingConnection(true)
    .setConnectionTimeoutInMs(defaultConnectionTimeout)
    .setRequestTimeoutInMs(defaultSocketTimeout)
    .setFollowRedirects(true)
    .build();
    We create one instance of AHC.
    Currently we send 12K but we tested with 4.0.19 and we can send even more and we would like to get that benefit. because our system will start sending more in future.

@slandelle
Copy link
Contributor

.setMaxConnectionsTotal(defaultMaxConnections)
.setMaxConnectionsPerHost(totalConnections)
.setConnectionTimeoutInMs(defaultConnectionTimeout)
.setRequestTimeoutInMs(defaultSocketTimeout)

Values?

@lokeshhctm
Copy link
Author

.setMaxConnectionsTotal(64000)
.setMaxConnectionsPerHost(1000)
.setConnectionTimeoutInMs(125)
.setRequestTimeoutInMs(125)

@slandelle
Copy link
Contributor

OK, so if I get it right, you're creating 12K timeouts/s. That's just huge.
@normanmaurer Will you be able to test something like this?

@normanmaurer
Copy link
Member

@slandelle I will try to write a micro benchmark and test 4.0.19.Final + later

@slandelle
Copy link
Contributor

@normanmaurer Let me know if I can help.

@lokeshhctm
Copy link
Author

@normanmaurer are you able to reproduce the issue?

@normanmaurer
Copy link
Member

@lokeshhctm @slandelle so far I was not able to reproduce it with a micro benchmark :( If there would be any way to you could provide me with a reproducer / gc logs / profiler snapshot etc please do ....

@normanmaurer
Copy link
Member

Even worse... current 4.0 branch out-performs the old HashedWheelTimer all time.

@normanmaurer
Copy link
Member

@slandelle could you somehow help me to pin this down ?

@slandelle
Copy link
Contributor

@normanmaurer I don't have more information than you have :(
@lokeshhctm Do you have any errors? Do you having timing out requests? How does the heap behave (like more GC, Old filled quicker)? Could you provide a heap dump?

@slandelle
Copy link
Contributor

@lokeshhctm Do you observe some request timeouts and then, do you trigger so special time-expensive operations in AHC Handler's onThrowable?

@normanmaurer
Copy link
Member

@lokeshhctm another thing that may give me some clue would be a GC log of running with 4.0.19.Final and the hwt_test branch.

@normanmaurer normanmaurer self-assigned this Jun 23, 2014
@lokeshhctm
Copy link
Author

i have GC logs for 4.0.19 and current. Let me know how will send you. Git is not allowing to attach any file other than image.

@normanmaurer
Copy link
Member

@lokeshhctm norman.maurer at googlemail dot com

@lokeshhctm
Copy link
Author

i just sent you GC logs on the email

@slandelle
Copy link
Contributor

4.0.19 : 900
4.0.20 : 750
4.0.21 (current) : 870

OK, so when you were talking about 12K QPS, that was for a whole cluster, not a single node, right?

@lokeshhctm
Copy link
Author

I am really sorry about this confusion. Actually QPS i mentioned above is the incoming QPS per box. With each incoming we send out several HTTP requests which is almost 10 per incoming request.
So the final QPS using netty build is:

4.0.19 : 9000
4.0.20 : 7500
4.0.21 (current) : 8700

@normanmaurer
Copy link
Member

@lokeshhctm seems like current does more GC pauses then 4.0.19.Final... This is quite interesting. Is it reproducible all the the time ?

@lokeshhctm
Copy link
Author

yes. the problem is reproducible all the time.

@normanmaurer
Copy link
Member

@lokeshhctm ok thanks... This may help me :) We will see

@normanmaurer
Copy link
Member

@lokeshhctm actually I missed that both logs cover different running times. Could you provide me the GC logs for both version while running them for the same time (like 5 mins).

@normanmaurer
Copy link
Member

How many OPS did you get with 4.0.20.Final?

Am 23.06.2014 um 20:25 schrieb lokeshhctm notifications@github.com:

performance less is like 900 QPS vs 870 QPS. but i will that ,may be because of timeout issue i see in latest build. I have production build and host we try but can't share because of legals.


Reply to this email directly or view it on GitHub.

@lokeshhctm
Copy link
Author

I just sent you the GC logs for 5 min interval for the same time. Currently on production i see:

Box with 4.0.19 - 8200 outgoing
Box with current build - 7800 outgoing

@normanmaurer
Copy link
Member

@lokeshhctm not much I can see from the logs... I guess I need a reproducer or maybe profiler snapshots. Sorry :(

@normanmaurer normanmaurer changed the title Performance of Netty 4.0.19 was way much better than 4.0.20 Performance of HashedWheelTimer in Netty 4.0.19 was way much better than 4.0.20 Jun 24, 2014
@lokeshhctm
Copy link
Author

i will try to write some reproducer and will update you

@normanmaurer
Copy link
Member

Thanks!

Am 24.06.2014 um 17:34 schrieb lokeshhctm notifications@github.com:

i will try to write some reproducer and will update you


Reply to this email directly or view it on GitHub.

@normanmaurer
Copy link
Member

@lokeshhctm any updates ?

@lokeshhctm
Copy link
Author

@normanmaurer i sent you an email. let me know if it is fine to talk

@lokeshhctm
Copy link
Author

@normanmaurer i tried after the fix you mentioned above but i still see issue, did you try with the sample i sent you on email?

@slandelle
Copy link
Contributor

@lokeshhctm Does this sample involves AHC? If so, I'd gladly have it too.

@lokeshhctm
Copy link
Author

Yeah it involves AHC. here you go.

/*

  • To change this license header, choose License Headers in Project Properties.
  • To change this template file, choose Tools | Templates
  • and open the template in the editor.
    */

import java.util.Timer;
import org.asynchttpclient.AsyncCompletionHandler;
import org.asynchttpclient.AsyncHttpClient;
import org.asynchttpclient.AsyncHttpClientConfig;
import org.asynchttpclient.AsyncHttpClientFactory;
import org.asynchttpclient.Request;
import org.asynchttpclient.RequestBuilder;
import org.asynchttpclient.Response;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
*

  • @author lokeshmehta
    */
    public class TestNetty {

    private static final AtomicBoolean keepGoing = new AtomicBoolean(true);
    private static final Timer timer = new Timer("LocalTimer");

    private static final int readTimeOut = 500;
    private static final int conTimeOut = 500;
    private static final int qPS = 800;
    private static final int testRunSeconds = 60;

    private final AsyncHttpClient httpclient;

    private static final AtomicLong numRequests = new AtomicLong();
    private static final AtomicLong totalTime = new AtomicLong();

    public TestNetty() {
    AsyncHttpClientConfig config = new AsyncHttpClientConfig.Builder()
    .setIOThreadMultiplier(3)
    .setMaxConnectionsTotal(10000)
    .setMaxConnectionsPerHost(700)
    .setAllowPoolingConnection(true)
    .setConnectionTimeoutInMs(conTimeOut)
    .setRequestTimeoutInMs(readTimeOut)
    .setFollowRedirects(true)
    .build();
    httpclient = AsyncHttpClientFactory.getAsyncHttpClient(config);
    }

    public static void main(String[] args) {
    TestNetty t = new TestNetty();
    t.process();
    }

    private void process() {
    String[] urls = {
    "http://www.google.com", //
    "http://php.net",
    "http://www.facebook.com",
    "http://hhvm.com",
    "http://www.reddit.com",
    "http://www.wikipedia.org",
    "http://example.org", //
    "http://www.iana.org",
    "http://netflix.com", //
    "http://bing.com" //
    };
    timer.schedule(new LocalTimerTask(), testRunSeconds * 1000);
    while (keepGoing.get()) {
    for (int i = 0; i < qPS; i++) {
    for (String url : urls) {
    RequestBuilder builder;
    builder = new RequestBuilder("GET").setUrl(url);
    builder.setRequestTimeoutInMs(readTimeOut);
    Request request = builder.build();
    try {
    HttpObject hso = new HttpObject();
    httpclient.executeRequest(request, new AsyncCompletionCallback(hso));
    /System.out.println("sent");/
    } catch (IllegalStateException e) {
    Logger.getLogger(this.getClass().getName()).log(Level.SEVERE, "Exception during execution of HTTP: ", e);
    } catch (Exception e) {
    Logger.getLogger(this.getClass().getName()).log(Level.SEVERE, "Exception during execution of HTTP : ", e);
    }
    }
    }
    try {
    Thread.sleep(1000);
    } catch (InterruptedException ex) {

        }
    }
    
    timer.cancel();
    
    long reqs = numRequests.get();
    long t = totalTime.get();
    if (reqs > 0 && t > 0) {
        System.out.println("Requests: " + (reqs));
        System.out.println("Average Time : " + (t / reqs));
    }
    System.out.println("SHUT DOWN!!");
    
    System.exit(1);
    

    }

    private void done(final HttpObject hso) {
    hso.endTime = System.currentTimeMillis();
    numRequests.incrementAndGet();
    totalTime.addAndGet(hso.endTime - hso.startTime);

    }

    private class AsyncCompletionCallback extends AsyncCompletionHandler {

    private final HttpObject hso;
    
    public AsyncCompletionCallback(HttpObject hso) {
        this.hso = hso;
    }
    
    @Override
    public Response onCompleted(Response response) throws Exception {
        try {
            /*System.out.println("Response Length : " + response.getResponseBody().length());*/
        } catch (Exception e) {
            e.printStackTrace();
        }
        done(hso);
        return response;
    }
    
    @Override
    public void onThrowable(Throwable ex) {
        try {
            String msg = ex.toString();
        } catch (Exception e) {
            ex.printStackTrace();
        }
        done(hso);
    }
    

    }

    private class LocalTimerTask extends TimerTask {

    @Override
    public void run() {
        keepGoing.set(false);
    
    }
    

    }

    private class HttpObject {

    public HttpObject() {
        startTime = System.currentTimeMillis();
    }
    long startTime;
    long endTime;
    

    }
    }

@lokeshhctm
Copy link
Author

The issue has been fixed after the commit yesterday.

@normanmaurer
Copy link
Member

Nice :)

Am 12.07.2014 um 19:17 schrieb lokeshhctm notifications@github.com:

The issue has been fixed after the commit yesterday.


Reply to this email directly or view it on GitHub.

@normanmaurer
Copy link
Member

Fixed by 08426d5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants