channel warm up per thread #1758
Comments
A few things:
|
Hi Carl. I don't think this is a benchmarking concern; I'm not worried about the absolute latency right now. What bothers me is the huge difference in latency between the first gRPC call on a thread and the following ones on the same thread. Usually the first one takes about 100 ms and the subsequent ones only take about 5 ms to do the same thing (the difference is clear even without measuring in nanoseconds). And it doesn't seem to be a JIT problem, since the test doesn't hit that at all. |
Could it be because the actual connection is only made when the first call happens?
|
@makdharma, that seems very likely. @LeeYeeze, the point is that you aren't really comparing "first RPC on a thread"; you're instead saying "gRPC is slow to start". Since each thread is started at the same time, I'd expect them to each have similar performance. In order to show it was creating a new thread that was slow, you'd need to create new threads throughout the duration of the test. |
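One way to take connection setup out of the picture is to ask the channel to connect before any RPC is issued. A minimal sketch, assuming a later grpc-java release that exposes channel connectivity state, and the same local hello-world server used elsewhere in this thread:

import io.grpc.ConnectivityState;
import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;

public class EagerConnect {
  public static void main(String[] args) throws InterruptedException {
    ManagedChannel channel = NettyChannelBuilder.forAddress("127.0.0.1", 50051)
        .usePlaintext(true).build();
    // getState(true) requests a connection instead of waiting for the first RPC.
    ConnectivityState state = channel.getState(true);
    while (state != ConnectivityState.READY) {
      // Polling keeps the sketch short; a real client would use
      // notifyWhenStateChanged(state, callback) instead.
      Thread.sleep(10);
      state = channel.getState(false);
    }
    System.out.println("channel state: " + state);
  }
}

Even with the connection established up front, the first RPC issued from each new thread can still be slower, which is the behavior investigated below.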
@makdharma @ejona86 |
@LeeYeeze, I've reproduced what you see. I'm on Linux. Swapping my CPU governor from ondemand to performance didn't change behavior much. I modified your code, hitting the hello-world server:

import io.grpc.CallOptions;
import io.grpc.ManagedChannel;
import io.grpc.examples.helloworld.GreeterGrpc;
import io.grpc.examples.helloworld.HelloReply;
import io.grpc.examples.helloworld.HelloRequest;
import io.grpc.netty.NettyChannelBuilder;
import io.grpc.stub.ClientCalls;
import java.util.Date;
import java.util.concurrent.TimeUnit;

public class ThreadWarmUp {
  private static final HelloRequest request = HelloRequest.newBuilder().setName("world").build();

  public static class QueryThread extends Thread {
    private String name;
    private ManagedChannel channel;

    public QueryThread(String name, ManagedChannel channel) {
      this.name = name;
      this.channel = channel;
    }

    public void run() {
      final int type = 0;
      for (int i = 0; i < 3; i++) {
        long start = System.nanoTime();
        try {
          if (type == 0) {
            ClientCalls.blockingUnaryCall(
                channel, GreeterGrpc.METHOD_SAY_HELLO, CallOptions.DEFAULT, request);
          } else if (type == 1) {
            ClientCalls.futureUnaryCall(
                channel.newCall(GreeterGrpc.METHOD_SAY_HELLO, CallOptions.DEFAULT), request).get();
          }
          long duration = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
          String show = "thread " + name + "\tcost " + duration + "µs";
          System.out.println(show);
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    ManagedChannel channel = NettyChannelBuilder.forAddress("127.0.0.1", 50051)
        .usePlaintext(true).build();
    // One RPC on the main thread first, so connection setup is excluded from
    // the per-thread timings below.
    GreeterGrpc.newBlockingStub(channel).sayHello(request);
    for (int i = 0; i < 100; i++) {
      QueryThread th = new QueryThread(String.valueOf(i), channel);
      th.start();
      Thread.sleep(100);
    }
  }
}

It results in things like:
|
@ejona86 |
We see a similar problem in a project using TensorFlow Serving with Java gRPC clients. Is there any way to fix that? @ejona86 |
Please let us know of any fix or workaround for this issue. @ejona86 |
I wonder if the latest grpc-java framework has fixed this problem? |
Any update on this? @ejona86 I'm also experiencing a very similar issue. When using a gRPC channel and async stub to send data, the first few RPCs take much longer than the rest. I'd like to make all of them take as little time as possible. Do you have any suggestions? I also tried to "warm up" the channel by sending a dummy message before the real work, but it doesn't seem to help. Am I doing it incorrectly? Thank you |
@zzxgzgz I do the same thing, and it works for me. Try constructing a large enough dummy message for the warmup. |
@inkinworld thank you for sharing. Let me try that. |
Any reference implementation for constructing a large dummy warmup message? |
Make a message with a |
Any reference implementation for constructing a large dummy warmup message? |
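Not an official recommendation, but a minimal sketch of the kind of oversized dummy request discussed above, reusing the helloworld proto from the reproduction code earlier in this thread (the 64 KiB padding size is an arbitrary assumption):

import io.grpc.ManagedChannel;
import io.grpc.examples.helloworld.GreeterGrpc;
import io.grpc.examples.helloworld.HelloRequest;
import io.grpc.netty.NettyChannelBuilder;
import java.util.Arrays;

public class LargeWarmup {
  public static void main(String[] args) {
    ManagedChannel channel = NettyChannelBuilder.forAddress("127.0.0.1", 50051)
        .usePlaintext(true).build();
    // Build a dummy request roughly the size of real traffic so the warmup
    // call exercises similar buffer allocations on both client and server.
    char[] padding = new char[64 * 1024];
    Arrays.fill(padding, 'x');
    HelloRequest warmup = HelloRequest.newBuilder().setName(new String(padding)).build();
    // Send it once before latency-sensitive traffic starts.
    GreeterGrpc.newBlockingStub(channel).sayHello(warmup);
  }
}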
Does anyone have any updates regarding this? My team still faces this issue and we haven't found any workaround... |
Hey @ejona86, you might be able to shed some light on what we are observing (or say whether this is expected or not). We "pre-warm" all channels by sending a configurable number (10 in this case) of Health Checking RPCs, where the timings look something like this:
We assume that the channel should be sufficiently pre-warmed in this case. I even tried to abuse the service field to construct an arbitrarily large payload. But we still take a latency hit on the very first RPC using the same channel but with a different RPC method:
|
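For readers who want to reproduce this kind of pre-warming, a minimal sketch using the standard health service from grpc-services, assuming the server registers it; the empty service name and the count of 10 mirror the description above:

import io.grpc.ManagedChannel;
import io.grpc.health.v1.HealthCheckRequest;
import io.grpc.health.v1.HealthGrpc;
import io.grpc.netty.NettyChannelBuilder;

public class HealthPrewarm {
  public static void main(String[] args) {
    ManagedChannel channel = NettyChannelBuilder.forAddress("127.0.0.1", 50051)
        .usePlaintext(true).build();
    HealthGrpc.HealthBlockingStub health = HealthGrpc.newBlockingStub(channel);
    // A handful of health checks forces connection setup and the HTTP/2
    // settings exchange before real traffic arrives.
    for (int i = 0; i < 10; i++) {
      health.check(HealthCheckRequest.newBuilder().setService("").build());
    }
  }
}

As the follow-up comments point out, this warms the channel itself but not necessarily the classloading and JIT paths of a different RPC method.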
@tommyulfsparre, the first time you call a method in a JVM will be slower, because of the JIT. This issue is about the Netty ByteBuf allocation performance on a new thread. If, on the same thread, you did a similarly-sized RPC before the RPC you care about and still see a performance difference, then it seems less likely to be this issue. |
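A minimal sketch of that diagnostic, again reusing the hello-world client from earlier in the thread (the warmup payload should be sized like the real request, which is left as an assumption here):

import io.grpc.ManagedChannel;
import io.grpc.examples.helloworld.GreeterGrpc;
import io.grpc.examples.helloworld.HelloRequest;

public class SameThreadWarmup implements Runnable {
  private final ManagedChannel channel;

  public SameThreadWarmup(ManagedChannel channel) {
    this.channel = channel;
  }

  @Override
  public void run() {
    GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);
    // Similarly-sized RPC on this same thread, issued before the call we time.
    stub.sayHello(HelloRequest.newBuilder().setName("warmup").build());
    // If the timed call below is still slow, the per-thread ByteBuf allocation
    // described in this issue is unlikely to be the cause.
    long start = System.nanoTime();
    stub.sayHello(HelloRequest.newBuilder().setName("world").build());
    System.out.println("took " + (System.nanoTime() - start) / 1000 + " µs");
  }
}

new SameThreadWarmup(channel) can then be run on whatever thread pool carries the latency-sensitive work.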
Thanks @ejona86, yes, I'm well aware of the effect of the JIT 😃. We do the warmup from a single thread but using the same channel; the RPC we actually care about is not guaranteed to run on the same thread, so the ByteBuf allocation cost for new threads seems plausible. Do you think there is a way to do any form of pre-allocation here, or other ways we could achieve the same thing? We would happily trade some CPU/memory during startup if we could mitigate the initial latency hit. |
@tommyulfsparre, I still don't think you're seeing the same problem. You're seeing it take 90 ms longer. I reproduced ~600 µs in this issue. For a larger message that could be larger, but not 90 ms. Client classloading, server classloading, heap size not stabilized, and server-side caches (including auth) are much more likely for that level of delay. |
@ejona86 Agreed, although I was not as concerned about the absolute number (since this was measuring a remote RPC) as about the observation that the initial RPC, after pre-warming that should have covered JIT and other things, seems significantly slower. I did try to reproduce the observation here by running the client after server preloading. Although I can't explain why this RPC is slower than subsequent ones, the difference isn't really a concern. |
Hi, all. I'm doing an experiment with gRPC and found that the first use of gRPC on a thread costs much more time than subsequent ones doing the same thing. I tested it on a 2-core laptop and on 24-core machines, and the same phenomenon occurred on all of them. So I'm wondering if I'm using gRPC incorrectly, or whether something in grpc-java could be improved to avoid this situation. Below is my code and part of the report.