OkHttpChannelBuilder.flowControlWindow(int) isn't working #6685
OS X and Kubernetes. We would encourage using Netty on those platforms; I assume you aren't because of the performance. Netty at a tenth of OkHttp's performance is very surprising, and that needs to be fixed. Would you mind filing a bug about OkHttp being 10x as fast as Netty? On the OkHttp side of things, I expect this is a flow-control accounting bug. Those are pretty annoying to track down... At ~100 bytes per message and ~500 messages, that probably stalls after 64 KB of data, which should be easier to pinpoint.
Networks where OkHttp tends to live just need 64 KB. The BDP changed from ~64 KB to ~6.4 MB. 6.4 MB is very high, and really only seen with expensive cross-continental connections. Using a 64 KB window on a link with a 6.4 MB BDP will slow you down to 64 KB / 6.4 MB = 1% of what is available. As you mentioned, we do know how to auto-tune based on BDP (that doc is referenced in the grpc-go blog post; I wrote it), but we've not even turned that on for Netty. It is implemented (and was actually implemented before any other language), but not enabled. Netty defaults to a 1 MB window (because common cloud datacenter networks need 1 MB), which has apparently been fine for people (apparently people don't care about extra memory use in Java... 😢).
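As a back-of-the-envelope sketch of the window/BDP math above (the 100 ms RTT is an assumed number for illustration, not a measurement from this thread): at most one window of data can be in flight per round trip, so throughput is capped at window / RTT.

```java
// Throughput ceiling imposed by a receive window: at most one window of data
// can be in flight per round trip, so throughput <= window / RTT.
public final class WindowMath {
    static long throughputBps(long windowBytes, long rttMillis) {
        return windowBytes * 1000 / rttMillis;
    }

    public static void main(String[] args) {
        long rttMillis = 100;            // assumed cross-continental RTT
        long window = 64 * 1024;         // OkHttp's 64 KB default window
        long bdp = 6_400 * 1024;         // the ~6.4 MB BDP from the comment above
        long linkBps = throughputBps(bdp, rttMillis);   // what the link could carry
        long cappedBps = throughputBps(window, rttMillis);
        System.out.println("link " + linkBps + " B/s, capped " + cappedBps
            + " B/s, " + (100 * cappedBps / linkBps) + "% utilized");
    }
}
```

With these assumed numbers the small window leaves roughly 99% of the link idle, matching the 64 KB / 6.4 MB = 1% figure in the comment.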
Your assumption is correct; we switched to OkHttp for its speed. Point of clarification: we benchmarked it as 5x faster rather than 10x (200k msg/s for OkHttp vs. 40k msg/s for Netty). Opened issue #6696 for that. Thanks for the clarification around BDP; I couldn't find any info about its Java implementation besides the code itself.
I tried to reproduce this in AbstractInteropTest and failed; it is unclear why. I reproduced it by just making my own binary. It hangs between 512 and 1024 messages, which is ~51,200 to ~102,400 bytes. With the following change I see the hang (and changing the window makes it no longer hang):
diff --git a/examples/build.gradle b/examples/build.gradle
index 56a80523f..ea070a027 100644
--- a/examples/build.gradle
+++ b/examples/build.gradle
@@ -34,6 +34,7 @@ dependencies {
// examples/advanced need this for JsonFormat
implementation "com.google.protobuf:protobuf-java-util:${protobufVersion}"
+ implementation "io.grpc:grpc-okhttp:${grpcVersion}"
runtimeOnly "io.grpc:grpc-netty-shaded:${grpcVersion}"
testImplementation "io.grpc:grpc-testing:${grpcVersion}"
@@ -112,6 +113,13 @@ task compressingHelloWorldClient(type: CreateStartScripts) {
classpath = startScripts.classpath
}
+task okhttpflowcontrol(type: CreateStartScripts) {
+ mainClassName = 'OkHttpFlowControl'
+ applicationName = 'okhttpflowcontrol'
+ outputDir = new File(project.buildDir, 'tmp')
+ classpath = startScripts.classpath
+}
+
applicationDistribution.into('bin') {
from(routeGuideServer)
from(routeGuideClient)
@@ -120,5 +128,6 @@ applicationDistribution.into('bin') {
from(hedgingHelloWorldClient)
from(hedgingHelloWorldServer)
from(compressingHelloWorldClient)
+ from(okhttpflowcontrol)
fileMode = 0755
}
diff --git a/examples/src/main/java/OkHttpFlowControl.java b/examples/src/main/java/OkHttpFlowControl.java
new file mode 100644
index 000000000..2f2e88b52
--- /dev/null
+++ b/examples/src/main/java/OkHttpFlowControl.java
@@ -0,0 +1,76 @@
+import io.grpc.ManagedChannel;
+import io.grpc.ManagedChannelBuilder;
+import io.grpc.Server;
+import io.grpc.ServerBuilder;
+import io.grpc.examples.manualflowcontrol.HelloReply;
+import io.grpc.examples.manualflowcontrol.HelloRequest;
+import io.grpc.examples.manualflowcontrol.StreamingGreeterGrpc;
+import io.grpc.okhttp.OkHttpChannelBuilder;
+import io.grpc.stub.ServerCallStreamObserver;
+import io.grpc.stub.StreamObserver;
+import java.net.InetSocketAddress;
+
+public final class OkHttpFlowControl {
+ public static void main(String[] args) throws Exception {
+ Server server = ServerBuilder.forPort(0)
+ .addService(new ForeverGreeting())
+ .build()
+ .start();
+ InetSocketAddress address = (InetSocketAddress) server.getListenSockets().get(0);
+ ManagedChannel channel = OkHttpChannelBuilder
+ .forAddress(address.getAddress().getHostAddress(), address.getPort())
+ .usePlaintext()
+ .flowControlWindow(131070 + 100)
+ .build();
+ StreamingGreeterGrpc.newStub(channel)
+ .sayHelloStreaming(new StreamObserver<HelloReply>() {
+ int count;
+ @Override public void onNext(HelloReply reply) {
+ count++;
+ if (Integer.bitCount(count) == 1) {
+ System.out.println("count: " + count);
+ }
+ }
+ @Override public void onCompleted() {}
+ @Override public void onError(Throwable t) {}
+ })
+ .onCompleted();
+ while (true) {
+ Thread.sleep(1000000);
+ }
+ }
+
+ static class ForeverGreeting extends StreamingGreeterGrpc.StreamingGreeterImplBase {
+ private static final HelloReply REPLY
+ = HelloReply.newBuilder().setMessage(String.format("%100s", "")).build();
+
+ @Override
+ public StreamObserver<HelloRequest> sayHelloStreaming(
+ final StreamObserver<HelloReply> responseObserverGeneric) {
+ final ServerCallStreamObserver<HelloReply> responseObserver =
+ (ServerCallStreamObserver<HelloReply>) responseObserverGeneric;
+ class Stream implements StreamObserver<HelloRequest> {
+ @Override public void onNext(HelloRequest request) {}
+ @Override public void onCompleted() {}
+ @Override public void onError(Throwable t) {}
+
+ public void onReady() {
+ while (!responseObserver.isCancelled() && responseObserver.isReady()) {
+ responseObserver.onNext(REPLY);
+ }
+ }
+ }
+ final Stream stream = new Stream();
+ responseObserver.setOnCancelHandler(new Runnable() {
+ @Override public void run() {}
+ });
+ responseObserver.setOnReadyHandler(new Runnable() {
+ @Override public void run() {
+ stream.onReady();
+ }
+ });
+ stream.onReady();
+ return stream;
+ }
+ }
+}
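A side note on the repro above: `Integer.bitCount(count) == 1` is true exactly when `count` is a power of two, so the client logs at 1, 2, 4, 8, ... and the last `count:` line printed brackets where the stall happens. A standalone check of that trick:

```java
// Power-of-two detector as used in the repro's progress logging: a positive
// int is a power of two iff exactly one bit is set.
public final class BitCountDemo {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && Integer.bitCount(n) == 1;
    }

    public static void main(String[] args) {
        StringBuilder logged = new StringBuilder();
        for (int count = 1; count <= 1024; count++) {
            if (isPowerOfTwo(count)) logged.append(count).append(' ');
        }
        // A hang between 512 and 1024 messages means "count: 512" is the last
        // line logged; at ~100 bytes per message that is roughly 51-102 KB.
        System.out.println(logged.toString().trim());
    }
}
```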
This was really basic. We weren't informing the remote that we used different settings, so it was always using 64 KB (and so wouldn't send enough data to be worth us sending a window update). Whelp. I guess that makes it clear nobody has configured the window option... Also, I see some suspicious behavior where the receive-side window size is being passed to the send-side:

diff --git a/okhttp/src/main/java/io/grpc/okhttp/OkHttpClientTransport.java b/okhttp/src/main/java/io/grpc/okhttp/OkHttpClientTransport.java
index b238b9237..291431780 100644
--- a/okhttp/src/main/java/io/grpc/okhttp/OkHttpClientTransport.java
+++ b/okhttp/src/main/java/io/grpc/okhttp/OkHttpClientTransport.java
@@ -608,7 +608,10 @@ class OkHttpClientTransport implements ConnectionClientTransport, TransportExcep
synchronized (lock) {
frameWriter.connectionPreface();
Settings settings = new Settings();
+ // TODO: WTH are persist and persist value??
+ settings.set(7, 0, initialWindowSize);
frameWriter.settings(settings);
+ frameWriter.windowUpdate(0, initialWindowSize - 64*1024);
}
} finally {
latch.countDown();
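For context on the patch above (a sketch of the accounting, not the shipped fix): in HTTP/2 the connection-level window starts at 65,535 bytes, and SETTINGS_INITIAL_WINDOW_SIZE only resizes stream windows, so growing the connection window requires an explicit WINDOW_UPDATE whose increment is the difference from that default. The patch approximates the baseline as `64*1024`; the spec value is 65,535.

```java
// Connection-level WINDOW_UPDATE increment needed to reach a configured
// flow-control window, starting from the HTTP/2 default of 65,535 bytes.
public final class ConnWindow {
    static final int DEFAULT_WINDOW = 65_535; // RFC 7540 initial window size

    static int windowUpdateDelta(int targetWindow) {
        if (targetWindow < DEFAULT_WINDOW) {
            throw new IllegalArgumentException("WINDOW_UPDATE cannot shrink a window");
        }
        return targetWindow - DEFAULT_WINDOW;
    }

    public static void main(String[] args) {
        // e.g. a 1 MiB configured window needs a 983,041-byte increment
        System.out.println(windowUpdateDelta(1024 * 1024));
    }
}
```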
I was able to reproduce this with:

diff --git a/interop-testing/src/test/java/io/grpc/testing/integration/Http2OkHttpTest.java b/interop-testing/src/test/java/io/grpc/testing/integration/Http2OkHttpTest.java
index 927b0ed44..07578ea45 100644
--- a/interop-testing/src/test/java/io/grpc/testing/integration/Http2OkHttpTest.java
+++ b/interop-testing/src/test/java/io/grpc/testing/integration/Http2OkHttpTest.java
@@ -95,6 +95,7 @@ public class Http2OkHttpTest extends AbstractInteropTest {
int port = ((InetSocketAddress) getListenAddress()).getPort();
OkHttpChannelBuilder builder = OkHttpChannelBuilder.forAddress("localhost", port)
.maxInboundMessageSize(AbstractInteropTest.MAX_MESSAGE_SIZE)
+ .flowControlWindow(256*1024)
.connectionSpec(new ConnectionSpec.Builder(ConnectionSpec.MODERN_TLS)
.cipherSuites(TestUtils.preferredTestCiphers().toArray(new String[0]))
.build())
Thanks @ejona86, using that patch on OkHttpClientTransport has things running more smoothly now. Although this may have uncovered another issue: specifically, when running the same test as above but connecting via a Kubernetes nginx Ingress, the following error occurs on the client:
On the surface, the "window size overflow" looks correct:
What doesn't make sense to me is that when repeating the same setup with Netty, there's no problem. It looks like Netty and OkHttp are doing the same "window size overflow" checks, but the Netty client doesn't complain about the flow control window getting too large.
FYI, I tried to clean up the logs before pasting them here, since multiple subchannels were showing repetitive logs, so it's possible I mucked something up by accident.
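For reference, the "window size overflow" above corresponds to HTTP/2's rule that a flow-control window may never exceed 2^31 - 1; a WINDOW_UPDATE that would push it past that limit must be treated as a flow-control error. A toy model of that check (assumed bookkeeping for illustration, not grpc-java's actual code):

```java
// Sketch of the HTTP/2 window-overflow rule: track a peer's window as a long
// and flag any update that would exceed 2^31 - 1.
public final class OverflowCheck {
    static final long MAX_WINDOW = Integer.MAX_VALUE; // 2^31 - 1

    // Returns the updated window, or -1 to signal FLOW_CONTROL_ERROR.
    static long applyWindowUpdate(long window, long increment) {
        long updated = window + increment;
        return updated > MAX_WINDOW ? -1 : updated;
    }

    public static void main(String[] args) {
        long window = 65_535;
        // A proxy re-announcing a large window on top of one the client has
        // already grown can push the advertised total past the limit:
        System.out.println(applyWindowUpdate(window, Integer.MAX_VALUE));
    }
}
```

Under this model, whether a client errors here depends on how strictly it enforces the limit against updates it receives, which could explain Netty and OkHttp behaving differently behind the same proxy.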
Alright, I ran it through a debugger, and it looks like that's exactly what's happening, per my previous comment. The client ignores the
Took the liberty of drafting a PR with what I believe is the correct behavior.
…ning of connection

Specifically, this addresses bugs that occur when the `OkHttpChannelBuilder.flowControlWindow(int)` setting is increased from its default value. Two changes:

1. On starting a connection, ensure the value of `OkHttpChannelBuilder.flowControlWindow(int)` is sent via Settings.INITIAL_WINDOW_SIZE. Also send a WINDOW_UPDATE after Settings to update the connection-level window.
2. Always initialize the `OutboundFlowController` with an initialWindowSize of 65535 bytes per the [http2 spec](https://http2.github.io/http2-spec/#InitialWindowSize) instead of using the inbound window size.

Fixes grpc#6685
…ning of connection (v1.28.x backport)

Specifically, this addresses bugs that occur when the `OkHttpChannelBuilder.flowControlWindow(int)` setting is increased from its default value. Two changes:

1. On starting a connection, ensure the value of `OkHttpChannelBuilder.flowControlWindow(int)` is sent via Settings.INITIAL_WINDOW_SIZE. Also send a WINDOW_UPDATE after Settings to update the connection-level window.
2. Always initialize the `OutboundFlowController` with an initialWindowSize of 65535 bytes per the [http2 spec](https://http2.github.io/http2-spec/#InitialWindowSize) instead of using the inbound window size.

Fixes #6685

Backport of #6742
What version of gRPC-Java are you using?
Tried this with all of:
What is your environment?
Both:
JDK: Zulu, Java 8
What did you expect to see?
During server-side streaming RPCs, increasing the client's flow control window size should improve the rate at which the client can receive messages (particularly when the connection has some latency).
What did you see instead?
When using the OkHttp client specifically, the above expectation holds as long as the flow control window size is between its default value and double the default value: the rate at which the client can receive messages scales linearly with the size of the flow control window. However, going even a single byte above that range results in the client receiving only a few messages and then no messages at all. Even when the RPC stops due to its deadline and a new streaming RPC is started over the same connection, no more messages are sent.
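One way to model why the cliff sits just above twice the default (a toy model consistent with the later comments in this thread, not grpc-java's actual accounting): suppose the client sends a WINDOW_UPDATE once it has consumed half of its own configured window, while the server, never told about the larger window, pauses after delivering the 64 KB default. If that 64 KB is less than half the client's window, the update is never triggered and the stream stalls.

```java
// Toy stall model (assumed thresholds, for illustration only): the server,
// stuck on the 65,535-byte HTTP/2 default, delivers at most SERVER_VIEW bytes
// before pausing; the client refreshes the window only once consumption
// reaches half its configured window.
public final class StallModel {
    static final int SERVER_VIEW = 65_535; // server's stale view of the window

    static boolean stalls(int clientWindow) {
        double updateThreshold = clientWindow / 2.0; // client updates at this point
        return SERVER_VIEW < updateThreshold;        // server pauses before it's reached
    }

    public static void main(String[] args) {
        System.out.println(stalls(2 * 65_535));     // exactly double the default
        System.out.println(stalls(2 * 65_535 + 1)); // one byte above
    }
}
```

In this model, windows up to exactly 2 x 65,535 bytes still make progress, and one byte more deadlocks, matching the observed behavior; the real update threshold in grpc-java may differ.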
Steps to reproduce the bug
Though we first encountered this in our production environment, we've been able to recreate it in a controlled test environment as follows:
Here's our protos:
Here's how the OkHttp channel is created:
Experiment results:
Other observations: