fix #5036: addressing the handling of non-connection errors #5047

shawkins · 2023-04-12T19:31:09Z

Description

Addresses the additional concerns raised in #5036

This still needs to increase or allow setting the relevant limits in vertx and jetty.

cc @vietj

Type of change

Bug fix (non-breaking change which fixes an issue)
Feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Chore (non-breaking change which doesn't affect codebase; test, version modification, documentation, etc.)

Checklist

Code contributed by me aligns with current project license: Apache 2.0
I added a CHANGELOG entry regarding this change
I have implemented unit tests to cover my changes
I have added/updated the javadocs and other documentation accordingly
No new bugs, code smells, etc. in SonarCloud report
I tested my code in Kubernetes
I tested my code in OpenShift

shawkins · 2023-04-17T13:33:16Z

Updated the max frame and message values for jetty and vertx using 2 MB - which roughly comes from accommodating the max size of a configmap https://kubernetes.io/docs/concepts/configuration/configmap/#motivation

The other option is to just set the values to unlimited - which seems to be what okhttp and jdk effectively do.

vmuzikar · 2023-04-17T13:43:58Z

@shawkins Thank you for providing the fix!

What are the implications of having unlimited frame size at the client side?

IMHO it'd make sense to align with other HTTP Clients (OkHttp) and set it to unlimited.

shawkins · 2023-04-17T14:09:55Z

> What are the implications of having unlimited frame size at the client side?

I don't believe there is much risk; a limit just provides a friendlier error than an OOM should an unexpectedly large resource be retrieved. Given that OkHttp was used for years without explicit limits and we didn't have any issues that I'm aware of, I'm okay with unlimited. Without additional input from @vietj and @manusa, though, I thought I'd start off with at least what could be a "principled" bound.
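To illustrate the trade-off described above, here is a minimal sketch of a size-bounded message accumulator that fails with a descriptive error instead of growing until memory runs out. `BoundedMessageBuffer` and its `UNLIMITED` sentinel are hypothetical names invented for this example; they are not part of fabric8, Jetty, Vert.x, or OkHttp, and the real clients enforce their limits internally.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical accumulator for message fragments; for illustration only. */
final class BoundedMessageBuffer {
  /** Sentinel used by this sketch to mean "no limit". */
  static final long UNLIMITED = -1;

  private final long maxBytes;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

  BoundedMessageBuffer(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  /** Appends a fragment, or throws a descriptive error if the limit would be exceeded. */
  void append(byte[] fragment) {
    long newSize = (long) buffer.size() + fragment.length;
    if (maxBytes != UNLIMITED && newSize > maxBytes) {
      throw new IllegalStateException(
          "Message of at least " + newSize + " bytes exceeds the limit of " + maxBytes + " bytes");
    }
    buffer.write(fragment, 0, fragment.length);
  }

  /** Number of bytes accumulated so far. */
  int size() {
    return buffer.size();
  }

  public static void main(String[] args) {
    BoundedMessageBuffer bounded = new BoundedMessageBuffer(1 << 21);
    bounded.append(new byte[1 << 20]);
    System.out.println("accumulated bytes: " + bounded.size());
    try {
      // This fragment would push the total past the configured bound.
      bounded.append(new byte[(1 << 20) + 1]);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With `UNLIMITED`, the same oversized input would simply keep allocating, and the eventual failure would be an `OutOfMemoryError` rather than a message that names the limit.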

sunix · 2023-04-19T13:19:06Z

httpclient-jetty/src/main/java/io/fabric8/kubernetes/client/jetty/JettyHttpClientBuilder.java

@@ -39,6 +39,8 @@
    extends StandardHttpClientBuilder<JettyHttpClient, JettyHttpClientFactory, JettyHttpClientBuilder> {

  private static final int MAX_CONNECTIONS = Integer.MAX_VALUE;
+  // the max data in a config map is 1 MiB, so pad a little beyond that
+  private static final int MAX_WS_MESSAGE_SIZE = 1 << 21;


1 MiB ? or 2 ?

> Updated the max frame and message values for jetty and vertx using 2 MB - which roughly comes from accommodating the max size of a configmap https://kubernetes.io/docs/concepts/configuration/configmap/#motivation

2097152 bytes

With some further searching using etcd as a keyword, it looks like Kubernetes sets the limit on the etcd server to 3 MB per request, rather than the etcd default of 1.5 MB: kubernetes/kubernetes#105863
https://stackoverflow.com/questions/73549236/argo-and-kubernetes-request-entity-to-large-limit-is-3145728
So the 1 MB limit for a ConfigMap is some other limit that is being enforced. After discussion I'll update this PR to be unlimited and note in the resolution the concern about protecting the client memory from attack, which in this case would imply the API server has been compromised in some way.
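The sizes discussed in this thread can be checked with a little arithmetic. The sketch below is only an illustration: `WebSocketSizeBounds` and `fits` are hypothetical names, `MAX_WS_MESSAGE_SIZE` copies the value from the diff, and the 3,145,728-byte figure is the one that appears in the linked Stack Overflow question.

```java
/** Hypothetical holder for the sizes mentioned in the review thread. */
final class WebSocketSizeBounds {
  /** 1 MiB, the documented cap on ConfigMap data. */
  static final int ONE_MIB = 1 << 20;

  /** Value from the diff: 1 << 21 is 2,097,152 bytes, i.e. 2 MiB. */
  static final int MAX_WS_MESSAGE_SIZE = 1 << 21;

  /** 3,145,728 bytes, the request limit quoted in the linked question. */
  static final int REQUEST_LIMIT = 3 * ONE_MIB;

  /** Returns true when a payload of the given size is within the limit. */
  static boolean fits(long payloadBytes, long limit) {
    return payloadBytes <= limit;
  }

  public static void main(String[] args) {
    System.out.println("2 MiB bound in bytes: " + MAX_WS_MESSAGE_SIZE);
    System.out.println("1 MiB payload fits the 2 MiB bound: " + fits(ONE_MIB, MAX_WS_MESSAGE_SIZE));
    System.out.println("3 MiB payload fits the 2 MiB bound: " + fits(REQUEST_LIMIT, MAX_WS_MESSAGE_SIZE));
  }
}
```

A payload at the 3 MiB request limit does not fit the 2 MiB bound, which is consistent with the decision to drop the fixed bound.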

shawkins · 2023-04-19T14:06:06Z

@manusa updated to unlimited, and logged the issue of configuration as a separate concern: #5062

shawkins · 2023-04-19T14:36:26Z

...client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/WatcherWebSocketListener.java

+      logger.debug("WebSocket error received", t);
+      manager.scheduleReconnect(state);
+    } else {
+      manager.close(new WatcherException("Could not process websocket message", t));


The downside to this approach is that we have not fully captured what the recoverable error types are, so we'll need to update the handling further. The alternative is to log at error level and still try to reconnect. Either way, we'll potentially need to refine the handling further to avoid looping on something that is non-recoverable.
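The branching in the hunk above can be sketched in isolation as follows. This is not the code from the PR: `WatchErrorHandlingSketch`, `Action`, and `isRecoverable` are hypothetical, and treating any `IOException` in the cause chain as recoverable is an assumption made only for this example, since the thread notes that the actual set of recoverable error types is not yet settled.

```java
import java.io.IOException;

/** Hypothetical sketch of choosing between reconnecting and closing a watch. */
final class WatchErrorHandlingSketch {
  enum Action { RECONNECT, CLOSE }

  /**
   * Assumption for this sketch only: an IOException anywhere in the
   * cause chain marks the error as connection-related and recoverable.
   */
  static boolean isRecoverable(Throwable t) {
    for (Throwable cause = t; cause != null; cause = cause.getCause()) {
      if (cause instanceof IOException) {
        return true;
      }
    }
    return false;
  }

  /** Recoverable errors lead to a reconnect; anything else closes the watch. */
  static Action onError(Throwable t) {
    return isRecoverable(t) ? Action.RECONNECT : Action.CLOSE;
  }

  public static void main(String[] args) {
    System.out.println(onError(new IOException("connection reset")));
    System.out.println(onError(new RuntimeException("wrapped", new IOException("timeout"))));
    System.out.println(onError(new IllegalStateException("could not process message")));
  }
}
```

The alternative mentioned above, always reconnecting after logging at error level, would amount to `onError` returning `RECONNECT` unconditionally, which is where the risk of looping on a non-recoverable error comes from.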

sonarcloud · 2023-04-27T16:54:50Z

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

54.5% Coverage
0.0% Duplication

oscerd approved these changes Apr 12, 2023

shawkins force-pushed the iss5036 branch 2 times, most recently from 78b8674 to d4824ca Compare April 17, 2023 13:30

shawkins marked this pull request as ready for review April 17, 2023 13:31

shawkins requested review from manusa, rohanKanojia and sunix as code owners April 17, 2023 13:31

shawkins force-pushed the iss5036 branch from d4824ca to 87b3c23 Compare April 17, 2023 15:23

sunix reviewed Apr 19, 2023

shawkins force-pushed the iss5036 branch from 87b3c23 to e162fb5 Compare April 19, 2023 13:58

shawkins commented Apr 19, 2023

sunix approved these changes Apr 21, 2023

metacosm approved these changes Apr 21, 2023

manusa added this to the 6.6.0 milestone Apr 27, 2023

manusa force-pushed the iss5036 branch from e162fb5 to 26fabb5 Compare April 27, 2023 14:44

manusa approved these changes Apr 27, 2023

fix fabric8io#5036: addressing the handling of non-connection errors

2df23d5

manusa force-pushed the iss5036 branch from 26fabb5 to 2df23d5 Compare April 27, 2023 16:18

manusa merged commit 4bd80e1 into fabric8io:master Apr 27, 2023
19 of 20 checks passed

shawkins mentioned this pull request May 19, 2023

fix #5152: expanding the error detection #5153

Merged

11 tasks
