Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WatchIT fails with vertx #4884

Closed
shawkins opened this issue Feb 15, 2023 · 7 comments · Fixed by #4900
Closed

WatchIT fails with vertx #4884

shawkins opened this issue Feb 15, 2023 · 7 comments · Fixed by #4900
Assignees
Milestone

Comments

@shawkins
Copy link
Contributor

Vertx detects the sleep as a blocked thread:

2023-02-15T05:10:19.8653996Z [INFO] Running io.fabric8.kubernetes.WatchIT
2023-02-15T05:10:22.9450836Z [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker - Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 2459 ms, time limit is 2000 ms
2023-02-15T05:10:23.9465620Z [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker - Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 3460 ms, time limit is 2000 ms
2023-02-15T05:10:24.9516144Z [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker - Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 4459 ms, time limit is 2000 ms
2023-02-15T05:10:25.9531841Z [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker - Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 5459 ms, time limit is 2000 ms
2023-02-15T05:10:25.9532949Z io.vertx.core.VertxException: Thread blocked
2023-02-15T05:10:25.9533352Z 	at java.base@17.0.6/java.lang.Thread.sleep(Native Method)
2023-02-15T05:10:25.9533797Z 	at app//io.fabric8.kubernetes.WatchIT$2.eventReceived(WatchIT.java:125)
2023-02-15T05:10:25.9534421Z 	at app//io.fabric8.kubernetes.WatchIT$2.eventReceived(WatchIT.java:111)
@shawkins
Copy link
Contributor Author

@manusa @vietj The thread checking message is not the full cause - the message will be repeated every 2 seconds while the watcher waits. Instead the attempt to patch is failing to get a response. In general this test is not great and likely needs to be removed, but there are a couple of take-aways:

  1. Should the patch fail? How limited is the thread availability to vertx by default?

  2. Even though the logic around watch succession is being further refined in fix #4675: adding an informer timeout handled directly in our client #4860 we're effectively relying on an assumption that the processing in the httpclient layer is inherently single threaded - that is while you are in an onMessage on a given watch, that if you call close or one is triggered from the server side that you won't see that event until after the thread returns from the onMessage call. While I believe this to be the case currently for all of our implementations, I'm not sure if it's spelled out as a requirement or if we'll need a e2e test that could try to do something similar.

@manusa
Copy link
Member

manusa commented Feb 21, 2023

This test needs refactoring, I'm not even sure what's specifically testing (it looks like the test verifies the full watch round trip).

1

Why is the Patch failing to get a response? Why isn't it either throwing Exception? Which brings to the question of should the Patch fail?
I think it should fail if neither a resource is created nor is modified (which depends on the Patch type).

2 ###

I don't think that users are aware of the consequences of blocking in the onMessage method. My gut feeling is that users do perform blocking operations in this context. AFAIR we were scheduling the execution of this callback within a Serial executor to avoid any of these pitfalls.

@shawkins
Copy link
Contributor Author

This test needs refactoring, I'm not even sure what's specifically testing (it looks like the test verifies the full watch round trip).

I think it was added because previous watch error handling lead to concurrent calls on the watcher.

I think it should fail if neither a resource is created nor is modified (which depends on the Patch type).

The problem is not with patch being applied, it appears that for vertx holding the watcher thread prevents the processing of the patch response.

I don't think that users are aware of the consequences of blocking in the onMessage method.

It is in the javadoc and FAQ.

My gut feeling is that users do perform blocking operations in this context.

Yes it came up again with #4803

AFAIR we were scheduling the execution of this callback within a Serial executor to avoid any of these pitfalls.

That was only used in places where the callbacks were assumed to be blocking, such as writing to outputstreams, so that the httpclient threads could continue. A watcher, nor things like execlisteners, do not have serialexecutor handling.

@manusa
Copy link
Member

manusa commented Feb 21, 2023

It is in the javadoc and FAQ.

I know, but the fact that it's documented doesn't mean that people are aware of it 😅

From a UX perspective (and considering the answer to my last assumption) it might be of interest for us to schedule the execution of the callbacks on an isolated Executor. I'm not sure of what might the drawbacks of this be, but it would certainly simplify a lot the usage of the client downstream.

@shawkins
Copy link
Contributor Author

I'm not sure of what might the drawbacks of this be, but it would certainly simplify a lot the usage of the client downstream.

Now that we have a centrally managed executor for the client we are in a better place to do this. When changes around threading were initially undertaken that wasn't available - so you would have ended up yet again with one off pools.

The drawbacks are - more code complexity on our side, and for a subset of those who do this eventual memory issues unless we also add a cap to serialexecutor queue length.

@shawkins
Copy link
Contributor Author

shawkins commented Feb 22, 2023

Places with calls on http threads:

  • Watcher
  • ExecListener
  • ResourceEventHandler actually that is all called through a SerialExecutor
  • Interceptor
  • A brave user directly using HttpClient

Places that are ok:

  • LeaderCallbacks

@shawkins
Copy link
Contributor Author

Looking more at the vertx execution it looks like submitting the patch ends up getting watch event handled on the same event loop (which has a single threaded execution) as will handle the patch response. @vietj is this expected?

@shawkins shawkins self-assigned this Feb 22, 2023
shawkins added a commit to shawkins/kubernetes-client that referenced this issue Feb 28, 2023
manusa pushed a commit to shawkins/kubernetes-client that referenced this issue Mar 2, 2023
@manusa manusa added this to the 6.5.0 milestone Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants