
fix #4885: addressing a potential hang with jdk streams #4887

Merged
merged 2 commits into fabric8io:master, Feb 16, 2023

Conversation

shawkins
Contributor

@shawkins shawkins commented Feb 15, 2023

Description

Fix #4885

Addresses the PodIT hanging failure with the JDK client by checking the complete flag prior to waiting. The issue is that with the JDK client the request for more data is executed in the calling thread, which already holds the lock and can set the complete flag before the wait begins.
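The check-complete-prior-to-waiting fix can be sketched as follows. This is a minimal illustration with hypothetical names (`requestMore`, `awaitCompletion`, `complete`), not the actual fabric8 code:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the hang and its fix. In the buggy version the consumer
// requested more data and then waited unconditionally. With the JDK
// client the request can be serviced synchronously on the calling
// thread, which already holds the (reentrant) lock: it sets the
// complete flag and signals before the consumer ever reaches await(),
// so an unconditional await() would block forever. Re-checking the
// flag before waiting avoids the hang.
public class CompleteBeforeWait {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition hasData = lock.newCondition();
    private boolean complete;

    // Simulates the JDK client delivering the final chunk synchronously
    // on the caller's thread (the lock is reentrant, so this succeeds).
    private void requestMore() {
        lock.lock();
        try {
            complete = true;
            hasData.signalAll(); // fires BEFORE anyone is waiting
        } finally {
            lock.unlock();
        }
    }

    public boolean awaitCompletion() throws InterruptedException {
        lock.lock();
        try {
            requestMore();          // may complete inline
            while (!complete) {     // check the flag prior to waiting
                hasData.await();
            }
            return complete;
        } finally {
            lock.unlock();
        }
    }
}
```

The `while (!complete)` guard also covers spurious wakeups, which is the standard idiom for `Condition.await()` anyway.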

There are also some failures in the JobIT log, but I have not been able to reproduce those. There are a couple of changes here to help with that logic in general. One is to better localize when more data is requested: that should be done inline with the processing call rather than in a completion handler, because it looks like they can be executed in parallel. The other is that I don't see anywhere in the jdk client a guarantee that the supplied buffers won't change after being passed to consume. Since we already handled that case with jetty, we should do the same here.
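Since the JDK client apparently makes no guarantee about the supplied buffers, a defensive copy before queueing the asynchronous work is one way to handle it. A rough sketch under that assumption (`copy` and `consumeAsync` are hypothetical names, not the actual client code):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Sketch of the defensive-copy idea: if the ByteBuffer handed to the
// consumer may be reused or mutated by the HTTP client after the call
// returns, copy its contents before queueing the async work so the
// queued task sees a stable snapshot.
public class BufferCopy {
    static ByteBuffer copy(ByteBuffer src) {
        ByteBuffer snapshot = ByteBuffer.allocate(src.remaining());
        snapshot.put(src.duplicate()); // duplicate() leaves src's position untouched
        snapshot.flip();
        return snapshot;
    }

    static void consumeAsync(ByteBuffer buffer, Executor executor,
                             Consumer<ByteBuffer> consumer) {
        ByteBuffer safe = copy(buffer); // copy BEFORE queueing
        executor.execute(() -> consumer.accept(safe));
    }
}
```

The copy happens on the delivering thread, so by the time the queued task runs it no longer matters what the client does with the original buffer.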

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Checklist

  • Code contributed by me aligns with current project license: Apache 2.0
  • I added a CHANGELOG entry regarding this change
  • I have implemented unit tests to cover my changes
  • I have added/updated the javadocs and other documentation accordingly
  • No new bugs, code smells, etc. in SonarCloud report
  • I tested my code in Kubernetes
  • I tested my code in OpenShift

- there is a possibility that the buffer gets modified after the work is
queued
- there is a race condition between the done handling and the async
buffer handling. If the executor is shut down before the task starts
running, the buffers will be lost.
@shawkins
Contributor Author

There are also some failures on the JobIT log - but I have not been able to reproduce that.

I think I see how that is possible now - the initial speculative change was wrong. We need to handle the done signal in the serial executor as well, to ensure that the buffers that have already been sent are processed first - otherwise we risk dropping some of them. This seems to be more prevalent with jdk / jetty than okhttp, likely due to the threading model of how consume is handled - it's an async task for okhttp.
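The ordering argument can be illustrated with a small sketch (hypothetical names, not the actual serial executor in the client): submitting the done signal through the same single-threaded executor as the buffer tasks guarantees that every previously submitted buffer is processed before completion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: if onDone() completed the stream directly, it could race
// ahead of buffer tasks still queued on the executor, and those
// buffers would be dropped. Queueing the done signal on the SAME
// single-threaded executor serializes it behind all pending buffers.
public class SerialDone {
    final ExecutorService serial = Executors.newSingleThreadExecutor();
    final List<String> processed = new ArrayList<>();

    void onBuffer(String chunk) {
        serial.execute(() -> processed.add(chunk));
    }

    void onDone() {
        // queued behind all pending buffer tasks, not handled inline
        serial.execute(() -> processed.add("<done>"));
        serial.shutdown();
    }
}
```

After `shutdown()` plus `awaitTermination`, the `<done>` marker is always last, i.e. no buffer submitted before the done signal is lost.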

@manusa manusa added this to the 6.5.0 milestone Feb 16, 2023
Member

@manusa manusa left a comment


LGTM, thx!

@sonarcloud

sonarcloud bot commented Feb 16, 2023

SonarCloud Quality Gate failed.

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

54.5% Coverage
0.0% Duplication

@manusa manusa merged commit f701b7b into fabric8io:master Feb 16, 2023
@shawkins
Contributor Author

@manusa things look better with the latest e2e run. There is a familiar-looking PodIT failure with vertx, but on 1.25 - so that shouldn't be using containerd, correct? https://github.com/fabric8io/kubernetes-client/actions/runs/4200862573/jobs/7287316314#step:5:1022

I'll see if that can be reproduced.

@manusa
Member

manusa commented Feb 20, 2023

@manusa things look better with the latest e2e run. There is a familiar-looking PodIT failure with vertx, but on 1.25 - so that shouldn't be using containerd, correct? https://github.com/fabric8io/kubernetes-client/actions/runs/4200862573/jobs/7287316314#step:5:1022

I'll see if that can be reproduced.

Nope, that one is using Minikube on baremetal with Docker.

@shawkins
Contributor Author

Nope, that one is using Minikube on baremetal with Docker.

It did spawn #4891

Development

Successfully merging this pull request may close these issues.

JDK client hanging on PodIT
3 participants