fix: Fix race condition in BatcherImpl flush #1200

Merged
igorbernstein2 merged 2 commits into googleapis:master from batcher-race on Oct 5, 2020

Conversation

igorbernstein2
Collaborator

@igorbernstein2 igorbernstein2 commented Oct 5, 2020

Currently the following race condition exists:

T1 - awaitAllOutstandingBatches checks that numOfOutstandingBatches > 0
T2 - onBatchCompletion decrements numOfOutstandingBatches
T2 - flushLock.notifyAll()
T1 - flushLock.wait()

so T1 misses the notification and will wait indefinitely.

The fix is quite simple: after acquiring the lock, check again that there are still batches to wait for before calling wait().
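
For readers unfamiliar with the pattern, here is a minimal sketch of the check-again-under-lock idea described above. It is not the actual BatcherImpl source: the class name `FlushTracker` and the helpers `onBatchSent` and `outstandingBatches` are hypothetical, and only `flushLock`, `numOfOutstandingBatches`, `onBatchCompletion` and `awaitAllOutstandingBatches` are borrowed from the description.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative stand-in for the flush bookkeeping; not the real BatcherImpl. */
class FlushTracker {
  private final Object flushLock = new Object();
  private final AtomicInteger numOfOutstandingBatches = new AtomicInteger(0);

  /** Hypothetical helper: records that a batch has been sent. */
  void onBatchSent() {
    numOfOutstandingBatches.incrementAndGet();
  }

  /** Hypothetical helper: exposes the counter for inspection. */
  int outstandingBatches() {
    return numOfOutstandingBatches.get();
  }

  /** Called when a batch finishes; wakes waiters once nothing is outstanding. */
  void onBatchCompletion() {
    if (numOfOutstandingBatches.decrementAndGet() == 0) {
      synchronized (flushLock) {
        flushLock.notifyAll();
      }
    }
  }

  /** Blocks until no batches are outstanding. */
  void awaitAllOutstandingBatches() throws InterruptedException {
    // The outer check happens without the lock, so on its own it can race
    // with onBatchCompletion (the T1/T2 interleaving above).
    while (numOfOutstandingBatches.get() > 0) {
      synchronized (flushLock) {
        // Check again while holding the lock. If the last batch completed
        // between the outer check and this point, its notifyAll() has
        // already fired and calling wait() would block forever.
        if (numOfOutstandingBatches.get() == 0) {
          break;
        }
        // Safe to wait: a completer must acquire flushLock before it can
        // notify, and wait() releases the lock atomically.
        flushLock.wait();
      }
    }
  }
}
```

In this sketch the outer `while` loop also covers spurious wakeups, since the counter is re-read every time `wait()` returns.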

@igorbernstein2 igorbernstein2 requested a review from Oct 5, 2020
@google-cla google-cla bot added the cla: yes label Oct 5, 2020

@codecov codecov bot commented Oct 5, 2020

Codecov Report

Merging #1200 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##             master    #1200   +/-   ##
=========================================
  Coverage     79.06%   79.07%           
- Complexity     1197     1198    +1     
=========================================
  Files           205      205           
  Lines          5268     5270    +2     
  Branches        435      436    +1     
=========================================
+ Hits           4165     4167    +2     
  Misses          930      930           
  Partials        173      173           
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| .../java/com/google/api/gax/batching/BatcherImpl.java | 98.06% <100.00%> (+0.02%) | 17.00 <0.00> (+1.00) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0976e20...ebad62f.

Contributor

@miraleung miraleung left a comment

@vam PTAL

@vam-google vam-google self-requested a review Oct 5, 2020
Contributor

@vam-google vam-google left a comment

LGTM, but how painful will it be to add a test for that?

Collaborator Author

@igorbernstein2 igorbernstein2 commented Oct 5, 2020

I think a test would be quite difficult, but I can try in a separate PR. This is currently a blocker for a customer, so I'd prefer to get this out asap.

Would it be possible to cut a release with this change either today or tomorrow?

Collaborator Author

@igorbernstein2 igorbernstein2 commented Oct 5, 2020

Added a stress test. It wasn't as bad as I expected.

@igorbernstein2 igorbernstein2 merged commit c6308c9 into googleapis:master Oct 5, 2020
8 checks passed
@igorbernstein2 igorbernstein2 deleted the batcher-race branch Oct 5, 2020
@miraleung miraleung mentioned this pull request Oct 5, 2020