Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: detected illegal job sequence during barrier enter/wait #3881

Merged
merged 2 commits into from
Sep 19, 2023

Conversation

atzoum
Copy link
Contributor

@atzoum atzoum commented Sep 18, 2023

Description

storeMessage#merge will be called when processor has picked up a number of jobs (maxLoopProcessEvents: 10K) that is greater than the configured sub-job size (subJobSize: 2K). Processor will split the picked up batch in more than one sub-jobs, it will process all sub-jobs using pipelining and will finally merge them before storing the output to the database.

Whenever more than one source is connected to a destination, this absence of the appropriate routerDestIDs in the merged store message can cause processor to write batches of jobs for the same destination in parallel. This, in turn, can cause router to pickup jobs out-of-order due to jobID races.

If processor picks jobs less than the configured sub-job size, then the original storeMessage will be used, which will include the appropriate routerDestIDs, thus jobs will be processed in-order.

One small mistake... thousands of brain cells damaged!

Linear Ticket

PIPE-291

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@codecov
Copy link

codecov bot commented Sep 18, 2023

Codecov Report

Patch coverage: 12.50% and project coverage change: -0.17% ⚠️

Comparison is base (fcd6676) 68.91% compared to head (f7e0589) 68.75%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3881      +/-   ##
==========================================
- Coverage   68.91%   68.75%   -0.17%     
==========================================
  Files         351      351              
  Lines       52801    52810       +9     
==========================================
- Hits        36390    36311      -79     
- Misses      14107    14182      +75     
- Partials     2304     2317      +13     
Files Changed Coverage Δ
processor/processor.go 87.51% <12.50%> (+<0.01%) ⬆️

... and 15 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@atzoum atzoum marked this pull request as ready for review September 19, 2023 04:35
Copy link
Collaborator

@fracasula fracasula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to have a test for this? It would also work as documentation making clear that it is a scenario that it exists (if I were asked to list possible scenarios and I would definitely not think of this one).

@atzoum
Copy link
Contributor Author

atzoum commented Sep 19, 2023

Wouldn't it make sense to have a test for this? It would also work as documentation making clear that it is a scenario that it exists (if I were asked to list possible scenarios and I would definitely not think of this one).

@fracasula added a test scenario for asserting storeMessage#merge behaviour

@atzoum atzoum force-pushed the fix.jobOrdering branch 2 times, most recently from e5e33b5 to 0ef75f9 Compare September 19, 2023 08:30
@atzoum atzoum changed the base branch from release/1.13.x to master September 19, 2023 09:29
@atzoum atzoum merged commit 7891da3 into master Sep 19, 2023
31 checks passed
@atzoum atzoum deleted the fix.jobOrdering branch September 19, 2023 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants