
Obtain a lockfile before we write pickle data. #190

Merged
1 commit merged on Aug 18, 2021

Conversation

adamgoossens (Contributor)

Without this it's possible for two concurrent PSR runs to overwrite each other's pickle files on disk. This will result in the artifacts from one of those runs being lost further down the pipeline.

This ensures that we:

  1. Obtain an exclusive lock around the pickle file to manage concurrent access. If we hold the lock, we can safely read and write the pickle file.
  2. Once we hold the lock, re-read the pickle file from disk before writing out the new one, merging the on-disk data with our in-memory data.
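The lock-then-merge sequence above can be sketched as follows. This is a minimal illustration using POSIX advisory locks from the standard `fcntl` module; the function name and the `merge` callback are hypothetical, not the actual PSR API:

```python
import fcntl
import os
import pickle

def write_results_locked(pickle_path, in_memory_results, merge):
    """Merge in-memory results with on-disk results under an exclusive lock.

    Locks a separate .lock sidecar file so that acquiring the lock
    never touches (or truncates) the pickle itself.  `merge` is a
    caller-supplied function taking (on_disk, in_memory) and returning
    the merged result.
    """
    lock_path = pickle_path + '.lock'
    with open(lock_path, 'w') as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)   # blocks until we own the lock
        try:
            # Re-read the pickle *after* acquiring the lock, so we see
            # anything a concurrent run wrote in the meantime.
            on_disk = {}
            if os.path.exists(pickle_path) and os.path.getsize(pickle_path) > 0:
                with open(pickle_path, 'rb') as f:
                    on_disk = pickle.load(f)
            merged = merge(on_disk, in_memory_results)
            with open(pickle_path, 'wb') as f:
                pickle.dump(merged, f)
            return merged
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Because the on-disk state is re-read while the lock is held, two concurrent runs serialize their read-merge-write cycles and neither overwrites the other's results.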

codecov bot commented Aug 11, 2021

Codecov Report

Merging #190 (539020e) into main (e41d9c9) will not change coverage.
The diff coverage is 100.00%.


@@            Coverage Diff            @@
##              main      #190   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           76        76           
  Lines         3109      3140   +31     
=========================================
+ Hits          3109      3140   +31     
Flag Coverage Δ
pytests 100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/ploigos_step_runner/results/step_result.py 100.00% <100.00%> (ø)
src/ploigos_step_runner/results/workflow_result.py 100.00% <100.00%> (ø)
src/ploigos_step_runner/step_runner.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e41d9c9...539020e.

itewk (Contributor) commented Aug 11, 2021

@adamgoossens unless I am reading this wrong, we are only going to be locking the file just before we write. But if we want to prevent one step overwriting another step, we have to:

  1. lock
  2. read
  3. write
  4. unlock

currently this can happen:

  1. psr step 1 - read pickle
  2. psr step 2 - read pickle
  3. psr step 1 - update step results in memory
  4. psr step 2 - update step results in memory
  5. psr step 1 - get lock
  6. psr step 2 - block on lock
  7. psr step 1 - write pickle with psr step 1 updates but not psr step 2 updates since it was read before the lock
  8. psr step 1 - release lock
  9. psr step 2 - get lock
  10. psr step 2 - write pickle with psr step 2 updates, but not psr step 1 updates since it was read before the lock and the psr 1 updates
  11. psr step 2 - release lock
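The lost-update scenario above can be demonstrated without real files: it is an ordinary read-modify-write race. Below is an illustrative stand-in (not PSR code) where a lock-guarded dict plays the role of the on-disk pickle; `update_racy` writes a snapshot taken before the lock (steps 7 and 10 above), while `update_safe` follows the lock-read-write-unlock ordering:

```python
import threading

class ResultStore:
    """Stand-in for the on-disk pickle: a dict guarded by a lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def update_racy(self, key, value, snapshot):
        # BROKEN: writes back a snapshot read *before* the lock was taken,
        # clobbering whatever another step wrote in the meantime.
        with self._lock:
            snapshot[key] = value
            self._data = dict(snapshot)

    def update_safe(self, key, value):
        # CORRECT: lock, (re-)read, write, unlock -- the merge happens
        # on data read while holding the lock.
        with self._lock:
            merged = dict(self._data)
            merged[key] = value
            self._data = merged

    def snapshot(self):
        with self._lock:
            return dict(self._data)
```

With `update_racy`, two steps that each snapshot the empty store and then write will leave only the last writer's key behind; with `update_safe`, both keys survive regardless of interleaving.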

@adamgoossens adamgoossens force-pushed the support-concurrent-pickling branch 2 times, most recently from eb7211e to ead0c87 on August 11, 2021 23:47
adamgoossens (Contributor, Author) commented Aug 11, 2021

The sequence of events is:

  1. StepRunner acquires an exclusive lock on the pickle file.
  2. WorkflowResult.write_to_pickle_file will read the on-disk pickle, add any in-memory StepResult objects that are missing, then pickle to disk.
  3. The YAML file is also written to disk, whilst continuing to hold the pickle lock.
  4. StepRunner releases the exclusive lock.
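The hold-the-lock-across-both-writes sequence can be sketched as a context manager. The names below (`pickle_file_lock` and the YAML-writing call in the commented flow) are illustrative assumptions, not the actual StepRunner API:

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def pickle_file_lock(lock_path):
    """Exclusive advisory lock held for the duration of the with-block."""
    with open(lock_path, 'w') as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)   # 1. acquire exclusive lock
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)  # 4. release on exit

# Illustrative flow matching the four steps above:
#
# with pickle_file_lock('workflow-result.pkl.lock'):
#     workflow_result.write_to_pickle_file(pickle_path)  # 2. merge + pickle
#     write_yaml_file(yaml_path)                         # 3. YAML while locked
# # lock released when the with-block exits
```

Writing the YAML inside the same `with` block guarantees the YAML snapshot matches the pickle it was derived from.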

@adamgoossens adamgoossens marked this pull request as draft August 13, 2021 00:09
@adamgoossens adamgoossens force-pushed the support-concurrent-pickling branch 2 times, most recently from 075d5ef to a742e84 on August 15, 2021 04:06
@adamgoossens adamgoossens marked this pull request as ready for review August 15, 2021 04:15
itewk (Contributor) commented Aug 16, 2021

@adamgoossens it's looking really good. Just a couple of nitpicks.

Without this it's possible for two concurrent PSR runs to overwrite
each other's pickle files on disk. This will result in the artifacts
from one of those runs being lost further down the pipeline.

This ensures that we:
1) re-read the pickle file from disk before writing out the new one,
   merging the on-disk data with our in-memory data.
2) add an exclusive lock around the pickle file to manage concurrent
   access.

We also include a new StepResult.merge method that handles
merging two StepResults together if they have the same step name,
sub-step name and environment. The StepResult passed to merge takes
priority for any duplicate artifact or evidence keys.
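A sketch of how such a merge method might look. The class below is a simplified stand-in based only on the description in this commit message; field names are assumptions, not the actual `StepResult` implementation:

```python
class StepResult:
    """Simplified stand-in for PSR's StepResult, enough to show the merge."""

    def __init__(self, step_name, sub_step_name, environment=None):
        self.step_name = step_name
        self.sub_step_name = sub_step_name
        self.environment = environment
        self.artifacts = {}
        self.evidence = {}

    def merge(self, other):
        """Merge `other` into self; `other` wins on duplicate keys.

        Only StepResults with the same step name, sub-step name and
        environment may be merged.
        """
        if (other.step_name, other.sub_step_name, other.environment) != \
           (self.step_name, self.sub_step_name, self.environment):
            raise ValueError(
                'can only merge StepResults for the same '
                'step, sub-step and environment'
            )
        # dict.update gives the passed-in result priority on duplicates
        self.artifacts.update(other.artifacts)
        self.evidence.update(other.evidence)
```

Letting the passed-in result win on duplicates matches the write path: the freshly produced in-memory result overrides stale on-disk entries for the same key.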

adamgoossens (Contributor, Author)
@itewk the last of the nits was resolved. I also re-added the use of the .lock suffix after discovering a bug: the open() call used to acquire the lock was opening the pickle file itself in write mode, which truncated it on disk.

Otherwise I think the rest is done. Let me know :)
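The truncation bug described above is easy to reproduce: opening a file in `'w'` mode truncates it the moment `open()` returns, before any lock is taken, which is why the lock belongs on a separate `.lock` sidecar file. A minimal sketch (function names are hypothetical, for illustration only):

```python
import fcntl

def acquire_lock_wrong(pickle_path):
    # BUG: mode 'w' truncates pickle_path as soon as open() returns,
    # destroying the on-disk data before the lock is even acquired.
    f = open(pickle_path, 'w')
    fcntl.flock(f, fcntl.LOCK_EX)
    return f

def acquire_lock_right(pickle_path):
    # FIX: lock a separate sidecar file; the pickle itself is untouched
    # until we deliberately rewrite it while holding the lock.
    f = open(pickle_path + '.lock', 'w')
    fcntl.flock(f, fcntl.LOCK_EX)
    return f
```

Closing the returned file object releases the advisory lock.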

@itewk itewk requested a review from dwinchell August 17, 2021 12:12
itewk (Contributor) commented Aug 17, 2021

@adamgoossens thanks so much. Since this is a more involved/core change I would like @dwinchell to give it a look over too.

dwinchell (Contributor) left a comment:

lgtm

@itewk itewk merged commit 0a3c13f into ploigos:main Aug 18, 2021
@adamgoossens adamgoossens deleted the support-concurrent-pickling branch August 18, 2021 21:53
Labels: enhancement (New feature or request)

Successfully merging this pull request may close these issues:

Parallel step processing results in missing step artifacts/evidence later in the pipeline

3 participants