Implement retry logic to fix LFS storage race conditions on Windows #3890
Branch updated from 67c173b to 2770ea9.
I believe that …
I've kicked off another run. We'll see how it goes; in the past, doing that has made everything fail, so hopefully things work better now.
This seems like a sane approach, BTW. The …
Well, this approach was described as an alternative in #3880, but I hoped until the last moment that a more reliable solution would work. However, cruel reality showed that it doesn't. We just have to admit that lock-free atomic file operations without …
Branch updated from 2770ea9 to 16769d3.
Force-pushed updated changes. The days-without-incidents counter is reset to zero, so we need several more days of testing.
Ouch. Turns out …
Branch updated from 16769d3 to 1b31f6a.
Okay, I believe we've tested this hard enough.
…currently on Windows"
This reverts commit 0c8edfc.
Testing showed that while the race condition analysis in git-lfs#3880 was correct, the way it tries to fix the race does not work for the *first* git-lfs process, the one that actually performs the file move. Instead, this commit retries operations on files in LFS storage multiple times. Similar logic is already implemented in the "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages. However, they are internal, so we cannot import them.
Branch updated from 1b31f6a to 662a624.
FYI: we've been running with this fix for almost a month and haven't seen a single failure caused by …
That's great to hear.
Testing showed that while the race condition analysis in #3880 was correct, the way it tries to fix the race
does not work for the first git-lfs process, the one that actually performs the file move.
So, I'm reverting #3880. Instead, this PR retries operations on files in LFS storage multiple times.
Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages. However, they are not public, so we cannot use them.
Marking this PR as a draft because we need 2-3 days of internal testing before we can be sure that it actually fixes the problem.
P.S. If you have better names for the RobustXXX functions, I'm fully open to ideas.