Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement retry logic to fix LFS storage race conditions on Windows #3890

Merged
merged 2 commits into from Nov 5, 2019

Conversation

slonopotamus
Copy link
Contributor

@slonopotamus slonopotamus commented Oct 29, 2019

Testing showed that while race condition analysis in #3880 was correct, the way it tries to fix that
does not work for the first git-lfs process that will actually perform file move.

So, I revert #3880. Instead, this PR performs multiple attempts when working with files in LFS storage.

Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages. However, they are not public, so we cannot use them.

Marking this PR as draft because we need 2-3 days of internal testing before will be sure that it actually fixes the problem.

P.S. If you have better names for RobustXXX functions - I'm fully open for ideas.

@slonopotamus slonopotamus force-pushed the robust-io branch 3 times, most recently from 67c173b to 2770ea9 Compare October 29, 2019 12:07
@slonopotamus
Copy link
Contributor Author

I believe that t-fetch.sh failure on Ubuntu is not related to this PR. See #3891.

@bk2204
Copy link
Member

bk2204 commented Oct 29, 2019

I've kicked off another run. We'll see how that works; trying to do that in the past has made everything fail, so hopefully things are working better now.

@bk2204
Copy link
Member

bk2204 commented Oct 29, 2019

This seems like a sane approach, BTW. The Robust prefix is fine with me.

@slonopotamus
Copy link
Contributor Author

slonopotamus commented Oct 29, 2019

Well, this approach was described as an alternative in #3880 but I hoped to the last moment that more reliable stuff will work. However, cruel reality showed that it won't. We just have to admit that lock-free atomic file operations without SHARING_VIOLATION are impossible on Windows.

@slonopotamus
Copy link
Contributor Author

slonopotamus commented Oct 31, 2019

Force-pushed updated changed. Days-without-incidents counter reset to zero, need several days of testing again.

@slonopotamus
Copy link
Contributor Author

Ouch. Turns out errors.Is is a Go-1.13 feature. Looking how to achieve the same on older versions...

@slonopotamus
Copy link
Contributor Author

Okay, I believe we've tested this hard enough

Testing showed that while race condition analysis in git-lfs#3880 was correct, the way it tries to fix that
does not work for the *first* git-lfs process that will actually perform file move.

Instead, this commit performs multiple attempts when working with files in LFS storage.

Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages.
However, they are not public, so we cannot use them.
@bk2204 bk2204 merged commit 5c5e200 into git-lfs:master Nov 5, 2019
@slonopotamus slonopotamus deleted the robust-io branch November 6, 2019 05:18
@slonopotamus
Copy link
Contributor Author

FYI: we're running for almost a month with this fix and haven't seen a single failure caused by SHARING_VIOLATION.

@bk2204
Copy link
Member

bk2204 commented Dec 2, 2019

That's great to hear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants