Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop replacing files in LFS storage when downloading them concurrently on Windows #3880

Merged
merged 3 commits into from
Oct 25, 2019

Conversation

slonopotamus
Copy link
Contributor

On Windows, there is no way to replace file atomically. Instead, MoveFileExA:

  1. Calls CreateFileA(access=Delete, shareMode=Delete)
  2. Calls SetRenameInformationFile to rename file
  3. Calls CloseFile

The problem is that if parallel process attempts to open destination file for reading between
steps 2 and 3, it will try to do that without shareMode=Delete and will hit a SHARING_VIOLATION error.

In practice, this race condition results in:

Smudge error: Error opening media file.: open .git\lfs\objects<sha>: The process cannot access the file because it is being used by another process.

There are two solutions here:

  1. Do not overwrite file if it already exists
  2. Retry reading from file if got a SHARING_VIOLATION

This commit implements option 1.

Fixes #2825 (for Windows, other OSes are race-free already). Also see #3813 and #3826.

For an unknown reason, sometimes libintl.h is missing on macOS when it
should not be.  Try harder to fix this by reinstalling the packages
which are already allegedly installed and forcing an unlink and re-link.
Apparently our Windows CI is no longer happy with us setting PATH since
it can't recognize "%PATH%" as something to be handled.  Since this
isn't strictly necessary, let's remove it.
@slonopotamus
Copy link
Contributor Author

FTR, here's an example program (Java) that reproduces described SHARING_VIOLATION race:

import java.io.File;
import java.io.FileInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {

  public static void main(String[] args) {
    final int processors = Runtime.getRuntime().availableProcessors();
    final ExecutorService executorService = Executors.newFixedThreadPool(processors);

    final File tmpDir = new File("tmp");

    if (!tmpDir.isDirectory() && !tmpDir.mkdirs())
      System.exit(1);

    final Path dest = new File(tmpDir, "dest").toPath();

    for (int i = 0; i < processors; ++i)
      executorService.submit(() -> {
        while (true) {
          final Path file = File.createTempFile("bla", "", tmpDir).toPath();
          try {
            Files.move(file, dest, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
          } catch (Exception e) {
            Files.delete(file);
          }

          TimeUnit.MILLISECONDS.sleep(10);

          try (FileInputStream in = new FileInputStream(dest.toFile())) { // Fails here
            // noop, we just want to open file
          } catch (Exception e) {
            e.printStackTrace();
            System.exit(3);
          }
        }
      });
  }
}

…y on Windows

On Windows, there is no way to replace file atomically. Instead, MoveFileExA:

1. Calls CreateFileA(access=Delete, shareMode=Delete)
2. Calls SetRenameInformationFile to rename file
3. Calls CloseFile

The problem is that if parallel process attempts to open destination file for reading between
steps 2 and 3, it will try to do that without shareMode=Delete and will hit a SHARING_VIOLATION error.

In practice, this race condition results in:

  Smudge error: Error opening media file.: open .git\lfs\objects\<sha>: The process cannot access the file because it is being used by another process.

There are two solutions here:

 1. Do not overwrite file if it already exists
 2. Retry reading from file if got a SHARING_VIOLATION

This commit implements option 1.

Fixes git-lfs#2825 (for Windows, other OSes are race-free already). Also see git-lfs#3813 and git-lfs#3826.
@slonopotamus
Copy link
Contributor Author

Rebased on top of #3878 to pass CI.

Copy link
Member

@bk2204 bk2204 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this seems fine. Thanks for the patch.

@bk2204 bk2204 merged commit 1fdc392 into git-lfs:master Oct 25, 2019
@slonopotamus slonopotamus deleted the rename-no-replace-v2 branch October 25, 2019 17:07
slonopotamus added a commit to slonopotamus/git-lfs that referenced this pull request Oct 29, 2019
Testing showed that while race condition analysis in git-lfs#3880 was correct, the way it tries to fix that
does not work for the *first* git-lfs process that will actually perform file move.

Instead, this commit performs multiple attempts when working with files in LFS storage.

Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages.
However, they are not public, so we cannot use them.
slonopotamus added a commit to slonopotamus/git-lfs that referenced this pull request Oct 29, 2019
…eplace-v2"

This reverts commit 1fdc392, reversing
changes made to 9c781f3.
slonopotamus added a commit to slonopotamus/git-lfs that referenced this pull request Oct 29, 2019
Testing showed that while race condition analysis in git-lfs#3880 was correct, the way it tries to fix that
does not work for the *first* git-lfs process that will actually perform file move.

Instead, this commit performs multiple attempts when working with files in LFS storage.

Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages.
However, they are not public, so we cannot use them.
slonopotamus added a commit to slonopotamus/git-lfs that referenced this pull request Oct 29, 2019
Testing showed that while race condition analysis in git-lfs#3880 was correct, the way it tries to fix that
does not work for the *first* git-lfs process that will actually perform file move.

Instead, this commit performs multiple attempts when working with files in LFS storage.

Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages.
However, they are not public, so we cannot use them.
slonopotamus added a commit to slonopotamus/git-lfs that referenced this pull request Oct 29, 2019
Testing showed that while race condition analysis in git-lfs#3880 was correct, the way it tries to fix that
does not work for the *first* git-lfs process that will actually perform file move.

Instead, this commit performs multiple attempts when working with files in LFS storage.

Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages.
However, they are not public, so we cannot use them.
slonopotamus added a commit to slonopotamus/git-lfs that referenced this pull request Oct 29, 2019
Testing showed that while race condition analysis in git-lfs#3880 was correct, the way it tries to fix that
does not work for the *first* git-lfs process that will actually perform file move.

Instead, this commit performs multiple attempts when working with files in LFS storage.

Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages.
However, they are not public, so we cannot use them.
slonopotamus added a commit to slonopotamus/git-lfs that referenced this pull request Oct 31, 2019
Testing showed that while race condition analysis in git-lfs#3880 was correct, the way it tries to fix that
does not work for the *first* git-lfs process that will actually perform file move.

Instead, this commit performs multiple attempts when working with files in LFS storage.

Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages.
However, they are not public, so we cannot use them.
slonopotamus added a commit to slonopotamus/git-lfs that referenced this pull request Oct 31, 2019
Testing showed that while race condition analysis in git-lfs#3880 was correct, the way it tries to fix that
does not work for the *first* git-lfs process that will actually perform file move.

Instead, this commit performs multiple attempts when working with files in LFS storage.

Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages.
However, they are not public, so we cannot use them.
slonopotamus added a commit to slonopotamus/git-lfs that referenced this pull request Nov 5, 2019
Testing showed that while race condition analysis in git-lfs#3880 was correct, the way it tries to fix that
does not work for the *first* git-lfs process that will actually perform file move.

Instead, this commit performs multiple attempts when working with files in LFS storage.

Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages.
However, they are not public, so we cannot use them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

shared lfs.storage is not threadsafe
2 participants