Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: 'Access is denied' when renaming module cache directory #36568

Open
kiwionly opened this issue Jan 15, 2020 · 45 comments
Open

cmd/go: 'Access is denied' when renaming module cache directory #36568

kiwionly opened this issue Jan 15, 2020 · 45 comments

Comments

@kiwionly
Copy link

@kiwionly kiwionly commented Jan 15, 2020

What version of Go are you using (go version)?

13.6

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

Window 10, intel

go env Output
$ go env

What did you do?

I had a mod file which is 1.13

When I run go build, it download the dependency, however, it end with error like this. e.g.

cannot load golang.org/x/net/html: rename C:\Users\502380\go\pkg\mod\golang.org\x\net@v0.0.0-20190620200207-3b0461eec859.tmp-839788141 C:\Users\502380\go\pkg\mod\golang.org\x\net@v0.0.0-20190620200207-3b0461eec859:

What did you expect to see?

build should successful without any error.

What did you see instead?

#This is running in jetbrain goland

go: extracting golang.org/x/net v0.0.0-20190620200207-3b0461eec859
build command-line-arguments: cannot load golang.org/x/net/html: rename C:\Users\502380\go\pkg\mod\golang.org\x\net@v0.0.0-20190620200207-3b0461eec859.tmp-839788141 C:\Users\502380\go\pkg\mod\golang.org\x\net@v0.0.0-20190620200207-3b0461eec859: Access is denied.

@kiwionly kiwionly changed the title Cannot rename dependency mod from name@version.tmp-xxx to name@version Cannot build: when rename dependency mod from name@version.tmp-xxx to name@version, fail Jan 15, 2020
@cagedmantis cagedmantis changed the title Cannot build: when rename dependency mod from name@version.tmp-xxx to name@version, fail cmd/go: Cannot build: when rename dependency mod from name@version.tmp-xxx to name@version, fail Jan 15, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Jan 16, 2020
@cagedmantis cagedmantis changed the title cmd/go: Cannot build: when rename dependency mod from name@version.tmp-xxx to name@version, fail cmd/go: cannot build when rename dependency mod from name@version.tmp-xxx to name@version Jan 16, 2020
@cagedmantis

This comment has been minimized.

Copy link
Contributor

@cagedmantis cagedmantis commented Jan 16, 2020

@kiwionly Could you provide an example of the code that produced the error, preferably as a Go playground link?

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Jan 16, 2020

Seems like the same issue as microsoft/vscode-go#2897. We've never been able to reproduce this.

Could you post a listing of the directory C:\Users\502380\go\pkg\mod\golang.org\x\?
Are you able to reproduce this or did it just happen once? How about after clearing the module cache with go clean -modcache?

@jayconrod jayconrod changed the title cmd/go: cannot build when rename dependency mod from name@version.tmp-xxx to name@version cmd/go: ' Jan 16, 2020
@jayconrod jayconrod changed the title cmd/go: ' cmd/go: 'Access is denied' when renaming module cache directory Jan 16, 2020
@kiwionly

This comment has been minimized.

Copy link
Author

@kiwionly kiwionly commented Jan 17, 2020

@cagedmantis @jayconrod I am doing go build with module using Jetbrain Goland, some how command prompt also able to produce the error.

Note that my application contain a lot of dependency, and i need to rerun > go build , sometime it success, sometime it will fail with error message below ( depend on the package, some package will have issue, then I run go build again, no more issue/)

Here is the error when running in command prompt using go get, but actually the error is same ( except go build is loading a lot of dependency compare to go get below, but github.com/go-playground/locales is the one that always fail )

C:\Users\502380\GolandProjects\myproject>go get github.com/go-playground/universal-translator
go: finding github.com/go-playground/universal-translator v0.17.0
go: downloading github.com/go-playground/universal-translator v0.17.0
go: extracting github.com/go-playground/universal-translator v0.17.0
go: downloading github.com/go-playground/locales v0.13.0
go: extracting github.com/go-playground/locales v0.13.0
..\..\go\pkg\mod\github.com\go-playground\universal-translator@v0.17.0\errors.go:7:2: rename C:\Users\502380\go\pkg\mod\github.com\go-playground\locales@v0.13.0.tmp-022886503
 C:\Users\502380\go\pkg\mod\github.com\go-playground\locales@v0.13.0: Access is denied.
@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Jan 22, 2020

For the rename of that directory, we use a separate lockfile:

unlock, err := lockVersion(mod)

// Check whether the directory was populated while we were waiting on the lock.
fi, err = os.Stat(dir)

if err := os.Rename(tmpDir, dir); err != nil {

So the only possibilities I can see for this failure are:

  1. A bug in our lockedfile.Mutex or os.Rename implementation is causing the file-locks to not actually synchronize the other I/O operations.
  2. Something (perhaps Goland, or perhaps an antivirus scanner?) is accessing one or both of the directories to be renamed without holding the associated file lock.
  3. A bug in the Windows kernel or filesystem implementation is causing the file-locks to not actually synchronize the other I/O operations.

Of course, it is also possible that I have missed something.

For cause (1), we should fix our implementation.

For cause (2) or (3), we should ensure that a bug is filed against the appropriate other project, and then we could perhaps use robustio.Rename instead of os.Rename if we believe that will help. However, I would want some way to empirically reproduce and measure the problem and the mitigation to verify that it actually does help.

CC @jstarks @alexbrainman @zx2c4 @networkimprov in case they have any insights into how to debug this.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Jan 22, 2020

@bcmills is this the relevant WinAPI call for lockedfile.Mutex? How is it used here?
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-lockfileex

Since the error is intermittent, it seems like retrying os.Rename() might be the solution. Unrelated directory updates could have saturated the NTFS journal (@jstarks is that a reasonable guess?)

A way to test that might be a set of goroutines, each repeatedly renaming a separate file between two unique names in the same directory; there's no possibility of name collisions, only concurrent directory updates.

PS: If that's the problem, I don't think it's a bug in NTFS per se...

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Jan 22, 2020

Yes, lockedfile.Mutex does use LockFileEx internally. (The implementation is in src/cmd/go/internal/lockedfile/internal/filelock/filelock_windows.go.)

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Jan 22, 2020

What would it mean to saturate the NTFS journal? If the program is outpacing the buffering capacity of the filesystem, I would expect windows.MoveFileEx to block until the buffer clears far enough to enqueue the operation...

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Jan 22, 2020

My guess about NTFS may be wrong. It's the only reason I could imagine for unexplained rename failures. My test concept above might prove it...

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Jan 22, 2020

Some info on NTFS journaling is here, tho not linked to a particular Windows version. It describes "Log Full", but isn't clear about the end result for filesystem ops which encounter it. Some are retried, but not all? I can't tell.

http://www.ntfs.com/transaction.htm

@alexbrainman

This comment has been minimized.

Copy link
Member

@alexbrainman alexbrainman commented Jan 23, 2020

CC @jstarks @alexbrainman @zx2c4 @networkimprov in case they have any insights into how to debug this.

I don't have any good ideas.

I am not familiar with that code. And I don't have free time to learn the code to debug it.

Alex

@bcmills bcmills removed the DevExp label Jan 23, 2020
@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Jan 28, 2020

@bcmills here's a test app which repeatedly renames a set of files in a single directory. It defaults to 10 concurrent renaming loops, and a command line argument lets you pick the quantity.

I tried this using Go 1.13.3 on Win7 with 10, 33, 100, and 250 loops, but saw no errors. Maybe it behaves differently on Win10, or a higher loop count is required? Maybe it should also write files, not just change directory entries? Or maybe it disproves my hypothesis :-)

I watched the Task Manager while it ran, and noticed that SearchIndexer.exe also consumes a lot of cpu time.

package main

import ( "fmt"; "os"; "strconv" )

const kDir = "zwinrnd/"

func main() {
   var err error
   aGr := 10
   if len(os.Args) > 1 {
      aGr, err = strconv.Atoi(os.Args[1])
      if err != nil { panic(err) }
   }
   err = os.MkdirAll(kDir, 0700)
   if err != nil { panic(err) }
   for a := 0; a < aGr; a++ {
      go func(c int) {
         cA, cB := 'a', 'b'
         cFd, err := os.Create(kDir + fmt.Sprintf("%d-%c", c, cA))
         if err != nil && !os.IsExist(err) { panic(err) }
         cFd.Close()
         for {
            err = os.Rename(kDir + fmt.Sprintf("%d-%c", c, cA),
                            kDir + fmt.Sprintf("%d-%c", c, cB))
            if err != nil {
               fmt.Println(err)
            } else {
               if cA == 'a' { cA, cB = 'b', 'a'
               } else       { cA, cB = 'a', 'b' }
            }
         }
      }(a)
   }
   select {}
}
@qmuntal

This comment has been minimized.

Copy link

@qmuntal qmuntal commented Feb 11, 2020

I see this error many times in my Windows 10 workstation when downloading dependencies that are not cached locally, and it also happen to all my colleagues. In our case it seems to be related to the on-access scanner enterprise antivirus detecting as a threat the go modules cache renames. It is happening since the adoption of modules on Go 1.11.

The only solution I've found is to create an scanning exception for the go/pkg/mod folder that I activate when this error occurs.

@alfmos

This comment has been minimized.

Copy link

@alfmos alfmos commented Feb 11, 2020

I don’t know if this can be useful. I posted on vscode-go a way to reproduce the issue.
microsoft/vscode-go#2897

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Feb 12, 2020

@alfmos, do you have repro steps that do not involve indirection through VSCode? Ideally we should have a regression test that we can check in to the main go repository and run on the Go project's builders, which means that it cannot depend on going through a specific third-party IDE.

@alfmos

This comment has been minimized.

Copy link

@alfmos alfmos commented Feb 18, 2020

@bcmills I've just tried from cmd and I get the error.

go version: 1.13.7 windows/amd64
go env:
C:\Users\alfonso moscato>go env
set GO111MODULE=on
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\alfonso moscato\AppData\Local\go-build
set GOENV=C:\Users\alfonso moscato\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=d:\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=c:\go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=c:\go\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=NUL
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=d:\temp\go-build604923562=/tmp/go-build -gno-record-gcc-switches

Steps to reproduce the issue:
C:\Users\alfonso moscato>go get -v github.com/mdempsky/gocode
go: finding github.com/mdempsky/gocode latest
go: finding golang.org/x/tools latest
go: extracting golang.org/x/tools v0.0.0-20200216192241-b320d3a0f5a2
build github.com/mdempsky/gocode: cannot load golang.org/x/tools/go/gcexportdata: rename d:\go\pkg\mod\golang.org\x\tools@v0.0.0-20200216192241-b320d3a0f5a2.tmp-116032742 d:\go\pkg\mod\golang.org\x\tools@v0.0.0-20200216192241-b320d3a0f5a2: Access is denied.

Tell me what data you need and I'll collect them.
I could let you inspect my pc from remote if needed

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Feb 18, 2020

I spent some more timing looking at this today. Based on the MoveFileEx documentation and various threads on Windows Q&A sites and mailing list, I think the most likely explanation for this issue is that another process (probably antivirus or search indexer) is opening files in the module directory before it's completely extracted.

Here's our normal sequence of operations at a high level:

  • os.Stat the module version directory, for example, %GOPATH%\pkg\mod\golang.org\pkg\mod@v0.2.0. If it exists, we assume it's complete.
  • Download the module version's zip file, if it's not already in the cache.
  • os.OpenFile on the module version's lock file, possibly creating it.
  • LockFileEx to lock the lock file.
  • os.Stat the module version directory again. If it exists, another process got the lock and extracted the module zip first, so we assume the directory is complete.
  • os.RemoveAll on temporary module version directories with a ".tmp-" suffix. These were created by other processes which failed. We hold the lock for this version, so these directories should not be actively used. Errors are ignored here.
  • Create a temporary module version directory with a ".tmp-" suffix. Extract the module zip file there.
  • os.Rename the temporary module version directory to its final location. This calls MoveFileEx, which is where we're encountering the error.
  • Make the directory and its contents read-only.
  • UnlockFileEx on the lock file.
  • Close on the lock file.

We're getting ERROR_ACCESS_DENIED on the os.Rename call. I've verified this error is reported when os.Rename is called on a directory when another process has a file open in that directory. So while I can't confirm this is what's happening, it seems likely. Note that by default, the module cache is not in a hidden directory: %USERPROFILE%\pkg\mod will probably be read by search indexers, backup tools, and anything else that watches the user's home directory.

I thought about an alternate flow where we extract a module version to its final location, then create a special file indicating everything has been extracted. However, if we encounter an error extracting the module, we would need to call os.RemoveAll on the partial directory the next time, and that would fail if other processes have files open. So it's just a different kind of flaky.


I don't think we can fix this, but some mitigations are possible.

First, we can check if the error is ERROR_ACCESS_DENIED and add some information to the error message for this specific case.

Second, we can retry os.Rename a few times before reporting an error. It looks like cmd/go/internal/robustio.Rename can already do this.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Feb 19, 2020

@alfmos I think you're suggesting to use os.Link() to make the directory appear with the correct name, then os.Remove() to clear the temporary name. That method would work for a file, but not a directory.

@alfmos

This comment has been minimized.

Copy link

@alfmos alfmos commented Feb 19, 2020

Sorry, I assumed there was os.Copy, that actually doesn't exist.
My point is, apart from the function to use, to copy of all files from temp directory to the final one, instead of renaming the directory.
This would eliminate the access denied problem, as the access would be read-only.
Of course this woud waste some space because there would be two copies of the files before the temp directory is deleted, but wouldn't stop the installation if there is another process reading the file.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Feb 19, 2020

If the copy failed for any reason, you'd be left with a corrupt module. Using os.Rename() prevents that.

If calling os.File.Sync() doesn't fix it, the solution is to retry os.Rename() until it works.

@alfmos

This comment has been minimized.

Copy link

@alfmos alfmos commented Feb 19, 2020

If the copy failed for any reason, you'd be left with a corrupt module. Using os.Rename() prevents that.
You're right, I didn't think of that.

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Feb 19, 2020

@networkimprov Could you explain more about why os.File.Sync would help with the ERROR_ACCESS_DENIED problem? I'm not that familiar with the pitfalls of NTFS and the Windows API, so I feel like I'm missing something. Coming from Linux / Darwin, I'd expect that to help avoid corruption in the case of power loss, but it doesn't seem like it would guard against other processes opening files.

I wrote a couple small test programs: one extracts a small module zip, then renames and removes the directory. The other repeatedly walks the directory and reads files. I was able to get an ERROR_ACCESS_DENIED pretty quickly.

I tried adding os.File.Sync in golang.org/x/mod/zip, but I still saw ERROR_ACCESS_DENIED in this case.

A quick benchmark showed that adding the os.File.Sync made a combined unzip-and-remove operation take 2.78x as long as before. So I'd rather not add it unless it's absolutely necessary.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Feb 19, 2020

Which op returned access-denied, remove or rename? I assume you're calling os.RemoveAll()? I'm not sure why you're testing that.

I suggested os.File.Sync() because I use it and haven't seen such errors, tho maybe I haven't tested my app widely enough.

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Feb 19, 2020

Which op returned access-denied, remove or rename? I assume you're calling os.RemoveAll()? I'm not sure why you're testing that.

os.Rename fails. os.RemoveAll is just there to clean up the temporary directory after a successful rename.

I suggested os.File.Sync() because I use it and haven't seen such errors, tho maybe I haven't tested my app widely enough.

I don't understand why os.File.Sync would guard against ERROR_ACCESS_DENIED though. Have you seen ERROR_ACCESS_DENIED without using os.File.Sync?

We haven't been able to reproduce this problem with the go command, so it seems fairly rare, or at least dependent on other software running on the system we don't know about. I've only been able to reproduce the error with synthetic examples.

@alexbrainman

This comment has been minimized.

Copy link
Member

@alexbrainman alexbrainman commented Feb 20, 2020

Based on the MoveFileEx documentation and various threads on Windows Q&A sites and mailing list, I think the most likely explanation for this issue is that another process (probably antivirus or search indexer) is opening files in the module directory before it's completely extracted.

That sounds reasonable. Especially given that you have small example program that reproduces the problem.

I googled for MoveFileEx ERROR_ACCESS_DENIED and I found

https://habr.com/ru/post/450700/

While it is not exactly the same issue, but similar. It talks about

The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED.

But the article gave me an idea. The author used Process Monitor

https://docs.microsoft.com/en-us/sysinternals/downloads/procmon

to monitor all syscalls that are happening to his file. We could do the same.

@alfmos you could use Process Monitor to log all accesses to files with path starting with d:\go\pkg\mod\golang.org\x\tools@v0.0.0 or similar. This will tell you which processes (Go tools and other programs) access yor file in question and how. Especially the bit where the failure occurs. Please try it.

I don't think we can fix this,

Why not? The problem is that you use MoveFileEx, and it does not work for your use case. Surely there are different ways to design your cache storage process - the way that does not require moving files.

Alex

@alfmos

This comment has been minimized.

Copy link

@alfmos alfmos commented Feb 20, 2020

Logfile_access_denied.xlsx
access_denied
I used Process Monitor and I can confirm that files are opened for scan by KAspersky antivirus and Cybereason ransomfree.
As a countercheck, I disabled both and the installation went fine.
I have attached a process monitor printscreen and an Excel file with all accesses to files.
Regarding the solution, maybe it's only a matter of waiting.
Both tools open new files for scan when they are created, but just once.
After the scan is ended files can be moved without issues.

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Feb 20, 2020

@alfmos Thanks! It's really good to confirm this is the root cause.

@alexbrainman You're right, there's probably a way to do this that doesn't involve renaming.

In an earlier comment, I was thinking about extracting the zip file in place (while holding a lock, which we already do), then creating an empty file (let's call it a .extracted file) to indicate the directory is complete and the zip has been fully extracted.

If we later find that the module was only partially extracted by an earlier run (that is, the directory is present but the .extracted file is not), it's prudent to remove the directory before unzipping the module again. zip.Unzip requires this at the moment, though it could be changed if needed. I was worried that os.RemoveAll would fail if any files were open in that directory, and that seems to be the case, but robustio.RemoveAll works well enough. I didn't see any errors using that in a synthetic test.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Feb 20, 2020

The “extract in place” approach unfortunately has an awkward migration path: folks' existing module caches will not include .extracted files — so older versions of the go command will not know to wait for them, and at the version transition every archive will be re-extracted.

(Maybe that's worth the fix, though.)

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Feb 20, 2020

Yeah, I'm not wild about the migration.

Maybe we could compare a directory to the zip file and check that it's fully extracted instead of deleting and re-extracting it. It's not clear that's actually less file I/O though.

@alfmos

This comment has been minimized.

Copy link

@alfmos alfmos commented Feb 20, 2020

Yeah, I'm not wild about the migration.

Maybe we could compare a directory to the zip file and check that it's fully extracted instead of deleting and re-extracting it. It's not clear that's actually less file I/O though.

Why not a hash function? This would guarantee also against data corruption.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Feb 20, 2020

Hashing is very expensive compared to stat.

We do want to ensure that the go command doesn't introduce data corruption in the course of normal operation, but we intentionally do not make it go out of its way to detect corruption introduced by other sources.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Feb 20, 2020

Before starting the unzip, you could create a marker file, then delete it after the unzip. You might need to loop the delete tho. The easiest fix is probably to loop os.Rename.

@nelidi

This comment has been minimized.

Copy link

@nelidi nelidi commented Feb 21, 2020

I've been struggling with the same issue for a couple of days. First I failed to build a go project on WSL (Ubuntu) but after that the same occurred on the Windows itself.
I use another antivirus software than the mentioned above but still notice that its process occupies some files under /pkg/mod directory while go mod command is in place.
Unfortunately I can't disable it, so I am looking forward to the solution, or at least another workaround.

@alexbrainman

This comment has been minimized.

Copy link
Member

@alexbrainman alexbrainman commented Feb 23, 2020

In an earlier comment, I was thinking about extracting the zip file in place (while holding a lock, which we already do), then creating an empty file (let's call it a .extracted file) to indicate the directory is complete and the zip has been fully extracted.

I suspect the problem here is that we re-create file / directory with the same path after it has been removed. For example

  1. we create c:\a.txt
  2. we delete c:\a.txt
  3. we create c:\a.txt again (or move to c:\a.txt - it does not matter what we do)

If external antivirus program opens c:\a.txt between 1 and 2 and keeps the file open until after 3, then 2 still succeeds. But the file is not deleted immediately - delete operation is delayed until all programs (including antivirus) close the file. And the file will still be opened by anti-virus at 3. That is why 3 fails - because this file is meant to be deleted, but now it has been created again - Windows is confused, so it just fails creation.

I am pretty sure, this is exactly what is happening in #25965. #25965 happens while we re-creating executable file. Same exe file is deleted, and then re-created with the same name. If any anti-virus opens the file, we will have the same problem.

But at least here, we can avoid the problem by using different design. Maybe we could download and extract zip file into a randomly named directory. And rename it to proper name only at the very end. Would this work?

Or maybe we could download into directories with random names, and use some index file to refer to this directory.

Or something different.

I was worried that os.RemoveAll would fail if any files were open in that directory, and that seems to be the case, but robustio.RemoveAll works well enough. I didn't see any errors using that in a synthetic test

I suspect, if you avoid recreating files / directories, you won't even need robustio.RemoveAll.

Hashing is very expensive compared to stat.

Windows implementation of stat is far from trivial. And syscalls it uses might be quite expensive to run.

Alex

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Feb 24, 2020

But at least here, we can avoid the problem by using different design. Maybe we could download and extract zip file into a randomly named directory. And rename it to proper name only at the very end. Would this work?

That's what cmd/go currently does for module cache directories.

I think the problem is that antivirus and other tools are walking these directories and opening files as they are created, before they are renamed to their final locations.

Or maybe we could download into directories with random names, and use some index file to refer to this directory.

That would work. It makes the module cache a bit more opaque for users though. I'd like to try creating a marker file (.incomplete or .extracted) first.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Feb 24, 2020

First we should retry the rename a good number of times; that's proven to work in other cases.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Feb 25, 2020

Change https://golang.org/cl/220978 mentions this issue: cmd/go/internal/modfetch: retry rename for unzipped directories

gopherbot pushed a commit that referenced this issue Feb 26, 2020
No test because this is difficult to reproduce, and such a test would
always be flaky.

Updates #36568

Change-Id: I8170410a7729ecc6f90baf8005444d6b1241185e
Reviewed-on: https://go-review.googlesource.com/c/go/+/220978
Run-TryBot: Jay Conrod <jayconrod@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Feb 26, 2020

Ok, in the interest of trying a simple solution before a complicated solution, https://golang.org/cl/220978 retries the os.Rename a few times with a short timeout.

Anyone who's affected by this, please try out this fix (maybe using gotip) and let me know if you still see the issue.

@qmuntal

This comment has been minimized.

Copy link

@qmuntal qmuntal commented Feb 26, 2020

In my case (using MCAfee), gotip does retries on failure but still fails after one second.

The command I've executed is gotip get golang.org/x/tools/gopls@latest.

image

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Feb 26, 2020

@qmuntal Too bad. Thanks for confirming though.

Time to try the more complicated solution.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Feb 26, 2020

Change https://golang.org/cl/221157 mentions this issue: cmd/go: extract module zip files in place

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Feb 26, 2020

The rename is retried only 6 times? If retrying longer would delay progress on other modules, could you spin up a goroutine to retry os.Rename() and then wait on them before exiting?

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Feb 26, 2020

It's retried at random intervals for up to 500 ms. The timeout is arbitrary, but we don't want to block indefinitely.

Downloads are already done in parallel using a worker pool, so this shouldn't delay progress on other modules.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Feb 26, 2020

500ms probably isn't enough time for an anti-malware tool to vet every file in a large directory.

EDIT: I'd probably keep trying for 30s.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
10 participants
You can’t perform that action at this time.