New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dep init freeze for hours #947

Closed
cpapidas opened this Issue Aug 4, 2017 · 50 comments

Comments

Projects
None yet
@cpapidas
Copy link

cpapidas commented Aug 4, 2017

What version of Go (go version) and dep (git describe --tags) are you using?

go version go1.8 linux/amd64
dep latest version

What dep command did you run?

$ dep init -v
No Output

What did you expect to see?

The new files
*Gopkg.toml
*Gopkg.lock
*vendor(folder)

What did you see instead?

Nothing the command stuck there for hours and I have to press ctr+c to exit and have back the terminal.

@sdboyer

This comment has been minimized.

Copy link
Member

sdboyer commented Aug 4, 2017

hi! thanks for the issue. this is quite odd.

did you run this on an existing project, or something new? if the former, what if any tool were you using previously? is this reproducible?

no places in dep jump to mind where it could hang prior to generating any output. unless...@matjam, maybe a lockfile thing?

strace output would also probably help enormously to clear this up.

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 5, 2017

might be.

maybe I should add some stderr logging if its waiting for >60 seconds .. at least until you figure out the notification framework thing.

@jonahfang

This comment has been minimized.

Copy link

jonahfang commented Aug 5, 2017

I have the same problem! I use go in docker container and it is a very simple program.

@sdboyer

This comment has been minimized.

Copy link
Member

sdboyer commented Aug 5, 2017

@matjam yeah, i'll relent and break my rule a bit more - we can add some print statements around the lock. probably fire one right away if it decides we have to wait, then...maybe one every 10s or so thereafter?

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 5, 2017

@sdboyer okay, I'll give you a PR with something that does that. I'll use 15s if only to be contrary.

We are just assuming that it's the locking thing, but, it's the most likely culprit.

I did do some research; the locking library is mostly correct in it's behaviour; there is a potential race there, but I've not been able to quantify what the real risk is of hitting it.

What is the addage? Race conditions are so rare that they always happen? Something like that.

matjam added a commit to matjam/dep that referenced this issue Aug 5, 2017

Added warning message to the lockfile waiting routine so that if we are
waiting for something, we at least print a message to stderr about it.

Hopefully will make situations like what is described in golang#947 obvious.
@sdboyer

This comment has been minimized.

Copy link
Member

sdboyer commented Aug 7, 2017

ok, we have the warning message printing now when the lockfile is busy. @jonahfang, @cpapidas - could you update and see if you're still experiencing the indefinite hang?

@jonahfang

This comment has been minimized.

Copy link

jonahfang commented Aug 7, 2017

@sdboyer It keep saying and still hang:

waiting for lockfile /go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 7, 2017

@jonahfang OK, can we get some more details about the system?

  • What kernel/distribution?
  • Can you show us the mount options (output of mount)?
  • Is this being run inside a container?
  • is there an existing sm.lock file?

@sdboyer Its this code in the lockfile package:

	// return value intentionally ignored, as ignoring it is part of the algorithm
	_ = os.Link(tmplock.Name(), name)

	fiTmp, err := os.Lstat(tmplock.Name())
	if err != nil {
		return err
	}
	fiLock, err := os.Lstat(name)
	if err != nil {
		// tell user that a retry would be a good idea
		if os.IsNotExist(err) {
			return ErrNotExist
		}
		return err
	}

So it's failing to link.

@nightlyone any ideas?

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 7, 2017

@jonahfang also, can you try the following

$ go get github.com/nightlyone/lockfile
$ cd $GOPATH/src/github.com/nightlyone/lockfile
$ go test

you should see output like this:

$ go test
PASS
ok  	github.com/nightlyone/lockfile	0.011s
@jonahfang

This comment has been minimized.

Copy link

jonahfang commented Aug 7, 2017

@matjam , the test result of nightlyone/lockfile as follows:

/go/src/github.com/nightlyone/lockfile $ go test
Error locking lockfile:  link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.397504553 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestBasicLockUnlock (0.00s)
--- FAIL: TestRogueDeletion (0.00s)
        lockfile_test.go:129: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.695546947 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRogueDeletionDeadPid (0.00s)
        lockfile_test.go:158: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.451082950 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRemovesStaleLockOnDeadOwner (0.01s)
        lockfile_test.go:202: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.451935720 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestInvalidPidLeadToReplacedLockfileAndSuccess (0.01s)
        lockfile_test.go:231: unexpected error: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.088922970 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
FAIL
exit status 1
FAIL    github.com/nightlyone/lockfile  0.043s
@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 7, 2017

@jonahfang yeah something is wrong with that filesystem.

still need to see output of "mount"

@jonahfang

This comment has been minimized.

Copy link

jonahfang commented Aug 7, 2017

I am being run inside a container, there is not an existing sm.lock file. I use image registry.cn-hangzhou.aliyuncs.com/fangzx/govim_1.0.

@jonahfang

This comment has been minimized.

Copy link

jonahfang commented Aug 7, 2017

I run docker image in MacBookPro using Docker Toolbox:

docker run \
  --rm \
  -it \
  --name godemo \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $PWD:/go \
  -v $PWD/.vim:/home/dev \
  registry.cn-hangzhou.aliyuncs.com/fangzx/govim_1.0
@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 7, 2017

My guess is your container doesn't allow writes to the filesystem for whatever reason?

@jonahfang

This comment has been minimized.

Copy link

jonahfang commented Aug 7, 2017

But I use glide and godep well.

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 7, 2017

But when it's not in a container, it works fine. So...

@jonahfang

This comment has been minimized.

Copy link

jonahfang commented Aug 7, 2017

I use glide and godep in the same docker container well.

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 7, 2017

That really means nothing, just because other things work, doesn't mean the container isn't broken, or the way you're using it isn't broken, or that there isn't something wrong with your kernel, or ...

That "operation not permitted" message is coming from the kernel. It's a response to a system call for doing file operations on /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.397504553

if the kernel is saying "no" that's a problem with either the container you're using, or docker, or something else on your system.

@jonahfang

This comment has been minimized.

Copy link

jonahfang commented Aug 7, 2017

My docker container use ubuntu:14.04,and filesystem mounted througth VirtualBox volume.

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 7, 2017

It's because it's trying to do a hard link inside the container. That container, or your system, doesn't allow that, so it's failing.

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 7, 2017

@sdboyer I think we need to change the way we do locking, this is another issue that's being caused by the hard link in the lockfile package; there was the ReFS issue #913 and now this.

@jonahfang are you running Windows 10 with ReFS for the underlying filesystem by any chance?

@jonahfang

This comment has been minimized.

Copy link

jonahfang commented Aug 7, 2017

No, I use mac only.

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 7, 2017

Well, the container, or something, is not allow hard links in the filesystem, so that's why it's failing.

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 7, 2017

@jonahfang Can you please grab this library and test it in your container. You should get the following output:

$ go get gopkg.in/check.v1
$ go get github.com/theckman/go-flock
$ cd $GOPATH/src/github.com/theckman/go-flock
$ go test
OK: 7 passed
PASS
ok  	github.com/theckman/go-flock	0.012s
@jonahfang

This comment has been minimized.

Copy link

jonahfang commented Aug 7, 2017

@matjam go test passed!

@jonahfang

This comment has been minimized.

Copy link

jonahfang commented Aug 12, 2017

@matjam Any progress of this issue?

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Aug 12, 2017

I need someone with ReFS on Windows to test the same module before I can recommend to @sdboyer that we switch to go-flock.

You'll have to find a workaround in the meantime. Not running inside a the container that you're using, for example. Other containers seem to handle links just fine.

@JesseRhoads

This comment has been minimized.

Copy link

JesseRhoads commented Aug 18, 2017

Was there a patch for this? I have this exact problem, running a docker container inside an ubuntu trusty vagrant box on a mac.
I can create files inside the container just fine, it just won't do it via go dep.
It does the same thing at the vagrant box level as well. Vagrant mounts the source directory so I don't think it can hard link.

@christianrondeau

This comment has been minimized.

Copy link

christianrondeau commented Aug 20, 2017

Update :

$ proot --link2symlink dep ensure -add github.com/src-d/go-git
all dirs lacked any go code

And nothing in the vendor folder... So, better but not a workaround yet.

@detailyang

This comment has been minimized.

Copy link

detailyang commented Aug 25, 2017

Well, It looks like the linkat syscall get the EPERM as the following:

linkat(AT_FDCWD, "/shared/gopath/pkg/dep/sm.lock.735551337", AT_FDCWD, "/shared/gopath/pkg/dep/sm.lock", 0) = -1 EPERM (Operation not permitted)

/shared (vboxsf) is mounted by virtualbox and my host machine is the Mac osx.
It's recommended to set gopath local file system:)

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Sep 3, 2017

@cpapidas Would appreciate it if you could test this PR and confirm that it works for you.

@cpapidas

This comment has been minimized.

Copy link

cpapidas commented Sep 7, 2017

@matjam I had tried your PR and I always get the same error.

$ dep init
waiting for lockfile /root/gopath/pkg/dep/sm.lock
waiting for lockfile /root/gopath/pkg/dep/sm.lock

also if I run the dep with "official" master version I get the following error

$ dep init
waiting for lockfile /root/gopath/pkg/dep/sm.lock: Lockfile created, but doesn't exist
@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Sep 7, 2017

@cpapidas can you give us information about what system you're running dep on?

Is this being run inside a container?

@cpapidas

This comment has been minimized.

Copy link

cpapidas commented Sep 8, 2017

Yes, I run this inside a docker container. All the golang configuration (GOPATH, etc ...) exists only in the container.

docker version Docker version 17.06.2-ce, build cec0b72

Also, have in mind that I run the docker from mac OS 10.12 or Vagrant 1.9.3.

dockerfile

FROM ubuntu:16.04

WORKDIR /root

RUN apt-get update \
  && apt-get install curl git -y \
  && cd ~ \
  && curl -O https://storage.googleapis.com/golang/go1.8.linux-amd64.tar.gz \
  && tar xvf go1.8.linux-amd64.tar.gz \
  && chown -R root:root ./go \
  && mv go /usr/local

ENV GOROOT /usr/local/go
ENV GOPATH /root/gopath
ENV GOBIN /root/gopath/bin
ENV PATH $PATH:$GOROOT/bin:$GOPATH/bin

COPY run /usr/local/bin/run
RUN chmod +x /usr/local/bin/run

CMD ["/usr/local/bin/run"]

The run file just builds and run the golang program.

Please if you want anything else, let me know.

@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Sep 8, 2017

@cpapidas Can you give me the output of "docker info"?

@cpapidas

This comment has been minimized.

Copy link

cpapidas commented Sep 8, 2017

$ docker info

Containers: 7
 Running: 0
 Paused: 0
 Stopped: 7
Images: 51
Server Version: 17.06.2-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 67
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-77-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.797GiB
Name: ubuntu-xenial
ID: BAX7:PGP6:LPUZ:7QVL:VGGP:YVLI:34R5:Y3SE:SSSC:BCKK:2DWS:ZDYM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
@matjam

This comment has been minimized.

Copy link
Contributor

matjam commented Sep 8, 2017

Okay. So, apparently aufs doesn't do either fnctl style flocking or hard linking in a standard way.

The notes in the "incompatible" section are relevant, if not particularly clear: http://aufs.sourceforge.net/aufs.html

@sdboyer we will need to do something else. We can't use a global lock file like this as a way to arbitrate who gets to write changes.

@nightlyone

This comment has been minimized.

Copy link

nightlyone commented Sep 20, 2017

Please note that the bug seems to only reproduce on non-Linux docker hosts. So I believe it is specific to vboxfs and not to aufs.

@ibrasho ibrasho added the area: init label Sep 20, 2017

@christianrondeau

This comment has been minimized.

Copy link

christianrondeau commented Sep 20, 2017

@nightlyone actually I can reproduce in Termux, which is not running Docker.

@jack-chung

This comment has been minimized.

Copy link

jack-chung commented Sep 28, 2017

I'm facing the same problem with Mac OS (host) -> Vagrant (virtualbox) -> Ubuntu 16 (guest). No docker. The $GOPATH folder inside ubuntu is a virtual synced folder from the host OS.

Running dep on the same folder directly on the host OS works.

@sdboyer

This comment has been minimized.

Copy link
Member

sdboyer commented Sep 29, 2017

if you're running into this problem, a workaround should be to use the latest tip and set DEPNOLOCK=1 in the environment.

just, make very sure that you don't run two dep processes at once. it's reasonably likely to corrupt your GOPATH/pkg/dep/sources cache 💥 🤷‍♂️

@JesseRhoads

This comment has been minimized.

Copy link

JesseRhoads commented Sep 30, 2017

The DEPNOLOCK=1 workaround fixes the problem in virtualbox+docker.
Thanks @ayang64 and @sdboyer.

@dch

This comment has been minimized.

Copy link

dch commented Oct 30, 2017

FWIW ran into the same issue reported here while building github.com/sachaos/todoist on FreeBSD with zfs filesystem which is extremely common on FreeBSD. Might this be a stdlib bug?

I added my notes in case this helps future people landing on this erro, work-around works.

> uname -a
FreeBSD wintermute.skunkwerks.at 12.0-CURRENT FreeBSD 12.0-CURRENT #17 r325010+e50db1d5c304(master): Thu Oct 26 13:59:17 UTC 2017     root@wintermute:/usr/obj/usr/src/sys/GENERIC  amd64

> go version
go version go1.9.1 freebsd/amd64

> dep version
dep:
 version     : devel
 build date  : 
 git hash    : 
 go version  : go1.9.1
 go compiler : gc
 platform    : freebsd/amd64
> dep ensure -v
dep ensure
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist

requested test repos

> go get github.com/nightlyone/lockfile
>  cd $GOPATH/src/github.com/nightlyone/lockfile
> go test
Error locking lockfile:  link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.431102591 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestBasicLockUnlock (0.00s)
--- FAIL: TestRogueDeletion (0.00s)
	lockfile_test.go:129: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.949387529 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRogueDeletionDeadPid (0.00s)
	lockfile_test.go:158: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.448157908 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRemovesStaleLockOnDeadOwner (0.00s)
	lockfile_test.go:202: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.959225638 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestInvalidPidLeadToReplacedLockfileAndSuccess (0.00s)
	lockfile_test.go:231: unexpected error: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.332173640 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
FAIL
exit status 1
FAIL	github.com/nightlyone/lockfile	0.004s

>  go get gopkg.in/check.v1
> go get github.com/theckman/go-flock
> cd $GOPATH/src/github.com/theckman/go-flock
> go test
OK: 7 passed
PASS
ok  	github.com/theckman/go-flock	0.003s

results when using workaround:

> env DEPNOLOCK=1 dep ensure -v
Gopkg.lock was already in sync with imports and Gopkg.toml
(1/22) Wrote golang.org/x/sys@master
(2/22) Wrote golang.org/x/text@master
(3/22) Wrote github.com/magiconair/properties@v1.7.3
(4/22) Wrote github.com/spf13/afero@master
(5/22) Wrote gopkg.in/yaml.v2@v2
(6/22) Wrote github.com/spf13/pflag@v1.0.0
(7/22) Wrote github.com/spf13/viper@v1.0.0
(8/22) Wrote github.com/stretchr/testify@v1.1.4
(9/22) Wrote github.com/hashicorp/hcl@master
(10/22) Wrote github.com/fatih/color@v1.5.0
(11/22) Wrote github.com/davecgh/go-spew@v1.1.0
(12/22) Wrote github.com/satori/go.uuid@v1.1.0
(13/22) Wrote github.com/mitchellh/mapstructure@master
(14/22) Wrote github.com/mattn/go-colorable@v0.0.9
(15/22) Wrote github.com/pelletier/go-toml@v1.0.1
(16/22) Wrote github.com/pmezard/go-difflib@v1.0.0
(17/22) Wrote github.com/mattn/go-isatty@v0.0.3
(18/22) Wrote github.com/pkg/browser@master
(19/22) Wrote github.com/spf13/cast@v1.1.0
(20/22) Wrote github.com/fsnotify/fsnotify@v1.4.2
(21/22) Wrote github.com/spf13/jwalterweatherman@master
(22/22) Wrote github.com/urfave/cli@v1.20.0

\o/ thanks!

@sdboyer

This comment has been minimized.

Copy link
Member

sdboyer commented Oct 30, 2017

oi. yeah, another reason to swap out the lib we use to do this 😢

@RichyHBM

This comment has been minimized.

Copy link

RichyHBM commented Mar 13, 2018

This issue is still occurring, has there been any progress at to change locking mechanism?

This is happening on linux, no container

@JesseRhoads

This comment has been minimized.

Copy link

JesseRhoads commented Mar 13, 2018

@RichyHBM Did you try export DEPNOLOCK=1 in the shell before running go dep as a workaround?
That is what we have done until the mechanism changes.

@RichyHBM

This comment has been minimized.

Copy link

RichyHBM commented Mar 13, 2018

Yes, that works as expected but it shouldnt be the fix

@Napat

This comment has been minimized.

Copy link

Napat commented Apr 27, 2018

Lock mechanism still problem with docker(golang:1.10-alpine) on Windows10Pro(hyper-v enable).
Btw workaround is ok.

@jims

This comment has been minimized.

Copy link

jims commented Jun 18, 2018

I get this on windows on an exFAT filesystem as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment