cmd/go: "file exists" errors when trying to fetch modules #36447
This happens sporadically on
It's happened on a CI build job three times in the past week, for a job that runs twice per hour, so roughly 1% of runs. I haven't been able to reproduce the error reliably, and we don't run these jobs with Go tip.
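A quick check of the "roughly 1%" figure, assuming the job runs at two runs per hour around the clock, seven days a week:

$$
\frac{3\ \text{failures}}{2\ \tfrac{\text{runs}}{\text{h}} \times 24\ \tfrac{\text{h}}{\text{day}} \times 7\ \text{days}} = \frac{3}{336} \approx 0.89\%
$$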
Unfortunately, this is happening with a piece of internal end-to-end testing, so its source and build jobs are not public.
Here is the log, since it doesn't contain any sensitive info:
It should be noted that this
I tried some debugging, but I haven't found the cause so far. Here is a summary:
I'd be surprised if our setup was to blame, because another of our CI pipelines does run many
From the filenames involved in the error, it seems likely that the failing call is this one:
That seems to imply one of the following possibilities:
Either way, given the information we have so far, this seems more likely to be a bug in the underlying filesystem than in the
Could you try running
Note that the concurrency strategy for
(We use file-locking in the module cache because idempotent writes would be significantly less efficient in many cases, and because it is otherwise difficult to signal that a downloaded module is complete and ready for use. In contrast, within
This was my initial suspicion, but we're using a fairly recent stable Docker release on the latest Ubuntu LTS, with an ext4 disk. I don't think it gets more standard or stable than that.
That's a good point, though the other CI builds could perform concurrent module fetches if the cache isn't up to date. The build that's failing, however, doesn't have any concurrent steps whatsoever, which is why I'm extra confused.
I realised this issue wouldn't give you much to act on, but I still filed it in case you spotted something I didn't, and in case others find it useful if they run into the same error in the future.
I'll give those
OK, wow, this is beyond embarrassing. The CI config was buggy; someone had messed with it while I was away on vacation, and they removed the dependency between the "run
I did look at that twice, but of course, I'm only human :(
Apologies for the noise and the waste of time. This is definitely a filesystem data race that's entirely our fault.