
Failed to take the filesystem lock on .vcpkg-root #12286

Closed
AlexAltea opened this issue Jul 6, 2020 · 13 comments · Fixed by #12227
Labels
category:vcpkg-bug The issue is with the vcpkg system (including helper scripts in `scripts/cmake/`) info:manifests This PR or Issue pertains to the Manifests feature

Comments

@AlexAltea
Contributor

AlexAltea commented Jul 6, 2020

Our CI relies on a global vcpkg installation for each platform. Build jobs happen in unique directories where we run vcpkg install to deal with any necessary dependencies. After the merging of #11757 we started using manifest files, which worked perfectly (thanks @strega-nil!).

Describe the bug

The manifest PR (#11757) added a filesystem lock (.vcpkg-root) which multiple build scripts might try to acquire simultaneously. The current timeout of roughly 1.5 seconds is not enough, since the lock appears to be held until the vcpkg install command finishes. See:

System::printf("Waiting to take filesystem lock on %s...\n", path.u8string());
auto wait = std::chrono::milliseconds(100);
// waits, at most, a second and a half.
while (wait < std::chrono::milliseconds(1000))
{
    std::this_thread::sleep_for(wait);
    if (system_try_take_file_lock() || ec)
    {
        return res;
    }
    wait *= 2;
}

Calling vcpkg install in directories with manifest files shouldn't modify the global %VCPKG_ROOT%/installed folder (afaik!), but place everything in ./vcpkg_installed instead. Thus, it seems unnecessary to lock vcpkg globally for minutes at a time.

Environment

  • OS: Any
  • Compiler: Any

To reproduce

Steps to reproduce the behavior:

  1. Simultaneously run two instances of ./vcpkg install in different repositories.
  2. One of the instances will fail to acquire the filesystem lock on .vcpkg-root:
    $ vcpkg install
    Waiting to take filesystem lock on /Users/ci/vcpkg/.vcpkg-root...
    Failed to take the filesystem lock on /Users/ci/vcpkg/.vcpkg-root:
        Resource busy
    

Expected behavior

I think there are two possible solutions to this problem (or even a combination of both):

  1. Add a --lock-timeout <seconds> (or similar) argument for any vcpkg command that might try to take the lock, so that multiple vcpkg install invocations could execute in series given a sufficiently high timeout.
  2. Only acquire the lock when writing to files under %VCPKG_ROOT%. When installing dependencies from vcpkg.json manifest files, everything is written to the vcpkg_installed folder, so there's no need to lock vcpkg globally while that happens.
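Something like solution 1 can be approximated externally today. As a rough sketch (the retry_until helper and the 300-second timeout are hypothetical, not vcpkg features), a CI wrapper could poll until a deadline passes:

```shell
#!/bin/sh
# Hypothetical stand-in for the proposed --lock-timeout flag: retry a
# command until it succeeds or a deadline passes, assuming its failures
# are transient lock contention.
retry_until() {
    deadline_s=$1
    shift
    elapsed=0
    while ! "$@"; do
        elapsed=$((elapsed + 1))
        if [ "$elapsed" -ge "$deadline_s" ]; then
            echo "retry_until: gave up after ${deadline_s}s" >&2
            return 1
        fi
        sleep 1
    done
}

# In CI this might be used as: retry_until 300 ./vcpkg install
```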
@MVoz
Contributor

MVoz commented Jul 6, 2020

[screenshot attached: 2020-07-06_211050]

@strega-nil
Contributor

The reason we do this is so that we lock the buildtrees and downloads and such, which are shared globally -- would adding a --wait-for-lock flag work for you? Or do you need a more fine-grained locking mechanism?

@MVoz
Contributor

MVoz commented Jul 6, 2020

Why not add a parameter for this right away: lock by default, and skip the lock when the parameter is passed?

@strega-nil
Contributor

@voskrese because that is a disaster waiting to happen -- if you both try to write to buildtrees at the same time, then you're gonna get bad failures.

@AlexAltea
Contributor Author

would adding a --wait-for-lock flag work for you?

Yes, this would solve the issue. Currently we are working around it by manually wrapping vcpkg install commands with flock on Linux/macOS and System.Threading.Mutex on Windows (PowerShell), but this feels rather hacky.

Or do you need a more fine-grained locking mechanism?

Are there any plans to cache built vcpkg packages? (Judging by #12091 it seems so).
If so, then a simple --wait-for-lock would be enough, as in N simultaneous CI jobs, only 1 job would actually build dependencies and the remaining N-1 would wait for the lock and immediately return the cached dependencies.

If there's no caching then --wait-for-lock "solves the error" but at the expense of increasing build times up to N times.
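The flock workaround mentioned above might look roughly like this on Linux (flock(1) comes from util-linux; the lock path and the 600-second timeout are illustrative choices, and the echo stands in for the real ./vcpkg install call):

```shell
#!/bin/sh
# Serialize concurrent vcpkg invocations across CI jobs with an advisory
# file lock. flock -w waits up to the given number of seconds for the
# lock instead of failing immediately.
LOCKFILE=/tmp/vcpkg-ci.lock

flock -w 600 "$LOCKFILE" echo "lock acquired; would run: ./vcpkg install"
```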

@strega-nil
Contributor

Cool, then I'll add that feature in #12227 :)

strega-nil added a commit to strega-nil/vcpkg that referenced this issue Jul 6, 2020
@PhoebeHui PhoebeHui added the category:vcpkg-bug The issue is with the vcpkg system (including helper scripts in `scripts/cmake/`) label Jul 7, 2020
@strega-nil strega-nil added the info:manifests This PR or Issue pertains to the Manifests feature label Jul 8, 2020
@christophe-calmejane
Contributor

@strega-nil Since I merged the manifest code, I too have issues with our CI. We use and mix some complex ports and a custom toolchain that leads to a reentrant call to vcpkg install (on purpose), but always in the same process. Would it be possible for the lock to be reentrant, or to delay taking the lock until it's actually needed?
My reentrant call returns immediately because the sub-package I try to install is guaranteed to already be installed, so it's just a matter of letting vcpkg check that the package is installed and then return.

@strega-nil
Contributor

Hmm... alright, yeah, I think I'll take the lock out for now in non-manifest builds until we can lock cleanly.

@christophe-calmejane
Contributor

Thanks. I do still have a need for a lock on the vcpkg database to prevent multiple installs from different processes, but I'd rather tell my users not to run multiple CI jobs at the same time than have it fail all the time :)

Implementing a reentrant file lock might be challenging, as it's usually not supported by system APIs, but it should be doable.
Could you add a reference to this ticket when you have merged your "fix"?
Thanks a lot

@strega-nil
Contributor

strega-nil commented Jul 9, 2020

@christophe-calmejane I'm not concerned about it right now, because I'm keeping the file lock for users of the new manifest mode; this means that existing users will not be broken, while new users of the new mode will write their code as if the file lock exists (and hopefully we won't need reentrant locks).

@ras0219-msft
Contributor

From googleapis/google-cloud-cpp#4487:

docker .... --volume /tmpfs/src/github/google-cloud-cpp/cmake-out/home/ci-ubuntu-18.04-quickstart-cmake:/h ....
...
Waiting to take filesystem lock on /h/vcpkg-quickstart/.vcpkg-root...
Failed to take the filesystem lock on /h/vcpkg-quickstart/.vcpkg-root:
Device or resource busy
Exiting now.

It looks like the file locking APIs we use are unavailable in a mapped docker volume. This looks like a straightforward case of "we should offer an environment variable to not take the lock". Users in docker can pass --env VCPKG_SKIP_LOCKING=1.

@ras0219-msft
Contributor

ras0219-msft commented Jul 9, 2020

Edit: Reading comprehension issue on my part :)

My reentrant call immediately returns because the sub-package I try to install is guaranteed to always be installed already, so it's just a matter or letting vcpkg check for the package to be installed then return.

In this case, it would be best to avoid the call to vcpkg and instead just look for the files in the installed directory. This should be as simple as passing something like HOST_TOOLS=${CURRENT_INSTALLED_DIR}/../<host-triplet>/ from the portfile to your inner build, which can then search for files within (without calling vcpkg).
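The "look for the files instead of calling vcpkg" idea could be sketched like this (a minimal sketch, assuming the standard installed/<triplet>/ layout; VCPKG_ROOT, the triplet, and the zlib.h path are illustrative assumptions, not a vcpkg API):

```shell
#!/bin/sh
# Sketch: instead of re-invoking vcpkg from inside a port, check whether
# an already-installed dependency's files are present on disk.
VCPKG_ROOT=${VCPKG_ROOT:-$HOME/vcpkg}
TRIPLET=${TRIPLET:-x64-linux}

dep_installed() {
    # Treat the dependency as installed if an expected file exists.
    [ -e "$VCPKG_ROOT/installed/$TRIPLET/$1" ]
}

if dep_installed "include/zlib.h"; then
    echo "dependency present; skipping inner vcpkg install"
else
    echo "dependency missing"
fi
```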


Below is my original post

@christophe-calmejane JFYI, reentrant calls to vcpkg install on the same install tree will corrupt the database. Future invocations of vcpkg could crash or think packages don't exist. This is because the database is not synchronized to disk during port builds. We'd strongly recommend that either:

  1. The port only checks for the dependency (failing if it's missing), and the install itself is lifted out and run before the port's build
  2. The port clones/embeds an inner copy of vcpkg with different root, packages, buildtrees, and installed directories. This ensures no corruption.

Ideally, the install could be listed as a dependency, but I assume you're doing this because of a cross-compilation scenario?
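Option 2 above (an inner vcpkg with its own directories) might be wired up roughly as follows. The --x-install-root/--x-buildtrees-root/--x-packages-root options are experimental vcpkg flags whose exact names may vary across versions, and the helper only assembles the command line as a sketch:

```shell
#!/bin/sh
# Sketch of option 2: point an inner vcpkg invocation at its own isolated
# directories so it cannot touch (or corrupt) the outer install tree.
INNER_ROOT=${INNER_ROOT:-/tmp/inner-vcpkg}

build_inner_cmd() {
    echo "vcpkg install --x-install-root=$INNER_ROOT/installed --x-buildtrees-root=$INNER_ROOT/buildtrees --x-packages-root=$INNER_ROOT/packages"
}

build_inner_cmd
```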

strega-nil added a commit that referenced this issue Jul 9, 2020
@christophe-calmejane
Contributor

@ras0219-msft Well, it's not that simple, as I'm mixing multiple build systems together :)
I actually considered looking at the files directly, but I didn't want to hardcode anything in the main build system (the one calling vcpkg), for example the path to the installed folder, in case it changes in the future. It's safer to let vcpkg do its job and not have to understand its underlying/internal/private details; that's why I call the binary directly.
I also considered parsing the output of vcpkg list, but I don't like that either (for the same reason, in case the output changes).
I'll probably have to think about it at some point anyway, thanks for your input :)

6 participants