Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deal with windows file locking rustup.exe + any proxies when they are executed #2441

Open
rbtcollins opened this issue Jul 27, 2020 · 21 comments
Labels
Milestone

Comments

@rbtcollins
Copy link
Contributor

rbtcollins commented Jul 27, 2020

Overview

During self update, rustup needs to replace rustup and the various proxies for cargo, rustc etc.

If it cannot do this, then respectively:

  • future invocations of rustup will be still be running the older version and it will attempt to self-update again
  • any improvements made to the proxy code paths will not be reflected in the live proxies on disk (and conversely any bugs will remain)

However, on Windows, when a binary is executed, it gets a read lock taken out on it. This prevents deleting (or replacing) the file until the process has exited. Renaming seems possible.

Today we have code to spawn a separate process from rustup to perform these replacements, but this will fail if any rust or rustup process is running concurrent with the self-update.

Specifically the following cases cause errors:

  • processes that are already running like rls running under an IDE
  • rust processes started immediately after 'rustup self update' exits

What do these errors look like?

There are two cases:

A) Failed updates of rustup.exe / proxies

error: could not copy file from 'C:\Users\<username>\.cargo\bin\rustup-init.exe' to 'C:\Users\<username>\.cargo\bin\rustup.exe'
info: caused by: Access is denied. (os error 5)

B) Failed removal of the updater binary

error: could not remove 'setup' file: 'C:\Users\<username>\.cargo\bin/rustup-init.exe'
error: caused by: Access is denied. (os error 5)

Design space

We need to cater for processes that are already running - they need to be shut down before the updates in step A can proceed.

We need to cater for processes that start up after parent process exits: short lived ones (e.g. rustc, rustup version) can be allowed to complete, but long running processes (e.g. rls) need to be shut down just like processes that were already running.

We can't prevent the OS starting the processes (not even by renaming, since even a rename may race with that), so its more a matter of tolerating processes that startup and having a way to tell them to exit rapidly. Whatever 'process should exit' mechanism will act to guard against case B.

Finally, we need to cater for file locks coming from other utilities like mcafee or various endpoint security products that may (hopefully temporarily) lock any of these files that we need to manipulate; but the same retry mechanisms should provide some resilience; but possibly having some interface to signal that a given process is holding a lock will be needed: that can be left for a separate ticket though, as it will, unlike this ticket, not be a systemic problem.

Final finally we need to deal with having two instances of the installer executing at once - but perhaps that can be a separate enhancement.

@kinnison
Copy link
Contributor

I think it's also worth noting that whatever solution we come up with here either needs to also work on Linux/Mac/*NIX in general, or else be isolated to windows-only pathways.

I think the description of this issue is good.

@stefanmohl
Copy link

It seems that this issue is causing problems on Cygwin as well, at least guessing so in #1508

@steveklabnik
Copy link
Member

I am trying to rustup update nightly and getting

info: checking for self-updates
info: downloading self-update
error: could not remove 'rustup-bin' file: 'C:\Users\steve\.cargo\bin\rustup.exe'
error: caused by: permission denied

I believe that this is the right ticket?

@kinnison
Copy link
Contributor

kinnison commented Dec 4, 2020

Hi Steve, yes this is the right ticket. Are you certain there're no IDEs or anything open which could be using rustup, rls, rustc, etc. Even via another program such as rust-analyzer? If you're sure and still getting that, then rebooting may be your only hope, and then run rustup self update on a freshly booted system. Windows' file locking makes our update process a bit fraught though we have been thinking about ways to try and sort it out.

@steveklabnik
Copy link
Member

So, I did make sure to close out Code.

However, thinking about it... I do have one hunch. I am using nushell as my shell, installed via Cargo. Could that cause this problem?

@kinnison
Copy link
Contributor

kinnison commented Dec 4, 2020

Unless nushell was running cargo or rustup or similar, not as far as I know, no.

@kinnison
Copy link
Contributor

kinnison commented Dec 4, 2020

One idea, run task manager and look through the process list for any of rustc cargo rls rustup etc. ?

@doxxx
Copy link

doxxx commented Dec 4, 2020

Process Explorer has a feature where you can search open handles to see which process has a file locked.

@steveklabnik
Copy link
Member

Mysteriously, this morning when I tried to get back to this, it worked. Hm. I'll give all that a try if I run into it again in the future, thanks / sorry.

@kinnison
Copy link
Contributor

kinnison commented Dec 8, 2020

No worries, this part of rustup on Windows is very difficult to predict sometimes. Especially with background virus scanners etc. which might decide to hold a handle on the binary for a bit at any time.

@workingjubilee
Copy link
Contributor

@rustbot label: +O-windows

@rbtcollins
Copy link
Contributor Author

Ok so I think we can perhaps experiment with a few things.

Here's a proposed design, and why:

Lets assume we need to update the proxies from time to time: any optimisation such as having non-updated proxies can wait, since we will have to update them sometimes.

Basic approach

  • Proxies are taught how to self-exit when an update is in progress
  • Proxies will refuse to start up when an update is in progress
  • Proxies will attempt to trigger a graceful exit of their child (e.g. close stdin and stdout): we might try a few different means over time; we don't need to kill rustc, but we do want to kill rls (doesn't exit on its own) and cargo (may run for long periods).
  • rustup marks an update in process before it starts writing to any toolchain or proxy
  • rustup marks updates as finishes on exit

Marking updates

We don't need strong atomicity guarantees for the update marking mechanism, as it can safely run a smaller window than the lockout of concurrent update/install operations.

If we just write a file, we may leak it on panics or poweroffs. We can recover automatically if we treat such a file already existing as normal and delete on completion. Alternatively we could use a file as a means to lock out concurrent updates, tackling (some of) #988 at the same time. A file with a pid written into it would allow detection of panic-exits: a file without <pid\n> isn't valid, indicating an early panic.

Proposed first iteration: a file with a pid in it; if the file exists without a pid, or the pid is for a non-existing or non-rustup-init|rustup process, then delete it, otherwise error.

@ChrisDenton
Copy link
Contributor

However, on Windows, when a binary is executed, it gets a read lock taken out on it. This prevents renaming (or replacing) the file until the process has exited.

Surely renaming will work so long as it stays within the same filesystem?

@rbtcollins
Copy link
Contributor Author

The Windows NT Dynamic Link Loader opens files with FILE_SHARE_READ(https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew#parameters for the sematics of that). Renaming a file requires FILE_SHARE_DELETE access, which is not possible if an earlier opener has opened the file without FILE_SHARE_DELETE.
Renames are done by opening a file read-write and using https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setfileinformationbyhandle to alter the status of the open file.

TL;DR this is not POSIX, And using FILE_FLAG_POSIX_SEMANTICS via https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-reopenfile doesn't undo the earlier lock modes as far as I know, it only allows applications that opt in to the semantics to have them.

@ChrisDenton
Copy link
Contributor

Ok but this stackoverflow answer says (my bold):

Deleting will fail if any other process has the file opened without delete sharing, renaming is never a problem

And indeed if I try it, it works (Windows 10 21H1):

// old.rs
#![cfg(windows)]

use std::{env, fs, io};

fn main() -> io::Result<()> {
    let mut new_exe = env::current_exe()?;
    new_exe.set_file_name("new.exe");
    fs::rename(&env::current_exe()?, &new_exe)?;
    
    println!("success!");
    Ok(())
}

@kinnison
Copy link
Contributor

@ChrisDenton Does this work if the rename call comes from not that process? i.e. renaming myself might be okay if I'm the only execution context with it; but if there's more than one process from the same exe does it still work?

@ChrisDenton
Copy link
Contributor

ChrisDenton commented Aug 26, 2021

@kinnison Yes, it appears to work.

// wait.rs -- stays open while it waits for input
use std::io::{self, Read};
fn main() {
    io::stdin().bytes().next();
}
// rename.rs -- renames wait.exe to new.exe
use std::{fs, io};
fn main() -> io::Result<()> {
    fs::rename("wait.exe", "new.exe")?;
    Ok(())
}

Though I've not tried it on a clean system so I can't rule out the possibility I've changed some setting somewhere that enables this behaviour.


UPDATE: I tried on a Windows 10 laptop (Home edition) that is not used by a developer. Same result.

@kinnison
Copy link
Contributor

kinnison commented Sep 4, 2021

Interesting, thanks Chris. @rbtcollins do you think we can use the above to make changes to how we install updates, or should we also try and get tests on older Windows versions?

@rbtcollins
Copy link
Contributor Author

Theres certainly something interesting going on here. We could get some research on slower-ring releases, because I don't recall this being how it used to be. Deleting the file is still not possible, at least without using FILE_DISPOSITION_POSIX_SEMANTICS. We could move files to the recycle bin perhaps.

image

As to how we use this: the question I think is going to be over failure modes. If a rustup proxy is in the recycle bin when it spawns a child (race condition obviously) will that break? If someone updates rustup multiple times, will that break? What about rustc / cargo - if cargo.exe is in recycle bin with a random file name, and goes to spawn a new rustc, will that work or will it perhaps consider files in recycle bin - and perhaps open up a security vulnerability at the same time.

There is obviously a complexity tradeoff here, but leaving a running process, no longer in the directory it thought it was in, still trying to spawn new processes doesn't seem great.

And the situation on linux is similarly indeterminate - sure, an unlinked but still referenced executable can be forked(), but its behaviour once the containing directory is removed is also going to be indeterminate.

So I think to avoid unexpected cases cropping up it is worth having some complexity - making sure we actually kill process groups off, rather than inviting undefined behaviour

@ChrisDenton
Copy link
Contributor

I looked into this more since my last post. It seems the executable loader is using CreateFileMapping (well, the NT equivalent) to map the executable. which makes sense. From testing this, it appears this means there's a special "no delete" lock on the file which (unlike missing FILE_SHARE_DELETE) still allows renames. This kind of lock appears to be unique to file mappings.

Sample program
use std::{ffi::c_void, fs, io::{self, Write}};
use std::os::windows::io::AsRawHandle;

fn main() -> io::Result<()> {
    {
        // Create a file with some contents and close the handle.
        let mut f = fs::File::create("test.txt")?;
        f.write_all(b"hello")?;
    }
    // Reopen the file for reading.
    let f = fs::File::open("test.txt")?;
    
    // Create the file mapping.
    unsafe {
        let h = CreateFileMappingW(
            f.as_raw_handle() as _,
            std::ptr::null(),
            PAGE_READONLY,
            0,
            0,
            std::ptr::null(),
        );
        if h == 0 {
            return Err(io::Error::last_os_error());
        }
    }
    print!("Delete the file... ");
    match fs::remove_file("test.txt") {
        Ok(_) => println!("Success!"),
        Err(e) => println!("{}", e),
    }
    print!("Rename the file... ");
    match fs::rename("test.txt", "something.else") {
        Ok(_) => println!("Success!"),
        Err(e) => println!("{}", e),
    }
    Ok(())
}

const PAGE_READONLY: u32 = 2;
#[link(name="kernel32")]
extern "system" {
    fn CreateFileMappingW(
        hFile: usize,
        lpFileMappingAttributes: *const c_void,
        flProtect: u32,
        dwMaximumSizeHigh: u32,
        dwMaximumSizeLow: u32,
        lpName: *const u16
    ) -> usize;
}
Output
Delete the file... Access is denied. (os error 5)
Rename the file... Success!

A process can tell if its file name has been renamed by using QueryFullProcessImageNameW to get the new file path (though weirdly it won't notice if a parent directory is renamed). This differs from std::env::current_exe which will only tell you the name that was used when it was first loaded (it uses GetModuleFileNameW under the hood). Though I'm not sure how useful that information is.

But yeah, I would agree leaving a process running after it's been pseudo-removed isn't great. I guess you could use a special "pending deletion" directory that rustup controls but that seems like it would have its own issues.

@rbtcollins
Copy link
Contributor Author

Yeah. so in Cygwin we used the underlying NT APIs to simulate POSIX behaviour, but there's a raft of corner cases :/

kylewillmon added a commit to phylum-dev/cli that referenced this issue Apr 25, 2023
Self-update on Windows is hard... And rustup occasionally fails to
self-update. There is a [related issue][1] to fix this behavior, but
until then, using `--no-self-update` is a decent workaround.

[1]: rust-lang/rustup#2441
kylewillmon added a commit to phylum-dev/cli that referenced this issue Apr 25, 2023
Self-update on Windows is hard... And rustup occasionally fails to
self-update. There is a [related issue][1] to fix this behavior, but
until then, using `--no-self-update` is a decent workaround.

[1]: rust-lang/rustup#2441
Javagedes added a commit to microsoft/mu_devops that referenced this issue Mar 13, 2024
By default `rusup` performs a self-update when installing a new rustc
toolchain, Sometimes, the files are locked by windows, and the self
update can fail as described here:
rust-lang/rustup#2441

Since this is a pipeline run, we have no need to update rustup, and can
rely on the version provided by the runner. By disabling the self
update, the file conflict `The process cannot access the file because it
is being used by another process` no longer appears.
@rami3l rami3l added this to the On Deck milestone Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants