Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for CVE-2022-21658 is unsafe on some file systems #94335

Closed
jclulow opened this issue Feb 24, 2022 · 10 comments · Fixed by #94446
Closed

Fix for CVE-2022-21658 is unsafe on some file systems #94335

jclulow opened this issue Feb 24, 2022 · 10 comments · Fixed by #94446
Assignees
Labels
A-io Area: std::io, std::fs, std::net and std::path C-bug Category: This is a bug. O-solaris Operating system: Solaris O-unix Operating system: Unix-like P-critical Critical priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@jclulow
Copy link
Contributor

jclulow commented Feb 24, 2022

The fix for CVE-2022-21658 (#93112, 54e22eb) appears to have introduced a regression that can cause file system corruption on some UNIX systems.

The new code will attempt to use unlinkat(2) in any case where it does not know what type of file a directory entry represents. Some systems do not provide that information along with the entry names returned while reading a directory, and even on systems which provide a type hint, that hint may be DT_UNKNOWN. In cases where we do not know the type of the entry, the code tries first to unlink it as if it were something other than a directory:

None => match cvt(unsafe { unlinkat(fd, child.name_cstr().as_ptr(), 0) }) {
// type unknown - try to unlink
Err(err)
if err.raw_os_error() == Some(libc::EISDIR)
|| err.raw_os_error() == Some(libc::EPERM) =>
{
// if the file is a directory unlink fails with EISDIR on Linux and EPERM everyhwere else
remove_dir_all_recursive(Some(fd), Path::new(&child.file_name()))?;
}
result => {
result?;
}
},

The code assumes that unlinkat() on a directory will fail with EPERM, or on Linux with the non-standard EISDIR, but this is not always the case. POSIX does suggests EPERM as a failure in some, but not all cases:

[EPERM]
The file named by path is a directory, and either the calling process does not have appropriate privileges, or the implementation prohibits using unlink() on directories.

(See unlink in The Open Group Base Specifications Issue 7, 2018 edition)

Implementations are not required to prohibit unlink() on directories, and indeed UFS file systems on present day illumos and presumably at least some Solaris systems allow unlink() of a directory if the user has sufficient privileges. Other platforms may as well.

Unfortunately unlinking a directory on UFS causes a sort of file system corruption that requires an unmount and a trip through fsck to correct. Any code that uses std::fs::remove_dir_all() on UFS has since started causing that kind of corruption, which took a bit of work to track down.

I think the fix is probably relatively simple, and has the benefit of not touching the code path where we believe we know what sort of entry we are removing:

None => match cvt(unsafe { unlinkat(fd, child.name_cstr().as_ptr(), libc::AT_REMOVEDIR) }) {
    // type unknown - try to rmdir
    Err(err)
        if err.raw_os_error() == Some(libc::EEXIST)
            || err.raw_os_error() == Some(libc::ENOTEMPTY) =>
    {
        // this is a directory that contains files to remove
        remove_dir_all_recursive(Some(fd), Path::new(&child.file_name()))?;
    }
    Err(err) if err.raw_os_error() == Some(libc::ENOTDIR) {
        // this is some other type of entry that we can just unlink
        cvt(unsafe { unlinkat(fd, child.name_cstr().as_ptr(), 0) })?;
    }
    result => {
        result?;
    }
}

In short, we should try to rmdir() first, rather than unlink() first, as that always fails in a safe way that we can detect.

@jclulow jclulow added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Feb 24, 2022
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 24, 2022
@Urgau
Copy link
Member

Urgau commented Feb 24, 2022

cc @hkratz (as you're the one how wrote the new code)

@inquisitivecrystal inquisitivecrystal added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-untriaged Untriaged performance or correctness regression. labels Feb 25, 2022
@hkratz
Copy link
Contributor

hkratz commented Feb 25, 2022

@Urgau Thanks for the ping

@rustbot claim

@apiraino
Copy link
Contributor

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

@rustbot label -I-prioritize +P-critical

@rustbot rustbot added P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Feb 25, 2022
@the8472 the8472 added A-io Area: std::io, std::fs, std::net and std::path O-unix Operating system: Unix-like T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 25, 2022
@nagisa
Copy link
Member

nagisa commented Feb 27, 2022

Was this issue reported upstream? Filesystems corrupting themselves seem like a poor failure mode in any case, and in some ways is a violation of the sandboxing that the kernel ought to provide for the user-space applications.

@jclulow
Copy link
Contributor Author

jclulow commented Feb 28, 2022

Was this issue reported upstream? Filesystems corrupting themselves seem like a poor failure mode in any case, and in some ways is a violation of the sandboxing that the kernel ought to provide for the user-space applications.

It was reported upstream, and we're working on a fix -- but that fix will obviously only be in new versions of the operating system.

To expand further on the corruption: the file system becomes inconsistent and requires repair, and the damage such as it is can only be caused by privileged users; i.e., by root or a user with root equivalent privileges. I am not yet aware of a way that it would cause actual data loss, just an interruption to service, but anything is possible.

Allowing the linking and unlinking of directories is a regrettable, but seemingly intentional feature we are saddled with through our UNIX heritage. I'm working to disable and eventually remove the feature in the OS -- although we strongly value backwards compatibility, in this case I can't really see a legitimate use case or find a consumer that would be broken by doing so, but we won't know for sure until we try.

While this specific case is unfortunate, and we can and will work on our end to try to mitigate the impact, the fact remains that POSIX explicitly documents and allows for implementations where unlink() on a directory does not return EPERM. For the code to be completely correct on all POSIX systems, it cannot make that assumption.

In addition, working around the issue in new versions of the OS doesn't fix existing deployments out there already. Enterprise customers are often reluctant to promptly update their operating system, even though they may update software built and deployed on it -- software that may be written in Rust. I myself hit this issue by upgrading only the Rust compiler to a new stable version, and trying to build and use Rust-based software we have to build UFS ramdisks on illumos.

We aim to fix the OS for the future, but the critical regression in Rust needs to be fixed as well for the present. We appreciate everybody who has looked at this so far -- please let me know how I can help expedite the process!

@hkratz
Copy link
Contributor

hkratz commented Feb 28, 2022

@jclulow Thanks again for the report.

The linked PR #94446 should fix the issue. Apparently only Linux and FreeBSD guarantee that trying to unlink() a directory fails. Macos, NetBSD and OpenBSD allow it, at least in theory. After looking around a bit I found that it is even explicitly documented for Illumos and Solaris. So it is not a bug but a feature there...

After changing that in illumos/illumos-gate@ad8f9d9 you might want to adapt the man page as well.

@cuviper
Copy link
Member

cuviper commented Feb 28, 2022

Unfortunately unlinking a directory on UFS causes a sort of file system corruption that requires an unmount and a trip through fsck to correct. Any code that uses std::fs::remove_dir_all() on UFS has since started causing that kind of corruption, which took a bit of work to track down.

We can try to avoid this case by default, as #94446 is switching to try unknown as a directory first, but you should note that this is still going to be racy. If there's an unprivileged adversary like the CVE supposes, then they can try to race changes between "try (and fail) as a directory" to "unlink as a file", to make it unlink a directory after all and induce corruption.

IOW, it sounds like it's not safe for a privileged process to unlink paths at all if they may be changed by lesser privileges .

@hkratz
Copy link
Contributor

hkratz commented Feb 28, 2022

It should not really cause corruption according to the Illumos man page, just orphaned directories:

If the path argument is a directory and the filesystem supports unlink()
and unlinkat() on directories, the directory is unlinked from its parent
with no cleanup being performed. In UFS, the disconnected directory will
be found the next time the filesystem is checked with fsck(1M). The
unlink() and unlinkat() functions will not fail simply because a
directory is not empty. The user with appropriate privileges can orphan a
non-empty directory without generating an error message.

Of course this can lead to resource exhaustion since the disk space and the inodes cannot be reclaimed until the next fsck.

@cuviper
Copy link
Member

cuviper commented Feb 28, 2022

The user with appropriate privileges can orphan a
non-empty directory without generating an error message.

And that user is only root or otherwise root-like?
i.e. it's not just talking about regular access privileges?

@hkratz
Copy link
Contributor

hkratz commented Mar 1, 2022

The user with appropriate privileges can orphan a
non-empty directory without generating an error message.

And that user is only root or otherwise root-like? i.e. it's not just talking about regular access privileges?

For Illumos the process must have root privileges or PRIV_SYS_LINKDIR needs to be in the process privilege set (Illumos privileges(5) man page). Solaris has at some point abolished this privilege and apparently does not allow unlinking directories anymore (Solaris privileges(5) man page).

@m-ou-se m-ou-se added the O-solaris Operating system: Solaris label Mar 2, 2022
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Mar 5, 2022
…fix, r=cuviper

UNIX `remove_dir_all()`: Try recursing first on the slow path

This only affects the _slow_ code path - if there is no `dirent.d_type` or if it is `DT_UNKNOWN`.

POSIX specifies that calling `unlink()` or `unlinkat(..., 0)` on a directory is allowed to succeed:
> The _path_ argument shall not name a directory unless the process has appropriate privileges and the implementation supports using _unlink()_ on directories.

This however can cause dangling inodes requiring an fsck e.g. on Illumos UFS, so we have to avoid that in the common case. We now just try to recurse into it first and unlink() if we can't open it as a directory.

The other two commits integrate the Macos x86-64 implementation reducing redundancy. Split into two commits for better reviewing.

Fixes rust-lang#94335.
@bors bors closed this as completed in 3e1e9b4 Mar 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: std::io, std::fs, std::net and std::path C-bug Category: This is a bug. O-solaris Operating system: Solaris O-unix Operating system: Unix-like P-critical Critical priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants