Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

glibc .init_array usage for std::env::args is not safe #105999

Closed
sdroege opened this issue Dec 21, 2022 · 12 comments · Fixed by #106001
Closed

glibc .init_array usage for std::env::args is not safe #105999

sdroege opened this issue Dec 21, 2022 · 12 comments · Fixed by #106001
Labels
C-bug Category: This is a bug.

Comments

@sdroege
Copy link
Contributor

sdroege commented Dec 21, 2022

Currently in library/std/src/sys/unix/args.rs, Rust is hooking into the glibc .init_array extension for retrieving argc/argv even in case Rust does not manage main() and is e.g. just loaded into a process as a cdylib.

While this is great and convenient, it's unfortunately not implemented in a safe way. Various C commandline argument parsers are modifying argv while parsing, so the information that gets passed into .init_array might not be correct anymore.

Examples of such parsers are

  • glibc's own getopt parser, noting " The default is to permute the contents of argv while scanning it so that eventually all the non-options are at the end. This allows options to be given in any order, even with programs that were not written to expect this. "
  • GLib's GOptionContext parser, which removes all already handled arguments from argv and only leaves others (removed ones are set to NULL and moved to the end).
  • Qt does the same thing in QCoreApplication in QCoreApplicationPrivate::processCommandLineArguments(). It shuffles around arguments in argv and removes (by setting to nullptr) arguments it handled already.

This means that at least argc can easily be too big, and Rust would read beyond the (new) end of argv. This caused crashes in practice because of calling strlen() on NULL when creating an CStr around such an "removed" argument.

Now the question is how this should be handled in Rust. I see three options here

  1. Remove the .init_array extension usage and handle glibc like all other libcs
  2. Copy all arguments in .init_array. This means everybody has to pay for that even if they don't use arguments, and this is theoretically still not safe if the shared library is loaded at the same time the argv array is modified and a partially written pointer value is read
  3. Handle NULL pointers in argv by skipping over them. This means we would lose the exact length information of the args iterator, and this is still theoretically not safe for the same reason as 2. See Stop at the first NULL argument when iterating argv #106001

This code was added in 2019 by #66547 (CC @leo60228)

I'd be happy to provide a PR implementing either solution once there is some agreement how to move forward here. I've created #106001 as a proposed fix for this, which implements option 3.

Meta

rustc --version --verbose:

rustc 1.66.0 (69f9c33d7 2022-12-12)
binary: rustc
commit-hash: 69f9c33d71c871fc16ac445211281c6e7a340943
commit-date: 2022-12-12
host: x86_64-unknown-linux-gnu
release: 1.66.0
LLVM version: 15.0.2
Backtrace

Program received signal SIGSEGV, Segmentation fault.
__strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
79		VPCMP	$0, (%rdi), %YMMZERO, %k0
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-12.fc37.x86_64 elfutils-libelf-0.188-3.fc37.x86_64 elfutils-libs-0.188-3.fc37.x86_64 libblkid-2.38.1-1.fc37.x86_64 libmount-2.38.1-1.fc37.x86_64 libunwind-1.6.2-5.fc37.x86_64 libzstd-1.5.2-3.fc37.x86_64 pcre2-10.40-1.fc37.1.x86_64 xz-libs-5.2.5-10.fc37.x86_64 zlib-1.2.12-5.fc37.x86_64
(gdb) backtrace full
#0  __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
#1  0x00007ffff7713281 in core::ffi::c_str::CStr::from_ptr () at library/core/src/ffi/c_str.rs:286
#2  std::sys::unix::args::imp::clone::{closure#0} () at library/std/src/sys/unix/args.rs:146
#3  core::iter::adapters::map::map_fold::{closure#0}<isize, std::ffi::os_str::OsString, (), std::sys::unix::args::imp::clone::{closure_env#0}, core::iter::traits::iterator::Iterator::for_each::call::{closure_env#0}<std::ffi::os_str::OsString, alloc::vec::spec_extend::{impl#1}::spec_extend::{closure_env#0}<std::ffi::os_str::OsString, core::iter::adapters::map::Map<core::ops::range::Range<isize>, std::sys::unix::args::imp::clone::{closure_env#0}>, alloc::alloc::Global>>> () at library/core/src/iter/adapters/map.rs:84
#4  core::iter::traits::iterator::Iterator::fold<core::ops::range::Range<isize>, (), core::iter::adapters::map::map_fold::{closure_env#0}<isize, std::ffi::os_str::OsString, (), std::sys::unix::args::imp::clone::{closure_env#0}, core::iter::traits::iterator::Iterator::for_each::call::{closure_env#0}<std::ffi::os_str::OsString, alloc::vec::spec_extend::{impl#1}::spec_extend::{closure_env#0}<std::ffi::os_str::OsString, core::iter::adapters::map::Map<core::ops::range::Range<isize>, std::sys::unix::args::imp::clone::{closure_env#0}>, alloc::alloc::Global>>>>
    () at library/core/src/iter/traits/iterator.rs:2414
#5  core::iter::adapters::map::{impl#2}::fold<std::ffi::os_str::OsString, core::ops::range::Range<isize>, std::sys::unix::args::imp::clone::{closure_env#0}, (), core::iter::traits::iterator::Iterator::for_each::call::{closure_env#0}<std::ffi::os_str::OsString, alloc::vec::spec_extend::{impl#1}::spec_extend::{closure_env#0}<std::ffi::os_str::OsString, core::iter::adapters::map::Map<core::ops::range::Range<isize>, std::sys::unix::args::imp::clone::{closure_env#0}>, alloc::alloc::Global>>> () at library/core/src/iter/adapters/map.rs:124
#6  core::iter::traits::iterator::Iterator::for_each<core::iter::adapters::map::Map<core::ops::range::Range<isize>, std::sys::unix::args::imp::clone::{closure_env#0}>, alloc::vec::spec_extend::{impl#1}::spec_extend::{closure_env#0}<std::ffi::os_str::OsString, core::iter::adapters::map::Map<core::ops::range::Range<isize>, std::sys::unix::args::imp::clone::{closure_env#0}>, alloc::alloc::Global>> () at library/core/src/iter/traits/iterator.rs:831
#7  alloc::vec::spec_extend::{impl#1}::spec_extend<std::ffi::os_str::OsString, core::iter::adapters::map::Map<core::ops::range::Range<isize>, std::sys::unix::args::imp::clone::{closure_env#0}>, alloc::alloc::Global> () at library/alloc/src/vec/spec_extend.rs:40
#8  alloc::vec::spec_from_iter_nested::{impl#1}::from_iter<std::ffi::os_str::OsString, core::iter::adapters::map::Map<core::ops::range::Range<isize>, std::sys::unix::args::imp::clone::{closure_env#0}>> () at library/alloc/src/vec/spec_from_iter_nested.rs:62
#9  alloc::vec::spec_from_iter::{impl#0}::from_iter<std::ffi::os_str::OsString, core::iter::adapters::map::Map<core::ops::range::Range<isize>, std::sys::unix::args::imp::clone::{closure_env#0}>> () at library/alloc/src/vec/spec_from_iter.rs:33
#10 alloc::vec::{impl#18}::from_iter<std::ffi::os_str::OsString, core::iter::adapters::map::Map<core::ops::range::Range<isize>, std::sys::unix::args::imp::clone::{closure_env#0}>> ()
    at library/alloc/src/vec/mod.rs:2757
#11 core::iter::traits::iterator::Iterator::collect<core::iter::adapters::map::Map<core::ops::range::Range<isize>, std::sys::unix::args::imp::clone::{closure_env#0}>, alloc::vec::Vec<std::ffi::os_str::OsString, alloc::alloc::Global>> () at library/core/src/iter/traits/iterator.rs:1836
#12 std::sys::unix::args::imp::clone () at library/std/src/sys/unix/args.rs:144
#13 std::sys::unix::args::imp::args () at library/std/src/sys/unix/args.rs:129
#14 std::sys::unix::args::args () at library/std/src/sys/unix/args.rs:19
#15 std::env::args_os () at library/std/src/env.rs:792
#16 0x00007ffff7713151 in std::env::args () at library/std/src/env.rs:757
[...]

Also

(gdb) info registers
rax            0x0                 0
rbx            0x8                 8
rcx            0x6b6e6973656b6166  7741240753840152934
rdx            0x8                 8
rsi            0x6b6e6973656b6166  7741240753840152934
rdi            0x0                 0
rbp            0x0                 0x0
rsp            0x7fffffffc398      0x7fffffffc398
r8             0x7ffff7cacce0      140737350651104
r9             0x40                64
r10            0x0                 0
r11            0x20                32
r12            0x55555556e890      93824992340112
r13            0x4                 4
r14            0x5                 5
r15            0x5555558748e0      93824995510496
rip            0x7ffff7c4093c      0x7ffff7c4093c <__strlen_evex+28>

@RalfJung
Copy link
Member

RalfJung commented Dec 21, 2022

Arguably swapping two entries of argv is still a lot less dangerous than setting one to NULL -- it seems totally legitimate to assume that there are no NULL pointers in argv, and getopt does not insert any NULL there either. So what GLib does here is certainly questionable, more so than what getopt does.

and this is theoretically still not safe if the shared library is loaded at the same time the argv array is modified and a partially written pointer value is read

That would be a data race. However, just like multi-threaded programs cannot mutate the process environment or write to the global environ pointer (setenv is inherently unsafe if there are other threads; also see the discussion in #90308), I would argue that multi-threaded programs cannot mutate argv. Global state such as argv or environ is inherently read-only in a concurrent program; there is just no way to make mutation safe.

So, I think for concurrent programs we can already assume that there are no concurrent mutations, and if there are then that is a bug in the code that performs the mutation. What remains is the question whether we should adjust the args handling to be able to deal with NULL pointers in argv.

@sdroege
Copy link
Contributor Author

sdroege commented Dec 21, 2022

The problem here is that this behaviour of GLib exists since literally decades, and I'm sure we'll find other commandline parsers that are doing similar things if we search a bit.

My first reaction also was that this feels like something to fix in GLib, but that doesn't seem to be a practical resolution because of the above. Skipping over NULLs seems to be a relatively cheap mitigation if we decide that concurrent modifications of argv are simply wrong (who even does that! also seems less likely than setenv()).

@RalfJung
Copy link
Member

I can't judge how common it is to add NULLs there, and don't have an opinion on whether we should support that -- let's see what @rust-lang/libs says. If you know of another library doing this that could be useful, though GLib doing it for decades on its own already means this is quite wide-spread.

I hope GLib does this before any threads are spawned, because otherwise I'm afraid that is just inherently doomed.

@sdroege
Copy link
Contributor Author

sdroege commented Dec 21, 2022

If you know of another library doing this that could be useful

I'll search a bit later.

I hope GLib does this before any threads are spawned, because otherwise I'm afraid that is just inherently doomed.

An application could surely do that, but that's just asking for problems and I wouldn't really worry about that.


The following would work around this FWIW. Maybe should be conditional on glibc.

diff --git a/library/std/src/sys/unix/args.rs b/library/std/src/sys/unix/args.rs
index a342f0f5e85..4231f0b4070 100644
--- a/library/std/src/sys/unix/args.rs
+++ b/library/std/src/sys/unix/args.rs
@@ -142,9 +142,16 @@ fn clone() -> Vec<OsString> {
             let argv = ARGV.load(Ordering::Relaxed);
             let argc = if argv.is_null() { 0 } else { ARGC.load(Ordering::Relaxed) };
             (0..argc)
-                .map(|i| {
-                    let cstr = CStr::from_ptr(*argv.offset(i) as *const libc::c_char);
-                    OsStringExt::from_vec(cstr.to_bytes().to_vec())
+                .filter_map(|i| {
+                    let ptr = *argv.offset(i) as *const libc::c_char;
+                    // Some C commandline parsers are replacing already handled arguments in `argv`
+                    // with `NULL` so it's necessary to skip over them here.
+                    if ptr.is_null() {
+                        None
+                    } else {
+                        let cstr = CStr::from_ptr(ptr);
+                        Some(OsStringExt::from_vec(cstr.to_bytes().to_vec()))
+                    }
                 })
                 .collect()
         }

@bjorn3
Copy link
Member

bjorn3 commented Dec 21, 2022

nit: s/Same/Some

@sdroege
Copy link
Contributor Author

sdroege commented Dec 21, 2022

If you know of another library doing this that could be useful

I'll search a bit later.

Qt does the same thing in QCoreApplication in QCoreApplicationPrivate::processCommandLineArguments(). It shuffles around arguments in argv and removes (by setting to nullptr) arguments it handled already.

@sdroege
Copy link
Contributor Author

sdroege commented Dec 21, 2022

boost's program_options library is not affected because it always copies each string directly into a vector and then operates on that.

@the8472
Copy link
Member

the8472 commented Dec 21, 2022

Another approach would be parsing /proc/self/cmdline. Doesn't work if proc isn't available, but so do things like current_exe.

@bjorn3
Copy link
Member

bjorn3 commented Dec 21, 2022

/proc/self/cmdline is created by the kernel reading from the original location of argv (as specified by the auxiliary vector). If you edit argv, so will the commandline reported by the kernel.

@joshtriplett
Copy link
Member

Skipping over NULL seems less safe; would it make sense to just stop at the first NULL?

@joshtriplett
Copy link
Member

@sdroege

The problem here is that this behaviour of GLib exists since literally decades, and I'm sure we'll find other commandline parsers that are doing similar things if we search a bit.

To the best of my knowledge, it's completely valid to parse argv by stopping at the first NULL; if an argument parser is replacing an argument with NULL, it should not expect that arguments after that point will be seen. I've seen plenty of C programs that just do for (char **arg = argv; *arg; arg++) or equivalent.

@sdroege
Copy link
Contributor Author

sdroege commented Dec 23, 2022

For the commandline parsers from GLib and Qt that would be sufficient, yes. Considering the below, it seems OK to do that and there shouldn't really be any risk of missing useful arguments.

I've seen plenty of C programs that just do for (char **arg = argv; *arg; arg++) or equivalent.

TIL that argv is NULL-terminated. Thanks for pointing that out :)

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Feb 11, 2023
… r=ChrisDenton

Stop at the first `NULL` argument when iterating `argv`

Some C commandline parsers (e.g. GLib and Qt) are replacing already handled arguments in `argv` with `NULL` and move them to the end. That means that `argc` might be bigger than the actual number of non-`NULL` pointers in `argv` at this point.

To handle this we simply stop iterating at the first `NULL` argument.

`argv` is also guaranteed to be `NULL`-terminated so any non-`NULL` arguments after the first `NULL` can safely be ignored.

Fixes rust-lang#105999
@bors bors closed this as completed in e97203c Feb 11, 2023
thomcc pushed a commit to tcdi/postgrestd that referenced this issue May 31, 2023
Some C commandline parsers (e.g. GLib and Qt) are replacing already
handled arguments in `argv` with `NULL` and move them to the end. That
means that `argc` might be bigger than the actual number of non-`NULL`
pointers in `argv` at this point.

To handle this we simply stop iterating at the first `NULL` argument.

`argv` is also guaranteed to be `NULL`-terminated so any non-`NULL`
arguments after the first `NULL` can safely be ignored.

Fixes rust-lang/rust#105999
thomcc pushed a commit to tcdi/postgrestd that referenced this issue May 31, 2023
…enton

Stop at the first `NULL` argument when iterating `argv`

Some C commandline parsers (e.g. GLib and Qt) are replacing already handled arguments in `argv` with `NULL` and move them to the end. That means that `argc` might be bigger than the actual number of non-`NULL` pointers in `argv` at this point.

To handle this we simply stop iterating at the first `NULL` argument.

`argv` is also guaranteed to be `NULL`-terminated so any non-`NULL` arguments after the first `NULL` can safely be ignored.

Fixes rust-lang/rust#105999
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants