Transpiling a copy of an unspecified value introduces UB #1210

@purplesyringa

To the best of my knowledge, the following code, if it compiles, does not have UB:

#include <stdint.h>
#include <stdlib.h>

int main() {
    int32_t *p = malloc(sizeof(int32_t));
    *p;
}

Reasoning (I'm using https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf as the reference standard):

  • [7.22.3.4] malloc allocates an object whose value is indeterminate
  • [6.5] When using an object without a declared type for anything other than memcpy/memmove/typed copy, the effective type is the lvalue type
  • [3.19.2] An indeterminate value is either an unspecified value or a trap representation
  • [6.2.6.1] A trap representation is an object representation that doesn't represent a value of the object type
  • [6.2.6.2] An integer object representation is split into value bits and padding bits, and only the padding bits can make a representation a trap representation
  • [7.20.1.1] int32_t designates a signed integer type with width 32 and no padding bits
  • As a consequence, int32_t has no trap representations, and *p is an unspecified value
  • [3.4.4] Unspecified behavior (not UB!) includes, among other things, the use of an unspecified value
  • [J.2] specifically mentions that reading a trap representation from a non-char lvalue is UB, not reading an indeterminate value in general

c2rust transpiles this to:

#![allow(dead_code, mutable_transmutes, non_camel_case_types, non_snake_case, non_upper_case_globals, unused_assignments, unused_mut)]
extern "C" {
    fn malloc(_: libc::c_ulong) -> *mut libc::c_void;
}
pub type __int32_t = libc::c_int;
pub type int32_t = __int32_t;
unsafe fn main_0() -> libc::c_int {
    let mut p: *mut int32_t = malloc(::core::mem::size_of::<int32_t>() as libc::c_ulong)
        as *mut int32_t;
    *p;
    return 0;
}
pub fn main() {
    unsafe { ::std::process::exit(main_0() as i32) }
}

which does have UB, because in Rust, reading an uninitialized value is undefined behavior (Miri reports it, but it's also kinda common sense).
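For contrast, here is a hand-written sketch (not c2rust output) of the same allocate-then-read pattern in a form that should not assert initialization. It assumes the allocation goes through `std::alloc` rather than `malloc`, and `read_fresh_allocation` is a hypothetical name used only for illustration.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::MaybeUninit;

/// Hypothetical helper: allocates an i32-sized block, reads it while it is
/// still uninitialized, and frees it. Returns false if allocation failed.
pub fn read_fresh_allocation() -> bool {
    let layout = Layout::new::<i32>();
    unsafe {
        // Freshly allocated memory, uninitialized like the malloc result in C.
        let p = alloc(layout) as *mut MaybeUninit<i32>;
        if p.is_null() {
            return false;
        }
        // A read at type MaybeUninit<i32> does not claim the bytes are
        // initialized; a read at type i32 (as in the transpiled `*p;`) does.
        let _value: MaybeUninit<i32> = p.read();
        // Calling `_value.assume_init()` here would bring the UB back.
        dealloc(p as *mut u8, layout);
        true
    }
}
```

The only difference from the transpiled code that matters here is the type at which the read happens.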

A narrower version of this problem is this memcpy implementation, which is perfectly legal in C:

#include <stddef.h>

void my_memcpy(unsigned char* dst, unsigned char* src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

but not when transpiled with c2rust:

#![allow(dead_code, mutable_transmutes, non_camel_case_types, non_snake_case, non_upper_case_globals, unused_assignments, unused_mut)]
pub type size_t = libc::c_ulong;
#[no_mangle]
pub unsafe extern "C" fn my_memcpy(
    mut dst: *mut libc::c_uchar,
    mut src: *mut libc::c_uchar,
    mut n: size_t,
) {
    let mut i: size_t = 0 as libc::c_int as size_t;
    while i < n {
        *dst.offset(i as isize) = *src.offset(i as isize);
        i = i.wrapping_add(1);
        i;
    }
}

I've demonstrated the problem with int32_t first to show that this is a wide problem, not specific to character types.

I'm not sure how to solve this. Wrapping everything in MaybeUninit would work, but that's quite unwieldy.
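A minimal sketch of what the MaybeUninit approach could look like for my_memcpy. This is hand-written, not something c2rust emits today; the name `my_memcpy_uninit` and the choice of `usize` for the length are assumptions made for illustration.

```rust
use std::mem::MaybeUninit;

/// Hypothetical byte-wise copy that tolerates uninitialized source bytes.
///
/// # Safety
/// `src` must be valid for reads and `dst` valid for writes of `n` bytes.
pub unsafe fn my_memcpy_uninit(
    dst: *mut MaybeUninit<u8>,
    src: *const MaybeUninit<u8>,
    n: usize,
) {
    let mut i: usize = 0;
    while i < n {
        unsafe {
            // Copying a MaybeUninit<u8> moves the byte without asserting
            // that it is initialized, so uninitialized bytes pass through.
            *dst.add(i) = *src.add(i);
        }
        i += 1;
    }
}
```

The unwieldy part is visible already: every caller has to cast its pointers to `*mut MaybeUninit<u8>`, and any value that is later used as a plain `u8` needs an explicit `assume_init()` at a point where it is known to be initialized.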
