Bounds safe interfaces

David Tarditi edited this page Oct 3, 2018 · 10 revisions

Overview

The Checked C extension is designed so that programs can be changed to use checked pointers and arrays in an incremental fashion. A programmer can change just a few lines at a time and still have a working program. This matches the way that software is developed and maintained. However, incremental conversion is problematic for libraries. Changing argument types and return types to be checked types could break existing unconverted code that uses a library. In addition, some libraries may be external and impossible to modify.

We solve this problem by introducing bounds-safe interfaces. A bounds-safe interface describes the expected behavior and requirements with respect to checked types and bounds of existing code. A programmer provides an alternate view of existing functions, members, and global variables that gives checked types and bounds to use in place of existing unchecked types. Code with a bounds-safe interface is simultaneously checked and unchecked.

When an entity with a bounds-safe interface is used in a checked scope, the type and bounds given by the bounds-safe interface are used. In an unchecked scope, its type and bounds depend on what is expected in the context where it is used. Informally, if an unchecked type is expected, the original (unchecked) type is used. If a checked type is expected, the type in the bounds-safe interface is used. The precise rules are given in the section "Rules for typing in unchecked scopes".

Writing bounds-safe interfaces

Interface types

To add a bounds-safe interface to an existing entity, a programmer declares (or redeclares) the entity with additional information. The most common annotation is the itype annotation, which gives an alternate checked type for a declaration. For example, strcmp is redeclared as:

int strcmp(const char *src1 : itype(_Nt_array_ptr<const char>),
           const char *src2 : itype(_Nt_array_ptr<const char>));

This allows strcmp to be called with checked pointers to null-terminated arrays of characters (_Nt_array_ptr). However, strcmp cannot be called with checked pointers to ordinary arrays of characters (_Array_ptr). These might not be null-terminated, and passing such an argument could cause a buffer overrun:

void f(_Nt_array_ptr<const char> arg1, _Nt_array_ptr<const char> arg2) {
  if (strcmp(arg1, arg2)) // OK.
      ...
}

void g(_Array_ptr<const char> arg1, _Array_ptr<const char> arg2) {
  if (strcmp(arg1, arg2)) // Error.
      ...
}

Here are examples of other standard C library functions annotated with itype declarations:

size_t strlen(const char *s : itype(_Nt_array_ptr<const char>));

int atoi(const char *s : itype(_Nt_array_ptr<const char>));

double modf(double value, double *iptr : itype(_Ptr<double>));

int fclose(FILE *stream : itype(_Ptr<FILE>));

FILE *tmpfile(void) : itype(_Ptr<FILE>);

The itype annotation can also be used with structure members. Suppose a structure has a buffer of integers and a pointer to a null-terminated array of characters:

struct S {
  int buf[50];
  char *name;
};

It can be modified to have the following bounds-safe interface:

struct S {
  int buf[50] : itype(int _Checked[50]);
  char *name : itype(_Nt_array_ptr<char>);
};

On Linux, stdin can be given this bounds-safe interface:

FILE *stdin : itype(_Ptr<FILE>);

Bounds declarations

If an entity with an unchecked pointer type is actually a pointer to an array, it can be given a bounds declaration. Consider strncpy, for example. Its original type is:

char *strncpy(char * restrict dest, const char * restrict src, size_t n);

It can be given the bounds-safe interface:

char *strncpy(char * restrict dest : count(n),
              const char * restrict src : count(n),
              size_t n) : bounds(dest, (_Array_ptr<char>)dest + n);

When strncpy is called with checked pointers, the source and destination pointers must have at least n characters available.

The bounds declaration bounds(dest, (_Array_ptr<char>)dest + n) declares the return bounds for strncpy. The return bounds follow the parameter list so that they can refer to parameters. strncpy returns the destination pointer, so the bounds for its return value are the bounds for the destination pointer.
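To make the meaning of these declarations concrete, here is a minimal sketch in plain C (not Checked C) that models them at run time. The names char_span and strncpy_model are hypothetical and exist only for illustration. The Checked C compiler checks these requirements itself at call sites; the wrapper below only mimics what count(n) and the return bounds say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a pointer together with its declared count. */
typedef struct {
    char  *ptr;    /* start of the array */
    size_t count;  /* number of elements known to be available */
} char_span;

/* Models the interface: dest : count(n), src : count(n),
 * return bounds(dest, dest + n).
 * Returns false (and copies nothing) when either argument has fewer than
 * n elements available; otherwise copies and describes the result. */
bool strncpy_model(char_span dest, char_span src, size_t n,
                   char_span *result)
{
    assert(result != NULL);
    if (dest.count < n || src.count < n) {
        return false;            /* a count(n) requirement is not met */
    }
    strncpy(dest.ptr, src.ptr, n);
    result->ptr = dest.ptr;      /* strncpy returns dest */
    result->count = n;           /* models bounds(dest, dest + n) */
    return true;
}
```

In this model, a caller that passes a destination with fewer than n elements is turned away before any copy happens, which is the situation the bounds-safe interface is meant to rule out.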

For brevity, bounds declarations by themselves imply interface types. If the original type is an unchecked pointer to T, the interface type is _Array_ptr<T>. If it is an unchecked array T[len], the interface type is T _Checked[len]. The implied interface type for dest in strncpy is _Array_ptr<char>.

A bounds declaration can be combined with an interface type in the case where the implied type is not the right checked type. Combined declarations are needed for nested pointers: int ** might need to have the interface type _Array_ptr<_Ptr<int>>. They are also needed for _Nt_array_ptr interface types that have bounds other than count(0).

The function fread can be given the bounds-safe interface:

size_t fread(void * restrict p : byte_count(size * nmemb),
             size_t size, size_t nmemb,
             FILE * restrict stream : itype(restrict _Ptr<FILE>));

The function memcpy can be given the bounds-safe interface:

void *memcpy(void * restrict dest : byte_count(n),
             const void * restrict src : byte_count(n),
             size_t n) : bounds(dest, (_Array_ptr<char>) dest + n);

Note that with this bounds-safe interface, dest and src both have the interface type _Array_ptr<void>. This means that calls to memcpy may not preserve type safety. We can provide a bounds-safe interface that does better than this, which we describe later.

Function types

A programmer can also provide bounds-safe interfaces for function types. This is done the same way as for function declarations, by providing bounds-safe interfaces for parameters and the return value. For example, qsort takes a comparison function and uses it to sort an array of values:

void qsort(void *base, size_t nmemb, size_t size,
           int ((*compar)(const void *, const void *)));

compar is a pointer to a function that takes two void * pointers and returns an integer. The pointers are pointers to elements of the array. The function type should be read from the inside (closest to the identifier) outward: (*compar) means that compar is a pointer to ..., where ... is the function type int (const void *, const void *). qsort can be given the following bounds-safe interface:

void qsort(void *base : byte_count(nmemb * size),
           size_t nmemb, size_t size,
           int ((*compar)(const void *, const void *)) :
             itype(_Ptr<int (_Ptr<const void>, _Ptr<const void>)>));

In this case, compar has a bounds-safe interface type that is a _Ptr to a function type that takes two void pointer arguments and returns an integer.

Note that we can do even better and replace the _Ptr<void> types with type-safe constructs.
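For readers less familiar with qsort, the following plain C sketch (without Checked C annotations) shows a comparison function of the type described above and a call that passes it. The names compare_ints and sort_ints are hypothetical and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Has the type int (const void *, const void *): each argument points
 * to one element of the array being sorted. */
int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);   /* negative, zero, or positive */
}

/* The array passed as base must span at least nmemb * size bytes,
 * which is what byte_count(nmemb * size) states in the interface. */
void sort_ints(int *values, size_t count)
{
    assert(values != NULL);
    qsort(values, count, sizeof values[0], compare_ints);
}
```

The casts from const void * inside compare_ints are the part that the void-based interface cannot check, which is why a more type-safe interface is desirable.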

Rules for typing in unchecked scopes

In the Checked C extension, implicit conversions between different kinds of pointer types are allowed at assignments, function calls, and return statements.

Unchecked to checked conversions

When an expression with unchecked pointer type is converted implicitly to a checked pointer type, the expression must meet any target bounds requirements for the checked pointer. Bounds-safe interfaces are used during inference of bounds for the expression. If a variable with unchecked pointer type occurs in the expression and the variable has a bounds-safe interface that declares bounds, those bounds are trusted and assumed to be true.

This code is allowed:

void f(char *buf : count(n), size_t n) {
   _Array_ptr<char> c : count(n) = buf;

}

At the assignment to c, buf is cast implicitly to an _Array_ptr. It has the bounds count(n), which is the same as the bounds declared for c. This code is not allowed:

void g(char *buf, size_t n) {
   _Array_ptr<char> d : count(n) = buf;  // Error: buf has no declared bounds.

}

At the assignment to d, buf is cast implicitly to an _Array_ptr. However, it has no bounds declared, so it fails to meet the bounds requirement for d of count(n). The code is rejected by the compiler.

Checked to unchecked conversions

A checked pointer can be converted implicitly to an unchecked pointer only when a bounds-safe interface is present. Here are the rules:

  • If the left-hand side of an assignment is a variable with a bounds-safe interface or a member reference with a bounds-safe interface, and the right-hand side expression has a checked pointer type, the right-hand side expression is converted implicitly to the unchecked pointer type.
  • Similarly, at calls, the function type of the function being called is determined. If a parameter has a bounds-safe interface, and the corresponding argument has a checked type, the argument is converted implicitly to the unchecked type of the parameter.
  • At return statements, if the return value for the enclosing function has a bounds-safe interface, and the expression in the return statement has a checked type, the expression is converted implicitly to the unchecked return type.

In all these cases, if bounds are declared by the bounds-safe interface, the converted expression must meet them. This enforces that checked pointers meet the bounds requirements of functions, variables, or members.

For example, this prevents memcpy from being called with _Array_ptr values that do not point to enough data. The following code is correct:

int a _Checked[3] = {0, 1, 2};
int b _Checked[3] = {3, 4, 5};
memcpy(a, b, sizeof(int) * 3); // correct.

while the following code is rejected:

int c _Checked[2] = { 6, 7 };
memcpy(a, c, sizeof(int) * 3); // error: c holds only 2 elements.
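The difference between the two calls comes down to comparing byte counts. The plain C sketch below replays that comparison for a request of three int-sized elements. It is only a run-time model with hypothetical names (meets_byte_count, replay_examples); the Checked C compiler performs the corresponding check itself and rejects the second call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does a buffer of `available` bytes satisfy a
 * byte_count(requested) requirement? */
bool meets_byte_count(size_t available, size_t requested)
{
    return requested <= available;
}

/* Replays the arrays from the examples above as size comparisons. */
void replay_examples(void)
{
    int a[3] = {0, 1, 2};
    int b[3] = {3, 4, 5};
    int c[2] = {6, 7};
    size_t n = sizeof(int) * 3;   /* bytes requested from memcpy */

    assert(meets_byte_count(sizeof a, n));    /* dest is large enough */
    assert(meets_byte_count(sizeof b, n));    /* src in the accepted call */
    assert(!meets_byte_count(sizeof c, n));   /* src in the rejected call */
}
```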