Skip to content

Proposed extension changes to improve backward compatibility

Sulekha Kulkarni edited this page Sep 7, 2021 · 8 revisions

Motivation

One of the benefits of C is that C compilers are available for almost all platforms. You can write portable C code that can run almost anywhere. This is important for open-source code bases.

Because of this, some Checked C users want to compile their Checked C code with exisitng compilers that do not understand Checked C. They will have the benefits of Checked C when using a compiler that understands the extension. Their code will run elsewhere but be less secure.

We propose a set of changes to Checked C to allow the C preprocessor to be used to remove Checked C extensions. You would modify your C files to include a header file that results in Checked C annotations and changes being removed when the compiler does not support Checked C. In this design, we will name the header file optcheckedc.h. Another name would fine.

Types

Pointer types

The Checked C extension has 3 new pointer types: _Array_ptr, _Nt_array_ptr, and _Ptr. It also adds checked array types. The pointer types use syntax based on C++ templates because is is easier to understand.

_Array_ptr<int> p = 0;

declares a pointer to an array of integers where accesses via the pointer are bounds checked.

C has type qualifiers that change the meaning of a type: _Atomic, const, volatile, and restrict. This can be applied to a pointer type by following the * with a list of qualifiers:

int *p const = ...

We propose adding 3 new qualifiers for pointer types:

_Array _Nt_array _Single

These provide an alternate way of declaring the new pointer types:

int *p _Array = ...

declares an array_ptr to an integer. It is equivalent to

_Array_ptr<int> p = ...

The idea of using type qualifiers this way was first proposed by the Deputy system from Berkeley (Jeremy Condit et al., Dependent Types for Low-Level Programming, ESOP '07)

In the case where a C compiler does not support Checked C, the type qualifiers are #defined to an empty string:

#define _Array
#define _Nt_array
#define _Single

Arrays

It is easy to handle checked array types. They are marked by prefixing the dimension with he keyword _Checked example int arr _Checked[5]. In the header file, when a compiler does not support Checked C, we #define the keyword _Checked to the empty string:

#define _Checked

This will also eliminate the _Checked keyword used for _Checked blocks.

Generic types

For generic type applications, we will use a macro to wrap the type arguments. It will discard and the type arguments when Checked C is not available:

#define _TyArg(...) 

and maps it to the Checked C syntax when Checked C is available

#define _TyArg(...) <__VA_ARGS__>

As an example,

malloc<int>(e)

becomes

malloc _TyArg(int) (e)

We propose a two-part approach for generic function definitions and generic struct definitions. These bind type variables that can appear in code. When Checked C is not available, _For_any can be a macro that discards its arguments:

#define _For_any(...) 

The programmer specifies each type variable T bound by the _For_any construct as _TyVar(T). When Checked C is not available, _TyVar(T) can be a macro that is defined as:

#define _TyVar(T) void

As an example, when Checked C is not available,

_For_any(T) f(_TyVar(T) * _Single) { ... }

becomes

f(void *) { ... }

The same approaches can be used with _Itype_for_any. Two examples of generic function declarations with bounds-safe interfaces are given below with both the current and the new syntax:

Example 1:

Current syntax:

_Itype_for_any(T) void *malloc(size_t size) : itype(_Array_ptr<T>) byte_count(size);

New syntax:

_IType_for_any(T) void *malloc(size_t size) _IType(_TyVar(T) * _Array) _Byte_count(size);

Example 2:

Current syntax:

_Itype_for_any(T) void *bsearch(const void *key : itype(_Ptr<const T>),
                                const void *base : itype(_Array_ptr<const T>) byte_count(nmemb * size),
                                size_t nmemb, size_t size,
                                int ((*compar)(const void *, const void *)) :
                                  itype(_Ptr<int(_Ptr<const T>, _Ptr<const T>)>)
                               ) : itype(_Ptr<T>);

New syntax:

 _IType_for_any(T) void *bsearch(const void *key _IType(_TyVar(T) * _Single const),
                                 const void *base _IType(_TyVar(T) * _Array const) _Byte_count(nmemb * size),
                                 size_t nmemb, size_t size,
                                 int ((*compar)(const void *, const void *))
                                   _IType(int (*)(_TyVar(T) * _Single const, _TyVar(T) * _Single const) _Single)
                                ) _IType(_TyVar(T) * _Single);

Existential types

  • To erase an existential type, we define a macro #define Exists(T, inner_type) inner_type. Note that we already have the macro #define _TyArg (T) .
  • With the above definitions, the macro expansion of Exists(T, struct cb_info _TyArg(T)) results in struct cb_info.
  • Next, we define a macro #define _Pack(expr, exist_type, subst_type) expr
  • With this definition, _Pack(info, _Exists(A, struct cb_info _TyArg(A)), T) expands to info.
  • We define #define _Unpack(T) .

In the example below we show how the existential types are removed when Checked C is not available, using the above macro definitions.

#define _TyArg(T)
#define _Exists(T, inner_type) inner_type
#define _Pack(expr, exist_type, subst_type) expr
#define _For_any(T)
#define _Unpack(T)
#define _TyVar(T) void

#define event_id int
#define MAX_CB 10

struct cb_info _For_any(T) {
    _TyVar(T) *data;
    void (*cb)(event_id id, _TyVar(T) *data);
};

struct map_entry {
    event_id id;
    _Exists(T, struct cb_info _TyArg(T)) info;
};

int num_entries = 0;
struct map_entry cb_map[MAX_CB];

_For_any(T) void reg_cb(event_id id, struct cb_info _TyArg(T) info) {
    _Exists(A, struct cb_info _TyArg(A)) opaque_info = _Pack(info, _Exists(A, struct cb_info _TyArg(A)), T);
    struct map_entry entry;
    entry.id = id;
    entry.info = opaque_info;
    cb_map[num_entries] = entry;
    num_entries += 1;
}

void handle(event_id id) {
    for (int i = 0; i < num_entries; i++) {
        if (cb_map[i].id == id) {
            _Unpack (T) struct cb_info _TyArg(T) cb_inf = cb_map[i].info;
            cb_inf.cb(id, cb_inf.data);
        }
    }
}

becomes

struct cb_info {
    void *data;
    void (*cb)(int id, void *data);
};

struct map_entry {
    int id;
    struct cb_info info;
};

int num_entries = 0;
struct map_entry cb_map[10];

 void reg_cb(int id, struct cb_info info) {
    struct cb_info opaque_info = info;
    struct map_entry entry;
    entry.id = id;
    entry.info = opaque_info;
    cb_map[num_entries] = entry;
    num_entries += 1;
}

void handle(int id) {
    for (int i = 0; i < num_entries; i++) {
        if (cb_map[i].id == id) {
            struct cb_info cb_inf = cb_map[i].info;
            cb_inf.cb(id, cb_inf.data);
        }
    }
}

Bounds annotations

Checked C has bounds annotations that are added to declarations using : bounds-exp, where bounds-exp can be count(e1), bounds(e1, e2), byte_count(e1), or bounds(any). The : lets the parser know that certain identifiers have a context-sensitive meaning, so that we do not have to use an underscore to prefix the name.

To avoid the need for the :, we can introduce four new keywords: _Any, _Bounds, _Byte_count, and _Count. These can be the names of bounds expressions. We can then allow a declarator to be followed by a bounds expression, instead of requiring a :.

When Checked C is not supported, we can introduce macros that discard the keywords:

#define _Any
#define _Bounds(e1, e2)
#define _Byte_count(e1)
#define _Count(e1)

We can introduce a new keyword _Itype for itype annotations also.

Dynamic check expressions

For dynamic check expressions, when a compiler does not support Checked C, we can have a macro that evaluates the argument but does nothing with it.

#define _Dynamic_check(e1) ((void)(e1))

An alternative is to make use of the assert macro:

#define _Dynamic_check(e1) ((e1) ? (void)0 : assert(0))

Bounds cast expressions

_Dynamic_bounds_cast and _Assume_bounds_cast expressions have the form op<T>(e, bounds-expression). The type argument is a problem a problem for non-Checked C compiler. We can introduce variadic macros that take 2 or 3 arguments. For now, we proposed just using an _M as a suffix to indicate that this is a macro. We are open to other suggestions.

_Dynamic_bounds_cast_M(T, e1, ...)
_Assume_bounds_cast_M(T, e1, ...)

When Checked C is supported, they would map to their corresponding syntax forms:

#define _Dymamic_bounds_cast_M(T, e1, ... ) _Dynamic_bounds_cast<T>(e1, __VA_ARGS__)
#define _Assume_bounds_cast_M(T, e1, ...) _Assume_bound_cast<T>(e1, __VA_ARGS__)

When it is not supported, they would become a C-style cast:

#define _Dynamic_bounds_M(T, e1, ...) ((T) e1)
#define _Assume_bounds_cast_M(T, e1, ...) ((T) e1)

Where clauses

For where clauses, we can allow optional parentheses to surround the where clause:

_Where (e1 _And _e2)

When a compiler does not support Checked C, we can have macro that discards the body of the where clause:

#define _Where(e)